Voice-Triggered Conversational Assistant with Full-Screen Audio Visualization
Description
This Flask-based application listens continuously in the browser for trigger words, sends captured speech to a local language model for processing, and plays the AI-generated response with real-time audio visualization. By leveraging browser-based speech recognition, system-level text-to-speech (eSpeak NG), and a local LLM integration (e.g., Ollama), it creates an immersive, hands-free conversational experience complete with dynamic full-screen frequency bars.
Installation & Setup
- Clone or Download the Code
  - Place the files in a directory of your choice (e.g., `voice_triggered_app/`).
- Install Dependencies
  - Make sure Flask is installed (e.g., `pip install flask`).
  - Verify eSpeak NG is installed on your system and available from the command line (`espeak-ng --version`).
- Local Language Model Configuration
  - If using Ollama, ensure it’s installed and working.
  - Edit `generate_ai_response(prompt)` in your code if you need to change the command for a different local LLM or a different model name (see the sketch after this list).
- Run the Application
  - Navigate to the code directory and start the Flask app (e.g., `python app.py`, adjusting for your entry-point filename).
  - By default, Flask runs on `http://127.0.0.1:5000`.
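For reference, here is a minimal sketch of what `generate_ai_response(prompt)` might look like when backed by Ollama’s CLI. The model name (`llama3`) and timeout are placeholders for this sketch, not the app’s actual values:

```python
import subprocess

def generate_ai_response(prompt: str) -> str:
    """Query a local Ollama model from the command line.

    Assumptions: `ollama` is on PATH and a model tagged "llama3" has
    been pulled; substitute your own model name or CLI as needed.
    """
    result = subprocess.run(
        ["ollama", "run", "llama3", prompt],
        capture_output=True,
        text=True,
        timeout=120,  # don't let a stalled model hang the request forever
    )
    # Ollama prints the generated text to stdout.
    return result.stdout.strip()
```

If you swap in a different LLM, only the argument list passed to `subprocess.run` should need to change.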
Usage Instructions
- Open the Web Interface
  - In your browser, navigate to the Flask server’s address (e.g., `http://127.0.0.1:5000`).
- Continuous Speech Recognition
  - The browser automatically starts listening for speech via the Web Speech API.
  - The recognized text is visible in your dev console for debugging (interim vs. final transcripts).
- Trigger Words
  - A set of predefined words (e.g., “coeus,” “koeus,” “coy-us,” etc.) is monitored.
  - Once a trigger word is detected, the user’s final transcript is sent to the Flask endpoint (`/get_response`).
- Receiving AI Response
  - The local LLM generates a response via `subprocess.run()` (sketched earlier).
  - eSpeak NG converts the textual response into `.wav` audio (see the text-to-speech sketch after this list).
- Audio Playback & Visualization
  - The app streams the audio file and plays it automatically in the browser.
  - A dynamic spectrum visualizer (bars in a canvas) animates in sync with the audio, creating a full-screen effect.
- Deleting Audio
  - When the TTS finishes playing, the `.wav` file is automatically deleted from the server to conserve space (see the cleanup sketch after this list).
- Concurrent Access
  - If another request is still processing, new requests receive a 503 with a friendly message to “try again” (see the locking sketch after this list).
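The text-to-speech step can be as simple as shelling out to eSpeak NG with its `-w` flag, which writes the synthesized audio to a file. A minimal sketch; the temporary-file naming scheme here is an assumption, not necessarily what the app does:

```python
import subprocess
import uuid

def text_to_wav(text: str) -> str:
    """Render text to speech with eSpeak NG and return the .wav path."""
    wav_path = f"/tmp/tts_{uuid.uuid4().hex}.wav"
    subprocess.run(
        ["espeak-ng", "-w", wav_path, text],  # -w writes output to a WAV file
        check=True,  # raise if synthesis fails instead of returning silence
    )
    return wav_path
```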
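The single-request guard described under Concurrent Access can be implemented with a non-blocking `threading.Lock`. Here is a sketch of the `/get_response` route under that assumption; the JSON field names (`text`, `audio_url`, `message`) are illustrative, not the app’s actual contract:

```python
import os
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
busy = threading.Lock()  # held for the duration of one LLM + TTS cycle

@app.route("/get_response", methods=["POST"])
def get_response():
    # Non-blocking acquire: if another request holds the lock,
    # answer 503 immediately instead of queueing behind it.
    if not busy.acquire(blocking=False):
        return jsonify({"message": "Busy with another request, please try again."}), 503
    try:
        prompt = request.json.get("text", "")
        reply = generate_ai_response(prompt)  # LLM call (sketched earlier)
        wav_path = text_to_wav(reply)         # TTS (sketched above)
        return jsonify({"audio_url": f"/audio/{os.path.basename(wav_path)}"})
    finally:
        busy.release()
```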
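Post-playback cleanup requires the browser to tell the server when the audio has finished. Continuing the sketch above, one plausible shape is a pair of routes: one serving the `.wav`, and a hypothetical `/delete_audio` endpoint the page would call from the audio element’s `ended` handler. The route names and `AUDIO_DIR` are assumptions for this sketch:

```python
import os

from flask import abort, send_file

AUDIO_DIR = "/tmp"  # assumption: matches wherever text_to_wav writes

@app.route("/audio/<name>")
def serve_audio(name):
    # basename() guards against path traversal in the requested name.
    path = os.path.join(AUDIO_DIR, os.path.basename(name))
    if not os.path.isfile(path):
        abort(404)
    return send_file(path, mimetype="audio/wav")

@app.route("/delete_audio/<name>", methods=["POST"])
def delete_audio(name):
    # Called once playback ends, so the server reclaims space promptly.
    try:
        os.remove(os.path.join(AUDIO_DIR, os.path.basename(name)))
    except OSError:
        pass  # already gone; nothing to clean up
    return ("", 204)
```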
Exiting the Application
- Stop Flask: Press `Ctrl + C` in the terminal window where the Flask app is running.
- Disable Speech Recognition: Simply close your browser tab or stop the server.
Additional Tips
- Browser Compatibility: `webkitSpeechRecognition` is supported in Chrome-based browsers. For others, or if you need cross-browser solutions, consider third-party libraries or alternative approaches.
- Model Changes: If you switch from Ollama to another LLM, ensure the `generate_ai_response` function’s `subprocess.run` arguments match the required CLI usage.
- Performance Tuning: Increase or decrease `analyser.fftSize` (in `index.html`) to change the visualizer resolution.
- Security: Consider using HTTPS in production and restricting access if you plan to deploy publicly.
Enjoy experimenting with voice-triggered AI interactions and audio visualizations in your project! This application provides a foundation for building immersive, hands-free experiences powered by local language models and real-time speech technology.