Voice-Triggered Conversational Assistant with Full-Screen Audio Visualization

Description

This Flask-based application listens continuously in the browser for trigger words, sends captured speech to a local language model for processing, and plays the AI-generated response with real-time audio visualization. By leveraging browser-based speech recognition, system-level text-to-speech (eSpeak NG), and a local LLM integration (e.g., Ollama), it creates an immersive, hands-free conversational experience complete with dynamic full-screen frequency bars.

Installation & Setup

  1. Clone or Download the Code
    • Place the files in a directory of your choice (e.g., voice_triggered_app/).
  2. Install Dependencies
    • Make sure Flask is installed:
      pip install flask
    • Verify eSpeak NG is installed on your system and is available from the command line (espeak-ng --version).
  3. Local Language Model Configuration
    • If using Ollama, ensure it’s installed and working.
    • Edit generate_ai_response(prompt) in your code if you need to change the command for a different local LLM or a different model name; a sketch of this function follows this list.
  4. Run the Application
    • Navigate to the code directory and run:
      python app_v9.py
    • By default, Flask runs on http://127.0.0.1:5000.
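
A minimal sketch of how generate_ai_response might be implemented, assuming the Ollama CLI is on your PATH and that a model named llama3 has already been pulled (the model name is an assumption; substitute your own):

    import subprocess

    def generate_ai_response(prompt: str) -> str:
        """Call a local LLM through its CLI and return the generated text."""
        result = subprocess.run(
            ["ollama", "run", "llama3", prompt],  # swap in your model name
            capture_output=True,
            text=True,
            timeout=120,  # avoid hanging indefinitely on a stalled model
        )
        return result.stdout.strip()

Switching to another local LLM is usually just a matter of changing the command list to that tool's CLI invocation.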

Usage Instructions

  1. Open the Web Interface
    • In your browser, navigate to the Flask server’s address (e.g., http://127.0.0.1:5000).
  2. Continuous Speech Recognition
    • The browser automatically starts listening for speech via the Web Speech API.
    • The recognized text is logged to the browser’s developer console for debugging (both interim and final transcripts).
  3. Trigger Words
    • A set of predefined words (e.g., “coeus,” “koeus,” “coy-us,” etc.) is monitored.
    • Once a trigger word is detected, the user’s final transcript is sent to the Flask endpoint (/get_response); see the endpoint sketch after this list.
  4. Receiving AI Response
    • The Flask server invokes the local LLM via subprocess.run() to generate a response.
    • eSpeak NG then converts the text response into a .wav file (a TTS sketch follows this list).
  5. Audio Playback & Visualization
    • The app streams the audio file and plays it automatically in the browser.
    • A dynamic spectrum visualizer (bars in a canvas) animates in sync with the audio, creating a full-screen effect.
  6. Deleting Audio
    • When TTS playback finishes, the .wav file is automatically deleted from the server to conserve disk space (see the streaming-and-cleanup sketch after this list).
  7. Concurrent Access
    • If another request is still being processed, new requests receive a 503 response with a friendly message to “try again.”
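
The following sketch shows how the /get_response endpoint and the single-request lock described above might fit together. The lock variable name and the JSON payload shape are assumptions, not necessarily what app_v9.py uses:

    import threading

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    processing_lock = threading.Lock()  # assumed name: one pipeline at a time

    def generate_ai_response(prompt: str) -> str:
        # Stand-in for the real LLM call (see the sketch under Installation).
        return f"You said: {prompt}"

    @app.route("/get_response", methods=["POST"])
    def get_response():
        # If a previous request still holds the lock, answer 503 right away
        # instead of queueing a second LLM/TTS pipeline behind it.
        if not processing_lock.acquire(blocking=False):
            return jsonify({"message": "Still processing, please try again."}), 503
        try:
            data = request.get_json(silent=True) or {}
            transcript = data.get("text", "")
            return jsonify({"response": generate_ai_response(transcript)})
        finally:
            processing_lock.release()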
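
eSpeak NG can write a WAV file directly via its documented -w flag; a minimal sketch of the TTS step (the output filename is illustrative):

    import subprocess

    def synthesize_speech(text: str, wav_path: str = "response.wav") -> str:
        """Render text to a WAV file with eSpeak NG and return the path."""
        subprocess.run(
            ["espeak-ng", "-w", wav_path, text],  # -w writes audio to a file
            check=True,  # raise if espeak-ng exits non-zero
        )
        return wav_path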
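
Serving the WAV to the browser and removing it after playback could look like the sketch below. The /audio and /delete_audio route names are assumptions about app_v9.py's layout; the deletion route would be called from the page when the audio element's ended event fires:

    import os

    from flask import Flask, jsonify, send_file
    from werkzeug.utils import secure_filename

    app = Flask(__name__)

    @app.route("/audio/<filename>")
    def audio(filename):
        # Stream the synthesized WAV to the page's <audio> element.
        # secure_filename guards against path traversal in the URL segment.
        return send_file(secure_filename(filename), mimetype="audio/wav")

    @app.route("/delete_audio/<filename>", methods=["POST"])
    def delete_audio(filename):
        # Remove the clip once the browser reports playback has finished,
        # so stale WAV files don't accumulate on disk.
        path = secure_filename(filename)
        if os.path.exists(path):
            os.remove(path)
        return jsonify({"deleted": path})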

Exiting the Application

  • Stop Flask: Press Ctrl + C in the terminal window where the Flask app is running.
  • Disable Speech Recognition: Simply close your browser tab or stop the server.

Additional Tips

  • Browser Compatibility: webkitSpeechRecognition is supported in Chrome-based browsers. For others, or if you need cross-browser solutions, consider third-party libraries or alternative approaches.
  • Model Changes: If you switch from Ollama to another LLM, ensure the generate_ai_response function’s subprocess.run arguments match the required CLI usage.
  • Performance Tuning: Increase or decrease analyser.fftSize (in index.html) to change the visualizer resolution; the Web Audio API requires fftSize to be a power of two between 32 and 32768, and higher values yield more frequency bars at a higher per-frame cost.
  • Security: Consider using HTTPS in production and restricting access if you plan to deploy publicly; a minimal localhost/TLS sketch follows below.
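
For a quick local deployment, binding Flask to the loopback interface keeps the app off the open network, and Werkzeug's ad-hoc TLS support (which requires the pyopenssl package) provides a throwaway self-signed certificate for testing. A minimal sketch:

    from flask import Flask

    app = Flask(__name__)

    if __name__ == "__main__":
        # Loopback-only: nothing outside this machine can connect.
        # ssl_context="adhoc" generates a temporary self-signed certificate;
        # it is for local testing, not a substitute for real certificates.
        app.run(host="127.0.0.1", port=5000, ssl_context="adhoc")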

Enjoy experimenting with voice-triggered AI interactions and audio visualizations in your project! This application provides a foundation for building immersive, hands-free experiences powered by local language models and real-time speech technology.