Sayjad21/Nebula
Nebula AI Assistant

Nebula is a multimodal AI assistant that supports:

  • Text-to-speech (TTS) using ElevenLabs and Google gTTS
  • Speech-to-text (STT) using Groq
  • Image analysis using Groq Vision models
  • Webcam capture
  • Integration with LangChain and Gemini

Features

  • Text-to-Speech: Converts text to speech using ElevenLabs or Google gTTS.
  • Speech-to-Text: Transcribes audio using Groq's API.
  • Image Analysis: Captures webcam images and analyzes them with Groq Vision models.
  • Web UI: Gradio-based interface for easy interaction.

Setup

1. Clone the repository

git clone https://github.com/Sayjad21/Nebula.git
cd Nebula

2. Install dependencies

pip install -r requirements.txt

FFmpeg must also be installed. The code expects it at:

C:\ffmpeg-2025-08-07-git-fa458c7243-full_build\bin

If FFmpeg is installed elsewhere, update the paths in the code to match your installation.
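
Before launching the app, it can help to confirm that FFmpeg is actually reachable. The following is a minimal sketch using only the Python standard library; the function name find_ffmpeg and its extra_dir parameter are illustrative assumptions and are not part of the Nebula code.

```python
import os
import shutil


def find_ffmpeg(extra_dir=None):
    """Return the full path to the ffmpeg executable, or None if it is not found.

    extra_dir is a hypothetical parameter: an optional directory (for example
    the FFmpeg "bin" folder) that is prepended to PATH for this process only.
    """
    if extra_dir and os.path.isdir(extra_dir):
        current = os.environ.get("PATH", "")
        if extra_dir not in current.split(os.pathsep):
            # Prepend so this directory is searched before the rest of PATH.
            os.environ["PATH"] = extra_dir + os.pathsep + current
    # shutil.which searches the directories listed in PATH.
    return shutil.which("ffmpeg")
```

For example, find_ffmpeg(r"C:\ffmpeg-2025-08-07-git-fa458c7243-full_build\bin") returns the executable's path if that folder exists and contains FFmpeg, and None otherwise. The PATH change applies only to the running Python process and does not alter system settings.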

3. Configure API Keys

DO NOT PUT SECRETS IN GIT!

Create a file named .env (or config.py, if you prefer) and make sure it is listed in .gitignore. For .env, use the following content:

GROQ_API_KEY=your_groq_api_key
GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

Or, if using config.py:

groq_api_key = "your_groq_api_key"
gemini_api_key = "your_gemini_api_key"
elevenlabs_api_key = "your_elevenlabs_api_key"
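
As an illustration of how the .env file above could be read at startup, here is a minimal standard-library sketch. The helper names load_env_file and missing_keys are hypothetical and are not taken from the Nebula code, which may load its keys differently (for example through config.py or a dotenv package).

```python
import os
from pathlib import Path

# Key names taken from the .env example above.
REQUIRED_KEYS = ("GROQ_API_KEY", "GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ and return what was parsed."""
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
    return parsed


def missing_keys(required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Calling load_env_file() and then missing_keys() early in the program gives a clear error point if a key is absent, instead of a failure deep inside an API call.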

4. Run the application

python main.py

Or run individual modules for testing:

python text_to_speech.py
python speech_to_text.py
python tools.py

Notes

  • Make sure your webcam and microphone are connected and accessible.
  • On Windows, the FFmpeg path must be set correctly in the code.
  • All secret keys must be kept out of version control.

File Structure

Nebula/
├── ai_agent.py
├── config.py         # (should NOT be committed)
├── main.py
├── requirements.txt
├── speech_to_text.py
├── test.py
├── text_to_speech.py
├── tools.py
├── .gitignore
└── README.md

License

MIT License


Never commit your API keys or secrets.

About

Nebula is your witty, voice-powered AI sidekick, blending Groq’s whisper-large-v3 for lightning-fast speech-to-text, ElevenLabs for rich, natural TTS, and a Gradio-powered interface that pairs a live webcam feed with an interactive chat—making every conversation seamless, responsive, and just a little futuristic.
