An AI-powered system that automatically generates animated educational videos from text prompts using Google ADK agents, Manim animations, and ElevenLabs narration.
Status: Private Repository | Proprietary Project
- System Architecture
- Prerequisites
- Installation
- Configuration
- Running the Agent
- How It Works
- Project Structure
- Troubleshooting
- Roadmap
- Support
- Contributing
- License
The pipeline consists of three main agent groups working in sequence:
- Manages the entire pipeline flow
- Coordinates between transcript and video generation
- Handles state management across all sub-agents
- Will also manage the Watermark Agent in a future update
Topic Researcher → Content Structurer → Script Writer → Speech Formatter
↓
Audio Transcriber ← ElevenLabs TTS
↓
Scene Summary
Video Agent → [Manim Code Gen, Render Checker, Scene Validator]
↓
Concatenator (combines all scenes + audio + music)
↓
[Future: Watermark Agent → Final branded video]
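The flow above can be pictured as a chain of stages that each read from and add to a shared state. The sketch below is plain Python, not Google ADK code; every stage function and state key in it is a hypothetical stand-in for the real agents, which are not shown in this README.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]


def research(state: State) -> State:
    # Hypothetical stand-in for the Topic Researcher agent.
    return {**state, "notes": f"notes on {state['topic']}"}


def write_script(state: State) -> State:
    # Stand-in for Content Structurer, Script Writer and Speech Formatter.
    return {**state, "script": f"script from {state['notes']}"}


def narrate(state: State) -> State:
    # Stand-in for ElevenLabs TTS, the Audio Transcriber and Scene Summary.
    return {**state, "audio": "narration.mp3", "scenes": ["scene_1", "scene_2"]}


def render_scenes(state: State) -> State:
    # Stand-in for the Video Agent: one rendered clip per scene.
    clips = [f"{name}.mp4" for name in state["scenes"]]
    return {**state, "clips": clips}


def concatenate(state: State) -> State:
    # Stand-in for the Concatenator.
    return {**state, "final_video": "final.mp4"}


def run_pipeline(topic: str, stages: List[Stage]) -> State:
    """Pass one state dict through each stage in order."""
    state: State = {"topic": topic}
    for stage in stages:
        state = stage(state)
    return state


PIPELINE = [research, write_script, narrate, render_scenes, concatenate]
```

Calling `run_pipeline("photosynthesis", PIPELINE)` returns the accumulated state. Under this layout, a future Watermark Agent would be one more stage appended to the list.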
- Python 3.8+ installed and in PATH
- FFmpeg for video/audio processing
- Manim Community Edition for animations
- API Keys for:
- Google ADK/Gemini
- ElevenLabs (TTS)
- Groq (STT)
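Before installing anything else, the tool prerequisites listed above can be checked from Python. This is a minimal sketch using only the standard library; the function name `check_prerequisites` is an illustration and is not part of the project.

```python
import shutil
import sys
from typing import Dict, Iterable, Tuple


def check_prerequisites(
    tools: Iterable[str] = ("ffmpeg", "manim"),
    min_python: Tuple[int, int] = (3, 8),
) -> Dict[str, bool]:
    """Report whether Python is recent enough and each tool is on PATH."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The check only confirms that the executables can be found. It does not validate their versions or the API keys.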
git clone <your-repo-url>
cd educational-video-pipeline
# Windows
python -m venv venv
venv\Scripts\activate
# Mac/Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Windows: Download from ffmpeg.org, extract, and add to PATH
- Mac: brew install ffmpeg
- Linux: sudo apt install ffmpeg (Ubuntu/Debian) or sudo yum install ffmpeg (Fedora)
# Should be installed via requirements.txt, but if issues arise:
pip install manim
# Check FFmpeg
ffmpeg -version
# Check Manim
manim --version
# Check Python packages
python -c "import elevenlabs, groq, manim; print('All packages installed!')"
Create a .env file in the project root:
# API Keys
ELEVENLABS_API_KEY=your_elevenlabs_key_here
GROQ_API_KEY=your_groq_key_here
GOOGLE_API_KEY=your_google_key_here # If using Google services
Edit join.py and update the Manim executable path:
# Line ~41 in join.py
MANIM_EXECUTABLE = r"C:\path\to\your\manim.exe" # Windows
# or
MANIM_EXECUTABLE = "/usr/local/bin/manim" # Mac/Linux
Place your background music file at:
C:\Users\leg\Documents\elearn\agent\music\background.mp3
Or update the path in the concatenate agent instruction.
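The configuration steps above (API keys, Manim executable, background music) can be sanity-checked before a run. The following is a sketch only: `find_config_problems` is a hypothetical helper, and treating all three keys as required is an assumption, since the template marks `GOOGLE_API_KEY` as needed only when Google services are used.

```python
from pathlib import Path
from typing import List, Mapping, Sequence

# Key names taken from the .env template; treating all as required is an assumption.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GROQ_API_KEY", "GOOGLE_API_KEY")


def find_config_problems(
    env: Mapping[str, str],
    manim_path: str,
    music_path: str,
    required_keys: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return human-readable problems; an empty list means the setup looks complete."""
    problems = []
    for key in required_keys:
        value = env.get(key, "").strip()
        # Treat the placeholder values from the template as unset.
        if not value or value.startswith("your_"):
            problems.append(f"{key} is not set")
    if not Path(manim_path).is_file():
        problems.append(f"Manim executable not found: {manim_path}")
    if not Path(music_path).is_file():
        problems.append(f"Background music not found: {music_path}")
    return problems
```

Passing `os.environ` as `env` checks the live environment once the `.env` file has been loaded; the other two arguments are the paths configured above.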
python join.py
When prompted, paste your educational video request. Format:
Create a [duration]-minute [level] educational video about [topic].
Include [specific requirements].
Example:
Create a 3-minute beginner-level educational video about photosynthesis.
Include colorful animations and simple analogies for middle school students.
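If requests are produced by a script rather than typed by hand, the format above can be filled in with a small helper. This is a sketch; `build_request` is a hypothetical name, and the pipeline itself only expects the resulting text to be pasted at the prompt.

```python
def build_request(topic: str, minutes: int, level: str, requirements: str = "") -> str:
    """Fill in the request template shown above."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    request = f"Create a {minutes}-minute {level} educational video about {topic}."
    if requirements:
        # The second sentence is optional and only added when requirements are given.
        request += f"\nInclude {requirements}."
    return request


if __name__ == "__main__":
    print(build_request(
        "photosynthesis",
        3,
        "beginner-level",
        "colorful animations and simple analogies for middle school students",
    ))
```

Run as a script, it prints the same two-line photosynthesis request as the example above.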
The pipeline will:
- Research the topic (30-60 seconds)
- Generate script and narration (1-2 minutes)
- Create and render animations (2-5 minutes per scene)
- Concatenate final video (30 seconds)
Total time: 5-15 minutes depending on video length
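The per-stage figures above can be combined into a rough run-time estimate for a given number of scenes. The sketch assumes scenes render one after another and that the stages do not overlap, neither of which this README states.

```python
from typing import Tuple

# (fastest, slowest) duration of each stage in seconds, from the list above.
RESEARCH = (30, 60)
SCRIPT_AND_NARRATION = (60, 120)
PER_SCENE_RENDER = (120, 300)
CONCATENATION = (30, 30)


def estimate_minutes(scene_count: int) -> Tuple[float, float]:
    """Return the (fastest, slowest) total run time in minutes."""
    if scene_count < 1:
        raise ValueError("scene_count must be at least 1")
    totals = []
    for bound in (0, 1):  # 0 = fastest case, 1 = slowest case
        seconds = (
            RESEARCH[bound]
            + SCRIPT_AND_NARRATION[bound]
            + PER_SCENE_RENDER[bound] * scene_count
            + CONCATENATION[bound]
        )
        totals.append(seconds / 60)
    return totals[0], totals[1]
```

For two scenes, `estimate_minutes(2)` gives `(6.0, 13.5)`, which sits inside the quoted 5-15 minute range.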
The completed video will be in:
final_videos/educational_video_[timestamp].mp4
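Because each output name carries a timestamp, a small helper can pick out the most recent result. The sketch below sorts by file modification time, so it makes no assumption about the timestamp format; `latest_video` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def latest_video(folder: str = "final_videos") -> Optional[Path]:
    """Return the most recently modified output video, or None if there is none."""
    videos = sorted(
        Path(folder).glob("educational_video_*.mp4"),
        key=lambda path: path.stat().st_mtime,
    )
    return videos[-1] if videos else None
```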
- Topic Researcher - Searches for comprehensive information
- Content Structurer - Creates timed educational outline
- Script Writer - Converts outline to conversational script
- Speech Formatter - Adds pauses and timing for TTS
- ElevenLabs TTS - Generates professional narration
- Audio Transcriber - Creates timestamped transcription
- Scene Summary - Breaks content into animated scenes
- Video Agent coordinates the creation of each scene:
- Manim Code Gen - Generates animation code
- Render Checker - Executes and monitors rendering
- Scene Validator - Ensures timing matches narration
- Concatenator - Combines all scenes with audio and background music
- [Future] Watermark Agent - Adds branding and outro
educational-video-pipeline/
├── join.py              # Main application orchestrator
├── prompt.py            # Agent instructions and prompts
├── stt_tools.py         # Speech-to-text functionality
├── tts_tool.py          # Text-to-speech functionality
├── .env                 # API keys (create from .env.example)
├── requirements.txt     # Python dependencies
├── generated_audio/     # TTS output files
├── transcriptions/      # STT transcriptions
├── media/               # Manim render files
├── final_videos/        # Completed videos
└── music/
    └── background.mp3   # Background music
- Verify Manim installation: manim --version
- Update the MANIM_EXECUTABLE path in join.py
- Try the full path: C:\Python39\Scripts\manim.exe
- Check that the .env file has valid API keys
- Wait 30 seconds and retry
- The system has automatic retry logic (3 attempts)
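The retry behaviour described above can be sketched as a small wrapper. The three attempts come from this README; reusing the 30-second wait as the delay between automatic attempts is an assumption, and `call_with_retry` is a hypothetical name rather than the project's own function.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # a real caller would catch narrower API errors
            last_error = error
            if attempt < attempts:
                sleep(wait_seconds)  # no wait after the final failure
    assert last_error is not None
    raise last_error
```

The `sleep` parameter exists so the wait can be replaced in tests, which avoids real 30-second pauses.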
- Ensure FFmpeg is in system PATH
- Test with: ffmpeg -version
- Restart terminal after PATH updates
- System automatically adjusts timing by ±0.5 seconds
- For larger mismatches, scenes are regenerated
- Maximum 30 attempts per scene
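The timing rules above amount to a bounded regenerate-and-check loop. In this sketch, a scene whose duration is within 0.5 seconds of the narration is accepted (the system can adjust it), and anything further off is rendered again, up to 30 times. The `render` callback is a hypothetical stand-in for the Manim code generation and render step.

```python
from typing import Callable, Optional

TOLERANCE_SECONDS = 0.5
MAX_ATTEMPTS = 30


def within_tolerance(
    scene_seconds: float,
    narration_seconds: float,
    tolerance: float = TOLERANCE_SECONDS,
) -> bool:
    """True when the mismatch is small enough to adjust instead of regenerating."""
    return abs(scene_seconds - narration_seconds) <= tolerance


def render_until_aligned(
    render: Callable[[int], float],
    narration_seconds: float,
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[int]:
    """Return the attempt number whose scene fit the narration, or None if none did.

    render(attempt) is expected to produce a scene and return its duration in seconds.
    """
    for attempt in range(1, max_attempts + 1):
        if within_tolerance(render(attempt), narration_seconds):
            return attempt
    return None
```

A `None` result corresponds to hitting the 30-attempt limit without a usable scene.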
# Reinstall all dependencies
pip install --upgrade -r requirements.txt
# For specific package issues
pip uninstall [package_name]
pip install [package_name]
To see detailed logs, run with:
python join.py > debug.log 2>&1
Edit tts_tool.py to change voices:
voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel (default)
# Other options:
# "EXAVITQu4vr4xnSDxMaL" - Bella
# "ErXwobaYiN019PkySvjV" - Antoni
Modify prompts in prompt.py to change animation preferences
- Volume: Adjust music_volume in concatenate agent (default: 0.15)
- File: Update path in concatenate agent instruction
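To show what a `music_volume` of 0.15 means in practice, the sketch below builds an FFmpeg command that lowers the music track and mixes it under the narration. How the concatenate agent actually invokes FFmpeg is not shown in this README, so the command layout and the name `build_mix_command` are assumptions. The `volume` and `amix` filters are standard FFmpeg filters.

```python
from typing import List


def build_mix_command(
    video: str,
    music: str,
    output: str,
    music_volume: float = 0.15,
) -> List[str]:
    """Build (but do not run) an FFmpeg command that mixes quiet music under a video."""
    if not 0.0 <= music_volume <= 1.0:
        raise ValueError("music_volume must be between 0.0 and 1.0")
    filter_graph = (
        # Scale the music (input 1) down, then mix it with the video's own audio.
        f"[1:a]volume={music_volume}[bg];"
        "[0:a][bg]amix=inputs=2:duration=first[mixed]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v",       # keep the original video stream
        "-map", "[mixed]",   # use the mixed audio
        "-c:v", "copy",      # do not re-encode the video
        output,
    ]
```

The returned list can be passed to `subprocess.run`. It assumes the input video already carries the narration as its audio track.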
- OpenAI TTS Integration - Add support for OpenAI's text-to-speech models
- Google Cloud TTS - Integrate Google's WaveNet voices
- Amazon Polly - Add AWS Polly for more voice variety
- Voice Cloning - Support for custom voice cloning with ElevenLabs
- Multi-language Support - Enable video generation in multiple languages
- Watermark Agent - New automated branding agent
Concatenator → Watermark Agent → Final Output
Features:
- Auto-detect video dimensions and add appropriate watermark
- Support for image (PNG/SVG) and video watermarks
- Configurable position (corner/center/custom)
- Fade in/out animations
- Duration control (full video or last X seconds)
Example workflow:
# The agent will automatically:
# 1. Take the concatenated video
# 2. Apply watermark based on config
# 3. Add outro video if specified
# 4. Output final branded video
- Outro Integration
- Pre-made outro videos with smooth transitions
- Dynamic text overlay (channel name, social links)
- Subscribe button animations
- End screen templates matching video style
- Quality Presets - Platform-specific optimization
- YouTube (1080p/4K horizontal)
- TikTok/Shorts (9:16 vertical)
- Instagram Reels (9:16 with safe zones)
- Twitter/X video specifications
- Interactive Elements - Clickable areas in videos
- Multiple Video Formats - Support for vertical videos (Shorts/Reels)
- Batch Processing - Generate multiple videos from CSV input
- Cloud Deployment - Deploy pipeline to cloud services
- Web Interface - Browser-based UI for easier access
Working on: Watermark Agent implementation for automatic branding
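Since the Watermark Agent is not implemented yet, the following is only an illustration of how the planned position setting (corner/center/custom) could map onto FFmpeg's `overlay` filter. The position names, the 10-pixel margin, and the function `overlay_filter` are all hypothetical. In the `overlay` filter, `W`/`H` are the main video's width and height and `w`/`h` are the watermark's.

```python
from typing import Optional, Tuple

MARGIN = 10  # pixels between the watermark and the frame edge (assumed value)

# overlay expressions: W/H = main video size, w/h = watermark size
POSITIONS = {
    "top-left": (f"{MARGIN}", f"{MARGIN}"),
    "top-right": (f"W-w-{MARGIN}", f"{MARGIN}"),
    "bottom-left": (f"{MARGIN}", f"H-h-{MARGIN}"),
    "bottom-right": (f"W-w-{MARGIN}", f"H-h-{MARGIN}"),
    "center": ("(W-w)/2", "(H-h)/2"),
}


def overlay_filter(
    position: str = "bottom-right",
    custom: Optional[Tuple[int, int]] = None,
) -> str:
    """Return an FFmpeg overlay filter string for a named or custom position."""
    if custom is not None:
        x, y = str(custom[0]), str(custom[1])
    elif position in POSITIONS:
        x, y = POSITIONS[position]
    else:
        raise ValueError(f"unknown position: {position}")
    return f"overlay={x}:{y}"
```

The resulting string would be used as the `-filter_complex` value of an FFmpeg call that takes the video and the watermark image as its two inputs.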
For issues or questions:
- Check the Troubleshooting section
- Review error messages in the console
- Ensure all prerequisites are properly installed
This is a private repository. Contributions are limited to authorized team members only.
This is a private repository. All rights reserved. No part of this software may be reproduced, distributed, or transmitted without prior written permission.
Built with ❤️ using Google ADK, Manim, ElevenLabs, and Groq