
🎬 Educational Video Pipeline

An AI-powered system that automatically generates animated educational videos from text prompts using Google ADK agents, Manim animations, and ElevenLabs narration.

Status: Private Repository | Proprietary Project

📋 Table of Contents

  • System Architecture
  • Prerequisites
  • Installation
  • Configuration
  • Running the Agent
  • How It Works
  • Project Structure
  • Troubleshooting
  • Customization
  • Roadmap
  • Support
  • Contributing
  • License

πŸ—οΈ System Architecture

The pipeline consists of three main agent groups working in sequence:

1. Coordinator Agent (Main Orchestrator)

  • Manages the entire pipeline flow
  • Coordinates between transcript and video generation
  • Handles state management across all sub-agents
  • Will manage the watermark agent in future updates

2. Transcript Generation Pipeline (7 Sequential Agents)

Topic Researcher → Content Structurer → Script Writer → Speech Formatter
    ↓
Audio Transcriber ← ElevenLabs TTS
    ↓
Scene Summary

3. Video Generation Pipeline (3 Parallel Agents)

Video Agent → [Manim Code Gen, Render Checker, Scene Validator]
    ↓
Concatenator (combines all scenes + audio + music)
    ↓
[Future: Watermark Agent → Final branded video]
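
As a rough illustration of how these groups can be wired together, here is a minimal sketch using the google-adk package's composition primitives (agent names, models, and instructions are placeholders; the project's actual wiring lives in join.py and may differ):

from google.adk.agents import LlmAgent, ParallelAgent, SequentialAgent

# Sketch only: the transcript stage as a SequentialAgent of LlmAgents
transcript_pipeline = SequentialAgent(
    name="transcript_pipeline",
    sub_agents=[
        LlmAgent(name="topic_researcher", model="gemini-2.0-flash",
                 instruction="Research the requested topic."),
        LlmAgent(name="content_structurer", model="gemini-2.0-flash",
                 instruction="Produce a timed educational outline."),
        # ... script writer, speech formatter, TTS, transcriber, scene summary
    ],
)

# Scene-level workers that run side by side
video_workers = ParallelAgent(
    name="video_workers",
    sub_agents=[
        LlmAgent(name="manim_code_gen", model="gemini-2.0-flash",
                 instruction="Write Manim code for one scene."),
        # ... render checker, scene validator
    ],
)

# The coordinator owns the end-to-end flow
coordinator = SequentialAgent(
    name="coordinator",
    sub_agents=[transcript_pipeline, video_workers],
)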

📦 Prerequisites

  • Python 3.8+ installed and in PATH
  • FFmpeg for video/audio processing
  • Manim Community Edition for animations
  • API Keys for:
    • Google ADK/Gemini
    • ElevenLabs (TTS)
    • Groq (STT)

🚀 Installation

Step 1: Clone the Repository

git clone <your-repo-url>
cd educational-video-pipeline

Step 2: Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# Mac/Linux
python3 -m venv venv
source venv/bin/activate

Step 3: Install Python Dependencies

pip install -r requirements.txt

Step 4: Install External Dependencies

FFmpeg

  • Windows: Download from ffmpeg.org, extract, and add to PATH
  • Mac: brew install ffmpeg
  • Linux: sudo apt install ffmpeg (Ubuntu/Debian) or sudo dnf install ffmpeg (Fedora)

Manim

# Should be installed via requirements.txt, but if issues arise:
pip install manim

Step 5: Verify Installations

# Check FFmpeg
ffmpeg -version

# Check Manim
manim --version

# Check Python packages
python -c "import elevenlabs, groq, manim; print('All packages installed!')"

βš™οΈ Configuration

1. Create Environment File

Create a .env file in the project root:

# API Keys
ELEVENLABS_API_KEY=your_elevenlabs_key_here
GROQ_API_KEY=your_groq_key_here
GOOGLE_API_KEY=your_google_key_here  # If using Google services
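
join.py presumably loads these keys at startup; if you need to load them in your own helper script, a minimal sketch with python-dotenv (an assumed dependency) looks like this:

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

# Fail fast if a required key is missing
for key in ("ELEVENLABS_API_KEY", "GROQ_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is missing from .env")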

2. Update Manim Path

Edit join.py and update the Manim executable path:

# Line ~41 in join.py
MANIM_EXECUTABLE = r"C:\path\to\your\manim.exe"  # Windows
# or
MANIM_EXECUTABLE = "/usr/local/bin/manim"  # Mac/Linux

3. Prepare Background Music

Place your background music file at:

C:\Users\leg\Documents\elearn\agent\music\background.mp3

Or update the path in the concatenate agent instruction.

🎯 Running the Agent

1. Start the Main Application

python join.py

2. Input Your Request

When prompted, paste your educational video request. Format:

Create a [duration]-minute [level] educational video about [topic].
Include [specific requirements].

Example:
Create a 3-minute beginner-level educational video about photosynthesis.
Include colorful animations and simple analogies for middle school students.

3. Wait for Processing

The pipeline will:

  1. Research the topic (30-60 seconds)
  2. Generate script and narration (1-2 minutes)
  3. Create and render animations (2-5 minutes per scene)
  4. Concatenate final video (30 seconds)

Total time: 5-15 minutes depending on video length

4. Find Your Video

The completed video will be in:

final_videos/educational_video_[timestamp].mp4

🔄 How It Works

Phase 1: Content Generation

  1. Topic Researcher - Searches for comprehensive information
  2. Content Structurer - Creates timed educational outline
  3. Script Writer - Converts outline to conversational script
  4. Speech Formatter - Adds pauses and timing for TTS
  5. ElevenLabs TTS - Generates professional narration
  6. Audio Transcriber - Creates a timestamped transcription (see the sketch after this list)
  7. Scene Summary - Breaks content into animated scenes
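
As an illustration of step 6, a Groq Whisper transcription call looks roughly like this (a sketch; the real implementation lives in stt_tools.py, and the file name is hypothetical):

from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

# verbose_json returns the transcript plus timestamped segments
with open("generated_audio/narration.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=("narration.mp3", f.read()),
        model="whisper-large-v3",
        response_format="verbose_json",
    )

print(transcription.text)  # segment timestamps ride along in the response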

Phase 2: Video Generation

  1. Video Agent coordinates the creation of each scene:
    • Manim Code Gen - Generates animation code
    • Render Checker - Executes and monitors rendering (see the sketch below)
    • Scene Validator - Ensures timing matches narration
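
As an illustration of the render step, a Render Checker along these lines shells out to Manim and captures stderr for the retry loop (a sketch; the scene file and class names are hypothetical):

import subprocess

# -qm renders at medium quality; MANIM_EXECUTABLE is the path set in join.py
result = subprocess.run(
    [MANIM_EXECUTABLE, "-qm", "scene_1.py", "Scene1"],
    capture_output=True, text=True, timeout=600,
)
if result.returncode != 0:
    # Feed the error back to the code-generation agent for a retry
    print(result.stderr)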

Phase 3: Final Assembly

  1. Concatenator - Combines all scenes with audio and background music (sketched below)
  2. [Future] Watermark Agent - Adds branding and outro
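
Conceptually, the Concatenator's work maps onto two standard FFmpeg invocations, sketched here with hypothetical file names (the agent's exact filters may differ):

import subprocess

scenes = ["media/scene_1.mp4", "media/scene_2.mp4"]  # placeholder paths

# 1. Write a list file for FFmpeg's concat demuxer
with open("concat_list.txt", "w") as f:
    for path in scenes:
        f.write(f"file '{path}'\n")

# 2. Join the scenes without re-encoding
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "concat_list.txt", "-c", "copy", "combined.mp4"],
    check=True,
)

# 3. Duck the background music to 15% and mix it under the narration
subprocess.run(
    ["ffmpeg", "-y", "-i", "combined.mp4", "-i", "music/background.mp3",
     "-filter_complex",
     "[1:a]volume=0.15[bg];[0:a][bg]amix=inputs=2:duration=first[a]",
     "-map", "0:v", "-map", "[a]", "-c:v", "copy", "final.mp4"],
    check=True,
)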

πŸ“ Project Structure

educational-video-pipeline/
├── join.py              # Main application orchestrator
├── prompt.py            # Agent instructions and prompts
├── stt_tools.py         # Speech-to-text functionality
├── tts_tool.py          # Text-to-speech functionality
├── .env                 # API keys (create from .env.example)
├── requirements.txt     # Python dependencies
├── generated_audio/     # TTS output files
├── transcriptions/      # STT transcriptions
├── media/               # Manim render files
├── final_videos/        # Completed videos
└── music/
    └── background.mp3   # Background music

🔧 Troubleshooting

Common Issues

1. "Manim not found"

  • Verify Manim installation: manim --version
  • Update MANIM_EXECUTABLE path in join.py
  • Try full path: C:\Python39\Scripts\manim.exe

2. "API Error" or Rate Limits

  • Check .env file has valid API keys
  • Wait 30 seconds and retry
  • The system has automatic retry logic (3 attempts)

3. "FFmpeg not found"

  • Ensure FFmpeg is in system PATH
  • Test with: ffmpeg -version
  • Restart terminal after PATH updates

4. Duration Mismatch

  • The system automatically adjusts timing within ±0.5 seconds
  • For larger mismatches, scenes are regenerated
  • Maximum 30 attempts per scene (see the duration-check sketch below)
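
To compare a rendered scene against its narration yourself, ffprobe reports durations directly (a sketch with hypothetical file names):

import subprocess

def media_duration(path: str) -> float:
    """Return a media file's duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

drift = media_duration("media/scene_1.mp4") - media_duration("generated_audio/scene_1.mp3")
print(f"Scene is off by {drift:+.2f}s")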

5. Import Errors

# Reinstall all dependencies
pip install --upgrade -r requirements.txt

# For specific package issues
pip uninstall [package_name]
pip install [package_name]

Debug Mode

To see detailed logs, run with:

python join.py > debug.log 2>&1

🎨 Customization

Voice Options

Edit tts_tool.py to change voices:

voice_id = "21m00Tcm4TlvDq8ikWAM"  # Rachel (default)
# Other options:
# "EXAVITQu4vr4xnSDxMaL" - Bella
# "ErXwobaYiN019PkySvjV" - Antoni

Animation Styles

Modify the prompts in prompt.py to change animation preferences.

Background Music

  • Volume: Adjust music_volume in concatenate agent (default: 0.15)
  • File: Update path in concatenate agent instruction

πŸ—ΊοΈ Roadmap

Upcoming Features

Phase 1: Enhanced TTS Options (Q1 2025)

  • OpenAI TTS Integration - Add support for OpenAI's text-to-speech models
  • Google Cloud TTS - Integrate Google's WaveNet voices
  • Amazon Polly - Add AWS Polly for more voice variety
  • Voice Cloning - Support for custom voice cloning with ElevenLabs
  • Multi-language Support - Enable video generation in multiple languages

Phase 2: Watermark Agent (Q2 2025)

  • Watermark Agent - New automated branding agent (an interim FFmpeg sketch appears after this list)

    Concatenator → Watermark Agent → Final Output
    

    Features:

    • Auto-detect video dimensions and add appropriate watermark
    • Support for image (PNG/SVG) and video watermarks
    • Configurable position (corner/center/custom)
    • Fade in/out animations
    • Duration control (full video or last X seconds)

    Example workflow:

    # The agent will automatically:
    1. Take the concatenated video
    2. Apply watermark based on config
    3. Add outro video if specified
    4. Output final branded video
  • Outro Integration

    • Pre-made outro videos with smooth transitions
    • Dynamic text overlay (channel name, social links)
    • Subscribe button animations
    • End screen templates matching video style
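
Until the agent ships, the core overlay step can be approximated with a single FFmpeg call (a sketch with hypothetical file names; the corner offset is illustrative):

import subprocess

# Pin a PNG watermark 10 px from the bottom-right corner,
# passing the audio track through untouched
subprocess.run(
    ["ffmpeg", "-y",
     "-i", "final_videos/educational_video.mp4",  # hypothetical input
     "-i", "logo.png",                            # hypothetical watermark
     "-filter_complex", "overlay=W-w-10:H-h-10",
     "-c:a", "copy", "branded.mp4"],
    check=True,
)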

Phase 3: Advanced Features (Q3-Q4 2025)

  • Quality Presets - Platform-specific optimization
    • YouTube (1080p/4K horizontal)
    • TikTok/Shorts (9:16 vertical)
    • Instagram Reels (9:16 with safe zones)
    • Twitter/X video specifications
  • Interactive Elements - Clickable areas in videos
  • Multiple Video Formats - Support for vertical videos (Shorts/Reels)
  • Batch Processing - Generate multiple videos from CSV input
  • Cloud Deployment - Deploy pipeline to cloud services
  • Web Interface - Browser-based UI for easier access

Current Development

🔨 Working on: Watermark Agent implementation for automatic branding

📞 Support

For issues or questions:

  1. Check the Troubleshooting section
  2. Review error messages in the console
  3. Ensure all prerequisites are properly installed

🔒 Contributing

This is a private repository. Contributions are limited to authorized team members.

📄 License

This is a private repository. All rights reserved. No part of this software may be reproduced, distributed, or transmitted without prior written permission.


Built with ❤️ using Google ADK, Manim, ElevenLabs, and Groq
