
Voice Agents for GoHighLevel

An open, multi-model AI voice agent platform for quickly building, testing, and launching voice agents into GoHighLevel (GHL).

Overview

This project provides a complete platform for creating, testing, and deploying AI-powered voice agents that integrate seamlessly with GoHighLevel's CRM and automation platform.

Key Features

  • Multi-Model AI Support: Works with Claude, GPT, Gemini, and custom models
  • Real-Time Voice Processing: Sub-800ms voice-to-voice latency
  • GHL Integration: Native integration with GoHighLevel API and Voice AI Custom Actions
  • Model Context Protocol (MCP): Industry-standard integration framework
  • Smart Cost Optimization: Intelligent routing to balance quality and cost
  • Production-Ready Security: OAuth 2.1, TLS 1.3, comprehensive audit logging
  • Open Architecture: Extensible and customizable for any use case

Quick Start

Prerequisites

  • Python 3.11+
  • PostgreSQL 15+
  • Redis 7+
  • Docker & Docker Compose (recommended)

Installation

# Clone repository
git clone https://github.com/yourusername/voice-agents-ghl.git
cd voice-agents-ghl

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Run database migrations
alembic upgrade head

# Start development server
uvicorn backend.api.main:app --reload
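
Once the server is running, FastAPI serves interactive docs at http://localhost:8000/docs by default. A quick sanity check (assuming the default docs route has not been overridden):

# Verify the API is reachable after starting uvicorn.
# Assumes FastAPI's default /docs route; adjust if the project overrides it.
import httpx

resp = httpx.get("http://localhost:8000/docs")
print(resp.status_code)  # expect 200 when the server is up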

Docker Setup

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Documentation

Getting Started

  1. Development Plan - Complete project overview and roadmap
  2. MCP Documentation - Model Context Protocol integration guide
  3. API Documentation - Interactive API docs (available when the server is running)

Architecture

  • Backend: FastAPI (Python 3.11+) with async/WebSocket support (a minimal sketch follows this list)
  • Database: PostgreSQL with pgvector for embeddings
  • Cache: Redis for session management
  • Voice Processing: Deepgram (STT), ElevenLabs/Cartesia (TTS)
  • LLM Integration: Claude, GPT, Gemini with smart routing
  • Integration: Model Context Protocol (MCP) for external services
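
To make the backend pattern concrete, here is a minimal sketch of an async FastAPI WebSocket handler in the style this architecture implies. The route path and message shapes are illustrative assumptions, not the project's actual API.

# Minimal async WebSocket sketch (route and message format are illustrative only).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/conversations/demo")
async def conversation_socket(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive a JSON message from the caller (e.g. an audio chunk)
            message = await websocket.receive_json()
            # Placeholder reply; the real pipeline would run STT -> LLM -> TTS here
            await websocket.send_json({"type": "ack", "received": message.get("type")})
    except WebSocketDisconnect:
        pass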

Key Components

voice-agents-ghl/
├── backend/                    # FastAPI backend
│   ├── api/                   # API routes
│   ├── services/              # Business logic
│   │   ├── llm/              # Multi-model LLM integration
│   │   ├── voice/            # Voice processing (STT/TTS)
│   │   ├── agent_service.py  # Voice agent logic
│   │   └── mcp_service.py    # MCP client
│   ├── models/               # Data models
│   └── database/             # Database & migrations
├── mcp-servers/              # MCP servers
│   ├── ghl-server/          # GoHighLevel integration
│   └── voice-server/        # Voice processing tools
├── docs/                     # Documentation
│   └── mcp/                 # MCP documentation
└── tests/                    # Tests

Features

Voice Agent Capabilities

  • Natural Conversations: Human-like voice interactions with sub-800ms latency
  • Context Awareness: Maintains conversation history and context
  • Multi-Model Intelligence: Routes to best LLM based on task complexity
  • Real-Time Data Access: Accesses GHL contacts, appointments, and more during calls

GoHighLevel Integration

  • Contact Management: Create, update, search contacts
  • Appointment Booking: Check availability, book appointments
  • Voice AI Custom Actions: Real-time webhook calls during live conversations
  • Bidirectional Webhooks: React to GHL events (contacts, appointments, opportunities); a signature-verification sketch follows this list
  • Custom Fields: Access and update GHL custom fields
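
As noted above, incoming GHL webhooks should be verified before they trigger anything. GHL's exact signature scheme is not documented here, so the sketch below assumes an HMAC-SHA256 digest of the raw request body delivered in a hypothetical X-GHL-Signature header and checked against GHL_WEBHOOK_SECRET; adapt it to whatever scheme your GHL account actually sends.

# Hypothetical webhook verification (header name and HMAC scheme are assumptions).
import hashlib
import hmac
import os

def verify_ghl_signature(raw_body: bytes, signature_header: str) -> bool:
    secret = os.environ["GHL_WEBHOOK_SECRET"].encode()
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature_header)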

Multi-Model AI

  • Claude 3.5 Sonnet/4: Complex reasoning and code generation
  • GPT-4o/5: Structured outputs and multi-step debugging
  • Gemini 2.5 Flash: Cost-effective, fast responses
  • Local Models: Ollama, vLLM for privacy-sensitive use cases

Cost Optimization: Smart routing can reduce LLM costs by 60-80% while maintaining quality (see the sketch below).
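
As a rough illustration, the router below sends short, tool-free turns to a cheaper model and escalates longer or tool-heavy requests to a stronger one. The thresholds are placeholder heuristics, not the project's actual routing logic; the model names mirror the tiers listed above.

# Placeholder routing heuristic: cheap model for simple turns, stronger model otherwise.
def pick_model(user_text: str, needs_tools: bool) -> str:
    is_simple = len(user_text.split()) < 40 and not needs_tools
    if is_simple:
        return "gemini-2.5-flash"           # fast, low-cost tier
    return "claude-3-5-sonnet-20241022"     # stronger reasoning tier

print(pick_model("What time are you open today?", needs_tools=False))
print(pick_model("Book me for Tuesday and update my phone number", needs_tools=True))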

Security

  • Authentication: OAuth 2.1, JWT tokens
  • Encryption: TLS 1.3 (transport), AES-256 (at rest)
  • Input Validation: Comprehensive sanitization and validation
  • Audit Logging: Complete audit trail of all operations
  • Rate Limiting: Protection against abuse
  • Compliance: GDPR, HIPAA, PCI DSS ready

Performance

Target Metrics

  • Voice-to-Voice Latency: <800ms (industry standard 2025)
  • Uptime: 99.9% monthly
  • Concurrent Calls: 100+ per instance
  • Error Rate: <0.1%

Benchmarks

| Component | Target | Achieved |
|---|---|---|
| STT Response | <500ms | TBD |
| LLM TTFT (time to first token) | <500ms | TBD |
| TTS Generation | <300ms | TBD |
| Total Voice-to-Voice Latency | <800ms | TBD |

Note: stages overlap in a streaming pipeline, so the end-to-end target is tighter than the sum of the per-component budgets.

Cost Estimates

Per 1000 Conversations

| Configuration | LLM Cost | Voice Cost | Total | Per Conversation |
|---|---|---|---|---|
| All Claude + ElevenLabs | $126 | $159 | $285 | $0.28 |
| Smart Routing + Deepgram | $77 | $16 | $93 | $0.09 |
| Aggressive Optimization | $30 | $16 | $46 | $0.05 |

Recommendation: Smart routing balances quality and cost

Development Roadmap

  • Phase 0: Research & Planning (Completed)
  • Phase 1: Foundation (Weeks 1-2)
  • Phase 2: LLM Integration (Weeks 2-3)
  • Phase 3: Voice Processing (Weeks 3-4)
  • Phase 4: GHL Integration (Weeks 4-5)
  • Phase 5: MCP Integration (Weeks 5-6)
  • Phase 6: Security & Testing (Weeks 6-7)
  • Phase 7: Deployment (Weeks 7-8)

See: DEVELOPMENT_PLAN.md for detailed roadmap

API Examples

Create Voice Agent

import asyncio

import httpx


async def create_agent():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/api/v1/agents",
            json={
                "name": "Sales Assistant",
                "description": "Handles sales inquiries",
                "system_prompt": "You are a helpful sales assistant...",
                "voice_config": {
                    "voice_id": "21m00Tcm4TlvDq8ikWAM",
                    "speed": 1.0,
                    "language": "en-US"
                },
                "model_preferences": {
                    "primary": "claude-3-5-sonnet-20241022",
                    "fallback": "gemini-2.5-flash"
                }
            },
            headers={"Authorization": "Bearer YOUR_TOKEN"}
        )
        response.raise_for_status()
        return response.json()


agent = asyncio.run(create_agent())

Start Conversation (WebSocket)

import asyncio
import base64
import json

import websockets

async def voice_conversation():
    uri = "ws://localhost:8000/ws/conversations/new"

    async with websockets.connect(uri) as websocket:
        # Initialize the session with the agent to use
        await websocket.send(json.dumps({
            "type": "init",
            "agent_id": "agent-123"
        }))

        # Stream audio (base64-encoded audio bytes; placeholder shown here)
        audio_chunk = base64.b64encode(b"...raw audio bytes...").decode("ascii")
        await websocket.send(json.dumps({
            "type": "audio_chunk",
            "data": audio_chunk
        }))

        # Receive the agent's response
        response = await websocket.recv()
        print(response)

asyncio.run(voice_conversation())

Configuration

Environment Variables

# API Configuration
API_PORT=8000
API_HOST=0.0.0.0

# Database
DATABASE_URL=postgresql://user:password@localhost/voice_agents
REDIS_URL=redis://localhost:6379

# LLM APIs
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...

# Voice Services
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...

# GoHighLevel
GHL_API_KEY=...
GHL_LOCATION_ID=...
GHL_WEBHOOK_SECRET=...
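
One idiomatic way to load these variables in a FastAPI project is pydantic-settings. The module below is a sketch of how that might look, not necessarily how this repo wires up its config, and it only covers a few of the keys above.

# Hypothetical settings module using pydantic-settings (field names map to the env vars above).
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    api_port: int = 8000
    database_url: str
    redis_url: str
    anthropic_api_key: str
    ghl_api_key: str

settings = Settings()  # values are read from the environment and .env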

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=backend tests/

# Run specific test file
pytest tests/test_mcp_service.py -v

# Run integration tests
pytest tests/integration/ -v
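
A starting point for new API tests, assuming the app object is importable from backend.api.main (as the uvicorn command above suggests) and that /api/v1/agents rejects unauthenticated requests; the accepted status codes are deliberately loose because that auth behavior is an assumption.

# Sketch of an API test with FastAPI's TestClient (auth behavior is assumed, not confirmed).
from fastapi.testclient import TestClient

from backend.api.main import app

client = TestClient(app)

def test_create_agent_requires_auth():
    response = client.post("/api/v1/agents", json={"name": "Test Agent"})
    # Unauthenticated requests should not succeed; 422 allowed if validation runs first.
    assert response.status_code in (401, 403, 422)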

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Write tests
  5. Run tests and linting
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Acknowledgments

  • Anthropic - Claude API and MCP framework
  • OpenAI - GPT models
  • Google - Gemini models
  • GoHighLevel - CRM platform and API
  • FastAPI - Web framework
  • Deepgram - Voice processing
  • ElevenLabs - Text-to-speech

Research Sources

This project is built on extensive research of 2025 best practices:

  • Voice agent architecture patterns
  • Multi-model AI integration
  • Model Context Protocol (MCP)
  • Security standards (OAuth 2.1, TLS 1.3)
  • GoHighLevel API capabilities
  • Performance optimization techniques

See: DEVELOPMENT_PLAN.md for complete research findings


Status: Planning Phase Complete - Ready for Implementation
Version: 0.1.0
Last Updated: 2025-11-08


Built with ❤️ for the GHL community
