Micdrop website | Documentation
Micdrop is a set of open source TypeScript packages for building real-time voice conversations with AI agents. It handles the complexity on both the browser and the server side (microphone, speaker, VAD, network communication, etc.) and provides ready-to-use implementations for various AI providers.
- @micdrop/client - Browser library handling microphone input, audio playback, and real-time communication
- @micdrop/server - Server implementation for audio streaming and AI integration orchestration
- @micdrop/openai - OpenAI integration providing LLM agent and speech-to-text capabilities
- @micdrop/ai-sdk - AI SDK agent compatible with many LLM providers
- @micdrop/elevenlabs - ElevenLabs text-to-speech integration with streaming support
- @micdrop/cartesia - Cartesia text-to-speech integration for real-time voice synthesis
- @micdrop/mistral - Mistral AI agent integration for conversation handling
- @micdrop/gladia - Gladia speech-to-text integration for audio transcription
- @micdrop/react - React hooks for Micdrop
- demo-client - Example web application built with React
- demo-server - Example server built with Fastify
The author, Godefroy de Compreignac, presents Micdrop and voice AI in an accompanying video.
While real-time multimodal models (voice-to-voice) offer impressive capabilities, they often come with limitations in terms of customization and cost. Micdrop takes a different approach by:
- Allowing you to choose the best-in-class API for each component:
  - Select specific voices from TTS providers
  - Use different LLMs optimized for your use case
  - Pick STT engines suited for specific languages/accents
- Reducing costs by letting you:
  - Use more cost-effective API providers
  - Mix open source and commercial solutions
  - Control exactly when APIs are called
- Providing granular control over the conversation flow
- Supporting a wider range of languages and voices through specialized providers
This modular approach gives you the flexibility to build voice applications that are both powerful and cost-effective.
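To make the modular idea concrete, here is a minimal TypeScript sketch of one conversation turn passing through three swappable components (STT, then LLM agent, then TTS). Every name in it (`SpeechToText`, `Agent`, `TextToSpeech`, `runTurn`, and the stub providers) is hypothetical and chosen for illustration; it is not Micdrop's actual API, which is described in the project documentation.

```typescript
// Hypothetical interfaces for illustration only; NOT Micdrop's actual API.
interface Message {
  role: 'user' | 'assistant'
  content: string
}

interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>
}

interface Agent {
  answer(history: Message[]): Promise<string>
}

interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>
}

interface Pipeline {
  stt: SpeechToText
  agent: Agent
  tts: TextToSpeech
}

// One turn: user audio -> transcript -> agent reply -> reply audio.
// Each provider is called exactly once, so the caller decides when APIs are hit.
async function runTurn(
  pipeline: Pipeline,
  history: Message[],
  audio: Uint8Array
): Promise<Uint8Array> {
  const transcript = await pipeline.stt.transcribe(audio)
  history.push({ role: 'user', content: transcript })

  const reply = await pipeline.agent.answer(history)
  history.push({ role: 'assistant', content: reply })

  return pipeline.tts.synthesize(reply)
}

// Stub providers standing in for real services (assumptions, no network calls).
const fakeStt: SpeechToText = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
}

const echoAgent: Agent = {
  answer: async (history) => {
    const last = history[history.length - 1]
    return `You said: ${last ? last.content : ''}`
  },
}

const fakeTts: TextToSpeech = {
  // Pretend one byte of audio is produced per character of text.
  synthesize: async (text) => new Uint8Array(text.length),
}
```

Because `runTurn` only depends on the three interfaces, replacing `fakeStt` with a provider better suited to a given language, or `fakeTts` with one offering a specific voice, would not require touching the rest of the turn logic.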
- Microphone handling with:
  - Streaming support
  - Voice Activity Detection (VAD)
- Advanced audio playback with:
  - Streaming support
  - Device selection and control
- WebSocket communication
- AI implementations provided for OpenAI, ElevenLabs, Mistral, Gladia, and more
- Bring your own AI components (framework agnostic)
  - Large Language Models (LLM)
  - Text-to-Speech (TTS)
  - Speech-to-Text (STT)
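As a rough illustration of what Voice Activity Detection involves, the sketch below implements a simple energy-threshold detector in TypeScript. This is a generic textbook technique, not the detector Micdrop ships; the names `EnergyVad`, `threshold`, and `hangoverFrames` are assumptions made for this example.

```typescript
// Generic energy-based VAD sketch; NOT Micdrop's implementation.

// Root-mean-square energy of one frame of samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0
  let sum = 0
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i]
  }
  return Math.sqrt(sum / frame.length)
}

interface VadOptions {
  threshold: number // RMS level at or above which a frame counts as speech
  hangoverFrames: number // consecutive quiet frames required before reporting 'stop'
}

type VadEvent = 'start' | 'stop' | null

class EnergyVad {
  private options: VadOptions
  private speaking = false
  private silentFrames = 0

  constructor(options: VadOptions) {
    this.options = options
  }

  // Feed one frame; returns 'start' or 'stop' on a state change, otherwise null.
  process(frame: Float32Array): VadEvent {
    const loud = rms(frame) >= this.options.threshold
    if (loud) {
      this.silentFrames = 0
      if (!this.speaking) {
        this.speaking = true
        return 'start'
      }
      return null
    }
    if (this.speaking) {
      this.silentFrames++
      if (this.silentFrames >= this.options.hangoverFrames) {
        this.speaking = false
        this.silentFrames = 0
        return 'stop'
      }
    }
    return null
  }
}
```

The hangover counter keeps short pauses between words from ending the utterance too early. A production detector would typically also adapt to background noise or rely on a trained model.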
For detailed development instructions, including how to build, test, and publish packages, please see DEVELOPMENT.md.
MIT License - see the LICENSE file for details
Originally developed for Raconte.ai and open sourced by Lonestone (GitHub)
