Optimize the flow of AI
FluxAI is a cost optimization and observability platform for AWS Bedrock that helps companies reduce their LLM expenses by 30-50% through intelligent caching, smart routing, and real-time analytics.
| Component | Status | Documentation |
|---|---|---|
| API Gateway | Complete | Technical Spec |
| Semantic Cache | Complete | Implementation Guide |
| Cost Calculator | Complete | Calculator Guide |
| Observability | Complete | Observability Guide |
| Dashboard | Complete | Dashboard Guide |
| Multi-Model Router | Documented | Router Implementation |
- SEMANTIC_CACHE_SUMMARY.md - Complete summary of semantic cache implementation
- COST_CALCULATOR_IMPLEMENTATION.md - Cost calculator implementation details
- OBSERVABILITY_IMPLEMENTATION.md - Observability system implementation summary
Click the image above to watch our 3-minute demo showing FluxAI in action
FluxAI is a drop-in optimization layer that sits between your applications and AWS Bedrock, providing intelligent cost reduction, performance optimization, and complete observability for your LLM operations.
- Reduce Costs by 30-50%: Semantic caching and smart routing automatically optimize your Bedrock spending
- Complete Visibility: Real-time cost tracking, model performance metrics, and usage analytics
- Improve Performance: Intelligent model selection and request optimization
- Enterprise Ready: SOC 2 compliance roadmap, RBAC, audit logs, and SSO integration
┌─────────────────────────────────────┐
│        Customer Applications        │
│     (APIs, Chatbots, AI Agents)     │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│           FluxAI Gateway            │
│   Auth | Rate Limit | Cost Track    │
└─────────────┬───────────────────────┘
              │
     ┌────────┼────────┐
     ▼        ▼        ▼
┌────────┐ ┌──────┐ ┌──────────┐
│Semantic│ │Smart │ │Dashboard │
│ Cache  │ │Router│ │Analytics │
└────────┘ └──────┘ └──────────┘
     │        │        │
     └────────┼────────┘
              ▼
┌─────────────────────────────────────┐
│           AWS Bedrock API           │
│   (Claude, Llama, Titan, Mistral)   │
└─────────────────────────────────────┘
- API Gateway: Drop-in replacement for Bedrock API with authentication and rate limiting
- Cost Tracking: Real-time cost calculation per request with detailed analytics
- Semantic Caching: 30-50% cost reduction through intelligent response caching using AWS Bedrock Titan Embeddings
- Smart Routing: Cost, latency, or capability-based model selection
- Analytics Dashboard: Beautiful real-time metrics and cost insights with Streamlit
- Cost Alerts: Threshold notifications and anomaly detection
- Observability: Complete monitoring with Prometheus, OpenTelemetry, and distributed tracing
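The routing feature above selects a model by cost, latency, or capability. The following is a minimal sketch of that idea, not FluxAI's actual router: `ModelProfile`, `CATALOG`, `select_model`, and every model name and number in it are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # USD, placeholder value
    p50_latency_ms: float      # placeholder value
    capabilities: frozenset


# Hypothetical catalog: names and figures are illustrative, not real pricing.
CATALOG = [
    ModelProfile("model-small", 0.0005, 300.0, frozenset({"chat"})),
    ModelProfile("model-medium", 0.003, 700.0, frozenset({"chat", "code"})),
    ModelProfile("model-large", 0.015, 1500.0, frozenset({"chat", "code", "vision"})),
]


def select_model(strategy, required=frozenset(), catalog=CATALOG):
    """Pick the cheapest or fastest model that has every required capability."""
    candidates = [m for m in catalog if set(required) <= m.capabilities]
    if not candidates:
        raise ValueError(f"no model supports {sorted(required)}")
    if strategy == "cost":
        return min(candidates, key=lambda m: m.cost_per_1k_tokens)
    if strategy == "latency":
        return min(candidates, key=lambda m: m.p50_latency_ms)
    raise ValueError(f"unknown strategy: {strategy!r}")


cheapest = select_model("cost")
fastest_coder = select_model("latency", required={"code"})
```

Capability filtering runs first, so a cost-optimized request that needs a capability only the largest model offers still resolves to that model rather than failing.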
- Quick Start Guide - Get up and running in 5 minutes
- Technical Specification - Complete system architecture and design
- Implementation Guide - Development roadmap and code examples
- Getting Started (Detailed) - Step-by-step setup instructions
- Docker Deployment - Complete Docker and Docker Compose guide
- CI/CD Pipeline - GitHub Actions workflows for security scanning and Docker publishing
Core Features:
- Semantic Cache Implementation - How the semantic caching system works, performance characteristics, and cost savings analysis
- Cost Calculator Guide - Real-time cost tracking, savings analysis, and optimization recommendations
- Multi-Model Router - Intelligent model selection based on cost, latency, or capabilities
- Observability System - Comprehensive monitoring with Prometheus metrics, OpenTelemetry tracing, and structured logging
- Dashboard Guide - Interactive Streamlit dashboard for real-time monitoring and analytics
- Dashboard Quick Reference - Quick reference guide for daily dashboard usage
Implementation Summaries:
- Semantic Cache Summary - Complete implementation summary with files created and testing checklist
- Cost Calculator Summary - Implementation details, features, and next steps
- Observability Summary - Full observability system implementation with metrics, tracing, and logging
- OpenAPI Documentation: Available at `/docs` when running the server
- Cache API: `GET /v1/cache/stats`, `DELETE /v1/cache`
- Bedrock API: `POST /v1/bedrock/invoke`, `POST /v1/bedrock/invoke/stream`
- Analytics API: `GET /v1/analytics/cost`
- Metrics API: `GET /metrics` (Prometheus format)
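As a rough illustration of how a client might address the read-only endpoints listed above, the sketch below builds requests with Python's standard library without sending them. The base URL assumes the local server address used in the setup steps; `build_request` is a hypothetical helper, and any authentication headers the gateway requires are not shown because this README does not specify them.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local address of the gateway


def build_request(method, path, base_url=BASE_URL):
    """Build (but do not send) a request for one of the gateway endpoints."""
    return urllib.request.Request(base_url + path, method=method)


cache_stats = build_request("GET", "/v1/cache/stats")
cost_report = build_request("GET", "/v1/analytics/cost")
clear_cache = build_request("DELETE", "/v1/cache")

# To send one against a running gateway:
# with urllib.request.urlopen(cache_stats) as resp:
#     print(resp.status, resp.read().decode())
```

The request and response schemas for each endpoint are described by the OpenAPI documentation served at `/docs`.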
- Testing Checklist - Comprehensive testing guide to verify all components are working correctly
- Python 3.11 or higher
- Docker and Docker Compose (for Redis, Prometheus, PostgreSQL)
- AWS Account with Bedrock access
- AWS credentials configured
# 1. Clone the repository
git clone https://github.com/yourusername/fluxai.git
cd fluxai
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure environment
cp .env.example .env
# Edit .env with your AWS credentials and settings
# 4. Start infrastructure services (Redis, Prometheus, PostgreSQL)
docker-compose up -d
# 5. Run the FluxAI Gateway
uvicorn app.main:app --reload
# 6. View API documentation
# Open http://localhost:8000/docs in your browser
# 7. Launch observability dashboard (optional)
# Windows PowerShell:
.\start-dashboard.ps1
# Linux/macOS:
./start-dashboard.sh
# Or manually:
streamlit run dashboard/app.py
See GETTING_STARTED.md for detailed setup instructions.
Without FluxAI:
100,000 requests/month × $0.0165 per request = $1,650/month
With FluxAI (40% cache hit rate):
60,000 Bedrock requests × $0.0165 = $990
40,000 cache hits × $0.00005 = $2
Total: $992/month
Savings: $658/month (40% reduction)
Annual Savings: $7,896
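The arithmetic above can be reproduced for other traffic volumes or hit rates with a short script. This is only a sketch: `monthly_costs` is a hypothetical helper, and the per-request and per-hit prices are the example figures from this section, not measured values.

```python
def monthly_costs(requests, cost_per_request, hit_rate, cost_per_hit):
    """Return (baseline, with_cache, savings) in USD for one month."""
    cache_hits = round(requests * hit_rate)
    bedrock_calls = requests - cache_hits
    baseline = requests * cost_per_request
    with_cache = bedrock_calls * cost_per_request + cache_hits * cost_per_hit
    return baseline, with_cache, baseline - with_cache


# Example figures from this section: 100,000 requests, 40% cache hit rate.
baseline, with_cache, savings = monthly_costs(100_000, 0.0165, 0.40, 0.00005)

print(f"Baseline:       ${baseline:,.0f}/month")
print(f"With cache:     ${with_cache:,.0f}/month")
print(f"Savings:        ${savings:,.0f}/month ({savings / baseline:.0%})")
print(f"Annual savings: ${savings * 12:,.0f}")
```

Savings scale almost linearly with the hit rate because a cache hit costs a small fraction of a Bedrock request.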
The semantic cache uses AWS Bedrock Titan Embeddings to identify similar queries and return cached responses, so near-duplicate requests skip the model invocation, cutting cost with minimal latency impact.
Learn more in SEMANTIC_CACHE.md.
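To make the lookup idea concrete, here is a minimal sketch of a similarity-based cache. It is not the FluxAI implementation: `SemanticCache`, the 0.9 threshold, and the linear scan are assumptions for illustration, and `toy_embed` is a letter-frequency stand-in for a real embedding model such as Titan Embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> list of numbers
        self.threshold = threshold  # assumed similarity cutoff
        self.entries = []           # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the most similar cached response, or None below the threshold."""
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))


def toy_embed(text):
    """Illustration only: letter counts instead of a learned embedding."""
    text = text.lower()
    return [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]


cache = SemanticCache(toy_embed, threshold=0.9)
cache.put("What is the capital of France?", "Paris")
hit = cache.get("what is the capital of france")  # same letters, so a hit
miss = cache.get("zzz qqq")                       # no shared letters, so a miss
```

A production cache would typically replace the linear scan with a vector index and tune the threshold against real traffic, since a cutoff that is too low returns wrong answers and one that is too high wastes the cache.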
See LICENSE file for details.
