Curated list of the best open-source tools to run, fine-tune, and build with LLMs 100% locally in 2025–2026
No cloud · No API keys · No censorship · 125+ tools with descriptions, and growing
Star this repo to keep the ultimate local-AI toolbox at hand → updated weekly
- Ollama – One-command runner for Llama 3, Gemma, Mistral, etc.
- LM Studio – Beautiful GUI for discovering and chatting with local models
- GPT4All – Fully offline chat with 100+ quantized models
- Jan – Open-source ChatGPT alternative that runs locally
- Llama.cpp – High-performance C++ inference engine (GGUF)
- text-generation-webui – Feature-rich web UI with LoRAs and extensions
- AnythingLLM – Local RAG + document chat workspace
- PrivateGPT – Offline Q&A over your documents
- KoboldCpp – Single-file GGUF runner with KoboldAI API
- Pinokio – One-click browser installer for AI apps
- LocalAI – OpenAI API drop-in replacement for local models
- Faraday.dev (now Backyard AI) – Desktop character chat with local models
- Tabby – Self-hosted GitHub Copilot alternative
- Cortex – Embeddable multi-engine runner
- LMDeploy – Model compression and deployment toolkit
- Open WebUI – Self-hosted web UI for Ollama and OpenAI-compatible APIs (formerly Ollama WebUI)
- LobeChat – Modern multi-model chat UI with local backends
- Chainlit – Build conversational AI apps fast
- Gradio – Instant web demos for any model
- LoLLMS WebUI – All-in-one local LLM interface
- SillyTavern – Advanced roleplay chat UI
- LibreChat – Multi-provider chat with local support
- Continue.dev – Open-source VS Code/JetBrains copilot that can use local models
- Aider – Terminal pair programmer with git integration
- Open Interpreter – Run code and control your computer locally
- ComfyUI – Node-based Stable Diffusion workflow
- InvokeAI – Creative image generation UI
- Fooocus – Simplified high-quality image generation
- Draw Things – macOS/iOS Stable Diffusion app
- Msty – Minimalist local chat app
- LlamaGPT – Self-hosted chat on Umbrel
- Chatbot UI – Clean self-hosted ChatGPT-like interface
- HuggingChat (chat-ui) – Hugging Face's open-source chat UI, self-hostable
- Taskyon – Vue3-based local-first chat UI
- QA-Pilot – Interactive repo/file chat
- Shell-Pilot – LLM-powered shell scripting
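Most of the runners above (Ollama, LocalAI, LM Studio, llama.cpp's server) expose an OpenAI-compatible HTTP API, so one small client works across all of them. A stdlib-only sketch; the base URL (Ollama's default port) and model name are assumptions, adjust them for your setup:

```python
import json
import urllib.request

# Assumed defaults: Ollama serves an OpenAI-compatible API on port 11434,
# and "llama3" is a model you have already pulled locally.
BASE_URL = "http://localhost:11434/v1"
MODEL = "llama3"

def build_chat_request(prompt, model=MODEL, temperature=0.7):
    """Build the JSON payload for an OpenAI-style chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt):
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swap `BASE_URL` for `http://localhost:8080/v1` (LocalAI's default) or `http://localhost:1234/v1` (LM Studio's default) and the same code keeps working.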
- CrewAI – Multi-agent orchestration framework
- AutoGen – Microsoft's multi-agent conversation framework
- LangGraph – Stateful multi-actor applications
- BabyAGI – Task-driven autonomous agent
- Auto-GPT – Experimental autonomous GPT agent
- GPT Engineer – Generate codebases from specifications
- MetaGPT – Multi-agent software company simulation
- SuperAGI – Infrastructure for autonomous agents
- Devon – Open-source AI software engineer
- Langflow – Visual LLM app builder
- Flowise – Drag-and-drop LLM flows (self-hosted)
- Dify – Open-source LLM app builder (self-hosted)
- Haystack – End-to-end NLP pipelines
- LlamaIndex – Data framework for LLM applications
- Bisheng – Low-code agent builder
- Taskweaver – Code-first agent framework
- XAgent – Autonomous agent with tools
- ChatDev – Collaborative software development agents
- GodMode – Prompt chaining for complex tasks
- SmolAgents – Lightweight agent framework
- Camel-AI – Communicative agents for role-playing
- AgentGPT – Browser-based autonomous agents (local mode)
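Under the hood, the agent frameworks above all wrap the same loop: the model picks a tool, the runtime executes it, and the observation is fed back into the context until the model answers directly. A dependency-free sketch where `fake_model` is a stub standing in for a real local LLM call, and `calculator` is an illustrative tool:

```python
def calculator(expression):
    """A trivially guarded 'tool': evaluate a plain arithmetic expression."""
    allowed = set("0123456789+-*/. ()")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def fake_model(context):
    """Stand-in policy: call the calculator once, then answer."""
    if "OBSERVATION" not in context:
        return ("tool", "calculator", "6 * 7")
    # The last observation becomes the final answer.
    return ("answer", context.rsplit("OBSERVATION: ", 1)[1], None)

def run_agent(task, max_steps=5):
    context = f"TASK: {task}"
    for _ in range(max_steps):
        kind, payload, arg = fake_model(context)
        if kind == "answer":
            return payload
        result = TOOLS[payload](arg)           # execute the chosen tool
        context += f"\nOBSERVATION: {result}"  # feed the result back
    return "step limit reached"
```

Frameworks like CrewAI and LangGraph add routing, memory, and multi-agent hand-offs on top of exactly this loop.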
- Chroma – Lightweight embedded vector database
- Weaviate – Open-source vector search engine
- Qdrant – High-performance filtered vector search
- LanceDB – Serverless vector DB on Parquet
- Milvus – Scalable open-source vector database
- Faiss – Meta's library for efficient similarity search over dense vectors
- Pinecone – Managed cloud vector DB (not self-hostable; see Qdrant or Milvus for local use)
- Vespa – Big data serving with vector search
- Typesense – Typo-tolerant search with vectors
- Redis Vector Library – In-memory vector similarity
- PGVector – Postgres vector extension
- DuckDB – In-process OLAP with vector support
- SurrealDB – Multi-model DB with vector indexing
- Zilliz – Company behind Milvus; maintains open components such as GPTCache
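Every database above answers the same query: which stored vectors are nearest to this one? A dependency-free sketch of exact cosine-similarity top-k search (the toy `store` vectors are made up); the real engines add ANN indexes, metadata filtering, and persistence on top:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    """store: list of (id, vector) pairs. Returns the k nearest ids."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" purely for illustration.
store = [
    ("cat", [0.9, 0.1, 0.0]),
    ("dog", [0.8, 0.2, 0.1]),
    ("car", [0.0, 0.1, 0.9]),
]
```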
- Axolotl – YAML-driven LoRA/QLoRA fine-tuning
- Unsloth – 2× faster fine-tuning on consumer GPUs
- LLaMA-Factory – Web UI for efficient fine-tuning
- AutoGPTQ – GPTQ post-training quantization toolkit
- PEFT – Parameter-efficient fine-tuning methods
- TRL – RLHF, DPO, PPO training
- Lit-GPT – Lightweight fine-tuning with PyTorch Lightning
- OpenRLHF – Scalable RLHF framework
- DeepSpeed – Deep learning optimization library
- Colossal-AI – Large model training system
- Megatron-LM – Efficient transformer training
- BMTrain – Communication-efficient training
- FSDP – PyTorch's Fully Sharded Data Parallel training API
- LoRAX – Multi-LoRA serving
- BitsAndBytes – 8-bit optimizers and quantization
- GPTQ-for-LLaMa – 4-bit LLaMA quantization
- ExLlama – Fast LLaMA inference with quantization
- ExLlamaV2 – Optimized quantized inference
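The quantization tools above (AutoGPTQ, BitsAndBytes, ExLlama) all rest on the same primitive: map a group of float weights onto a few-bit integer grid plus a shared scale, then dequantize at inference time. A minimal symmetric 4-bit round trip (the sample `weights` are made up):

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization: ints in [-7, 7] plus one float scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from ints and the shared scale."""
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.95]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Production schemes (GPTQ, AWQ, NF4) improve on this per-group rounding with calibration data and smarter grids, but the storage win is the same: 4 bits per weight instead of 16.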
- Whisper.cpp – Fast local speech-to-text
- Coqui TTS – Neural text-to-speech synthesis
- OpenVoice – Instant voice cloning
- Silero Models – Pre-trained TTS/STT models
- LLaVA – Vision + text multimodal chat
- Moondream2 – Compact vision-language model
- Bark – Text-to-audio with voice cloning
- Audiocraft – Music and audio generation
- RVC WebUI – Retrieval-based voice conversion web UI
- Tortoise TTS – High-quality multi-voice TTS
- VALL-E X – Zero-shot TTS from short audio
- Piper TTS – Fast neural TTS
- OpenTTS – Multi-speaker TTS
- Kosmos-2 – Grounded image-text model
- ImageBind – Multimodal embedding across 6 modalities
- CLIP – Contrastive language-image pretraining
- vLLM – High-throughput serving with PagedAttention
- TensorRT-LLM – NVIDIA-optimized low-latency inference
- SGLang – Fast serving framework with a structured generation language
- MLX – Apple Silicon-native framework
- MLC LLM – Universal deployment engine
- ONNX Runtime – Cross-platform ML accelerator
- OpenVINO – Intel-optimized inference
- TVM – End-to-end optimizing compiler
- GGML – Tensor library powering llama.cpp and whisper.cpp
- CTranslate2 – Fast inference engine
- FasterTransformer – NVIDIA transformer inference library (superseded by TensorRT-LLM)
- TurboTransformers – Kernel fusion inference
- LightLLM – Unified inference framework
- DeepSpeed-Inference – Optimized transformer kernels
- FlexFlow – Distributed deep learning
- Ray Serve – Scalable model serving
- BentoML – ML model serving framework
- Triton Inference Server – Multi-framework serving
- OpenPPL – Neural network inference engine
- llama.rs – Rust bindings for llama.cpp
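Much of what serving engines like vLLM optimize is KV-cache memory, which grows linearly with sequence length. A back-of-envelope sketch; the dimensions below are Llama-3-8B-like assumptions (32 layers, 8 KV heads via GQA, head_dim 128, fp16), so treat the numbers as illustrative:

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-sequence KV-cache size: 2 tensors (K and V) per layer,
    each kv_heads * head_dim values per token, dtype_bytes each."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# One 8K-token sequence at fp16 works out to exactly 1 GiB with these dims.
gib = kv_cache_bytes(8192) / 2**30
```

At 1 GiB per 8K-token sequence, naive per-request allocation exhausts a 24 GB GPU after a handful of concurrent users, which is why paged/block-based cache management (vLLM's PagedAttention) matters for throughput.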
Found something missing? → Open a PR! Let’s get to 200+ together
Last updated: December 1, 2025
Made with ❤️ by @ethicals7s