Curated list of the best open-source tools to run, fine-tune, and build with LLMs 100% locally in 2025–2026
No cloud · No API keys · No censorship · 125+ tools with descriptions, and growing
Star this repo to keep the ultimate local-AI toolbox at hand → updated weekly
- Ollama – One-command runner for Llama 3, Gemma, Mistral, etc.
- LM Studio – Beautiful GUI for discovering and chatting with local models
- GPT4All – Fully offline chat with 100+ quantized models
- Jan – Open-source ChatGPT alternative that runs locally
- Llama.cpp – High-performance C++ inference engine (GGUF)
- text-generation-webui – Feature-rich web UI with LoRAs and extensions
- AnythingLLM – Local RAG + document chat workspace
- PrivateGPT – Offline Q&A over your documents
- KoboldCpp – Single-file GGUF runner with KoboldAI API
- Pinokio – One-click browser installer for AI apps
- LocalAI – OpenAI API drop-in replacement for local models
- Faraday.dev (now Backyard AI) – Desktop character chat with local models
- Tabby – Self-hosted GitHub Copilot alternative
- Cortex – Embeddable multi-engine runner
- LMDeploy – Model compression and deployment toolkit
- Open WebUI – Self-hosted web UI for Ollama and OpenAI-compatible APIs (formerly Ollama WebUI)
- LobeChat – Modern multi-model chat UI with local backends
- Chainlit – Build conversational AI apps fast
- Gradio – Instant web demos for any model
- LoLLMS WebUI – All-in-one local LLM interface
- SillyTavern – Advanced roleplay chat UI
- LibreChat – Multi-provider chat with local support
- Continue.dev – Open-source VS Code/JetBrains copilot that can use local models
- Aider – Terminal pair programmer with git integration
- Open Interpreter – Run code and control your computer locally
- ComfyUI – Node-based Stable Diffusion workflow
- InvokeAI – Creative image generation UI
- Fooocus – Simplified high-quality image generation
- Draw Things – macOS/iOS Stable Diffusion app
- Msty – Minimalist local chat app
- LlamaGPT – Self-hosted chat on Umbrel
- Chatbot UI – Clean self-hosted ChatGPT-like interface
- HuggingChat (chat-ui) – Hugging Face's open-source chat UI, self-hostable
- Taskyon – Vue3-based local-first chat UI
- QA-Pilot – Interactive repo/file chat
- Shell-Pilot – LLM-powered shell scripting
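Most of the runners above (Ollama, LocalAI, LM Studio, llama.cpp's server) expose an OpenAI-compatible HTTP API, so one small client works across all of them. A stdlib-only sketch; the base URL (Ollama's default port) and model name are assumptions, adjust them for your setup:

```python
import json
import urllib.request

# Assumed defaults: Ollama serves an OpenAI-compatible API on port 11434,
# and "llama3" is a model you have already pulled locally.
BASE_URL = "http://localhost:11434/v1"
MODEL = "llama3"

def build_chat_request(prompt, model=MODEL, temperature=0.7):
    """Build the JSON payload for an OpenAI-style chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt):
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swap `BASE_URL` for `http://localhost:8080/v1` (LocalAI's default) or `http://localhost:1234/v1` (LM Studio's default) and the same code keeps working.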
- CrewAI – Multi-agent orchestration framework
- AutoGen – Microsoft's multi-agent conversation framework
- LangGraph – Stateful multi-actor applications
- BabyAGI – Task-driven autonomous agent
- Auto-GPT – Experimental autonomous GPT agent
- GPT Engineer – Generate codebases from specifications
- MetaGPT – Multi-agent software company simulation
- SuperAGI – Infrastructure for autonomous agents
- Devon – Open-source AI software engineer
- Langflow – Visual LLM app builder
- Flowise – Drag-and-drop LLM flows (self-hosted)
- Dify – Open-source LLM app builder (self-hosted)
- Haystack – End-to-end NLP pipelines
- LlamaIndex – Data framework for LLM applications
- Bisheng – Low-code agent builder
- Taskweaver – Code-first agent framework
- XAgent – Autonomous agent with tools
- ChatDev – Collaborative software development agents
- GodMode – Prompt chaining for complex tasks
- SmolAgents – Lightweight agent framework
- Camel-AI – Communicative agents for role-playing
- AgentGPT – Browser-based autonomous agents (local mode)
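Under the hood, the agent frameworks above all wrap the same loop: the model picks a tool, the runtime executes it, and the observation is fed back into the context until the model answers directly. A dependency-free sketch where `fake_model` is a stub standing in for a real local LLM call, and `calculator` is an illustrative tool:

```python
def calculator(expression):
    """A trivially guarded 'tool': evaluate a plain arithmetic expression."""
    allowed = set("0123456789+-*/. ()")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def fake_model(context):
    """Stand-in policy: call the calculator once, then answer."""
    if "OBSERVATION" not in context:
        return ("tool", "calculator", "6 * 7")
    # The last observation becomes the final answer.
    return ("answer", context.rsplit("OBSERVATION: ", 1)[1], None)

def run_agent(task, max_steps=5):
    context = f"TASK: {task}"
    for _ in range(max_steps):
        kind, payload, arg = fake_model(context)
        if kind == "answer":
            return payload
        result = TOOLS[payload](arg)           # execute the chosen tool
        context += f"\nOBSERVATION: {result}"  # feed the result back
    return "step limit reached"
```

Frameworks like CrewAI and LangGraph add routing, memory, and multi-agent hand-offs on top of exactly this loop.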
- Chroma – Lightweight embedded vector database
- Weaviate – Open-source vector search engine
- Qdrant – High-performance filtered vector search
- LanceDB – Serverless vector DB on Parquet
- Milvus – Scalable open-source vector database
- Faiss – Meta's library for efficient similarity search over dense vectors
- Pinecone – Managed cloud vector DB (not self-hostable; see Qdrant or Milvus for local use)
- Vespa – Big data serving with vector search
- Typesense – Typo-tolerant search with vectors
- Redis Vector Library – In-memory vector similarity
- PGVector – Postgres vector extension
- DuckDB – In-process OLAP with vector support
- SurrealDB – Multi-model DB with vector indexing
- Zilliz – Company behind Milvus; maintains open components such as GPTCache
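Every database above answers the same query: which stored vectors are nearest to this one? A dependency-free sketch of exact cosine-similarity top-k search (the toy `store` vectors are made up); the real engines add ANN indexes, metadata filtering, and persistence on top:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    """store: list of (id, vector) pairs. Returns the k nearest ids."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" purely for illustration.
store = [
    ("cat", [0.9, 0.1, 0.0]),
    ("dog", [0.8, 0.2, 0.1]),
    ("car", [0.0, 0.1, 0.9]),
]
```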
- Axolotl – YAML-driven LoRA/QLoRA fine-tuning
- Unsloth – 2× faster fine-tuning on consumer GPUs
- LLaMA-Factory – Web UI for efficient fine-tuning
- AutoGPTQ – GPTQ post-training quantization toolkit
- PEFT – Parameter-efficient fine-tuning methods
- TRL – RLHF, DPO, PPO training
- Lit-GPT – Lightweight fine-tuning with PyTorch Lightning
- OpenRLHF – Scalable RLHF framework
- DeepSpeed – Deep learning optimization library
- Colossal-AI – Large model training system
- Megatron-LM – Efficient transformer training
- BMTrain – Communication-efficient training
- FSDP – PyTorch's Fully Sharded Data Parallel training API
- LoRAX – Multi-LoRA serving
- BitsAndBytes – 8-bit optimizers and quantization
- GPTQ-for-LLaMa – 4-bit LLaMA quantization
- ExLlama – Fast LLaMA inference with quantization
- ExLlamaV2 – Optimized quantized inference
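The quantization tools above (AutoGPTQ, BitsAndBytes, ExLlama) all rest on the same primitive: map a group of float weights onto a few-bit integer grid plus a shared scale, then dequantize at inference time. A minimal symmetric 4-bit round trip (the sample `weights` are made up):

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization: ints in [-7, 7] plus one float scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from ints and the shared scale."""
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.95]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Production schemes (GPTQ, AWQ, NF4) improve on this per-group rounding with calibration data and smarter grids, but the storage win is the same: 4 bits per weight instead of 16.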
- Whisper.cpp – Fast local speech-to-text
- Coqui TTS – Neural text-to-speech synthesis
- OpenVoice – Instant voice cloning
- Silero Models – Pre-trained TTS/STT models
- LLaVA – Vision + text multimodal chat
- Moondream2 – Compact vision-language model
- Bark – Text-to-audio with voice cloning
- Audiocraft – Music and audio generation
- RVC WebUI – Retrieval-based voice conversion web UI
- Tortoise TTS – High-quality multi-voice TTS
- VALL-E X – Zero-shot TTS from short audio
- Piper TTS – Fast neural TTS
- OpenTTS – Multi-speaker TTS
- Kosmos-2 – Grounded image-text model
- ImageBind – Multimodal embedding across 6 modalities
- CLIP – Contrastive language-image pretraining
- vLLM – High-throughput serving with PagedAttention
- TensorRT-LLM – NVIDIA-optimized low-latency inference
- SGLang – Fast serving framework with a structured generation language
- MLX – Apple Silicon-native framework
- MLC LLM – Universal deployment engine
- ONNX Runtime – Cross-platform ML accelerator
- OpenVINO – Intel-optimized inference
- TVM – End-to-end optimizing compiler
- GGML – Tensor library powering llama.cpp and whisper.cpp
- CTranslate2 – Fast inference engine
- FasterTransformer – NVIDIA transformer inference library (superseded by TensorRT-LLM)
- TurboTransformers – Kernel fusion inference
- LightLLM – Unified inference framework
- DeepSpeed-Inference – Optimized transformer kernels
- FlexFlow – Distributed deep learning
- Ray Serve – Scalable model serving
- BentoML – ML model serving framework
- Triton Inference Server – Multi-framework serving
- OpenPPL – Neural network inference engine
- llama.rs – Rust bindings for llama.cpp
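Much of what serving engines like vLLM optimize is KV-cache memory, which grows linearly with sequence length. A back-of-envelope sketch; the dimensions below are Llama-3-8B-like assumptions (32 layers, 8 KV heads via GQA, head_dim 128, fp16), so treat the numbers as illustrative:

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-sequence KV-cache size: 2 tensors (K and V) per layer,
    each kv_heads * head_dim values per token, dtype_bytes each."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# One 8K-token sequence at fp16 works out to exactly 1 GiB with these dims.
gib = kv_cache_bytes(8192) / 2**30
```

At 1 GiB per 8K-token sequence, naive per-request allocation exhausts a 24 GB GPU after a handful of concurrent users, which is why paged/block-based cache management (vLLM's PagedAttention) matters for throughput.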
Found something missing? → Open a PR! Let’s get to 200+ together
Last updated: December 1, 2025
Made with ❤️ by @ethicals7s