llama
Here are 49 public repositories matching this topic...
High-speed Large Language Model Serving for Local Deployment
Updated Aug 2, 2025 - C++
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (in as little as 298MB of RAM) as well as Mistral 7B on desktops and servers. ARM, x86, WASM and RISC-V are supported. Accelerated by XNNPACK. Python, C# and JS (WASM) bindings available.
Updated Nov 2, 2025 - C++
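For readers unfamiliar with what an "inference library for ONNX files" does, the sketch below shows the generic load-and-run flow. It deliberately uses the ONNX Runtime C++ API, a different, better-known library, rather than guessing at the API of the repository above; the model path `model.onnx` and the tensor names `x`/`y` are hypothetical placeholders for a model with one float input of shape [1, 4].

```cpp
// Generic ONNX inference flow via the ONNX Runtime C++ API (illustrative
// only; NOT the API of the repository described above).
#include <onnxruntime_cxx_api.h>
#include <array>
#include <iostream>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
    Ort::SessionOptions opts;
    // Hypothetical model: one float input "x" of shape [1, 4], one output "y".
    Ort::Session session(env, "model.onnx", opts);

    std::array<float, 4> input{1.f, 2.f, 3.f, 4.f};
    std::array<int64_t, 2> shape{1, 4};
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());

    const char* in_names[] = {"x"};
    const char* out_names[] = {"y"};
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               in_names, &tensor, 1, out_names, 1);
    std::cout << "y[0] = " << outputs[0].GetTensorMutableData<float>()[0] << "\n";
}
```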
🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.
Updated Nov 3, 2025 - C++
Fast Multimodal LLM on Mobile Devices
Updated Nov 3, 2025 - C++
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Updated Nov 3, 2025 - C++
A highly optimized LLM inference acceleration engine for Llama and its variants.
Updated Jul 10, 2025 - C++
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Updated Sep 13, 2025 - C++
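Projects like this one implement the whole decoding loop by hand rather than calling a framework. As a toy, self-contained illustration of that loop's shape (not this repository's actual code; the hard-coded bigram table stands in for a real transformer forward pass):

```cpp
// Toy greedy decoding loop, the core of any from-scratch inference engine.
// Illustrative only: a real engine computes next-token logits with a full
// transformer forward pass, not a 5x5 lookup table.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const char* vocab[5] = {"<eos>", "the", "cat", "sat", "down"};
    float logits[5][5] = {              // logits[prev][next]
        {0, 9, 1, 1, 1},                // <eos> -> "the"
        {1, 0, 9, 1, 1},                // "the" -> "cat"
        {1, 1, 0, 9, 1},                // "cat" -> "sat"
        {1, 1, 1, 0, 9},                // "sat" -> "down"
        {9, 1, 1, 1, 0},                // "down" -> <eos>
    };

    std::vector<int> tokens = {1};      // prompt: "the"
    std::printf("%s", vocab[1]);
    for (int step = 0; step < 16; ++step) {
        float* row = logits[tokens.back()];
        // Greedy sampling: take the argmax of the next-token logits.
        int next = std::max_element(row, row + 5) - row;
        if (next == 0) break;           // stop at <eos>
        tokens.push_back(next);
        std::printf(" %s", vocab[next]);
    }
    std::printf("\n");                  // prints: the cat sat down
}
```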
A high-performance inference system for large language models, designed for production environments.
Updated Nov 3, 2025 - C++
CPU inference for the DeepSeek family of large language models in C++
Updated Oct 2, 2025 - C++
An Unreal Engine plugin for LLM/GenAI models and an MCP UE5 server. Supports the Claude Desktop App, Windsurf and Cursor, and includes OpenAI GPT-5, DeepSeek V3.1, Claude Sonnet 4 and Grok 4 APIs, with plans to add Gemini, audio and realtime APIs soon. UnrealMCP is also here: automatic blueprint and scene generation from AI!
Updated Sep 6, 2025 - C++
Modern, header-only C++ bindings for the Ollama API.
Updated Oct 20, 2025 - C++
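Bindings like these wrap Ollama's REST API. To show what such a wrapper abstracts away, here is a raw call to Ollama's documented /api/generate endpoint using libcurl; it assumes an Ollama server on the default port 11434 with the llama3 model pulled locally, and deliberately avoids guessing the bindings' own function names.

```cpp
// Raw HTTP call to Ollama's documented /api/generate endpoint, to show what
// C++ bindings like the above wrap. Build with: -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // "stream":false asks for a single JSON object instead of chunked lines.
    const std::string body =
        R"({"model":"llama3","prompt":"Why is the sky blue?","stream":false})";
    std::string response;

    curl_slist* headers =
        curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::cout << response << "\n";  // JSON with a "response" field

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}
```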
LLaVA server (llama.cpp).
Updated Oct 20, 2023 - C++