llama
Here are 49 public repositories matching this topic...
High-speed Large Language Model Serving for Local Deployment
Updated Aug 2, 2025 - C++
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (in as little as 298MB of RAM) as well as Mistral 7B on desktops and servers. ARM, x86, WASM and RISC-V are supported. Accelerated by XNNPACK. Python, C# and JS (WASM) bindings available.
Updated Nov 2, 2025 - C++
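For readers unfamiliar with what an "inference library for ONNX files" does, the sketch below shows the generic load-and-run flow. It deliberately uses the ONNX Runtime C++ API, a different, better-known library, rather than guessing at the API of the repository above; the model path `model.onnx` and the tensor names `x`/`y` are hypothetical placeholders for a model with one float input of shape [1, 4].

```cpp
// Generic ONNX inference flow via the ONNX Runtime C++ API (illustrative
// only; NOT the API of the repository described above).
#include <onnxruntime_cxx_api.h>
#include <array>
#include <iostream>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
    Ort::SessionOptions opts;
    // Hypothetical model: one float input "x" of shape [1, 4], one output "y".
    Ort::Session session(env, "model.onnx", opts);

    std::array<float, 4> input{1.f, 2.f, 3.f, 4.f};
    std::array<int64_t, 2> shape{1, 4};
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());

    const char* in_names[] = {"x"};
    const char* out_names[] = {"y"};
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               in_names, &tensor, 1, out_names, 1);
    std::cout << "y[0] = " << outputs[0].GetTensorMutableData<float>()[0] << "\n";
}
```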
🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.
Updated Nov 3, 2025 - C++
Fast Multimodal LLM on Mobile Devices
Updated Nov 3, 2025 - C++
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Updated Nov 3, 2025 - C++
A highly optimized LLM inference acceleration engine for Llama and its variants.
Updated Jul 10, 2025 - C++
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Updated Sep 13, 2025 - C++
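Projects like this one implement the whole decoding loop by hand rather than calling a framework. As a toy, self-contained illustration of that loop's shape (not this repository's actual code; the hard-coded bigram table stands in for a real transformer forward pass):

```cpp
// Toy greedy decoding loop, the core of any from-scratch inference engine.
// Illustrative only: a real engine computes next-token logits with a full
// transformer forward pass, not a 5x5 lookup table.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const char* vocab[5] = {"<eos>", "the", "cat", "sat", "down"};
    float logits[5][5] = {              // logits[prev][next]
        {0, 9, 1, 1, 1},                // <eos> -> "the"
        {1, 0, 9, 1, 1},                // "the" -> "cat"
        {1, 1, 0, 9, 1},                // "cat" -> "sat"
        {1, 1, 1, 0, 9},                // "sat" -> "down"
        {9, 1, 1, 1, 0},                // "down" -> <eos>
    };

    std::vector<int> tokens = {1};      // prompt: "the"
    std::printf("%s", vocab[1]);
    for (int step = 0; step < 16; ++step) {
        float* row = logits[tokens.back()];
        // Greedy sampling: take the argmax of the next-token logits.
        int next = std::max_element(row, row + 5) - row;
        if (next == 0) break;           // stop at <eos>
        tokens.push_back(next);
        std::printf(" %s", vocab[next]);
    }
    std::printf("\n");                  // prints: the cat sat down
}
```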
A high-performance inference system for large language models, designed for production environments.
Updated Nov 3, 2025 - C++
CPU inference for the DeepSeek family of large language models in C++
Updated Oct 2, 2025 - C++
An Unreal Engine plugin for LLM/GenAI models and an MCP UE5 server. Supports the Claude Desktop App, Windsurf and Cursor, and includes OpenAI GPT-5, DeepSeek V3.1, Claude Sonnet 4 and Grok 4 APIs, with plans to add Gemini, audio and realtime APIs soon. UnrealMCP is also here: automatic blueprint and scene generation from AI!
Updated Sep 6, 2025 - C++
Modern, header-only C++ bindings for the Ollama API.
Updated Oct 20, 2025 - C++
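Bindings like these wrap Ollama's REST API. To show what such a wrapper abstracts away, here is a raw call to Ollama's documented /api/generate endpoint using libcurl; it assumes an Ollama server on the default port 11434 with the llama3 model pulled locally, and deliberately avoids guessing the bindings' own function names.

```cpp
// Raw HTTP call to Ollama's documented /api/generate endpoint, to show what
// C++ bindings like the above wrap. Build with: -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // "stream":false asks for a single JSON object instead of chunked lines.
    const std::string body =
        R"({"model":"llama3","prompt":"Why is the sky blue?","stream":false})";
    std::string response;

    curl_slist* headers =
        curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::cout << response << "\n";  // JSON with a "response" field

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}
```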
LLaVA server (llama.cpp).
Updated Oct 20, 2023 - C++