MCP • Git Tools • Repo Intelligence • Prompt Caching • Workspace Automation
⭐ Star on GitHub · Documentation · Source Code
Lynkr is an open-source Claude Code-compatible backend proxy that lets you run the Claude Code CLI and Claude-style tools directly against Databricks, Azure, OpenRouter, Ollama, and llama.cpp instead of the default Anthropic cloud.
It enables full repo-aware LLM workflows.
This makes Databricks and other providers a first-class environment for AI-assisted software development, LLM agents, automated refactoring, debugging, and ML/ETL workflow exploration.
- Claude-compatible API (/v1/messages): Emulates Anthropic's backend so the Claude Code CLI works without modification.
- Databricks: Supports Databricks-hosted Claude Sonnet / Haiku models, or any LLM served from Databricks.
- Azure Anthropic: Routes Claude Code requests into Azure's /anthropic/v1/messages endpoint.
- Azure OpenAI: Connects to Azure OpenAI deployments (GPT-4o, etc.) with full tool calling support.
- OpenRouter: Access GPT-4o, Claude, Gemini, Llama, and more through a single unified API with full tool calling support.
- llama.cpp: Run any GGUF model locally with maximum performance using llama.cpp's optimized C++ inference engine.
- MCP integration: Auto-discovers MCP manifests and exposes them as tools for smart workflows.
- Repo intelligence (CLAUDE.md, symbol index, cross-file analysis): Lynkr builds a repo index using SQLite + Tree-sitter for rich context.
- Git tools: Commit, push, diff, stage, generate release notes, etc.
- Prompt caching: Reuses identical prompts to reduce cost + latency.
- Workspace tools: Task tracker, file I/O, test runner, index rebuild, etc.
- Client-side tool execution: Tools can execute on the Claude Code CLI side instead of the server, enabling local file operations and commands.
- Long-term memory: Automatic extraction and retrieval of conversation memories using surprise-based filtering, FTS5 semantic search, and multi-signal ranking.
- Extensible: Add custom tools, policies, or backend adapters.
Claude Code is exceptionally useful, but it only communicates with Anthropic's hosted backend.
This means:
- You can't point Claude Code at Databricks LLMs
- You can't run Claude workflows locally, offline, or in secure contexts
- MCP tools must be managed manually
- You don't control caching, policies, logs, or backend behavior
Lynkr is a Claude Code-compatible backend that sits between the CLI and your actual model provider.
Claude Code CLI
        ↓
   Lynkr Proxy
        ↓
Databricks / Azure Anthropic / OpenRouter / Ollama / llama.cpp / MCP / Tools
In more detail, requests flow through Lynkr like this:
Claude Code CLI
      │  (HTTP POST /v1/messages)
      ▼
Lynkr Proxy (Node.js + Express)
      │
      ▼
┌──────────────────────────────────────┐
│ Orchestrator (Agent Loop)            │
│   ├─ Tool Execution Pipeline         │
│   ├─ Long-Term Memory System         │
│   ├─ MCP Registry + Sandbox          │
│   ├─ Prompt Cache (LRU + TTL)        │
│   ├─ Session Store (SQLite)          │
│   ├─ Repo Indexer (Tree-sitter)      │
│   └─ Policy Engine                   │
└──────────────────────────────────────┘
      │
      ▼
Databricks / Azure Anthropic / OpenRouter / Ollama / llama.cpp
graph TB
A[Claude Code CLI] -->|HTTP POST /v1/messages| B[Lynkr Proxy Server]
B --> C{Middleware Stack}
C -->|Load Shedding| D{Load OK?}
D -->|Yes| E[Request Logging]
D -->|No| Z1[503 Service Unavailable]
E --> F[Metrics Collection]
F --> G[Input Validation]
G --> H[Orchestrator]
H --> I{Check Prompt Cache}
I -->|Cache Hit| J[Return Cached Response]
I -->|Cache Miss| K{Determine Provider}
K -->|Simple 0-2 tools| L[Ollama Local]
K -->|Moderate 3-14 tools| M[OpenRouter / Azure]
K -->|Complex 15+ tools| N[Databricks]
L --> O[Circuit Breaker Check]
M --> O
N --> O
O -->|Closed| P{Provider API}
O -->|Open| Z2[Fallback Provider]
P -->|Databricks| Q1[Databricks API]
P -->|OpenRouter| Q2[OpenRouter API]
P -->|Ollama| Q3[Ollama Local]
P -->|Azure| Q4[Azure Anthropic API]
P -->|llama.cpp| Q5[llama.cpp Server]
Q1 --> R[Response Processing]
Q2 --> R
Q3 --> R
Q4 --> R
Q5 --> R
Z2 --> R
R --> S[Format Conversion]
S --> T[Cache Response]
T --> U[Update Metrics]
U --> V[Return to Client]
J --> V
style B fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
style H fill:#7b68ee,stroke:#333,stroke-width:2px,color:#fff
style K fill:#f39c12,stroke:#333,stroke-width:2px
style P fill:#2ecc71,stroke:#333,stroke-width:2px,color:#fff
Key directories:
- src/api – Claude-compatible API proxy
- src/orchestrator – LLM agent runtime loop
- src/memory – Long-term memory system (Titans-inspired)
- src/mcp – Model Context Protocol tooling
- src/tools – Git, diff, test, tasks, fs tools
- src/cache – Prompt caching backend
- src/indexer – Repo intelligence

npm install -g lynkr
lynkr start
brew tap vishalveerareddy123/lynkr
brew install vishalveerareddy123/lynkr/lynkr
git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr
npm install
npm start
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://<workspace>.cloud.databricks.com
DATABRICKS_API_KEY=<personal-access-token>
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
WORKSPACE_ROOT=/path/to/your/repo
PORT=8080
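With a provider configured, a quick smoke test is to start Lynkr and send a minimal Anthropic-style request to it; the same request works for any of the provider configurations below. This is a sketch: the claude-proxy model name mirrors the tool-call example later in this README, and max_tokens is assumed to follow the Anthropic Messages schema.

```bash
# Terminal 1: start the proxy with the .env above
lynkr start

# Terminal 2: send a minimal Anthropic-style request through Lynkr
curl http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-proxy",
    "max_tokens": 256,
    "messages": [{ "role": "user", "content": "Say hello from Databricks." }]
  }'
```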
MODEL_PROVIDER=azure-anthropic
AZURE_ANTHROPIC_ENDPOINT=https://<resource>.services.ai.azure.com/anthropic/v1/messages
AZURE_ANTHROPIC_API_KEY=<api-key>
AZURE_ANTHROPIC_VERSION=2023-06-01
WORKSPACE_ROOT=/path/to/repo
PORT=8080
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_API_KEY=<api-key>
AZURE_OPENAI_DEPLOYMENT=gpt-4o
PORT=8080
What is OpenRouter?
OpenRouter provides unified access to 100+ AI models (GPT-4o, Claude, Gemini, Llama, etc.) through a single API with full tool calling support.
Configuration:
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-... # Get from https://openrouter.ai/keys
OPENROUTER_MODEL=openai/gpt-4o-mini # See https://openrouter.ai/models
OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
PORT=8080
WORKSPACE_ROOT=/path/to/your/repo
Popular Models:
- openai/gpt-4o-mini – Fast, affordable ($0.15/$0.60 per 1M tokens)
- anthropic/claude-3.5-sonnet – Claude's best reasoning
- google/gemini-pro-1.5 – Large context window
- meta-llama/llama-3.1-70b-instruct – Open-source Llama

See https://openrouter.ai/models for the complete list.
To get started, create an API key at https://openrouter.ai/keys, set the variables above, and start Lynkr.
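Switching models later is just an environment change. A minimal sketch using the model slugs listed above (OPENROUTER_MODEL is the same variable from the configuration block; restart Lynkr to apply):

```bash
# Route Claude Code traffic to Claude 3.5 Sonnet via OpenRouter
export OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
npm start

# Or pick a large-context model for big-repo work
export OPENROUTER_MODEL=google/gemini-pro-1.5
npm start
```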
What is llama.cpp?
llama.cpp is a high-performance C++ inference engine for running GGUF models locally, offering excellent performance, low memory use through quantization, and support for any GGUF model from Hugging Face.
Configuration:
MODEL_PROVIDER=llamacpp
LLAMACPP_ENDPOINT=http://localhost:8081 # llama-server address (must differ from Lynkr's PORT below)
LLAMACPP_MODEL=qwen2.5-coder-7b # Model name (for logging)
LLAMACPP_TIMEOUT_MS=120000 # Request timeout
PORT=8080
WORKSPACE_ROOT=/path/to/your/repo
Setup Steps:
# 1. Build llama.cpp (or download pre-built binary)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# 2. Download a GGUF model (example: Qwen2.5-Coder)
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
# 3. Start llama-server
./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081
# 4. Verify server is running
curl http://localhost:8081/health
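With llama-server running, point Lynkr at it using the llamacpp configuration above; Lynkr itself stays on PORT=8080, so the Claude Code CLI setup later in this README is unchanged. A sketch using inline environment variables:

```bash
# 5. Start Lynkr against the local llama-server
MODEL_PROVIDER=llamacpp LLAMACPP_ENDPOINT=http://localhost:8081 npm start
```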
llama.cpp vs Ollama:
| Feature | Ollama | llama.cpp |
|---|---|---|
| Setup | Easy (app) | Manual (compile/download) |
| Model Format | Ollama-specific | Any GGUF model |
| Performance | Good | Excellent |
| Memory Usage | Higher | Lower (quantization) |
| API | Custom | OpenAI-compatible |
| Flexibility | Limited models | Any GGUF from HuggingFace |
Choose llama.cpp when you need maximum performance, specific quantization options, or GGUF models not available in Ollama.
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=dummy
Then:
claude chat
claude diff
claude review
claude apply
Everything routes through your configured model provider (Databricks, Azure, OpenRouter, Ollama, or llama.cpp).
Lynkr uses Tree-sitter and SQLite to analyze your workspace:
It generates a structured CLAUDE.md so the model always has context.
Lynkr includes a sophisticated long-term memory system inspired by Google's Titans architecture, enabling persistent learning across conversations without model retraining.
The memory system automatically extracts candidate memories from conversations, filters them for novelty, and injects the most relevant ones into later requests.
Each memory is scored 0.0-1.0 across five factors.
Only memories exceeding the surprise threshold (default 0.3) are stored, preventing redundancy.
Retrieval uses SQLite's full-text search (FTS5) with Porter stemming for keyword-based semantic search.
Retrieved memories are ranked using a weighted combination of signals.
All memory features are enabled by default with sensible settings:
# Core Settings
MEMORY_ENABLED=true # Master switch
MEMORY_RETRIEVAL_LIMIT=5 # Memories per request
MEMORY_SURPRISE_THRESHOLD=0.3 # Novelty filter (0.0-1.0)
# Lifecycle Management
MEMORY_MAX_AGE_DAYS=90 # Auto-delete old memories
MEMORY_MAX_COUNT=10000 # Maximum total memories
MEMORY_DECAY_ENABLED=true # Enable importance decay
MEMORY_DECAY_HALF_LIFE=30 # Days for 50% importance decay
# Retrieval Behavior
MEMORY_INCLUDE_GLOBAL=true # Include cross-session memories
MEMORY_INJECTION_FORMAT=system # Where to inject (system/assistant_preamble)
MEMORY_EXTRACTION_ENABLED=true # Auto-extract from responses
Measured performance exceeds all targets.
The system works automatically - no manual intervention needed:
# First conversation
User: "I prefer Python for data processing"
Assistant: "I'll remember that you prefer Python..."
# System extracts: [preference] "prefer Python for data processing" (surprise: 0.85)
# Later conversation (same or different session)
User: "Write a script to process this CSV"
# System retrieves: [preference] "prefer Python for data processing"
Assistant: "I'll write a Python script using pandas..."
- memories: Core memory storage (content, type, importance, surprise_score)
- memories_fts: FTS5 full-text search index (auto-synced via triggers)
- memory_entities: Entity tracking for novelty detection
- memory_embeddings: Optional vector storage (Phase 3, not yet used)
- memory_associations: Memory graph relationships (Phase 5, not yet used)

Explicit memory management tools are available:
- memory_search – Search long-term memories by query
- memory_add – Manually add important facts
- memory_forget – Remove memories matching a query
- memory_stats – View memory statistics

Enable them by exposing the tools to the model (configurable in the orchestrator).
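Once exposed, the memory tools can be invoked like any other tool. A hedged sketch that mirrors the tool_choice request shown later in this README; whether these tools are exposed by default depends on your orchestrator configuration:

```bash
curl http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-proxy",
    "messages": [{ "role": "user", "content": "What do you already know about my preferences?" }],
    "tool_choice": {
      "type": "function",
      "function": { "name": "memory_search" }
    }
  }'
```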
Lynkr includes an LRU+TTL prompt cache.
Configure:
PROMPT_CACHE_ENABLED=true
PROMPT_CACHE_TTL_MS=300000
PROMPT_CACHE_MAX_ENTRIES=64
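A simple way to see the cache at work is to send the same request twice; while the entry is within its TTL, the second call should return much faster. A sketch under the defaults above (the request body mirrors the earlier smoke test):

```bash
BODY='{"model":"claude-proxy","max_tokens":64,"messages":[{"role":"user","content":"Summarize this repo."}]}'

# First call populates the cache; the second identical call is served from it
time curl -s -H 'Content-Type: application/json' -d "$BODY" http://localhost:8080/v1/messages > /dev/null
time curl -s -H 'Content-Type: application/json' -d "$BODY" http://localhost:8080/v1/messages > /dev/null
```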
Lynkr automatically discovers MCP manifests from:
~/.claude/mcp
or directories defined via:
MCP_MANIFEST_DIRS
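For example, to scan an extra manifest directory in addition to the default (the colon-separated list format is an assumption; adjust if your setup expects a different separator):

```bash
export MCP_MANIFEST_DIRS="$HOME/.claude/mcp:/opt/mcp/manifests"
npm start
```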
MCP tools become available inside the Claude Code environment.
Optional sandboxing uses Docker or OCI runtimes.
Lynkr includes a full suite of Git operations:
- workspace_git_status
- workspace_git_diff
- workspace_git_stage
- workspace_git_commit
- workspace_git_push
- workspace_git_pull

Policies:

- POLICY_GIT_ALLOW_PUSH
- POLICY_GIT_REQUIRE_TESTS
- POLICY_GIT_TEST_COMMAND

Example: disallow push unless tests pass? Set POLICY_GIT_REQUIRE_TESTS=true, as in the sketch below.
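A minimal policy sketch, assuming the true/false env-var style used elsewhere in this README; the test command value is illustrative:

```bash
# Allow pushes, but only after the configured test command passes
POLICY_GIT_ALLOW_PUSH=true
POLICY_GIT_REQUIRE_TESTS=true
POLICY_GIT_TEST_COMMAND="npm test"
```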
Lynkr supports client-side tool execution, enabling tools to execute on the Claude Code CLI machine instead of the proxy server.
Enable passthrough mode:
export TOOL_EXECUTION_MODE=client
npm start
How it works:
1. Lynkr returns tool_use blocks to the CLI instead of executing the tools on the server
2. The Claude Code CLI detects the tool_use blocks and executes the tools locally
3. The CLI sends tool_result blocks back in the next request

This keeps file operations, shell commands, and local credentials on the machine running the CLI.

Configuration:
- TOOL_EXECUTION_MODE=server – Tools run on the proxy (default)
- TOOL_EXECUTION_MODE=client – Tools run on the CLI side
- TOOL_EXECUTION_MODE=passthrough – Alias for client mode

curl http://localhost:8080/v1/messages \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-proxy",
"messages": [{ "role": "user", "content": "Rebuild the index." }],
"tool_choice": {
"type": "function",
"function": { "name": "workspace_index_rebuild" }
}
}'
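In client/passthrough mode the proxy answers a request like the one above with a tool_use block instead of running the tool. After executing it locally, the client sends the matching tool_result back in the next request. An illustrative round-trip, assuming Anthropic-style tool_use/tool_result blocks; the id and result string are made up:

```bash
curl http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-proxy",
    "messages": [
      { "role": "user", "content": "Rebuild the index." },
      { "role": "assistant", "content": [
          { "type": "tool_use", "id": "toolu_01", "name": "workspace_index_rebuild", "input": {} }
        ] },
      { "role": "user", "content": [
          { "type": "tool_result", "tool_use_id": "toolu_01", "content": "Index rebuilt." }
        ] }
    ]
  }'
```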
Lynkr's agentic architecture is inspired by the Autonomous Cognitive Entity (ACE) Framework, specifically implementing the Reflector pattern to enable self-improving capabilities.
A Reflector agent analyzes the transcript to extract "skills" and optimize future performance. The Reflector (src/agents/reflector.js) is an introspective component that inspects each completed interaction.
This "working nature" allows Lynkr not just to execute commands, but to learn from interaction, continuously refining its internal heuristics for tool selection and planning.
- Client-side tool execution (TOOL_EXECUTION_MODE=client/passthrough): tools can execute on the Claude Code CLI side, enabling local file operations, commands, and access to local credentials
- tool_use block generation
⭐ Star this repository to show your support and help others discover Lynkr!
Help spread the word about Lynkr.
Reduce your Claude Code costs by 60-80% today.
If you use Databricks, Azure Anthropic, OpenRouter, Ollama, or llama.cpp and want rich Claude Code workflows with massive cost savings, Lynkr gives you the control, flexibility, and extensibility you need.
Feel free to open issues, contribute tools, integrate with MCP servers, or help us improve the documentation!