Lynkr

Lynkr - Production-Ready Claude Code Proxy with Multi-Provider Support, MCP Integration & Token Optimization

Lynkr is an open-source, production-ready Claude Code proxy that enables the Claude Code CLI to work with any LLM provider (Databricks, OpenRouter, Ollama, Azure, OpenAI, llama.cpp) without losing Anthropic backend features. It features MCP server orchestration, Git workflows, repo intelligence, workspace tools, prompt caching, and 60-80% token optimization for cost-effective LLM-powered development.

🔖 Keywords

claude-code claude-proxy anthropic-api databricks-llm openrouter-integration ollama-local llama-cpp azure-openai azure-anthropic mcp-server prompt-caching token-optimization ai-coding-assistant llm-proxy self-hosted-ai git-automation code-generation developer-tools ci-cd-automation llm-gateway cost-reduction multi-provider-llm


Lynkr

MCP • Git Tools • Repo Intelligence • Prompt Caching • Workspace Automation

⭐ Star on GitHub ·
📘 Documentation ·
🐙 Source Code


🚀 What is Lynkr?

Lynkr is an open-source Claude Code-compatible backend proxy that lets you run the Claude Code CLI and Claude-style tools directly against Databricks, Azure, OpenRouter, Ollama, and llama.cpp instead of the default Anthropic cloud.

It enables full repo-aware LLM workflows.

This makes Databricks and other providers a first-class environment for AI-assisted software development, LLM agents, automated refactoring, debugging, and ML/ETL workflow exploration.


🌟 Key Features (SEO Summary)

✔ Claude Code-compatible API (/v1/messages)

Emulates Anthropic's backend so the Claude Code CLI works without modification.

✔ Works with Databricks LLM Serving

Supports Databricks-hosted Claude Sonnet / Haiku models, or any LLM served from Databricks.

✔ Supports Azure Anthropic models

Route Claude Code requests into Azure's /anthropic/v1/messages endpoint.

✔ Supports Azure OpenAI models

Connect to Azure OpenAI deployments (GPT-4o, etc.) with full tool calling support.

✔ Supports OpenRouter (100+ models)

Access GPT-4o, Claude, Gemini, Llama, and more through a single unified API with full tool calling support.

✔ Supports llama.cpp (Local GGUF Models)

Run any GGUF model locally with maximum performance using llama.cpp's optimized C++ inference engine.

✔ Full Model Context Protocol (MCP) integration

Auto-discovers MCP manifests and exposes them as tools for smart workflows.

✔ Repo Intelligence: CLAUDE.md, Symbol Index, Cross-file analysis

Lynkr builds a repo index using SQLite + Tree-sitter for rich context.

✔ Git Tools and Workflow Automation

Commit, push, diff, stage, generate release notes, etc.

✔ Prompt Caching (LRU + TTL)

Reuses identical prompts to reduce cost + latency.

✔ Workspace Tools

Task tracker, file I/O, test runner, index rebuild, etc.

✔ Client-Side Tool Execution (Passthrough Mode)

Tools can execute on the Claude Code CLI side instead of the server, enabling local file operations and commands.

✔ Titans-Inspired Long-Term Memory System

Automatic extraction and retrieval of conversation memories using surprise-based filtering, FTS5 semantic search, and multi-signal ranking.

✔ Fully extensible Node.js architecture

Add custom tools, policies, or backend adapters.


📚 Table of Contents


🧩 What Lynkr Solves

The Problem

Claude Code is exceptionally useful, but it only communicates with Anthropic's hosted backend.

This means:

❌ You can't point Claude Code at Databricks LLMs
❌ You can't run Claude workflows locally, offline, or in secure contexts
❌ MCP tools must be managed manually
❌ You don't control caching, policies, logs, or backend behavior

The Solution: Lynkr

Lynkr is a Claude Code-compatible backend that sits between the CLI and your actual model provider.


Claude Code CLI
↓
Lynkr Proxy
↓
Databricks / Azure Anthropic / OpenRouter / Ollama / llama.cpp / MCP / Tools

This enables:


🏗 Architecture Overview


Claude Code CLI
↓  (HTTP POST /v1/messages)
Lynkr Proxy (Node.js + Express)
↓
───────────────────────────────────────
│  Orchestrator (Agent Loop)          │
│  ├─ Tool Execution Pipeline         │
│  ├─ Long-Term Memory System         │
│  ├─ MCP Registry + Sandbox          │
│  ├─ Prompt Cache (LRU + TTL)        │
│  ├─ Session Store (SQLite)          │
│  ├─ Repo Indexer (Tree-sitter)      │
│  ├─ Policy Engine                   │
───────────────────────────────────────
↓
Databricks / Azure Anthropic / OpenRouter / Ollama / llama.cpp

Request Flow Visualization

graph TB
    A[Claude Code CLI] -->|HTTP POST /v1/messages| B[Lynkr Proxy Server]
    B --> C{Middleware Stack}
    C -->|Load Shedding| D{Load OK?}
    D -->|Yes| E[Request Logging]
    D -->|No| Z1[503 Service Unavailable]
    E --> F[Metrics Collection]
    F --> G[Input Validation]
    G --> H[Orchestrator]

    H --> I{Check Prompt Cache}
    I -->|Cache Hit| J[Return Cached Response]
    I -->|Cache Miss| K{Determine Provider}

    K -->|Simple 0-2 tools| L[Ollama Local]
    K -->|Moderate 3-14 tools| M[OpenRouter / Azure]
    K -->|Complex 15+ tools| N[Databricks]

    L --> O[Circuit Breaker Check]
    M --> O
    N --> O

    O -->|Closed| P{Provider API}
    O -->|Open| Z2[Fallback Provider]

    P -->|Databricks| Q1[Databricks API]
    P -->|OpenRouter| Q2[OpenRouter API]
    P -->|Ollama| Q3[Ollama Local]
    P -->|Azure| Q4[Azure Anthropic API]
    P -->|llama.cpp| Q5[llama.cpp Server]

    Q1 --> R[Response Processing]
    Q2 --> R
    Q3 --> R
    Q4 --> R
    Q5 --> R
    Z2 --> R

    R --> S[Format Conversion]
    S --> T[Cache Response]
    T --> U[Update Metrics]
    U --> V[Return to Client]
    J --> V

    style B fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
    style H fill:#7b68ee,stroke:#333,stroke-width:2px,color:#fff
    style K fill:#f39c12,stroke:#333,stroke-width:2px
    style P fill:#2ecc71,stroke:#333,stroke-width:2px,color:#fff
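
The complexity-based routing in the diagram can be read as a simple dispatch on how many tools a request declares. Below is a minimal illustrative sketch of that idea (not Lynkr's actual orchestrator code; the thresholds mirror the diagram and the provider labels are placeholders):

// Illustrative only: choose a provider tier from the number of tools a request
// declares, mirroring the thresholds shown in the diagram above.
function pickProviderTier(request) {
  const toolCount = Array.isArray(request.tools) ? request.tools.length : 0;

  if (toolCount <= 2) return "ollama";       // simple request: local model
  if (toolCount <= 14) return "openrouter";  // moderate request: OpenRouter / Azure
  return "databricks";                       // complex request: 15+ tools
}

// A request declaring 5 tools lands in the moderate tier.
console.log(pickProviderTier({ tools: new Array(5).fill({}) })); // "openrouter"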

Key directories:


⚙ Getting Started: Installation & Setup Guide

npm install -g lynkr
lynkr start

Homebrew

brew tap vishalveerareddy123/lynkr
brew install vishalveerareddy123/lynkr/lynkr

From source

git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr
npm install
npm start

🔧 Configuration Guide for Multi-Provider Support (Databricks, Azure, OpenRouter, Ollama, llama.cpp)

Databricks Setup

MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://<workspace>.cloud.databricks.com
DATABRICKS_API_KEY=<personal-access-token>
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
WORKSPACE_ROOT=/path/to/your/repo
PORT=8080

Azure Anthropic Setup

MODEL_PROVIDER=azure-anthropic
AZURE_ANTHROPIC_ENDPOINT=https://<resource>.services.ai.azure.com/anthropic/v1/messages
AZURE_ANTHROPIC_API_KEY=<api-key>
AZURE_ANTHROPIC_VERSION=2023-06-01
WORKSPACE_ROOT=/path/to/repo
PORT=8080

Azure OpenAI Setup

MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_API_KEY=<api-key>
AZURE_OPENAI_DEPLOYMENT=gpt-4o
PORT=8080

OpenRouter Setup

What is OpenRouter?

OpenRouter provides unified access to 100+ AI models (GPT-4o, Claude, Gemini, Llama, etc.) through a single API. Benefits:

Configuration:

MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-...                                    # Get from https://openrouter.ai/keys
OPENROUTER_MODEL=openai/gpt-4o-mini                                # See https://openrouter.ai/models
OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
PORT=8080
WORKSPACE_ROOT=/path/to/your/repo

Popular Models:

See https://openrouter.ai/models for complete list.

Getting Started:

  1. Visit https://openrouter.ai
  2. Sign in with GitHub/Google/email
  3. Create API key at https://openrouter.ai/keys
  4. Add credits (minimum $5)
  5. Configure Lynkr as shown above

llama.cpp Setup

What is llama.cpp?

llama.cpp is a high-performance C++ inference engine for running GGUF models locally. Benefits:

Configuration:

MODEL_PROVIDER=llamacpp
LLAMACPP_ENDPOINT=http://localhost:8081    # llama-server address (must not clash with Lynkr's PORT)
LLAMACPP_MODEL=qwen2.5-coder-7b            # Model name (for logging)
LLAMACPP_TIMEOUT_MS=120000                 # Request timeout
PORT=8080
WORKSPACE_ROOT=/path/to/your/repo

Setup Steps:

# 1. Build llama.cpp (or download pre-built binary)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Download a GGUF model (example: Qwen2.5-Coder)
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf

# 3. Start llama-server (on a port that does not clash with Lynkr's PORT)
./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081

# 4. Verify server is running
curl http://localhost:8081/health

llama.cpp vs Ollama:

Feature         Ollama            llama.cpp
Setup           Easy (app)        Manual (compile/download)
Model Format    Ollama-specific   Any GGUF model
Performance     Good              Excellent
Memory Usage    Higher            Lower (quantization)
API             Custom            OpenAI-compatible
Flexibility     Limited models    Any GGUF from HuggingFace

Choose llama.cpp when you need maximum performance, specific quantization options, or GGUF models not available in Ollama.


💬 Using Lynkr With Claude Code CLI

export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=dummy

Then:

claude chat
claude diff
claude review
claude apply

Everything routes through your configured model provider (Databricks, Azure, OpenRouter, Ollama, or llama.cpp).


🧠 Repo Intelligence & Indexing

Lynkr uses Tree-sitter and SQLite to analyze your workspace:

It generates a structured CLAUDE.md so the model always has context.
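
As a rough illustration of how a Tree-sitter + SQLite index can be built (a hedged sketch using the tree-sitter, tree-sitter-javascript, and better-sqlite3 npm packages; the table layout and file path are hypothetical, and this is not Lynkr's actual indexer):

// Hypothetical sketch: parse a source file with Tree-sitter and store
// top-level function names in a SQLite symbol table.
const Parser = require("tree-sitter");
const JavaScript = require("tree-sitter-javascript");
const Database = require("better-sqlite3");
const fs = require("fs");

const parser = new Parser();
parser.setLanguage(JavaScript);

const db = new Database("repo-index.db");
db.exec("CREATE TABLE IF NOT EXISTS symbols (file TEXT, name TEXT, start_line INTEGER)");

function indexFile(path) {
  const source = fs.readFileSync(path, "utf8");
  const tree = parser.parse(source);
  const insert = db.prepare("INSERT INTO symbols (file, name, start_line) VALUES (?, ?, ?)");

  // Walk the syntax tree and record each function declaration.
  for (const node of tree.rootNode.descendantsOfType("function_declaration")) {
    const nameNode = node.childForFieldName("name");
    if (nameNode) insert.run(path, nameNode.text, node.startPosition.row + 1);
  }
}

indexFile("src/index.js");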


🧠 Long-Term Memory System (Titans-Inspired)

Lynkr includes a sophisticated long-term memory system inspired by Google's Titans architecture, enabling persistent learning across conversations without model retraining.

How It Works

The memory system automatically:

  1. Extracts important information from conversations (preferences, decisions, facts, entities, relationships)
  2. Filters using surprise-based scoring to store only novel/important information
  3. Retrieves relevant memories using multi-signal ranking (recency + importance + relevance)
  4. Injects top memories into each request for contextual continuity

Key Features

🎯 Surprise-Based Memory Updates (Titans Core Innovation)

Memories are scored 0.0-1.0 based on five factors:

Only memories exceeding the surprise threshold (default 0.3) are stored, preventing redundancy.
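
As a minimal sketch of that gate (illustrative only; the scoring function here is a placeholder for Lynkr's five-factor scorer):

// Illustrative sketch: keep a candidate memory only if its surprise score
// clears MEMORY_SURPRISE_THRESHOLD (default 0.3).
const SURPRISE_THRESHOLD = Number(process.env.MEMORY_SURPRISE_THRESHOLD ?? 0.3);

function maybeStoreMemory(candidate, scoreSurprise, store) {
  const surprise = scoreSurprise(candidate);       // placeholder: returns a value in 0.0-1.0
  if (surprise < SURPRISE_THRESHOLD) return false; // too close to what is already stored
  store.insert({ ...candidate, surprise });
  return true;
}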

🔍 FTS5 Semantic Search

Uses SQLite's full-text search with Porter stemming for keyword-based semantic search:

📊 Multi-Signal Retrieval

Ranks memories using weighted combination:
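
The signals named earlier are recency, importance, and relevance. Below is a minimal sketch of the ranking step; the weights are illustrative placeholders, and recency is modeled here with the same half-life decay used for importance (MEMORY_DECAY_HALF_LIFE):

// Illustrative ranking sketch: weighted combination of recency, importance, and relevance.
// The weights are placeholders, not Lynkr's actual configuration.
const WEIGHTS = { recency: 0.3, importance: 0.3, relevance: 0.4 };
const HALF_LIFE_DAYS = 30; // mirrors MEMORY_DECAY_HALF_LIFE

function rankMemories(memories, now = Date.now()) {
  return memories
    .map((m) => {
      const ageDays = (now - m.createdAt) / 86_400_000;
      const recency = Math.pow(0.5, ageDays / HALF_LIFE_DAYS); // exponential decay toward 0
      const score =
        WEIGHTS.recency * recency +
        WEIGHTS.importance * m.importance +  // stored importance, 0.0-1.0
        WEIGHTS.relevance * m.relevance;     // e.g. a normalized FTS5 match score
      return { ...m, score };
    })
    .sort((a, b) => b.score - a.score);
}

// The top MEMORY_RETRIEVAL_LIMIT entries are then injected into the request.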

🗂️ Memory Types

Configuration

All memory features are enabled by default with sensible settings:

# Core Settings
MEMORY_ENABLED=true                    # Master switch
MEMORY_RETRIEVAL_LIMIT=5               # Memories per request
MEMORY_SURPRISE_THRESHOLD=0.3          # Novelty filter (0.0-1.0)

# Lifecycle Management
MEMORY_MAX_AGE_DAYS=90                 # Auto-delete old memories
MEMORY_MAX_COUNT=10000                 # Maximum total memories
MEMORY_DECAY_ENABLED=true              # Enable importance decay
MEMORY_DECAY_HALF_LIFE=30              # Days for 50% importance decay

# Retrieval Behavior
MEMORY_INCLUDE_GLOBAL=true             # Include cross-session memories
MEMORY_INJECTION_FORMAT=system         # Where to inject (system/assistant_preamble)
MEMORY_EXTRACTION_ENABLED=true         # Auto-extract from responses

Performance

Exceeds all targets:

Example Usage

The system works automatically - no manual intervention needed:

# First conversation
User: "I prefer Python for data processing"
Assistant: "I'll remember that you prefer Python..."
# System extracts: [preference] "prefer Python for data processing" (surprise: 0.85)

# Later conversation (same or different session)
User: "Write a script to process this CSV"
# System retrieves: [preference] "prefer Python for data processing"
Assistant: "I'll write a Python script using pandas..."

Database Tables

Memory Tools (Optional)

Explicit memory management tools available:

Enable by exposing tools to the model (configurable in orchestrator).


⚡ Prompt Caching

Lynkr includes an LRU+TTL prompt cache.

Benefits:

Configure:

PROMPT_CACHE_ENABLED=true
PROMPT_CACHE_TTL_MS=300000
PROMPT_CACHE_MAX_ENTRIES=64
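
A minimal sketch of how an LRU + TTL cache like this behaves (illustrative, not Lynkr's internal implementation):

// Illustrative LRU + TTL cache: entries expire after ttlMs, and the
// least-recently-used entry is evicted once maxEntries is exceeded.
class PromptCache {
  constructor({ maxEntries = 64, ttlMs = 300_000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map(); // insertion order doubles as recency order
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttlMs) { // expired
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key);       // refresh recency
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, storedAt: Date.now() });
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value; // least recently used
      this.map.delete(oldestKey);
    }
  }
}

Identical prompts within the TTL window are served from memory; once PROMPT_CACHE_MAX_ENTRIES is exceeded, the least-recently-used entry is dropped.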

🧩 Model Context Protocol (MCP)

Lynkr automatically discovers MCP manifests from:

~/.claude/mcp

or directories defined via:

MCP_MANIFEST_DIRS

MCP tools become available inside the Claude Code environment, including:

Optional sandboxing uses Docker or OCI runtimes.


🔧 Git Tools

Lynkr includes a full suite of Git operations:

Policies:

Example:

Disallow push unless tests pass? Set POLICY_GIT_REQUIRE_TESTS=true.


🔄 Client-Side Tool Execution (Passthrough Mode)

Lynkr supports client-side tool execution, enabling tools to execute on the Claude Code CLI machine instead of the proxy server.

Enable passthrough mode:

export TOOL_EXECUTION_MODE=client
npm start

How it works:

  1. Model generates tool calls (from Databricks/OpenRouter/Ollama/llama.cpp)
  2. Proxy converts to Anthropic format with tool_use blocks
  3. Claude Code CLI receives tool_use blocks and executes locally
  4. CLI sends tool_result blocks back in the next request
  5. Proxy forwards complete conversation back to the model
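
Concretely, the round trip described above uses Anthropic-style content blocks. The shapes below sketch what the CLI receives and sends back; the tool name and IDs are hypothetical:

// Steps 2-3: the proxy returns an assistant turn containing a tool_use block,
// which the Claude Code CLI executes locally.
const assistantTurn = {
  role: "assistant",
  content: [
    { type: "tool_use", id: "toolu_01", name: "read_file", input: { path: "src/index.js" } },
  ],
};

// Step 4: the CLI reports the local result as a tool_result block in the next
// /v1/messages request, and the proxy forwards the conversation to the model.
const followUpUserTurn = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_01", content: "...file contents..." },
  ],
};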

Benefits:

Use cases:

Configuration:


🧪 API Example (Index Rebuild)

curl http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-proxy",
    "messages": [{ "role": "user", "content": "Rebuild the index." }],
    "tool_choice": {
      "type": "function",
      "function": { "name": "workspace_index_rebuild" }
    }
  }'

🤖 ACE Framework Working Nature

Lynkr's agentic architecture is inspired by the Autonomous Cognitive Entity (ACE) Framework, specifically implementing the Reflector pattern to enable self-improving capabilities.

The Agentic Loop

  1. Input Processing: The Orchestrator receives natural language intent from the user.
  2. Execution (Agent Model): The system executes tools (Git, Search, File Ops) to achieve the goal.
  3. Reflection (Reflector Role): After execution completes, the Reflector agent analyzes the transcript to extract "skills" and optimize future performance.

The Reflector

The Reflector (src/agents/reflector.js) is an introspective component that analyzes:

This "working nature" allows Lynkr to not just execute commands, but to learn from interaction, continuously refining its internal heuristics for tool selection and planning.
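
As a loose sketch of what a reflection pass can look like (this is not the code in src/agents/reflector.js; the prompt wording, model call, and skill store are hypothetical):

// Hypothetical reflection pass: after a task finishes, ask the model what worked,
// then persist the distilled "skill" for future planning.
async function reflect(transcript, callModel, skillStore) {
  const prompt =
    "Review this tool-use transcript and list any reusable tactic " +
    "(tool choice, ordering, or fix) worth applying to similar tasks:\n\n" + transcript;

  const insights = await callModel(prompt);        // provider-agnostic model call
  await skillStore.save({ insights, learnedAt: Date.now() });
  return insights;                                 // later fed back into planning
}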


🛣 Roadmap

✅ Recently Completed (December 2025)

🔮 Future Features


📚 References & Further Reading

Academic & Technical Resources

Agentic AI Systems:

Long-Term Memory & RAG:

Official Documentation


🌟 Community & Adoption

Get Involved

⭐ Star this repository to show your support and help others discover Lynkr!


Support & Resources

Share Lynkr

Help spread the word about Lynkr:

Why Developers Choose Lynkr


🔗 Links


🚀 Ready to Get Started?

Reduce your Claude Code costs by 60-80% today:

  1. ⭐ Star this repo to show support and stay updated
  2. 📖 Install Lynkr and configure your preferred provider
  3. 💬 Join the Discussion for community support
  4. 🐛 Report Issues to help improve Lynkr

If you use Databricks, Azure Anthropic, OpenRouter, Ollama, or llama.cpp and want rich Claude Code workflows with massive cost savings, Lynkr gives you the control, flexibility, and extensibility you need.

Feel free to open issues, contribute tools, integrate with MCP servers, or help us improve the documentation!