🔬 Key Innovation - Single-Agent KG Reasoning:
- ✅ Single-Agent Architecture: Replaces complex multi-module workflows with unified LLM agent
- ✅ Schema-Agnostic KG Server: Works across different knowledge graphs (Freebase, Wikidata, Temporal KGs)
- ✅ Cross-KG Transferability: Plug-and-play capability - train once, transfer anywhere
- ✅ Token Efficiency: ~83% of context tokens come from KG retrieval; only ~13% from reasoning generation
- ✅ GRPO Training: Group Relative Policy Optimization for stable multi-turn learning
⚡ System Architecture:
- KG Retrieval Server: 4 basic operations (get_tail_relations, get_head_relations, get_tail_entities, get_head_entities)
- Single Agent: Unified reasoning and retrieval in one LLM with special tokens
- Multi-turn Interaction: Up to 7 turns of KG exploration per question
- Lightweight Design: No separate retriever, reranker, or planning modules
📊 Performance Results:
- CWQ Dataset: Improved performance over vanilla baselines and prior KG-RAG methods
- WebQSP Dataset: Strong performance with cross-dataset transferability
- Compute Trade-off: ~1680x the computational cost of naive single-pass generation, but highly effective
- Transferability: Models trained on one KG work on different KG schemas
🚀 Current Status: KG-R1 system with comprehensive evaluation framework and LLM-as-judge factuality evaluation for Knowledge Graph Question Answering.
Answer the given question. You can interact with the knowledge graph through the following actions:
- get_tail_relations(entity): Get relations where entity is the subject
- get_head_relations(entity): Get relations where entity is the object
- get_tail_entities(entity, relation): Get objects for entity-relation pairs
- get_head_entities(entity, relation): Get subjects for relation-entity pairs
Use <search>action_name(arguments)</search> to query the KG. Results appear in <information></information>.
Reason with <think></think> tags. Provide final answer in <answer></answer> tags.
Question: {question}
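The `<search>` action format above can be extracted from model output with a small parser. The sketch below is illustrative only; the actual KG-R1 parser may handle quoting and malformed calls differently.

```python
import re

# Matches <search>action_name(arguments)</search> in the model's output.
# Illustrative only -- the real KG-R1 parser may differ.
SEARCH_RE = re.compile(r"<search>\s*(\w+)\((.*?)\)\s*</search>", re.DOTALL)

def parse_search_action(text: str):
    """Return (action_name, [args]) for the first <search> call, or None."""
    m = SEARCH_RE.search(text)
    if m is None:
        return None
    name = m.group(1)
    args = [a.strip().strip('"\'') for a in m.group(2).split(",") if a.strip()]
    return name, args

print(parse_search_action(
    '<think>Need relations first.</think>'
    '<search>get_tail_relations("Barack Obama")</search>'
))
# -> ('get_tail_relations', ['Barack Obama'])
```

Note that a naive comma split like this breaks on arguments containing commas; a production parser should use proper argument tokenization.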
Base URL: http://127.0.0.1:8001/retrieve
Core Operations:
- get_tail_relations(entity): Find all relations where entity is the head/subject
- get_head_relations(entity): Find all relations where entity is the tail/object
- get_tail_entities(entity, relation): Get tail entities for head-relation pairs
- get_head_entities(entity, relation): Get head entities for relation-tail pairs
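For illustration, these operations might be invoked against the retrieval endpoint as below. The JSON field names (`action`, `entity`, `relation`) are an assumption; check kg_r1/search/server.py for the server's actual request schema.

```python
import requests

KG_SERVER_URL = "http://127.0.0.1:8001/retrieve"

def build_kg_request(action, entity, relation=None):
    """Build the JSON body for one KG operation.
    Field names are an assumption for illustration; see
    kg_r1/search/server.py for the real request schema."""
    body = {"action": action, "entity": entity}
    if relation is not None:
        body["relation"] = relation
    return body

def kg_query(action, entity, relation=None, timeout=10):
    """POST one operation to the KG retrieval server and return its JSON."""
    resp = requests.post(
        KG_SERVER_URL,
        json=build_kg_request(action, entity, relation),
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()

# Example (requires a running server):
# kg_query("get_tail_entities", "Barack Obama", "people.person.place_of_birth")
```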
KG-R1 enables iterative exploration:
- Initial Question Analysis → Identify key entities
- KG Exploration → Multi-turn relation and entity discovery (up to 7 turns)
- Answer Synthesis → Combine retrieved knowledge for final answer
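The three stages above can be sketched as a single loop. Here `llm_generate` and `kg_query` are hypothetical stand-ins for the policy LLM and the KG server client; the real rollout is implemented inside the veRL training pipeline.

```python
MAX_TURNS = 7  # mirrors the up-to-7-turn exploration budget

def kg_r1_rollout(question, llm_generate, kg_query):
    """Schematic multi-turn loop; `llm_generate` and `kg_query` are
    hypothetical stand-ins, not the repo's actual interfaces."""
    context = f"Question: {question}\n"
    for _ in range(MAX_TURNS):
        output = llm_generate(context)
        context += output
        if "<answer>" in output:
            # Answer synthesis: the agent committed to a final answer.
            return output.split("<answer>")[1].split("</answer>")[0].strip()
        if "<search>" in output:
            # KG exploration: run the requested operation and feed the
            # result back to the agent as an <information> block.
            call = output.split("<search>")[1].split("</search>")[0]
            context += f"<information>{kg_query(call)}</information>\n"
    return None  # turn budget exhausted without an answer
```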
Semantic factuality evaluation using a GPT-based judge for accurate answer assessment beyond exact string matching.
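As a rough illustration, such a judge call reduces to a grading prompt like the one below. The wording is hypothetical; the exact prompt lives in the repo's evaluation scripts.

```python
def build_judge_prompt(question, gold_answers, predicted):
    """Assemble an LLM-as-judge prompt for semantic factuality checking.
    Hypothetical wording -- see the repo's evaluation scripts for the
    prompt actually used."""
    return (
        "You are grading a knowledge-graph QA system.\n"
        f"Question: {question}\n"
        f"Gold answers: {', '.join(gold_answers)}\n"
        f"Predicted answer: {predicted}\n"
        "Does the prediction match any gold answer in meaning, even if "
        "the strings differ? Reply with exactly 'yes' or 'no'."
    )
```

The judge's yes/no verdict then replaces (or supplements) exact-match scoring, so paraphrased but correct answers are credited.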
KG-R1 extends the Search-R1 framework to knowledge graph-augmented reasoning, replacing traditional document retrieval with structured knowledge graph operations. This creates a more efficient and transferable agentic KG-RAG system.
Built upon veRL, KG-R1 provides a unified single-agent architecture that learns to reason and interact with knowledge graphs through reinforcement learning. The system achieves both computational efficiency (~83% tokens from KG retrieval vs ~13% from reasoning) and cross-KG transferability.
We support different RL methods (PPO, GRPO), different LLMs (Qwen2.5, Llama3, etc.), and different knowledge graph schemas (Freebase, Wikidata, temporal KGs) with a plug-and-play design.
Paper: link1, link2; Model and data: link; Twitter thread: link; Full experiment log: prelim; v0.1; v0.2; v0.3. Details about these logs and methods can be found here.
Key Innovation: KG-R1 replaces complex multi-module workflows with a single unified agent that learns to reason and retrieve through reinforcement learning, achieving both efficiency and transferability across different knowledge graph schemas.
- [2025.11] Implemented cross-KG transferability testing on multiple knowledge graphs
- [2025.11] Released KG-R1 codebase with GRPO training and multi-turn KG reasoning
- [2025.11] Added comprehensive evaluation framework with LLM-as-judge for factuality assessment
- [2025.11] Developed schema-agnostic KG server with 4 basic operations
- Installation
- Quick start
- KG-R1 Results
- Inference
- Use your own dataset
- Use your own knowledge graph
- Features
- Acknowledgements
- Citations
conda create -n kgr1 python=3.9
conda activate kgr1
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
pip install wandb
# Additional dependencies for KG processing
pip install fastapi uvicorn requests aiohttp
pip install networkx # for knowledge graph operations

The KG-R1 system requires a knowledge graph server for retrieval operations.
conda create -n kg_server python=3.10
conda activate kg_server
# Core dependencies for KG server
pip install fastapi uvicorn pydantic requests
pip install transformers datasets huggingface_hub
pip install networkx pandas pyarrow
# For efficient KG processing
pip install numpy scipy

Train a KG-R1 agent on the ComplexWebQuestions (CWQ) dataset using GRPO (Group Relative Policy Optimization). See Figure 2 for the multi-turn interaction process.
Set up conda environment and download datasets
(1) Create and activate the KG-R1 environment
conda create -n kgr1 python=3.9
conda activate kgr1
# Install PyTorch with CUDA support
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# Install vLLM for efficient inference
pip3 install vllm==0.6.3
# Install veRL framework
pip install -e .
# Install flash attention for efficient training
pip3 install flash-attn --no-build-isolation
pip install wandb
# Additional dependencies for KG processing
pip install fastapi uvicorn requests aiohttp networkx

(2) Initialize datasets using initialize.py
# This script downloads and prepares KG-QA datasets
python initialize.py
# This creates the data_kg/ directory with:
# - CWQ: ComplexWebQuestions dataset with Freebase knowledge subgraphs
# - WebQSP: WebQuestionsSP dataset with Freebase knowledge subgraphs
# - Search-augmented initial entities for both datasets

The data_kg/ directory structure after initialization:
data_kg/
├── CWQ/ # CWQ dataset files
│ ├── entities.txt, relations.txt # KG vocabulary
│ ├── train_simple.json, dev_simple.json # QA pairs
│ └── word_emb_300d.npy # Entity embeddings
├── cwq_search_augmented_initial_entities/ # Processed CWQ data
│ ├── train.parquet, dev.parquet, test.parquet
├── webqsp/ # WebQSP dataset files
│ ├── entities.txt, relations.txt
│ ├── train_simple.json, test_simple.json
│ └── word_emb_300d.npy
└── webqsp_search_augmented_initial_entities/ # Processed WebQSP data
├── train.parquet, test.parquet
(3) Create KG server environment (optional but recommended)
conda create -n kg_server python=3.10
conda activate kg_server
# Core dependencies for KG server
pip install fastapi uvicorn pydantic requests
pip install transformers datasets huggingface_hub
pip install networkx pandas pyarrow numpy scipy

Train KG-R1 agent using reinforcement learning
Note: We will provide pretrained HuggingFace model weights (backbone: Qwen2.5-3B) later.
(1) Launch the KG retrieval server
conda activate kg_server
# Start KG server on port 8001 (provides 4 basic KG operations)
python kg_r1/search/server.py --port 8001 --data_dir data_kg

(2) Run KG-R1 training with GRPO
conda activate kgr1
# Train Qwen2.5-3B with 7-turn KG reasoning on CWQ
bash train_grpo_kg_qwen_3b_cwq_f1_turn7.sh
# Or train on WebQSP:
# bash train_grpo_kg_qwen_3b_webqsp_f1_turn7.sh

Training configurations available:
- train_grpo_kg_qwen_3b_cwq_f1_turn5.sh: CWQ with 5 turns
- train_grpo_kg_qwen_3b_cwq_f1_turn7.sh: CWQ with 7 turns (recommended)
- train_grpo_kg_qwen_3b_webqsp_f1_turn7.sh: WebQSP with 7 turns
Expected training time: ~8-12 hours on 4x A100 GPUs for full training
Expected Results: 70.9 F1 / 73.8 Hit@1 on CWQ with single 3B model (see Performance Results)
Two inference options: (1) Local checkpoint or (2) HuggingFace models
Evaluate your locally trained model:
# (1) Launch the KG retrieval server
conda activate kg_server
python kg_r1/search/server.py --port 8001 --data_dir data_kg
# (2) Run inference with your trained checkpoint
conda activate kgr1
bash eval_scripts/kg_r1_eval_main/eval_qwen_3b_cwq_f1_turn7_local.sh \
/path/to/your/checkpoint \
cwq # or webqsp
# Example:
# bash eval_scripts/kg_r1_eval_main/eval_qwen_3b_cwq_f1_turn7_local.sh \
# verl_checkpoints/cwq-KG-r1-grpo-qwen2.5-3b-it_f1_turn7 \
# cwq

Evaluate pre-trained models directly from HuggingFace (no local training needed):
# (1) Launch the KG retrieval server
conda activate kg_server
python kg_r1/search/server.py --port 8001 --data_dir data_kg
# (2) Run inference with HuggingFace model
conda activate kgr1
bash eval_scripts/kg_r1_eval_main/eval_qwen_3b_cwq_f1_turn7_hf.sh \
JinyeopSong/KG-R1_test \ # Specify HF model
cwq # Dataset to evaluate
# Or simply use defaults:
# bash eval_scripts/kg_r1_eval_main/eval_qwen_3b_cwq_f1_turn7_hf.sh

Available Evaluation Scripts:
- eval_qwen_3b_cwq_f1_turn7_local.sh: Evaluate a local checkpoint
- eval_qwen_3b_cwq_f1_turn7_hf.sh: Evaluate a HuggingFace model
Available HuggingFace Models:
- JinyeopSong/KG-R1_test: Testing model (placeholder)
- More models coming soon! (Links not ready yet)
Inference outputs:
- Pass@K evaluation results (K=1,2,3,4)
- Detailed reasoning traces with KG exploration steps
- Exact match and F1 scores
- Per-sample analysis in JSONL format
Main Results on WebQSP and CWQ:
| Method | Model | Modules | WebQSP F1/Hit@1 | CWQ F1/Hit@1 | Efficiency (Total/Gen) |
|---|---|---|---|---|---|
| Vanilla | Qwen2.5-3B-it | 1 | 29.4 / 46.6 | 16.6 / 21.1 | 95-104 / 30-42 |
| COT | Qwen2.5-3B-it | 1 | 30.6 / 47.6 | 17.3 / 21.4 | 131-140 / 192-216 |
| RoG | LLaMA2-7B-it | 2 | 70.8 / 85.7 | 56.2 / 62.6 | 1.1-1.2K / 266-295 |
| ToG 2.0 | GPT-3.5 | 5 | 74.5 / 77.8 | 65.8 / 68.9 | 3.8-39K / 605-650 |
| ReKnoS | GPT-4o-mini | 3 | 73.7 / 81.1 | 64.7 / 66.8 | 3.1-4.1K / 617-752 |
| 🔥 KG-R1 (1 run) | Qwen2.5-3B-it | 1 | 77.5 / 84.7 | 70.9 / 73.8 | 3.2-3.3K / 302-377 |
| 🔥 KG-R1 (3 runs) | Qwen2.5-3B-it | 1 | 85.8 / 91.7 | 81.0 / 83.9 | 9.7-10K / 906-1.1K |
Zero-shot transfer across different KG schemas (no retraining required):
| Training KG | SimpleQA | GrailQA | T-REx | QALD-10en | MultiTQ | Average |
|---|---|---|---|---|---|---|
| Vanilla Baseline | 13.7 / 13.7 | 15.9 / 15.9 | 24.4 / 24.4 | 23.8 / 23.8 | 2.2 / 5.4 | 19.4 / 19.8 |
| KG-R1 (WebQSP) | 59.1 / 59.1 | 32.8 / 38.5 | 80.5 / 84.5 | 51.9 / 53.4 | 21.6 / 31.4 | 64.0 / 68.3 |
| KG-R1 (CWQ) | 64.6 / 64.7 | 42.8 / 50.2 | 81.3 / 85.6 | 55.9 / 57.7 | 27.1 / 38.9 | 67.2 / 72.1 |
| KG-R1 (3 runs) | 73.1 / 73.1 | 52.8 / 61.0 | 86.8 / 91.5 | 63.9 / 65.5 | 36.2 / 48.4 | 74.1 / 79.4 |
- 🎯 Strong Performance: Competitive results on CWQ and WebQSP benchmarks
- ⚡ Computational Efficiency: Single-agent vs multi-module workflows (1 vs 2-5 modules)
- 🔄 Cross-KG Transfer: 64-74% average F1 across 5 different KG schemas
- 💡 Training Efficiency: 3B model achieves competitive performance
(1) Launch the KG retrieval server.
conda activate kg_server
python kg_r1/search/server.py --port 8001 --data_dir data_kg

(2) Run KG-R1 inference.
conda activate kgr1
python infer_kg_r1.py --checkpoint verl_checkpoints/your_trained_model

You can modify the question parameter to test different knowledge graph questions. The model will interactively explore the KG using the 4 basic operations and provide reasoning traces.
For each knowledge graph question-answer sample, it should be a dictionary containing:
data = {
"data_source": "your_kg_dataset",
"original_query": question,
"target_text": answer,
"query_entities": ["entity1", "entity2"], # Initial entities
"query_id": unique_id,
"split": "train/test/dev"
}

Your knowledge graph should provide the following structure:
# Entity-relation-entity triples
kg_data = {
"entities": {"entity_id": "human_readable_name"},
"relations": {"relation_id": "human_readable_name"},
"triples": [
["head_entity_id", "relation_id", "tail_entity_id"],
# ... more triples
]
}

You can refer to scripts/data_kg/process_datasets.py for concrete data processing examples for the CWQ and WebQSP datasets.
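The kg_data structure above is enough to serve all 4 basic operations via simple head/tail indices. This is a minimal in-memory sketch; the real server (kg_r1/search/server.py) additionally handles ID-to-name mapping and concurrency.

```python
def build_kg_index(kg_data):
    """Index triples by head and by tail entity.
    Minimal in-memory sketch of what a KG server needs."""
    by_head, by_tail = {}, {}
    for h, r, t in kg_data["triples"]:
        by_head.setdefault(h, []).append((r, t))
        by_tail.setdefault(t, []).append((r, h))
    return by_head, by_tail

def get_tail_relations(by_head, entity):
    """Relations where entity is the head/subject."""
    return sorted({r for r, _ in by_head.get(entity, [])})

def get_head_relations(by_tail, entity):
    """Relations where entity is the tail/object."""
    return sorted({r for r, _ in by_tail.get(entity, [])})

def get_tail_entities(by_head, entity, relation):
    """Tail entities for a head-relation pair."""
    return [t for r, t in by_head.get(entity, []) if r == relation]

def get_head_entities(by_tail, entity, relation):
    """Head entities for a relation-tail pair."""
    return [h for r, h in by_tail.get(entity, []) if r == relation]
```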
To use your own knowledge graph, you need to set up the KG server with your data:
- Prepare your KG data in the required format (see above)
- Start the KG server with your data directory:
# Your KG data should be organized as:
# your_kg_data/
# ├── entities.json
# ├── relations.json
# ├── train_simple.json
# └── test_simple.json
python kg_r1/search/server.py --port 8001 --data_dir your_kg_data

- Configure your training script to point to your KG server:
# In your training script, update:
actor_rollout_ref.rollout.search.search_url="http://127.0.0.1:8001/retrieve"

The KG server supports the 4 basic operations:
- get_tail_relations(entity): Find relations where entity is the subject
- get_head_relations(entity): Find relations where entity is the object
- get_tail_entities(entity, relation): Get tail entities for head-relation pairs
- get_head_entities(entity, relation): Get head entities for relation-tail pairs
KG-R1 supports different types of knowledge graphs with a schema-agnostic design. The system works with:
- Freebase-style KGs: Entity-centric with rich relations
- Wikidata KGs: Property-based knowledge representation
- Temporal KGs: Time-aware knowledge graphs
- Domain-specific KGs: Custom knowledge graphs for specific domains
The main philosophy is to launch a KG server separately from the RL training pipeline, providing a clean API interface.
The LLM agent calls the KG server through the search API at http://127.0.0.1:8001/retrieve.
You can refer to kg_r1/search/server.py for the complete KG server implementation, which includes:
- FastAPI server: RESTful API for KG operations
- Concurrent processing: ThreadPoolExecutor for handling multiple requests
- Action routing: Dispatches requests to appropriate KG operations
- Error handling: Robust error handling for malformed queries
KG-R1's key advantage is cross-KG transferability. Models trained on one KG can transfer to different KG schemas without retraining, enabling plug-and-play usage.
If you use KG-R1 in your research, citation information will be provided upon publication.

