
Commit 12c960c

Merge pull request HKUDS#56 from sank8-2/dev

chore: added pre-commit-hooks and ruff formatting for commit-hooks

2 parents e2db7b6 + 744dad3

26 files changed (+630, −388 lines)

.gitignore

Lines changed: 2 additions & 1 deletion

````diff
@@ -1,4 +1,5 @@
 __pycache__
 *.egg-info
 dickens/
-book.txt
+book.txt
+lightrag-dev/
````

.pre-commit-config.yaml

Lines changed: 22 additions & 0 deletions

````diff
@@ -0,0 +1,22 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: requirements-txt-fixer
+
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.4
+    hooks:
+      - id: ruff-format
+      - id: ruff
+        args: [--fix]
+
+
+  - repo: https://github.com/mgedmin/check-manifest
+    rev: "0.49"
+    hooks:
+      - id: check-manifest
+        stages: [manual]
````
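The new config wires three hook repos into git's pre-commit stage: generic whitespace/end-of-file/requirements fixers, Ruff formatting plus linting with `--fix`, and `check-manifest` gated behind the manual stage. Below is a minimal sketch of how a contributor would exercise these hooks locally, assuming the `pre-commit` package is installed (`pip install pre-commit`); the helper is illustrative, not part of this commit:

```python
# Illustrative helper (not in this commit): drives the pre-commit CLI that
# consumes the .pre-commit-config.yaml added above.
import subprocess


def run_hooks() -> int:
    # Install the git hook script so the checks fire on every `git commit`.
    subprocess.run(["pre-commit", "install"], check=True)
    # Run all configured hooks against the whole tree, not just staged files.
    result = subprocess.run(["pre-commit", "run", "--all-files"])
    # check-manifest is stage-gated and only runs when invoked manually, e.g.:
    # subprocess.run(["pre-commit", "run", "check-manifest", "--hook-stage", "manual"])
    return result.returncode


if __name__ == "__main__":
    raise SystemExit(run_hooks())
```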

README.md

Lines changed: 25 additions & 25 deletions

````diff
@@ -16,16 +16,16 @@
 <a href="https://pypi.org/project/lightrag-hku/"><img src="https://img.shields.io/pypi/v/lightrag-hku.svg"></a>
 <a href="https://pepy.tech/project/lightrag-hku"><img src="https://static.pepy.tech/badge/lightrag-hku/month"></a>
 </p>
-
+
 This repository hosts the code of LightRAG. The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).
 ![请添加图片描述](https://i-blog.csdnimg.cn/direct/b2aaf634151b4706892693ffb43d9093.png)
 </div>

-## 🎉 News
+## 🎉 News
 - [x] [2024.10.18]🎯🎯📢📢We’ve added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
 - [x] [2024.10.17]🎯🎯📢📢We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Welcome to join for sharing and discussions! 🎉🎉
-- [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
-- [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
+- [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
+- [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!

 ## Install

@@ -92,7 +92,7 @@ print(rag.query("What are the top themes in this story?", param=QueryParam(mode=
 <details>
 <summary> Using Open AI-like APIs </summary>

-LightRAG also support Open AI-like chat/embeddings APIs:
+LightRAG also supports Open AI-like chat/embeddings APIs:
 ```python
 async def llm_model_func(
     prompt, system_prompt=None, history_messages=[], **kwargs
@@ -129,7 +129,7 @@ rag = LightRAG(

 <details>
 <summary> Using Hugging Face Models </summary>
-
+
 If you want to use Hugging Face models, you only need to set LightRAG as follows:
 ```python
 from lightrag.llm import hf_model_complete, hf_embedding
@@ -145,7 +145,7 @@ rag = LightRAG(
         embedding_dim=384,
         max_token_size=5000,
         func=lambda texts: hf_embedding(
-            texts,
+            texts,
             tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
             embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
         )
@@ -157,7 +157,7 @@ rag = LightRAG(
 <details>
 <summary> Using Ollama Models </summary>
 If you want to use Ollama models, you only need to set LightRAG as follows:
-
+
 ```python
 from lightrag.llm import ollama_model_complete, ollama_embedding

@@ -171,7 +171,7 @@ rag = LightRAG(
         embedding_dim=768,
         max_token_size=8192,
         func=lambda texts: ollama_embedding(
-            texts,
+            texts,
             embed_model="nomic-embed-text"
         )
     ),
@@ -196,14 +196,14 @@ with open("./newText.txt") as f:
 ```
 ## Evaluation
 ### Dataset
-The dataset used in LightRAG can be download from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).
+The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).

 ### Generate Query
-LightRAG uses the following prompt to generate high-level queries, with the corresponding code located in `example/generate_query.py`.
+LightRAG uses the following prompt to generate high-level queries, with the corresponding code in `example/generate_query.py`.

 <details>
 <summary> Prompt </summary>
-
+
 ```python
 Given the following description of a dataset:

@@ -228,18 +228,18 @@ Output the results in the following structure:
 ...
 ```
 </details>
-
+
 ### Batch Eval
 To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `example/batch_eval.py`.

 <details>
 <summary> Prompt </summary>
-
+
 ```python
 ---Role---
 You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
 ---Goal---
-You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
+You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

 - **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
 - **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
@@ -303,15 +303,15 @@ Output your evaluation in the following JSON format:
 | **Empowerment** | 36.69% | **63.31%** | 45.09% | **54.91%** | 42.81% | **57.19%** | **52.94%** | 47.06% |
 | **Overall** | 43.62% | **56.38%** | 45.98% | **54.02%** | 45.70% | **54.30%** | **51.86%** | 48.14% |

-## Reproduce
+## Reproduce
 All the code can be found in the `./reproduce` directory.

 ### Step-0 Extract Unique Contexts
 First, we need to extract unique contexts in the datasets.

 <details>
 <summary> Code </summary>
-
+
 ```python
 def extract_unique_contexts(input_directory, output_directory):

@@ -370,12 +370,12 @@ For the extracted contexts, we insert them into the LightRAG system.

 <details>
 <summary> Code </summary>
-
+
 ```python
 def insert_text(rag, file_path):
     with open(file_path, mode='r') as f:
         unique_contexts = json.load(f)
-
+
     retries = 0
     max_retries = 3
     while retries < max_retries:
@@ -393,11 +393,11 @@ def insert_text(rag, file_path):

 ### Step-2 Generate Queries

-We extract tokens from both the first half and the second half of each context in the dataset, then combine them as the dataset description to generate queries.
+We extract tokens from the first and the second half of each context in the dataset, then combine them as dataset descriptions to generate queries.

 <details>
 <summary> Code </summary>
-
+
 ```python
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

@@ -410,7 +410,7 @@ def get_summary(context, tot_tokens=2000):

     summary_tokens = start_tokens + end_tokens
     summary = tokenizer.convert_tokens_to_string(summary_tokens)
-
+
     return summary
 ```
 </details>
@@ -420,12 +420,12 @@ For the queries generated in Step-2, we will extract them and query LightRAG.

 <details>
 <summary> Code </summary>
-
+
 ```python
 def extract_queries(file_path):
     with open(file_path, 'r') as f:
         data = f.read()
-
+
     data = data.replace('**', '')

     queries = re.findall(r'- Question \d+: (.+)', data)
@@ -479,7 +479,7 @@ def extract_queries(file_path):

 ```python
 @article{guo2024lightrag,
-title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
+title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
 author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
 year={2024},
 eprint={2410.05779},
````
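Most of these README hunks trim trailing whitespace or fix small grammar slips; taken together, the setup snippets they touch describe one end-to-end flow. The sketch below is pieced together from the fragments visible in the diff, so treat it as an approximation: the `EmbeddingFunc` import path and the `working_dir` and `llm_model_name` values are assumptions, while the `lightrag.llm` imports and the embedding parameters appear verbatim in the hunks above.

```python
# Approximate end-to-end flow reconstructed from the README hunks above.
from lightrag import LightRAG, QueryParam  # assumed top-level exports
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc  # assumed import path

rag = LightRAG(
    working_dir="./dickens",  # assumed scratch directory
    llm_model_func=ollama_model_complete,
    llm_model_name="your_model_name",  # any model already pulled into Ollama
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)

with open("./book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
```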

examples/batch_eval.py

Lines changed: 17 additions & 21 deletions

````diff
@@ -1,4 +1,3 @@
-import os
 import re
 import json
 import jsonlines
@@ -9,28 +8,28 @@
 def batch_eval(query_file, result1_file, result2_file, output_file_path):
     client = OpenAI()

-    with open(query_file, 'r') as f:
+    with open(query_file, "r") as f:
         data = f.read()

-    queries = re.findall(r'- Question \d+: (.+)', data)
+    queries = re.findall(r"- Question \d+: (.+)", data)

-    with open(result1_file, 'r') as f:
+    with open(result1_file, "r") as f:
         answers1 = json.load(f)
-    answers1 = [i['result'] for i in answers1]
+    answers1 = [i["result"] for i in answers1]

-    with open(result2_file, 'r') as f:
+    with open(result2_file, "r") as f:
         answers2 = json.load(f)
-    answers2 = [i['result'] for i in answers2]
+    answers2 = [i["result"] for i in answers2]

     requests = []
     for i, (query, answer1, answer2) in enumerate(zip(queries, answers1, answers2)):
-        sys_prompt = f"""
+        sys_prompt = """
 ---Role---
 You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
 """

         prompt = f"""
-You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
+You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

 - **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
 - **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
@@ -69,7 +68,6 @@ def batch_eval(query_file, result1_file, result2_file, output_file_path):
 }}
 """

-
         request_data = {
             "custom_id": f"request-{i+1}",
             "method": "POST",
@@ -78,35 +76,33 @@ def batch_eval(query_file, result1_file, result2_file, output_file_path):
                 "model": "gpt-4o-mini",
                 "messages": [
                     {"role": "system", "content": sys_prompt},
-                    {"role": "user", "content": prompt}
+                    {"role": "user", "content": prompt},
                 ],
-            }
+            },
         }
-
+
         requests.append(request_data)

-    with jsonlines.open(output_file_path, mode='w') as writer:
+    with jsonlines.open(output_file_path, mode="w") as writer:
         for request in requests:
             writer.write(request)

     print(f"Batch API requests written to {output_file_path}")

     batch_input_file = client.files.create(
-        file=open(output_file_path, "rb"),
-        purpose="batch"
+        file=open(output_file_path, "rb"), purpose="batch"
     )
     batch_input_file_id = batch_input_file.id

     batch = client.batches.create(
         input_file_id=batch_input_file_id,
         endpoint="/v1/chat/completions",
         completion_window="24h",
-        metadata={
-            "description": "nightly eval job"
-        }
+        metadata={"description": "nightly eval job"},
     )

-    print(f'Batch {batch.id} has been created.')
+    print(f"Batch {batch.id} has been created.")
+

 if __name__ == "__main__":
-    batch_eval()
+    batch_eval()
````
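Note that, as committed, the `__main__` block still calls `batch_eval()` without its four required arguments, so running the script directly raises a TypeError. The function also only submits the job; results arrive asynchronously through the OpenAI Batch API. Below is a companion sketch for collecting them, using only documented Batch API methods (`batches.retrieve`, `files.content`); the polling helper itself is not part of this commit:

```python
# Companion sketch (not in this commit): poll the batch created by
# batch_eval() and fetch its JSONL output once processing finishes.
import time

from openai import OpenAI


def wait_for_batch(batch_id: str, poll_seconds: int = 60) -> str:
    client = OpenAI()
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status == "completed":
            # The output file is JSONL: one response object per request line.
            return client.files.content(batch.output_file_id).text
        if batch.status in ("failed", "expired", "cancelled"):
            raise RuntimeError(f"Batch {batch_id} ended with status {batch.status}")
        time.sleep(poll_seconds)
```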

examples/generate_query.py

Lines changed: 4 additions & 5 deletions

````diff
@@ -1,9 +1,8 @@
-import os
-
 from openai import OpenAI

 # os.environ["OPENAI_API_KEY"] = ""

+
 def openai_complete_if_cache(
     model="gpt-4o-mini", prompt=None, system_prompt=None, history_messages=[], **kwargs
 ) -> str:
@@ -47,10 +46,10 @@ def openai_complete_if_cache(
 ...
 """

-result = openai_complete_if_cache(model='gpt-4o-mini', prompt=prompt)
+result = openai_complete_if_cache(model="gpt-4o-mini", prompt=prompt)

-file_path = f"./queries.txt"
+file_path = "./queries.txt"
 with open(file_path, "w") as file:
     file.write(result)

-print(f"Queries written to {file_path}")
+print(f"Queries written to {file_path}")
````

examples/lightrag_azure_openai_demo.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -122,4 +122,4 @@ async def test_funcs():
 print(rag.query(query_text, param=QueryParam(mode="global")))

 print("\nResult (Hybrid):")
-print(rag.query(query_text, param=QueryParam(mode="hybrid")))
+print(rag.query(query_text, param=QueryParam(mode="hybrid")))
````

examples/lightrag_bedrock_demo.py

Lines changed: 4 additions & 9 deletions

````diff
@@ -20,22 +20,17 @@
     llm_model_func=bedrock_complete,
     llm_model_name="Anthropic Claude 3 Haiku // Amazon Bedrock",
     embedding_func=EmbeddingFunc(
-        embedding_dim=1024,
-        max_token_size=8192,
-        func=bedrock_embedding
-    )
+        embedding_dim=1024, max_token_size=8192, func=bedrock_embedding
+    ),
 )

-with open("./book.txt", 'r', encoding='utf-8') as f:
+with open("./book.txt", "r", encoding="utf-8") as f:
     rag.insert(f.read())

 for mode in ["naive", "local", "global", "hybrid"]:
     print("\n+-" + "-" * len(mode) + "-+")
     print(f"| {mode.capitalize()} |")
     print("+-" + "-" * len(mode) + "-+\n")
     print(
-        rag.query(
-            "What are the top themes in this story?",
-            param=QueryParam(mode=mode)
-        )
+        rag.query("What are the top themes in this story?", param=QueryParam(mode=mode))
     )
````
