
Commit f82f7ba

Improve example documentation
- Add README
- Improve Godoc
1 parent 8dc1a7f commit f82f7ba

File tree

3 files changed: +190 −4 lines changed

example/README.md

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
# Example

This example shows a retrieval augmented generation (RAG) application that uses `chromem-go` as the knowledge base for finding information relevant to a question.

We run the embedding model and the LLM in [Ollama](https://github.com/ollama/ollama) to showcase how a RAG application can run entirely offline, without relying on OpenAI or other third-party APIs. It doesn't require a GPU; a CPU like an 11th Gen Intel i5-1135G7 (as in the first-generation Framework Laptop 13) is fast enough.

As the LLM we use Google's [Gemma (2B)](https://huggingface.co/google/gemma-2b), a very small model that doesn't need many resources and is fast, but doesn't have much knowledge, which makes it a prime example for combining LLMs with vector databases. We found Gemma 2B to be superior to [TinyLlama (1.1B)](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), [Stable LM 2 (1.6B)](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b) and [Phi-2 (2.7B)](https://huggingface.co/microsoft/phi-2) for the RAG use case.

As the embedding model we use Nomic's [nomic-embed-text v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).
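For orientation, here is a condensed sketch of the `chromem-go` calls the example is built around. The document contents are shortened, the IDs are made up, and the exact `Query` signature and result fields may differ slightly depending on your `chromem-go` version; see `main.go` for the real code.

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"github.com/philippgille/chromem-go"
)

func main() {
	ctx := context.Background()

	// In-memory vector database.
	db := chromem.NewDB()

	// A collection whose embeddings are created by the nomic-embed-text model,
	// served by the local Ollama API.
	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama("nomic-embed-text"))
	if err != nil {
		panic(err)
	}

	// Add documents. Their embeddings are created concurrently, one worker per CPU core.
	docs := []chromem.Document{
		{ID: "1", Content: "Malleable Iron Range Company was a company that existed from 1896 to 1985 ..."},
		{ID: "2", Content: "The American Motor Car Company was a short-lived company in the automotive industry ..."},
	}
	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
	if err != nil {
		panic(err)
	}

	// Retrieve the two documents most similar to the question.
	res, err := collection.Query(ctx, "When did the Monarch Company exist?", 2, nil, nil)
	if err != nil {
		panic(err)
	}
	for _, r := range res {
		fmt.Printf("(similarity: %f) %q\n", r.Similarity, r.Content)
	}
}
```

The full example additionally warms up Ollama, reads the documents from a JSON lines file and hands the query results to the LLM, as shown in the output below.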
## How to run

1. Install Ollama: <https://ollama.com/download>
2. Download the two models:
   - `ollama pull gemma:2b`
   - `ollama pull nomic-embed-text`
3. Run the example: `go run .`
## Output

The output can differ slightly on each run, but it's along the lines of:

```log
2024/03/02 20:02:30 Warming up Ollama...
2024/03/02 20:02:33 Question: When did the Monarch Company exist?
2024/03/02 20:02:33 Asking LLM...
2024/03/02 20:02:34 Initial reply from the LLM: "I cannot provide information on the Monarch Company, as I am unable to access real-time or comprehensive knowledge sources."
2024/03/02 20:02:34 Setting up chromem-go...
2024/03/02 20:02:34 Reading JSON lines...
2024/03/02 20:02:34 Adding documents to chromem-go, including creating their embeddings via Ollama API...
2024/03/02 20:03:11 Querying chromem-go...
2024/03/02 20:03:11 Document 1 (similarity: 0.723627): "Malleable Iron Range Company was a company that existed from 1896 to 1985 and primarily produced kitchen ranges made of malleable iron but also produced a variety of other related products. The company's primary trademark was 'Monarch' and was colloquially often referred to as the Monarch Company or just Monarch."
2024/03/02 20:03:11 Document 2 (similarity: 0.550584): "The American Motor Car Company was a short-lived company in the automotive industry founded in 1906 lasting until 1913. It was based in Indianapolis Indiana United States. The American Motor Car Company pioneered the underslung design."
2024/03/02 20:03:11 Asking LLM with augmented question...
2024/03/02 20:03:32 Reply after augmenting the question with knowledge: "The Monarch Company existed from 1896 to 1985."
```
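The last few log lines correspond to handing the retrieved documents to the LLM as additional context. Roughly, and continuing the sketch from the introduction (`res` is the result of the `Query` call, `askLLM` is the helper in `llm.go`; variable names in the real `main.go` may differ):

```go
// Collect the contents of the retrieved documents as context strings.
contexts := make([]string, 0, len(res))
for _, r := range res {
	contexts = append(contexts, r.Content)
}

// askLLM (see llm.go) renders the contexts into the system prompt and
// asks the model the original question again.
log.Println("Asking LLM with augmented question...")
reply := askLLM(ctx, contexts, question)
log.Println("Reply after augmenting the question with knowledge: \"" + reply + "\"")
```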
Most of the time here is spent on creating the embeddings and on the LLM conversation, neither of which is part of `chromem-go`.
## OpenAI

You can easily adapt the code to use OpenAI instead of running locally with Ollama.

Add your OpenAI API key to your environment as `OPENAI_API_KEY`.

Then, if you want to create the embeddings via OpenAI but still use Gemma 2B as the LLM:

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/main.go b/example/main.go
index 55b3076..cee9561 100644
--- a/example/main.go
+++ b/example/main.go
@@ -14,8 +14,6 @@ import (
 
 const (
 	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
 )
 
 func main() {
@@ -48,7 +46,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +80,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```
</details>

Or alternatively, if you want to use OpenAI for everything (embeddings creation and LLM):

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/llm.go b/example/llm.go
index 1fde4ec..7cb81cc 100644
--- a/example/llm.go
+++ b/example/llm.go
@@ -2,23 +2,13 @@ package main
 
 import (
 	"context"
-	"net/http"
+	"os"
 	"strings"
 	"text/template"
 
 	"github.com/sashabaranov/go-openai"
 )
 
-const (
-	// We use a local LLM running in Ollama for asking the question: https://github.com/ollama/ollama
-	ollamaBaseURL = "http://localhost:11434/v1"
-	// We use Google's Gemma (2B), a very small model that doesn't need much resources
-	// and is fast, but doesn't have much knowledge: https://huggingface.co/google/gemma-2b
-	// We found Gemma 2B to be superior to TinyLlama (1.1B), Stable LM 2 (1.6B)
-	// and Phi-2 (2.7B) for the retrieval augmented generation (RAG) use case.
-	llmModel = "gemma:2b"
-)
-
 // There are many different ways to provide the context to the LLM.
 // You can pass each context as user message, or the list as one user message,
 // or pass it in the system prompt. The system prompt itself also has a big impact
@@ -47,10 +37,7 @@ Don't mention the knowledge base, context or search results in your answer.
 
 func askLLM(ctx context.Context, contexts []string, question string) string {
 	// We can use the OpenAI client because Ollama is compatible with OpenAI's API.
-	openAIClient := openai.NewClientWithConfig(openai.ClientConfig{
-		BaseURL:    ollamaBaseURL,
-		HTTPClient: http.DefaultClient,
-	})
+	openAIClient := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
 	sb := &strings.Builder{}
 	err := systemPromptTpl.Execute(sb, contexts)
 	if err != nil {
@@ -66,7 +53,7 @@ func askLLM(ctx context.Context, contexts []string, question string) string {
 		},
 	}
 	res, err := openAIClient.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
-		Model:    llmModel,
+		Model:    openai.GPT3Dot5Turbo,
 		Messages: messages,
 	})
 	if err != nil {
diff --git a/example/main.go b/example/main.go
index 55b3076..044a246 100644
--- a/example/main.go
+++ b/example/main.go
@@ -12,19 +12,11 @@ import (
 	"github.com/philippgille/chromem-go"
 )
 
-const (
-	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
-)
+const question = "When did the Monarch Company exist?"
 
 func main() {
 	ctx := context.Background()
 
-	// Warm up Ollama, in case the model isn't loaded yet
-	log.Println("Warming up Ollama...")
-	_ = askLLM(ctx, nil, "Hello!")
-
 	// First we ask an LLM a fairly specific question that it likely won't know
 	// the answer to.
 	log.Println("Question: " + question)
@@ -48,7 +40,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +74,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```
</details>

example/llm.go

Lines changed: 3 additions & 3 deletions
@@ -10,10 +10,10 @@ import (
 )
 
 const (
-	// We use a local LLM running in Ollama for asking the question: https://ollama.com/
+	// We use a local LLM running in Ollama for asking the question: https://github.com/ollama/ollama
 	ollamaBaseURL = "http://localhost:11434/v1"
-	// We use a very small model that doesn't need much resources and is fast, but
-	// doesn't have much knowledge: https://ollama.com/library/gemma
+	// We use Google's Gemma (2B), a very small model that doesn't need much resources
+	// and is fast, but doesn't have much knowledge: https://huggingface.co/google/gemma-2b
 	// We found Gemma 2B to be superior to TinyLlama (1.1B), Stable LM 2 (1.6B)
 	// and Phi-2 (2.7B) for the retrieval augmented generation (RAG) use case.
 	llmModel = "gemma:2b"
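These two constants point the example at the local Ollama server, which exposes an OpenAI-compatible API. A rough, self-contained sketch of that pattern, mirroring the client setup in `askLLM` (see the patch in the README above); it assumes Ollama is running locally with `gemma:2b` pulled, and the prompt is just a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/sashabaranov/go-openai"
)

func main() {
	// Ollama is compatible with OpenAI's API, so the regular go-openai
	// client works when pointed at the local Ollama server.
	client := openai.NewClientWithConfig(openai.ClientConfig{
		BaseURL:    "http://localhost:11434/v1",
		HTTPClient: http.DefaultClient,
	})

	res, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model: "gemma:2b",
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Choices[0].Message.Content)
}
```

Because only the client construction and the model name differ, switching between Ollama and OpenAI is a small change, which is exactly what the patches in the README do.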

example/main.go

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ import (
 
 const (
 	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://ollama.com/library/nomic-embed-text
+	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
 	embeddingModel = "nomic-embed-text"
 )
