# Example

This example shows a retrieval augmented generation (RAG) application that uses `chromem-go` as the knowledge base for finding information relevant to a question.

We run the embeddings model and the LLM in [Ollama](https://github.com/ollama/ollama) to showcase how a RAG application can run entirely offline, without relying on OpenAI or other third-party APIs. It doesn't require a GPU; a CPU like an 11th Gen Intel i5-1135G7 (as found in the first-generation Framework Laptop 13) is fast enough.

As the LLM we use Google's [Gemma (2B)](https://huggingface.co/google/gemma-2b), a very small model that doesn't need many resources and is fast, but also doesn't have much knowledge, which makes it a prime example for combining LLMs with vector databases. We found Gemma 2B to be superior to [TinyLlama (1.1B)](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), [Stable LM 2 (1.6B)](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b) and [Phi-2 (2.7B)](https://huggingface.co/microsoft/phi-2) for the RAG use case.

As the embeddings model we use Nomic's [nomic-embed-text v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).
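
For orientation, here is a heavily condensed sketch of the retrieval part that the example implements (the full code is in `main.go` and `llm.go`). The two hard-coded documents stand in for the Wikipedia articles that the example reads from a JSON lines file, and the exact field and method names should be checked against the `chromem-go` version you use:

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"github.com/philippgille/chromem-go"
)

func main() {
	ctx := context.Background()

	// In-memory vector database.
	db := chromem.NewDB()

	// Embeddings are created by the locally running "nomic-embed-text" model via Ollama.
	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama("nomic-embed-text"))
	if err != nil {
		panic(err)
	}

	// Two placeholder documents standing in for the Wikipedia articles.
	docs := []chromem.Document{
		{ID: "1", Content: "Malleable Iron Range Company was a company that existed from 1896 to 1985 ..."},
		{ID: "2", Content: "The American Motor Car Company was a short-lived company founded in 1906 ..."},
	}
	// The embeddings are created concurrently, with one worker per CPU core.
	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
	if err != nil {
		panic(err)
	}

	// Retrieve the two most similar documents. The example then passes their content
	// to the LLM as context for answering the question.
	res, err := collection.Query(ctx, "When did the Monarch Company exist?", 2, nil, nil)
	if err != nil {
		panic(err)
	}
	for _, r := range res {
		fmt.Printf("Similarity %f: %q\n", r.Similarity, r.Content)
	}
}
```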

## How to run

1. Install Ollama: <https://ollama.com/download>
2. Download the two models (see the optional check after this list to verify they were pulled):
   - `ollama pull gemma:2b`
   - `ollama pull nomic-embed-text`
3. Run the example: `go run .`
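
If you want to confirm that Ollama is reachable and both models are available before running the example, a quick check (not part of the example) is to query Ollama's `/api/tags` endpoint, which lists the locally pulled models:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Ollama serves its HTTP API at http://localhost:11434 by default.
	res, err := http.Get("http://localhost:11434/api/tags")
	if err != nil {
		panic("Ollama doesn't seem to be running: " + err.Error())
	}
	defer res.Body.Close()

	// The response lists the locally pulled models; "gemma:2b" and
	// "nomic-embed-text" (possibly with a ":latest" suffix) should be among them.
	var tags struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.NewDecoder(res.Body).Decode(&tags); err != nil {
		panic(err)
	}
	for _, m := range tags.Models {
		fmt.Println(m.Name)
	}
}
```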

## Output

The output can differ slightly on each run, but it's along the lines of:

```log
2024/03/02 20:02:30 Warming up Ollama...
2024/03/02 20:02:33 Question: When did the Monarch Company exist?
2024/03/02 20:02:33 Asking LLM...
2024/03/02 20:02:34 Initial reply from the LLM: "I cannot provide information on the Monarch Company, as I am unable to access real-time or comprehensive knowledge sources."
2024/03/02 20:02:34 Setting up chromem-go...
2024/03/02 20:02:34 Reading JSON lines...
2024/03/02 20:02:34 Adding documents to chromem-go, including creating their embeddings via Ollama API...
2024/03/02 20:03:11 Querying chromem-go...
2024/03/02 20:03:11 Document 1 (similarity: 0.723627): "Malleable Iron Range Company was a company that existed from 1896 to 1985 and primarily produced kitchen ranges made of malleable iron but also produced a variety of other related products. The company's primary trademark was 'Monarch' and was colloquially often referred to as the Monarch Company or just Monarch."
2024/03/02 20:03:11 Document 2 (similarity: 0.550584): "The American Motor Car Company was a short-lived company in the automotive industry founded in 1906 lasting until 1913. It was based in Indianapolis Indiana United States. The American Motor Car Company pioneered the underslung design."
2024/03/02 20:03:11 Asking LLM with augmented question...
2024/03/02 20:03:32 Reply after augmenting the question with knowledge: "The Monarch Company existed from 1896 to 1985."
```

Most of the time is spent on creating the embeddings and on the conversation with the LLM, neither of which is part of `chromem-go`.

## OpenAI

You can easily adapt the code to work with OpenAI instead of running the models locally in Ollama.

Set your OpenAI API key in the environment variable `OPENAI_API_KEY`.

Then, if you want to create the embeddings via OpenAI, but still use Gemma 2B as the LLM:

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/main.go b/example/main.go
index 55b3076..cee9561 100644
--- a/example/main.go
+++ b/example/main.go
@@ -14,8 +14,6 @@ import (
 
 const (
 	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
 )
 
 func main() {
@@ -48,7 +46,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +80,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```

</details>

Or alternatively, if you want to use OpenAI for everything (embeddings creation and LLM):

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/llm.go b/example/llm.go
index 1fde4ec..7cb81cc 100644
--- a/example/llm.go
+++ b/example/llm.go
@@ -2,23 +2,13 @@ package main
 
 import (
 	"context"
-	"net/http"
+	"os"
 	"strings"
 	"text/template"
 
 	"github.com/sashabaranov/go-openai"
 )
 
-const (
-	// We use a local LLM running in Ollama for asking the question: https://github.com/ollama/ollama
-	ollamaBaseURL = "http://localhost:11434/v1"
-	// We use Google's Gemma (2B), a very small model that doesn't need much resources
-	// and is fast, but doesn't have much knowledge: https://huggingface.co/google/gemma-2b
-	// We found Gemma 2B to be superior to TinyLlama (1.1B), Stable LM 2 (1.6B)
-	// and Phi-2 (2.7B) for the retrieval augmented generation (RAG) use case.
-	llmModel = "gemma:2b"
-)
-
 // There are many different ways to provide the context to the LLM.
 // You can pass each context as user message, or the list as one user message,
 // or pass it in the system prompt. The system prompt itself also has a big impact
@@ -47,10 +37,7 @@ Don't mention the knowledge base, context or search results in your answer.
 
 func askLLM(ctx context.Context, contexts []string, question string) string {
 	// We can use the OpenAI client because Ollama is compatible with OpenAI's API.
-	openAIClient := openai.NewClientWithConfig(openai.ClientConfig{
-		BaseURL:    ollamaBaseURL,
-		HTTPClient: http.DefaultClient,
-	})
+	openAIClient := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
 	sb := &strings.Builder{}
 	err := systemPromptTpl.Execute(sb, contexts)
 	if err != nil {
@@ -66,7 +53,7 @@ func askLLM(ctx context.Context, contexts []string, question string) string {
 		},
 	}
 	res, err := openAIClient.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
-		Model:    llmModel,
+		Model:    openai.GPT3Dot5Turbo,
 		Messages: messages,
 	})
 	if err != nil {
diff --git a/example/main.go b/example/main.go
index 55b3076..044a246 100644
--- a/example/main.go
+++ b/example/main.go
@@ -12,19 +12,11 @@ import (
 	"github.com/philippgille/chromem-go"
 )
 
-const (
-	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
-)
+const question = "When did the Monarch Company exist?"
 
 func main() {
 	ctx := context.Background()
 
-	// Warm up Ollama, in case the model isn't loaded yet
-	log.Println("Warming up Ollama...")
-	_ = askLLM(ctx, nil, "Hello!")
-
 	// First we ask an LLM a fairly specific question that it likely won't know
 	// the answer to.
 	log.Println("Question: " + question)
@@ -48,7 +40,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +74,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```

</details>
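
In both patches, passing `nil` as the embedding function makes `chromem-go` fall back to its default, which uses OpenAI and reads the `OPENAI_API_KEY` environment variable. If you prefer to wire this up explicitly, something along the following lines should work; note that `NewEmbeddingFuncOpenAI`, the model constant and the helper function are assumptions about the chromem-go API, so check them against the version of the library you use:

```go
package main

import (
	"os"

	"github.com/philippgille/chromem-go"
)

// newOpenAICollection is a hypothetical helper that creates the collection with an
// explicitly configured OpenAI embedding function instead of relying on the default.
func newOpenAICollection(db *chromem.DB) (*chromem.Collection, error) {
	// Assumed constructor and model constant; verify against your chromem-go version.
	embeddingFunc := chromem.NewEmbeddingFuncOpenAI(os.Getenv("OPENAI_API_KEY"), chromem.EmbeddingModelOpenAI3Small)
	return db.GetOrCreateCollection("Wikipedia", nil, embeddingFunc)
}
```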