Using Clojure for RAG with Ollama and Pyjama
Retrieval-Augmented Generation (RAG) is a powerful technique that combines retrieval-based search with generative AI models, grounding responses in relevant retrieved context rather than relying solely on the model’s pretrained knowledge. In this blog post, we’ll walk through a simple RAG setup in Clojure using Ollama and Pyjama.

Setting Up the Environment
The script starts by defining the URL of the Ollama server, which serves the embedding and generative models:
(def url (or (System/getenv "OLLAMA_URL")
             "http://localhost:11434"))

If the OLLAMA_URL environment variable is set, it is used; otherwise, the URL defaults to http://localhost:11434, Ollama's standard port.
Next, we define the embedding model:
(def embedding-model "granite-embedding")
The embedding model converts textual data into vector representations that can be efficiently compared to find similarities.
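To build some intuition, here is a toy sketch of how two embedding vectors can be compared with Euclidean distance, the strategy used later in this post (real embeddings have hundreds of dimensions, not two):

(defn euclidean [a b]
  (Math/sqrt (reduce + (map (fn [x y]
                              (let [d (- x y)] (* d d)))
                            a b))))

;; Vectors of semantically similar texts end up close together:
(euclidean [0.1 0.9] [0.2 0.8]) ;; => ~0.14 (close)
(euclidean [0.1 0.9] [0.9 0.1]) ;; => ~1.13 (far)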
Loading the Source of Truth
Our RAG setup requires a “source of truth,” which serves as the knowledge base. This is loaded from a file:
(def source-of-truth
  (pyjama.utils/load-lines-of-file "test/morning/source_of_truth.txt"))

The load-lines-of-file function reads the file line by line and stores the result in source-of-truth.
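Under the hood this is plain file I/O. A minimal sketch of what such a helper might do (the actual pyjama.utils implementation may differ):

(require '[clojure.string :as str])

;; Hypothetical re-implementation, for illustration only:
(defn load-lines-of-file [path]
  (->> (slurp path)           ; read the whole file as one string
       str/split-lines        ; split it into lines
       (remove str/blank?)    ; drop empty lines
       vec))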
Generating Embeddings
We generate embeddings for our source documents using the generate-vectorz-documents function:
(let [embed-config {:url url
                    :chunk-size 500
                    :documents source-of-truth
                    :embedding-model embedding-model}
      documents (pyjama.embeddings/generate-vectorz-documents embed-config)
In this configuration:
- :chunk-size: Defines the maximum number of tokens per chunk.
- :documents: The source documents loaded earlier.
- :embedding-model: The model responsible for embedding generation.
This function transforms the documents into vector representations that can later be used for similarity searches.
Setting Up the Query
We then define a question and a pre-formatted prompt template:
question "why ice floats?"
pre "Context: \n\n
%s.
\n\n
Answer the question:
%s
using no previous knowledge and ONLY knowledge from the context."
The %s placeholders allow us to insert the retrieved context and the actual question dynamically.
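Whether Pyjama uses clojure.core/format internally is an assumption, but conceptually the final prompt is assembled like this (the context string below is invented for illustration):

(format pre
        "Ice is less dense than liquid water."  ; retrieved context (made up here)
        question)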
Configuring the RAG Pipeline
We configure the RAG pipeline with additional settings:
config (assoc embed-config
              :question question
              :strategy :euclidean
              :top-n 1
              :model "llama3.1"
              :pre pre)
- :strategy :euclidean: Uses Euclidean distance to find the closest matching documents.
- :top-n 1: Retrieves the single most relevant document.
- :model "llama3.1": Specifies the generative model to use.
- :pre pre: Assigns the pre-formatted prompt so the model stays within the given context.
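Conceptually, the retrieval step boils down to sorting the chunks by their distance to the question's embedding and keeping the closest ones. A minimal sketch, assuming each document is a map holding an :embedding vector (the actual Pyjama internals may differ):

;; Same Euclidean helper as in the earlier sketch:
(defn euclidean [a b]
  (Math/sqrt (reduce + (map (fn [x y]
                              (let [d (- x y)] (* d d)))
                            a b))))

;; Keep the n chunks whose embeddings are closest to the question's.
(defn top-n-chunks [question-embedding documents n]
  (->> documents
       (sort-by #(euclidean question-embedding (:embedding %)))
       (take n)))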
Running the RAG Pipeline
Finally, we execute the RAG process by retrieving relevant documents and generating an answer:
(println
  (pyjama.embeddings/simple-rag config documents)))
This step performs the following:
- Finds the most relevant document using embeddings and similarity search.
- Constructs a prompt using the retrieved context.
- Generates a response using the llama3.1 model.
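Putting the fragments above together, the whole script reads roughly as follows (assembled from the snippets in this post; the namespace name is invented, and the full listing linked at the end is authoritative):

(ns morning.rag-demo  ; hypothetical namespace, for illustration
  (:require [pyjama.embeddings]
            [pyjama.utils]))

(def url (or (System/getenv "OLLAMA_URL")
             "http://localhost:11434"))

(def embedding-model "granite-embedding")

(def source-of-truth
  (pyjama.utils/load-lines-of-file "test/morning/source_of_truth.txt"))

(let [embed-config {:url url
                    :chunk-size 500
                    :documents source-of-truth
                    :embedding-model embedding-model}
      documents (pyjama.embeddings/generate-vectorz-documents embed-config)
      question "why ice floats?"
      pre "Context: \n\n%s.\n\nAnswer the question:\n%s\nusing no previous knowledge and ONLY knowledge from the context."
      config (assoc embed-config
                    :question question
                    :strategy :euclidean
                    :top-n 1
                    :model "llama3.1"
                    :pre pre)]
  (println
    (pyjama.embeddings/simple-rag config documents)))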
Sample Output
When asked, “Why does ice float?”, the response might be:
According to the context, ice floats because it’s cold. The reason it wants to get warm is so that it can be nearer to the sun…
While the response might need refinement, this showcases how retrieval-based augmentation influences the generated answers.
Conclusion
This setup provides a simple but effective way to implement RAG in Clojure using Ollama and Pyjama. By leveraging embeddings for document retrieval and feeding them into a generative model, we ensure that responses are grounded in relevant knowledge.
This method can be extended by:
- Expanding the knowledge base.
- Using different similarity strategies.
- Tuning prompt engineering for better responses.
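For example, retrieving more context and softening the prompt is a small change to the configuration (assuming simple-rag joins several retrieved chunks into the single context placeholder; check the library before relying on this):

(pyjama.embeddings/simple-rag
  (assoc config
         :top-n 3  ; feed the three closest chunks to the model
         :pre "Context:\n\n%s\n\nUsing ONLY the context above, answer:\n%s")
  documents)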
The full code listing is on GitHub.