Using Clojure for RAG with Ollama and Pyjama

Retrieval-Augmented Generation (RAG) is a powerful technique that combines retrieval-based search with generative AI models. This approach ensures that responses are based on relevant context rather than solely on the model’s pretrained knowledge. In this blog post, we’ll explore a simple RAG setup in Clojure using Ollama and Pyjama.

Setting Up the Environment

The script starts by defining the URL of the Ollama server, which serves the embedding and generative models:

(def url (or (System/getenv "OLLAMA_URL")
             "http://localhost:11432"))

If the OLLAMA_URL environment variable is set, it will use that; otherwise, it defaults to http://localhost:11432.

Next, we define the embedding model:

(def embedding-model "granite-embedding")

The embedding model converts textual data into vector representations that can be efficiently compared to find similarities.

Loading the Source of Truth

Our RAG setup requires a “source of truth,” which serves as the knowledge base. This is loaded from a file:

(def source-of-truth
  (pyjama.utils/load-lines-of-file "test/morning/source_of_truth.txt"))

The load-lines-of-file function reads the file line by line and returns the lines, which are bound to source-of-truth.
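To make the setup concrete, here is a rough stand-in for what a line-loading helper like this does, written with only clojure.string. The real pyjama.utils/load-lines-of-file may differ in details (encoding, blank-line handling); the blank-line filtering here is an assumption:

```clojure
(require '[clojure.string :as str])

(defn load-lines
  "Reads a text file and returns its non-blank lines as a vector of strings.
  A sketch of what a load-lines-of-file helper typically does."
  [path]
  (->> (slurp path)          ; read the whole file as one string
       str/split-lines       ; split on line breaks
       (remove str/blank?)   ; drop empty lines (assumed behavior)
       vec))
```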

Generating Embeddings

We generate embeddings for our source documents using the generate-vectorz-documents function:

(let [embed-config {:url             url
                    :chunk-size      500
                    :documents       source-of-truth
                    :embedding-model embedding-model}
      documents (pyjama.embeddings/generate-vectorz-documents embed-config)

  • chunk-size: The maximum number of tokens per chunk.
  • documents: The source documents loaded earlier.
  • embedding-model: The model responsible for embedding generation.

This function transforms the documents into vector representations that can later be used for similarity searches.

Setting Up the Query

We then define a question and a pre-formatted prompt template:

        question "why ice floats?"

        pre "Context: \n\n
        %s.
        \n\n
        Answer the question:
        %s
        using no previous knowledge and ONLY knowledge from the context."

The %s placeholders allow us to insert the retrieved context and the actual question dynamically.
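Since pre is an ordinary format string, filling it in is just a call to clojure.core/format: first placeholder gets the retrieved context, second gets the question. A quick illustration, where the context string is a stand-in for whatever retrieval returns:

```clojure
(def pre "Context: \n\n%s.\n\nAnswer the question:\n%s\nusing no previous knowledge and ONLY knowledge from the context.")

;; Fill the template: context first, then the question.
(def prompt
  (format pre
          "Ice is less dense than liquid water."  ; stand-in retrieved context
          "why ice floats?"))
```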

Configuring the RAG Pipeline

We configure the RAG pipeline with additional settings:

        config (assoc embed-config
                 :question question
                 :strategy :euclidean
                 :top-n 1

                 :model "llama3.1"
                 :pre pre)
  • :strategy :euclidean: Uses Euclidean distance to find the closest matching documents.
  • :top-n 1: Retrieves the single most relevant document.
  • :model "llama3.1": Specifies the generative model to use.
  • :pre pre: The pre-formatted prompt template, which keeps the model within the given context.
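Under the :euclidean strategy, "closest" means the smallest straight-line distance between embedding vectors. The library's actual implementation uses vectorz internally, but the distance itself is simple enough to sketch in plain Clojure:

```clojure
(defn euclidean-distance
  "Straight-line distance between two equal-length numeric vectors.
  Smaller means more similar embeddings."
  [a b]
  (Math/sqrt
    (reduce + (map (fn [x y]
                     (let [d (- x y)]
                       (* d d)))
                   a b))))

;; Classic 3-4-5 triangle:
(euclidean-distance [0.0 0.0] [3.0 4.0]) ;=> 5.0
```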

Running the RAG Pipeline

Finally, we execute the RAG process by retrieving relevant documents and generating an answer:

    (println
      (pyjama.embeddings/simple-rag config documents)))

This step performs the following:

  1. Finds the most relevant document using embeddings and similarity search.
  2. Constructs a prompt using the retrieved context.
  3. Generates a response using the llama3.1 model.
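Step 1 above amounts to ranking chunks by their distance to the question's embedding and keeping the top n. A toy sketch of that ranking with hand-made 2-D "embeddings" (real embeddings have hundreds of dimensions, and simple-rag's internals may differ):

```clojure
(defn euclidean
  "Straight-line distance between two equal-length vectors."
  [a b]
  (Math/sqrt (reduce + (map #(let [d (- %1 %2)] (* d d)) a b))))

(defn top-n-chunks
  "Returns the text of the n chunks whose embeddings are closest to query-vec."
  [n query-vec chunks]
  (->> chunks
       (sort-by #(euclidean query-vec (:embedding %)))  ; nearest first
       (take n)
       (map :text)))

(top-n-chunks 1
              [0.9 0.1]  ; pretend embedding of the question
              [{:text "Ice is less dense than water." :embedding [0.8 0.2]}
               {:text "The sun is hot."               :embedding [0.1 0.9]}])
;; → ("Ice is less dense than water.")
```

With :top-n 1, only that single chunk is spliced into the prompt template before generation.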

Sample Output

When asked, “Why does ice float?”, the response might be:

According to the context, ice floats because it’s cold. The reason it wants to get warm is so that it can be nearer to the sun…

While the response might need refinement, this showcases how retrieval-based augmentation influences the generated answers.

Conclusion

This setup provides a simple but effective way to implement RAG in Clojure using Ollama and Pyjama. By leveraging embeddings for document retrieval and feeding them into a generative model, we ensure that responses are grounded in relevant knowledge.

This method can be extended by:

  • Expanding the knowledge base.
  • Using different similarity strategies.
  • Tuning prompt engineering for better responses.

Full code listing on GitHub.