Transparently use multiple Ollama servers

Just a code summary of a crazy week.

At this point, Pyjama can transparently load-balance requests across different Ollama servers via pyjama.parallel/generate.

Basically, that gives:

(->>
    {:url "http://localhost:11432,http://localhost:11434"
     :models  ["llama3.1"]
     :format  {:type "integer"}
     :pre     "This is a potential answer %s02 for this question: %s01. Give a score to the answer on a scale 1 to 100: based on how accurate it.
       - Do not give an answer yourself.
       - No comment.
       - No explanation.
       - No extra text. "
     :prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
               ["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
               ["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most. "]
               ["Why is the sky blue" "Because it is Christmas. "]
               ]}
    (pyjama.parallel/generate)
    (map (juxt :prompt :result :url))
    (clojure.pprint/pprint))

And yes, the prompts are dispatched to the different Ollama servers, as can be seen in the URLs in the answers, which show where each request was actually executed. The set of URLs can also be set at runtime via the OLLAMA_URL environment variable.
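
If you prefer to read that variable yourself rather than hard-code the server list, a small sketch (not necessarily how Pyjama resolves it internally) looks like this:

;; Assumption: OLLAMA_URL holds a comma-separated list of servers,
;; exactly like the :url string in the example above.
(def ollama-urls
  (or (System/getenv "OLLAMA_URL")
      "http://localhost:11434"))

;; then pass it along: {:url ollama-urls :models ["llama3.1"] ...}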

Using Schemas with Ollama

Ollama added support for JSON schema-based output a while ago, so you can constrain what comes back from the model to a nice, predictable structure.

I found the quality of the answers to be slightly lower, but with the added advantage of a well-formatted response.

Pyjama has full support for creating Ollama functions based on a schema, which means they are called as plain Clojure functions while transparently making requests to your running Ollama model.
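
As a rough sketch, and reusing pyjama.parallel/generate from the example above, such a function can look like the following. The JSON-schema map in :format and the rate-answer wrapper are illustrative assumptions here, not necessarily Pyjama's dedicated schema-to-function API:

(defn rate-answer
  "Illustrative wrapper: called like any Clojure function, but internally
   it sends a request to the running Ollama model and returns the
   schema-constrained result."
  [question answer]
  (->> {:url     "http://localhost:11434"
        :models  ["llama3.1"]
        ;; assumption: the :format map is passed through to Ollama's
        ;; structured-output (JSON schema) parameter
        :format  {:type       "object"
                  :properties {:score  {:type "integer"}
                               :reason {:type "string"}}
                  :required   ["score" "reason"]}
        :pre     "This is a potential answer %s02 for this question: %s01. Score the answer from 1 to 100 and explain why."
        :prompts [[question answer]]}
       (pyjama.parallel/generate)
       (first)
       (:result)))

(rate-answer "Why is the sky blue"
             "The sky appears blue because of Rayleigh scattering.")
;; => something like {"score" 95, "reason" "..."}, depending on the model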

LlamaParse from Clojure

Who is LlamaParse?

Sure! Here’s a fun intro to LlamaParse:

Imagine you have a mountain of documents—PDFs, messy scans, or text-heavy reports—and you need to extract useful information fast. Instead of manually copying and pasting like a tired intern, meet LlamaParse, your AI-powered document whisperer!

LlamaParse takes complex, unstructured documents and turns them into clean, structured data, ready for analysis or automation. Whether you’re dealing with legal contracts, research papers, or financial reports, this Llama doesn’t spit—it delivers.
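
To actually drive it from Clojure, here is a rough sketch. It assumes clj-http and cheshire are on the classpath, an API key in the LLAMA_CLOUD_API_KEY environment variable, and the upload/job/result flow of the LlamaParse cloud API; treat the exact endpoint paths and field names as assumptions and check the current LlamaParse docs:

(require '[clj-http.client :as http]
         '[clojure.java.io :as io])

(def api-key (System/getenv "LLAMA_CLOUD_API_KEY"))
(def base "https://api.cloud.llamaindex.ai/api/parsing")

(defn parse-document
  "Upload a document, poll the parsing job, and return the markdown result.
   Endpoint paths and response fields are assumptions, not a verified API."
  [path]
  (let [headers {"Authorization" (str "Bearer " api-key)}
        job     (-> (http/post (str base "/upload")
                               {:headers   headers
                                :multipart [{:name "file" :content (io/file path)}]
                                :as        :json})
                    :body)]
    (loop []
      (let [status (-> (http/get (str base "/job/" (:id job))
                                 {:headers headers :as :json})
                       :body :status)]
        (if (= status "SUCCESS")
          (-> (http/get (str base "/job/" (:id job) "/result/markdown")
                        {:headers headers :as :json})
              :body :markdown)
          (do (Thread/sleep 2000) (recur)))))))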

Ollama on Raspberry Pi

As part of getting into SLMs (small language models) for small AI devices, we are going to look at inference speed on the Raspberry Pi. Six models are of particular interest:

  • tinyllama
  • tinydolphin
  • phi3
  • smallthinker
  • granite3.1-moe
  • llama3.2:1b

Each model has a small number of parameters, to make sure we get usable speeds for inference.

We will get speed figures on simple inference for each of them.
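
A minimal sketch of how those figures can be collected, assuming Ollama is running locally on the Pi and that clj-http and cheshire are on the classpath; it derives tokens per second from the eval_count and eval_duration fields returned by Ollama's /api/generate:

(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(def models ["tinyllama" "tinydolphin" "phi3"
             "smallthinker" "granite3.1-moe" "llama3.2:1b"])

(defn tokens-per-second [model]
  (let [resp (-> (http/post "http://localhost:11434/api/generate"
                            {:body         (json/generate-string
                                             {:model  model
                                              :prompt "Why is the sky blue?"
                                              :stream false})
                             :content-type :json
                             :as           :json})
                 :body)]
    ;; eval_duration is reported in nanoseconds
    (double (/ (:eval_count resp)
               (/ (:eval_duration resp) 1e9)))))

(doseq [m models]
  (println m "->" (format "%.1f tokens/s" (tokens-per-second m))))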

In a previous post, we installed Arch Linux on the Raspberry Pi (make sure to have a look at Running Arch on Pi).

Arch Linux on Raspberry Pi 5: Back from the trenches

I have some software packages on the Arch User Repository (AUR), making it easy to deploy on various devices. It used to be a bit tricky to install Arch on the Raspberry Pi, but now it is possible to get to a running system quite fast, so here are notes on how to get going and get your Pi running with Arch.

Download the ISO

First of all, we need the ISO to install from. The ISO/zip I downloaded was from rasparch on SourceForge. I burned the file directly to the SD card using a standard tool, Etcher. I inserted the SD card into the Pi, plugged in the power, and the small beast started booting …