Transparently use multiple Ollama servers
Just a code summary of a crazy week.
Pyjama can now transparently load-balance requests across several Ollama servers via pyjama.parallel/generate.
In practice, it looks like this:
(require '[pyjama.parallel]
         '[clojure.pprint])

(->>
 {:url     "http://localhost:11432,http://localhost:11434"
  :models  ["llama3.1"]
  :format  {:type "integer"}
  :pre     "This is a potential answer %s02 for this question: %s01. Give a score to the answer on a scale from 1 to 100, based on how accurate it is.
- Do not give an answer yourself.
- No comment.
- No explanation.
- No extra text."
  :prompts [["Why is the sky blue" "The sky appears blue because of a process called Rayleigh scattering."]
            ["Why is the sky blue" "Blue is scattered more than other colors because it travels as shorter, smaller waves."]
            ["Why is the sky blue" "During the day the sky looks blue because it's the blue light that gets scattered the most."]
            ["Why is the sky blue" "Because it is Christmas."]]}
 (pyjama.parallel/generate)
 ;; keep only the prompt, the score and the server that handled each request
 (map (juxt :prompt :result :url))
 (clojure.pprint/pprint))
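In the :pre template, %s01 and %s02 seem to stand for the first and second element of each prompt vector, that is, the question and the candidate answer. Here is a minimal sketch of that kind of positional substitution, assuming this interpretation; fill-template is a hypothetical helper written for illustration, not Pyjama's actual implementation.

```clojure
(ns example.template
  (:require [clojure.string :as str]))

;; Hypothetical helper (not Pyjama's implementation):
;; replaces every %sNN placeholder with the NN-th (1-based) element of args.
(defn fill-template [template args]
  (str/replace template
               #"%s(\d{2})"
               ;; with a capture group, the match arrives as [whole-match digits]
               (fn [[_ n]]
                 (str (nth args (dec (Long/parseLong n)))))))
```

For instance, (fill-template "Q: %s01 A: %s02" ["Why is the sky blue" "Rayleigh scattering."]) returns "Q: Why is the sky blue A: Rayleigh scattering.". Because clojure.string/replace quotes the value returned by the function, prompts containing characters such as $ are inserted literally.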
The prompts are indeed dispatched to the different Ollama servers: the URL returned with each result shows where the request was actually executed. The list of URLs can also be supplied at runtime through the OLLAMA_URL environment variable.
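Conceptually, the dispatch boils down to splitting the comma-separated URL list and spreading the prompts over the servers. Below is a minimal sketch of that idea, assuming that an explicit :url option takes precedence over OLLAMA_URL and using a plain round-robin policy; parse-urls, resolve-urls and assign-round-robin are hypothetical helpers, not part of Pyjama's API.

```clojure
(ns example.dispatch
  (:require [clojure.string :as str]))

;; Hypothetical helpers illustrating round-robin dispatch.
;; This is not Pyjama's actual scheduler.

(defn parse-urls
  "Splits a comma-separated URL string, trimming whitespace and dropping blanks."
  [s]
  (->> (str/split (or s "") #",")
       (map str/trim)
       (remove str/blank?)
       vec))

(defn resolve-urls
  "Assumed precedence: an explicit :url option wins, otherwise OLLAMA_URL is read."
  [opts]
  (parse-urls (or (:url opts) (System/getenv "OLLAMA_URL"))))

(defn assign-round-robin
  "Pairs each prompt with a server URL, cycling through the URLs in order."
  [urls prompts]
  (map (fn [prompt url] {:prompt prompt :url url})
       prompts
       (cycle urls)))
```

With (resolve-urls {:url "http://localhost:11432,http://localhost:11434"}), four prompts would be assigned to 11432, 11434, 11432, 11434. Round-robin is only the simplest deterministic policy; a dispatcher that actually runs requests in parallel would more likely hand each prompt to whichever server becomes free first.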