Llama Parse from Clojure

Author included in AI

2025-02-05 500 words 3 minutes

Who is Llama Parse?

Sure! Here’s a fun intro to LlamaParse:

Imagine you have a mountain of documents—PDFs, messy scans, or text-heavy reports—and you need to extract useful information fast. Instead of manually copying and pasting like a tired intern, meet LlamaParse, your AI-powered document whisperer!

LlamaParse takes complex, unstructured documents and turns them into clean, structured data, ready for analysis or automation. Whether you’re dealing with legal contracts, research papers, or financial reports, this Llama doesn’t spit—it delivers.

With support for OCR, tables, and multi-page documents, LlamaParse is like having a supercharged reading assistant that never gets bored. So, if you’re drowning in text, let the Llama do the heavy lifting! 🦙🚀

Llama Parse from Clojure

While working on a different AI project, I was wondering whether there was a way to use Llama Parse:

outside the llamaindex framework
outside the atrocious python world

And .. yes.

Basically, the simple workflow is:

upload a document with options
you get a job id in response, while Llama Parse processes the document
you check the status of that job id from time to time
you retrieve the parsed document in different formats: markdown, text (and pdfs, images, sounds…)

That’s it.
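The steps above can be sketched as one small function. This is only an illustration of the upload, poll, fetch loop, not the library's code: upload!, status and fetch are hypothetical functions passed in by the caller, and "PENDING" is a placeholder for whatever non-final status the service reports.

```clojure
(defn parse-workflow
  "Uploads a document, polls the job status until SUCCESS, then fetches markdown.
   upload!, status and fetch are hypothetical, caller-supplied functions."
  [{:keys [upload! status fetch interval-ms max-attempts]
    :or   {interval-ms 1000 max-attempts 60}}
   document options]
  (let [job-id (:id (upload! document options))]
    (loop [attempt 1]
      (let [s (status job-id)]
        (cond
          ;; parsing finished: retrieve the result
          (= "SUCCESS" s) (fetch job-id :markdown)
          ;; do not poll forever
          (>= attempt max-attempts)
          (throw (ex-info "Gave up waiting for the parsing job"
                          {:job-id job-id :last-status s}))
          ;; otherwise wait a bit and check again
          :else (do (Thread/sleep (long interval-ms))
                    (recur (inc attempt))))))))

;; Stubbed usage: the fake job reports PENDING twice, then SUCCESS.
(def calls (atom 0))
(def result
  (parse-workflow
    {:upload!     (fn [_ _] {:id "job-1"})
     :status      (fn [_] (if (< (swap! calls inc) 3) "PENDING" "SUCCESS"))
     :fetch       (fn [id fmt] (str "# parsed " id " as " (name fmt)))
     :interval-ms 10}
    "report.pdf" {:language "en"}))
```

Passing the three functions in keeps the sketch independent of any HTTP client, and the max-attempts guard avoids waiting forever on a job that never completes.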

This post is mostly a commented copy-paste of the test code from GitHub.

Prepare the API token

For all of the code to work, you need to create an account and then an API key on LlamaCloud.

Once this is done, set it as an environment variable:

export LLAMA_CLOUD_API_KEY=...

And then, ready to go.
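Before making the first call, it can help to check from the REPL that the variable is actually visible to the JVM. The helper below is a hypothetical sanity check, not part of the library; how the library itself reads the key is not shown here.

```clojure
(require '[clojure.string :as str])

(defn api-key
  "Returns the value of LLAMA_CLOUD_API_KEY, or throws if it is missing or blank.
   getenv is injectable so the check can be tried without touching the real environment."
  ([] (api-key (fn [^String n] (System/getenv n))))
  ([getenv]
   (let [k (getenv "LLAMA_CLOUD_API_KEY")]
     (if (str/blank? k)
       (throw (ex-info "LLAMA_CLOUD_API_KEY is not set" {}))
       k))))

;; With a fake environment:
(api-key {"LLAMA_CLOUD_API_KEY" "llx-example"})
```

Remember that a JVM started before the export will not see the variable, so restart the REPL after setting it.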

Upload a file to Llama Parse

Uploading a document is done with the parse-file function:

(parse-file
        ; <file-path-or_url>
        ; <options>
)

Instead of a file path, you can also pass in a URL directly.

There are many options; they match the ones in the Llama Parse documentation, written the Clojure way as kebab-case keywords.

{ :language                       "ja"
  :premium-mode                   true
  :skip-diagonal-text             true
  :spreadsheet-extract-sub-tables true
  :use-vendor-multimodal-model    true
  :vendor-multimodal-model-name   "anthropic-sonnet-3.5"
  :job-timeout-in-seconds         300
  ...
}
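To find the matching entry in the Llama Parse documentation, the keyword names can be translated back mechanically. The sketch below assumes the documented parameter names are the snake_case equivalents of these keywords; option->param and options->params are hypothetical helpers, and the library's own conversion may differ.

```clojure
(require '[clojure.string :as str])

(defn option->param
  "Turns a kebab-case keyword into a snake_case string (assumed naming of the API parameters)."
  [k]
  (str/replace (name k) "-" "_"))

(defn options->params
  "Applies option->param to every key of an options map, leaving values untouched."
  [options]
  (into {} (map (fn [[k v]] [(option->param k) v])) options))

(options->params {:premium-mode true :skip-diagonal-text true :language "ja"})
```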

So, a regular call to process a file remotely accessible would be:

(parse-file
        "https://www.toyota.com/content/dam/toyota/brochures/pdf/2025/gr86_ebrochure.pdf"
        {:language                       "en"})
; (:id "6ff7e56e-7809-4be8-8e33-d93a81ef69f1" ..)

Which returns an id for the processing job.

The job will also show in the processing queue in your account.

Check the status of the job

(-> "6ff7e56e-7809-4be8-8e33-d93a81ef69f1"
      get-job-status)
;{:id dc37ec98-ba31-460c-a39b-6f6d4c3b7095, :status SUCCESS}

The status will be set to SUCCESS once the parsing of the document has finished, usually pretty fast.

See the parsing result

(-> "6ff7e56e-7809-4be8-8e33-d93a81ef69f1"
      (get-parsing-result :markdown))
; ...

Other formats include :text, … though I have not tested them much.

By default, :markdown is used, and you get the markdown text back directly.

Download directly to a markdown file

I added a convenience function that waits for the job to finish and saves the parsed content to a markdown file.

(wait-and-download "6ff7e56e-7809-4be8-8e33-d93a81ef69f1" "toyota.md")

Upload, Wait, and Download

In case you want to avoid headaches and let the Clojure code do everything for you:

(llama-parser
    "https://www.toyota.com/content/dam/toyota/brochures/pdf/2025/gr86_ebrochure.pdf"
    {}
    ".")
; file will be saved in : ./gr86_ebrochure.pdf.md
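
The output name in the comment above looks like the last path segment of the source with .md appended, placed in the target directory. A minimal sketch of that naming rule, inferred only from this one example (output-path is a hypothetical helper, and query strings in the URL are not handled):

```clojure
(require '[clojure.string :as str])

(defn output-path
  "Builds <dir>/<last path segment of source>.md, mirroring the file name seen above."
  [source dir]
  (let [file-name (last (str/split source #"/"))]
    (str dir "/" file-name ".md")))

(output-path
  "https://www.toyota.com/content/dam/toyota/brochures/pdf/2025/gr86_ebrochure.pdf"
  ".")
;; => "./gr86_ebrochure.pdf.md"
```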

And the result is pretty impressive, especially on tables inside the PDF.

Now you know everything about the latest GR86, and … maybe your AI does too?