Docling

Docling converts PDF documents into rich JSON or Markdown quickly and reliably. That makes it a good fit for knowledge engineering projects, whether you are preparing high-quality training data for LLMs or supplying rich input to RAG pipelines.

Highlights

  • ⚡ Converts any PDF document to JSON or Markdown, reliably and fast
  • 📑 Understands detailed page layout, reading order and recovers table structures
  • 🔍 Includes OCR support for scanned PDFs
  • 🤖 Integrates easily with LLM app / RAG frameworks like 🦙 LlamaIndex and 🦜🔗 LangChain
  • 💻 Provides a simple and convenient CLI
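
As a rough illustration of how Markdown output can feed a RAG pipeline, the sketch below splits a Markdown string into one chunk per heading. It is a minimal example under stated assumptions, not part of Docling: the `Chunk` class and `chunk_markdown` function are hypothetical names, the sample string is made up, and the Markdown is assumed to have already been produced by a conversion step that is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for one retrievable piece of a document."""
    heading: str  # nearest Markdown heading above the text ("" if none)
    text: str     # body text that belongs to that heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading-delimited section.

    Limitation: lines starting with '#' inside fenced code blocks are
    also treated as headings in this simplified sketch.
    """
    chunks: list[Chunk] = []
    heading = ""
    body: list[str] = []

    def flush() -> None:
        # Store the section collected so far, skipping empty ones.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(heading, text))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


# Made-up sample standing in for converted Markdown output.
sample = "# Title\n\nIntro text.\n\n## Tables\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
chunks = chunk_markdown(sample)
for chunk in chunks:
    print(f"[{chunk.heading}] {len(chunk.text)} characters")
```

Each chunk keeps its heading, so a retriever can return the section title alongside the matched text. Frameworks such as LlamaIndex and LangChain ship their own splitters, which would normally replace a hand-written helper like this one.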

Project Goals

  • Advance document AI by enabling workflows that unlock knowledge extraction and exploration from documents.
  • Drive Open-Source Innovation by fostering a collaborative ecosystem around document AI and understanding.
  • Standardize data formats by aligning document-based datasets to a uniform format for common downstream consumption.