
From Layout to Logic: How Docling is Redefining Document AI  

Agata Ferretti
Michele Dolfi
Peter Star

Original Interview from January 2025

When you need document processing that is seamless, efficient, and entirely under your control, free of restrictions or reliance on external APIs, Docling is the answer. Docling is an open-source Python package designed to prepare documents for GenAI models with precision without compromising speed.

As an affiliated project of the AI Alliance, Docling started as a project within IBM Research and gained substantial traction. Since its release, Docling has earned more than 25,000 stars on GitHub.

To get an idea of what Docling is and where to start, watch our video.


So far, Docling offers: 
Local Processing – No cloud dependencies, ensuring data privacy 
High-Quality Output – Small yet powerful models for fast and accurate results 
Flexible Licensing – Designed to empower enterprise and community adoption 
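
To make the local-processing point concrete, here is a minimal sketch of converting a document to Markdown on your own machine. It assumes the package has been installed with `pip install docling`; the helper name `convert_to_markdown` is ours, chosen for illustration, and the import is deferred so the snippet loads even where Docling is not installed.

```python
import sys


def convert_to_markdown(source: str) -> str:
    """Convert a local file path or URL to Markdown with Docling.

    `convert_to_markdown` is an illustrative helper, not part of Docling.
    """
    # Deferred import: Docling is only required when the function is called.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # The converted document can be exported to Markdown for downstream GenAI use.
    return result.document.export_to_markdown()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python convert.py path/to/document.pdf
    print(convert_to_markdown(sys.argv[1]))
```

Run it with a path to a PDF or another supported file; the conversion itself happens locally rather than through an external document-processing API.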

In the spirit of collaboration and innovation, we’re excited to see AI Alliance member Hugging Face partnering with IBM Research to enhance Docling’s capabilities. Hugging Face recently introduced two lightweight Vision Language Models, SmolVLM-256M and SmolVLM-500M. These highly efficient, multimodal models offer powerful vision and text understanding in a compact format, making them ideal for document processing, image captioning, and visual reasoning—key areas where Docling excels. 

SmolDocling, built on Hugging Face’s SmolVLM, is a groundbreaking open-source document processing model that redefines document conversion. Packing 256M parameters into an end-to-end solution, it introduces DocTags, a universal markup format that not only captures content but also preserves structural information and spatial positioning.  
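
To illustrate what "content plus structure plus position" can look like in practice, the sketch below parses a simplified, DocTags-like string into typed elements with bounding boxes. The sample markup, the `Element` class, and `parse_doctags_like` are assumptions made for illustration only; the authoritative DocTags vocabulary and grammar are defined by the SmolDocling team and are richer than this (nested tables, code, formulas, and so on).

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    tag: str    # structural role, e.g. "text" or "section_header_level_1"
    bbox: tuple # (x1, y1, x2, y2) location tokens on the page grid
    text: str   # the element's textual content


# One flat element: <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag>
_ELEMENT = re.compile(
    r"<(?P<tag>[a-z_0-9]+)>"
    r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>"
    r"(?P<text>.*?)"
    r"</(?P=tag)>",
    re.DOTALL,
)

# Hypothetical sample, written by hand for this sketch.
SAMPLE = (
    "<doctag>"
    "<section_header_level_1><loc_50><loc_40><loc_450><loc_60>"
    "Quarterly Report</section_header_level_1>"
    "<text><loc_50><loc_80><loc_450><loc_120>Revenue grew in Q1.</text>"
    "</doctag>"
)


def parse_doctags_like(markup: str) -> list:
    """Extract flat elements (tag, bounding box, text) from simplified markup."""
    elements = []
    for match in _ELEMENT.finditer(markup):
        # Groups 2..5 are the four location tokens, in order.
        bbox = tuple(int(match.group(i)) for i in range(2, 6))
        elements.append(Element(match.group("tag"), bbox, match.group("text").strip()))
    return elements


if __name__ == "__main__":
    for element in parse_doctags_like(SAMPLE):
        print(element.tag, element.bbox, element.text)
```

Because each element carries its role and its coordinates alongside its text, a downstream tool can rebuild reading order or layout without going back to the page image.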

With its remarkable efficiency, SmolDocling accurately interprets and reproduces complex document features in a single step. These features include code listings, tables, equations, charts, lists, and layout segmentation, across diverse document types including business reports, academic papers, patents, and structured forms.

Notably, it competes with Vision Language Models up to 27 times larger while significantly reducing computational demands. A demo of SmolDocling's capabilities is also available. In addition, the SmolDocling team is contributing new, publicly sourced datasets for chart, table, equation, and code recognition to further advance open AI research. The model is already available, and the datasets will be released soon.

By March 27, 2025, SmolDocling became the #1 trending model on Hugging Face—one of the few small Vision Language Models (VLMs) to achieve this milestone! This success, driven by the AI Alliance mission, underscores the power of multi-stakeholder collaboration in developing and deploying open foundation models. 

To continue this momentum, we invite the broader community of users and developers to contribute to Docling and SmolDocling by building new features, creating plug-in extensions, and enabling seamless integrations. Together, we can push the boundaries of open AI innovation! 

There are three key ways to get involved:

  1. Create New Datasets – Contribute datasets for training or evaluation. We encourage you to use Docling to curate and share open datasets that can enrich the AI Alliance Open Trusted Data catalog and benefit the open-source community. We are particularly interested in high-quality, domain-specific, and reasoning datasets that support model tuning, application grounding, and enrichment. 
  2. Develop Advanced Applications – Use Docling in increasingly sophisticated application scenarios across domains such as legal, materials science, semiconductors, and finance. Help expand its functionality by building new features and integration patterns. Join our Application and Tools Working Group in Focus Area 3.
  3. Enhance SmolDocling Models – Contribute by developing new SmolDocling models tailored to your needs. Stay tuned for updates on open-source fine-tuning efforts! 


Are you working on an open AI-driven project that could use a bigger stage? The AI Alliance is committed to helping you scale through visibility, collaboration, and support. 

Join the movement. Build. Contribute. Collaborate. Innovate.  

