
Defining Open Source AI: The Road Ahead

News
The AI Alliance

The AI Alliance is a global coalition of AI builders and users who believe that open science and open development are the best way to bring trustworthy and beneficial AI to the world.

Open source is an essential part of the AI ecosystem: highly capable open source foundation models, tools to tune and adapt them, and an increasing amount of open community innovation and governance are enabling safer, more accurate, more valuable AI.

Evidence backs this up. According to one survey, 14 million developers (47% of all developers globally) are using open source AI models. What are they doing? Thousands of individuals across the world are working on better support for their own native languages, better ways to ensure AI output is trusted and safe, and new ways to use AI to solve problems most of us don’t even know about. The Llama family of models, for example, has been downloaded over 1 billion times to date,¹ and there are now tens of thousands of community-created variants of the Llama open model family on Hugging Face alone.²

Open communities of AI researchers and developers, such as BigScience, BigCode, and the Allen Institute for AI, have demonstrated the power of open science in pioneering advancements in AI while simultaneously cultivating downstream value for users. ServiceNow, for example, has reported that its implementation and use of BigCode’s StarCoder models led to a 52% improvement in speed of development and innovation in real-world use cases such as text-to-code and text-to-workflow.³ IBM’s Granite family of open foundation models and Red Hat’s InstructLab project for community collaborative model tuning have focused on enterprise use cases and shown how important openness and transparency are to building trust in enterprise adoption of AI.

There is a vigorous debate about the exact definition of Open Source AI. We think this debate is important and needs time to play out. Let's start practically, focus on the essentials, and acknowledge that definitions will continue to evolve.

Openness is first and foremost a practical approach to making AI more capable and trusted. Let’s make sure the ongoing debates about the precise definition of Open Source AI are enablers of progress, not blockers.

Open Source Foundation Models: A Practical Baseline

A pre-trained Open Source foundation model should reasonably enable a person with a technical background to use, modify, study, and share the model. The artifacts typically needed to do this include the pre-trained model weights and the source code for the model architecture used to run inference, released under a permissive license and accompanied by technical facts and documentation (often called a model card). We think this is a useful baseline, a working definition that works in practice, and we can refine it over time as the ecosystem evolves.
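To make the baseline concrete, here is a minimal sketch of how these artifacts typically appear as files in a model repository. The file names follow common Hugging Face conventions but are illustrative assumptions on our part, not part of any formal definition:

```python
# Typical artifacts in an open-weights foundation model release.
# File names follow common Hugging Face conventions and are
# illustrative assumptions, not a formal standard.
BASELINE_ARTIFACTS = {
    "model.safetensors": "pre-trained model weights",
    "config.json": "model architecture and hyperparameters",
    "tokenizer.json": "tokenizer used at training time",
    "README.md": "model card: technical facts, license, intended use",
    "LICENSE": "text of the permissive license",
}

def missing_baseline_artifacts(release_files):
    """Return which baseline artifacts are absent from a release's file list."""
    return sorted(set(BASELINE_ARTIFACTS) - set(release_files))
```

A release that supplies all of these files gives a technically capable person what they need to run inference, study the architecture, and redistribute adaptations.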

Models with at least these components available are the heart of AI innovation today and have been the basis of hundreds of thousands of forks, adaptations, and deployed use cases, bringing practical capability to users globally. Examples of such models include the Llama family of models from Meta, the Granite family of models from IBM, Arctic from Snowflake, Falcon from the Technology Innovation Institute in Abu Dhabi, MPT from MosaicML (now part of Databricks), DeepSeek-R1 from DeepSeek-AI, StarCoder from BigCode, and OLMo from the Allen Institute for AI.

Beyond Open Source: Enabling Open Science

Many of the model families listed above release even more information, including research and evaluation results and technical artifacts from the model development process. While not strictly required to use, adapt, modify, and share a model, openly releasing data sets, training code, data pipelines, interim checkpoints, evaluation code, and research and evaluation results can be especially helpful to researchers who are working to understand and develop more capable, safer, and more trusted AI models, agents, and systems. Like-minded open communities such as NumFOCUS also contribute to the Open Science movement by promoting open practices, supporting sustainable software development, and facilitating a collaborative scientific community.

We encourage model developers, including our own AI Alliance research projects, to release as much as possible. The AI Alliance is also prioritizing large-scale community resources, such as our Open Trusted Data Initiative, that can help model builders and application developers everywhere.

More than Just Models!

There are many other technology components beyond those related directly to a pre-trained foundation model that are critical to responsible Open Source AI innovation. Tuning data sets, benchmarks for safety and accuracy, databases, application frameworks, libraries and frameworks for building and tuning models, recipes and patterns of application, and more are important to developing and deploying AI. Increasingly, these artifacts are the key to building safe, trusted, and useful AI, especially agentic AI.

Avoiding One-Size-Fits-All Solutions

Open source and open science in AI have many goals, many parts, and a diverse global community of people driving them. That’s why attempts to impose a monolithic definition of Open Source AI have led to a debate that threatens to block progress. These definitional approaches may also inadvertently reduce model developers’ incentives to provide open access to state-of-the-art technology and undermine the development of an Open Source AI technology ecosystem, which we believe is the only way AI can become trusted and widely used.

This is especially true if regulators adopt too-narrow definitions of Open Source AI technology under the law. Such approaches may result in an environment where only a few large companies own cutting-edge AI technology, limiting innovation and blocking broad access to AI’s benefits.

The Road Ahead

Open Source software is now well defined, but only after years of maturation advancing hand in hand with practical progress. Fortunately, we have time to mature our definitions of Open Source AI. That’s why we applaud efforts that seek to advance the state of definitions, and we look forward to being part of these discussions and informing them.

In the meantime, let’s make sure we adopt a practical approach that helps enable and accelerate progress in AI.

¹ https://about.fb.com/news/2025/03/celebrating-1-billion-downloads-llama/
² https://huggingface.co/models?sort=trending&search=Llama
³ https://www.servicenow.com/blogs/2024/bigcode-open-innovation-case-study
