
Defining Open Source AI: The Road Ahead

News
The AI Alliance

The AI Alliance is a global coalition of AI builders and users that believes open science and open development are the best way to bring trustworthy and beneficial AI to the world.

Open source is an essential part of the AI ecosystem: highly capable open source foundation models, tools to tune and adapt them, and a growing body of open community innovation and governance are enabling safer, more accurate, and more valuable AI.

Evidence backs this up. According to one survey, 14 million developers (47% of all developers globally) are using open source AI models.² Thousands of individuals around the world are working on better support for their own native languages, better ways to make sure AI output is trusted and safe, and new ways to use AI to solve problems most of us don’t even know exist. The Llama family of models, for example, has been downloaded over 1 billion times to date,¹ and there are now tens of thousands of community-created variants of the Llama open model family on Hugging Face alone.²

Open communities of AI researchers and developers, such as BigScience, BigCode, and the Allen Institute for AI, have demonstrated the power of open science in pioneering advances in AI while cultivating downstream value for users. ServiceNow, for example, has reported that its implementation and use of BigCode’s StarCoder models led to a 52% improvement in the speed of development and innovation in real-world use cases such as text-to-code and text-to-workflow.³ IBM’s Granite family of open foundation models and Red Hat’s InstructLab project for community-driven model tuning have focused on enterprise use cases and shown how important openness and transparency are to building trust in enterprise adoption of AI.

There is a vigorous debate about the exact definition of Open Source AI. We think this debate is important and needs time to play out. Let's start practically, focus on the essentials, and acknowledge that definitions will continue to evolve.

Openness is first and foremost a practical approach to making AI more capable and trusted. Let’s make sure the ongoing debates about the precise definition of Open Source AI are enablers of progress and not blockers.

Open Source Foundation Models: A Practical Baseline

A pre-trained Open Source foundation model should reasonably enable a person with a technical background to use, modify, study, and share the model. The artifacts typically needed to do this include the pre-trained model weights and the model architecture’s source code used to run inference with the model, released under a permissive license and accompanied by technical facts and documentation (often called a model card). We think this is a useful baseline, a working definition that works in practice, and we can refine it over time as the ecosystem evolves.

Models with at least these components available are the heart of AI innovation today and have been the basis of hundreds of thousands of forks, adaptations, and deployed use cases bringing practical capability to users globally. Examples of such models include the Llama family of models from Meta, the Granite family of models from IBM, Arctic from Snowflake, Falcon from the Technology Innovation Institute in Abu Dhabi, Mosaic from Databricks, DeepSeek-R1 from DeepSeek-AI, StarCoder from BigCode, and OLMo from the Allen Institute for AI.
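
To make this baseline concrete, the minimal sketch below loads openly released weights with Hugging Face’s transformers library and runs a short inference pass. The model identifier and generation settings are illustrative assumptions, not a prescription; any permissively licensed checkpoint with published weights and inference code would work the same way.

```python
# A minimal sketch of the baseline in practice: with openly released weights,
# inference code, and a permissive license, a developer can load, run, and
# study a model locally. The model ID below is illustrative; substitute any
# openly licensed checkpoint hosted on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-2b-instruct"  # illustrative open model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why openly released model weights matter for developers."
inputs = tokenizer(prompt, return_tensors="pt")

# Run inference with the released weights; generation settings are illustrative.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```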

Beyond Open Source: Enabling Open Science

Many of the model families listed above release even more information, research and evaluation results, and technical artifacts from the model development process. While not strictly required to use, adapt, modify, and share a model, openly releasing data sets, training code, data pipelines, interim checkpoints, evaluation code, and research and evaluation results can be especially helpful to researchers who are working to understand and develop more capable, safer, and more trusted AI models, agents, and systems. Like-minded open communities such as NumFOCUS also contribute to the Open Science movement by promoting open practices, supporting sustainable software development, and facilitating a collaborative scientific community.
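
As a small illustration of what such open releases make possible, the sketch below streams a few records from an openly hosted corpus using the datasets library so researchers can inspect the kind of data a model was (or could be) trained on. The dataset path is an illustrative assumption; the same approach applies to any openly released training or tuning dataset.

```python
# A minimal sketch of open-science data inspection: when training data is
# published openly, researchers can stream and examine it directly rather than
# treating the model as a black box. The dataset path below is illustrative.
from datasets import load_dataset

stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Peek at a handful of documents without downloading the full corpus.
for i, record in enumerate(stream):
    # Each record exposes the raw text and its source URL for inspection.
    print(record["url"], "-", len(record["text"].split()), "words")
    if i == 4:
        break
```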

We encourage model developers to release as much as possible, and we strive to do the same in our own AI Alliance research and model development projects. The AI Alliance is also prioritizing the building of large-scale community resources, such as our Open Trusted Data Initiative, that can help model builders and application developers everywhere.

More than Just Models!

There are many other technology components, beyond those related directly to a pre-trained foundation model, that are critical to responsible Open Source AI innovation. Tuning data sets, benchmarks for safety and accuracy, databases, application frameworks, libraries and frameworks for building and tuning models, application recipes and patterns, and more are all important to developing and deploying AI. Increasingly, these artifacts are the key to building safe, trusted, and useful AI, especially agentic AI.
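
As one illustration of this wider open tooling, the sketch below uses the PEFT library to attach lightweight LoRA adapters to an open model so it can be tuned for a specific domain. The model identifier and target module names are illustrative assumptions and depend on the architecture being tuned.

```python
# A minimal sketch of parameter-efficient tuning with open tooling: the PEFT
# library wraps an open base model with LoRA adapters so only a small set of
# weights needs to be trained on a domain-specific dataset.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load an openly licensed base model (identifier is illustrative).
base = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.0-2b-instruct")

lora_config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections; architecture-dependent
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
# From here, the adapter can be trained on an openly licensed tuning dataset
# with any standard trainer, then shared back with the community.
```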

Avoiding One-Size-Fits-All Solutions

Open source and open science in AI have many goals, many parts, and a diverse global community of people driving them. That’s why attempts to require a monolithic definition of Open Source AI have led to a debate that threatens to block progress. These definitional approaches may also inadvertently reduce model developers’ incentives to provide open access to state-of-the-art technology and undermine the development of an Open Source AI technology ecosystem, which we believe is the only way AI can become trusted and widely used.

This is especially true if regulators adopt too-narrow definitions of Open Source AI technology under the law. These approaches may result in an environment where only a few large companies own cutting-edge AI technology, limiting innovation and blocking broad access to AI’s benefits.

The Road Ahead

Open Source software is now well defined, but only after years of maturation advancing hand in hand with practical progress. Fortunately, we have time to mature our definitions of Open Source AI. That’s why we applaud efforts seeking to advance the state of definitions, and we look forward to being part of these discussions and informing them.

In the meantime, let’s make sure we adopt a practical approach that helps enable and accelerate progress in AI.

¹ https://about.fb.com/news/2025/03/celebrating-1-billion-downloads-llama/
² https://huggingface.co/models?sort=trending&search=Llama
³ https://www.servicenow.com/blogs/2024/bigcode-open-innovation-case-study
