The week AI sovereignty became an engineering discipline

Inside IBM’s One Madison Avenue, the AI Alliance gathered researchers, enterprises, open source leaders, and sovereign institutions around a single question: who gets to build the AI systems the world will depend on? For two years, the conversation has centered on scale — bigger models, more compute, more centralized platforms. Last week, a different question kept surfacing: what if the next phase of AI is about scaling participation, not just scaling models?

That question drives Project Tapestry, the AI Alliance’s new effort to build community-owned, frontier-capable sovereign AI through distributed model development. Partners across countries, sectors, and institutions contribute data, compute, and talent toward a shared base model family — while each keeps ownership of what it brings. This wasn’t an isolated afternoon: it’s the same conviction the Alliance carried to its leadership reception in India and its Tapestry workshop in Paris earlier this year, where the project was born. Three continents, one idea. What follows is what our speakers argued, alongside the research that says it’s not just conjecture.

01 · The setup

We're using a sliver of what exists

Anthony Annunziata · AI Alliance · IBM — The Problem Statement & Tapestry Overview

Anthony grounded the day in a number that should unsettle anyone who thinks the frontier is “solved”: the AI Alliance now spans 205 organizations across 29 countries and 100+ projects — and yet, as he put it, we’re using only a very small fraction of the world’s resources, especially data. Even “open-weight” models, he noted, typically hide their pipelines and training data, so they aren’t truly open. His framing — scale intelligence outward — became the throughline.

“We're only really using a very small fraction of the world's resources, especially data.”
— Anthony Annunziata

The research — On the public-data ceiling, projections suggest the stock of public human text (~300 trillion tokens) could be exhausted between roughly 2026 and 2032 on current trends (Villalobos et al., 2024), and a 1,800-dataset audit found that today’s training-data licensing is concentrated and opaque (Longpre et al., 2023).

02 · The method

Train together, keep your data home

Dean Wampler · AI Alliance · IBM — Consortium Training

Dean laid out the engineering heart of Tapestry: an “N + 1” design where dozens of sovereign nodes train on local data that legally cannot leave its borders, then merge updates upward into a shared model. He was candid about the hard parts — hardware heterogeneity, non-trivial merge algorithms, and, critically, preventing the model from memorizing private data, which is why evaluation becomes central. And he reframed the word itself: sovereignty “means that you have agency. You have ownership.”

The research — Federated learning formalized training on decentralized data while sharing only updates (McMahan et al., 2017), and DP-SGD bounds how much any single example can influence a model (Abadi et al., 2016).

03 · The blind spot

Who is AI actually for?

Ambrish Rawat · IBM Research — Cultural Alignment · The Next Billion AI Index

Ambrish named a gap the field has under-served: “there has not been enough focus to deliver something for the next billion. There’s no incentive structure around it.” His answer is an index built with local stakeholders — spanning effectiveness, operational practicality, and societal integrity — explicitly not a universal leaderboard. Validated with developers in India and across Africa, the dimensions that ranked highest were cost-effectiveness, usability, and trust.

The research — LLMs default to US/European opinions, and prompting or translation doesn’t fully correct it (Durmus et al., 2023); under-representation is documented at the language level (Joshi et al., 2020), and the efficiency angle is covered in the energy literature (Strubell et al., 2019).

04 · The resource

Data, responsibly used

Kaushik Bhatta · AI Alliance · B3 Alliance — Data, Part I — The Spectrum from Private to Fully Open

Kaushik’s throughline was incentives over intentions: “it’s not just about being a good person or good leadership — it’s about systems and incentives.” He made the imbalance concrete — only ~5% of the world speaks English as a first language, yet ~50% of the web corpus is English — and made the stakes visceral, pointing to the recent “Fable” access suspension and the 2018 CLOUD Act as reasons sovereign institutions need infrastructure they own, not rent.

The research — The language disparity is quantified in (Joshi et al., 2020), the commons-depletion trend in (Villalobos et al., 2024), and the tightening of open data via licensing in (Longpre et al., 2023).
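The size of that imbalance is easy to work out from the two shares quoted in the talk. The short Python sketch below does the arithmetic. The representation_ratio helper is a hypothetical name, and comparing first-language speakers with web-corpus share is a rough proxy rather than a measured statistic.

```python
# Shares quoted in the talk (approximate): first-language English speakers
# versus English's share of the web corpus.
english_speaker_share = 0.05
english_web_share = 0.50

def representation_ratio(web_share, speaker_share):
    """Web-corpus share divided by population share; 1.0 would mean parity."""
    return web_share / speaker_share

english_ratio = representation_ratio(english_web_share, english_speaker_share)
other_ratio = representation_ratio(1 - english_web_share, 1 - english_speaker_share)
gap = english_ratio / other_ratio

print(f"English: {english_ratio:.1f}x, other languages combined: {other_ratio:.2f}x, gap: {gap:.0f}x")
```

On these figures English is represented at about 10 times its speaker share, every other language combined at about half of theirs, and the gap between the two is roughly 19 to 1.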

05 · The open web

Every language that isn't English is underrepresented

Pedro Ortiz Suarez · Common Crawl Foundation — Data, Part II — Expanding Linguistic & Cultural Coverage

Pedro’s starting point: the open web that trains everything is itself ~40% English, so “every language that is not English can be considered underrepresented.” Common Crawl’s fixes are concrete: crowdsourcing URL “seeds” for underrepresented languages (a pilot with Africa’s Masakhane community surfaced 28M+ new pages) and an openly built language-ID dataset, CommonLID, spanning 109 languages and built with 80 native speakers. A pointed finding: at web scale, GPT-class models were worse at language identification than a simple n-gram model — and far more expensive.

The research — It’s the operational complement to the diversity literature (Joshi et al., 2020): coverage comes from building open data infrastructure with the communities a model serves, not from scraping harder.
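For readers who have not seen one, the kind of "simple n-gram model" in that finding can be sketched in a few dozen lines of Python. This is a generic character-trigram Naive Bayes classifier, not Common Crawl's language identifier and not trained on CommonLID. The class name NgramLanguageID and the three toy training sentences are assumptions for illustration only.

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Return character n-grams from lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NgramLanguageID:
    """Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language code -> Counter of n-grams
        self.vocab = set()  # every n-gram seen in any language

    def fit(self, samples):
        for lang, text in samples:
            grams = ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            # Sum of smoothed log-probabilities of each n-gram under this language.
            score = sum(
                math.log((counter[g] + 1) / (total + vocab_size))
                for g in ngrams(text, self.n)
            )
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

# Toy training data: one sentence per language (en, es, sw).
samples = [
    ("en", "the quick brown fox jumps over the lazy dog and the cat sleeps in the house"),
    ("es", "el rápido zorro marrón salta sobre el perro perezoso y el gato duerme en la casa"),
    ("sw", "mbweha mwepesi anaruka juu ya mbwa mvivu na paka analala ndani ya nyumba"),
]
model = NgramLanguageID().fit(samples)
print(model.predict("paka analala ndani ya nyumba"))
```

Training and prediction here are just counting and a few logarithms per n-gram, which is why a model of this family stays cheap at web scale. A real system would need far more text per language and careful handling of short or mixed-language pages.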

06 · The untapped ocean

We are not running out of data

Ronnie Falcon · OpenMined — Data, Part III — Data Infrastructure for Frontier Sovereign AI

Ronnie dismantled the “we’re running out of data” narrative: against roughly 180 zettabytes of digitized data, today’s largest training sets are “a drop in the ocean.” His model is “network-source AI” with attribution-based control — the AI queries content at inference time and gets back insights, not copies. OpenMined is already building this for 70 news organizations in Indonesia.

“Don't bake work into the model — let the AI query content directly at inference time.”
— Ronnie Falcon

The research — Villalobos et al. (2024) bound public text, not all data — unlocking the rest responsibly needs the attribution layer the field lacks (Longpre et al., 2023).
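The "query at inference time, get back insights rather than copies" pattern can be illustrated with a toy Python sketch. It is not OpenMined's software or API. The ContentNode, Document, and Insight classes, the keyword-overlap scoring, and the publisher names are all hypothetical, and a real system would need access control and far better retrieval.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Document:
    owner: str
    title: str
    text: str

@dataclass
class Insight:
    """What leaves the node: attribution and matched terms, never the full text."""
    owner: str
    title: str
    matched_terms: list

@dataclass
class ContentNode:
    """A hypothetical publisher-side node that answers queries over content it keeps."""
    documents: list
    usage_log: Counter = field(default_factory=Counter)

    def query(self, question, top_k=1):
        terms = set(question.lower().split())
        scored = []
        for doc in self.documents:
            overlap = sorted(terms & set(doc.text.lower().split()))
            if overlap:
                scored.append((len(overlap), doc, overlap))
        # Highest keyword overlap first; ties keep their original order.
        scored.sort(key=lambda item: item[0], reverse=True)
        results = []
        for _, doc, overlap in scored[:top_k]:
            self.usage_log[doc.owner] += 1  # attribution: record whose content was used
            results.append(Insight(doc.owner, doc.title, overlap))
        return results

# Two made-up publishers and articles, for illustration only.
node = ContentNode([
    Document("Harian A", "Rice prices rise", "rice prices rose sharply in jakarta markets this week"),
    Document("Harian B", "Football results", "the national football team won in surabaya"),
])
insights = node.query("rice prices in jakarta")
print(insights[0].owner, insights[0].matched_terms, dict(node.usage_log))
```

The design point is that the article text stays inside the node, and every answer leaves a usage record keyed by owner, which is the hook an attribution or compensation scheme would build on.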

07 · The reality check

Owning it in production

Alexandra Machado · Red Hat — Owning Your Platform AI Journey

The closing voice from the field. Alexandra shared numbers that reframe the “AI adoption” story: an energy company with 7,000 AI pilots and a major bank with 10,000 — almost none reaching production, blocked by cost, explainability, and sovereignty. Her prescription: treat the AI platform as a product, with a paved path to production, one governed data foundation, and a repeatable governance pattern — built with security, infrastructure, and developer teams in the same room.

“You need to own your journey. You cannot let vendors go first.”
— Alexandra Machado, Red Hat

The research — Her governance toolkit is established practice — interpretability methods like LIME (Ribeiro et al., 2016) and SHAP (Lundberg & Lee, 2017), plus model cards and continuous evaluation.

08 · The shift

Own your stack — together

Across the seven talks recapped here, one message kept surfacing: the world’s data, talent, and compute are vastly underused; access to closed models can change overnight; and the way forward is to own your stack, not rent it. AI may become the defining infrastructure of this century — and the question is becoming less about who owns the smartest model, and more about whether the world’s intelligence will belong to everyone or only to a handful of platforms.

The takeaway

“We need to collaborate. We cannot do it alone.”
— Alexandra Machado, Red Hat

Last week in New York, sovereignty stopped sounding like a slogan and started looking like a build plan. Project Tapestry’s first issue is open — and the slides and recording are available now.
