Blog & Articles

Perspectives, news, and technical reports from our community.

Blog Posts & Articles

GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI

GEO-Bench-2, developed by IBM, ServiceNow, and the AI Alliance Climate & Sustainability Working Group, establishes a new global standard for evaluating Geospatial Foundation Models (GeoFMs). By combining 19 datasets across 8 subsets, a flexible evaluation protocol, and over 15,000 baseline experiments, it delivers a transparent, rigorous, and collaborative framework for advancing geospatial AI. Integrated with tools like TerraTorch and hosted on Hugging Face, GEO-Bench-2 bridges research and real-world impact—empowering scientists, industry, and policymakers to measure progress, accelerate innovation, and apply trustworthy AI to climate, sustainability, and disaster resilience challenges.

From Semiconductor to Maritime: A Blueprint for Domain-Specific AI in Safety-Critical Industries

From semiconductor fabs to open seas, the AI Alliance is redefining how domain-specific AI supports safety-critical industries. This blog spotlights Llamarine, a maritime large language model co-developed by Aitomatic and Furuno, building on lessons from SemiKong, the first semiconductor-specific model. Designed to embody real seamanship rather than generic knowledge, Llamarine integrates deep maritime regulations and Furuno’s decades of navigational expertise into its reasoning. The result is a model that provides deterministic, regulation-compliant, and operationally sound guidance—outperforming GPT-4o, Claude Sonnet 3.5, and other general-purpose models. Together, these projects outline a blueprint for trustworthy, specialized AI that can be applied across industries where precision and reliability are non-negotiable.

Building AI Agents to Real-World Use Cases

The AI Alliance's open-source projects, AgentLabUI (a practitioner workbench for building AI agents) and Gofannon (a set of agent tools), work together with ATA Systems' front-end development to create production-ready AI applications in days rather than weeks. The approach is demonstrated through a collaborative Grant Matching Agent case study, where researchers can upload their CV and receive curated funding opportunities within minutes, showcasing a complete workflow from agent development to end-user delivery. AgentLabUI serves as a flexible IDE where practitioners can swap models, build modular tools, and integrate various frameworks, while the Agent UI provides a simple interface for non-technical users to interact with deployed agents without needing to understand the underlying complexity. This two-layer system bridges the gap between AI R&D and real-world adoption, making advanced AI capabilities accessible, secure, and practical across organizations.

AI Alliance x National AI Research Resource Pilot Deep Partnership Program

The AI Alliance is joining the National AI Research Resource (NAIRR) Pilot Deep Partnership program to break down barriers to high-performance computing for researchers, educators, and innovators. In collaboration with Mass Open Cloud, Red Hat, and IBM Research, this initiative provides access to GPUs, CPUs, storage, enterprise cloud tools, and open-source AI models—including IBM’s Granite models, InstructLab, Docling, and science-focused foundation models developed with NASA and ESA. Eligible projects can apply by October 1, 2025, to receive resource credits for Core AI or AI for Science tracks, with supported work running through July 2026. More than just free infrastructure, this program empowers researchers and educators to build, adapt, and contribute open-source AI tools that advance both science and society.

How Can We Test Enterprise AI Applications?

The AI Alliance’s Trust and Safety Focus Area has released version 0.2.0 of the “Achieving Confidence in Enterprise AI Applications” guide, addressing one of the biggest challenges in enterprise adoption of generative AI: how to test probabilistic systems. Traditional enterprise developers are accustomed to deterministic testing, but AI introduces new complexities. The living guide bridges this gap by adapting benchmark techniques into unit, integration, and acceptance benchmarks for AI applications. It shows how to leverage LLMs to generate and validate datasets, reduce randomness in application design, and identify AI “features” that can be developed incrementally in agile workflows. A practical healthcare chatbot example demonstrates how FAQs can be handled deterministically while still using LLMs for flexible input interpretation, balancing trust, safety, and innovation. This release marks a step forward in helping developers confidently design, test, and deploy enterprise-grade AI systems, while inviting broader collaboration from the community.

Building a Deep Research Agent Using MCP-Agent

This article by Sarmad Qadri documents the journey of building a Deep Research Agent with MCP-Agent, highlighting the evolution from an initial Orchestrator design, to an over-engineered Adaptive Workflow, and finally to the streamlined Deep Orchestrator. The author emphasizes that “MCP is all you need,” showing how connecting LLMs to MCP servers with simple design patterns enables agents to perform complex, multi-step research tasks. Key lessons include the importance of simplicity over complexity, leveraging deterministic code-based verification alongside LLM reasoning, external memory for efficiency, and structured prompting for clarity. The resulting Deep Orchestrator balances performance, scalability, and adaptability, proving effective across domains like finance research. Future directions include remote execution, intelligent tool and model selection, and treating memory/knowledge as MCP resources. The open-source project, available on GitHub, offers developers a powerful foundation for creating general-purpose AI research agents.

Openly shared AI tools are transforming medicine, chemistry and more

AI Alliance: Open Science Community Harnessing the Power of Open-Source AI

News

The open science community is increasingly using open source AI to accelerate discovery and innovation across disciplines, with collaboration at its core. At Meta’s first Open Source AI Summit for Advancing Scientific Discovery, scientists and researchers showcased breakthroughs made possible by open models like Llama and FAIR. Examples include UT Health’s use of MedSAM for cancer detection and AIChat for secure personalized research, Mayo Clinic’s RadOnc-GPT for radiation oncology, and national labs leveraging massive compute resources for molecular and materials discovery through the OMol25 project. These efforts highlight how openly shared AI tools are transforming medicine, chemistry, and beyond, reinforcing the importance of continued collaboration through initiatives like the AI Alliance.

Open, Trusted Wikimedia Datasets by Wikimedia Enterprise 

The AI Alliance has launched Wikimedia datasets through the Open Trusted Data Initiative, providing structured Wikipedia and Wikidata content in machine-readable formats for AI development. These openly licensed datasets offer human-moderated content across 300+ languages with full edit histories and citations, enabling developers worldwide to build transparent, verifiable AI systems with comprehensive global knowledge coverage and multilingual support.

AI Alliance Accelerating Open-Source AI Innovation with Llama Stack

We are excited to announce a deeper collaboration between the AI Alliance and Meta’s Llama Stack, marking a significant milestone in advancing open-source AI development. The AI Alliance officially supports Llama Stack as a foundational AI application framework designed to empower developers, enterprises, and partners in building and deploying AI applications with ease and confidence.

DoomArena: A Security Testing Framework for AI Agents

Technical Report

The AI Alliance releases new AI-powered programming language and industrial AI agent framework, adds new Japanese members, and launches AI Alliance Japan  

The AI Alliance announced three developments: Dana, an AI-powered programming language that generates code from natural language descriptions; OpenDXA, an open-source agent framework for industrial AI applications; and AI Alliance Japan, a regional working group with nine founding members including IBM, NEC, and Panasonic focused on sovereign AI development. Dana introduces intent-driven development where developers describe functionality rather than write traditional code, while OpenDXA targets complex industrial workflows with explainable AI. The Japan initiative will focus on manufacturing, semiconductor, and navigation applications, with its first project supporting LLM-jp, Japan's national language model. All projects are open-source and available through the AI Alliance collaboration.

The AI Alliance Forms Non-profit AI Lab and AI Technology & Advocacy Association to Scale Open-Source Innovation 

New legal entities and boards intend to scale the AI Alliance’s mission to support and perform open-source development, open research, education, and advocacy for AI globally.