Blog & Articles

Perspectives, news, and technical reports from our community.

Blog Posts & Articles

AI Alliance x National AI Research Resource Pilot Deep Partnership Program

The AI Alliance is joining the National AI Research Resource (NAIRR) Pilot Deep Partnership program to break down barriers to high-performance computing for researchers, educators, and innovators. In collaboration with Mass Open Cloud, Red Hat, and IBM Research, this initiative provides access to GPUs, CPUs, storage, enterprise cloud tools, and open-source AI models—including IBM’s Granite models, InstructLab, Docling, and science-focused foundation models developed with NASA and ESA. Eligible projects can apply by October 1, 2025, to receive resource credits for Core AI or AI for Science tracks, with supported work running through July 2026. More than just free infrastructure, this program empowers researchers and educators to build, adapt, and contribute open-source AI tools that advance both science and society.

How Can We Test Enterprise AI Applications?

The AI Alliance’s Trust and Safety Focus Area has released version 0.2.0 of the “Achieving Confidence in Enterprise AI Applications” guide, addressing one of the biggest challenges in enterprise adoption of generative AI: how to test probabilistic systems. Traditional enterprise developers are accustomed to deterministic testing, but AI introduces new complexities. The living guide bridges this gap by adapting benchmark techniques into unit, integration, and acceptance benchmarks for AI applications. It shows how to leverage LLMs to generate and validate datasets, reduce randomness in application design, and identify AI “features” that can be developed incrementally in agile workflows. A practical healthcare chatbot example demonstrates how FAQs can be handled deterministically while still using LLMs for flexible input interpretation, balancing trust, safety, and innovation. This release marks a step forward in helping developers confidently design, test, and deploy enterprise-grade AI systems, while inviting broader collaboration from the community.
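The FAQ pattern in that healthcare example is easy to sketch: the LLM only maps free-form input onto a known question key, while the answer itself comes from a reviewed lookup table, so it stays verifiable with ordinary unit tests. The sketch below is an illustration of that idea, not code from the guide; the classify_intent helper and the llm_client.complete call are placeholders for whatever model API is used.

```python
# Hypothetical sketch of the "deterministic answers, flexible input" pattern
# described in the guide; not the guide's actual code.

FAQ_ANSWERS = {
    "office_hours": "Our clinic is open Monday-Friday, 8am-6pm.",
    "insurance": "We accept most major insurance plans; call us to confirm yours.",
    "prescription_refill": "Refill requests are handled through the patient portal.",
}

def classify_intent(user_message: str, llm_client) -> str:
    """Ask an LLM to map free-form input onto one known FAQ key (or 'unknown')."""
    prompt = (
        "Classify the user message into exactly one of these intents: "
        + ", ".join(FAQ_ANSWERS) + ", unknown.\n"
        f"Message: {user_message}\nIntent:"
    )
    intent = llm_client.complete(prompt).strip().lower()
    return intent if intent in FAQ_ANSWERS else "unknown"

def answer(user_message: str, llm_client) -> str:
    """Deterministic answer path: the LLM chooses the key, never the answer text."""
    intent = classify_intent(user_message, llm_client)
    if intent == "unknown":
        return "I'm not sure; let me connect you with a staff member."
    return FAQ_ANSWERS[intent]

# A conventional unit test still works, because the answer text is fixed:
class StubLLM:
    def complete(self, prompt: str) -> str:
        return "office_hours"

assert answer("When are you open?", StubLLM()) == FAQ_ANSWERS["office_hours"]
```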

Building a Deep Research Agent Using MCP-Agent

This article by Sarmad Qadri documents the journey of building a Deep Research Agent with MCP-Agent, highlighting the evolution from an initial Orchestrator design, to an over-engineered Adaptive Workflow, and finally to the streamlined Deep Orchestrator. The author emphasizes that “MCP is all you need,” showing how connecting LLMs to MCP servers with simple design patterns enables agents to perform complex, multi-step research tasks. Key lessons include the importance of simplicity over complexity, leveraging deterministic code-based verification alongside LLM reasoning, external memory for efficiency, and structured prompting for clarity. The resulting Deep Orchestrator balances performance, scalability, and adaptability, proving effective across domains like finance research. Future directions include remote execution, intelligent tool and model selection, and treating memory/knowledge as MCP resources. The open-source project, available on GitHub, offers developers a powerful foundation for creating general-purpose AI research agents.
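As a rough illustration of that plan, execute, verify loop, here is a minimal sketch. It is not the mcp-agent API: llm.plan, llm.run_step, llm.replan, llm.summarize, and the tools argument are placeholders standing in for the orchestrator's LLM calls and its MCP tool connections.

```python
# Illustrative control flow for a "Deep Orchestrator"-style agent loop:
# plan with an LLM, execute steps (which may call MCP tools), verify results
# with deterministic code, and keep findings in external memory.
# The llm and tools objects are placeholders, not the mcp-agent API.

from dataclasses import dataclass, field

@dataclass
class Memory:
    """External memory: findings live outside the prompt to keep context small."""
    notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

def verify(step: str, result: str) -> bool:
    """Cheap, deterministic checks (non-empty, cites a source, etc.)."""
    return bool(result) and "http" in result

def deep_orchestrate(objective: str, llm, tools, max_steps: int = 10) -> str:
    memory = Memory()
    queue = list(llm.plan(objective))        # LLM drafts a step-by-step plan
    steps_run = 0
    while queue and steps_run < max_steps:
        step = queue.pop(0)
        steps_run += 1
        result = llm.run_step(step, tools, memory.notes)   # may call MCP tools
        if verify(step, result):             # code-based verification, not an LLM vote
            memory.add(f"{step}: {result}")
        else:                                # replan when a step fails verification
            queue = list(llm.replan(objective, failed_step=step, notes=memory.notes))
    return llm.summarize(objective, memory.notes)
```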

Openly shared AI tools are transforming medicine, chemistry, and more

AI Alliance: Open Science Community Harnessing the Power of Open-Source AI

News

The open science community is increasingly using open source AI to accelerate discovery and innovation across disciplines, with collaboration at its core. At Meta’s first Open Source AI Summit for Advancing Scientific Discovery, scientists and researchers showcased breakthroughs made possible by open models such as Llama and releases from Meta FAIR. Examples include UT Health’s use of MedSAM for cancer detection and AIChat for secure personalized research, Mayo Clinic’s RadOnc-GPT for radiation oncology, and national labs leveraging massive compute resources for molecular and materials discovery through the OMol25 project. These efforts highlight how openly shared AI tools are transforming medicine, chemistry, and beyond, reinforcing the importance of continued collaboration through initiatives like the AI Alliance.

Open, Trusted Wikimedia Datasets by Wikimedia Enterprise 

The AI Alliance has launched Wikimedia datasets through the Open Trusted Data Initiative, providing structured Wikipedia and Wikidata content in machine-readable formats for AI development. These openly licensed datasets offer human-moderated content across 300+ languages with full edit histories and citations, enabling developers worldwide to build transparent, verifiable AI systems with comprehensive global knowledge coverage and multilingual support.
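For a sense of how such a dataset can be consumed, the sketch below streams a structured Wikipedia snapshot with the Hugging Face datasets library. The dataset ID and snapshot name are illustrative assumptions and may differ from how the Open Trusted Data Initiative publishes the Wikimedia Enterprise data.

```python
# Minimal sketch of streaming a structured Wikipedia snapshot for AI work.
# Assumes the Hugging Face `datasets` library; the dataset ID and snapshot
# name below are illustrative and may not match the Open Trusted Data
# Initiative's distribution channel.

from datasets import load_dataset

# Stream to avoid downloading the full snapshot up front.
wiki = load_dataset("wikimedia/wikipedia", "20231101.en",
                    split="train", streaming=True)

for article in wiki.take(3):
    print(article["title"])
    print(article["text"][:200], "...")
```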

AI Alliance Accelerating Open-Source AI Innovation with Llama Stack

We are excited to announce a deeper collaboration between the AI Alliance and Meta’s Llama Stack, marking a significant milestone in advancing open-source AI development. The AI Alliance officially supports Llama Stack as a foundational AI application framework designed to empower developers, enterprises, and partners in building and deploying AI applications with ease and confidence.
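As a taste of what building on Llama Stack looks like, here is a minimal sketch assuming the llama-stack-client Python package and a locally running Llama Stack distribution; the base URL, model ID, and exact method names vary across versions, so treat this as an outline rather than the definitive API.

```python
# Minimal sketch of calling a locally running Llama Stack server.
# Assumes the llama-stack-client package; the default port, model ID, and
# method names below may differ across Llama Stack releases.

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Summarize what Llama Stack provides."}],
)
print(response.completion_message.content)
```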

DoomArena: A Security Testing Framework for AI Agents

Technical Report

The AI Alliance releases new AI-powered programming language and industrial AI agent framework, adds new Japanese members, and launches AI Alliance Japan  

The AI Alliance announced three developments: Dana, an AI-powered programming language that generates code from natural language descriptions; OpenDXA, an open-source agent framework for industrial AI applications; and AI Alliance Japan, a regional working group with nine founding members including IBM, NEC, and Panasonic focused on sovereign AI development. Dana introduces intent-driven development where developers describe functionality rather than write traditional code, while OpenDXA targets complex industrial workflows with explainable AI. The Japan initiative will focus on manufacturing, semiconductor, and navigation applications, with its first project supporting LLM-jp, Japan's national language model. All three projects are open source and available through the AI Alliance.

The AI Alliance Forms Non-profit AI Lab and AI Technology & Advocacy Association to Scale Open-Source Innovation 

The new legal entities and their boards are intended to scale the AI Alliance’s mission of supporting and performing open-source development, open research, education, and advocacy for AI globally.

AI Alliance Urges Lawmakers to Rethink the NY RAISE Act

News

LLM-as-a-Judge Without the Headaches: EvalAssist Brings Structure and Simplicity to the Chaos of LLM Output Review

Technical Report

Evaluating AI model outputs at scale is a major challenge for teams using LLMs, especially when assessing nuanced qualities like politeness, fairness, and tone that traditional benchmarks miss. IBM Research has released EvalAssist, an open-source tool that streamlines the "LLM-as-a-Judge" approach, allowing teams to define custom evaluation criteria and apply them at scale using models like GPT-4 or IBM's Granite. The platform offers multiple evaluation strategies including direct assessment and pairwise comparison, while providing transparency through chain-of-thought explanations and bias detection. Built on IBM's Unitxt toolkit, EvalAssist aims to make AI evaluation more rigorous, scalable, and trustworthy for real-world applications.
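The direct-assessment style of LLM-as-a-Judge that EvalAssist streamlines can be sketched generically as follows. This is not EvalAssist's API; the judge_llm.complete call is a placeholder for whichever judge model is used, and the criterion is an invented example.

```python
# Generic sketch of "direct assessment" LLM-as-a-Judge with a custom criterion
# and a chain-of-thought explanation; not EvalAssist's API.

import json

CRITERION = {
    "name": "politeness",
    "definition": "The response is courteous and never dismissive of the user.",
    "options": ["excellent", "acceptable", "poor"],
}

def judge(response_text: str, judge_llm) -> dict:
    """Ask a judge model for a verdict plus its reasoning on one criterion."""
    prompt = (
        f"Criterion: {CRITERION['name']} - {CRITERION['definition']}\n"
        f"Allowed verdicts: {', '.join(CRITERION['options'])}\n"
        "Explain your reasoning step by step, then return JSON with "
        '"verdict" and "reasoning" keys.\n\n'
        f"Response to evaluate:\n{response_text}"
    )
    raw = judge_llm.complete(prompt)      # placeholder judge call
    result = json.loads(raw)              # expect {"verdict": ..., "reasoning": ...}
    if result["verdict"] not in CRITERION["options"]:
        raise ValueError(f"Unexpected verdict: {result['verdict']}")
    return result
```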

Mastering Data Cleaning for Fine-Tuning LLMs and RAG Architectures

News

In the rapidly advancing field of artificial intelligence, data cleaning has become a mission-critical step in ensuring the success of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) architectures. This blog emphasizes the importance of high-quality, structured data in preventing AI model hallucinations, reducing algorithmic bias, enhancing embedding quality, and improving information retrieval accuracy. It covers essential AI data preprocessing techniques like deduplication, PII redaction, noise filtering, and text normalization, while spotlighting top tools such as IBM Data Prep Kit, AI Fairness 360, and OpenRefine. With real-world applications ranging from LLM fine-tuning to graph-based knowledge systems, the post offers a practical guide for data scientists and AI engineers looking to optimize performance, ensure ethical compliance, and build scalable, trustworthy AI systems.
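To make those preprocessing steps concrete, here is a toy Python sketch of text normalization, PII redaction, and exact-duplicate removal. Production pipelines rely on toolkits such as IBM Data Prep Kit to do this at scale; this only illustrates the ideas.

```python
# Toy sketch of three preprocessing steps: normalization, PII redaction,
# and exact-duplicate removal. Illustrative only; real pipelines use
# dedicated toolkits and far more robust PII detection.

import re
import unicodedata

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.\w{2,}")
PHONE_RE = re.compile(r"\b(?:\+?\d{1,3}[ -]?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}\b")

def normalize(text: str) -> str:
    """Unicode-normalize, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def redact_pii(text: str) -> str:
    """Replace obvious emails and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def deduplicate(docs: list[str]) -> list[str]:
    """Drop exact duplicates after cleaning."""
    seen, out = set(), []
    for doc in docs:
        cleaned = redact_pii(normalize(doc))
        if cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out

print(deduplicate([
    "Contact me at jane@example.com.",
    "Contact me at  JANE@example.com. ",
    "Call 555-123-4567 for details.",
]))
```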