AI Alliance: Open Science Community Harnessing the Power of Open-Source AI

News
Kevin Chan
Openly shared AI tools are transforming medicine, chemistry, and more

The open science community is harnessing the power of open source AI to accelerate scientific discovery and drive innovation across disciplines. This vibrant ecosystem thrives on collaboration, leveraging open source AI models, datasets, and tools to advance research. The work is shared openly, enabling everyone to benefit from and build upon existing knowledge.

Meta recently hosted its inaugural Open Source AI Summit for Advancing Scientific Discovery in Austin, TX, to bring to life the value of open science innovation and to deepen connections between the open source AI community and the global scientific research ecosystem. The summit brought together scientists, researchers, and academics for discussions, workshops, and demos of scientific innovations made possible by open source AI.

Take, for example, the groundbreaking research by Dr. Xiaoqian Jiang and research teams across the medical system at the University of Texas Health Science Center at Houston, who are advancing AI research using Meta’s open source AI models. Dr. Bo Wang developed MedSAM and MedSAM 2, built on Meta’s Segment Anything Model, to analyze large-scale imaging data and automatically detect potential cancer in lymph nodes, enabling earlier patient treatment. UT Health’s AIChat uses Llama to support personalized learning and research initiatives and to foster innovation while handling Protected Health Information (PHI) in a data-secure way.

Mayo Clinic’s Dr. Wei Liu is applying Llama in radiation oncology, having developed RadOnc-GPT, a model that automates treatment planning and improves patient care through large-dataset analysis and personalized plan generation.

Meta also worked with a number of national laboratories, including the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and Los Alamos National Laboratory, to support new molecular design and materials discovery. The Open Molecules 2025 (OMol25) dataset required six billion compute hours to create and is the result of 100 million calculations that simulate the quantum mechanics of atoms and molecules in four key areas chosen for their potential impact on science.

“We’re talking about two orders of magnitude more compute than any kind of academic data set that’s ever been made,” said Dr. Sam Blau, a research scientist at the Lawrence Berkeley National Laboratory who worked with Meta on the project. “It’s going to dramatically change how people do computational chemistry.”

Openly released assets like Meta’s Llama and Fundamental AI Research (FAIR) models are advancing promising new scientific applications, including molecular analysis, clinical trial development, scientific literature synthesis, materials science, robotics, and more.

Open science relies on ongoing collaboration to enable everyone to benefit from and build upon existing knowledge, and the AI Alliance community, with its ongoing commitment to open source AI, can help accelerate scientific discovery and innovation.

Related Articles

How Can We Test Enterprise AI Applications?

The AI Alliance’s Trust and Safety Focus Area has released version 0.2.0 of the “Achieving Confidence in Enterprise AI Applications” guide, addressing one of the biggest challenges in enterprise adoption of generative AI: how to test probabilistic systems. Traditional enterprise developers are accustomed to deterministic testing, but AI introduces new complexities. The living guide bridges this gap by adapting benchmark techniques into unit, integration, and acceptance benchmarks for AI applications. It shows how to leverage LLMs to generate and validate datasets, reduce randomness in application design, and identify AI “features” that can be developed incrementally in agile workflows. A practical healthcare chatbot example demonstrates how FAQs can be handled deterministically while still using LLMs for flexible input interpretation, balancing trust, safety, and innovation. This release marks a step forward in helping developers confidently design, test, and deploy enterprise-grade AI systems, while inviting broader collaboration from the community.
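The deterministic-FAQ pattern described above can be sketched briefly. This is an illustrative sketch, not code from the guide: the function and key names are hypothetical, and the interpretation step, which a real system would delegate to an LLM, is stubbed with keyword matching so the routing logic itself stays unit-testable.

```python
from typing import Optional

# Canned answers served deterministically once an FAQ is identified.
# All names and strings here are hypothetical examples.
FAQ_ANSWERS = {
    "office_hours": "The clinic is open 8am-5pm, Monday through Friday.",
    "insurance": "We accept most major insurance plans; call to confirm yours.",
}

def interpret(user_text: str) -> Optional[str]:
    """Stand-in for an LLM that maps free-form input to a known FAQ key.
    A real system would call a model here; this stub uses keywords."""
    text = user_text.lower()
    if "hour" in text or "open" in text:
        return "office_hours"
    if "insurance" in text or "coverage" in text:
        return "insurance"
    return None  # not an FAQ; fall through to the generative path

def answer(user_text: str) -> str:
    key = interpret(user_text)
    if key is not None:
        # Deterministic path: verifiable with ordinary unit tests.
        return FAQ_ANSWERS[key]
    # Non-deterministic path: would invoke the LLM and be benchmark-tested.
    return "[generative response]"
```

The design choice is that only the `interpret` step is probabilistic; everything downstream of a matched FAQ key is a plain lookup that traditional test suites can cover exhaustively.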

Mastering Data Cleaning for Fine-Tuning LLMs and RAG Architectures

News

In the rapidly advancing field of artificial intelligence, data cleaning has become a mission-critical step in ensuring the success of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) architectures. This blog emphasizes the importance of high-quality, structured data in preventing AI model hallucinations, reducing algorithmic bias, enhancing embedding quality, and improving information retrieval accuracy. It covers essential AI data preprocessing techniques like deduplication, PII redaction, noise filtering, and text normalization, while spotlighting top tools such as IBM Data Prep Kit, AI Fairness 360, and OpenRefine. With real-world applications ranging from LLM fine-tuning to graph-based knowledge systems, the post offers a practical guide for data scientists and AI engineers looking to optimize performance, ensure ethical compliance, and build scalable, trustworthy AI systems.
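Three of the preprocessing steps mentioned above can be sketched in a few lines. This is an illustrative sketch, not code from the post or from the tools it names: the patterns are deliberately minimal stand-ins for exact deduplication, simple PII redaction, and whitespace normalization on a small corpus.

```python
import re

def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing space."""
    return re.sub(r"\s+", " ", text).strip()

def redact_pii(text: str) -> str:
    """Mask email addresses and US-style phone numbers with placeholders.
    Real pipelines use far more thorough detectors than these regexes."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", "[PHONE]", text)
    return text

def clean_corpus(docs: list) -> list:
    """Normalize, redact, then drop exact duplicates (order preserved)."""
    seen, out = set(), []
    for doc in docs:
        cleaned = redact_pii(normalize(doc))
        if cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out
```

Note that redaction runs before deduplication so that two documents differing only in the PII they contain collapse to one entry.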

Defining Open Source AI: The Road Ahead

News

Open source and open science in AI offer a practical, proven approach to enabling access, innovation, trust, and value creation now. Let’s focus on that as we better define it.