Materials and Chemistry Working Group

We curate datasets, tasks, and benchmarks for materials science; build foundation models in chemistry for predicting properties and experimental outcomes and for generating new candidates; and create a framework that fosters collaboration between human experts and AI agents. Together, this work aims to help solve urgent global challenges in the sustainability and safety of materials.



Projects

The following open-source models are available on GitHub (a short sketch of the SMILES and SELFIES string formats used by these models follows below):

  • smi-ted - SMILES-based Transformer Encoder-Decoder (SMI-TED) is an encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, and comes in two main variants (289M and 8×289M).
  • selfies-ted - SELFIES-based Transformer Encoder-Decoder (SELFIES-TED) is a BART-based encoder-decoder model that not only learns molecular representations but also auto-regressively generates molecules. It is pre-trained on a dataset of ~1B molecules from PubChem and Zinc-22.
  • mhg-ged - Molecular Hypergraph Grammar with Graph-based Encoder-Decoder (MHG-GED) is an autoencoder that combines a GNN-based encoder with a sequential MHG-based decoder. The GNN encodes molecular graphs to achieve strong predictive performance, while the MHG decodes structurally valid molecules. It is pre-trained on a dataset of ~1.34M molecules curated from PubChem.
  • smi-ssed - SMI-SSED is a Mamba-based encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples (4 billion molecular tokens) sourced from PubChem. The model is tailored for complex tasks such as quantum property prediction and offers efficient, high-speed inference.

… and we have plans for more models to be released soon.
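
As a quick illustration of the SMILES and SELFIES string formats referenced above, the minimal sketch below round-trips a molecule between the two using the open-source selfies and RDKit packages. This is an assumption-laden illustration rather than working-group code: the packages and the example molecule are chosen here only for convenience, and the released models ship their own tokenizers and example notebooks.

```python
# Illustration only: convert a SMILES string to SELFIES and back.
# Requires: pip install selfies rdkit
import selfies as sf
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"               # aspirin, written as SMILES
mol = Chem.MolFromSmiles(smiles)               # parse and validate with RDKit
canonical_smiles = Chem.MolToSmiles(mol)       # canonical SMILES form

selfies_string = sf.encoder(canonical_smiles)  # SMILES -> SELFIES
roundtrip_smiles = sf.decoder(selfies_string)  # SELFIES -> SMILES (valid by construction)

print("SMILES:  ", canonical_smiles)
print("SELFIES: ", selfies_string)
print("Decoded: ", roundtrip_smiles)
```

SMI-TED and SMI-SSED consume SMILES strings like the first output; SELFIES-TED works with SELFIES strings like the second.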

Agenda

We invite you to be part of our ever-growing community of innovators, researchers, and industry experts who share a common goal: to develop safe, scientifically reliable, and accessible AI-powered tools for materials discovery and chemistry.

The following projects have been proposed and are under consideration; we’re looking for volunteers to join or co-lead work in these areas:

  • Training fused foundation models
  • Trans-dimensional flow-matching for molecular generation (a toy sketch of the flow-matching objective follows this list)
  • Training GFlowNets for materials generation
  • Reproducing AlphaFold 3 capabilities
  • Organizing hackathons
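
For the flow-matching item above, the snippet below is a minimal, hedged sketch of the standard flow-matching training objective on continuous feature vectors, written in PyTorch. It is not the proposed project's method: the network, feature dimension, and random "data" batch are placeholders, and a real molecular-generation effort would need a proper molecular representation plus the trans-dimensional extensions named in the agenda.

```python
# Toy flow matching: regress a velocity field along straight paths from noise to data.
import torch
import torch.nn as nn

DIM = 32  # placeholder feature dimension


class FlowNet(nn.Module):
    """Small MLP predicting a velocity field v(x_t, t)."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t], dim=-1))


model = FlowNet(DIM)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(64, DIM)        # stand-in for a batch of "data" features
    x0 = torch.randn(64, DIM)        # noise samples
    t = torch.rand(64, 1)            # random times in [0, 1]
    x_t = (1 - t) * x0 + t * x1      # linear interpolation between noise and data
    target_v = x1 - x0               # velocity of that straight-line path
    loss = ((model(x_t, t) - target_v) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, samples are drawn by integrating the learned velocity field from noise at t = 0 to t = 1.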

Other resources

The IBM team has released three foundation models at github.com/ibm/materials, with plans for further releases in 2024 and 2025. These models are supported by the following publications, among others:

  • [1] Soares, Eduardo, Victor Shirasuna, Emilio Vital Brazil, Renato Cerqueira, Dmitry Zubarev, and Kristin Schmidt. "A Large Encoder-Decoder Family of Foundation Models For Chemical Language." arXiv preprint arXiv:2407.20267 (2024).
  • [2] Soares, Eduardo, Akihiro Kishimoto, Victor Shirasuna, Hiroshi Kajino, Emilio Ashton Vital Brazil, Seiji Takeda, and Renato Cerqueira. "A Multi-View approach based on Graphs and Chemical Language Foundation Model for Molecular Properties Prediction." In AAAI Conference on Artificial Intelligence. 2024.
  • [3] Soares, Eduardo Almeida, Victor Shirasuna, Emilio Ashton Vital Brazil, Renato Fontoura de Gusmao Cerqueira, Dmitry Zubarev, Tiffany Callahan, and Sara Capponi. "MoLMamba: A Large State-Space-based Foundation Model for Chemistry." In American Chemical Society (ACS) Fall Meeting. 2024.
  • [4] Anonymous. "Agnostic Causality-Driven Enhancement of Chemical Foundation Models on Downstream Tasks." NeurIPS 2024 (in preparation).
  • [5] Kishimoto, Akihiro, Hiroshi Kajino, Masataka Hirose, Junta Fuchiwaki, Indra Priyadarsini, Lisa Hamada, Hajime Shinohara, Daiju Nakano, and Seiji Takeda. "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network." arXiv preprint arXiv:2309.16374 (2023).
  • [6] Takeda, Seiji, Lisa Hamada, Emilio Ashton Vital Brazil, Eduardo Almeida Soares, and Hajime Shinohara. "SELF-BART: A Transformer-based Molecular Representation Model using SELFIES." arXiv (2024).
  • [7] "SELFIES-TED: A Robust Transformer Model for Molecular Representation using SELFIES." In review for ICLR 2025.
  • [8] Priyadarsini, Indra, Vidushi Sharma, Seiji Takeda, Akihiro Kishimoto, Lisa Hamada, and Hajime Shinohara. "Improving Performance Prediction of Electrolyte Formulations with Transformer-based Molecular Representation Model." arXiv preprint arXiv:2406.19792 (2024).

Meeting schedule

We hold virtual meetings approximately every two weeks, run separately for the Japan and Europe/Americas time zones. We also arrange workshops and social events at relevant large conferences (ICML, ACS, etc.). For example, we organized a social event called "Breaking Silos: Open Collaboration for AI x Science" at NeurIPS 2024.

Frequently Asked Questions (FAQ)

  • How can I join the AI Alliance Working Group for Materials and Chemistry (WG4M)? Contact us via the form below and we’ll add you to our Slack channel and mailing list.
  • How can I access the open-source models that have already been released? You can find the released open-source foundation models on GitHub at github.com/ibm/materials, with example notebooks, usage instructions, and links to the model weights on Hugging Face for each model; a minimal download sketch follows this list.
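
For the second question above, one hedged way to fetch released model weights is the huggingface_hub client, as sketched below. The repository id is a placeholder assumption; the GitHub README at github.com/ibm/materials lists the actual Hugging Face repositories, and the example notebooks there show how to load and run each model.

```python
# Illustration only: download model weights from Hugging Face.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# NOTE: placeholder repository id; see github.com/ibm/materials for the real one.
local_path = snapshot_download(repo_id="ibm/materials.smi-ted")
print("Model files downloaded to:", local_path)
```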

Read more about the Materials and Chemistry Working Group and our foundation models.

Join the Materials and Chemistry Working Group

By submitting this form, you agree that the AI Alliance will collect and process the personal information you provide to keep you informed about AI Alliance initiatives and enable your involvement in AI Alliance activities. Additionally, you agree that the AI Alliance may share the personal information you provide with its member organizations so that they may communicate with you about AI Alliance initiatives and your involvement in AI Alliance activities.

You may withdraw your consent for the processing of your personal information by the AI Alliance. Please contact us to request a permanent deletion.