Materials and Chemistry Working Group
We curate datasets, tasks, and benchmarks for materials science; build chemistry foundation models for predicting properties and experimental outcomes and for generating new candidate materials; and create a framework for collaboration between human experts and AI agents, ultimately helping to solve urgent global challenges in the sustainability and safety of materials.
Join the Materials and Chemistry Working Group
Interim co-leads
- Dave Braines - IBM Research
- Tomoki Nagai - JSR Corporation
- Seiji Takeda, for Japan and Asia - IBM Research
- Flaviu Cipcigan, for Europe - IBM Research
- Jed Pitera, for North America - IBM Research
- Emilio Vital Brazil, for South America - IBM Research
Projects
- smi-ted - SMILES-based Transformer Encoder-Decoder (SMI-TED) is an encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, and comes in two main variants (289M and 8×289M parameters).
- selfies-ted - SELFIES-based Transformer Encoder-Decoder (SELFIES-TED) is an encoder-decoder model based on BART that not only learns molecular representations but also auto-regressively generates molecules. Pre-trained on a dataset of ~1B molecules from PubChem and Zinc-22. (A minimal SELFIES round-trip sketch follows this list.)
- mhg-ged - Molecular Hypergraph Grammar with Graph-based Encoder-Decoder (MHG-GED) is an autoencoder that combines a GNN-based encoder with a sequential MHG-based decoder. The GNN encoder yields strong predictive performance on molecular graphs, while the MHG decoder produces structurally valid molecules. Pre-trained on a dataset of ~1.34M molecules curated from PubChem.
- smi-ssed - SMI-SSED (SMILES-SSED) is a Mamba-based encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples, encompassing 4 billion molecular tokens sourced from PubChem. The model is tailored for complex tasks such as quantum property prediction and offers efficient, high-speed inference capabilities.
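To make the shared workflow concrete, the sketch below shows the SMILES-to-SELFIES round trip that the SELFIES-based models build on, using the open-source `selfies` package, followed by the frozen-embeddings-plus-regression-head recipe typically used for property prediction. The `embed_molecules` helper is a hypothetical stand-in, not any project's actual API; see each repository for real usage.

```python
# A minimal sketch, not the projects' actual APIs. The `selfies` package
# (pip install selfies) is real; `embed_molecules` is a hypothetical
# stand-in for a foundation-model encoder such as SMI-TED or SELFIES-TED.
import numpy as np
import selfies as sf
from sklearn.linear_model import Ridge

smiles = ["CCO", "CC(=O)O", "C1=CC=CC=C1"]  # ethanol, acetic acid, benzene

# SMILES -> SELFIES -> SMILES round trip (real selfies API).
selfies_strings = [sf.encoder(s) for s in smiles]
round_trip = [sf.decoder(s) for s in selfies_strings]
print(selfies_strings[0], "->", round_trip[0])

def embed_molecules(molecules):
    """Hypothetical placeholder for an encoder's latent embeddings;
    returns random vectors here just to keep the sketch runnable."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(molecules), 16))

# Frozen embeddings plus a simple regression head: the usual recipe for
# the property-prediction tasks described above.
X = embed_molecules(selfies_strings)
y = np.array([0.1, 0.4, 0.9])  # toy property values
head = Ridge().fit(X, y)
print(head.predict(X))
```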
Agenda
- Training fused foundation models
- Trans-dimensional flow-matching for molecular generation
- Training of GFlowNets for materials generation (see the trajectory-balance sketch after this list)
- Reproducing AlphaFold 3 capabilities
- Organizing hackathons
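As a point of reference for the GFlowNet item above, here is a minimal, self-contained sketch of the trajectory-balance objective (Malkin et al., 2022) on a toy sequence-building task. It illustrates the general technique only and is not the working group's code; the environment, reward, and network sizes are all placeholder assumptions.

```python
# Trajectory-balance sketch for a GFlowNet on a toy sequence task.
import torch
import torch.nn as nn

VOCAB, LENGTH = 4, 3  # build sequences of 3 tokens from a 4-token alphabet

policy = nn.Sequential(nn.Linear(LENGTH * VOCAB, 32), nn.ReLU(),
                       nn.Linear(32, VOCAB))
log_Z = nn.Parameter(torch.zeros(()))  # learned log partition function
opt = torch.optim.Adam(list(policy.parameters()) + [log_Z], lr=1e-2)

def reward(seq):
    # Toy reward: prefer sequences with many copies of token 0.
    return torch.exp(torch.tensor(float(seq.count(0))))

for step in range(200):
    state = torch.zeros(LENGTH, VOCAB)  # one-hot buffer, filled left to right
    seq, log_pf = [], torch.zeros(())
    for t in range(LENGTH):
        logits = policy(state.flatten())
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()
        log_pf = log_pf + dist.log_prob(a)
        state = state.clone()
        state[t, a] = 1.0
        seq.append(int(a))
    # Appending tokens left to right gives each sequence exactly one
    # trajectory, so log P_B = 0 and trajectory balance reduces to:
    loss = (log_Z + log_pf - torch.log(reward(seq))) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

print("learned log Z:", float(log_Z))
```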
Other resources
- [1] Soares, Eduardo, Victor Shirasuna, Emilio Vital Brazil, Renato Cerqueira, Dmitry Zubarev, and Kristin Schmidt. "A Large Encoder-Decoder Family of Foundation Models For Chemical Language." arXiv preprint arXiv:2407.20267 (2024).
- [2] Soares, Eduardo, Akihiro Kishimoto, Victor Shirasuna, Hiroshi Kajino, Emilio Ashton Vital Brazil, Seiji Takeda, and Renato Cerqueira. "A Multi-View approach based on Graphs and Chemical Language Foundation Model for Molecular Properties Prediction." In AAAI Conference on Artificial Intelligence. 2024.
- [3] Soares, Eduardo Almeida, Victor Shirasuna, Emilio Ashton Vital Brazil, Renato Fontoura de Gusmao Cerqueira, Dmitry Zubarev, Tiffany Callahan, and Sara Capponi. "MoLMamba: A Large State-Space-based Foundation Model for Chemistry." In American Chemical Society (ACS) Fall Meeting. 2024.
- [4] Anonymous. "Agnostic Causality-Driven Enhancement of Chemical Foundation Models on Downstream Tasks." NeurIPS 2024 (in preparation).
- [5] Kishimoto, Akihiro, Hiroshi Kajino, Masataka Hirose, Junta Fuchiwaki, Indra Priyadarsini, Lisa Hamada, Hajime Shinohara, Daiju Nakano, and Seiji Takeda. "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network." arXiv preprint arXiv:2309.16374 (2023).
- [6] Takeda, Seiji, Lisa Hamada, Emilio Ashton Vital Brazil, Eduardo Almeida Soares, and Hajime Shinohara. "SELF-BART: A Transformer-based Molecular Representation Model using SELFIES." arXiv (2024).
- [7] "SELFIES-TED: A Robust Transformer Model for Molecular Representation using SELFIES." In review for ICLR 2025. [link]
- [8] Priyadarsini, Indra, Vidushi Sharma, Seiji Takeda, Akihiro Kishimoto, Lisa Hamada, and Hajime Shinohara. "Improving Performance Prediction of Electrolyte Formulations with Transformer-based Molecular Representation Model." arXiv preprint arXiv:2406.19792 (2024).
Meeting schedule
Frequently Asked Questions (FAQ)
- How can I join the AI Alliance Working Group for Materials and Chemistry (WG4M)? Contact us via the form below and we'll add you to our Slack channel and mailing list.
- How can I access the open-source models that have already been released? You can find the existing open-source foundation models on GitHub here, with example notebooks, usage instructions, and links to the model weights on Hugging Face for each of the released models.
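For example, once you have located a model on GitHub, the weights can typically be fetched from the Hugging Face Hub along these lines. The repo id below is an assumption for illustration; use the ids linked from each project's GitHub README.

```python
# Hedged sketch: fetch released model weights from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("ibm/materials.smi-ted")  # assumed repo id
print("weights downloaded to:", local_dir)
```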