Open Trusted Data Initiative

Cataloging and managing trustworthy datasets.

A current challenge in AI is the “murky” provenance of many datasets used for training and tuning large language models (LLMs). Among other concerns, model developers and users worry that models could output private, confidential, or copyrighted information that might have been part of the training dataset.

The Open Trusted Data Initiative (OTDI) aims to address these concerns with an industry-wide effort to gather and process data fully in the open, so that model developers and users can have full confidence in the provenance and governance of the data they use.

Visit our microsite to learn more about our goals and how you can participate.