Data
Data is the heart of AI. Large-scale training corpora of text, images, audio, and video are used to create foundation models. Post-training (tuning) datasets then adapt these models to specific expert domains and to agentic tasks and interactions, such as function and API calling and human interaction, and help ensure the models are safe and trusted.

- Pre-training Data
- Post-training Data: for agents and domains
- Multi-lingual Data: toward AI for All Languages
- Open Trusted Data Initiative
- Open Trusted Data Catalog
- Validation Pipelines for Data
- Processing Pipelines for Data
- Docling
- Data Prep Kit
- Structured Knowledge for Agents
- Ally Cat: Getting Started using Your Data in an Application