Data

Data is the heart of AI. Large-scale training corpora of text, images, audio, and video are used to create foundation models. Post-training or tuning datasets enrich these models for specific expert domains and for agentic tasks and interactions such as function and API calling and human interaction, and they help ensure that the models are safe and trusted.

  1. Pre-training Data
  2. Post-training Data: for agents and domains
  3. Multi-lingual Data: toward AI for All Languages
  4. Open Trusted Data Initiative
  5. Open Trusted Data Catalog
  6. Validation Pipelines for Data
  7. Processing Pipelines for Data
  8. Docling
  9. Data Prep Kit
  10. Structured Knowledge for Agents
  11. Ally Cat: Getting Started using Your Data in an Application