Trust and Safety Evaluations


Much like other software, generative AI (“GenAI”) models and the AI systems built on them need to be trusted by, and useful to, their users. Evaluation is the key to establishing both.

Evaluation provides the evidence for earning users’ trust in models and systems. More specifically, evaluation refers to the capability of measuring and quantifying how a model or system responds to inputs. Are the responses within acceptable bounds, for example free of hate speech and hallucinations? Are they useful to users? Are they cost-effective?
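As a minimal illustration of this idea (a sketch, not part of the project’s actual tooling), an evaluation can be framed as running a model over a fixed set of prompts and scoring each response against named acceptance criteria. The model callable, prompts, and checks below are hypothetical placeholders; real evaluations would substitute classifiers, benchmarks, or human review.

```python
# Minimal sketch of an evaluation loop: score a model's responses to a
# fixed set of prompts against simple acceptance criteria.
from typing import Callable

def evaluate(model: Callable[[str], str], prompts: list[str],
             checks: dict[str, Callable[[str], bool]]) -> dict[str, float]:
    """Return, for each named check, the fraction of responses that pass."""
    passes = {name: 0 for name in checks}
    for prompt in prompts:
        response = model(prompt)
        for name, check in checks.items():
            if check(response):
                passes[name] += 1
    return {name: count / len(prompts) for name, count in passes.items()}

if __name__ == "__main__":
    # Stub model and toy criteria, purely for illustration.
    stub_model = lambda prompt: "This is a helpful, polite answer."
    scores = evaluate(
        stub_model,
        prompts=["What is GenAI?", "Summarize this article."],
        checks={
            "no_hate_speech": lambda r: "hate" not in r.lower(),
            "non_empty": lambda r: len(r.strip()) > 0,
        },
    )
    print(scores)  # e.g. {'no_hate_speech': 1.0, 'non_empty': 1.0}
```

Even this toy harness shows the shape of the problem: the hard parts are choosing meaningful checks and prompt sets, which is exactly where a shared taxonomy and tooling help.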

The Trust and Safety Evaluations project fills gaps in the current evaluation landscape: a taxonomy of the different kinds of evaluation, tools for creating and running evaluations, and leaderboards that address particular categories of user needs.