Trust and Safety Evaluations


Much like other software, generative AI (“GenAI”) models and the AI systems built on them need to be trusted by, and useful to, their users. Evaluation is the key to establishing both.

Evaluation provides the evidence for earning users’ trust in models and systems. More specifically, evaluation refers to the capability of measuring and quantifying how a model or system responds to inputs. Are the responses within acceptable bounds, for example free of hate speech and hallucinations? Are they useful to users? Are they cost-effective?
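As a minimal illustration of this idea (a sketch, not part of the project’s actual tooling), an evaluation can be framed as running a model over a fixed set of prompts and scoring each response against named acceptance criteria. The model callable, prompts, and checks below are hypothetical placeholders; real evaluations would substitute classifiers, benchmarks, or human review.

```python
# Minimal sketch of an evaluation loop: score a model's responses to a
# fixed set of prompts against simple acceptance criteria.
from typing import Callable

def evaluate(model: Callable[[str], str], prompts: list[str],
             checks: dict[str, Callable[[str], bool]]) -> dict[str, float]:
    """Return, for each named check, the fraction of responses that pass."""
    passes = {name: 0 for name in checks}
    for prompt in prompts:
        response = model(prompt)
        for name, check in checks.items():
            if check(response):
                passes[name] += 1
    return {name: count / len(prompts) for name, count in passes.items()}

if __name__ == "__main__":
    # Stub model and toy criteria, purely for illustration.
    stub_model = lambda prompt: "This is a helpful, polite answer."
    scores = evaluate(
        stub_model,
        prompts=["What is GenAI?", "Summarize this article."],
        checks={
            "no_hate_speech": lambda r: "hate" not in r.lower(),
            "non_empty": lambda r: len(r.strip()) > 0,
        },
    )
    print(scores)  # e.g. {'no_hate_speech': 1.0, 'non_empty': 1.0}
```

Even this toy harness shows the shape of the problem: the hard parts are choosing meaningful checks and prompt sets, which is exactly where a shared taxonomy and tooling help.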

The Trust and Safety Evaluations project fills gaps in the current evaluation landscape: a taxonomy of the different kinds of evaluation, tools for creating and running evaluations, and leaderboards that address particular categories of user needs.