Trusted evals request for proposals

Context

Model evaluation, or ‘evals’, is simultaneously one of the most important areas of investment in the generative AI era and one of its most chaotic: benchmarks, metrics, and methods remain fragmented and change rapidly.

Our goals for the Trusted Evals working group are to:

  1. Raise awareness of community efforts around trust and safety – including, but not limited to, work happening globally across languages and domains (e.g. cyber security, CBRNE); foster and grow the academic and technical communities in Trustworthy AI; and, by extension, create a center of mass of domain experts who can help us push beyond where evals stand today into areas where we currently have poor visibility; and
  2. Drive the development of comprehensive, reliable, and stable tools for model evaluation: tools that produce repeatable, reproducible, and diverse results and that are regularly refreshed as we learn about new generative AI risks and other evaluation concerns. In practice, this means creating, as a community, new benchmarks and metrics that address the quality, safety, robustness, and performance of generative AI models (a minimal sketch of what such an eval can look like follows this list). The approach is to be as broad and diverse as possible, with the goal of uncovering new evals in as many domains as possible so we can learn as a community. The goal is NOT to create a standard for model evaluations, but to work closely with MLCommons to help shepherd a subset of these evals into their standardization effort.
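
As a rough illustration only, the sketch below shows the basic shape of a repeatable eval: a fixed set of cases, a scoring function, and an aggregate metric that can be re-run and audited. The `EvalCase` structure, the `generate` callable, and the exact-match scorer are hypothetical placeholders, not a prescribed interface; proposals are free to define their own datasets, metrics, and harnesses.

```python
# Minimal sketch of a repeatable eval: a fixed set of cases, a scoring
# function, and an aggregate metric. `generate` is a placeholder for whatever
# model or API a proposal targets; the cases and scorer here are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    reference: str  # expected answer or rubric target

def exact_match(output: str, reference: str) -> float:
    """Toy scorer; real proposals would use task-appropriate metrics."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0

def run_eval(cases: list[EvalCase],
             generate: Callable[[str], str],
             score: Callable[[str, str], float] = exact_match) -> dict:
    """Run every case through the model and aggregate the scores."""
    scores = [score(generate(c.prompt), c.reference) for c in cases]
    return {
        "n_cases": len(cases),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
        "per_case": scores,  # kept so results can be audited and reproduced
    }

if __name__ == "__main__":
    cases = [EvalCase("What is 2 + 2?", "4"),
             EvalCase("Capital of France?", "Paris")]
    # Stub model so the sketch runs end to end; swap in a real model call.
    print(run_eval(cases, generate=lambda prompt: "4"))
```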

About

Generally speaking, evals shine a light on model capabilities (we love to evaluate how well our models can reason), but they can also expose where models present risks of harm. One of the major challenges, as evals become more specialized, is that access is limited for experts whose fields intersect with generative AI. For example, at Meta there is a team of cyber security experts who build models for coding and productivity and, in parallel, build safeguards against things like malicious code generation. For many CBRNE risks, these experts have no central place to aggregate, nor are they working alongside generative AI experts. Beyond this, there is no de facto place today for the open community to evaluate its models across a growing number of potential harms.

Request for proposals

We are seeking new perspectives in the AI evaluation domain. To foster further innovation in this area, we invite the community to participate in the AI Alliance’s Trusted Evals working group by submitting a response to this Request for Proposals and joining the working group’s efforts to:

  1. Raise awareness: selected proposals will be showcased through AI Alliance communications, including our newsletter, blog, whitepapers, and website; and
  2. Drive the development of comprehensive, reliable, and stable tools: the AI Alliance intends to support selected proposals with resources to help teams accelerate progress in building the foundations of safe, trusted AI.

For this RFP, we welcome participation from academia, industry, startups, and anyone eager to collaborate in the open and build an ecosystem around their work.

Areas of interest

  1. Cybersecurity threats
  2. Sensitive data detection, including areas such as toxic content (e.g. hate speech), personally identifiable information (PII), and bias (see the sketch after this list)
  3. Model performance, including helpfulness, quality, alignment, robustness, etc. (as opposed to operational concerns like throughput, latency, scalability, etc.)
  4. Knowledge and factuality
  5. Multilingual evaluation
  6. Mediation
  7. Balancing harms and helpfulness
  8. Personal data memorization / data governance
  9. Vertical domains such as legal, financial, medical
  10. Areas related to CBRNE (chemical, biological, radiological, nuclear, and high yield explosives)
  11. Weapons acquisition specifically
  12. Measuring data set integrity when data is created by AI: label fairness, prompt generation for RLHF
  13. Effectiveness of tool use that exacerbates malicious intent
  14. Demographic representation across different countries
  15. Distribution bias
  16. Political bias
  17. Capability fairness
  18. Undesirable use cases
  19. Regional bias, discrimination
  20. Violence and hate
  21. Terrorism and sanctioned individuals
  22. Defamation
  23. Misinformation
  24. Guns, illegal weapons, controlled substances
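
As one hypothetical illustration for the sensitive data detection area above, the sketch below flags model outputs that match naive PII patterns and reports a leak rate. The pattern set, function names, and sample outputs are invented for illustration only; real evals in this area would rely on much stronger detectors and curated prompt sets designed to elicit memorized personal data.

```python
# Illustrative sketch for sensitive data detection: flag model outputs that
# contain simple PII patterns and report a leak rate. The regexes are
# deliberately naive and for illustration only.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def contains_pii(text: str) -> list[str]:
    """Return the names of PII patterns found in a model output."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def pii_leak_rate(outputs: list[str]) -> float:
    """Fraction of outputs containing at least one PII pattern."""
    flagged = sum(1 for out in outputs if contains_pii(out))
    return flagged / len(outputs) if outputs else 0.0

if __name__ == "__main__":
    sample_outputs = [
        "You can reach the support team during business hours.",
        "Sure, her address is jane.doe@example.com and 555-123-4567.",
    ]
    print(f"PII leak rate: {pii_leak_rate(sample_outputs):.2f}")
```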

Requirements

Proposals should include a link to a summary of your project, in English, describing the area of focus, the dataset(s) involved, and any relevant prior work. The summary should include:

  • Name of project and abstract
  • Core team member bios & affiliations
  • Link to website (if applicable)
  • Link to GitHub repository
  • License
  • Links to whitepapers or publications