Trusted evals request for proposals

Project: Trust & Safety

Context

Model evaluation, or ‘evals’, in the generative AI era is simultaneously one of the most important areas of investment and one of the highest in entropy, with benchmarks, methods, and results that vary widely and change quickly.

Our goals for the Trusted Evals working group are to:

  1. Raise awareness of community efforts around trust and safety, including, but not limited to, work happening globally in various languages and domains (e.g. cyber security, CBRNE); foster and grow the academic and technical communities in Trustworthy AI; and, by extension, create a center of mass of domain experts who can help us push beyond where evals stand today and into areas where we don’t yet have good visibility; and
  2. Drive the development of comprehensive, reliable, and stable tools for model evaluation: tools that produce repeatable, reproducible, and diverse results (a minimal sketch of such a repeatable eval follows this list), and that are regularly refreshed and ever-evolving as we learn about new risks for generative AI and other evaluation concerns. This means, as a community, creating new benchmarks and metrics that address the quality, safety, robustness, and performance of generative AI models. The approach is to be as broad and diverse as possible, with the goal of uncovering new evals in as many domains as possible so we can learn as a community. The goal is NOT to create a standard for model evaluations, but to work closely with MLCommons to help shepherd a subset of these evals into their standardization effort.
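
What "repeatable and reproducible" could look like in practice: the sketch below pins a prompt set, fixes the random seed, and reports a deterministic score so two runs of the same eval are directly comparable. It is a minimal, hypothetical illustration only; the prompt set, the run_eval function, and the stub model are assumptions for this sketch, not an AI Alliance or MLCommons format.

```python
# Hypothetical, minimal eval harness: pinned prompts, fixed seed, deterministic score.
import random
from typing import Callable, List, Tuple

# Illustrative prompt/reference pairs; a real benchmark would load a versioned dataset.
PROMPTS: List[Tuple[str, str]] = [
    ("Does this message contain hate speech? 'Have a great day!'", "no"),
    ("Does this text contain an email address? 'Reach me at jane@example.com'", "yes"),
]

def run_eval(generate: Callable[[str], str], seed: int = 0) -> float:
    """Score a model callable on the pinned prompt set; returns accuracy in [0, 1]."""
    rng = random.Random(seed)                  # fixed seed: prompt order is shuffled deterministically
    cases = rng.sample(PROMPTS, k=len(PROMPTS))
    correct = 0
    for prompt, reference in cases:
        answer = generate(prompt).strip().lower()
        correct += int(answer == reference)
    return correct / len(cases)

if __name__ == "__main__":
    # Stub "model" used only so the sketch runs end to end without a real endpoint.
    stub = lambda prompt: "yes" if "@" in prompt else "no"
    print(f"accuracy = {run_eval(stub):.2f}")
```

A real submission would replace the stub with an actual model endpoint and a versioned dataset for its domain of interest.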

About

Generally speaking, evals shine a light on model capabilities (we love to evaluate how well our models can reason), but they can also expose where models could present certain risks of harm. One of the major challenges as evals become more esoteric is that access is limited for experts in fields that intersect with generative AI. For example, at Meta there is a team of cyber security experts who also build models for coding and productivity and, in parallel, are building safeguards for things like malicious code generation. For many CBRNE risks, these experts don’t have a central place to aggregate, nor are they working with generative AI experts. Furthermore, there isn’t a de facto place today for the open community to evaluate their models across a growing number of potential harms.

Request for proposal

We are seeking new perspectives in the AI evaluation domain. To foster further innovation in this area, we are pleased to invite the community to participate in the AI Alliance’s Trusted Evals working group by submitting a response to our Request for Proposal to be included in the working group’s efforts to:

  1. Raise awareness: selected proposals will be showcased through AI Alliance communications, including our newsletter, blog, whitepapers, and website; and
  2. Drive the development of comprehensive, reliable, and stable tools: the AI Alliance intends to support select project proposals with resources to help teams accelerate progress in building the foundations of safe, trusted AI.

For this RFP, we are excited to work with those in academia, big industry, and startups, and with anyone eager to collaborate in the open and build an ecosystem around their work.

Areas of interest

  1. Cybersecurity threats
  2. Sensitive data detection including areas such as toxic content (e.g. hate speech), personally identifiable information (PII), bias, etc.
  3. Model performance including helpfulness, quality, alignment, robustness, etc., (as opposed to operational concerns like throughput, latency, scalability, etc.)
  4. Knowledge and factuality
  5. Multilingual evaluation
  6. Mediation
  7. Balancing harms and helpfulness
  8. Personal data memorization / data governance
  9. Vertical domains such as legal, financial, medical
  10. Areas related to CBRNE (chemical, biological, radiological, nuclear, and high-yield explosives)
  11. Weapons acquisition specifically
  12. Measuring data set integrity when data is created by AI: label fairness, prompt generation for RLHF
  13. Effectiveness of tool use that exacerbates malicious intent
  14. Demographic representation across different countries
  15. Distribution bias
  16. Political bias
  17. Capability fairness
  18. Undesirable use cases
  19. Regional bias, discrimination
  20. Violence and hate
  21. Terrorism and sanctioned individuals
  22. Defamation
  23. Misinformation
  24. Guns, illegal weapons, controlled substances

Requirements

Proposals should include a link to a summary of your project, in English, covering the area of focus, a description of the dataset, and any relevant prior work. The summary should include:

  • Name of project and abstract
  • Core team member bios & affiliations
  • Link to website (if applicable)
  • Link to GitHub repository
  • License
  • Links to whitepapers or publications