Context
- Raise awareness of community efforts around trust and safety, including but not limited to work happening globally across many languages and domains (e.g. cybersecurity, CBRNE). Foster and grow the academic and technical communities in Trustworthy AI, and by extension create a center of mass of domain experts who can help us push beyond where evaluations stand today and into areas where we lack good visibility; and
- Drive the development of comprehensive, reliable, and stable tools for model evaluation that produce repeatable, reproducible, and diverse results, and that are regularly refreshed and continually evolving as we learn about new risks for generative AI and other evaluation concerns. This means creating, as a community, new benchmarks and metrics that address the quality, safety, robustness, and performance of generative AI models. The approach is to be as broad and diverse as possible, with the goal of uncovering new evaluations in as many domains as possible so we can learn as a community. The goal is NOT to create a standard for model evaluations, but to work closely with MLCommons to help shepherd a subset of these evaluations into their standardization effort.
About
Request for proposal
- Raise awareness - selected proposals will be showcased through AI Alliance communications, including our newsletter, blog, whitepapers, and website; and
- Drive the development of comprehensive, reliable, and stable tools - the AI Alliance intends to support selected project proposals with resources to help teams accelerate progress in building the foundations of safe, trusted AI.
Areas of interest
- Cybersecurity threats
- Sensitive data detection, including areas such as toxic content (e.g. hate speech), personally identifiable information (PII), bias, etc.
- Model performance, including helpfulness, quality, alignment, robustness, etc. (as opposed to operational concerns like throughput, latency, and scalability)
- Knowledge and factuality
- Multilingual evaluation
- Mediation
- Balancing harms and helpfulness
- Personal data memorization / data governance
- Vertical domains such as legal, financial, medical
- Areas related to CBRNE (chemical, biological, radiological, nuclear, and high yield explosives)
- Weapons acquisition specifically
- Measuring dataset integrity when data is created by AI: label fairness, prompt generation for RLHF
- Effectiveness of tool use in exacerbating malicious intent
- Demographic representation across different countries
- Distribution bias
- Political bias
- Capability fairness
- Undesirable use cases
- Regional bias and discrimination
- Violence and hate
- Terrorism and sanctioned individuals
- Defamation
- Misinformation
- Guns, illegal weapons, controlled substances
Requirements
- Name of project and abstract
- Core team member bios and affiliations
- Link to website (if applicable)
- Link to GitHub repository
- License
- Links to whitepapers or publications