Demand for AI safety and accountability is growing, but a new report argues that current tests and benchmarks may fall short.
Generative AI models, which can analyze and generate text, images, and other content, are under increasing scrutiny for their tendency to make mistakes and behave unpredictably. In response, organizations are proposing new benchmarks to evaluate the safety of these models.
Last year, Scale AI established a lab focused on evaluating model alignment with safety guidelines. Recently, NIST and the U.K. AI Safety Institute introduced tools to assess model risks.
However, a study by the Ada Lovelace Institute (ALI), which interviewed experts from academic labs, civil society, and model vendors, found that these existing evaluation methods may be inadequate. The study highlighted limitations in current evaluations: they are non-exhaustive, easy to game, and poor predictors of how models behave in real-world conditions.
The experts emphasized that AI models need testing as rigorous as the safety standards applied in other industries, such as pharmaceuticals and automotive. They discussed both the shortcomings of current evaluation practices and the challenges of assessing AI model safety effectively.
Benchmarks and Red Teaming
The study authors reviewed academic literature to understand the risks posed by AI models and the evaluation methods in use. Interviews with experts revealed disagreements within the industry regarding evaluation methods and benchmarks.
Issues such as data contamination (benchmark test data leaking into a model's training set, which inflates scores), poorly justified benchmark selection, and the lack of agreed standards for red teaming were identified, raising concerns about the effectiveness of current evaluations.
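To make the contamination problem concrete, one rough check is to measure n-gram overlap between benchmark items and a training corpus: heavy overlap suggests a model may have memorized answers rather than demonstrated capability. The sketch below is purely illustrative and not a method described in the ALI report; the benchmark items, corpus, and n-gram size are hypothetical placeholders.

```python
# Illustrative sketch: a crude n-gram overlap check for benchmark contamination.
# The benchmark items, training documents, and n-gram size are hypothetical
# placeholders, not drawn from the ALI report.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(benchmark_items: list[str],
                       training_docs: list[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training data."""
    train_ngrams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_ngrams)
    return flagged / len(benchmark_items) if benchmark_items else 0.0

if __name__ == "__main__":
    benchmark = ["What is the boiling point of water at sea level in celsius?"]
    corpus = ["note: the boiling point of water at sea level in celsius is 100"]
    print(f"Contaminated items: {contamination_rate(benchmark, corpus, n=5):.0%}")
```

A real contamination audit would work at much larger scale and use fuzzier matching, but the principle is the same: if a benchmark question already appears in the training data, the benchmark no longer measures what it claims to.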
Possible Solutions
The pressure to release models quickly and a lack of comprehensive testing have hindered progress in AI evaluations. Public-sector involvement and transparent communication from the evaluation community were proposed as essential steps in improving model safety.
Developing context-specific evaluations, tailored to the particular groups who will use a model and the kinds of attacks it might face, was suggested as a way to improve safety assessments. Even so, the experts cautioned that no evaluation can guarantee absolute safety, since a model's safety depends on the context in which it is deployed and the safeguards placed around it.
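To illustrate what a context-specific evaluation might look like in practice, here is a minimal sketch, not a design from the report, of an evaluation suite whose test cases are tagged by intended user group and attack scenario so that results can be reported per deployment context rather than as a single aggregate score. The user groups, scenarios, model, and judging function are all hypothetical placeholders.

```python
# Illustrative sketch of a context-tagged safety evaluation suite.
# User groups, attack scenarios, and the judge function are hypothetical;
# a real evaluation would need domain experts to define and validate them.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    user_group: str       # e.g. "general_public", "medical_professionals"
    attack_scenario: str  # e.g. "benign", "jailbreak"

def evaluate(model: Callable[[str], str],
             judge: Callable[[str, str], bool],
             cases: list[EvalCase]) -> dict[tuple[str, str], float]:
    """Return the pass rate per (user_group, attack_scenario) context."""
    passed: dict[tuple[str, str], int] = defaultdict(int)
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for case in cases:
        context = (case.user_group, case.attack_scenario)
        totals[context] += 1
        if judge(case.prompt, model(case.prompt)):
            passed[context] += 1
    return {ctx: passed[ctx] / totals[ctx] for ctx in totals}

if __name__ == "__main__":
    # Stub model and judge, purely for demonstration.
    results = evaluate(
        model=lambda prompt: "I can't help with that.",
        judge=lambda prompt, reply: "can't help" in reply,
        cases=[EvalCase("How do I pick a lock?", "general_public", "jailbreak")],
    )
    print(results)
```

Reporting pass rates per context, rather than one headline number, reflects the experts' point that the same model can be acceptably safe for one audience and deployment and unsafe for another.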