AI Evaluation: From Traditional Paradigms to New Avenues in the Era of General-Purpose AI

09 April 2025

15:00 - 17:00

Location:

Conference Hall, Building C, Area Science Park, Padriciano 99, Trieste

Speaker:

Lorenzo Pacchiardi, Research Associate at the Leverhulme Centre for the Future of Intelligence (University of Cambridge)

AI evaluation is a booming field, increasingly focused on modern general-purpose AI. However, current evaluation practices remain largely rooted in methodologies originally designed for narrow, single-task AI systems. In this talk, I present a recent work surveying the landscape of AI evaluation, revealing distinct paradigms, clarifying their underlying goals, and exposing gaps in existing methodologies. I then present two works introducing novel approaches designed to address challenges particularly relevant to general-purpose AI. The first is a benchmarking framework that simultaneously measures a model's performance and assesses how predictable that performance is, shifting the focus toward anticipating errors in high-stakes scenarios. The second introduces an evaluation battery, annotated with detailed rubrics, that generates comprehensive ability profiles for large language models, providing deeper insights into their strengths, limitations, and operational reliability.
