The invited lineup covers the full agent-failure pipeline: mechanisms, closed-loop diagnostics, security, evaluation, and practical fixes.

Université de Montréal, LawZero, and Mila
AI safety, frontier model governance, and deep learning foundations
Yoshua Bengio is a Full Professor of Computer Science at Université de Montréal, Founder of Mila (the Quebec AI Institute), and Co-President of LawZero. He received the 2018 Turing Award alongside Geoffrey Hinton and Yann LeCun, and his research spans deep learning foundations and, increasingly, AI safety and the governance of frontier systems.
GFlowNets · International AI Safety Report

Stanford University and CZ Biohub
Tool-use failures and automated red-teaming
James Zou is an Associate Professor of Biomedical Data Science, Computer Science, and Electrical Engineering at Stanford. His work focuses on making AI more reliable, human-compatible, and statistically rigorous, with major applications in health and biomedicine.
AvaTaR · AutoRedTeamer

University of Illinois Urbana-Champaign and Virtue AI
Agent attack surfaces and tool-chain exploits
Bo Li is the Wexler AI Scholar and an Associate Professor at UIUC, where she works on trustworthy machine learning, AI safety, security, privacy, and robustness. She is also the founder and CEO of Virtue AI.
ShieldAgent · AutoRedTeamer

TU Darmstadt and MBZUAI
Long-horizon drift and error localization
Iryna Gurevych is a Professor of Computer Science at TU Darmstadt and founder of the UKP Lab. Her research spans natural language processing, trustworthy AI, and machine learning methods for robust and interpretable language systems.
OpenFactCheck · Error localization for long-form QA

New York University
Verification gaps and grounded checking
Greg Durrett is an Associate Professor at NYU whose research centers on natural language processing, factuality, verification, and reasoning with language models in knowledge-intensive settings.
MiniCheck · Molecular Facts

Apple and EPFL
Reasoning brittleness and evaluation artifacts
Samy Bengio is a longtime AI research leader whose work spans large-scale machine learning, reasoning, and evaluation. He has held senior research leadership roles at Apple and Google, as well as leadership positions at major ML conferences.
The Illusion of Thinking · Reasoning's Razor

Allen Institute for AI
Faithfulness failures and feedback grounding
Nouha Dziri is a researcher at AI2 working on large language models with a focus on reasoning limits, faithfulness, post-training, and safety-oriented evaluation.
FaithDial · BEGIN

Carnegie Mellon University and AI2
Social failure modes and human-agent risk
Maarten Sap is an Assistant Professor at Carnegie Mellon University and a part-time researcher at AI2. His work studies social intelligence, human-centered language systems, safety risks, and how people interact with language agents.
SOTOPIA · SOTOPIA-π

NVIDIA Research
Tool and runtime reliability, plus system regressions
Yi Dong is a principal research scientist at NVIDIA working on reasoning models and virtual agents. His research spans model reliability, prolonged reinforcement learning, and practical agent deployment for enterprise settings.
ProRL · ProRLv2

The Ohio State University and industry
Web-agent failures and online evaluation gaps
Yu Su is an Associate Professor at The Ohio State University whose work studies grounded language understanding, web agents, interactive systems, and evaluation in realistic environments.
Online-Mind2Web · SEEACT

Stanford HAI
Frontier AI governance and deployment implications
Rishi Bommasani is a Senior Research Scholar at Stanford HAI whose work examines the societal, economic, and governance implications of frontier AI; his talk will focus on the economics and governance of frontier AI.
Foundation Models · HELM