Events & Talks

Deep Models and Optimization Talk (upcoming, 28-11-2025)
How does data shape learning in LLMs? A case study of factual recall and the surprising role of data diversity
Speaker: Nicolas Zucchet (ETH Zurich)
Data drives LLM training, yet we have limited scientific understanding of how it shapes learning dynamics and thus the final model. This talk, based on two recent works [1] <https://arxiv.org/abs/2503.21676> [2] <https://arxiv.org/abs/2505.17863>, will examine these questions with a focus on factual recall. We will begin by analyzing how LLMs learn a synthetic factual recall task that serves as a test bed for knowledge acquisition and in which we can precisely control properties of the data distribution. Our experiments reveal that learning proceeds in distinct stages, and, surprisingly, that skewe...
Host: Antonio Orvieto
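
The abstract does not spell out the task construction, but a controllable factual-recall test bed might look roughly like the sketch below: synthetic individuals with fixed attribute facts, and a tunable skew over how often each individual appears in training text. All names and distribution choices here are illustrative assumptions, not the setup of [1] or [2].

```python
import random

# Hypothetical attribute vocabulary; the actual task in [1]/[2] may differ.
# The point is that popularity skew and attribute diversity become explicit
# knobs of the data distribution.
CITIES = ["Paris", "Tokyo", "Lagos", "Lima", "Oslo"]
JOBS = ["biologist", "pilot", "chef", "judge", "poet"]

def make_population(n_people: int, seed: int = 0):
    """Assign each synthetic person a fixed (city, job) fact pair."""
    rng = random.Random(seed)
    return {
        f"person_{i}": {"city": rng.choice(CITIES), "job": rng.choice(JOBS)}
        for i in range(n_people)
    }

def sample_corpus(population, n_docs: int, zipf_a: float, seed: int = 1):
    """Sample training statements with Zipf-skewed popularity over people.

    zipf_a controls the skew: large values concentrate mentions on a few
    'celebrity' individuals, values near zero spread them uniformly.
    """
    rng = random.Random(seed)
    people = list(population)
    weights = [1.0 / (rank + 1) ** zipf_a for rank in range(len(people))]
    docs = []
    for _ in range(n_docs):
        name = rng.choices(people, weights=weights, k=1)[0]
        attr = rng.choice(["city", "job"])
        docs.append(f"{name} has {attr} {population[name][attr]}.")
    return docs

corpus = sample_corpus(make_population(1000), n_docs=10_000, zipf_a=1.2)
print(corpus[:3])
```
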
AI Safety and Alignment Talk (upcoming, 14-11-2025)
Lessons from the Impossibility of Safety
Speaker: Eric Wong
What kinds of results are impossible for safety research, and what pathways forward can we hope to achieve? First, we will discuss theoretical results on rule following that demonstrate token-level jailbreaks as an architectural inevitability of attention (LogicBreaks). While initially pessimistic, these theoretical insights can also be leveraged to steer models to state-of-the-art performance in five lines of code (InstABoost). Lastly, we will argue for a shift in safety strategy away from aligning model weights and toward stateful monitoring, as the only level at which one can hope to stop misuse (...
Host: Maksym Andriushchenko
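
For readers curious what this kind of attention steering can look like, here is a minimal, self-contained sketch of the general idea: multiplicatively boosting attention to instruction tokens and renormalizing. It is an illustration under our own assumptions, not the released InstABoost code; the function name and boost factor are made up.

```python
import torch
import torch.nn.functional as F

def boosted_attention(q, k, v, instr_mask, alpha=5.0):
    """Single-head attention that up-weights instruction tokens.

    q, k, v: (seq, dim) tensors; instr_mask: (seq,) bool marking which key
    positions belong to the instruction; alpha=1.0 recovers vanilla attention.
    """
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)        # (seq, seq) logits
    probs = F.softmax(scores, dim=-1)
    boost = torch.ones(k.shape[0])
    boost[instr_mask] = alpha                        # scale instruction columns
    probs = probs * boost
    probs = probs / probs.sum(dim=-1, keepdim=True)  # renormalize each row
    return probs @ v

# Tiny smoke test: the first three tokens play the role of the instruction.
seq, dim = 8, 16
q, k, v = (torch.randn(seq, dim) for _ in range(3))
mask = torch.zeros(seq, dtype=torch.bool)
mask[:3] = True
print(boosted_attention(q, k, v, mask).shape)  # torch.Size([8, 16])
```
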
AI Safety and Alignment Talk (13-10-2025)
Stress Testing Deliberative Alignment for Anti-Scheming Training
Speaker: Alexander Meinke (Tübingen)
A core challenge in aligning powerful, goal-directed AI is the convergent incentive for an agent to preserve its own objectives against modification. A sufficiently capable model may therefore learn to 'scheme' by strategically appearing aligned when under oversight in order to avoid goal modification. In our latest work, we collaborated with OpenAI to study whether we can train models not to scheme by teaching o3 and o4-mini to avoid covert actions through deliberative alignment. The training reduces but does not eliminate covert behavior, and we show that some of the improvement comes from...
Host: Maksym Andriushchenko

Social Foundations of Computation Talk (25-09-2025)
How benchmarking broke in the LLM era and what to salvage (IMPRS-IS Keynote Lecture)
Speaker: Moritz Hardt
Benchmarking is a process of continual improvement through competitive testing, central to engineering communities. Although benchmarking has long fueled progress in machine learning, a crisis is growing around the evaluation of recent generative models. In this talk, I'll discuss the causes of this crisis and how to achieve valid model comparisons and, by extension, valid model rankings. Currently, different benchmarks yield contradictory comparisons, even when targeting the same task. Multi-task benchmarks exacerbate ranking disagreements, as do attempts to scale up evaluation. Toward diagnosing the pr...
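
To make "contradictory comparisons" concrete: one way to quantify how much two benchmarks agree on a model ranking is a rank correlation such as Kendall's tau. The scores below are invented for illustration; they are not from the talk or any real leaderboard.

```python
from scipy.stats import kendalltau

# Hypothetical accuracies of five models on two benchmarks that claim to
# target the same task; the numbers are made up.
bench_a = {"m1": 0.81, "m2": 0.78, "m3": 0.74, "m4": 0.70, "m5": 0.66}
bench_b = {"m1": 0.62, "m2": 0.71, "m3": 0.65, "m4": 0.69, "m5": 0.55}

models = sorted(bench_a)  # fixed model order for both score lists
tau, _ = kendalltau([bench_a[m] for m in models],
                    [bench_b[m] for m in models])
print(f"Kendall tau between benchmark rankings: {tau:.2f}")
# tau = 1.0 would mean identical rankings; values well below 1 mean the
# two benchmarks disagree about which models are better.
```
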