Events & Talks
Social Foundations of Computation
Talk
Bilal Zafar
25-11-2025
On Counterfactual Reasoning Abilities of LLMs
Benchmark results suggest that LLMs can match or even surpass human performance across a range of tasks. Do these impressive benchmark statistics reflect genuine understanding? In this talk, I will discuss some ongoing work that probes LLMs’ understanding through their ability to generate and evaluate counterfactual examples. We find that while LLMs are highly accurate on standard versions of benchmarks like GSM8K and FolkTexts, they often struggle to generate counterfactual versions of the inputs. Even when they do, their subsequent prediction often does not agree with their own counterfac...
Moritz Hardt
Social Foundations of Computation
Talk
Elad Hazan
03-11-2025
Learning in Dynamical Systems
Learning in dynamical systems is a fundamental challenge underlying modern sequence modeling. Despite extensive study, efficient algorithms with formal guarantees for general nonlinear systems have remained elusive. This talk presents a provably efficient framework for online learning in any bounded and Lipschitz nonlinear dynamical system, establishing the first sublinear regret guarantees in a dimension-free setting. Our approach combines Koopman lifting, Luenberger observers, and, crucially, spectral filtering to show that nonlinear dynamics are learnable. These insights motivate a new n...
Moritz Hardt
Social Foundations of Computation
Talk
Moritz Hardt
25-09-2025
How benchmarking broke in the LLM era and what to salvage
IMPRS-IS Keynote Lecture by Moritz Hardt
Benchmarking is a process of continual improvement through competitive testing, central to engineering communities. Although benchmarking has long fueled progress in machine learning, there’s a growing crisis about recent generative models. In this talk, I'll discuss the causes of this crisis and how to achieve valid model comparisons—and, by extension, valid model rankings. Currently, different benchmarks yield contradictory comparisons, even when targeting the same task. Multi-task benchmarks exacerbate ranking disagreements, as do attempts to scale up evaluation. Toward diagnosing the pr...
Social Foundations of Computation
Talk
David Blei
30-06-2025
Hierarchical Causal Models
Analyzing nested data with hierarchical models is a staple of Bayesian statistics, but causal modeling remains largely focused on “flat” models. In this talk, we will explore how to think about nested data in causal models, and we will consider the advantages of nested data over aggregate data (such as means) for causal inference. We show that disaggregating your data, that is, replacing a flat causal model with a hierarchical causal model, can provide new opportunities for identification and estimation. As examples, we will study how to identify and estimate causal effects under unmeasured confounder...
Moritz Hardt
Social Foundations of Computation
Talk
Bryan Wilder
03-04-2025
Predictive vs causal targeting of social interventions
Machine learning is increasingly used to inform which people receive limited interventions in a wide range of domains, including healthcare, human services, education, development, and more. What is the right quantity for such models to predict?
Moritz Hardt
Social Foundations of Computation
Talk
Jason Hartline
24-03-2025
Optimization of Scoring Rules
Scoring rules are everywhere. Any decision problem in which an agent holds beliefs about an unknown state, takes an action, and realizes a payoff determined by the action and the realized state is a scoring rule. Behavioral subjects in experiments are evaluated and rewarded according to scoring rules. Machine learning algorithms are trained and evaluated according to scoring rules. Students' coursework is graded according to scoring rules.
Moritz Hardt
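The abstract's claim that a decision problem is a scoring rule can be illustrated with a small sketch. It is not taken from the talk: the umbrella payoff table, the function names, and the belief values are all hypothetical. The agent reports a belief, the best action for that report is taken, and the score is the payoff of that action in the realized state. Reporting the true belief then maximizes the expected score.

```python
from typing import Callable, List, Sequence

Payoff = Callable[[int, int], float]  # payoff(action, state)


def best_action(payoff: Payoff, actions: Sequence[int], belief: Sequence[float]) -> int:
    """Action with the highest expected payoff under the given belief over states."""
    return max(actions, key=lambda a: sum(p * payoff(a, s) for s, p in enumerate(belief)))


def induced_score(payoff: Payoff, actions: Sequence[int], report: Sequence[float], state: int) -> float:
    """Score of a reported belief: payoff of the best response to it in the realized state."""
    return payoff(best_action(payoff, actions, report), state)


def expected_score(payoff: Payoff, actions: Sequence[int],
                   report: Sequence[float], truth: Sequence[float]) -> float:
    """Expected score of a report when states are drawn from the true belief."""
    return sum(p * induced_score(payoff, actions, report, s) for s, p in enumerate(truth))


# Hypothetical decision problem: states 0 = dry, 1 = rain; actions 0 = no umbrella, 1 = umbrella.
PAYOFF_TABLE: List[List[float]] = [[1.0, -1.0], [0.5, 0.5]]
ACTIONS = [0, 1]


def umbrella_payoff(action: int, state: int) -> float:
    return PAYOFF_TABLE[action][state]


truth = [0.7, 0.3]
honest = expected_score(umbrella_payoff, ACTIONS, truth, truth)
misreport = expected_score(umbrella_payoff, ACTIONS, [0.9, 0.1], truth)
print(honest >= misreport)  # truthful reporting does at least as well in expectation
```

The inequality holds by construction: the truthful report selects the action that maximizes expected payoff under the true belief, so no other report can select a better one.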
Social Foundations of Computation
Talk
Stratis Tsirtsis
11-02-2025 - 11-03-2025
Counterfactual Token Generation in Large Language Models
Imagine the following story, generated by a large language model: “Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth—she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself.” Now, let’s conduct a thought experiment: how would the story have unfolded if the model had chosen “Captain Maeve” as the protagonist instead?
Moritz Hardt
Social Foundations of Computation
Talk
Dr. Krishna P. Gummadi
19-11-2024
Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs)
The talk will begin with a short demo of an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable (in)effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some curiosity-driven questions about LLMs; specifically, how do LLMs receive, process, organize, store, and retrieve information.
Moritz Hardt
Social Foundations of Computation
Talk
Kate Donahue
27-08-2024
AI as a resource: strategy, uncertainty, and societal welfare
In recent years, humanity has been faced with a new resource: artificial intelligence. AI can be a boon to society, but it can also have negative impacts, especially when used inappropriately. My research agenda studies the societal impact of AI, focusing on AI as a resource and on the strategic decisions that agents make about how to use it.
Ana-Andreea Stoica
Social Foundations of Computation
Talk
Zachary Robertson
16-07-2024
Towards Scalable Information Elicitation for Oversight in Human-AI Systems
The growing complexity of AI outputs, particularly those generated by large language models, poses challenges for comprehensive human oversight. In this work, we propose a scalable information elicitation mechanism to incentivize truthful and consistent reasoning in human-AI systems. Our approach leverages pre-trained language models to estimate mutual information between agent outputs using the Difference of Entropies (DoE) estimator. Through theoretical analysis, we demonstrate the mechanism's incentive-compatibility properties and examine the scaling laws of its implementability. We eval...
Moritz Hardt
Social Foundations of Computation
Talk
Evimaria Terzi
01-07-2024
Beyond accuracy: understanding the performance of LLMs on exams designed for humans
Many recent studies of LLM performance have focused on the ability of LLMs to achieve outcomes comparable to humans on academic and professional exams. However, it is not clear whether such studies shed light on the extent to which models show reasoning ability, and there is controversy about the significance and implications of such results. We seek to look more deeply into the question of how and whether the performance of LLMs on exams designed for humans reflects true aptitude inherent in LLMs. We do so by making use of the tools of psychometrics which are designed to perform meanin...
Ana-Andreea Stoica
Social Foundations of Computation
Talk
Nathan Kallus
24-06-2024
The Unreasonable Effectiveness of Distributional Reinforcement Learning
Distributional Reinforcement Learning (RL) learns the whole conditional distribution of rewards-to-go, given current state and action, but then only ever looks at the mean (e.g., C51, IQN). While this appears inefficient on its face, empirically it often improves on analogous approaches (e.g., DQN) that directly learn just the conditional mean (i.e., the Q-function). A principled understanding as to why and when this happens has been elusive.
Moritz Hardt
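The observation in the abstract above, that distributional RL learns a full return distribution but acts only on its mean, can be sketched in a few lines. This is an illustration rather than the speaker's method. It assumes a categorical distribution over a fixed grid of return values (the representation C51 uses), and the function names and probabilities are hypothetical.

```python
from typing import List, Sequence


def make_support(v_min: float, v_max: float, n_atoms: int) -> List[float]:
    """Evenly spaced return values ("atoms") between v_min and v_max."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def mean_return(probs: Sequence[float], support: Sequence[float]) -> float:
    """Collapse a categorical return distribution to its mean, i.e. a Q-value."""
    return sum(p * z for p, z in zip(probs, support))


def greedy_action(dists: Sequence[Sequence[float]], support: Sequence[float]) -> int:
    """Pick the action whose return distribution has the largest mean."""
    means = [mean_return(p, support) for p in dists]
    return max(range(len(means)), key=means.__getitem__)


support = make_support(-1.0, 1.0, 5)  # [-1.0, -0.5, 0.0, 0.5, 1.0]

# Hypothetical learned distributions for two actions in one state.
risky = [0.5, 0.0, 0.0, 0.0, 0.5]   # wide spread, mean 0.0
steady = [0.0, 0.0, 0.8, 0.2, 0.0]  # narrow spread, mean 0.1

q_values = [mean_return(risky, support), mean_return(steady, support)]
chosen = greedy_action([risky, steady], support)
print(q_values, chosen)
```

The two distributions have very different shapes, yet action selection sees only the two means, which is the apparent inefficiency the abstract refers to.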
Social Foundations of Computation
Talk
Lili Xu
19-02-2024
High-stakes decisions from low-quality data: AI decision-making for planetary health
Planetary health is an emerging field which recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Underlying these global challenges is the urgent need to effectively allocate scarce resources. My research develops data-driven AI decision-making methods to do so, overcoming the messy data ubiquitous in these settings. Here,...
Ana-Andreea Stoica
Social Foundations of Computation
Talk
Fernando P. Santos
28-06-2023
The impact of link recommendation algorithms on opinion dynamics
Online social networks are increasingly central in shaping our political opinions. These are also prime spaces where humans co-exist with AI: algorithms to personalize contents and provide recommendations are pervasive in online platforms. Link recommendation algorithms (also known as social recommendation systems) are used to recommend new connections — e.g., friends or users to follow — based on supposed familiarity, similar interests, or the potential to serve as a source of useful information. These algorithms impact the evolution of social networks’ topology, yet their long-term impact...
Celestine Mendler-Dünner
Social Foundations of Computation
Talk
Prof. Dr. Carsten Eickhoff
05-04-2023
Retrieval-Powered Zero Shot Text Classification
Unstructured data, especially in the form of natural language text, is one of the most prevalent and rapidly growing information types available to humankind. Unlocking the (often hidden) potential of such resources via natural language processing and understanding techniques can greatly support, or altogether enable, an exciting range of downstream applications. In this talk, I will give a brief high-level overview of ongoing NLP and IR efforts in the Health NLP lab, before moving on to an investigation of zero-shot text classification in a diagnostic decision support setting. More than m...
Moritz Hardt