Events & Talks

Social Foundations of Computation Talk Bilal Zafar 25-11-2025 On Counterfactual Reasoning Abilities of LLMs Benchmark results suggest that LLMs can match or even surpass human performance across a range of tasks. Do these impressive benchmark statistics reflect genuine understanding? In this talk, I will discuss some ongoing work that probes LLMs’ understanding through their ability to generate and evaluate counterfactual examples. We find that while LLMs are highly accurate on standard versions of benchmarks like GSM8K and FolkTexts, they often struggle to generate counterfactual versions of the inputs. Even when they do, their subsequent prediction often does not agree with their own counterfac... Moritz Hardt

Social Foundations of Computation Talk Elad Hazan 03-11-2025 Learning in Dynamical Systems Learning in dynamical systems is a fundamental challenge underlying modern sequence modeling. Despite extensive study, efficient algorithms with formal guarantees for general nonlinear systems have remained elusive. This talk presents a provably efficient framework for online learning in any bounded and Lipschitz nonlinear dynamical system, establishing the first sublinear regret guarantees in a dimension-free setting. Our approach combines Koopman lifting, Luenberger observers, and, crucially, spectral filtering to show that nonlinear dynamics are learnable. These insights motivate a new n... Moritz Hardt

Social Foundations of Computation Talk Moritz Hardt 25-09-2025 How benchmarking broke in the LLM era and what to salvage IMPRS-IS Keynote Lecture by Moritz Hardt Benchmarking is a process of continual improvement through competitive testing, central to engineering communities. Although benchmarking has long fueled progress in machine learning, there’s a growing crisis about recent generative models. In this talk, I'll discuss the causes of this crisis and how to achieve valid model comparisons—and, by extension, valid model rankings. Currently, different benchmarks yield contradictory comparisons, even when targeting the same task. Multi-task benchmarks exacerbate ranking disagreements, as do attempts to scale up evaluation. Toward diagnosing the pr...

Social Foundations of Computation Talk David Blei 30-06-2025 Hierarchical Causal Models Analyzing nested data with hierarchical models is a staple of Bayesian statistics, but causal modeling remains largely focused on “flat” models. In this talk, we will explore how to think about nested data in causal models, and we will consider the advantages of nested data over aggregate data (such as means) for causal inference. We show that disaggregating your data (replacing a flat causal model with a hierarchical causal model) can provide new opportunities for identification and estimation. As examples, we will study how to identify and estimate causal effects under unmeasured confounder... Moritz Hardt

Social Foundations of Computation Talk Bryan Wilder 03-04-2025 Predictive vs causal targeting of social interventions Machine learning is increasingly used to inform which people receive limited interventions in a wide range of domains, including healthcare, human services, education, development, and more. What is the right quantity for such models to predict? Moritz Hardt

Social Foundations of Computation Talk Jason Hartline 24-03-2025 Optimization of Scoring Rules Scoring rules are everywhere. Any decision problem where an agent has beliefs about an unknown state and takes an action and realizes payoffs according to the action and the realized state is a scoring rule. Behavioral subjects in experiments are evaluated and rewarded according to scoring rules. Machine learning algorithms are trained and evaluated according to scoring rules. Students' coursework is graded according to scoring rules. Moritz Hardt

Social Foundations of Computation Talk Stratis Tsirtsis 11-02-2025 - 11-03-2025 Counterfactual Token Generation in Large Language Models Imagine the following story, generated by a large language model: “Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth—she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself.” Now, let’s conduct a thought experiment: how would the story have unfolded if the model had chosen “Captain Maeve” as the protagonist instead? Moritz Hardt

Social Foundations of Computation Talk Dr. Krishna P. Gummadi 19-11-2024 Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs) The talk will begin with a short demo of an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable (in)effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some curiosity-driven questions about LLMs; specifically, how do LLMs receive, process, organize, store, and retrieve information. Moritz Hardt

Social Foundations of Computation Talk Kate Donahue 27-08-2024 AI as a resource: strategy, uncertainty, and societal welfare In recent years, humanity has been faced with a new resource: artificial intelligence. AI can be a boon to society, but it can also have negative impacts, especially with inappropriate use. My research agenda studies the societal impact of AI, particularly focusing on AI as a resource and on the strategic decisions that agents make in deciding how to use it. Ana-Andreea Stoica

Social Foundations of Computation Talk Zachary Robertson 16-07-2024 Towards Scalable Information Elicitation for Oversight in Human-AI Systems The growing complexity of AI outputs, particularly those generated by large language models, poses challenges for comprehensive human oversight. In this work, we propose a scalable information elicitation mechanism to incentivize truthful and consistent reasoning in human-AI systems. Our approach leverages pre-trained language models to estimate mutual information between agent outputs using the Difference of Entropies (DoE) estimator. Through theoretical analysis, we demonstrate the mechanism's incentive-compatibility properties and examine the scaling laws of its implementability. We eval... Moritz Hardt

Social Foundations of Computation Talk Evimaria Terzi 01-07-2024 Beyond accuracy: understanding the performance of LLMs on exams designed for humans Many recent studies of LLM performance have focused on the ability of LLMs to achieve outcomes comparable to humans on academic and professional exams. However, it is not clear whether such studies shed light on the extent to which models show reasoning ability, and there is controversy about the significance and implications of such results. We seek to look more deeply into the question of how and whether the performance of LLMs on exams designed for humans reflects true aptitude inherent in LLMs. We do so by making use of the tools of psychometrics which are designed to perform meanin... Ana-Andreea Stoica

Social Foundations of Computation Talk Nathan Kallus 24-06-2024 The Unreasonable Effectiveness of Distributional Reinforcement Learning Distributional Reinforcement Learning (RL) learns the whole conditional distribution of rewards-to-go, given current state and action, but then only ever looks at the mean (e.g., C51, IQN). While this appears inefficient on its face, empirically it often improves on analogous approaches (e.g., DQN) that directly learn just the conditional mean (i.e., the Q-function). A principled understanding as to why and when this happens has been elusive. Moritz Hardt

Social Foundations of Computation Talk Lili Xu 19-02-2024 High-stakes decisions from low-quality data: AI decision-making for planetary health Planetary health is an emerging field which recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Underlying these global challenges is the urgent need to effectively allocate scarce resources. My research develops data-driven AI decision-making methods to do so, overcoming the messy data ubiquitous in these settings. Here,... Ana-Andreea Stoica

Social Foundations of Computation Talk Fernando P. Santos 28-06-2023 The impact of link recommendation algorithms on opinion dynamics Online social networks are increasingly central in shaping our political opinions. These are also prime spaces where humans co-exist with AI: algorithms to personalize contents and provide recommendations are pervasive in online platforms. Link recommendation algorithms (also known as social recommendation systems) are used to recommend new connections — e.g., friends or users to follow — based on supposed familiarity, similar interests, or the potential to serve as a source of useful information. These algorithms impact the evolution of social networks’ topology, yet their long-term impact... Celestine Mendler-Dünner

Social Foundations of Computation Talk Prof. Dr. Carsten Eickhoff 05-04-2023 Retrieval-Powered Zero Shot Text Classification Unstructured data, especially in the form of natural language text, is one of the most prevalent and rapidly growing information types available to humankind. Unlocking the (often hidden) potential of such resources via natural language processing and understanding techniques can greatly support, or altogether enable, an exciting range of downstream applications. In this talk, I will give a brief high-level overview of ongoing NLP and IR efforts in the Health NLP lab, before moving on to an investigation of zero-shot text classification in a diagnostic decision support setting. More than m... Moritz Hardt