On Counterfactual Reasoning Abilities of LLMs
Benchmark results suggest that LLMs can match or even surpass human performance across a range of tasks. Do these impressive benchmark statistics reflect genuine understanding? In this talk, I will discuss ongoing work that probes LLMs’ understanding through their ability to generate and evaluate counterfactual examples. We find that while LLMs are highly accurate on standard versions of benchmarks like GSM8K and FolkTexts, they often struggle to generate counterfactual versions of the inputs. Even when they succeed, their subsequent predictions often do not agree with their own counterfactual reasoning. We also find that the hidden states of LLMs can be indicative of counterfactual performance.
Speaker Biography
Bilal Zafar (Ruhr University Bochum and Research Center for Trustworthy Data Science and Security)
Professor of Computer Science
Bilal Zafar is a Professor at Ruhr University Bochum and the UAR Research Center for Trustworthy Data Science and Security. He is also a Principal Investigator at the CASA Cluster of Excellence. Previously, he worked as a Senior Applied Scientist at Amazon Web Services and as a Research Scientist at the Bosch Center for AI. His work brings together insights from multiple disciplines to build a practical, useful understanding of AI/ML trustworthiness. His work has been recognized with a Best Paper Honorable Mention award at WWW and an Otto Hahn Medal from the Max Planck Society.