Publications

Empirical Inference Conference Paper A data and task-constrained mechanistic model of the mouse outer retina shows robustness to contrast variations Kadhim, K. L., Beck, J., Huang, Z., Macke, J. H., Rieke, F., Euler, T., Deistler, M., Berens, P. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) bioRxiv BibTeX

Empirical Inference Conference Paper Are Language Models Efficient Reasoners? A Perspective from Logic Programming Opedal, A., Zengaffinen, Y., Shirakami, H., Pasti, C., Sachan, M., Saparov, A., Cotterell, R., Schölkopf, B. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection Lalwani*, A., Kim*, T., Chopra, L., Hahn, C., Jin, Z., Sachan, M. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 132-147, (Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh), The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, IJCNLP & AACL, December 2025, *equal contribution (Published)
Translating natural language into formal language such as First-Order Logic (FOL) is a foundational challenge in NLP with wide-ranging applications in automated reasoning, misinformation tracking, and knowledge validation. In this paper, we introduce Natural Language to First-Order Logic (NL2FOL), a framework to autoformalize natural language to FOL step-by-step using Large Language Models (LLMs). Our approach addresses key challenges in this translation process, including the integration of implicit background knowledge. By leveraging structured representations generated by NL2FOL, we use Satisfiability Modulo Theory (SMT) solvers to reason about the logical validity of natural language statements. We present logical fallacy detection as a case study to evaluate the efficacy of NL2FOL. Being neurosymbolic, our approach also provides interpretable insights into the reasoning process and demonstrates robustness without requiring model fine-tuning or labeled training data. Our framework achieves good performance on multiple datasets: on the Logic dataset, NL2FOL achieves an F1-score of 78%, while generalizing effectively to the LogicClimate dataset with an F1-score of 80%.
DOI URL BibTeX
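
The validity check described in the abstract above can be illustrated with a small sketch. It is not the NL2FOL pipeline and does not call an SMT solver: as a stand-in, it assumes the statement has already been formalized into propositional premises and a conclusion, and it searches every truth assignment for a countermodel (all premises true, conclusion false). The helper names `implies` and `find_countermodel` are hypothetical and chosen for illustration only.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def find_countermodel(premises, conclusion, variables):
    """Return an assignment that makes every premise true and the
    conclusion false, or None if no such assignment exists (valid argument)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return env
    return None


# Modus ponens: P -> Q, P, therefore Q (a valid argument form).
modus_ponens = find_countermodel(
    premises=[lambda e: implies(e["P"], e["Q"]), lambda e: e["P"]],
    conclusion=lambda e: e["Q"],
    variables=["P", "Q"],
)

# Affirming the consequent: P -> Q, Q, therefore P (a classic fallacy).
affirming_consequent = find_countermodel(
    premises=[lambda e: implies(e["P"], e["Q"]), lambda e: e["Q"]],
    conclusion=lambda e: e["P"],
    variables=["P", "Q"],
)

print("modus ponens countermodel:", modus_ponens)
print("affirming the consequent countermodel:", affirming_consequent)
```

A returned countermodel plays the role of the interpretable evidence that an argument is fallacious. An SMT solver answers the same satisfiability question for first-order formulas, where enumeration like this is not possible.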

Empirical Inference Conference Paper CauSciBench: Assessing LLM Causal Reasoning for Scientific Research Acharya, S., Zhang, T. J., Kim, A., Haghighat, A., Sun, X., Shrestha, R. B., Mordig, M., Danisman, F., Jose, C., Qi, Y., Cobben, P., Schölkopf, B., Sachan, M., Jin, Z. NeurIPS 2025: 5th Workshop on Mathematical Reasoning and AI (Math-AI) and CauScien Workshop, December 2025 (Published) URL BibTeX

Empirical Inference Conference Paper Counterfactual reasoning: an analysis of in-context emergence Miller, M., Schölkopf, B., Guo, S. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Cultural Alien Sampler: Open-ended art generation balancing originality and coherence Hernandez, A., Yakura, H., Brinkmann, L., Sola, M. C., Alhaija, H. A., Serna, I., Rahaman, N., Schölkopf, B., Rahwan, I. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, Creative AI Track, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Do-PFN: In-Context Learning for Causal Effect Estimation Robertson*, J., Reuter*, A., Guo, S., Hollmann, N., Hutter, F., Schölkopf, B. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025, *equal contribution (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models Vetter, J., Gloeckler, M., Gedon, D., Macke, J. H. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators Moss, G., Muhle, L. S., Drews, R., Macke, J. H., Schröder, C. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Forecasting in Offline Reinforcement Learning for Non-stationary Environments Ada, S. E., Martius, G., Ugur, E., Oztop, E. In Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Identifying multi-compartment Hodgkin-Huxley models with high-density extracellular voltage recordings Tanoh, I. C., Deistler, M., Macke, J. H., Linderman, S. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries Ceraolo*, R., Kharlapenko*, D., Khan*, A., Reymond, A., Mihalcea, R., Schölkopf, B., Sachan, M., Jin, Z. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 534-563, (Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh), The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, IJCNLP & AACL, December 2025, *equal contribution (Published)
Recent progress in Large Language Model (LLM) technology has changed our role in interacting with these models. Instead of primarily testing these models with questions we already know answers to, we are now using them for queries where the answers are unknown to us, driven by human curiosity. This shift highlights the growing need to understand curiosity-driven human questions: those that are more complex, open-ended, and reflective of real-world needs. To this end, we present Quriosity, a collection of 13K naturally occurring questions from three diverse sources: human-to-search-engine queries, human-to-human interactions, and human-to-LLM conversations. Our comprehensive collection enables a rich understanding of human curiosity across various domains and contexts. Our analysis reveals a significant presence of causal questions (up to 42%) in the dataset, for which we develop an iterative prompt improvement framework to identify all causal queries and examine their unique linguistic properties, cognitive complexity and source distribution. We also lay the groundwork for exploring efficient identifiers of causal questions, providing six efficient classification models.
DOI URL BibTeX

Empirical Inference Conference Paper Reparameterized LLM Training via Orthogonal Equivalence Transformation Qiu, Z., Buchholz, S., Xiao, T., Dax, M., Schölkopf, B., Liu, W. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Root Cause Analysis of Outliers with Missing Structural Knowledge Orchard, W. R., Okati, N., Garrido Mejia, S., Blöbaum, P., Janzing, D. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper SPARTAN: A Sparse Transformer World Model Attending to What Matters Lei, A., Schölkopf, B., Posner, I. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 39th Annual Conference on Neural Information Processing Systems, December 2025 (Accepted) arXiv BibTeX

Empirical Inference Conference Paper Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models Choi*, Y., Li*, C., Yang, Y., Jin, Z. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 28895-28928, (Editors: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet), Association for Computational Linguistics, EMNLP, November 2025, *equal contribution (Published)
As large language models (LLMs) are increasingly integrated into multi-agent and human-AI systems, understanding their awareness of both self-context and conversational partners is essential for ensuring reliable performance and robust safety. While prior work has extensively studied situational awareness, which refers to an LLM’s ability to recognize its operating phase and constraints, it has largely overlooked the complementary capacity to identify and adapt to the identity and characteristics of a dialogue partner. In this paper, we formalize this latter capability as interlocutor awareness and present the first systematic evaluation of its emergence in contemporary LLMs. We examine interlocutor inference across three dimensions (reasoning patterns, linguistic style, and alignment preferences) and show that LLMs reliably identify same-family peers and certain prominent model families, such as GPT and Claude. To demonstrate its practical significance, we develop three case studies in which interlocutor awareness both enhances multi-LLM collaboration through prompt adaptation and introduces new alignment and safety vulnerabilities, including reward-hacking behaviors and increased jailbreak susceptibility. Our findings highlight the dual promise and peril of identity-sensitive behavior in LLMs, underscoring the need for further understanding of interlocutor awareness and new safeguards in multi-agent deployments.
DOI URL BibTeX

Empirical Inference Conference Paper Are Language Models Consequentialist or Deontological Moral Reasoners? Samway, K., Kleiman-Weiner, M., Guzman Piedrahita, D., Mihalcea, R., Schölkopf, B., Jin, Z. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 30699-30726, (Editors: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet), Association for Computational Linguistics, EMNLP, November 2025 (Published)
As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments.
DOI URL BibTeX

Empirical Inference Conference Paper Improving Large Language Model Safety with Contrastive Representation Learning Simko, S., Sachan, M., Schölkopf, B., Jin, Z. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 28166-28194, (Editors: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet), Association for Computational Linguistics, November 2025 (Published) arXiv DOI URL BibTeX

Empirical Inference Conference Paper Orthogonal Finetuning Made Scalable Qiu*, Z., Liu*, W., Weller, A., Schölkopf, B. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 31946-31963, (Editors: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet), Association for Computational Linguistics, EMNLP, November 2025, *equal contribution (Published)
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley–Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
DOI URL BibTeX
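
The idea of replacing the matrix inverse in the Cayley transform with a truncated Neumann series can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the OFTv2 implementation: the convention Q = (I + A)(I - A)^{-1} for a skew-symmetric A, the number of series terms, and the rescaling of A to spectral norm 0.1 (so the series converges quickly) are all choices made for this example, and the function names are hypothetical.

```python
import numpy as np


def cayley_exact(A):
    """Cayley transform Q = (I + A)(I - A)^{-1} with an explicit inverse."""
    I = np.eye(A.shape[0])
    return (I + A) @ np.linalg.inv(I - A)


def cayley_neumann(A, num_terms):
    """Same transform, with (I - A)^{-1} approximated by the truncated
    Neumann series I + A + A^2 + ... + A^num_terms."""
    I = np.eye(A.shape[0])
    inv_approx = I.copy()
    power = I.copy()
    for _ in range(num_terms):
        power = power @ A          # next power of A
        inv_approx = inv_approx + power
    return (I + A) @ inv_approx


rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M - M.T                         # skew-symmetric: A.T == -A
A *= 0.1 / np.linalg.norm(A, 2)     # assumed scaling: spectral norm 0.1

Q_exact = cayley_exact(A)
Q_approx = cayley_neumann(A, num_terms=5)

# Deviation from orthogonality of the exact transform, and the
# approximation error introduced by truncating the series.
orth_err_exact = np.linalg.norm(Q_exact.T @ Q_exact - np.eye(8))
approx_err = np.linalg.norm(Q_exact - Q_approx)
print(f"orthogonality error (exact): {orth_err_exact:.2e}")
print(f"truncation error (5 terms):  {approx_err:.2e}")
```

The series converges only when the spectral radius of A is below 1, and the truncation error shrinks geometrically with the number of terms, which is why a small-norm A is assumed here.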

Empirical Inference Article In silico biological discovery with large perturbation models Miladinovic*, D., Höppe*, T., Chevalley, M., Georgiou, A., Stuart, L., Mehrjou, A., Bantscheff, M., Schölkopf, B., Schwab, P. Nature Computational Science, October 2025, *equal contribution (Published)
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks—from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here we present the large perturbation model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene–gene interaction networks. LPM learns meaningful joint representations of perturbations, readouts and contexts, enables the study of biological relationships in silico and could considerably accelerate the derivation of insights from pooled perturbation experiments.
DOI URL BibTeX

Empirical Inference Conference Paper Corrupted by reasoning: Reasoning language models become free-riders in public goods games Guzman Piedrahita, D., Yang, Y., Sachan, M., Ramponi, G., Schölkopf, B., Jin, Z. Second Conference on Language Modeling (COLM 2025), October 2025 (Published) arXiv URL BibTeX

Haptic Intelligence Autonomous Learning Empirical Inference Conference Paper Adding Internal Audio Sensing to Internal Vision Enables Human-Like In-Hand Fabric Recognition with Soft Robotic Fingertips Andrussow, I., Solano, J., Richardson, B. A., Martius, G., Kuchenbecker, K. J. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (Humanoids), 373-380, Seoul, South Korea, September 2025 (Published)
Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations that are integrated to form a haptic representation of the explored material. It is challenging to reproduce this rich, dynamic perceptual capability in robots because tactile sensors typically cannot achieve both high spatial resolution and high temporal sampling rate. In this work, we present a system that can sense both types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand's middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor that uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound that captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Our results test the influence of each sensing modality on overall classification performance, showing high utility for the audio-based sensor. Our transformer-based method achieves a maximum fabric classification accuracy of 97% on a dataset of 20 common fabrics. Incorporating an external microphone away from Minsound increases our method's robustness in loud ambient noise conditions. To show that this audio-visual tactile sensing approach generalizes beyond the training data, we learn general representations of fabric stretchiness, thickness, and roughness.
DOI BibTeX

Empirical Inference Conference Paper Active Fine-Tuning of Multi-Task Policies Bagatella, M., Hübotter, J., Martius, G., Krause, A. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:2409-2441, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) arXiv URL BibTeX

Empirical Inference Deep Models and Optimization Conference Paper Generalized Interpolating Discrete Diffusion von Rütte, D., Fluri, J., Ding, Y., Orvieto, A., Schölkopf, B., Hofmann, T. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:61810-61843, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Generative Intervention Models for Causal Perturbation Modeling Schneider, N., Lorch, L., Kilbertus, N., Schölkopf, B., Krause, A. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:53388-53412, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Kekić, A., Garrido Mejia, S., Schölkopf, B. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:29651-29669, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Position: Probabilistic Modelling is Sufficient for Causal Inference Mlodozeniec, B. K., Krueger, D., Turner, R. E. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:81810-81840, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Probabilistic Machine Learning for Real-Time Gravitational-Wave Inference Dax, M. Eberhard Karls Universität Tübingen, July 2025, (MPI IS + ELLIS Institute Tübingen) (Published) BibTeX

Empirical Inference Conference Paper Progressive Tempering Sampler with Diffusion Rissanen*, S., OuYang*, R., He*, J., Chen, W., Heinonen, M., Solin, A., Hernández-Lobato, J. M. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:51724-51746, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025, *equal contribution (Published) arXiv URL BibTeX

Empirical Inference Autonomous Learning Conference Paper SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models Sancaktar, C., Gumbsch, C., Zadaianchuk, A., Kolev, P., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52745-52777, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), International Conference on Machine Learning, July 2025 (Published) arXiv Project website URL BibTeX

Empirical Inference Conference Paper Scalable Gaussian Processes with Latent Kronecker Structure Lin, J. A., Ament, A., Balandat, M., Eriksson, D., Hernández-Lobato, J. M., Bakshy, E. Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:37730-37744, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published) arXiv URL BibTeX

Autonomous Learning Empirical Inference Conference Paper Zero-Shot Offline Imitation Learning via Optimal Transport Rupf, T., Bagatella, M., Gürtler, N., Frey, J., Martius, G. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 267:52345-52381, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry), PMLR, International Conference on Machine Learning, July 2025 (Published)
Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.
arXiv URL BibTeX

Empirical Inference Article Flow annealed importance sampling bootstrap meets differentiable particle physics Kofler, A., Stimper, V., Mikhasenko, M., Kagan, M., Heinrich, L. Machine Learning: Science and Technology, 6(2), IOP Publishing, June 2025 (Published)
High-energy physics requires the generation of large numbers of simulated data samples from complex but analytically tractable distributions called matrix elements. Surrogate models, such as normalizing flows, are gaining popularity for this task due to their computational efficiency. We adopt an approach based on Flow Annealed importance sampling Bootstrap (FAB) that evaluates the differentiable target density during training and helps avoid the costly generation of training data in advance. We show that FAB reaches higher sampling efficiency with fewer target evaluations in high dimensions in comparison to other methods.
DOI URL BibTeX
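
For readers unfamiliar with the notion of sampling efficiency, the sketch below shows how the quality of a surrogate sampler can be scored with importance weights. It is plain importance sampling on a one-dimensional toy problem, not FAB and not a normalizing flow. The standard-normal target, the Gaussian proposal standing in for a trained surrogate, and the use of the Kish effective sample size as the efficiency measure are all assumptions made for illustration; the paper may use a different metric.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (toy stand-in for a
    tractable target density that can be evaluated pointwise)."""
    return -0.5 * x * x


def log_proposal(x, sigma):
    """Log-density of the Gaussian proposal N(0, sigma^2), playing the role
    of the surrogate model."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def effective_sample_size(sigma, n, seed=0):
    """Draw n proposal samples and return the Kish effective sample size
    (sum w)^2 / sum w^2 of their importance weights."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    log_w = [log_target(x) - log_proposal(x, sigma) for x in xs]
    shift = max(log_w)                      # subtract max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(w) ** 2 / sum(wi * wi for wi in w)


n = 5000
ess_matched = effective_sample_size(sigma=1.0, n=n)     # proposal equals target
ess_mismatched = effective_sample_size(sigma=3.0, n=n)  # proposal too wide

print(f"ESS fraction, matched proposal:    {ess_matched / n:.3f}")
print(f"ESS fraction, mismatched proposal: {ess_mismatched / n:.3f}")
```

A proposal that matches the target gives nearly constant weights and an effective sample size close to the number of draws, while a mismatched proposal wastes a large share of the target evaluations.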

Empirical Inference Conference Paper Temporally Consistent Object-Centric Learning by Contrasting Slots Manasyan, A., Seitzer, M., Radovic, F., Martius, G., Zadaianchuk, A. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5401-5411, June 2025 (Published) DOI BibTeX

Empirical Inference Conference Paper VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models Ye, M., Liu, W., He, P. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8679-8688, June 2025 (Published) DOI BibTeX

Empirical Inference Perceiving Systems Conference Paper ChatHuman: Chatting about 3D Humans with Tools Lin, J., Feng, Y., Liu, W., Black, M. J. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8150-8161, June 2025 (Published)
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including 3D pose, shape, contact, human-object interaction, and emotion. While widely applicable in vision and other areas, such methods require expert knowledge to select, use, and interpret the results. To address this, we introduce ChatHuman, a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks, adeptly discussing and resolving related challenges. Built on a Large Language Model (LLM) framework, ChatHuman is trained to autonomously select, apply, and interpret a diverse set of tools in response to user inputs. Our approach overcomes significant hurdles in adapting LLMs to 3D human tasks, including the need for domain-specific knowledge and the ability to interpret complex 3D outputs. The innovations of ChatHuman include leveraging academic publications to instruct the LLM on tool usage, employing a retrieval-augmented generation model to create in-context learning examples for managing new tools, and effectively discriminating between and integrating tool results by transforming specialized 3D outputs into comprehensible formats. Experiments demonstrate that ChatHuman surpasses existing models in both tool selection accuracy and overall performance across various 3D human tasks, and it supports interactive chatting with users. ChatHuman represents a significant step toward consolidating diverse analytical methods into a unified, robust system for 3D human tasks.
project pdf Paper DOI BibTeX

Empirical Inference Conference Paper Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation Sanyal, A., Hu, Y., Yu, Y., Ma, Y., Wang, Y., Schölkopf, B. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:2170-2178, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning Lin, J. University of Cambridge, UK, May 2025, (Cambridge-Tübingen-Fellowship-Program) (Published) BibTeX

Empirical Inference Ph.D. Thesis The Geometry of Learning Via Loss Landscape Curvature Singh, S. P. ETH Zurich, Switzerland, May 2025, CLS Fellowship Program (Published) BibTeX

Empirical Inference Conference Paper Training Neural Samplers with Reverse Diffusive KL Divergence He*, J., Chen*, W., Zhang*, M., Barber, D., Hernández-Lobato, J. M. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:5167-5175, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector Zhang, A., Xiao, T. Z., Liu, W., Bamler, R., Wischik, D. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:2701-2709, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025 (Published) URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Advancing Out-of-Distribution Detection via Local Neuroplasticity Canevaro, A., Schmidt, J., Marvi, M. S., Yu, H., Martius, G., Jordan, J. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Can Large Language Models Understand Symbolic Graphics Programs? Qiu, Z., Liu, W., Feng, H., Liu, Z., Xiao, T. Z., Collins, K. M., Tenenbaum, J. B., Weller, A., Black, M. J., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)
Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM’s ability to answer semantic questions about the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to “imagine” and reason how the corresponding graphics content would look with only the symbolic description of the local curvatures and strokes. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability – Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. Interestingly, we find that SIT not only improves LLM’s understanding on symbolic programs, but it also improves general reasoning ability on various other benchmarks.
arXiv Paper BibTeX

Empirical Inference Conference Paper Compositional simulation-based inference for time series Gloeckler*, M., Toyota*, S., Fukumizu, K., Macke, J. H. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Robust Machine Learning Conference Paper Cross-Entropy Is All You Need to Invert the Data Generating Process Reizinger*, P., Bizeul*, A., Juhos*, A., Vogt, J. E., Balestriero, R., Brendel, W., Klindt, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *Joint first authorship (Published) arXiv BibTeX

Empirical Inference Conference Paper Differentially private steering for Large language model alignment Goel, A., Hu, Y., Gurevych, I., Sanyal, A. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets Liu, Z., Xiao, T. Z., Liu, W., Bengio, Y., Zhang, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)
While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetune pretrained diffusion models with some reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. Inspired by recent successes in generative flow networks (GFlowNets), a class of probabilistic models that sample with the unnormalized density of a reward function, we propose a novel GFlowNet method dubbed Nabla-GFlowNet (abbreviated as ∇-GFlowNet), the first GFlowNet method that leverages the rich signal in reward gradients, together with an objective called ∇-DB plus its variant residual ∇-DB designed for prior-preserving diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.
arXiv BibTeX