Publications

Empirical Inference Conference Paper Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector Zhang, A., Xiao, T. Z., Liu, W., Bamler, R., Wischik, D. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 258:2701-2709, Proceedings of Machine Learning Research, (Editors: Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz), PMLR, May 2025 (Published) URL BibTeX

Empirical Inference Autonomous Learning Conference Paper Advancing Out-of-Distribution Detection via Local Neuroplasticity Canevaro, A., Schmidt, J., Marvi, M. S., Yu, H., Martius, G., Jordan, J. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Can Large Language Models Understand Symbolic Graphics Programs? Qiu, Z., Liu, W., Feng, H., Liu, Z., Xiao, T. Z., Collins, K. M., Tenenbaum, J. B., Weller, A., Black, M. J., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)

Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM’s ability to answer semantic questions about the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to “imagine” and reason how the corresponding graphics content would look with only the symbolic description of the local curvatures and strokes. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image-level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability – Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. Interestingly, we find that SIT not only improves LLMs’ understanding of symbolic programs, but it also improves general reasoning ability on various other benchmarks.

Empirical Inference Conference Paper Compositional simulation-based inference for time series Gloeckler*, M., Toyota*, S., Fukumizu, K., Macke, J. H. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Learning and Dynamical Systems Conference Paper Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering Kladny, K., Schölkopf, B., Muehlebach, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv URL BibTeX

Empirical Inference Robust Machine Learning Conference Paper Cross-Entropy Is All You Need to Invert the Data Generating Process Reizinger*, P., Bizeul*, A., Juhos*, A., Vogt, J. E., Balestriero, R., Brendel, W., Klindt, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *Joint first authorship (Published) arXiv BibTeX

Empirical Inference Conference Paper Differentially private steering for Large language model alignment Goel, A., Hu, Y., Gurevych, I., Sanyal, A. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Perceiving Systems Conference Paper Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets Liu, Z., Xiao, T. Z., Liu, W., Bengio, Y., Zhang, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published)

While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetune pretrained diffusion models with some reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. Inspired by recent successes in generative flow networks (GFlowNets), a class of probabilistic models that sample with the unnormalized density of a reward function, we propose a novel GFlowNet method dubbed Nabla-GFlowNet (abbreviated as ∇-GFlowNet), the first GFlowNet method that leverages the rich signal in reward gradients, together with an objective called ∇-DB plus its variant residual ∇-DB designed for prior-preserving diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.

Empirical Inference Robust Machine Learning Conference Paper Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning Reizinger, P., Guo, S., Huszár, F., Schölkopf, B., Brendel, W. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Conference Paper Improving Probabilistic Diffusion Models With Optimal Covariance Matching Ou*, Z., Zhang*, M., Zhang, A., Xiao, T. Z., Li, Y., Barber, D. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper Influence Functions for Scalable Data Attribution in Diffusion Models Mlodozeniec, B. K., Eschenhagen, R., Bae, J., Immer, A., Krueger, D., Turner, R. E. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Robust Machine Learning Conference Paper Interaction Asymmetry: A General Principle for Learning Composable Abstractions Brady, J., von Kügelgen, J., Lachapelle, S., Buchholz, S., Kipf*, T., Brendel*, W. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *joint senior author (Published) arXiv BibTeX

Empirical Inference Conference Paper Language Model Alignment in Multilingual Trolley Problems Jin, Z., Kleiman-Weiner, M., Piatti, G., Levine, S., Liu, J., Gonzalez, F., Ortu, F., Strausz, A., Sachan, M., Mihalcea, R., Choi, Y., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Autonomous Learning Conference Paper On the Transfer of Object-Centric Representation Learning Didolkar, A. R., Zadaianchuk, A., Goyal, A., Mozer, M. C., Bengio, Y., Martius*, G., Seitzer*, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) URL BibTeX

Empirical Inference Conference Paper Preference Elicitation for Offline Reinforcement Learning Pace, A., Schölkopf, B., Rätsch, G., Ramponi, G. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Conference Paper Standardizing Structural Causal Models Ormaniec*, W., Sussex*, S., Lorch*, L., Schölkopf, B., Krause, A. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Empirical Inference Conference Paper The Directionality of Optimization Trajectories in Neural Networks Singh, S. P., He, B., Hofmann, T., Schölkopf, B. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) URL BibTeX

Empirical Inference Article The Fiction Machine Bottou, L., Schölkopf, B. SIAM News, 58(3), April 2025 (Published) URL BibTeX

Empirical Inference Conference Paper What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis Ormaniec, W., Dangel, F., Singh, S. P. The Thirteenth International Conference on Learning Representations (ICLR), April 2025 (Published) arXiv BibTeX

Empirical Inference Conference Paper Why AI Is WEIRD and Should Not Be This Way: Towards AI For Everyone, With Everyone, By Everyone Mihalcea*, R., Ignat*, O., Bai, L., Borah, A., Chiruzzo, L., Jin, Z., Kwizera, C., Nwatu, J., Poria, S., Solorio, T. The Thirty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2025 (Senior Member Presentation Track), (27)28657-28670, (Editors: Toby Walsh, Julie Shah, Zico Kolter), AAAI Press, April 2025, *equal contribution (Published)

This paper presents a vision for creating AI systems that are inclusive at every stage of development, from data collection to model design and evaluation. We address key limitations in the current AI pipeline and its WEIRD* representation, such as lack of data diversity, biases in model performance, and narrow evaluation metrics. We also focus on the need for diverse representation among the developers of these systems, as well as incentives that are not skewed toward certain groups. We highlight opportunities to develop AI systems that are for everyone (with diverse stakeholders in mind), with everyone (inclusive of diverse data and annotators), and by everyone (designed and developed by a globally diverse workforce). *WEIRD = an acronym coined by Joseph Henrich to highlight the coverage limitations of many psychological studies, referring to populations that are Western, Educated, Industrialized, Rich, and Democratic; while we do not fully adopt this term for AI, as its current scope does not perfectly align with the WEIRD dimensions, we believe that today’s AI has a similarly "weird" coverage, particularly in terms of who is involved in its development and who benefits from it.

Empirical Inference Conference Paper MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Opedal*, A., Shirakami*, H., Schölkopf, B., Saparov, A., Sachan, M. The Thirteenth International Conference on Learning Representations (ICLR), April 2025, *equal contribution (Published) arXiv BibTeX

Empirical Inference Article Early warning of complex climate risk with integrated artificial intelligence Reichstein, M., Benson, V., Blunk, J., Camps-Valls, G., Creutzig, F., Fearnley, C. J., Han, B., Kornhuber, K., Rahaman, N., Schölkopf, B., Tárraga, J. M., Vinuesa, R., Dall, K., Denzler, J., Frank, D., Martini, G., Nganga, N., Maddix, D. C., Weldemariam, K. Nature Communications, 16(1), March 2025 (Published) DOI BibTeX

Empirical Inference Ph.D. Thesis Learning to Generalize Across Distribution Shifts Träuble, F. J. University of Tübingen, Germany, March 2025, (IMPRS-PhD-Fellowship-Program and ELLIS-PhD-Fellowship-Program) (Published) BibTeX

Empirical Inference Article Real-time inference for binary neutron star mergers using machine learning Dax, M., Green, S. R., Gair, J., Gupte, N., Pürrer, M., Raymond, V., Wildberger, J., Macke, J. H., Buonanno, A., Schölkopf, B. Nature, 639(8053):49-53, March 2025 (Published) DOI URL BibTeX

Empirical Inference Article Artificial intelligence for modelling infectious disease epidemics Kraemer, M. U. G., Tsui, J. L., Chang, S. Y., Lytras, S., Khurana, M. P., Vanderslott, S., Bajaj, S., Scheidwasser, N., Curran-Sebastian, J. L., Semenova, E., Zhang, M., Unwin, H. J. T., Watson, O. J., Mills, C., Dasgupta, A., Ferretti, L., Scarpino, S. V., Koua, E., Morgan, O., Tegally, H., et al. Nature, 638(8051):623-635, February 2025 (Published) DOI URL BibTeX

Empirical Inference Ph.D. Thesis Predictions, Policies, Rewards: Models of Decision-Making from Observational Data Pace, A. ETH Zurich, Switzerland, February 2025, ETH AI Center-Fellowship-Program (Published) BibTeX

Empirical Inference Article Flow Matching for Atmospheric Retrieval of Exoplanets: Where Reliability meets Adaptive Noise Levels Gebhard, T. D., Wildberger, J., Dax, M., Kofler, A., Angerhausen, D., Quanz, S., Schölkopf, B. Astronomy & Astrophysics, 693, January 2025 (Published) DOI URL BibTeX

Empirical Inference Ph.D. Thesis Machine Learning Meets Exoplanet Science: Methodical contributions to direct imaging and atmospheric retrieval Gebhard, T. ETH Zurich, Switzerland, January 2025 (Published) BibTeX

Empirical Inference Technical Report International AI Safety Report Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Newman, J., et al. (DSIT 2025/001), 2025 (Published) URL BibTeX

Empirical Inference Book Chapter Natural Language Processing Jin, Z., Mihalcea, R., Schölkopf, B. In Elgar Encyclopedia of Political Communication, (Editors: Nai, A. and Grömping, M. and Wirz, D.), Edward Elgar Publishing, 2025 (Published) PDF URL BibTeX

Empirical Inference Conference Paper From Causal to Concept-Based Representation Learning Rajendran*, G., Buchholz*, S., Aragam, B., Schölkopf, B., Ravikumar, P. K. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:101250-101296, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Learning Partitions from Context Buchholz, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:140066-140112, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Didolkar, A. R., Goyal, A., Ke, N. R., Guo, S., Valko, M., Lillicrap, T. P., Rezende, D. J., Bengio, Y., Mozer, M. C., Arora, S. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:19783-19812, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper A Generative Model of Symmetry Transformations Allingham, J. U., Mlodozeniec, B. K., Padhy, S., Antorán, J., Krueger, D., Turner, R. E., Nalisnick, E., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:91091-91130, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Article A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions Rastogi, C., Song, X., Jin, Z., Stelmakh, I., Daumé III, H., Zhang, K., Shah, N. B. PLOS ONE, 19(12), Public Library of Science, December 2024 (Published) DOI URL BibTeX

Empirical Inference Conference Paper Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art Hernandez, A., Brinkmann, L., Serna, I., Rahaman, N., Alhaija, H. A., Yakura, H., Sola, M. C., Schölkopf, B., Rahwan, I. NeurIPS 2024 Workshop on Creativity and Generative AI, December 2024 (Published) arXiv BibTeX

Empirical Inference Conference Paper Causal vs. Anticausal merging of predictors Garrido Mejia, S., Blöbaum, P., Schölkopf, B., Janzing, D. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:1402-1427, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Ph.D. Thesis Causality for Natural Language Processing Jin, Z. University of Tübingen, Germany, December 2024, (ELLIS PhD student program) (Published) URL BibTeX

Empirical Inference Conference Paper Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias Chen*, Y., Vethavikashini*, C. R., Mattern*, J., Mihalcea, R., Jin, Z. NeurIPS 2024 Workshop on Causality and Language Models (CaLM), December 2024, *equal contribution (Published) DOI URL BibTeX

Empirical Inference Conference Paper Cooperate or Collapse: Emergence of Sustainability in a Society of LLM Agents Piatti*, G., Jin*, Z., Kleiman-Weiner*, M., Schölkopf, B., Sachan, M., Mihalcea, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:111715-111759, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Do Finetti: On Causal Effects for Exchangeable Data Guo, S., Zhang, C., Muhan, K., Huszár*, F., Schölkopf*, B. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:127317-127345, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal supervision (Published) URL BibTeX

Empirical Inference Conference Paper Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes Lin, J. A., Padhy, S., Mlodozeniec, B. K., Antorán, J., Hernández-Lobato, J. M. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:15460-15496, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Inferring stochastic low-rank recurrent neural networks from neural data Pals, M., Sağtekin, A. E., Pei, F., Gloeckler, M., Macke, J. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:18225-18264, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Latent Diffusion for Neural Spiking Data Kapoor, J., Schulz, A., Vetter, J., Pei, F., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:118119-118154, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Limits of Transformer Language Models on Learning to Compose Algorithms Thomm, J., Camposampiero, G., Terzic, A., Hersche, M., Schölkopf, B., Rahimi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:7631-7674, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) arXiv URL BibTeX

Empirical Inference Conference Paper Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks Chen, W., Ge, H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:97562-97586, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper On Affine Homotopy between Language Encoders Chan, R., Bourmasmoud, R., Svete, A., Ren, Y., Guo, Q., Jin, Z., Ravfogel, S., Sachan, M., Schölkopf, B., El-Assady, M., Cotterell, R. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:73337-73365, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Shaving Weights with Occam’s Razor: Bayesian Sparsification for Neural Networks using the Marginal Likelihood Dhahri, R., Immer, A., Charpentier, B., Günnemann, S., Fortuin, V. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:24959-24989, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation Vetter, J., Moss, G., Schröder, C., Gao, R., Macke, J. H. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:88772-88806, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024 (Published) URL BibTeX

Empirical Inference Conference Paper Theoretical Characterisation of the Gauss Newton Conditioning in Neural Networks Zhao*, J., Singh*, S. P., Lucchi, A. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 37:114965-115000, (Editors: A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang), Curran Associates, Inc., 38th Annual Conference on Neural Information Processing Systems, December 2024, *equal contribution (Published) URL BibTeX
