Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Intelligent Control Systems Movement Generation and Control Probabilistic Numerics Empirical Inference Article Robot Learning with Crash Constraints Marco, A., Baumann, D., Khadiv, M., Hennig, P., Righetti, L., Trimpe, S. IEEE Robotics and Automation Letters, 6(2):1439-1446, IEEE, February 2021 (Published)
In the past decade, numerous machine learning algorithms have been shown to successfully learn optimal policies to control real robotic systems. However, it is common to encounter failing behaviors as the learning loop progresses. Specifically, in robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. Both complicate the design of proper reward functions to penalize failures. In this paper, we propose a framework that addresses those issues. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints, where no data is obtained upon constraint violation. The no-data case is addressed by a novel GP model (GPCR) for the constraint that combines discrete events (failure/success) with continuous observations (only obtained upon success). We demonstrate the effectiveness of our framework on simulated benchmarks and on a real jumping quadruped, where the constraint threshold is unknown a priori. Experimental data is collected, by means of constrained Bayesian optimization, directly on the real robot. Our results outperform manual tuning and GPCR proves useful on estimating the constraint threshold.
DOI URL BibTeX

Probabilistic Numerics Article Three-dimensional models of core-collapse supernovae from low-mass progenitors with implications for Crab Stockinger, G., Janka, H., Kresse, D., Melson, T., Ertl, T., Gabler, M., Gessner, A., Wongwathanarat, A., Tolstov, A., Leung, S., Nomoto, K., Heger, A. Monthly Notices of the Royal Astronomical Society , 496(2):2039-2084, August 2020 (Published) DOI BibTeX

Probabilistic Numerics Article Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings Kanagawa, M., Sriperumbudur, B. K., Fukumizu, K. Foundations of Computational Mathematics, 20:155-1944, February 2020 (Published)
This paper presents convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between distance design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.
arXiv DOI BibTeX

Empirical Inference Probabilistic Numerics Article Analytical probabilistic modeling of dose-volume histograms Wahl, N., Hennig, P., Wieser, H., Bangert, M. Medical Physics, 47(10):5260-5273, 2020 (Published) DOI BibTeX

Empirical Inference Probabilistic Numerics Article Convergence rates of Gaussian ODE filters Kersting, H., Sullivan, T. J., Hennig, P. Statistics and Computing, 30(6):1791-1816, 2020 (Published) DOI BibTeX

Probabilistic Numerics Article Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective Tronarp, F., Kersting, H., Särkkä, S., Hennig, P. Statistics and Computing, 29(6):1297-1315, October 2019 (Published)
We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consists of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP---which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.
DOI URL BibTeX

Empirical Inference Probabilistic Numerics Article Dense connectomic reconstruction in layer 4 of the somatosensory cortex Motta, A., Berning, M., Boergens, K. M., Staffler, B., Beining, M., Loomba, S., Hennig, P., Wissler, H., Helmstaedter, M. Science, 366(6469):eaay3134, 2019 (Published) DOI BibTeX

Probabilistic Numerics Article On the positivity and magnitudes of Bayesian quadrature weights Karvonen, T., Kanagawa, M., Särkä, S. Statistics and Computing, 29:1317-1333, 2019 (Published) DOI BibTeX

Probabilistic Numerics Article Probabilistic Linear Solvers: A Unifying View Bartels, S., Cockayne, J., Ipsen, I., Hennig, P. Statistics and Computing, 29(6):1249-1263, 2019 (Published) DOI URL BibTeX

Empirical Inference Probabilistic Numerics Article Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective Tronarp, F., Kersting, H., Särkkä, S. H. P. Statistics and Computing, 29(6):1297-1315, 2019 (Published) DOI BibTeX

Probabilistic Numerics Article Convergence Rates of Gaussian ODE Filters Kersting, H., Sullivan, T. J., Hennig, P. arXiv preprint 2018, arXiv:1807.09737 [math.NA], July 2018
A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss--Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. We prove worst-case local convergence rates of order $h^{q+1}$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $h^q$ in the case of $q=1$ and an integrated Brownian motion prior, and analyse how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence---in the sense that they globally contract at the same rate as the truncation error.
URL BibTeX

Probabilistic Numerics Article A probabilistic model for the numerical solution of initial value problems Schober, M., Särkkä, S., Hennig, P. Statistics and Computing, 29(1):99–122, 2018 (Published)
We study connections between ordinary differential equation (ODE) solvers and probabilistic regression methods in statistics. We provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics. Adding to this picture, we show that several multistep methods of Nordsieck form can be recast as Kalman filtering on q-times integrated Wiener processes. Doing so provides a family of IVP solvers that return a Gaussian posterior measure, rather than a point estimate. We show that some such methods have low computational overhead, nontrivial convergence order, and that the posterior has a calibrated concentration rate. Additionally, we suggest a step size adaptation algorithm which completes the proposed method to a practically useful implementation, which we experimentally evaluate using a representative set of standard codes in the DETEST benchmark set.
PDF Code DOI BibTeX

Probabilistic Numerics Empirical Inference Article Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference Muandet, K., Kanagawa, M., Saengkyongam, S., Marukata, S. Arxiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018
This paper introduces a novel Hilbert space representation of a counterfactual distribution---called counterfactual mean embedding (CME)---with applications in nonparametric causal inference. Counterfactual prediction has become an ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical diagnosis, whose performance relies on certain interventions. To infer the outcomes of such interventions, we propose to embed the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel. Under appropriate assumptions, the CME allows us to perform causal inference over the entire landscape of the counterfactual distribution. The CME can be estimated consistently from observational data without requiring any parametric assumption about the underlying distributions. We also derive a rate of convergence which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Our framework can deal with not only real-valued outcome, but potentially also more complex and structured outcomes such as images, sequences, and graphs. Lastly, our experimental results on off-policy evaluation tasks demonstrate the advantages of the proposed estimator.
arXiv BibTeX

Probabilistic Numerics Article Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences Kanagawa, M., Hennig, P., Sejdinovic, D., Sriperumbudur, B. K. Arxiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
arXiv BibTeX

Probabilistic Numerics Article Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models Nishiyama, Y., Kanagawa, M., Gretton, A., Fukumizu, K. Arxiv e-prints, arXiv:1409.5178v2 [stat.ML], 2018
Kernel Bayesian inference is a powerful nonparametric approach to performing Bayesian inference in reproducing kernel Hilbert spaces or feature spaces. In this approach, kernel means are estimated instead of probability distributions, and these estimates can be used for subsequent probabilistic operations (as for inference in graphical models) or in computing the expectations of smooth functions, for instance. Various algorithms for kernel Bayesian inference have been obtained by combining basic rules such as the kernel sum rule (KSR), kernel chain rule, kernel product rule and kernel Bayes' rule. However, the current framework only deals with fully nonparametric inference (i.e., all conditional relations are learned nonparametrically), and it does not allow for flexible combinations of nonparametric and parametric inference, which are practically important. Our contribution is in providing a novel technique to realize such combinations. We introduce a new KSR referred to as the model-based KSR (Mb-KSR), which employs the sum rule in feature spaces under a parametric setting. Incorporating the Mb-KSR into existing kernel Bayesian framework provides a richer framework for hybrid (nonparametric and parametric) kernel Bayesian inference. As a practical application, we propose a novel filtering algorithm for state space models based on the Mb-KSR, which combines the nonparametric learning of an observation process using kernel mean embedding and the additive Gaussian noise model for a state transition process. While we focus on additive Gaussian noise models in this study, the idea can be extended to other noise models, such as the Cauchy and alpha-stable noise models.
arXiv BibTeX

Probabilistic Numerics Article Probabilistic Line Searches for Stochastic Optimization Mahsereci, M., Hennig, P. Journal of Machine Learning Research, 18(119):1-59, November 2017 (Published) URL BibTeX

Probabilistic Numerics Article Analytical probabilistic modeling of RBE-weighted dose for ion therapy Wieser, H., Hennig, P., Wahl, N., Bangert, M. Physics in Medicine and Biology (PMB), 62(23):8959-8982, 2017 URL BibTeX

Probabilistic Numerics Perceiving Systems Article Early Stopping Without a Validation Set Mahsereci, M., Balles, L., Lassner, C., Hennig, P. arXiv preprint arXiv:1703.09580, 2017
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
URL BibTeX

Probabilistic Numerics Article Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy Wahl, N., Hennig, P., Wieser, H. P., Bangert, M. Physics in Medicine & Biology, 62(14):5790-5807, 2017
The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ##IMG## [http://ej.iop.org/images/0031-9155/62/14/5790/pmbaa6ec5ieqn001.gif] {$\leqslant {5}$} min). The resulting standard deviation (expectation value) of dose show average global ##IMG## [http://ej.iop.org/images/0031-9155/62/14/5790/pmbaa6ec5ieqn002.gif] {$\gamma_{{3}\% / {3}~{\rm mm}}$} pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.
URL BibTeX

Probabilistic Numerics Article Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning Roos, F. D., Hennig, P. arXiv preprint arXiv:1706.00241, 2017
Solving symmetric positive definite linear problems is a fundamental computational task in machine learning. The exact solution, famously, is cubicly expensive in the size of the matrix. To alleviate this problem, several linear-time approximations, such as spectral and inducing-point methods, have been suggested and are now in wide use. These are low-rank approximations that choose the low-rank space a priori and do not refine it over time. While this allows linear cost in the data-set size, it also causes a finite, uncorrected approximation error. Authors from numerical linear algebra have explored ways to iteratively refine such low-rank approximations, at a cost of a small number of matrix-vector multiplications. This idea is particularly interesting in the many situations in machine learning where one has to solve a sequence of related symmetric positive definite linear problems. From the machine learning perspective, such deflation methods can be interpreted as transfer learning of a low-rank approximation across a time-series of numerical tasks. We study the use of such methods for our field. Our empirical results show that, on regression and classification problems of intermediate size, this approach can interpolate between low computational cost and numerical precision.
URL BibTeX

Empirical Inference Probabilistic Numerics Article Probabilistic numerics and uncertainty in computations Hennig, P., Osborne, M. A., Girolami, M. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015 (Published)
We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data have led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.
PDF DOI BibTeX

Perceiving Systems Empirical Inference Probabilistic Numerics Article Quasi-Newton Methods: A New Direction Hennig, P., Kiefel, M. Journal of Machine Learning Research, 14(1):843-865, March 2013
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
website+code pdf URL BibTeX

Empirical Inference Probabilistic Numerics Article Entropy Search for Information-Efficient Global Optimization Hennig, P., Schuler, C. Journal of Machine Learning Research, 13:1809-1837, -, June 2012
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly adresses the decision problem of maximizing information gain from each evaluation.
PDF Web BibTeX