Publications

DEPARTMENTS

Emperical Interference

Haptic Intelligence

Modern Magnetic Systems

Perceiving Systems

Physical Intelligence

Robotic Materials

Social Foundations of Computation


Research Groups

Autonomous Vision

Autonomous Learning

Bioinspired Autonomous Miniature Robots

Dynamic Locomotion

Embodied Vision

Human Aspects of Machine Learning

Intelligent Control Systems

Learning and Dynamical Systems

Locomotion in Biorobotic and Somatic Systems

Micro, Nano, and Molecular Systems

Movement Generation and Control

Neural Capture and Synthesis

Physics for Inference and Optimization

Organizational Leadership and Diversity

Probabilistic Learning Group


Topics

Robot Learning

Conference Paper

2022

Autonomous Learning

Robotics

AI

Career

Award


Empirical Inference Probabilistic Numerics Conference Paper Convergence Guarantees for Adaptive Bayesian Quadrature Methods Kanagawa, M., Hennig, P. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 6234-6245, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (Published) URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Limitations of the empirical Fisher approximation for natural gradient descent Kunstner, F., Hennig, P., Balles, L. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 4158-4169, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (Published) URL BibTeX

Probabilistic Numerics Conference Paper Active Multi-Information Source Bayesian Quadrature Gessner, A. G. J. M. M. Proceedings 35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 712-721, (Editors: Adams, RP; Gogate, V), UAI, July 2019 (Published) URL BibTeX

Probabilistic Numerics Empirical Inference Conference Paper Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization de Roos, F., Hennig, P. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89:1448-1457, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (Published)
Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well.
PDF URL BibTeX

Probabilistic Numerics Empirical Inference Conference Paper Fast and Robust Shortest Paths on Manifolds Learned from Data Arvanitidis, G., Hauberg, S., Hennig, P., Schober, M. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89:1506-1515, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (Published) PDF URL BibTeX

Probabilistic Numerics Conference Paper Kernel Recursive ABC: Point Estimation with Intractable Likelihood Kajihara, T., Kanagawa, M., Yamazaki, K., Fukumizu, K. Proceedings of the 35th International Conference on Machine Learning, 2405-2414, PMLR, July 2018
We propose a novel approach to parameter estimation for simulator-based statistical models with intractable likelihood. Our proposed method involves recursive application of kernel ABC and kernel herding to the same observed data. We provide a theoretical explanation regarding why the approach works, showing (for the population setting) that, under a certain assumption, point estimates obtained with this method converge to the true parameter, as recursion proceeds. We have conducted a variety of numerical experiments, including parameter estimation for a real-world pedestrian flow simulator, and show that in most cases our method outperforms existing approaches.
Paper BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference Muandet, K., Kanagawa, M., Saengkyongam, S., Marukata, S. Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML) at ICML, July 2018 (Published) BibTeX

Probabilistic Numerics Conference Paper Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients Balles, L., Hennig, P. Proceedings of the 35th International Conference on Machine Learning (ICML), 80:404-413, Proceedings of Machine Learning Research, (Editors: Jennifer Dy and Andreas Krause), PMLR, ICML, July 2018 (Published)
The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner's toolbox for problems where ADAM fails.
URL BibTeX

Autonomous Motion Probabilistic Numerics Intelligent Control Systems Conference Paper On the Design of LQR Kernels for Efficient Controller Learning Marco, A., Hennig, P., Schaal, S., Trimpe, S. Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (Published)
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.
arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI BibTeX

Probabilistic Numerics Perceiving Systems Conference Paper Coupling Adaptive Batch Sizes with Learning Rates Balles, L., Romero, J., Hennig, P. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), ID 141, (Editors: Gal Elidan, Kristian Kersting, and Alexander T. Ihler), August 2017 (Published)
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence is thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.
Code URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Dynamic Time-of-Flight Schober, M., Adam, A., Yair, O., Mazor, S., Nowozin, S. Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, 170-179, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (Published) DOI BibTeX

Autonomous Motion Probabilistic Numerics Intelligent Control Systems Conference Paper Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1557-1563, IEEE, Piscataway, NJ, USA, May 2017 (Published) PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI BibTeX

Probabilistic Numerics Conference Paper Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54:528-536, Proceedings of Machine Learning Research, (Editors: Sign, Aarti and Zhu, Jerry), PMLR, April 2017 (Published) pdf URL BibTeX

Probabilistic Numerics Conference Paper Active Uncertainty Calibration in Bayesian ODE Solvers Kersting, H., Hennig, P. Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), 309-318, (Editors: Ihler, Alexander T. and Janzing, Dominik), June 2016 (Published)
There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al.~introduced a sampling-based class of methods that are `well-calibrated' in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al.~pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers, that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF) which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations.
URL BibTeX

Autonomous Motion Probabilistic Numerics Intelligent Control Systems Conference Paper Automatic LQR Tuning Based on Gaussian Process Global Optimization Marco, A., Hennig, P., Bohg, J., Schaal, S., Trimpe, S. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 270-277, IEEE, IEEE International Conference on Robotics and Automation, May 2016 (Published)
This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework shall yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree- of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Results of a two- and four- dimensional tuning problems highlight the method’s potential for automatic controller tuning on robotic platforms.
Video - Automatic LQR Tuning Based on Gaussian Process Global Optimization - ICRA 2016 Video - Automatic Controller Tuning on a Two-legged Robot PDF DOI BibTeX

Probabilistic Numerics Conference Paper Batch Bayesian Optimization via Local Penalization González, J., Dai, Z., Hennig, P., Lawrence, N. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51:648-657, JMLR Workshop and Conference Proceedings, (Editors: Gretton, A. and Robert, C. C.), May 2016 (Published) URL BibTeX

Probabilistic Numerics Conference Paper Probabilistic Approximate Least-Squares Bartels, S., Hennig, P. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51:676-684, JMLR Workshop and Conference Proceedings, (Editors: Gretton, A. and Robert, C. C. ), May 2016 (Published)
Least-squares and kernel-ridge / Gaussian process regression are among the foundational algorithms of statistics and machine learning. Famously, the worst-case cost of exact nonparametric regression grows cubically with the data-set size; but a growing number of approximations have been developed that estimate good solutions at lower cost. These algorithms typically return point estimators, without measures of uncertainty. Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor). This allows estimating the error of the least-squares solution on a subset of the data relative to the full-data solution. The uncertainty can be used to control the computational effort invested in the approximation. Our algorithm has linear cost in the data-set size, and a simple formal form, so that it can be implemented with a few lines of code in programming languages with linear algebra functionality.
URL BibTeX

Autonomous Motion Empirical Inference Probabilistic Numerics Intelligent Control Systems Conference Paper Automatic LQR Tuning Based on Gaussian Process Optimization: Early Experimental Results Marco, A., Hennig, P., Bohg, J., Schaal, S., Trimpe, S. Machine Learning in Planning and Control of Robot Motion Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (iROS), Machine Learning in Planning and Control of Robot Motion Workshop, October 2015 (Published)
This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework shall yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Preliminary results of a low-dimensional tuning problem highlight the method’s potential for automatic controller tuning on robotic platforms.
PDF DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper A Random Riemannian Metric for Probabilistic Shortest-Path Tractography Hauberg, S., Schober, M., Liptrot, M., Hennig, P., Feragen, A. In 18th International Conference on Medical Image Computing and Computer Assisted Intervention, 9349:597-604, Lecture Notes in Computer Science, MICCAI, 2015 (Published) PDF DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Inference of Cause and Effect with Unsupervised Inverse Regression Sgouritsa, E., Janzing, D., Hennig, P., Schölkopf, B. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 38:847-855, JMLR Workshop and Conference Proceedings, (Editors: Lebanon, G. and Vishwanathan, S.V.N.), JMLR.org, AISTATS, 2015 (Published) Web PDF BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Probabilistic Line Searches for Stochastic Optimization Mahsereci, M., Hennig, P. In Advances in Neural Information Processing Systems 28, 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS 2015), 2015 (Published)
In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. [You can find the matlab research code under `attachments' below. The zip-file contains a minimal working example. The docstring in probLineSearch.m contains additional information. A more polished implementation in C++ will be published here at a later point. For comments and questions about the code please write to mmahsereci@tue.mpg.de.]
Matlab research code URL BibTeX

Perceiving Systems Empirical Inference Probabilistic Numerics Conference Paper Probabilistic Progress Bars Kiefel, M., Schuler, C., Hennig, P. In Conference on Pattern Recognition (GCPR), 8753:331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014
Predicting the time at which the integral over a stochastic process reaches a target level is a value of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour confusing to the user. It also does not provide a distribution prediction (risk values), which are crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.
website+code pdf DOI BibTeX

Perceiving Systems Empirical Inference Probabilistic Numerics Conference Paper Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics Hennig, P., Hauberg, S. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33:347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, AISTATS, April 2014
We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.
pdf Youtube Supplements BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Active Learning of Linear Embeddings for Gaussian Processes Garnett, R., Osborne, M., Hennig, P. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, 230-239, (Editors: NL Zhang and J Tian), AUAI Press , Corvallis, Oregon, UAI2014, 2014, another link: http://arxiv.org/abs/1310.6740 PDF Web BibTeX

Autonomous Motion Empirical Inference Probabilistic Numerics Conference Paper Efficient Bayesian Local Model Learning for Control Meier, F., Hennig, P., Schaal, S. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2244 - 2249, IROS, 2014, clmc
Model-based control is essential for compliant controland force control in many modern complex robots, like humanoidor disaster robots. Due to many unknown and hard tomodel nonlinearities, analytical models of such robots are oftenonly very rough approximations. However, modern optimizationcontrollers frequently depend on reasonably accurate models,and degrade greatly in robustness and performance if modelerrors are too large. For a long time, machine learning hasbeen expected to provide automatic empirical model synthesis,yet so far, research has only generated feasibility studies butno learning algorithms that run reliably on complex robots.In this paper, we combine two promising worlds of regressiontechniques to generate a more powerful regression learningsystem. On the one hand, locally weighted regression techniquesare computationally efficient, but hard to tune due to avariety of data dependent meta-parameters. On the other hand,Bayesian regression has rather automatic and robust methods toset learning parameters, but becomes quickly computationallyinfeasible for big and high-dimensional data sets. By reducingthe complexity of Bayesian regression in the spirit of local modellearning through variational approximations, we arrive at anovel algorithm that is computationally efficient and easy toinitialize for robust learning. Evaluations on several datasetsdemonstrate very good learning performance and the potentialfor a general regression learning tool for robotics.
PDF DOI URL BibTeX

Autonomous Motion Empirical Inference Probabilistic Numerics Conference Paper Incremental Local Gaussian Regression Meier, F., Hennig, P., Schaal, S. In Advances in Neural Information Processing Systems 27, 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), 2014, clmc PDF URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers Schober, M., Kasenburg, N., Feragen, A., Hennig, P., Hauberg, S. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675, 265-272, (Editors: P. Golland, N. Hata, C. Barillot, J. Hornegger and R. Howe), Springer, Heidelberg, MICCAI, 2014 DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Probabilistic ODE Solvers with Runge-Kutta Means Schober, M., Duvenaud, D., Hennig, P. In Advances in Neural Information Processing Systems 27, 739-747, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), 2014 (Published) Web URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature Gunter, T., Osborne, M., Garnett, R., Hennig, P., Roberts, S. In Advances in Neural Information Processing Systems 27, 2789-2797, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), 2014 (Published) Web URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Analytical probabilistic proton dose calculation and range uncertainties Bangert, M., Hennig, P., Oelfke, U. In 17th International Conference on the Use of Computers in Radiation Therapy, 6-11, (Editors: A. Haworth and T. Kron), ICCR, 2013 BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Fast Probabilistic Optimization from Noisy Gradients Hennig, P. In Proceedings of The 30th International Conference on Machine Learning, JMLR W&CP 28(1), 62–70, (Editors: S Dasgupta and D McAllester), ICML, 2013 PDF BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Nonparametric dynamics estimation for time periodic systems Klenske, E., Zeilinger, M., Schölkopf, B., Hennig, P. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, 486-493 , 2013 PDF DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper The Randomized Dependence Coefficient Lopez-Paz, D., Hennig, P., Schölkopf, B. In Advances in Neural Information Processing Systems 26, 1-9, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS 2013), 2013 PDF BibTeX

Perceiving Systems Empirical Inference Probabilistic Numerics Conference Paper Quasi-Newton Methods: A New Direction Hennig, P., Kiefel, M. In Proceedings of the 29th International Conference on Machine Learning, 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012
Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
website+code pdf URL BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Learning Tracking Control with Forward Models Bócsi, B., Hennig, P., Csató, L., Peters, J. In 259 -264, IEEE International Conference on Robotics and Automation (ICRA 2012), May 2012
Performing task-space tracking control on redundant robot manipulators is a difficult problem. When the physical model of the robot is too complex or not available, standard methods fail and machine learning algorithms can have advantages. We propose an adaptive learning algorithm for tracking control of underactuated or non-rigid robots where the physical model of the robot is unavailable. The control method is based on the fact that forward models are relatively straightforward to learn and local inversions can be obtained via local optimization. We use sparse online Gaussian process inference to obtain a flexible probabilistic forward model and second order optimization to find the inverse mapping. Physical experiments indicate that this approach can outperform state-of-the-art tracking control algorithms in this context.
PDF Web DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Approximate Gaussian Integration using Expectation Propagation Cunningham, J., Hennig, P., Lacoste-Julien, S. In 1-11, -, January 2012 (Submitted)
While Gaussian probability densities are omnipresent in applied mathematics, Gaussian cumulative probabilities are hard to calculate in any but the univariate case. We offer here an empirical study of the utility of Expectation Propagation (EP) as an approximate integration method for this problem. For rectangular integration regions, the approximation is highly accurate. We also extend the derivations to the more general case of polyhedral integration regions. However, we find that in this polyhedral case, EP's answer, though often accurate, can be almost arbitrarily wrong. These unexpected results elucidate an interesting and non-obvious feature of EP not yet studied in detail, both for the problem of Gaussian probabilities and for EP more generally.
Web BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Kernel Topic Models Hennig, P., Stern, D., Herbrich, R., Graepel, T. In Fifteenth International Conference on Artificial Intelligence and Statistics, 22:511-519, JMLR Proceedings, (Editors: Lawrence, N. D. and Girolami, M.), JMLR.org, AISTATS 2012 , 2012
Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.
PDF Web BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Optimal Reinforcement Learning for Gaussian Systems Hennig, P. In Advances in Neural Information Processing Systems 24, 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011), 2011
The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finitedimensional projection gives an impression for how this result may be helpful.
PDF Web BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Using an Infinite Von Mises-Fisher Mixture Model to Cluster Treatment Beam Directions in External Radiation Therapy Bangert, M., Hennig, P., Oelfke, U. In 746-751 , (Editors: Draghici, S. , T.M. Khoshgoftaar, V. Palade, W. Pedrycz, M.A. Wani, X. Zhu), IEEE, Piscataway, NJ, USA, Ninth International Conference on Machine Learning and Applications (ICMLA 2010), December 2010
We present a method for fully automated selection of treatment beam ensembles for external radiation therapy. We reformulate the beam angle selection problem as a clustering problem of locally ideal beam orientations distributed on the unit sphere. For this purpose we construct an infinite mixture of von Mises-Fisher distributions, which is suited in general for density estimation from data on the D-dimensional sphere. Using a nonparametric Dirichlet process prior, our model infers probability distributions over both the number of clusters and their parameter values. We describe an efficient Markov chain Monte Carlo inference algorithm for posterior inference from experimental data in this model. The performance of the suggested beam angle selection framework is illustrated for one intra-cranial, pancreas, and prostate case each. The infinite von Mises-Fisher mixture model (iMFMM) creates between 18 and 32 clusters, depending on the patient anatomy. This suggests to use the iMFMM directly for beam ensemble selection in robotic radio surgery, or to generate low-dimensional input for both subsequent optimization of trajectories for arc therapy and beam ensemble selection for conventional radiation therapy.
Web DOI BibTeX

Empirical Inference Probabilistic Numerics Conference Paper Coherent Inference on Optimal Play in Game Trees Hennig, P., Stern, D., Graepel, T. In JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010, 326-333, (Editors: Teh, Y.W. , M. Titterington ), JMLR, Cambridge, MA, USA, Thirteenth International Conference on Artificial Intelligence and Statistics, May 2010
Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
PDF Web BibTeX