Header logo is


2018


Thumb xl stco paper figure11
Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective

Tronarp, F., Kersting, H., Särkkä, S., Hennig, P.

ArXiv preprint 2018, arXiv:1810.03440 [stat.ME], October 2018 (article)

Abstract
We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consists of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP---which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.

pn

link (url) Project Page [BibTex]

2018



no image
Kernel Recursive ABC: Point Estimation with Intractable Likelihood

Kajihara, T., Kanagawa, M., Yamazaki, K., Fukumizu, K.

Proceedings of the 35th International Conference on Machine Learning, pages: 2405-2414, PMLR, July 2018 (conference)

Abstract
We propose a novel approach to parameter estimation for simulator-based statistical models with intractable likelihood. Our proposed method involves recursive application of kernel ABC and kernel herding to the same observed data. We provide a theoretical explanation regarding why the approach works, showing (for the population setting) that, under a certain assumption, point estimates obtained with this method converge to the true parameter, as recursion proceeds. We have conducted a variety of numerical experiments, including parameter estimation for a real-world pedestrian flow simulator, and show that in most cases our method outperforms existing approaches.

pn

Paper [BibTex]

Paper [BibTex]


no image
Convergence Rates of Gaussian ODE Filters

Kersting, H., Sullivan, T. J., Hennig, P.

arXiv preprint 2018, arXiv:1807.09737 [math.NA], July 2018 (article)

Abstract
A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss--Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. We prove worst-case local convergence rates of order $h^{q+1}$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $h^q$ in the case of $q=1$ and an integrated Brownian motion prior, and analyse how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence---in the sense that they globally contract at the same rate as the truncation error.

pn

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences

Kanagawa, M., Hennig, P., Sejdinovic, D., Sriperumbudur, B. K.

Arxiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018 (article)

Abstract
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.

pn

arXiv [BibTex]

arXiv [BibTex]


Thumb xl tease pic
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

Balles, L., Hennig, P.

In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018 (inproceedings) Accepted

Abstract
The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner's toolbox for problems where ADAM fails.

pn

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference

Muandet, K., Kanagawa, M., Saengkyongam, S., Marukata, S.

Arxiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018 (article)

Abstract
This paper introduces a novel Hilbert space representation of a counterfactual distribution---called counterfactual mean embedding (CME)---with applications in nonparametric causal inference. Counterfactual prediction has become an ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical diagnosis, whose performance relies on certain interventions. To infer the outcomes of such interventions, we propose to embed the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel. Under appropriate assumptions, the CME allows us to perform causal inference over the entire landscape of the counterfactual distribution. The CME can be estimated consistently from observational data without requiring any parametric assumption about the underlying distributions. We also derive a rate of convergence which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Our framework can deal with not only real-valued outcome, but potentially also more complex and structured outcomes such as images, sequences, and graphs. Lastly, our experimental results on off-policy evaluation tasks demonstrate the advantages of the proposed estimator.

ei pn

arXiv [BibTex]

arXiv [BibTex]


no image
Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models

Nishiyama, Y., Kanagawa, M., Gretton, A., Fukumizu, K.

Arxiv e-prints, arXiv:1409.5178v2 [stat.ML], 2018 (article)

Abstract
Kernel Bayesian inference is a powerful nonparametric approach to performing Bayesian inference in reproducing kernel Hilbert spaces or feature spaces. In this approach, kernel means are estimated instead of probability distributions, and these estimates can be used for subsequent probabilistic operations (as for inference in graphical models) or in computing the expectations of smooth functions, for instance. Various algorithms for kernel Bayesian inference have been obtained by combining basic rules such as the kernel sum rule (KSR), kernel chain rule, kernel product rule and kernel Bayes' rule. However, the current framework only deals with fully nonparametric inference (i.e., all conditional relations are learned nonparametrically), and it does not allow for flexible combinations of nonparametric and parametric inference, which are practically important. Our contribution is in providing a novel technique to realize such combinations. We introduce a new KSR referred to as the model-based KSR (Mb-KSR), which employs the sum rule in feature spaces under a parametric setting. Incorporating the Mb-KSR into existing kernel Bayesian framework provides a richer framework for hybrid (nonparametric and parametric) kernel Bayesian inference. As a practical application, we propose a novel filtering algorithm for state space models based on the Mb-KSR, which combines the nonparametric learning of an observation process using kernel mean embedding and the additive Gaussian noise model for a state transition process. While we focus on additive Gaussian noise models in this study, the idea can be extended to other noise models, such as the Cauchy and alpha-stable noise models.

pn

arXiv [BibTex]

arXiv [BibTex]


Thumb xl hp teaser
A probabilistic model for the numerical solution of initial value problems

Schober, M., Särkkä, S., Philipp Hennig,

Statistics and Computing, Springer US, 2018 (article)

Abstract
We study connections between ordinary differential equation (ODE) solvers and probabilistic regression methods in statistics. We provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics. Adding to this picture, we show that several multistep methods of Nordsieck form can be recast as Kalman filtering on q-times integrated Wiener processes. Doing so provides a family of IVP solvers that return a Gaussian posterior measure, rather than a point estimate. We show that some such methods have low computational overhead, nontrivial convergence order, and that the posterior has a calibrated concentration rate. Additionally, we suggest a step size adaptation algorithm which completes the proposed method to a practically useful implementation, which we experimentally evaluate using a representative set of standard codes in the DETEST benchmark set.

pn

PDF Code DOI Project Page [BibTex]


no image
Probabilistic Approaches to Stochastic Optimization

Mahsereci, M.

Eberhard Karls Universität Tübingen, Germany, 2018 (phdthesis)

ei pn

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Probabilistic Ordinary Differential Equation Solvers — Theory and Applications

Schober, M.

Eberhard Karls Universität Tübingen, Germany, 2018 (phdthesis)

ei pn

[BibTex]

[BibTex]


no image
Geckos Race across Water using Multiple Mechanisms

Nirody, J., Jinn, J., Libby, T., Lee, T., Jusufi, A., Hu, D., Full, R.

Current Biology, 2018 (article)

bio

[BibTex]

[BibTex]

2014


Thumb xl ps page panel
Probabilistic Progress Bars

Kiefel, M., Schuler, C., Hennig, P.

In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014 (inproceedings)

Abstract
Predicting the time at which the integral over a stochastic process reaches a target level is a value of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour confusing to the user. It also does not provide a distribution prediction (risk values), which are crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

ei ps pn

website+code pdf DOI [BibTex]

2014


website+code pdf DOI [BibTex]


Thumb xl aistats2014
Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Hennig, P., Hauberg, S.

In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, AISTATS, April 2014 (inproceedings)

Abstract
We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

ei ps pn

pdf Youtube Supplements Project page link (url) [BibTex]

pdf Youtube Supplements Project page link (url) [BibTex]


no image
Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

arXiv preprint, March 2014, clmc (misc)

Abstract
Abstract: Locally weighted regression was created as a nonparametric learning method that is computationally efficient, can learn from very large amounts of data and add data incrementally. An interesting feature of locally weighted regression is that it can work with ...

am pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Probabilistic ODE Solvers with Runge-Kutta Means

Schober, M., Duvenaud, D., Hennig, P.

In Advances in Neural Information Processing Systems 27, pages: 739-747, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

ei pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Active Learning of Linear Embeddings for Gaussian Processes

Garnett, R., Osborne, M., Hennig, P.

In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages: 230-239, (Editors: NL Zhang and J Tian), AUAI Press , Corvallis, Oregon, UAI2014, 2014, another link: http://arxiv.org/abs/1310.6740 (inproceedings)

ei pn

PDF Web [BibTex]

PDF Web [BibTex]


no image
Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers

Schober, M., Kasenburg, N., Feragen, A., Hennig, P., Hauberg, S.

In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675, pages: 265-272, (Editors: P. Golland, N. Hata, C. Barillot, J. Hornegger and R. Howe), Springer, Heidelberg, MICCAI, 2014 (inproceedings)

ei pn

DOI [BibTex]

DOI [BibTex]


no image
Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature

Gunter, T., Osborne, M., Garnett, R., Hennig, P., Roberts, S.

In Advances in Neural Information Processing Systems 27, pages: 2789-2797, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

ei pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Incremental Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

In Advances in Neural Information Processing Systems 27, pages: 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014, clmc (inproceedings)

am ei pn

PDF link (url) [BibTex]

PDF link (url) [BibTex]


no image
Efficient Bayesian Local Model Learning for Control

Meier, F., Hennig, P., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages: 2244 - 2249, IROS, 2014, clmc (inproceedings)

Abstract
Model-based control is essential for compliant controland force control in many modern complex robots, like humanoidor disaster robots. Due to many unknown and hard tomodel nonlinearities, analytical models of such robots are oftenonly very rough approximations. However, modern optimizationcontrollers frequently depend on reasonably accurate models,and degrade greatly in robustness and performance if modelerrors are too large. For a long time, machine learning hasbeen expected to provide automatic empirical model synthesis,yet so far, research has only generated feasibility studies butno learning algorithms that run reliably on complex robots.In this paper, we combine two promising worlds of regressiontechniques to generate a more powerful regression learningsystem. On the one hand, locally weighted regression techniquesare computationally efficient, but hard to tune due to avariety of data dependent meta-parameters. On the other hand,Bayesian regression has rather automatic and robust methods toset learning parameters, but becomes quickly computationallyinfeasible for big and high-dimensional data sets. By reducingthe complexity of Bayesian regression in the spirit of local modellearning through variational approximations, we arrive at anovel algorithm that is computationally efficient and easy toinitialize for robust learning. Evaluations on several datasetsdemonstrate very good learning performance and the potentialfor a general regression learning tool for robotics.

am ei pn

PDF link (url) DOI [BibTex]

PDF link (url) DOI [BibTex]

2012


Thumb xl thumb hennigk2012
Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012 (inproceedings)

Abstract
Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

ei ps pn

website+code pdf link (url) [BibTex]

2012


website+code pdf link (url) [BibTex]


Thumb xl screen shot 2017 09 21 at 00.54.33
Entropy Search for Information-Efficient Global Optimization

Hennig, P., Schuler, C.

Journal of Machine Learning Research, 13, pages: 1809-1837, -, June 2012 (article)

Abstract
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly adresses the decision problem of maximizing information gain from each evaluation.

ei pn

PDF Web Project Page [BibTex]

PDF Web Project Page [BibTex]


no image
Learning Tracking Control with Forward Models

Bócsi, B., Hennig, P., Csató, L., Peters, J.

In pages: 259 -264, IEEE International Conference on Robotics and Automation (ICRA), May 2012 (inproceedings)

Abstract
Performing task-space tracking control on redundant robot manipulators is a difficult problem. When the physical model of the robot is too complex or not available, standard methods fail and machine learning algorithms can have advantages. We propose an adaptive learning algorithm for tracking control of underactuated or non-rigid robots where the physical model of the robot is unavailable. The control method is based on the fact that forward models are relatively straightforward to learn and local inversions can be obtained via local optimization. We use sparse online Gaussian process inference to obtain a flexible probabilistic forward model and second order optimization to find the inverse mapping. Physical experiments indicate that this approach can outperform state-of-the-art tracking control algorithms in this context.

ei pn

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Approximate Gaussian Integration using Expectation Propagation

Cunningham, J., Hennig, P., Lacoste-Julien, S.

In pages: 1-11, -, January 2012 (inproceedings) Submitted

Abstract
While Gaussian probability densities are omnipresent in applied mathematics, Gaussian cumulative probabilities are hard to calculate in any but the univariate case. We offer here an empirical study of the utility of Expectation Propagation (EP) as an approximate integration method for this problem. For rectangular integration regions, the approximation is highly accurate. We also extend the derivations to the more general case of polyhedral integration regions. However, we find that in this polyhedral case, EP's answer, though often accurate, can be almost arbitrarily wrong. These unexpected results elucidate an interesting and non-obvious feature of EP not yet studied in detail, both for the problem of Gaussian probabilities and for EP more generally.

ei pn

Web [BibTex]

Web [BibTex]


no image
Kernel Topic Models

Hennig, P., Stern, D., Herbrich, R., Graepel, T.

In Fifteenth International Conference on Artificial Intelligence and Statistics, 22, pages: 511-519, JMLR Proceedings, (Editors: Lawrence, N. D. and Girolami, M.), JMLR.org, AISTATS , 2012 (inproceedings)

Abstract
Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.

ei pn

PDF Web [BibTex]

PDF Web [BibTex]


no image
Tail-assisted pitch control in lizards, robots and dinosaurs

Libby, T., Moore, T., Chang, E., Li, D., Cohen, D., Jusufi, A., Full, R.

Nature, 2012 (article)

bio

link (url) [BibTex]

link (url) [BibTex]


no image
Rapid Inversion: Running Animals and Robots Swing like a Pendulum under Ledges

Mongeau, J., McRae, B., Jusufi, A., Birkmeyer, P., Hoover, A., Fearing, R.

PLoS One, 2012 (article)

bio

link (url) [BibTex]

link (url) [BibTex]