Optimization and Large Scale Learning


Figure: The Rosenbrock function, a non-convex function that serves as a test bed for optimization algorithms (figure taken from Wikipedia).
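For reference, the standard two-dimensional Rosenbrock function is f(x, y) = (a - x)^2 + b (y - x^2)^2, commonly used with a = 1 and b = 100, and it attains its global minimum of zero at (a, a^2). A minimal Python sketch follows; the function names are illustrative and do not come from the source.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Standard 2-D Rosenbrock function; a and b are the conventional defaults."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient (df/dx, df/dy) of the function above."""
    dfdx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dfdy = 2.0 * b * (y - x * x)
    return dfdx, dfdy
```

The narrow curved valley around y = x^2 is what makes the function a demanding test case: the gradient is small along the valley floor and large across it.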

2016 progress report

Optimization lies at the heart of most machine learning algorithms. The applications in which these algorithms are used share several key characteristics: high-dimensional, noisy, and uncertain data; huge volumes of batch or streaming data; intractable models; low accuracy; and reliance on distributed computation or stochastic approximations. The success of most machine learning algorithms depends on how well the underlying optimization techniques adapt to and exploit these facets. Our interests fall broadly into two categories: convex and non-convex methods.

Convex optimization. Within convex optimization, we have addressed research challenges in a variety of problem settings. For large-scale problems, where scalability is essential, our work [] gives an overview of the large-scale aspects of convex optimization. A theoretically optimal large-scale convex method for problems with linear constraints is presented in [], which develops a new stochastic alternating direction method of multipliers (ADMM) that combines Nesterov's accelerated gradient methods with ADMM.
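To make the ADMM template concrete, the sketch below runs plain, deterministic ADMM on a toy problem with the linear constraint x - z = 0. It is not the stochastic, accelerated variant developed in the cited work; the problem, the function names, and the parameter values are assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximity operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(a, lam, rho=1.0, iters=200):
    """Plain (deterministic) ADMM for
        minimize 0.5 * ||x - a||^2 + lam * ||z||_1  subject to  x - z = 0.
    u is the scaled dual variable; rho is the penalty parameter.
    """
    n = len(a)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic term
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximity operator of the l1 term
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the constraint residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For this toy problem the exact solution is known in closed form (soft-thresholding a by lam), which makes it a convenient sanity check for the iteration.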

For learning classifiers in extremely large output spaces, our recent work [] proposes a parallelizable mixed-norm regularization approach that leads to a convex but non-smooth optimization problem. We show that the resulting models can be orders of magnitude smaller than those of most state-of-the-art methods while also generalizing better.
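One way to see why mixed-norm regularization can shrink a model is through its proximity operator, which sets whole groups of weights exactly to zero. The sketch below assumes an l1/l2 (group lasso) norm with one group per row; the source does not state which mixed norm is used, so this choice and the function name are illustrative assumptions.

```python
import math


def group_soft_threshold(rows, lam):
    """Proximity operator of lam * sum_g ||w_g||_2, applied row by row.

    Each row is treated as one group (an assumption made for this sketch).
    Rows whose Euclidean norm is at most lam are set exactly to zero.
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        result.append([scale * v for v in row])
    return result


weights = [[3.0, 4.0], [0.3, 0.4]]
shrunk = group_soft_threshold(weights, lam=1.0)
# The first row (norm 5) is scaled by 0.8; the second (norm 0.5) is zeroed.
```

Because each group is processed independently, the operator parallelizes trivially across groups.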

A contribution that lies at the interface of combinatorial and convex optimization is presented in []. General-purpose convex methods often rely on key subroutines such as projection and proximity operators. Continuing our effort from the previous SAB assessment to develop a library of highly tuned subroutines, e.g., for total variation, we extend this library to multivariate total variation in [].

A flourishing sub-field of convex optimization is "robust optimization", which seeks to optimize models under parameter or data uncertainty. It intersects with classical minimax theory in statistics and offers a different view of regularization in machine learning. In [], we introduced the notion of robust optimization for matrix completion and recovery problems. The concrete application was the recovery of correlation matrices under a simple bounded uncertainty model.
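For readers less familiar with these terms, the standard textbook definitions can be written as follows. The symbols (f, C, U) and the one-dimensional form of total variation are generic notation chosen here for illustration and are not taken from the cited papers.

```latex
% Proximity operator of a convex function f
\operatorname{prox}_{f}(y) = \operatorname*{arg\,min}_{x} \; \tfrac{1}{2}\|x - y\|_2^2 + f(x)

% Projection onto a closed convex set C
% (the proximity operator of the indicator function of C)
P_{C}(y) = \operatorname*{arg\,min}_{x \in C} \; \tfrac{1}{2}\|x - y\|_2^2

% One-dimensional (anisotropic) total variation of x \in \mathbb{R}^n
\mathrm{TV}(x) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|

% Robust optimization under a bounded uncertainty set \mathcal{U}
\min_{x} \; \max_{u \in \mathcal{U}} \; f(x; u)
```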

Non-convex optimization. For large-scale non-convex problems, our work [] presents a simplified analysis of what is, to our knowledge, the first non-convex, non-smooth incremental proximal method. This work started in 2011; interestingly, interest in incremental methods has skyrocketed in recent years, though the available analysis is limited to the convex case. Finally, we mention a new direction in non-convex optimization offered by our recent work [], which introduces "geometric optimization" on the manifold of positive definite matrices. The underlying idea is to develop a theory of convexity along geodesics on the positive definite manifold. This work also identifies basic calculus rules for detecting and constructing geodesically convex functions on the positive definite manifold and, as an application, presents new algorithms for maximum likelihood estimation for elliptically contoured distributions, which despite non-convexity remains tractable thanks to geodesic convexity.
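Geodesic convexity can be illustrated in the simplest possible case, the positive reals viewed as 1x1 positive definite matrices. Under the usual affine-invariant geometry (assumed here, as the source does not spell out the metric), the geodesic from a to b is a^(1-t) b^t, so its midpoint is the geometric mean. The sketch below checks that f(x) = (log x)^2 violates the ordinary convexity inequality but satisfies it along the geodesic; the function and the test points are illustrative choices, not taken from the cited work.

```python
import math


def f(x):
    """f(x) = (log x)^2 on the positive reals (1x1 positive definite matrices)."""
    return math.log(x) ** 2


def euclidean_midpoint(a, b):
    return 0.5 * (a + b)


def geodesic_midpoint(a, b):
    """Midpoint of the geodesic a^(1-t) * b^t at t = 1/2, i.e. the geometric mean."""
    return math.sqrt(a * b)


a, b = math.exp(1.0), math.exp(3.0)
average_value = 0.5 * (f(a) + f(b))

# Ordinary convexity fails along the straight line between a and b ...
violates_euclidean = f(euclidean_midpoint(a, b)) > average_value
# ... but the convexity inequality holds along the geodesic.
holds_geodesic = f(geodesic_midpoint(a, b)) <= average_value
```

In the substitution x = e^s the function becomes s^2, which is convex in s; this change of variables is the scalar analogue of moving along geodesics.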


Members

Suvrit Sra (Empirical Inference)

Rohit Babbar (Empirical Inference)

Publications

Article: Positive definite matrices and the S-divergence. Sra, S. Proceedings of the American Mathematical Society, 2015. Published electronically October 22, 2015.

Article: Efficient nearest neighbors via robust sparse hashing. Cherian, A., Sra, S., Morellas, V., Papanikolopoulos, N. IEEE Transactions on Image Processing, 23(8):3646-3655, 2014 (Published).

Conference Paper: Fast Newton methods for the group fused lasso. Wytock, M., Sra, S., Kolter, J. Z. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 888-897, (Editors: Zhang, N. L. and Tian, J.), AUAI Press, 2014 (Published).

Article: Modular proximal optimization with application to total variation regularization. Barbero, A. J., Sra, S. 2014 (In revision).

Article: Fast projection onto mixed-norm balls with applications. Sra, S. Data Mining and Knowledge Discovery (DMKD), 25(2):358-377, September 2012 (Published).

Conference Paper: Accelerating Nearest Neighbor Search on Manycore Systems. Cayton, L. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 402-413, May 2012 (Published).

Article: A non-monotonic method for large-scale non-negative least squares. Kim, D., Sra, S., Dhillon, I. S. Optimization Methods and Software, 28(5):1012-1039, February 2012 (Published).

Ph.D. Thesis: Combinatorial Problems with Submodular Coupling in Machine Learning and Computer Vision. Jegelka, S. ETH Zürich, Switzerland, 2012.

Conference Paper: Scalable nonconvex inexact proximal splitting. Sra, S. In Advances in Neural Information Processing Systems 25, 539-547, (Editors: P Bartlett and FCN Pereira and CJC Burges and L Bottou and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS 2012), 2012.

Book: Optimization for Machine Learning. Sra, S., Nowozin, S., Wright, S. 494 pages, Neural information processing series, MIT Press, Cambridge, MA, USA, December 2011.

Book Chapter: Projected Newton-type methods in machine learning. Schmidt, M., Kim, D., Sra, S. In Optimization for Machine Learning, 305-330, (Editors: Sra, S., Nowozin, S. and Wright, S. J.), MIT Press, Cambridge, MA, USA, December 2011.

Article: Analysis of Fixed-Point and Coordinate Descent Algorithms for Regularized Kernel Methods. Dinuzzo, F. IEEE Transactions on Neural Networks, 22(10):1576-1587, October 2011.

Conference Paper: Approximation Bounds for Inference using Cooperative Cut. Jegelka, S., Bilmes, J. In 28th International Conference on Machine Learning (ICML 2011), 577-584, (Editors: Getoor, L. and Scheffer, T.), International Machine Learning Society, Madison, WI, USA, July 2011.

Conference Paper: Online submodular minimization for combinatorial structures. Jegelka, S., Bilmes, J. In 28th International Conference on Machine Learning (ICML 2011), 345-352, (Editors: Getoor, L. and Scheffer, T.), International Machine Learning Society, Madison, WI, USA, July 2011.

Conference Paper: Submodularity beyond submodular energies: coupling edges in graph cuts. Jegelka, S., Bilmes, J. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 1897-1904, IEEE, Piscataway, NJ, USA, June 2011.

Conference Paper: Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence. Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N. In 13th International Conference on Computer Vision (ICCV 2011), 2399-2406, (Editors: DN Metaxas and L Quan and A Sanfeliu and LJ Van Gool), IEEE, 2011.

Conference Paper: Fast Newton-type Methods for Total-Variation with Applications. Barbero, A., Sra, S. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 313-320, (Editors: L Getoor and T Scheffer), Omnipress, 2011.

Article: Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach. Kim, D., Sra, S., Dhillon, I. SIAM Journal on Scientific Computing, 32(6):3548-3563, December 2010.

Conference Paper: A scalable trust-region algorithm with application to mixed-norm regression. Kim, D., Sra, S., Dhillon, I. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 519-526, (Editors: Fürnkranz, J. and Joachims, T.), International Machine Learning Society, Madison, WI, USA, June 2010.

Conference Paper: Convex Perturbations for Scalable Semidefinite Programming. Kulis, B., Sra, S., Dhillon, I. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), JMLR Workshop and Conference Proceedings Volume 5, 296-303, (Editors: van Dyk, D. and Welling, M.), MIT Press, Cambridge, MA, USA, April 2009.

Conference Paper: Efficient Bregman Range Search. Cayton, L. In Advances in Neural Information Processing Systems 22, 243-251, (Editors: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. and Culotta, A.), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), 2009.