Empirical Inference

Publications

Empirical Inference Article Positive definite matrices and the S-divergence Sra, S. Proceedings of the American Mathematical Society, 2015, Published electronically: October 22, 2015 (Published) DOI BibTeX

Empirical Inference Article Efficient nearest neighbors via robust sparse hashing Cherian, A., Sra, S., Morellas, V., Papanikolopoulos, N. IEEE Transactions on Image Processing, 23(8):3646-3655, 2014 (Published) DOI BibTeX

Empirical Inference Conference Paper Fast Newton methods for the group fused lasso Wytock, M., Sra, S., Kolter, J. Z. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, 888-897, (Editors: Zhang, N. L. and Tian, J.), AUAI Press, UAI, 2014 (Published) URL BibTeX

Empirical Inference Article Fast projection onto mixed-norm balls with applications Sra, S. Data Mining and Knowledge Discovery (DMKD), 25(2):358-377, September 2012 (Published) DOI BibTeX

Empirical Inference Conference Paper Accelerating Nearest Neighbor Search on Manycore Systems Cayton, L. In 2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), 402-413, May 2012 (Published)
We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.
Web DOI BibTeX
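
The easily parallelizable structure mentioned in the abstract is visible even in the dense linear scan that such methods accelerate. A minimal sketch (illustrative names and data, not the paper's algorithm), written so each query's work is an independent reduction:

import numpy as np

def nn_bruteforce(queries, database):
    # Squared Euclidean distances via ||q||^2 - 2 q.x + ||x||^2;
    # the constant ||q||^2 is dropped since it does not affect argmin.
    scores = -2.0 * queries @ database.T + (database ** 2).sum(axis=1)
    # Each row is an independent reduction, so the work per query
    # parallelizes trivially across cores or GPU threads.
    return scores.argmin(axis=1)

rng = np.random.default_rng(0)
db = rng.standard_normal((100_000, 64))
qs = rng.standard_normal((8, 64))
print(nn_bruteforce(qs, db))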

Empirical Inference Article A non-monotonic method for large-scale non-negative least squares Kim, D., Sra, S., Dhillon, I. S. Optimization Methods and Software, 28(5):1012-1039, February 2012 (Published) DOI BibTeX

Empirical Inference Conference Paper Scalable nonconvex inexact proximal splitting Sra, S. In Advances in Neural Information Processing Systems 25, 539-547, (Editors: Bartlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L. and Weinberger, K. Q.), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS 2012), 2012 PDF BibTeX

Empirical Inference Book Optimization for Machine Learning Sra, S., Nowozin, S., Wright, S. 494 pages, Neural Information Processing Series, MIT Press, Cambridge, MA, USA, December 2011
The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.
Web BibTeX

Empirical Inference Book Chapter Projected Newton-type methods in machine learning Schmidt, M., Kim, D., Sra, S. In Optimization for Machine Learning, 305-330, (Editors: Sra, S., Nowozin, S. and Wright, S. J.), MIT Press, Cambridge, MA, USA, December 2011
We consider projected Newton-type methods for solving large-scale optimization problems arising in machine learning and related fields. We first introduce an algorithmic framework for projected Newton-type methods by reviewing a canonical projected (quasi-)Newton method. This method, while conceptually pleasing, has a high computational cost per iteration. We therefore discuss two more scalable variants: two-metric projection and inexact projection methods. Finally, we show how to apply the Newton-type framework to handle non-smooth objectives. Examples are provided throughout the chapter to illustrate machine learning applications of our framework.
PDF Web BibTeX
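
A minimal sketch of the framework's simplest instance, assuming a diagonal Hessian approximation; with a diagonal metric the box projection stays coordinate-wise, the same concern that motivates the chapter's two-metric projection. The damping factor and test problem below are illustrative:

import numpy as np

def projected_diag_newton(grad, hess_diag, project, x0, damping=0.5, iters=200):
    # Scale the gradient by the inverse diagonal Hessian, step, project.
    x = x0
    for _ in range(iters):
        x = project(x - damping * grad(x) / hess_diag)
    return x

# Box-constrained least squares: min_{0 <= x <= 1} ||Ax - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)
x = projected_diag_newton(
    grad=lambda x: 2.0 * A.T @ (A @ x - b),
    hess_diag=2.0 * (A ** 2).sum(axis=0),  # diagonal of the Hessian 2 A^T A
    project=lambda x: np.clip(x, 0.0, 1.0),
    x0=np.zeros(8),
)
print(x.round(3))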

Empirical Inference Article Analysis of Fixed-Point and Coordinate Descent Algorithms for Regularized Kernel Methods Dinuzzo, F. IEEE Transactions on Neural Networks, 22(10):1576-1587, October 2011
In this paper, we analyze the convergence of two general classes of optimization algorithms for regularized kernel methods with convex loss function and quadratic norm regularization. The first methodology is a new class of algorithms based on fixed-point iterations that are well-suited for a parallel implementation and can be used with any convex loss function. The second methodology is based on coordinate descent, and generalizes some techniques previously proposed for linear support vector machines. It exploits the structure of additively separable loss functions to compute solutions of line searches in closed form. Both methodologies are very easy to implement. We also show how to remove non-differentiability of the objective functional by exactly reformulating a convex regularization problem as an unconstrained differentiable stabilization problem.
Web DOI BibTeX
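
For the special case of a quadratic loss (kernel ridge regression), a fixed-point iteration of the kind the paper analyzes reduces to a Richardson scheme. A minimal sketch under that assumption, with an illustrative Gaussian kernel:

import numpy as np

def kernel_ridge_fixed_point(K, y, lam, iters=5000):
    # Solve (K + lam*I) c = y via the fixed-point update
    # c <- c + step * (y - (K + lam*I) c); each sweep is one
    # matrix-vector product, hence well-suited to parallelization.
    step = 1.0 / (np.linalg.norm(K, 2) + lam)  # keeps the map a contraction
    c = np.zeros(len(y))
    for _ in range(iters):
        c += step * (y - K @ c - lam * c)
    return c

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))  # Gaussian kernel matrix
y = rng.standard_normal(30)
c = kernel_ridge_fixed_point(K, y, lam=0.5)
print(np.allclose(c, np.linalg.solve(K + 0.5 * np.eye(30), y), atol=1e-5))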

Empirical Inference Conference Paper Approximation Bounds for Inference using Cooperative Cut Jegelka, S., Bilmes, J. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 577-584, (Editors: Getoor, L. and Scheffer, T.), International Machine Learning Society, Madison, WI, USA, July 2011
We analyze a family of probability distributions that are characterized by an embedded combinatorial structure. This family includes models having arbitrary treewidth and arbitrary sized factors. Unlike general models with such freedom, where the “most probable explanation” (MPE) problem is inapproximable, the combinatorial structure within our model, in particular the indirect use of submodularity, leads to several MPE algorithms that all have approximation guarantees.
PDF Web BibTeX

Empirical Inference Conference Paper Online submodular minimization for combinatorial structures Jegelka, S., Bilmes, J. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 345-352, (Editors: Getoor, L. and Scheffer, T.), International Machine Learning Society, Madison, WI, USA, July 2011
Most results for online decision problems with structured concepts, such as trees or cuts, assume linear costs. In many settings, however, nonlinear costs are more realistic. Owing to their non-separability, these lead to much harder optimization problems. Going beyond linearity, we address online approximation algorithms for structured concepts that allow the cost to be submodular, i.e., nonseparable. In particular, we show regret bounds for three Hannan-consistent strategies that capture different settings. Our results also tighten a regret bound for unconstrained online submodular minimization.
PDF Web BibTeX

Empirical Inference Conference Paper Submodularity beyond submodular energies: coupling edges in graph cuts Jegelka, S., Bilmes, J. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 1897-1904, IEEE, Piscataway, NJ, USA, June 2011
We propose a new family of non-submodular global energy functions that still use submodularity internally to couple edges in a graph cut. We show it is possible to develop an efficient approximation algorithm that, thanks to the internal submodularity, can use standard graph cuts as a subroutine. We demonstrate the advantages of edge coupling in a natural setting, namely image segmentation. In particular, for fine-structured objects and objects with shading variation, our structured edge coupling leads to significant improvements over standard approaches.
PDF Web DOI BibTeX
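
The method's reliance on standard graph cuts as a black-box subroutine can be illustrated with a toy s-t min-cut (using networkx; the edge-coupling construction itself is not reproduced here, and the graph is illustrative):

import networkx as nx

# Toy s-t min-cut, the standard graph-cut subroutine the paper's
# approximation algorithm invokes repeatedly.
G = nx.DiGraph()
for u, v, c in [("s", "a", 3.0), ("s", "b", 1.0), ("a", "b", 1.0),
                ("a", "t", 2.0), ("b", "t", 4.0)]:
    G.add_edge(u, v, capacity=c)
cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
print(cut_value, sorted(S), sorted(T))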

Empirical Inference Conference Paper Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N. In IEEE International Conference on Computer Vision, ICCV 2011, 2399-2406, (Editors: Metaxas, D. N., Quan, L., Sanfeliu, A. and Van Gool, L. J.), IEEE, 13th International Conference on Computer Vision (ICCV 2011), 2011 DOI BibTeX

Empirical Inference Conference Paper Fast Newton-type Methods for Total-Variation with Applications Barbero, A., Sra, S. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, 313-320, (Editors: Getoor, L. and Scheffer, T.), Omnipress, 28th International Conference on Machine Learning (ICML 2011), 2011 BibTeX

Empirical Inference Article Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach Kim, D., Sra, S., Dhillon, I. SIAM Journal on Scientific Computing, 32(6):3548-3563, December 2010
Numerous scientific applications across a variety of fields depend on box-constrained convex optimization. Box-constrained problems therefore continue to attract research interest. We address box-constrained (strictly convex) problems by deriving two new quasi-Newton algorithms. Our algorithms are positioned between the projected-gradient [J. B. Rosen, J. SIAM, 8 (1960), pp. 181–217] and projected-Newton [D. P. Bertsekas, SIAM J. Control Optim., 20 (1982), pp. 221–246] methods. We also prove their convergence under a simple Armijo step-size rule. We provide experimental results for two particular box-constrained problems: nonnegative least squares (NNLS), and nonnegative Kullback–Leibler (NNKL) minimization. For both NNLS and NNKL our algorithms perform competitively with well-established methods on medium-sized problems; for larger problems our approach frequently outperforms the competition.
Web DOI BibTeX
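
The Armijo step-size rule under which convergence is proved is easiest to see on plain projected gradient; the paper's algorithms add quasi-Newton scaling on top. A sketch on an illustrative NNLS instance:

import numpy as np

def projected_gradient_armijo(f, grad, project, x0, iters=200,
                              beta=0.5, sigma=1e-4):
    x = x0
    for _ in range(iters):
        g, t = grad(x), 1.0
        while True:
            x_new = project(x - t * g)
            # Armijo: require sufficient decrease along the projected step.
            if f(x_new) <= f(x) + sigma * g @ (x_new - x):
                break
            t *= beta
        x = x_new
    return x

# Nonnegative least squares (NNLS), one of the paper's test problems.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)
x = projected_gradient_armijo(
    f=lambda x: float(np.sum((A @ x - b) ** 2)),
    grad=lambda x: 2.0 * A.T @ (A @ x - b),
    project=lambda x: np.maximum(x, 0.0),
    x0=np.zeros(8),
)
print(x.round(3))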

Empirical Inference Conference Paper A scalable trust-region algorithm with application to mixed-norm regression Kim, D., Sra, S., Dhillon, I. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 519-526, (Editors: Fürnkranz, J. and Joachims, T.), International Machine Learning Society, Madison, WI, USA, 27th International Conference on Machine Learning (ICML 2010), June 2010
We present a new algorithm for minimizing a convex loss function subject to regularization. Our framework applies to numerous problems in machine learning and statistics; notably, for sparsity-promoting regularizers such as the ℓ1 or ℓ1,∞ norms, it enables efficient computation of sparse solutions. Our approach is based on the trust-region framework with nonsmooth objectives, which allows us to build on known results to provide convergence analysis. We avoid the computational overheads associated with the conventional Hessian approximation used by trust-region methods by instead using a simple separable quadratic approximation. This approximation also enables use of proximity operators for tackling nonsmooth regularizers. We illustrate the versatility of our resulting algorithm by specializing it to three mixed-norm regression problems: group lasso [36], group logistic regression [21], and multi-task lasso [19]. We experiment with both synthetic and real-world large-scale data; our method is seen to be competitive, robust, and scalable.
PDF Web BibTeX
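
The separable-quadratic-plus-proximity-operator mechanism is clearest in its unaccelerated ℓ1 form; the paper wraps this kind of step in a trust region and extends it to group and multi-task norms. A minimal sketch on illustrative data:

import numpy as np

def soft_threshold(v, tau):
    # Proximity operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient_lasso(A, b, lam, iters=500):
    # min_x ||Ax - b||^2 + lam * ||x||_1 via gradient step + prox.
    L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - (2.0 / L) * A.T @ (A @ x - b), lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]  # sparse ground truth
b = A @ x_true + 0.1 * rng.standard_normal(60)
print(prox_gradient_lasso(A, b, lam=5.0).round(2))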

Empirical Inference Conference Paper Convex Perturbations for Scalable Semidefinite Programming Kulis, B., Sra, S., Dhillon, I. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, 296-303, (Editors: van Dyk, D. and Welling, M.), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009
Many important machine learning problems are modeled and solved via semidefinite programs; examples include metric learning, nonlinear embedding, and certain clustering problems. Often, off-the-shelf software is invoked for the associated optimization, which can be inappropriate due to excessive computational and storage requirements. In this paper, we introduce the use of convex perturbations for solving semidefinite programs (SDPs), and for a specific perturbation we derive an algorithm that has several advantages over existing techniques: a) it is simple, requiring only a few lines of Matlab, b) it is a first-order method, and thereby scalable, and c) it can easily exploit the structure of a given SDP (e.g., when the constraint matrices are low-rank, a situation common to several machine learning SDPs). A pleasant byproduct of our method is a fast, kernelized version of the large-margin nearest neighbor metric learning algorithm. We demonstrate that our algorithm is effective in finding fast approximations to large-scale SDPs arising in some machine learning applications.
PDF Web BibTeX
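
A building block common to first-order SDP methods in general is the Euclidean projection onto the positive semidefinite cone; the snippet below illustrates only that primitive, not the paper's perturbation scheme:

import numpy as np

def project_psd(M):
    # Symmetrize, then clip negative eigenvalues to zero: the
    # Euclidean projection onto the PSD cone.
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.maximum(w, 0.0)) @ V.T

M = np.array([[1.0, 2.0],
              [2.0, -3.0]])
print(project_psd(M))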

Empirical Inference Conference Paper Efficient Bregman Range Search Cayton, L. In Advances in Neural Information Processing Systems 22, 243-251, (Editors: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. and Culotta, A.), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), 2009
We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences.
PDF Web BibTeX
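
A linear scan with the relative entropy (one of the Bregman divergences named in the abstract) defines the range search task; the paper's space decomposition prunes most of these comparisons, which this illustrative baseline does not attempt:

import numpy as np

def kl_divergence(q, x):
    # Relative entropy, the Bregman divergence generated by sum(p * log p).
    return np.sum(q * np.log(q / x) - q + x, axis=-1)

def range_search_linear(query, database, radius, divergence):
    # Brute-force baseline: report every point within `radius` of the query.
    d = divergence(query, database)
    return np.flatnonzero(d <= radius)

rng = np.random.default_rng(0)
db = rng.dirichlet(np.ones(5), size=1000)  # points on the probability simplex
q = rng.dirichlet(np.ones(5))
print(range_search_linear(q, db, radius=0.1, divergence=kl_divergence))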