Empirical Inference Poster 2010

PAC-Bayesian Bounds for Discrete Density Estimation and Co-clustering Analysis


We applied the PAC-Bayesian framework to derive generalization bounds for co-clustering. The analysis yielded regularization terms that were absent from preceding formulations of this task. The bounds suggested that co-clustering should optimize a trade-off between its empirical performance and the mutual information that the cluster variables preserve on row and column indices. Proper regularization enabled us to achieve state-of-the-art results in predicting the missing ratings in the MovieLens collaborative filtering dataset. In addition, a PAC-Bayesian bound for discrete density estimation was derived. We have shown that the PAC-Bayesian bound for classification is a special case of the PAC-Bayesian bound for discrete density estimation. We further introduced combinatorial priors to PAC-Bayesian analysis. Combinatorial priors are more appropriate for discrete domains, whereas Gaussian priors are suited to continuous domains. It was shown that combinatorial priors lead to regularization terms in the form of mutual information.
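
The regularization term named in the abstract is a mutual information between indices and cluster variables. As a rough illustration only, and not the authors' implementation, the Python sketch below computes I(X; C) for a soft cluster assignment q(c | x). It assumes a uniform distribution over the row indices x; the function name `mutual_information` and the list-of-lists input layout are hypothetical choices made for this example.

```python
import math


def mutual_information(q):
    """Return I(X; C) in nats.

    q[x][c] is the probability q(c | x) of assigning row x to cluster c.
    Assumption for this sketch: the rows x are uniformly distributed.
    """
    n = len(q)
    k = len(q[0])
    # Cluster marginal under uniform p(x): q(c) = (1/n) * sum_x q(c | x)
    marginal = [sum(row[c] for row in q) / n for c in range(k)]
    mi = 0.0
    for row in q:
        for c in range(k):
            if row[c] > 0:  # terms with q(c | x) = 0 contribute nothing
                mi += row[c] * math.log(row[c] / marginal[c]) / n
    return mi


# Hard assignment of four rows to two equally sized clusters.
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Every row has the same assignment, so C carries no information about X.
flat = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(mutual_information(hard), mutual_information(flat))
```

A clustering that keeps less information about the individual indices has a smaller I(X; C), which is the quantity the abstract says should be traded off against empirical performance.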

Author(s): Seldin, Y. and Tishby, N.
Journal: Workshop "Foundations and New Trends of PAC Bayesian Learning"
Volume: 2010
Year: 2010
Month: March
Day: 0
Bibtex Type: Poster (poster)
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTex

@poster{6329,
  title = {PAC-Bayesian Bounds for Discrete Density Estimation and Co-clustering Analysis},
  journal = {Workshop "Foundations and New Trends of PAC Bayesian Learning"},
  abstract = {We applied the PAC-Bayesian framework to derive
  generalization bounds for co-clustering. The analysis
  yielded regularization terms that were absent from
  preceding formulations of this task. The bounds
  suggested that co-clustering should optimize a trade-off
  between its empirical performance and the mutual
  information that the cluster variables preserve on row
  and column indices. Proper regularization enabled
  us to achieve state-of-the-art results in predicting
  the missing ratings in the MovieLens collaborative
  filtering dataset.
  In addition, a PAC-Bayesian bound for discrete
  density estimation was derived. We have shown that
  the PAC-Bayesian bound for classification is a
  special case of the PAC-Bayesian bound for discrete
  density estimation. We further introduced combinatorial
  priors to PAC-Bayesian analysis. Combinatorial
  priors are more appropriate for discrete domains,
  whereas Gaussian priors are suited to continuous
  domains. It was shown that combinatorial priors lead
  to regularization terms in the form of mutual
  information.},
  volume = {2010},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = mar,
  year = {2010},
  slug = {6329},
  author = {Seldin, Y. and Tishby, N.},
  month_numeric = {3}
}