
Tracking a Small Set of Experts by Mixing Past Posteriors


In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into k+1 sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund where the experts in the best partition are from a small pool of size m. Since k >> m, the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying log n for choosing the best expert in each section, we first pay log(n choose m) bits in the bounds for identifying the pool of m experts, and then log m bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.
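
As a rough illustration of the mixing update described in the abstract, the following Python sketch runs an exponential-weights master over n experts and, after each loss update, mixes a small fraction of the weight back over the average of all past posteriors (one of the mixing schemes analysed in the paper). The square loss, the uniform mixing over the past, and the parameter names eta and alpha are illustrative assumptions, not the paper's general setting.

import numpy as np

def mixing_past_posteriors(expert_preds, outcomes, eta=1.0, alpha=0.01):
    # Sketch of a Mixing Past Posteriors master algorithm (assumptions:
    # square loss, uniform mixing distribution over all past posteriors).
    # expert_preds: (T, n) array of expert predictions in [0, 1]
    # outcomes:     (T,) array of observed outcomes in [0, 1]
    T, n = expert_preds.shape
    w = np.full(n, 1.0 / n)   # current posterior over the n experts
    past = [w.copy()]         # past posteriors v_0, ..., v_{t-1}
    master_preds = np.empty(T)
    for t in range(T):
        # Predict with the posterior-weighted average of the experts.
        master_preds[t] = w @ expert_preds[t]
        # Loss update: exponential weights on this trial's losses.
        losses = (expert_preds[t] - outcomes[t]) ** 2
        v = w * np.exp(-eta * losses)
        v /= v.sum()
        # Mixing update: keep mass 1 - alpha on the fresh posterior and
        # spread mass alpha over the past posteriors, so weight can flow
        # back quickly to pool members that were good in earlier sections.
        w = (1.0 - alpha) * v + alpha * np.mean(past, axis=0)
        past.append(v.copy())
    return master_preds

With alpha = 0 this reduces to plain exponential weights; mixing only with the initial uniform vector instead of the whole past recovers the Fixed-Share update of Herbster and Warmuth.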

Author(s): Bousquet, O. and Warmuth, M.
Journal: Journal of Machine Learning Research
Volume: 3
Pages: 363-396
Year: 2002
Editors: Long, P.
BibTeX Type: Article (article)
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@article{1440,
  title = {Tracking a Small Set of Experts by Mixing Past Posteriors},
  journal = {Journal of Machine Learning Research},
  abstract = {In this paper, we examine on-line learning problems in which the target
  concept is allowed to change over time. In each trial a master algorithm
  receives predictions from a large set of n experts. Its goal is to predict
  almost as well as the best sequence of such experts chosen off-line by
  partitioning the training sequence into k+1 sections and then choosing
  the best expert for each section. We build on methods developed by
  Herbster and Warmuth and consider an open problem posed by
  Freund where the experts in the best partition are from a small
  pool of size m.
  Since k >> m, the best expert shifts back and forth
  between the experts of the small pool.
  We propose algorithms that solve
  this open problem by mixing the past posteriors maintained by the master
  algorithm. We relate the number of bits needed for encoding the best
  partition to the loss bounds of the algorithms. 
  Instead of paying log n for
  choosing the best expert in each section we first pay log (n choose m)
  bits in the bounds for identifying the pool of m experts 
  and then log m bits per new section. 
  In the bounds we also pay twice for encoding the
  boundaries of the sections.},
  volume = {3},
  pages = {363--396},
  editors = {Long, P.},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  year = {2002},
  slug = {1440},
  author = {Bousquet, O. and Warmuth, M.}
}