Cluster Identification in Nearest-Neighbor Graphs


Empirical Inference

Matthias Hein

Statistical Learning Theory

Ulrike von Luxburg

Professor, University of Tübingen
Max Planck Fellow

Empirical Inference

Markus Maier

Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are "identified": that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict "optimal" values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.
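
The notions in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the usual definitions (the symmetric k-nearest-neighbor graph joins i and j if either point is among the other's k nearest neighbors, the mutual graph only if both are), and the function names, the toy data, and the brute-force neighbor search are all hypothetical choices made for illustration.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (brute force)."""
    n = len(points)
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def knn_graph(points, k, mutual):
    """Edge set of the mutual (mutual=True) or symmetric (mutual=False) kNN graph."""
    nbrs = knn_sets(points, k)
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            i_in_j, j_in_i = i in nbrs[j], j in nbrs[i]
            if (i_in_j and j_in_i) if mutual else (i_in_j or j_in_i):
                edges.add((i, j))
    return edges


def clusters_identified(edges, labels):
    """True if no edge joins two clusters and every cluster is connected."""
    if any(labels[i] != labels[j] for i, j in edges):
        return False
    # Union-find over the edges; with no cross-cluster edges, every cluster
    # is connected exactly when #components equals #clusters.
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    components = {find(i) for i in range(len(labels))}
    return len(components) == len(set(labels))


# Toy data (assumed for illustration): two well-separated groups on a line.
xs = [0.0, 1.0, 2.1, 3.3, 50.0, 51.0, 52.1, 53.3]
points = [(x, 0.0) for x in xs]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for k in (1, 2, 4):
    for mutual in (True, False):
        ok = clusters_identified(knn_graph(points, k, mutual), labels)
        print(k, "mutual" if mutual else "symmetric", ok)
```

On this toy data, k = 1 leaves a cluster disconnected in the mutual graph but not in the symmetric one, k = 2 identifies both clusters in either graph, and k = 4 creates an edge between the clusters in both. This only illustrates why the choice of k matters; it says nothing about the bounds derived in the paper.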

Author(s):	Maier, M. and Hein, M. and von Luxburg, U.
Book Title:	ALT 2007
Journal:	Algorithmic Learning Theory: Proceedings of the 18th International Conference (ALT 2007)
Pages:	196-210
Year:	2007
Month:	October
Editors:	Hutter, M., Servedio, R. A., Takimoto, E.
Publisher:	Springer

BibTeX Type:	Conference Paper (inproceedings)

Address:	Berlin, Germany
DOI:	10.1007/978-3-540-75225-7_18
Event Name:	18th International Conference on Algorithmic Learning Theory
Event Place:	Sendai, Japan

Digital:	0
Electronic Archiving:	grant_archive
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTeX

@inproceedings{4590,
  title = {Cluster Identification in Nearest-Neighbor Graphs},
  journal = {Algorithmic Learning Theory: Proceedings of the 18th International Conference (ALT 2007)},
  booktitle = {ALT 2007},
  abstract = {Assume we are given a sample of points from some underlying
  distribution which contains several distinct clusters. Our goal is
  to construct a neighborhood graph on the sample points such that
  clusters are ``identified'': that is, the subgraph induced by points
  from the same cluster is connected, while subgraphs corresponding to
  different clusters are not connected to each other. We derive bounds
  on the probability that cluster identification is successful, and
  use them to predict ``optimal'' values of k for the mutual and
  symmetric k-nearest-neighbor graphs. We point out different
  properties of the mutual and symmetric nearest-neighbor graphs
  related to the cluster identification problem.},
  pages = {196--210},
  editor = {Hutter, M. and Servedio, R. A. and Takimoto, E.},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = oct,
  year = {2007},
  author = {Maier, M. and Hein, M. and von Luxburg, U.},
  doi = {10.1007/978-3-540-75225-7_18},
  month_numeric = {10}
}