Modeling data using directional distributions: Part II

Empirical Inference Technical Report 2007

Suvrit Sra

High-dimensional data is central to most data mining applications, and only recently has it been modeled via directional distributions. In [Banerjee et al., 2003] the authors introduced the use of the von Mises-Fisher (vMF) distribution for modeling high-dimensional directional data, particularly for text and gene expression analysis. The vMF distribution is one of the simplest directional distributions. The Watson, Bingham, and Fisher-Bingham distributions provide distributions with an increasing number of parameters and thereby commensurately increased modeling power. This report provides a follow-up study to the initial development in [Banerjee et al., 2003] by presenting Expectation Maximization (EM) procedures for estimating parameters of a mixture of Watson (moW) distributions. The numerical challenges associated with parameter estimation for both of these distributions are significantly more difficult than for the vMF distribution. We develop new numerical approximations for estimating the parameters, permitting us to model real-life data more accurately. Our experimental results establish that for certain data sets, improved modeling power translates into better results.
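
For orientation, the standard textbook forms of the two densities named in the abstract are sketched below for unit vectors x on the sphere in d dimensions. The notation (mean direction \mu, concentration \kappa, modified Bessel function I, Kummer confluent hypergeometric function M) is an assumption made here for illustration and may differ from the notation used in the report itself.

```latex
% von Mises-Fisher density on the unit sphere S^{d-1}, with \|\mu\| = 1, \kappa \ge 0
f_{\mathrm{vMF}}(x;\mu,\kappa) = c_d(\kappa)\, e^{\kappa \mu^{\top} x},
\qquad
c_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}

% Watson density on S^{d-1}; it is axially symmetric, so x and -x are equally likely
f_{\mathrm{W}}(x;\mu,\kappa) =
\frac{\Gamma(d/2)}{2\pi^{d/2}\, M\!\left(\tfrac{1}{2}, \tfrac{d}{2}, \kappa\right)}\,
e^{\kappa (\mu^{\top} x)^{2}}
```

In both cases the normalizing constant involves a special function of \kappa (a Bessel function for vMF, a confluent hypergeometric function for Watson), which is where numerical difficulties in estimating \kappa typically arise.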

Author(s):	Sra, S. and Jain, P. and Dhillon, I.
Links:	PDF
Number (issue):	TR-07-05
Year:	2007
Month:	February
Day:	0

Bibtex Type:	Technical Report (techreport)

Digital:	0
Electronic Archiving:	grant_archive
Institution:	University of Texas, Austin, TX, USA
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTeX

@techreport{5506,
  title = {Modeling data using directional distributions: Part II},
  abstract = {High-dimensional data is central to most data mining applications, and only recently has it
  been modeled via directional distributions. In [Banerjee et al., 2003] the authors introduced the
  use of the von Mises-Fisher (vMF) distribution for modeling high-dimensional directional data,
  particularly for text and gene expression analysis. The vMF distribution is one of the simplest
  directional distributions. The Watson, Bingham, and Fisher-Bingham distributions provide
  distributions with an increasing number of parameters and thereby commensurately increased modeling
  power. This report provides a follow-up study to the initial development in [Banerjee et al., 2003]
  by presenting Expectation Maximization (EM) procedures for estimating parameters of a mixture
  of Watson (moW) distributions. The numerical challenges associated with parameter estimation
  for both of these distributions are significantly more difficult than for the vMF distribution. We
  develop new numerical approximations for estimating the parameters, permitting us to model
  real-life data more accurately. Our experimental results establish that for certain data sets, improved
  modeling power translates into better results.},
  number = {TR-07-05},
  organization = {Max-Planck-Gesellschaft},
  institution = {University of Texas, Austin, TX, USA},
  school = {Biologische Kybernetik},
  month = feb,
  year = {2007},
  slug = {5506},
  author = {Sra, S. and Jain, P. and Dhillon, I.},
  month_numeric = {2}
}
