Header logo is


2006


no image
Sampling for non-conjugate infinite latent feature models

Görür, D., Rasmussen, C.

(Editors: Bernardo, J. M.), 8th Valencia International Meeting on Bayesian Statistics (ISBA), June 2006 (talk)

Abstract
Latent variable models are powerful tools to model the underlying structure in data. Infinite latent variable models can be defined using Bayesian nonparametrics. Dirichlet process (DP) models constitute an example of infinite latent class models in which each object is assumed to belong to one of the, mutually exclusive, infinitely many classes. Recently, the Indian buffet process (IBP) has been defined as an extension of the DP. IBP is a distribution over sparse binary matrices with infinitely many columns which can be used as a distribution for non-exclusive features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described, however requiring conjugacy restricts the use of IBP. We describe an MCMC algorithm for non-conjugate IBP models. Modelling the choice behaviour is an important topic in psychology, economics and related fields. Elimination by Aspects (EBA) is a choice model that assumes each alternative has latent features with associated weights that lead to the observed choice outcomes. We formulate a non-parametric version of EBA by using IBP as the prior over the latent binary features. We infer the features of objects that lead to the choice data by using our sampling scheme for inference.

ei

PDF [BibTex]

2006


PDF [BibTex]


no image
An Inventory of Sequence Polymorphisms For Arabidopsis

Clark, R., Ossowski, S., Schweikert, G., Rätsch, G., Shinn, P., Zeller, G., Warthmann, N., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Ecker, J., Weigel, D.

17th International Conference on Arabidopsis Research, April 2006 (talk)

Abstract
We have used high-density oligonucleotide arrays to characterize common sequence variation in 20 wild strains of Arabidopsis thaliana that were chosen for maximal genetic diversity. Both strands of each possible SNP of the 119 Mb reference genome were represented on the arrays, which were hybridized with whole genome, isothermally amplified DNA to minimize ascertainment biases. Using two complementary approaches, a model based algorithm, and a newly developed machine learning method, we identified over 550,000 SNPs with a false discovery rate of ~ 0.03 (average of 1 SNP for every 216 bp of the genome). A heuristic algorithm predicted in addition ~700 highly polymorphic or deleted regions per accession. Over 700 predicted polymorphisms with major functional effects (e.g., premature stop codons, or deletions of coding sequence) were validated by dideoxy sequencing. Using this data set, we provide the first systematic description of the types of genes that harbor major effect polymorphisms in natural populations at moderate allele frequencies. The data also provide an unprecedented resource for the study of genetic variation in an experimentally tractable, multicellular model organism.

ei

[BibTex]

[BibTex]