Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Duke University Machine Learning Group Presented by Kai Ni, August 24, 2006.

Presentation transcript:

Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Duke University Machine Learning Group Presented by Kai Ni August 24, 2006 Paper by E. Xing, K. Sohn, M. Jordan and Y. Teh, ICML 2006

Outline: Background; Dirichlet process mixture; Hierarchical Dirichlet process mixture; Application to haplotype inference

Motivation Problem – Uncovering the haplotypes of single nucleotide polymorphisms (SNPs) within and between populations. Methods – Coalescent models, finite and infinite mixtures, and maximum parsimony. Applications – Biological and medical analysis; genetic demography studies.

Background A SNP haplotype is a list of alleles at contiguous sites in a local region of a single chromosome. A haplotype is inherited as a unit. For diploid organisms, two haplotypes together make up a genotype, which is a list of unordered pairs of alleles in a region. Haplotype inference from genotype data can be formulated as a mixture model. This paper uses an HDP mixture.
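
To make the genotype/haplotype distinction concrete, here is a minimal Python sketch. It assumes biallelic SNPs coded 0/1; the function names are hypothetical and do not come from the paper. It shows why phasing is ambiguous: a genotype with k heterozygous sites is consistent with 2^(k-1) unordered haplotype pairs.

```python
from itertools import product

def genotype_of(h1, h2):
    """Combine two haplotypes into a genotype: one unordered allele pair per site."""
    return [tuple(sorted(pair)) for pair in zip(h1, h2)]

def compatible_haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs that could have produced a genotype."""
    het_sites = [t for t, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Each heterozygous site can be phased two ways; homozygous sites are fixed.
    for flips in product([False, True], repeat=len(het_sites)):
        flip_at = dict(zip(het_sites, flips))
        h1, h2 = [], []
        for t, (a, b) in enumerate(genotype):
            if flip_at.get(t, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        # frozenset makes the pair unordered, so mirrored phasings collapse.
        pairs.add(frozenset([tuple(h1), tuple(h2)]))
    return pairs

g = genotype_of([0, 1, 1], [1, 1, 0])
print(g)
print(len(compatible_haplotype_pairs(g)))
```

The mixture-model formulation described above is one way of choosing among these candidate pairs: pairs built from frequently reused ancestral haplotypes are preferred.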

Dirichlet Processes A single clustering problem can be analyzed with a Dirichlet process (DP).

DP mixture model G can be viewed as a mixture model with infinitely many components.
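
One standard way to see the "infinitely many components" view is the stick-breaking construction of the DP. The sketch below is only illustrative: the concentration parameter name `alpha` and the truncation at a finite number of components are assumptions made here, not details from the slides.

```python
import random

def stick_breaking_weights(alpha, num_components, rng):
    """Draw the first `num_components` stick-breaking weights of a DP.

    Returns the weights and the length of stick left for all remaining
    (untruncated) components.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_components):
        # Break off a Beta(1, alpha) fraction of whatever stick remains.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_components=20, rng=rng)
print(sum(weights) + leftover)  # weights plus leftover stick account for all the mass
```

Smaller values of `alpha` concentrate the mass on the first few components, so a finite sample typically occupies only a handful of clusters.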

DP-Haplotyper The observed data are the genotypes of T contiguous SNPs of individual i from ethnic group j. The corresponding paternal and maternal haplotypes of each individual genotype are denoted by H. Each H is assumed to be a random perturbation of an ancestral haplotype A, or founder. DP-Haplotyper is a DP mixture model for a single population group.

Graphical model of DP-Haplotyper

Hierarchical Dirichlet Process Each group is modeled as a DP G_j, and the group-specific DPs are linked via a global DP G_0. G_0 defines the set of mixture components used by all the groups. Different groups share the same set of mixture components (underlying clusters), but with different mixture proportions.

The HDP can be used as the prior distribution over the factors for nested group data. Consider a two-level hierarchy of DPs. G_0 links the child DPs G_j and forces them to share components. The G_j are conditionally independent given G_0. HDP mixture model
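
The two-level construction can be written out in the standard HDP notation. The symbols below are assumptions taken from the usual presentation of the HDP rather than from the slide text: $\gamma$ and $\alpha_0$ are concentration parameters, $F$ is the base measure, $\phi_{ji}$ is the factor for observation $i$ in group $j$, and $x_{ji}$ is that observation.

```latex
\begin{align*}
G_0 \mid \gamma, F &\sim \mathrm{DP}(\gamma, F) \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), \qquad j = 1, \dots, J \\
\phi_{ji} \mid G_j &\sim G_j \\
x_{ji} \mid \phi_{ji} &\sim p(x_{ji} \mid \phi_{ji})
\end{align*}
```

Because $G_0$ is itself a draw from a DP, it is discrete, so every $G_j$ places its mass on the same atoms; this is what forces the groups to share components.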

HDP – Chinese Restaurant Franchise First level: within each group, a DP mixture. φ_j1, …, φ_j(i-1) are i.i.d. random variables distributed according to G_j; ψ_j1, …, ψ_jT_j are the values taken on by φ_j1, …, φ_j(i-1); n_jt is the number of φ_ji' = ψ_jt, 0 < i' < i. Second level: across groups, sharing clusters. The base measure of each group is a draw from a DP; θ_1, …, θ_K are the values taken on by ψ_j1, …, ψ_jT_j; m_k is the number of ψ_jt = θ_k, over all j, t.
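
The two levels can be sketched as a small prior simulation. This is a rough illustration under assumptions: the concentration parameters `alpha` (within group) and `gamma` (across groups) and all function names are hypothetical, and the code draws only from the prior, not from any posterior. Tables play the role of ψ_jt and dishes the role of θ_k.

```python
import random

def weighted_choice(weights, rng):
    """Return an index drawn with probability proportional to `weights`."""
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for index, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return index
    return len(weights) - 1

def chinese_restaurant_franchise(group_sizes, alpha, gamma, rng):
    """Seat customers group by group; return each customer's dish (cluster) label."""
    dish_table_counts = []  # m_k: tables, over all groups, serving dish k
    assignments = []
    for size in group_sizes:
        table_counts = []  # n_jt: customers at table t of this group
        table_dish = []    # dish served at table t of this group
        labels = []
        for _ in range(size):
            # First level: join table t with weight n_jt, or a new table with weight alpha.
            t = weighted_choice(table_counts + [alpha], rng)
            if t == len(table_counts):
                # Second level: a new table takes dish k with weight m_k,
                # or a brand-new dish with weight gamma.
                k = weighted_choice(dish_table_counts + [gamma], rng)
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            labels.append(table_dish[t])
        assignments.append(labels)
    return assignments, dish_table_counts

rng = random.Random(0)
labels, m = chinese_restaurant_franchise([20, 20, 20], alpha=1.0, gamma=1.0, rng=rng)
print([sorted(set(group)) for group in labels])
```

Dish indices are global, so the same index appearing in two groups means those groups share a mixture component while using it in different proportions.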

HDP-Haplotyper model

Parameterization of the model Underlying mixture component A_k := [A_k,1, …, A_k,T] – founding haplotype configuration. Base measure, where p(A) is a uniform distribution and p( ) is a beta distribution. Inheritance model. Genotyping model.

Gibbs Sampling Gibbs sampling variants include: The sampling scheme is similar to a two-level urn model:

Simulated data 100 individuals from 5 groups (20 each). Each group has 2 shared founders and 3 unique founders, for a total of 17 founders (2 + 5 × 3).
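
A data set with this layout could be generated roughly as follows. Only the group and founder counts come from the slide; the number of sites, the uniform choice of founders, the per-site flip probability `mutation_rate`, and all names are assumptions made for illustration, not the authors' simulation procedure.

```python
import random

def make_founder_sets(num_groups=5, num_shared=2, num_unique=3):
    """Give every group the shared founder ids plus its own block of unique ids."""
    shared = list(range(num_shared))
    next_id = num_shared
    groups = []
    for _ in range(num_groups):
        unique = list(range(next_id, next_id + num_unique))
        next_id += num_unique
        groups.append(shared + unique)
    return groups, next_id  # next_id equals the total number of founders

def simulate_genotypes(num_sites, per_group, mutation_rate, rng):
    """Simulate (group, genotype) records by perturbing randomly chosen founders."""
    founder_sets, num_founders = make_founder_sets()
    founders = [[rng.randint(0, 1) for _ in range(num_sites)]
                for _ in range(num_founders)]
    records = []
    for j, founder_ids in enumerate(founder_sets):
        for _ in range(per_group):
            haplotypes = []
            for _ in range(2):  # one paternal and one maternal haplotype
                ancestor = founders[rng.choice(founder_ids)]
                # Assumed perturbation: flip each allele with probability mutation_rate.
                haplotypes.append([a ^ 1 if rng.random() < mutation_rate else a
                                   for a in ancestor])
            genotype = [tuple(sorted(pair)) for pair in zip(*haplotypes)]
            records.append((j, genotype))
    return records, num_founders

rng = random.Random(0)
records, num_founders = simulate_genotypes(num_sites=10, per_group=20,
                                           mutation_rate=0.01, rng=rng)
print(len(records), num_founders)
```

With the default arguments the founder bookkeeping reproduces the counts on the slide: 2 shared founders plus 3 unique founders in each of 5 groups.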

Real data International HapMap Project, containing genotypes from four populations.

Conclusion The authors proposed an HDP mixture model for haplotype inference across multiple populations. The HDP prior couples multiple heterogeneous populations and facilitates sharing mixture components across multiple infinite mixture models. In the future, longer SNP sequences will be considered. The HDP can also be generalized to problems in which the group labels are unknown and must be inferred.