A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space Presented by Chunping Wang Machine Learning Group, Duke University.


Presented February 26, 2007. Paper by E. P. Xing and K-A. Sohn.

Outline: Terminology and Introduction; DP Mixtures for Non-recombination Inheritance; HMDP for Recombination; Results; Conclusions.

Terminology and Introduction (1): each genetic term is listed with its HMM counterpart.
Allele: a viable DNA coding on a chromosome (observation).
Locus: the location of an allele (index of an observation).
Haplotype: a sequence of alleles (data sequence).
Recombination: the exchange of pieces between paired chromosomes (state transition).
Mutation: any change to a haplotype during inheritance (emission).

Terminology and Introduction (2): [Figure: ancestral haplotypes and their descendant (modern) haplotypes.]

Terminology and Introduction (3): Problems:
1. Ancestral inference: recovering the ancestral haplotypes;
2. Recombination analysis: inferring the recombination hotspots;
3. Ancestral mapping: inferring the ancestral origin of each allele in each modern haplotype.

DP Mixtures for Non-recombination Inheritance (1): Non-recombination means that only mutation may occur during inheritance, so each modern haplotype originates from a single ancestor. This assumption holds only for haplotypes spanning a short region of a chromosome.

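DP Mixtures for Non-recombination Inheritance (2): Each modern haplotype is generated from a DP mixture in which the distinct values of the mixture parameters, $\{A_k, \theta_k\}$, denote the kth ancestor $A_k$ jointly with the mutation parameter $\theta_k$ corresponding to the kth ancestor.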
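A minimal generative sketch of the non-recombination DP mixture, assuming binary alleles, a uniform base measure over ancestral alleles, a hypothetical Beta(9, 1) prior on each ancestor's mutation parameter θ_k (read here as the probability that an allele is copied unchanged), and the Chinese restaurant process representation of the DP. The function name and all parameter values are illustrative, not taken from the paper.

```python
import random

def sample_dp_mixture_haplotypes(n, T, alpha, n_alleles=2, theta_prior=(9.0, 1.0), seed=0):
    """Generate n modern haplotypes of length T from a DP mixture of ancestors.

    Hypothetical sketch: each ancestor k is a pair (A_k, theta_k); theta_k is the
    probability that an allele is inherited unchanged (assumed interpretation).
    """
    rng = random.Random(seed)
    ancestors, thetas, counts = [], [], []
    haplotypes, labels = [], []
    for _ in range(n):
        # Chinese restaurant process: existing ancestor k with weight counts[k],
        # a brand-new ancestor with weight alpha (normalization is implicit).
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if k == len(ancestors):
            # New atom from the base measure: uniform alleles, Beta-distributed theta.
            ancestors.append([rng.randrange(n_alleles) for _ in range(T)])
            thetas.append(rng.betavariate(*theta_prior))
            counts.append(0)
        counts[k] += 1
        # No recombination: every locus is inherited from the same ancestor k.
        h = []
        for a in ancestors[k]:
            if rng.random() < thetas[k]:
                h.append(a)  # allele copied unchanged
            else:
                # Mutation: switch uniformly to one of the other alleles.
                h.append(rng.choice([b for b in range(n_alleles) if b != a]))
        haplotypes.append(h)
        labels.append(k)
    return ancestors, thetas, haplotypes, labels
```

Because the number of ancestors grows with the data under the CRP, K never has to be fixed in advance.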

DP Mixtures for Non-recombination Inheritance (3)

HMDP for Recombination (1): For long haplotypes that may descend from multiple ancestors, we model recombinations as state transitions between adjacent loci (across discrete spatial intervals).

HMDP for Recombination (2): Each row of the HMM transition matrix is a DP. These DPs are linked by a top-level master DP, so they all share the same set of target states. Denoting the mixing proportions of the jth lower-level DP by $\pi_j$, the jth row of the transition matrix is $\pi_j$.
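A truncated sketch of the HDP transition matrix, assuming the common weak-limit approximation with K states (the HMDP itself places no such cap). The master DP weights β come from stick-breaking with concentration γ, and each row π_j is drawn as Dirichlet(αβ), the finite-K form of DP(α, β). Because every row is built on the same β, all rows share the same target states. The function name and parameters are hypothetical.

```python
import random

def sample_hdp_transition_matrix(K, gamma, alpha, seed=0):
    """Truncated (K-state) HDP-HMM transition matrix: beta ~ GEM(gamma),
    pi_j ~ Dirichlet(alpha * beta) for every row j."""
    rng = random.Random(seed)
    # Stick-breaking for the shared master-DP weights.
    beta, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        beta.append(remaining * v)
        remaining *= 1.0 - v
    beta.append(remaining)  # leftover stick mass goes to the last atom

    def dirichlet(params):
        # Dirichlet draw via normalized Gamma variates.
        while True:
            g = [rng.gammavariate(p, 1.0) if p > 0 else 0.0 for p in params]
            total = sum(g)
            if total > 0:  # retry in the rare case every draw underflows to 0
                return [x / total for x in g]

    pi = [dirichlet([alpha * b for b in beta]) for _ in range(K)]
    return beta, pi
```

Rows with a large α stay close to β, so every ancestor tends to transition to the same popular ancestors; this sharing is what the master DP provides.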

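HMDP for Recombination (3): [Figure: a modern haplotype and the ancestral haplotypes it inherits from.] The indicators $C_i = (C_{i,1}, \ldots, C_{i,T})$ of the ith modern haplotype specify, for every locus, the corresponding ancestral haplotype. When no recombination takes place between loci t and t+1 during the inheritance process producing haplotype $H_i$, $C_{i,t+1} = C_{i,t}$; when a recombination occurs between loci t and t+1, $C_{i,t+1}$ is drawn from the HDP transition distribution of state $C_{i,t}$.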

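HMDP for Recombination (4): A Poisson point process controls the duration of non-recombinant inheritance, which makes the model space-inhomogeneous. Let d be the physical distance between loci t and t+1, and r the recombination rate per unit distance. Then the number of recombinations x in that interval follows $x \sim \mathrm{Poisson}(rd)$, so the probability of no recombination is $P(x = 0) = e^{-rd}$.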
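A small sketch of the Poisson recombination model, assuming d and r are given in matching units (for example, r per base pair and d in base pairs). It samples recombination counts with Knuth's multiplication method, so the fraction of intervals with zero recombinations can be checked against exp(-r·d). Function names are hypothetical.

```python
import math
import random

def prob_no_recombination(d, r):
    """P(x = 0) for x ~ Poisson(r * d): no crossover between adjacent loci."""
    return math.exp(-r * d)

def sample_num_recombinations(d, r, rng):
    """Draw x ~ Poisson(r * d) with Knuth's multiplication method
    (adequate for the small means typical of adjacent loci)."""
    limit = math.exp(-r * d)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k
```

For r·d = 1, about 37% of intervals (exp(-1)) carry no recombination.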

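HMDP for Recombination (5): Combining this with the standard stationary HMDP gives the non-stationary state-transition probability $P(C_{i,t+1} = k' \mid C_{i,t} = k) = e^{-rd}\,\delta_{k,k'} + (1 - e^{-rd})\,\pi_{k,k'}$, where $C_{i,t}$ indexes the ancestor of haplotype i at locus t, $\delta_{k,k'}$ is the Kronecker delta, and $\pi_{k,k'}$ is entry $(k, k')$ of the HDP transition matrix. As d or r goes to infinity, $e^{-rd} \to 0$ and the inhomogeneous HMDP reduces to a standard HMDP.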
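The non-stationary transition probability is a blend of the identity matrix (no recombination) and the HDP transition matrix. In this sketch, pi is assumed to be a truncated K-by-K HDP transition matrix whose rows sum to 1; the function name is hypothetical.

```python
import math

def recombination_transition_matrix(pi, d, r):
    """Transition matrix between loci t and t+1:
    P(k -> k') = exp(-r d) * [k == k'] + (1 - exp(-r d)) * pi[k][k']."""
    stay = math.exp(-r * d)  # probability of zero recombinations in the interval
    K = len(pi)
    return [[stay * (1.0 if k == kp else 0.0) + (1.0 - stay) * pi[k][kp]
             for kp in range(K)]
            for k in range(K)]
```

With d = 0 the result is the identity matrix; as r·d grows it approaches pi, which is the limit in which the model behaves like a standard HMDP.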

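HMDP for Recombination (6): Inference. Each modern allele is emitted from its ancestral allele through an emission function (a per-locus mutation model), and the prior base measure is uniform. Integrating over the mutation parameter gives the marginal likelihood of the haplotypes given the ancestors.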
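A sketch of an emission model and the collapsed likelihood under assumptions the transcript does not spell out: an allele is copied from its ancestor with probability θ_k and otherwise mutates uniformly to one of the other |B| - 1 alleles, and θ_k has a uniform Beta(1, 1) prior. Under those assumptions, integrating θ_k out leaves a Beta function of the match count m and mismatch count n pooled over all loci assigned to ancestor k: ∫ θ^m ((1 - θ)/(|B| - 1))^n dθ = B(m + 1, n + 1) / (|B| - 1)^n. All function names are hypothetical.

```python
import math

def emission_prob(h, a, theta, n_alleles):
    """Assumed mutation model: keep ancestral allele a with probability theta,
    otherwise mutate uniformly to one of the other n_alleles - 1 alleles."""
    return theta if h == a else (1.0 - theta) / (n_alleles - 1)

def match_counts(haplotypes, indicators, ancestors, k):
    """Count matching (m) and mismatching (n) alleles over every locus whose
    indicator points to ancestor k."""
    m = n = 0
    for h, c in zip(haplotypes, indicators):
        for t, (allele, ck) in enumerate(zip(h, c)):
            if ck == k:
                if allele == ancestors[k][t]:
                    m += 1
                else:
                    n += 1
    return m, n

def log_marginal_likelihood(m, n, n_alleles):
    """log of the integral over theta in [0, 1] of theta^m ((1-theta)/(|B|-1))^n
    under a uniform prior, i.e. log B(m+1, n+1) - n log(|B|-1)."""
    log_beta = math.lgamma(m + 1) + math.lgamma(n + 1) - math.lgamma(m + n + 2)
    return log_beta - n * math.log(n_alleles - 1)
```

Working in log space keeps the likelihood finite when thousands of loci are pooled.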

HMDP for Recombination (7): Inference uses two sampling stages:
1. Sample the ancestral indicators given all haplotypes h and the most recently sampled ancestor pool a;
2. Sample every ancestor $A_k$ given all haplotypes h and the current indicators.
Combining the HDP prior with the marginal likelihood, we can infer the posterior of the indicators and the ancestors, which are the variables of interest.
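A sketch of the second stage only (resampling ancestor A_k locus by locus), with two simplifying assumptions: the mutation parameter θ is held fixed rather than integrated out, and each ancestral allele has a uniform prior. The first stage (resampling the indicators given the ancestors) is not shown. The function name and arguments are hypothetical.

```python
import math
import random

def sample_ancestor(k, haplotypes, indicators, theta, n_alleles, rng):
    """Resample ancestor A_k one locus at a time from its conditional posterior,
    given the modern alleles currently assigned to it (fixed-theta sketch)."""
    T = len(haplotypes[0])
    log_keep = math.log(theta)
    log_mut = math.log((1.0 - theta) / (n_alleles - 1))
    new_ancestor = []
    for t in range(T):
        # Modern alleles at locus t whose indicator points to ancestor k.
        observed = [h[t] for h, c in zip(haplotypes, indicators) if c[t] == k]
        logp = []
        for b in range(n_alleles):
            matches = sum(1 for x in observed if x == b)
            logp.append(matches * log_keep + (len(observed) - matches) * log_mut)
        # Normalize in log space; with no observations this is the uniform prior.
        mx = max(logp)
        weights = [math.exp(v - mx) for v in logp]
        new_ancestor.append(rng.choices(range(n_alleles), weights=weights)[0])
    return new_ancestor
```

Loci with no assigned haplotypes fall back to the uniform prior, so an ancestor that loses all its descendants is resampled from the base measure.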

Results (1): Simulated data: 30 populations, each containing 200 haplotypes derived from K = 5 ancestral haplotypes, with T = 100 loci. Compared methods: the HMDP and HMMs with K = 3, 5 and 10. [Figure: average ancestor reconstruction errors for the five ancestors.] Even the HMM with K = 5, the true number of ancestors, does not outperform the HMDP.
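The transcript does not define the reconstruction error, so this sketch assumes a simple one: for each true ancestor, the fraction of loci at which the closest inferred ancestor disagrees. Letting each true ancestor pick its closest match means a model with too few states (such as the K = 3 HMM) can still be scored, although one inferred haplotype may then be matched to several true ancestors. Function names are hypothetical.

```python
def hamming_fraction(x, y):
    """Fraction of loci at which two equal-length haplotypes differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def ancestor_reconstruction_errors(true_ancestors, inferred_ancestors):
    """Assumed metric: per true ancestor, the error of its closest inferred ancestor."""
    return [min(hamming_fraction(a, b) for b in inferred_ancestors)
            for a in true_ancestors]
```

Averaging the returned list gives one number per method and population, which could then be compared across the HMDP and the fixed-K HMMs.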

Results (2): [Figure: box plot of the empirical recombination rates; vertical gray lines mark the pre-specified recombination hotspots; Threshold 1 and Threshold 2 are shown.]
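A sketch of one way to compute empirical recombination rates from sampled ancestor indicators, assuming the rate for the interval between loci t and t+1 is the fraction of modern haplotypes whose indicator changes there, and that hotspots are the intervals whose rate exceeds a chosen threshold. A recombination that returns to the same ancestor is invisible to this count. Function names are hypothetical.

```python
def empirical_recombination_rates(indicators):
    """Per interval (t, t+1), the fraction of haplotypes whose ancestor
    indicator changes (assumed definition)."""
    n = len(indicators)
    T = len(indicators[0])
    return [sum(1 for c in indicators if c[t] != c[t + 1]) / n
            for t in range(T - 1)]

def call_hotspots(rates, threshold):
    """Indices of intervals whose empirical rate exceeds the threshold."""
    return [t for t, rate in enumerate(rates) if rate > threshold]
```

Repeating this over the 30 simulated populations would give one rate distribution per interval, which is what a box plot summarizes.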

Results (3): Population maps: 1. true map; 2. HMDP; 3–5. HMMs with K = 3, 5, 10. Each thin vertical line represents one modern haplotype; each color represents one ancestral haplotype. Accuracy is measured by the mean squared distance to the true map.

Results (4): Real haplotype data set 1: the Daly data, a single population of 512 haplotypes with T = 103 loci. Bottom: empirical recombination rates. Upper vertical lines: recombination hotspots identified by each method (red dotted lines: HMM; blue dashed lines: MDL; black solid lines: HMDP).

Results (5): A Gaussian mixture is fitted to the empirical recombination rates to choose the hotspot threshold.
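A sketch of choosing the threshold with a two-component Gaussian mixture, assuming one component captures background intervals and the other captures hotspots. The threshold is placed where the two weighted component densities are equal, found by bisection between the means. EM is written in plain Python so the example has no dependencies; the function names, initialization, and variance floor are hypothetical choices, not the authors' procedure.

```python
import math

def log_normal_pdf(x, mu, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def fit_two_gaussians(xs, n_iter=200, min_var=1e-8):
    """EM for a 1-D two-component Gaussian mixture (needs at least two values).
    Returns (weights, means, variances) ordered by mean: background first."""
    xs_sorted = sorted(xs)
    half = len(xs) // 2
    lo, hi = xs_sorted[:half], xs_sorted[half:]
    mean = sum(xs) / len(xs)
    var0 = max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var)
    w, mu, var = [0.5, 0.5], [sum(lo) / len(lo), sum(hi) / len(hi)], [var0, var0]
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in xs:
            logp = [math.log(w[j]) + log_normal_pdf(x, mu[j], var[j]) for j in range(2)]
            m = max(logp)
            p = [math.exp(v - m) for v in logp]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: update weights, means and (floored) variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj <= 1e-12:
                continue  # leave an empty component unchanged
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    order = sorted(range(2), key=lambda j: mu[j])
    return [w[j] for j in order], [mu[j] for j in order], [var[j] for j in order]

def mixture_threshold(w, mu, var, tol=1e-12):
    """Point between the two means where both weighted densities are equal."""
    def g(x):  # positive where the background component dominates
        return (math.log(w[0]) + log_normal_pdf(x, mu[0], var[0])
                - math.log(w[1]) - log_normal_pdf(x, mu[1], var[1]))
    a, b = mu[0], mu[1]
    if g(a) <= 0 or g(b) >= 0:
        return 0.5 * (a + b)  # no clean crossing between the means: use the midpoint
    while b - a > tol:
        mid = 0.5 * (a + b)
        if g(mid) > 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)
```

Intervals whose empirical rate exceeds the returned threshold would then be reported as hotspots.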

Results (6): Estimated population map. Each thin vertical line represents one modern haplotype; each color represents one ancestral haplotype.

Conclusions: The HMDP model is an application and extension of the HDP to population genetics. The HDP allows the state space of the HMM to be infinite, which makes it suitable for inferring an unknown number of ancestral haplotypes. The HMDP model also allows the recombination rates to be non-stationary. Finally, the HMDP model can jointly infer several important genetic variables: the ancestral haplotypes, the recombination hotspots, and the ancestral origin of each allele.