Chapter 2: Bayesian hierarchical models in geographical genetics
Manda Sayler

Geographical genetics is the field of population genetics that focuses on describing the distribution of genetic variation within and among populations and on understanding the processes that produce those patterns.
–Statistical sampling uncertainty arises from estimating allele frequencies from finite samples of individuals drawn from each population.
–Genetic sampling uncertainty arises from the underlying stochastic evolutionary process that gave rise to the populations we sampled.
–Note: increasing the sample size of alleles within each population reduces statistical uncertainty, but it cannot reduce the magnitude of genetic uncertainty (see the decomposition after this list).
–The Weir and Cockerham approach is the most widely used approach for the analysis of genetic diversity in hierarchically structured populations.
–The Bayesian approach provides model-based inference that is enormously powerful and flexible.
–Hierarchical Bayesian models provide a natural approach to inference in geographical genetics.
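One way to make the note above concrete (this decomposition is a standard law-of-total-variance result, added here for illustration rather than taken from the slides): under the Beta model introduced later, the variance of a sample allele frequency $\hat p_k$ based on $n_k$ sampled alleles splits into a statistical and a genetic part,

$$\mathrm{Var}(\hat p_k) = \underbrace{E\!\left[\frac{p_k(1-p_k)}{n_k}\right]}_{\text{statistical}} \;+\; \underbrace{\theta\,\pi(1-\pi)}_{\text{genetic}},$$

so as $n_k \to \infty$ the statistical term vanishes, while the genetic term, the among-population variance, is unaffected by sample size.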

Weir and Cockerham Approach
To illustrate the formalism, consider a set of K populations segregating for two alleles, A_1 and A_2, at a single locus:
–$p_k$ = frequency of allele $A_1$ in the $k$th population
–$x_{ij,k}$ = frequency of genotype $A_iA_j$ in the $k$th population, $k = 1, \dots, K$

$F_{ST}$ can be interpreted as the fraction of genetic diversity due to differences in allele frequencies among populations:

$$F_{ST} = \frac{\mathrm{Var}(p)}{\bar p\,(1-\bar p)},$$

where

$$\bar p = \frac{1}{K}\sum_{k=1}^{K} p_k \quad\text{and}\quad \mathrm{Var}(p) = \frac{1}{K}\sum_{k=1}^{K}\left(p_k - \bar p\right)^2.$$
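As a concrete illustration (the data here are hypothetical, not from the chapter), a minimal Python sketch computing $F_{ST}$ from allele frequencies in K = 4 populations:

```python
import numpy as np

# Hypothetical frequencies of allele A1 in K = 4 populations.
p = np.array([0.2, 0.5, 0.7, 0.4])

p_bar = p.mean()                    # mean allele frequency across populations
var_p = ((p - p_bar) ** 2).mean()   # among-population variance of p

# F_ST: fraction of total diversity due to among-population differences.
f_st = var_p / (p_bar * (1.0 - p_bar))
print(f"p_bar = {p_bar:.3f}, Var(p) = {var_p:.4f}, F_ST = {f_st:.4f}")
```

For these made-up frequencies, $\bar p = 0.45$ and $F_{ST} \approx 0.13$, i.e., about 13% of the total diversity lies among populations.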

Hierarchical Bayesian Models
A hierarchical Bayesian model uses the full power of the data for simultaneous estimation of the parameters while accounting for both statistical and genetic uncertainty.
–To account for statistical uncertainty, assume that alleles are sampled independently within populations.
–Also assume that samples are drawn independently across loci and populations.
–The likelihood of the sample from a single population is then binomial (see the formula below).
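Concretely (the notation is supplied here; the slide states only that the likelihood is binomial): if $n^{(1)}_{ik}$ copies of allele $A_1$ are observed among $n_{ik}$ alleles sampled at locus $i$ in population $k$, then

$$P\!\left(n^{(1)}_{ik} \mid p_{ik}\right) = \binom{n_{ik}}{n^{(1)}_{ik}}\, p_{ik}^{\,n^{(1)}_{ik}}\,\left(1-p_{ik}\right)^{\,n_{ik}-n^{(1)}_{ik}}.$$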

To account for genetic uncertainty we must assume a parametric form for the among-population allele frequency distribution. It is natural to assume that population allele frequencies follow a Beta distribution,

$$p_{ik} \sim \mathrm{Beta}\!\left(\frac{1-\theta}{\theta}\,\pi_i,\;\frac{1-\theta}{\theta}\,(1-\pi_i)\right),$$

for which $E(p_{ik}) = \pi_i$ and $\mathrm{Var}(p_{ik}) = \theta\,\pi_i(1-\pi_i)$. Thus θ is equivalent to $F_{ST}$. The posterior distribution for the parameters is

$$P(\mathbf{p}, \boldsymbol{\pi}, \theta \mid \mathbf{n}) \propto \prod_i \prod_k P\!\left(n^{(1)}_{ik} \mid p_{ik}\right) P\!\left(p_{ik} \mid \pi_i, \theta\right) P(\pi_i)\, P(\theta),$$

where $P(\pi_i)$ and $P(\theta)$ are the prior distributions for $\pi_i$ and θ, respectively.
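A quick check of this parameterization (worked out here as a reasoning step; write the shape parameters as $a = \frac{1-\theta}{\theta}\pi_i$ and $b = \frac{1-\theta}{\theta}(1-\pi_i)$):

$$E(p_{ik}) = \frac{a}{a+b} = \pi_i, \qquad \mathrm{Var}(p_{ik}) = \frac{\pi_i(1-\pi_i)}{a+b+1} = \frac{\pi_i(1-\pi_i)}{\frac{1-\theta}{\theta}+1} = \theta\,\pi_i(1-\pi_i),$$

which confirms that θ plays exactly the role of $F_{ST}$ in the Weir and Cockerham formalism.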

A fully hierarchical model
To estimate the correlation of allele frequencies within loci, we add another level to the hierarchy that describes the distribution of mean allele frequencies across loci, $P(\pi_i \mid \pi, \theta_y)$.
–Regard the loci in our sample as a sample from a larger universe of loci from which we might have sampled.
–Regard the populations in our sample as a sample from a larger universe of populations from which we might have sampled.

The likelihood is unchanged. The posterior becomes

$$P(\mathbf{p}, \boldsymbol{\pi}, \pi, \theta_x, \theta_y \mid \mathbf{n}) \propto \prod_i \left[\prod_k P\!\left(n^{(1)}_{ik} \mid p_{ik}\right) P\!\left(p_{ik} \mid \pi_i, \theta_x\right)\right] P\!\left(\pi_i \mid \pi, \theta_y\right) P(\pi)\, P(\theta_x)\, P(\theta_y),$$

where $P(p_{ik} \mid \pi_i, \theta_x)$ is the Beta distribution parameterized by $\theta_x$ (variation among populations within loci) and $P(\pi_i \mid \pi, \theta_y)$ is the Beta distribution parameterized by $\theta_y$ (variation among loci).
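A minimal sketch of how this hierarchy might be coded, using PyMC with uniform priors on π, θ_x, and θ_y (the priors, data, and variable names are illustrative assumptions, not the chapter's specification):

```python
import numpy as np
import pymc as pm

# Illustrative data: counts of allele A1 out of n sampled alleles,
# for I = 3 loci (rows) and K = 4 populations (columns).
n1 = np.array([[ 8, 12,  3,  7],
               [15,  4,  9, 11],
               [ 2, 10,  6,  5]])
n = np.full_like(n1, 20)
I, K = n1.shape

with pm.Model() as model:
    # Priors (assumed uniform here for illustration).
    pi = pm.Uniform("pi", 0.0, 1.0)            # grand mean allele frequency
    theta_x = pm.Uniform("theta_x", 0.0, 1.0)  # F_ST-like: among populations
    theta_y = pm.Uniform("theta_y", 0.0, 1.0)  # variation among loci

    # Locus means: pi_i ~ Beta parameterized by (pi, theta_y).
    c_y = (1.0 - theta_y) / theta_y
    pi_i = pm.Beta("pi_i", alpha=c_y * pi, beta=c_y * (1.0 - pi), shape=I)

    # Population frequencies: p_ik ~ Beta parameterized by (pi_i, theta_x).
    c_x = (1.0 - theta_x) / theta_x
    p_ik = pm.Beta("p_ik",
                   alpha=(c_x * pi_i)[:, None] * np.ones((1, K)),
                   beta=(c_x * (1.0 - pi_i))[:, None] * np.ones((1, K)),
                   shape=(I, K))

    # Binomial likelihood for the observed allele counts.
    pm.Binomial("obs", n=n, p=p_ik, observed=n1)

    trace = pm.sample(2000, tune=1000)
```

The two `(1 - theta) / theta` factors mirror the Beta parameterization above, so the posteriors of `theta_x` and `theta_y` are directly interpretable as the two variance components.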

Developing an MCMC sampler
–The process begins by picking an initial value for p, call it p_0; p_0 is then updated repeatedly to produce a large sample of values p_t, using either the Metropolis-Hastings algorithm (Figure 2.2) or the slice sampling algorithm (Figure 2.3).
–From a sufficiently large sample we can estimate any property of the posterior to an arbitrary degree of accuracy.
–To help ensure that the Markov chain has converged, the values from an initial burn-in period are discarded. The values retained from the following sample period represent the full posterior distribution, and summary statistics are calculated directly from this sample.
–To reduce the autocorrelation of values in the sample, it is sometimes useful to thin the sample, keeping only every mth draw. (A minimal Metropolis-Hastings sketch follows.)
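For concreteness, here is a minimal Metropolis-Hastings sketch (an illustration under assumed data and settings, not the chapter's Figure 2.2). It samples the posterior of a single population allele frequency p under a binomial likelihood and a Uniform(0, 1) prior, using a symmetric random-walk proposal, then applies burn-in and thinning as described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data: 7 copies of allele A1 among 20 sampled alleles.
n, n1 = 20, 7

def log_posterior(p):
    """Log posterior of p: binomial likelihood x Uniform(0, 1) prior."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf
    return n1 * np.log(p) + (n - n1) * np.log(1.0 - p)

n_iter, burn_in, thin = 20_000, 2_000, 10
p_current = 0.5                  # initial value p_0
samples = []

for t in range(n_iter):
    # Random-walk proposal (symmetric, so the Hastings ratio simplifies).
    p_proposed = p_current + rng.normal(0.0, 0.1)
    log_alpha = log_posterior(p_proposed) - log_posterior(p_current)
    if np.log(rng.uniform()) < log_alpha:   # accept with prob min(1, alpha)
        p_current = p_proposed
    samples.append(p_current)

# Discard burn-in, thin to reduce autocorrelation, then summarize.
kept = np.array(samples[burn_in::thin])
print(f"posterior mean ~ {kept.mean():.3f}, 95% interval ~ "
      f"({np.quantile(kept, 0.025):.3f}, {np.quantile(kept, 0.975):.3f})")
```

Because the posterior here is Beta(n1 + 1, n - n1 + 1) in closed form, the sampled mean should land near (n1 + 1)/(n + 2) ≈ 0.36, which gives a quick correctness check on the chain.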