Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.


True Multi-Locus Mapping  True multi-locus mapping would use all the data to determine the order of the loci and the distances between them. BUT...  There is a large number of unknown parameters.  There are 2^(l-1) gamete types for l loci, and the sample size is usually not large enough to populate all of these types.  It is computationally intensive, as there are l!/2 possible orders.

Least Squares Method  r_ij is the recombination fraction between loci i and j.  M_ij is the map distance between loci i and j.  s_{r_ij} is the standard deviation of r_ij.  m_i is the map distance between loci i and i+1.  [Figure: map distances m_1 to m_6 and recombination fractions r_12, r_23, r_34, r_45, r_56; legend: recombination fraction, map distance.]

Least Squares Method (cont)

Least Squares: Haldane Map Function  Recall the map function.  Find the inverse map function F(θ).  Take the first derivative of F(θ).  Plug the first derivative into the approximate formula for s_M.
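
The slide's own equations did not survive in this transcript. The following is a sketch of the steps it lists, assuming the "approximate formula" is the usual first-order (delta method) approximation and writing the recombination fraction as r to match s_r:

```latex
\begin{align*}
r &= \tfrac{1}{2}\left(1 - e^{-2M}\right) && \text{(Haldane map function)} \\
M = F(r) &= -\tfrac{1}{2}\ln(1 - 2r) && \text{(inverse map function)} \\
F'(r) &= \frac{1}{1 - 2r} \\
s_M &\approx \left|F'(r)\right|\, s_r = \frac{s_r}{1 - 2r} && \text{(first-order approximation)}
\end{align*}
```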

Least Squares: Kosambi Map Function  Recall the inverse map function F(θ).  Take the first derivative of F(θ).  Plug the first derivative into the approximate formula for s_M.
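
As with the Haldane case, the equations on this slide are missing from the transcript. A sketch of the same steps for the Kosambi map function, again assuming a first-order approximation for s_M and writing the recombination fraction as r:

```latex
\begin{align*}
M = F(r) &= \tfrac{1}{4}\ln\!\left(\frac{1 + 2r}{1 - 2r}\right) && \text{(inverse Kosambi map function)} \\
F'(r) &= \frac{1}{4}\left(\frac{2}{1 + 2r} + \frac{2}{1 - 2r}\right) = \frac{1}{1 - 4r^{2}} \\
s_M &\approx \left|F'(r)\right|\, s_r = \frac{s_r}{1 - 4r^{2}}
\end{align*}
```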

Least Squares Method (cont)

Least Squares: Data
Markers    r (s_r)     M Haldane (s_M)    Expected
           (0.03)      0.11 (0.038)       m_1
           (0.04)      0.18 (0.057)       m_2
           (0.13)      0.46 (0.325)       m_1 + m_2

Least Squares: Calculation
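
The calculation itself is not preserved in this transcript. As a minimal sketch, the three Haldane map distances and standard deviations from the data slide can be fitted to the "Expected" column (m_1, m_2, m_1 + m_2) by weighted least squares. The choice of weights 1/s_M^2 and the function name `fit_two_intervals` are assumptions made for illustration, not necessarily what the lecture used; the standard errors come from the usual weighted least squares result, the inverse of X'WX.

```python
import math


def fit_two_intervals(distances, std_devs):
    """Weighted least squares for two adjacent intervals.

    distances = (M_12, M_23, M_13); the model is
    M_12 = m_1, M_23 = m_2, M_13 = m_1 + m_2.
    Weights are 1 / s_M^2 (an assumption for this sketch).
    """
    design = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # rows of X
    weights = [1.0 / s ** 2 for s in std_devs]

    # Normal equations A m = b, with A = X' W X and b = X' W y.
    a11 = sum(w * x[0] * x[0] for w, x in zip(weights, design))
    a12 = sum(w * x[0] * x[1] for w, x in zip(weights, design))
    a22 = sum(w * x[1] * x[1] for w, x in zip(weights, design))
    b1 = sum(w * x[0] * y for w, x, y in zip(weights, design, distances))
    b2 = sum(w * x[1] * y for w, x, y in zip(weights, design, distances))

    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    m1 = (b1 * a22 - a12 * b2) / det
    m2 = (a11 * b2 - a12 * b1) / det

    # Standard errors from the diagonal of the inverse of A.
    se1 = math.sqrt(a22 / det)
    se2 = math.sqrt(a11 / det)
    return (m1, m2), (se1, se2)


estimates, std_errors = fit_two_intervals(
    distances=(0.11, 0.18, 0.46),
    std_devs=(0.038, 0.057, 0.325),
)
print("m_1, m_2:", estimates)
print("standard errors:", std_errors)
```

Because the third distance has a much larger standard deviation, it receives little weight, and the estimates stay close to the two directly observed adjacent distances.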

Least Squares: Variance Estimation

Least Squares: Variance Calculation

Why is this Least Squares?

Alternative Weighting  Use LOD score for linkage as weight. Then the equation becomes:

EM Algorithm (Lander-Green)  Make an initial guess for θ^0 = (θ_1, θ_2, ..., θ_(l-1)).  E Step: Compute the expected number of recombinants for each interval assuming the current θ^old.  M Step: Treating the expected values as true, compute the maximum likelihood estimate θ^new.  Iterate EM until the likelihood converges.

EM Algorithm
                                AB      BC      AC
True recombination fraction     θ_1     θ_2
True number of recombinants     t_1     t_2
Total observed gametes          N_12    N_23    N_13
Number observed recombinants    R_12    R_23    R_13

EM Algorithm: E Step  t_1 = R_12 + P(rec. in AB | rec. in AC) R_13 + P(rec. in AB | no rec. in AC)(N_13 - R_13)  t_2 = R_23 + P(rec. in BC | rec. in AC) R_13 + P(rec. in BC | no rec. in AC)(N_13 - R_13)

EM Algorithm: E Step (cont)

EM Algorithm: M Step
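
The formulas for the conditional probabilities and the M step are not preserved in this transcript. The sketch below fills them in under the assumption of no interference between intervals AB and BC, so that P(rec. in AC) = θ_1(1 - θ_2) + θ_2(1 - θ_1), and takes the M step to be t_1 / (N_12 + N_13) and t_2 / (N_23 + N_13). The function name `em_three_locus` and the counts in the usage line are hypothetical.

```python
import math


def em_three_locus(n12, r12, n23, r23, n13, r13,
                   theta=(0.1, 0.1), tol=1e-10, max_iter=1000):
    """EM estimates of (theta_1, theta_2) for loci A-B-C.

    Gametes are scored for AB only, BC only, or AC only.
    Assumes no interference (a sketch, not the lecture's exact code).
    """
    def log_lik(t1, t2):
        t13 = t1 * (1 - t2) + t2 * (1 - t1)
        total = 0.0
        for r, n, t in ((r12, n12, t1), (r23, n23, t2), (r13, n13, t13)):
            total += r * math.log(t) + (n - r) * math.log(1 - t)
        return total

    th1, th2 = theta
    old = log_lik(th1, th2)
    for _ in range(max_iter):
        # E step: expected recombinants in AB and BC given the AC data.
        p_rec_ac = th1 * (1 - th2) + th2 * (1 - th1)
        p_ab_rec = th1 * (1 - th2) / p_rec_ac       # P(rec AB | rec AC)
        p_bc_rec = th2 * (1 - th1) / p_rec_ac       # P(rec BC | rec AC)
        p_both_norec = th1 * th2 / (1 - p_rec_ac)   # double rec | no rec AC
        t1 = r12 + p_ab_rec * r13 + p_both_norec * (n13 - r13)
        t2 = r23 + p_bc_rec * r13 + p_both_norec * (n13 - r13)

        # M step: treat expected counts as observed.
        th1 = t1 / (n12 + n13)
        th2 = t2 / (n23 + n13)

        # Stop when the observed-data log likelihood stops changing.
        new = log_lik(th1, th2)
        if abs(new - old) < tol:
            break
        old = new
    return th1, th2


# Hypothetical counts, for illustration only.
print(em_three_locus(1000, 100, 1000, 200, 1000, 260))
```

When no AC-only gametes are available (N_13 = 0), the update reduces to the simple proportions R_12 / N_12 and R_23 / N_23.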

Simulation  Find the map function that fits the data well by comparing the likelihoods of the data under different map functions.  The distribution of the likelihood difference is unknown, so simulation is needed to obtain it empirically.

Simulation: Evidence for Interference  Recall that if you are given pairwise recombination fractions θ_ij and a map function, you know how to find the gametic frequencies.  Then the log likelihood is given by (m = 2^(l-1))

Simulation: Implementation  To simulate under the null hypothesis of no interference, we take the recombination fractions between neighboring loci as given and simulate gametes in which recombination events in different intervals are independent.
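
A minimal sketch of such a simulation for three loci follows. It assumes the adjacent recombination fractions are fixed, uses a multinomial log likelihood over the four recombination classes, and compares the Kosambi map function (interference) against Haldane (no interference). The test statistic, the function names, and the parameter values are all illustrative assumptions; the lecture's own statistic is not preserved in this transcript.

```python
import math
import random

CLASSES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (rec. in AB, rec. in BC)


def gamete_probs(theta1, theta2, map_function):
    """Class frequencies implied by adjacent fractions and a map function."""
    if map_function == "haldane":
        theta13 = theta1 + theta2 - 2 * theta1 * theta2
    elif map_function == "kosambi":
        theta13 = (theta1 + theta2) / (1 + 4 * theta1 * theta2)
    else:
        raise ValueError("unknown map function")
    p11 = (theta1 + theta2 - theta13) / 2  # double recombinants
    return {(0, 0): 1 - theta1 - theta2 + p11,
            (1, 0): theta1 - p11,
            (0, 1): theta2 - p11,
            (1, 1): p11}


def log_likelihood(counts, probs):
    return sum(n * math.log(probs[c]) for c, n in counts.items() if n > 0)


def simulate_counts(n, theta1, theta2, rng):
    """Simulate n gametes with independent recombination per interval."""
    counts = {c: 0 for c in CLASSES}
    for _ in range(n):
        c = (int(rng.random() < theta1), int(rng.random() < theta2))
        counts[c] += 1
    return counts


def null_distribution(n, theta1, theta2, replicates, seed=0):
    """Empirical null distribution of logL(Kosambi) - logL(Haldane)."""
    rng = random.Random(seed)
    haldane = gamete_probs(theta1, theta2, "haldane")
    kosambi = gamete_probs(theta1, theta2, "kosambi")
    stats = []
    for _ in range(replicates):
        counts = simulate_counts(n, theta1, theta2, rng)
        stats.append(log_likelihood(counts, kosambi)
                     - log_likelihood(counts, haldane))
    return stats


def empirical_p_value(observed, null_stats):
    """Fraction of simulated statistics at least as large as the observed."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


null_stats = null_distribution(n=500, theta1=0.1, theta2=0.2, replicates=300)
print(empirical_p_value(0.0, null_stats))
```

An observed likelihood difference would then be compared with `null_stats` to obtain an empirical p-value for evidence of interference.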

Marker Coverage and Map Density  The proportion of the genome covered by markers is the marker coverage. It is simply the genomic map length divided by the total genome length.  Map density is indicated by the average or maximum map distance between two adjacent markers.
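
As a small illustration of these two definitions, the sketch below computes coverage and density from marker positions. It assumes the map length is the sum of the spans of the linkage groups; the function name `coverage_and_density` and the positions in the usage example are hypothetical.

```python
def coverage_and_density(linkage_groups, genome_length):
    """Compute marker coverage and adjacent-marker distances.

    linkage_groups: list of lists of marker positions (map units),
    one list per linkage group. genome_length: total genome length
    in the same units.
    """
    map_length = 0.0
    gaps = []
    for group in linkage_groups:
        positions = sorted(group)
        if len(positions) < 2:
            continue  # a single marker spans no map distance
        map_length += positions[-1] - positions[0]
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))
    return {
        "coverage": map_length / genome_length,
        "mean_gap": sum(gaps) / len(gaps),
        "max_gap": max(gaps),
    }


# Hypothetical positions on two linkage groups, genome length 100.
result = coverage_and_density([[0, 10, 25, 40], [0, 5, 30]], 100)
print(result)
```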

Random Distribution of Markers  Markers are generally assumed to be distributed randomly throughout the genome.  Nonrandom distribution will generally decrease coverage and lower density.  Unfortunately, markers may be nonrandomly distributed. Name some reasons.

Mapping Population  Even if you have many markers, if your sample is small you may have insufficient information to achieve high coverage and density.  Unattached genome segments are the most common coverage problem.  Solutions: increase the sample size or use a mapping population with more information (greater polymorphism).

Data Analysis and Models  A wrong gene order can overestimate the map length, thus overestimating map coverage and underestimating density.  The wrong mapping function may convert recombination fractions into the wrong map distances, causing over- or underestimation.  Different grouping criteria can lead to different linkage groups. The more stringent the criteria, the more linkage groups, the lower the coverage, and the higher the density.

Prediction of Marker Coverage and Density  A method for predicting marker coverage and density is based on the assumption of random marker distribution: the confidence probability P is the probability that at least one marker is located in a 2d M genome segment.

Calculations  Suppose the genome has total length L.  P(a given marker does not fall on the 2d segment) = 1 - 2d/L.  P(none of n markers falls on the 2d segment) = (1 - 2d/L)^n.

Calculations  P(at least one marker on the 2d segment) = 1 - (1 - 2d/L)^n

Calculations  When 2d/L < 0.1, then
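
The formula that followed on this slide is missing from the transcript. A plausible reconstruction, assuming the standard small-x approximation ln(1 - x) ≈ -x applied to the expression for P on the previous slide, is:

```latex
\begin{align*}
P &= 1 - \left(1 - \frac{2d}{L}\right)^{n} \approx 1 - e^{-2dn/L}, \\
n &= \frac{\ln(1 - P)}{\ln\!\left(1 - 2d/L\right)} \approx -\frac{L\,\ln(1 - P)}{2d}.
\end{align*}
```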

Predicted Number of Markers Needed
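
The slide's table or formula is not preserved here. As a sketch, the number of markers needed for a given confidence probability P follows from solving P = 1 - (1 - 2d/L)^n for n, under the same random-distribution assumption. The function names and the example values (L = 1000, d = 10, P = 0.95, all in the same map units) are hypothetical.

```python
import math


def coverage_probability(n, d, genome_length):
    """P(at least one of n random markers falls on a 2d segment)."""
    return 1.0 - (1.0 - 2.0 * d / genome_length) ** n


def markers_needed(genome_length, d, confidence):
    """Smallest n reaching the confidence, exact and approximate.

    The approximation uses ln(1 - x) ~ -x, reasonable when 2d/L is small.
    """
    x = 2.0 * d / genome_length
    if not 0.0 < x < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < 2d/L < 1 and 0 < confidence < 1")
    exact = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - x))
    approx = math.ceil(-math.log(1.0 - confidence) / x)
    return exact, approx


exact, approx = markers_needed(genome_length=1000, d=10, confidence=0.95)
print(exact, approx)
print(coverage_probability(exact, d=10, genome_length=1000))
```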

Prediction when Genome Length Unknown  Use all (500) markers to estimate a genetic map and assume the genome length is the length of this map, say L_500.  Randomly draw 100 markers from the dataset with replacement. Estimate the genome length from the 100 markers only, say L_100.

Advantages of the Simulation Approach  No assumptions on marker distribution are needed.  No prior information about the actual genome length is needed.  The approach can be used to test other factors that might affect marker coverage, as long as those factors can be resampled.

Summary  Least squares method for building genetic maps.  EM algorithm method for building genetic maps.  Simulated likelihood ratio statistic distribution for hypothesis tests.  Predicting marker coverage and density.