Lecture 15: Linkage Analysis VII


Lecture 15: Linkage Analysis VII Date: 10/14/02. Topics: correction to the power calculation; Lander-Green algorithm. (Titles on updated or added slides highlighted.)

Sample Size Calculation What sample size is needed to achieve a particular statistical power for a test? We shall assume the relevant test statistic follows a chi-square distribution.

Sample Size Calculation (cont.) γ is the statistical power. The critical value is the threshold for rejecting H0 at significance level α. c is the non-centrality parameter, usually the expectation of the log-likelihood ratio test statistic under a particular HA and experimental conditions. df is the degrees of freedom.

Sample Size Calculation (cont.)
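A minimal sketch of this kind of calculation, restricted to df = 1 so that only the Python standard library is needed. It rests on two assumptions that are not stated on the slides: the non-centrality parameter c grows linearly with the number of families, and `c_per_family` (a hypothetical name) is the non-centrality contributed by one family. For df = 1, a non-central chi-square variable equals (Z + sqrt(c))^2 with Z standard normal, which gives the power in closed form.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_df1(c, alpha=0.05):
    """Power of a 1-df chi-square test with non-centrality parameter c."""
    # Square root of the central chi-square(1) critical value at level alpha.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = c ** 0.5
    # P((Z + shift)^2 > z_crit^2): upper tail plus (tiny) lower tail.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def sample_size(c_per_family, gamma=0.8, alpha=0.05):
    """Smallest number of families whose total non-centrality reaches power gamma.

    Assumes (hypothetically) that non-centrality adds up linearly over families.
    """
    if c_per_family <= 0:
        raise ValueError("c_per_family must be positive")
    n = 1
    while power_df1(n * c_per_family, alpha) < gamma:
        n += 1
    return n
```

Under these assumptions, `sample_size(0.1)` returns the number of families needed for 80% power at α = 0.05 when each family contributes a non-centrality of 0.1. Other degrees of freedom would need a non-central chi-square routine from a numerical library.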

Modeling Test your modeling skills. Propose a model for the following family ascertainment situation: probands are detected independently and with the same probability in each family, except that all secondary probands (second, third, etc., all to the same degree) are more easily detected than the first proband in a family. The model formulation and the calculation of pr probabilities for families with 3 affected are now posted to the website.

Lander-Green Algorithm Like the Elston-Stewart algorithm, the Lander-Green algorithm models the pedigree and data as a Hidden Markov Model (HMM), except that the hidden states are the so-called inheritance vectors. Like the Elston-Stewart algorithm, the Lander-Green algorithm assumes that there is no interference.

LG – (Dis)Advantages The Lander-Green algorithm is linear in the number of loci and exponential in the number of members in the pedigree. Recall that the Elston-Stewart algorithm is complementary, linear in the number of members, but exponential in the number of loci. Simulation methods (MCMC in particular) are used to deal with pedigrees with both high numbers of members and loci.

LG – Inheritance Vector An inheritance vector is defined for each locus i in the dataset. It is a binary vector with two components for each non-founder individual in the pedigree. Thus, for a pedigree with n members and f founders, it is of length 2(n – f). An entry is 0 if the individual’s allele at that position is grandmaternal and 1 if it is grandpaternal. There are 2^(2(n – f)) possible inheritance vectors for each locus.
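As a small illustration of the counting argument, the sketch below enumerates every inheritance vector for a pedigree with n members and f founders. The function name and the tuple representation are choices made here for illustration only.

```python
from itertools import product

def inheritance_vectors(n, f):
    """All binary inheritance vectors for n pedigree members, f of them founders.

    Each non-founder contributes two entries (maternal and paternal gamete);
    0 marks a grandmaternal allele and 1 a grandpaternal allele.
    """
    length = 2 * (n - f)
    return list(product((0, 1), repeat=length))

# A trio (two founders, one child) has 2^2 = 4 vectors.
trio = inheritance_vectors(3, 2)
```

The list grows as 2^(2(n – f)), which is why the number of non-founders drives the cost of the algorithm.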

LG – Inheritance Vector (cont) Comparing the inheritance vectors at adjacent loci shows the crossovers that occurred in producing each non-founder in the pedigree. The inheritance vector is thus appropriate for estimating recombination fractions, which is our goal here with the LG algorithm.

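LG – Inheritance Vector Example [Figure: pedigree of nine individuals with a table of inheritance vector entries v for the gametes 4M, 4P, 5M, 5P, 7M, 7P, 8M, 8P, 9M, 9P. Surviving table entries: 4M = 0|1, 7M = 1. Genotype labels in the pedigree, in reading order: individuals 1, 2: AA, aa; individuals 3, 4, 5, 6: aa, aA, aa, aA; individuals 9, 7, 8: Aa, aa, Aa.]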

LG – Simplification by Conditioning Fortunately, conditional on the inheritance vectors, the genotypes of each offspring are independent. Of course, conditional on the genotype, the phenotype probabilities are independent. Thus, we can calculate the probability for each individual in the pedigree independently of the others once we condition on the inheritance vectors.

LG – Hidden States The inheritance vector is the unobserved hidden state at each locus. We must define transition probabilities among the hidden states from locus to locus. Begin by considering a single individual, for whom the inheritance vector, and hence the hidden state at each locus, is a binary vector of length 2.

LG – Initial State We must define the initial state of the first marker locus. Prior to viewing the genotypes, all inheritance vectors are equally likely. Assume the initial state of the inheritance vector at marker 1 is uniform over {(0,0), (1,0), (0,1), (1,1)}, where we list the maternal status first. In other words, marker 1 has ¼ probability of being in each of these possible states.

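LG – Pairwise Transition Probabilities Because of the assumption of no interference, each component of the inheritance vector switches between locus i and locus i+1 independently with probability $\theta_i$, the recombination fraction between locus i and locus i+1. For a single individual, the transition probability from state v at locus i to state w at locus i+1 is therefore $P(w \mid v) = \theta_i^{k}(1-\theta_i)^{2-k}$, where k is the number of components in which v and w differ.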

LG – Switch in Notation From this point on, assume there are n non-founders (rather than n – f). The reason for this change is simplification of the equations.

LG – Inheritance Vector Transition Probabilities The transition probabilities between inheritance vectors defined on full pedigrees with n relevant members are given by $P(w \mid v) = \theta_i^{d(v,w)}\,(1-\theta_i)^{2n-d(v,w)}$, where $\theta_i$ is the recombination fraction between locus i and locus i+1 and d(v,w) is the Hamming distance between inheritance vectors v and w, i.e. the number of discordances between them.
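The Hamming-distance form of the transition probability can be sketched as follows. This assumes a single recombination fraction `theta` shared by all meioses in the interval (no sex difference) and represents vectors as tuples; the function names are illustrative.

```python
from itertools import product

def hamming(v, w):
    """Number of positions at which inheritance vectors v and w differ."""
    return sum(a != b for a, b in zip(v, w))

def transition_prob(v, w, theta):
    """P(w at locus i+1 | v at locus i) under no interference."""
    d = hamming(v, w)
    return theta ** d * (1 - theta) ** (len(v) - d)

def transition_matrix(n_nonfounders, theta):
    """All states and the full 2^(2n) x 2^(2n) transition matrix."""
    states = list(product((0, 1), repeat=2 * n_nonfounders))
    matrix = [[transition_prob(v, w, theta) for w in states] for v in states]
    return states, matrix
```

Each row of the matrix sums to 1, because the 2n components switch independently. With `n_nonfounders = 1` this reduces to the single-individual case with four states.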

LG – Forward Variable
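A sketch of the standard HMM forward variable in this setting. The notation is assumed here rather than taken from the slide: loci are indexed 1, ..., L; $O_i$ is the data observed at locus i; $T_i(v,w)$ is the transition probability from inheritance vector v at locus i to w at locus i+1; and the initial distribution is uniform over the $2^{2n}$ inheritance vectors. The forward variable is $\alpha_i(v) = P(O_1, \dots, O_i, v_i = v \mid \theta)$.

```latex
\alpha_1(v) = 2^{-2n}\, P(O_1 \mid v), \qquad
\alpha_{i+1}(w) = \Big[ \sum_{v} \alpha_i(v)\, T_i(v,w) \Big]\, P(O_{i+1} \mid w), \qquad
P(O \mid \theta) = \sum_{v} \alpha_L(v).
```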

LG – Backward Variable
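A sketch of the standard HMM backward variable, in notation assumed here: $O_i$ is the data at locus i, $T_i(v,w)$ is the transition probability from inheritance vector v at locus i to w at locus i+1, and L is the last locus. The backward variable is $\beta_i(v) = P(O_{i+1}, \dots, O_L \mid v_i = v, \theta)$.

```latex
\beta_L(v) = 1, \qquad
\beta_i(v) = \sum_{w} T_i(v,w)\, P(O_{i+1} \mid w)\, \beta_{i+1}(w), \quad i = L-1, \dots, 1.
```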

LG – ξi(v,w) Labeled terms in the equation: transition probability; penetrance parameter.
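In the standard HMM formulation, this quantity is the posterior probability of a pair of adjacent hidden states, $\xi_i(v,w) = P(v_i = v, v_{i+1} = w \mid O, \theta)$. A sketch in assumed notation, where $\alpha_i$ and $\beta_i$ are the forward and backward variables, $T_i(v,w)$ is the transition probability term, and $P(O_{i+1} \mid w)$ is the term that carries the penetrance parameters:

```latex
\xi_i(v,w) = \frac{\alpha_i(v)\, T_i(v,w)\, P(O_{i+1} \mid w)\, \beta_{i+1}(w)}{P(O \mid \theta)} .
```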

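LG – Baum’s Lemma Baum’s Lemma: Let $Q(\theta,\theta') = \sum_{v} P(v, O \mid \theta)\,\log P(v, O \mid \theta')$, where the sum runs over all hidden sequences of inheritance vectors v. If $Q(\theta,\theta') \ge Q(\theta,\theta)$, then $P(O \mid \theta') \ge P(O \mid \theta)$.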

LG – Proof of Baum’s Lemma

LG – Jensen’s Inequality
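A sketch of how Jensen's inequality yields Baum's lemma, assuming the usual definition $Q(\theta,\theta') = \sum_{v} P(v, O \mid \theta)\,\log P(v, O \mid \theta')$ with v ranging over hidden sequences that have $P(v, O \mid \theta) > 0$. Because the logarithm is concave and the weights $P(v, O \mid \theta)/P(O \mid \theta)$ sum to 1:

```latex
\log \frac{P(O \mid \theta')}{P(O \mid \theta)}
  = \log \sum_{v} \frac{P(v, O \mid \theta)}{P(O \mid \theta)} \cdot \frac{P(v, O \mid \theta')}{P(v, O \mid \theta)}
  \;\ge\; \sum_{v} \frac{P(v, O \mid \theta)}{P(O \mid \theta)} \log \frac{P(v, O \mid \theta')}{P(v, O \mid \theta)}
  = \frac{Q(\theta,\theta') - Q(\theta,\theta)}{P(O \mid \theta)} .
```

So whenever $Q(\theta,\theta') \ge Q(\theta,\theta)$, the right-hand side is non-negative and the likelihood cannot decrease.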

LG – EM Algorithm We maximize Q(θ,θ’) over θ’ to increase the likelihood P(O|θ’) relative to its value at the current parameter estimates θ. This may sound familiar: it is the M step of the EM algorithm, and the EM algorithm is how we maximize the likelihood over θ for a pedigree. Details are shown below. Maximization is the difficult step, so we show it first.

LG – Maximization Key step: by conditional independence, this probability becomes a product of conditional probabilities.

LG – Maximization

LG – EM Algorithm (M Step)

LG – EM Algorithm (E Step) Labeled terms in the equation: the usual conditional probabilities needed to calculate the expectation; sum over all pairs of inheritance vectors.
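A minimal sketch of one EM iteration for a single pedigree, under assumptions that go beyond the slides: one recombination fraction per interval shared by all meioses, a uniform initial distribution, and emission probabilities P(O_i | v) supplied as a table. The table `EMIT` below is made up purely for illustration. The E step computes the pairwise posteriors by forward-backward, summing over all pairs of inheritance vectors; the M step used here sets each recombination fraction to the expected number of recombinations divided by the number of meioses, which is what maximizing Q gives for the Hamming-distance transition model.

```python
from itertools import product

def hamming(v, w):
    return sum(a != b for a, b in zip(v, w))

def trans(v, w, theta):
    """Transition probability theta^d * (1 - theta)^(m - d)."""
    d = hamming(v, w)
    return theta ** d * (1 - theta) ** (len(v) - d)

def forward_backward(states, emit, thetas):
    """emit[i][v] = P(O_i | v); thetas[i] is for the interval (i, i+1)."""
    n_loci = len(emit)
    alpha = [{v: emit[0][v] / len(states) for v in states}]
    for i in range(n_loci - 1):
        alpha.append({
            w: emit[i + 1][w]
               * sum(alpha[i][v] * trans(v, w, thetas[i]) for v in states)
            for w in states
        })
    beta = [None] * n_loci
    beta[-1] = {v: 1.0 for v in states}
    for i in range(n_loci - 2, -1, -1):
        beta[i] = {
            v: sum(trans(v, w, thetas[i]) * emit[i + 1][w] * beta[i + 1][w]
                   for w in states)
            for v in states
        }
    return alpha, beta, sum(alpha[-1].values())

def em_step(states, emit, thetas):
    """One EM update; returns (new thetas, likelihood at the old thetas)."""
    alpha, beta, lik = forward_backward(states, emit, thetas)
    meioses = len(states[0])
    updated = []
    for i, theta in enumerate(thetas):
        expected_recombinations = 0.0
        for v in states:            # E step: sum over all pairs (v, w)
            for w in states:
                xi = (alpha[i][v] * trans(v, w, theta)
                      * emit[i + 1][w] * beta[i + 1][w]) / lik
                expected_recombinations += xi * hamming(v, w)
        updated.append(expected_recombinations / meioses)   # M step
    return updated, lik

# Hypothetical example: one non-founder (4 states), three loci.
STATES = list(product((0, 1), repeat=2))
EMIT = [
    {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1},
    {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.1},
    {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.1},
]
thetas = [0.3, 0.3]
for _ in range(10):
    thetas, likelihood = em_step(STATES, EMIT, thetas)
```

The cost of each step is quadratic in the number of states as written, so this is only practical for very small pedigrees. With several pedigrees, the expected counts would be accumulated over pedigrees before dividing.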

Heterogeneity in Recombination Fraction Two ways to model sex differences: allow two recombination fraction parameters (male and female) in each interval, or allow one recombination fraction in each interval plus a single universal constant relating male and female recombination fractions. Use nested models to test for evidence of sex-based differences.
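A hedged sketch of the nested-model comparison for the second parameterization, where the sex-specific model adds exactly one parameter (the universal constant) to the sex-equal model. It assumes natural-log likelihoods and the usual asymptotic chi-square approximation with 1 degree of freedom; the log-likelihood values passed in would come from fitting both models and are hypothetical here.

```python
from math import erfc, sqrt

def lrt_pvalue_df1(loglik_null, loglik_alt):
    """Likelihood ratio test of nested models differing by one parameter.

    loglik_null: maximized natural-log likelihood with equal male and
                 female recombination fractions.
    loglik_alt:  maximized natural-log likelihood with the extra constant.
    Returns (test statistic, asymptotic p-value).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical fitted log-likelihoods.
stat, p_value = lrt_pvalue_df1(-100.0, -98.0)
```

For the first parameterization (two parameters in every interval), the degrees of freedom would equal the number of intervals, and a general chi-square tail function would be needed instead.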

Model Misspecification Penetrance parameters and allele frequencies may be incorrectly specified. The method is robust to such misspecification in the sense that the false positive rate for linkage is unaffected.

Model Misspecification and Ascertainment When ascertainment is independent of both disease state and marker loci, the method remains robust to misspecification of both the disease and marker parameters. When ascertainment depends on disease state, the method is robust to misspecification of the disease parameters.

Effects on Power Power in two-point linkage analysis is largely unaffected as long as the mode of dominance is specified correctly. Multipoint linkage analysis is much more sensitive to misspecification of the model. However, there is more information when model parameters are jointly estimated along with position.