Basic Model For Genetic Linkage Analysis, Lecture #3. Prepared by Dan Geiger.

2. Using the Maximum Likelihood Approach

The probability of the pedigree data, Pr(data | θ), is a function of the known and unknown recombination fractions, denoted collectively by θ. How can we construct this likelihood function? The maximum likelihood approach is to seek the value of θ that maximizes the likelihood function Pr(data | θ). This value is the ML estimate.
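The ML idea above can be sketched in a few lines: scan a grid of candidate recombination fractions and keep the maximizer. Here `likelihood` is a hypothetical stand-in for the pedigree likelihood Pr(data | θ) constructed on the following slides; the toy function used below is illustrative only.

```python
import numpy as np

def ml_estimate(likelihood, grid=None):
    """Grid-search ML sketch: return the theta on the grid that
    maximizes likelihood(theta). `likelihood` is any callable
    mapping theta in [0, 0.5] to Pr(data | theta)."""
    if grid is None:
        grid = np.linspace(0.0, 0.5, 501)   # theta lies in [0, 0.5]
    values = [likelihood(t) for t in grid]
    return grid[int(np.argmax(values))]

# Toy stand-in likelihood, peaked at theta = 0.1 (not real pedigree data):
theta_hat = ml_estimate(lambda t: -(t - 0.1) ** 2)
```

In practice the grid would be replaced or refined by a numerical optimizer, but a 0.001-step grid over [0, 0.5] is often adequate for a single recombination fraction.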

3. Constructing the Likelihood Function

First, we need to determine the variables that describe the problem. There are many possible choices; some variables we can observe and some we cannot.

L_ijm = maternal allele at locus i of person j. The values of this variable are the possible alleles l_i at locus i.
L_ijf = paternal allele at locus i of person j. The values are the possible alleles l_i at locus i (same as for L_ijm).
X_ij = unordered allele pair at locus i of person j. The values are pairs of i-th-locus alleles (l_i, l'_i).

As a starting point, we assume that the data consist of an assignment to a subset of the variables {X_ij}. In other words, some (or all) persons are genotyped at some (or all) loci.

4. What are the relationships among the variables for a specific individual?

[Figure: a small Bayesian network in which L_11m (maternal allele at locus 1 of person 1) and L_11f (paternal allele at locus 1 of person 1) both point to X_11, the unordered allele pair at locus 1 of person 1, which is the data.]

P(L_11m = a) is the frequency of allele a. We use lowercase letters for states, writing, in short, P(l_11m).
P(x_11 | l_11m, l_11f) = 0 or 1, depending on consistency.

5. What are the relationships among the variables across individuals?

[Figure: the locus-1 network for a trio. The mother's alleles L_11m, L_11f feed X_11; the father's alleles L_12m, L_12f feed X_12; the offspring's maternal allele L_13m (with parents L_11m, L_11f) and paternal allele L_13f (with parents L_12m, L_12f) feed X_13.]

P(l_13m | l_11m, l_11f) = 1/2 if l_13m = l_11m or l_13m = l_11f
P(l_13m | l_11m, l_11f) = 0 otherwise

First attempt: correct, but not efficient, as we shall see.

6. Probabilistic model for two loci

[Figure: two copies of the trio network side by side — the model for locus 1 (over the L_1jt and X_1j variables) and the model for locus 2 (over the L_2jt and X_2j variables).]

L_23m depends on whether L_13m got its value from L_11m or from L_11f, on whether a recombination occurred, and on the values of L_21m and L_21f. This is quite complex.

7. Adding a selector variable

[Figure: the locus-1 network with a new node S_13m, the selector of the maternal allele at locus 1 of person 3, pointing into L_13m, the maternal allele at locus 1 of person 3 (the offspring).]

Selector variables S_ijm are 0 or 1, depending on whose allele is transmitted to offspring j at locus i.

P(s_13m) = 1/2
P(l_13m | l_11m, l_11f, S_13m = 0) = 1 if l_13m = l_11m
P(l_13m | l_11m, l_11f, S_13m = 1) = 1 if l_13m = l_11f
P(l_13m | l_11m, l_11f, s_13m) = 0 otherwise

8. Probabilistic model for two loci

[Figure: the two-locus network redrawn with selector variables. In the model for locus 1, S_13m and S_13f select between L_11m, L_11f and between L_12m, L_12f to determine L_13m and L_13f, with observed pairs X_11, X_12, X_13. The model for locus 2 is the analogous network over the L_2jt, S_23m, S_23f, and X_2j variables.]

9. Probabilistic Model for Recombination

[Figure: the two-locus network with added edges from S_13m to S_23m and from S_13f to S_23f, so that each locus-2 selector depends on the corresponding locus-1 selector.]

θ is the recombination fraction between loci 1 and 2: P(s_23m | s_13m) = 1 − θ if s_23m = s_13m, and θ otherwise.

10. Constructing the likelihood function I

[Figure: the single-locus fragment with nodes L_11m, L_11f, X_11, S_13m, L_13m. X_11 is the observed variable; all other variables are not observed (hidden).]

Joint probability:
P(l_11m, l_11f, x_11, s_13m, l_13m) = P(l_11m) P(l_11f) P(x_11 | l_11m, l_11f) P(s_13m) P(l_13m | s_13m, l_11m, l_11f)

Probability of data (sum over all states of all hidden variables):
Prob(data) = P(x_11) = Σ_l_11m Σ_l_11f Σ_s_13m Σ_l_13m P(l_11m, l_11f, x_11, s_13m, l_13m)
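The sum over hidden variables above can be evaluated by brute force for this small fragment. In this sketch, `alleles` and `freq` (the allele list and population frequencies) are illustrative inputs, not values from the slides:

```python
from itertools import product

def prob_x11(x11, alleles, freq):
    """Brute-force P(x11): sum the factored joint over every state of
    the hidden variables l_11m, l_11f, s_13m, l_13m."""
    total = 0.0
    for l11m, l11f, s13m, l13m in product(alleles, alleles, (0, 1), alleles):
        p_x = 1.0 if {l11m, l11f} == set(x11) else 0.0   # consistency indicator
        chosen = l11m if s13m == 0 else l11f             # selector semantics
        p_l13m = 1.0 if l13m == chosen else 0.0
        # P(l11m) P(l11f) P(x11 | l11m, l11f) P(s13m) P(l13m | s13m, l11m, l11f)
        total += freq[l11m] * freq[l11f] * p_x * 0.5 * p_l13m
    return total

# A heterozygous observation: the two consistent ordered pairs each
# contribute freq(a) * freq(A), so P(x11) = 2 * 0.3 * 0.7 = 0.42.
p = prob_x11(('a', 'A'), ['a', 'A'], {'a': 0.3, 'A': 0.7})
```

This enumeration is exponential in the number of hidden variables; the point of the later slides is to organize the summation so that it stays tractable.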

11. Constructing the likelihood function II

Product over all local probability tables:
P(l_11m, l_11f, x_11, l_12m, l_12f, x_12, l_13m, l_13f, x_13, l_21m, l_21f, x_21, l_22m, l_22f, x_22, l_23m, l_23f, x_23, s_13m, s_13f, s_23m, s_23f | θ)
= P(l_11m) P(l_11f) P(x_11 | l_11m, l_11f) … P(s_13m) P(s_13f) P(s_23m | s_13m, θ) P(s_23f | s_13f, θ)

Probability of data (sum over all states of all hidden variables):
Prob(data | θ) = P(x_11, x_12, x_13, x_21, x_22, x_23)
= Σ_{l_11m, l_11f, …, s_23f} [ P(l_11m) P(l_11f) P(x_11 | l_11m, l_11f) … P(s_13m) P(s_13f) P(s_23m | s_13m, θ) P(s_23f | s_13f, θ) ]

The result is a function of the recombination fraction θ. The ML estimate is the θ value that maximizes this function.

12. The Disease Locus I

[Figure: the locus-1 fragment (L_11m, L_11f, X_11, S_13m, L_13m) extended with a phenotype node Y_11 whose parent is X_11.]

Phenotype variables Y_ij are 0 or 1, depending on whether a phenotypic trait associated with locus i of person j is observed, e.g., sick versus healthy. For example, a model of a fully penetrant recessive disease yields the penetrance probabilities:
P(y_11 = sick | X_11 = (a,a)) = 1
P(y_11 = sick | X_11 = (A,a)) = 0
P(y_11 = sick | X_11 = (A,A)) = 0
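The penetrance table above is a deterministic function of the unordered genotype. A direct transcription (the function name is illustrative):

```python
def penetrance(phenotype, genotype):
    """P(phenotype | unordered genotype) for a fully penetrant
    recessive disease with disease allele 'a', as on the slide:
    only genotype (a,a) is sick."""
    p_sick = 1.0 if set(genotype) == {'a'} else 0.0
    return p_sick if phenotype == 'sick' else 1.0 - p_sick
```

Incomplete penetrance would be modeled by replacing the 0/1 entries with probabilities strictly between 0 and 1; the network structure stays the same.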

13. The Disease Locus II

[Figure: the same network as on the previous slide, with Y_11 attached to X_11.]

Note that in this model we assume the phenotype/disease depends only on the alleles of one locus. Also, we did not model degrees of sickness.

14. Introducing a tentative disease locus

[Figure: the two-locus network, where one locus is the disease locus (assume sick means x_ij = (a,a)) with phenotype nodes Y_21, Y_22, Y_23 attached, and the other is the marker locus.]

The recombination fraction θ is unknown. Finding it can help determine whether a gene causing the disease lies in the vicinity of the marker locus.

15. Locus-by-Locus Summation Order

Sum over locus-i variables before summing over locus-(i+1) variables. Within each locus, sum over the allele variables (L_ijt) before summing over the selector variables (S_ijt). This order yields a Hidden Markov Model (HMM).

16. Hidden Markov Models in General

Application in communication: the message sent is (s_1, …, s_m) but we receive (r_1, …, r_m). Compute the most likely message sent.
Application in speech recognition: the word said is (s_1, …, s_m) but we recorded (r_1, …, r_m). Compute the most likely word said.
Application in genetic linkage analysis: to be discussed now.

[Figure: an HMM chain with hidden states X_1, X_2, …, X_{i−1}, X_i, X_{i+1} and, below each hidden state, its observation R_1, R_2, …, R_{i−1}, R_i, R_{i+1}.]

This depicts the factorization:
P(x_1, …, x_m, r_1, …, r_m) = P(x_1) P(r_1 | x_1) ∏_{i=2..m} P(x_i | x_{i−1}) P(r_i | x_i)

17. Hidden Markov Model in Our Case

[Figure: the HMM chain for linkage analysis — hidden inheritance vectors S_1, S_2, …, S_{i−1}, S_i, S_{i+1} with observed locus data X_1, X_2, …, X_{i−1}, X_i, X_{i+1}.]

The compound variable S_i = (S_{i,1,m}, …, S_{i,n,f}) is called the inheritance vector. It has 2^{2n} states, where n is the number of persons that have parents in the pedigree (non-founders). The compound variable X_i collects the data regarding locus i. Similarly, for the disease locus we use Y_i. To specify the HMM, we need to write down the transition matrices from S_{i−1} to S_i and the matrices P(x_i | s_i). Note that these quantities have already been implicitly defined.
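The state space of the inheritance vector can be enumerated directly: one maternal and one paternal selector bit per non-founder, hence 2^(2n) states. A minimal sketch (the function name is illustrative):

```python
from itertools import product

def inheritance_vectors(n_nonfounders):
    """Enumerate all inheritance vectors: 2 selector bits per
    non-founder, giving 2**(2 * n_nonfounders) states in total."""
    return list(product((0, 1), repeat=2 * n_nonfounders))

vecs = inheritance_vectors(1)   # the example pedigree: one non-founder
```

The exponential growth in n is why exact multipoint computation over inheritance vectors (the Lander-Green approach) is limited to pedigrees with few non-founders.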

18. The Transition Matrix

Recall that P(s_{i+1,j,t} | s_{i,j,t}) = 1 − θ if s_{i+1,j,t} = s_{i,j,t} (no recombination), and θ otherwise. Note that θ depends on i, but this dependence is omitted from the notation.

In our example, where we have one non-founder (n = 1), the transition probability table is of size 4 × 4 = 2^{2n} × 2^{2n}, encoding the four options of recombination/non-recombination for the two parental meioses. With T_1(θ) = [[1−θ, θ], [θ, 1−θ]], the table is the Kronecker product T_1(θ) ⊗ T_1(θ):

[ (1−θ)²   (1−θ)θ   θ(1−θ)   θ²     ]
[ (1−θ)θ   (1−θ)²   θ²       θ(1−θ) ]
[ θ(1−θ)   θ²       (1−θ)²   (1−θ)θ ]
[ θ²       θ(1−θ)   (1−θ)θ   (1−θ)² ]

For n non-founders, the transition matrix is the n-fold Kronecker product of this 4 × 4 matrix (equivalently, the 2n-fold Kronecker product of T_1(θ)).
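The Kronecker-product construction above can be sketched with NumPy: each meiosis independently recombines with probability θ, so the full matrix is the Kronecker product of one 2×2 factor per meiosis.

```python
import numpy as np

def transition_matrix(theta, n_meioses):
    """Transition matrix between inheritance vectors at adjacent loci:
    the n_meioses-fold Kronecker product of the 2x2 single-meiosis
    matrix [[1-theta, theta], [theta, 1-theta]]."""
    single = np.array([[1 - theta, theta], [theta, 1 - theta]])
    T = np.array([[1.0]])
    for _ in range(n_meioses):
        T = np.kron(T, single)
    return T

T = transition_matrix(0.1, 2)   # one non-founder: two meioses, 4x4
```

Each row sums to 1, and the diagonal entry (1−θ)^2 is the probability that neither meiosis recombines between the two loci.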

19. Probability of Data in One Locus Given an Inheritance Vector

[Figure: the model for locus 2, with nodes S_23m, S_23f, L_21m, L_21f, L_22m, L_22f, L_23m, L_23f and observed pairs X_21, X_22, X_23.]

P(x_21, x_22, x_23 | s_23m, s_23f) =
Σ_{l_21m, l_21f, l_22m, l_22f, l_23m, l_23f} P(l_21m) P(l_21f) P(l_22m) P(l_22f) P(x_21 | l_21m, l_21f) P(x_22 | l_22m, l_22f) P(x_23 | l_23m, l_23f) P(l_23m | l_21m, l_21f, s_23m) P(l_23f | l_22m, l_22f, s_23f)

The last five terms are always zero or one, namely, indicator functions.
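With per-locus emission probabilities P(x_i | s_i) in hand, the HMM likelihood Pr(data | θ) follows from the forward algorithm. This sketch assumes one non-founder (4 inheritance-vector states); `emissions` is a hypothetical input, one length-4 vector per locus holding P(locus data | inheritance vector) as computed on this slide.

```python
import numpy as np

def hmm_likelihood(emissions, theta):
    """Forward algorithm over the inheritance-vector HMM for one
    non-founder: uniform prior over the 4 vectors, two-meiosis
    Kronecker-product transitions, per-locus emission vectors."""
    single = np.array([[1 - theta, theta], [theta, 1 - theta]])
    T = np.kron(single, single)              # 4x4 transition matrix
    alpha = np.full(4, 0.25) * emissions[0]  # prior times first emission
    for e in emissions[1:]:
        alpha = (alpha @ T) * e              # propagate, then emit
    return alpha.sum()
```

Evaluating this for a range of θ values and taking the maximizer is exactly the ML procedure of slide 2; for long maps, the α vector is usually rescaled at each locus to avoid underflow.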