Basic Model For Genetic Linkage Analysis. Lecture #3. Prepared by Dan Geiger.


1 Basic Model For Genetic Linkage Analysis. Lecture #3. Prepared by Dan Geiger.

2 Using the Maximum Likelihood Approach
The probability of the pedigree data, $\Pr(\text{data} \mid \theta)$, is a function of the known and unknown recombination fractions, denoted collectively by $\theta$. How can we construct this likelihood function?
The maximum likelihood approach seeks the value of $\theta$ that maximizes the likelihood function $\Pr(\text{data} \mid \theta)$. This value is the ML estimate.
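To make the idea of maximizing a likelihood concrete, here is a minimal sketch on a toy problem that is much simpler than a pedigree. It assumes a set of fully informative, phase-known meioses, each of which is a recombinant with probability `theta`; the function names and the counts are hypothetical and are not taken from the lecture.

```python
def likelihood(theta, recombinants, meioses):
    """Toy likelihood: each informative meiosis recombines with probability theta."""
    return theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)


def ml_estimate(recombinants, meioses, steps=500):
    """Grid search for the theta in [0, 0.5] that maximizes the toy likelihood."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: likelihood(t, recombinants, meioses))


# Hypothetical counts: 2 recombinants observed in 10 meioses.
theta_hat = ml_estimate(recombinants=2, meioses=10)
print(theta_hat)
```

In this toy case the maximizer is simply the observed fraction of recombinants. For real pedigree data the likelihood has to be built from the model described on the following slides.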

3 Constructing the Likelihood Function
First, we need to determine the variables that describe the problem. There are many possible choices. Some variables we can observe and some we cannot.
$L_{ijm}$ = maternal allele at locus $i$ of person $j$. The values of this variable are the possible alleles $l_i$ at locus $i$.
$L_{ijf}$ = paternal allele at locus $i$ of person $j$. The values of this variable are the possible alleles $l_i$ at locus $i$ (same as for $L_{ijm}$).
$X_{ij}$ = unordered allele pair at locus $i$ of person $j$. The values are pairs of $i$-th locus alleles $(l_i, l'_i)$.
As a starting point, we assume that the data consist of an assignment to a subset of the variables $\{X_{ij}\}$. In other words, some (or all) persons are genotyped at some (or all) loci.

4 What are the relationships among the variables for a specific individual?
[Figure: the variables $L_{11m}$, $L_{11f}$ and $X_{11}$.]
$L_{11m}$ = maternal allele at locus 1 of person 1.
$L_{11f}$ = paternal allele at locus 1 of person 1.
$X_{11}$ = unordered allele pair at locus 1 of person 1 (the data).
$P(L_{11m} = a)$ is the frequency of allele $a$. We use lower-case letters for states and write, in short, $P(l_{11m})$.
$P(x_{11} \mid l_{11m}, l_{11f})$ is 0 or 1, depending on whether $x_{11}$ is consistent with the pair $(l_{11m}, l_{11f})$.

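5 What are the relationships among the variables across individuals?
[Figure: locus 1 variables for the mother ($L_{11m}$, $L_{11f}$, $X_{11}$), the father ($L_{12m}$, $L_{12f}$, $X_{12}$) and the offspring ($L_{13m}$, $L_{13f}$, $X_{13}$).]
$P(l_{13m} \mid l_{11m}, l_{11f}) = 1/2$ if $l_{13m} = l_{11m}$ or $l_{13m} = l_{11f}$ (when the mother is homozygous both conditions hold and the probability is 1)
$P(l_{13m} \mid l_{11m}, l_{11f}) = 0$ otherwise
First attempt: correct but not efficient, as we shall see.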

6 Probabilistic model for two loci
[Figure: model for locus 1 with variables $L_{11m}$, $L_{11f}$, $X_{11}$, $L_{12m}$, $L_{12f}$, $X_{12}$, $L_{13m}$, $L_{13f}$, $X_{13}$; model for locus 2 with variables $L_{21m}$, $L_{21f}$, $X_{21}$, $L_{22m}$, $L_{22f}$, $X_{22}$, $L_{23m}$, $L_{23f}$, $X_{23}$.]
$L_{23m}$ depends on whether $L_{13m}$ got its value from $L_{11m}$ or $L_{11f}$, on whether a recombination occurred, and on the values of $L_{21m}$ and $L_{21f}$. This is quite complex.

7 Adding a selector variable
[Figure: the variables $L_{11m}$, $L_{11f}$, $X_{11}$, $S_{13m}$ and $L_{13m}$.]
$S_{13m}$ = selector of the maternal allele at locus 1 of person 3.
$L_{13m}$ = maternal allele at locus 1 of person 3 (the offspring).
Selector variables $S_{ijm}$ are 0 or 1, depending on which of the mother's two alleles is transmitted to person $j$ at locus $i$.
$P(s_{13m}) = 1/2$
$P(l_{13m} \mid l_{11m}, l_{11f}, S_{13m} = 0) = 1$ if $l_{13m} = l_{11m}$
$P(l_{13m} \mid l_{11m}, l_{11f}, S_{13m} = 1) = 1$ if $l_{13m} = l_{11f}$
$P(l_{13m} \mid l_{11m}, l_{11f}, s_{13m}) = 0$ otherwise
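The selector table can be written down directly as code. The following is a minimal sketch, with hypothetical function names, of the deterministic table $P(l_{13m} \mid l_{11m}, l_{11f}, s_{13m})$ and of what happens when the selector is summed out with $P(s_{13m}) = 1/2$.

```python
def transmit(l_child, l_m, l_f, s):
    """P(l_child | l_m, l_f, s): s = 0 copies the mother's maternal allele,
    s = 1 copies the mother's paternal allele."""
    source = l_m if s == 0 else l_f
    return 1.0 if l_child == source else 0.0


def transmit_marginal(l_child, l_m, l_f):
    """Sum out the selector, using P(s) = 1/2 for both values."""
    return sum(0.5 * transmit(l_child, l_m, l_f, s) for s in (0, 1))


# Heterozygous mother (A, a): each allele is passed on with probability 1/2.
print(transmit_marginal("A", "A", "a"))
# Homozygous mother (a, a): both selector values give the same allele.
print(transmit_marginal("a", "a", "a"))
```

Summing out the selector recovers the selector-free transmission probabilities, so the selector adds no new assumptions. It only makes the choice of transmitted allele explicit.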

8 Probabilistic model for two loci
[Figure: the models for locus 1 and locus 2, with selector variables $S_{13m}$, $S_{13f}$ added at locus 1 and $S_{23m}$, $S_{23f}$ added at locus 2.]

9 Probabilistic Model for Recombination
[Figure: the two-locus model, with the selector variables of locus 1 ($S_{13m}$, $S_{13f}$) connected to those of locus 2 ($S_{23m}$, $S_{23f}$).]
$\theta$ is the recombination fraction between loci 1 and 2.

10 Constructing the likelihood function I
[Figure: the variables $L_{11m}$, $L_{11f}$, $X_{11}$, $S_{13m}$ and $L_{13m}$. $X_{11}$ is the observed variable; all other variables are not observed (hidden).]
Joint probability:
$P(l_{11m}, l_{11f}, x_{11}, s_{13m}, l_{13m}) = P(l_{11m}) \, P(l_{11f}) \, P(x_{11} \mid l_{11m}, l_{11f}) \, P(s_{13m}) \, P(l_{13m} \mid s_{13m}, l_{11m}, l_{11f})$
Probability of the data (sum over all states of all hidden variables):
$\text{Prob}(\text{data}) = P(x_{11}) = \sum_{l_{11m}} \sum_{l_{11f}} \sum_{s_{13m}} \sum_{l_{13m}} P(l_{11m}, l_{11f}, x_{11}, s_{13m}, l_{13m})$
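The sum over hidden variables can be carried out by brute-force enumeration when the model is this small. Below is a minimal sketch under stated assumptions: a two-allele locus with hypothetical frequencies (0.7 and 0.3, not from the lecture), and hypothetical function names `joint` and `prob_data`.

```python
from itertools import product

# Hypothetical allele frequencies at locus 1 (assumed for illustration only).
freq = {"A": 0.7, "a": 0.3}


def joint(l_m, l_f, x, s, l_child):
    """Product of the five local tables in the joint probability."""
    p_x = 1.0 if sorted(x) == sorted((l_m, l_f)) else 0.0         # P(x | l_m, l_f)
    p_child = 1.0 if l_child == (l_m if s == 0 else l_f) else 0.0  # P(l_child | s, l_m, l_f)
    return freq[l_m] * freq[l_f] * p_x * 0.5 * p_child


def prob_data(x):
    """P(x): sum the joint over all states of the hidden variables."""
    return sum(joint(l_m, l_f, x, s, l_c)
               for l_m, l_f, l_c in product(freq, repeat=3)
               for s in (0, 1))


print(prob_data(("A", "a")))
```

With these assumed frequencies the heterozygous genotype gets probability 2 x 0.7 x 0.3, since the selector and the offspring allele sum out to one. Enumeration like this grows exponentially with the number of variables, which is why the order of summation matters later in the lecture.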

11 Constructing the likelihood function II
Joint probability (product over all local probability tables):
$P(l_{11m}, l_{11f}, x_{11}, l_{12m}, l_{12f}, x_{12}, l_{13m}, l_{13f}, x_{13}, l_{21m}, l_{21f}, x_{21}, l_{22m}, l_{22f}, x_{22}, l_{23m}, l_{23f}, x_{23}, s_{13m}, s_{13f}, s_{23m}, s_{23f}, \theta) = P(l_{11m}) \, P(l_{11f}) \, P(x_{11} \mid l_{11m}, l_{11f}) \cdots P(s_{13m}) \, P(s_{13f}) \, P(s_{23m} \mid s_{13m}, \theta) \, P(s_{23f} \mid s_{13f}, \theta)$
Probability of the data (sum over all states of all hidden variables):
$\text{Prob}(\text{data} \mid \theta_2) = P(x_{11}, x_{12}, x_{13}, x_{21}, x_{22}, x_{23}) = \sum_{l_{11m}, l_{11f}, \ldots, s_{23f}} \left[ P(l_{11m}) \, P(l_{11f}) \, P(x_{11} \mid l_{11m}, l_{11f}) \cdots P(s_{13m}) \, P(s_{13f}) \, P(s_{23m} \mid s_{13m}, \theta) \, P(s_{23f} \mid s_{13f}, \theta) \right]$
The result is a function of the recombination fraction. The ML estimate is the value of $\theta$ that maximizes this function.

12 The Disease Locus I
[Figure: the variables $L_{11m}$, $L_{11f}$, $X_{11}$, $S_{13m}$ and $L_{13m}$, with the phenotype variable $Y_{11}$ added.]
Phenotype variables $Y_{ij}$ are 0 or 1, depending on whether a phenotypic trait associated with locus $i$ of person $j$ is observed, e.g., sick versus healthy.
For example, the model of a perfect recessive disease yields the penetrance probabilities:
$P(y_{11} = \text{sick} \mid X_{11} = (a,a)) = 1$
$P(y_{11} = \text{sick} \mid X_{11} = (A,a)) = 0$
$P(y_{11} = \text{sick} \mid X_{11} = (A,A)) = 0$
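The penetrance table combines with the allele frequencies in the same way as the genotype table. The following minimal sketch assumes a hypothetical disease-allele frequency of 0.01 (not given in the lecture) and computes the marginal probability that a founder is sick under the perfect recessive model.

```python
# Hypothetical allele frequencies at the disease locus (assumed for illustration).
freq = {"A": 0.99, "a": 0.01}


def penetrance(x):
    """P(y = sick | x) for a perfect recessive disease with disease allele 'a'."""
    return 1.0 if sorted(x) == ["a", "a"] else 0.0


def prob_sick():
    """P(y = sick) = sum over ordered allele pairs of P(l_m) P(l_f) P(sick | x)."""
    return sum(freq[l_m] * freq[l_f] * penetrance((l_m, l_f))
               for l_m in freq for l_f in freq)


print(prob_sick())
```

Only the (a,a) pair contributes, so the result is the square of the assumed disease-allele frequency. A model with incomplete penetrance would replace the 0 and 1 entries of `penetrance` by intermediate values.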

13 The Disease Locus II
[Figure: the variables $L_{11m}$, $L_{11f}$, $X_{11}$, $S_{13m}$, $L_{13m}$ and $Y_{11}$.]
Note that in this model we assume that the phenotype (disease) depends only on the alleles of one locus. Also, we did not model levels of sickness.

14 Introducing a tentative disease locus
[Figure: the two-locus model with selector variables. Locus 1 is the marker locus. Locus 2 is the disease locus, with phenotype variables $Y_{21}$, $Y_{22}$, $Y_{23}$ added; at the disease locus, assume sick means $x_{ij} = (a,a)$.]
The recombination fraction $\theta$ is unknown. Finding it can help determine whether a gene causing the disease lies in the vicinity of the marker locus.

15 Locus-by-locus summation order
Sum over the variables of locus $i$ before summing over the variables of locus $i+1$.
Within a locus, sum over the allele variables $L_{ijt}$ (shown in orange) before summing over the selector variables $S_{ijt}$.
This order yields a Hidden Markov Model (HMM).

16 Hidden Markov Models in General
Application in communication: the message sent is $(s_1, \ldots, s_m)$ but we receive $(r_1, \ldots, r_m)$. What is the most likely message sent?
Application in speech recognition: the word said is $(s_1, \ldots, s_m)$ but we recorded $(r_1, \ldots, r_m)$. What is the most likely word said?
Application in genetic linkage analysis: to be discussed now.
[Figure: a chain of hidden states $S_1, S_2, S_3, \ldots, S_{i-1}, S_i, S_{i+1}$, each emitting an observation $R_1, R_2, R_3, \ldots, R_{i-1}, R_i, R_{i+1}$. The figure depicts the factorization of the joint distribution of these variables.]
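A chain of this kind corresponds to the standard HMM factorization $P(s_1, \ldots, s_m, r_1, \ldots, r_m) = P(s_1) \prod_{i=2}^{m} P(s_i \mid s_{i-1}) \prod_{i=1}^{m} P(r_i \mid s_i)$. The sketch below assumes that factorization and uses hypothetical two-state tables to show the forward recursion, which computes the probability of the observations by summing out the hidden states one position at a time. Note that this gives the likelihood of the received sequence; finding the most likely message sent would need a max-based recursion instead.

```python
def forward_likelihood(initial, transition, emission, observations):
    """P(r_1, ..., r_m) for an HMM, computed with the forward recursion.

    initial[s]       = P(S_1 = s)
    transition[s][t] = P(S_i = t | S_{i-1} = s)
    emission[s][r]   = P(R_i = r | S_i = s)
    """
    states = range(len(initial))
    # alpha[s] = P(r_1, ..., r_i, S_i = s), starting with i = 1.
    alpha = [initial[s] * emission[s][observations[0]] for s in states]
    for r in observations[1:]:
        alpha = [emission[t][r] * sum(alpha[s] * transition[s][t] for s in states)
                 for t in states]
    return sum(alpha)


# Hypothetical two-state example (all numbers are assumptions).
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
emission = [[0.8, 0.2], [0.3, 0.7]]
likelihood = forward_likelihood(initial, transition, emission, [0, 0, 1])
print(likelihood)
```

The cost is proportional to the chain length times the square of the number of states, rather than exponential in the chain length.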

17 Hidden Markov Model in our case
[Figure: a chain of hidden inheritance vectors $S_1, S_2, S_3, \ldots, S_{i-1}, S_i, S_{i+1}$, each emitting the data of its locus, $X_1, X_2, X_3, \ldots, X_i, X_{i+1}$, with $Y_{i-1}$ at the disease locus.]
The compound variable $S_i = (S_{i,1,m}, \ldots, S_{i,2n,f})$ is called the inheritance vector. It has $2^{2n}$ states, where $n$ is the number of persons that have parents in the pedigree (non-founders). The compound variable $X_i = (X_{i,1,m}, \ldots, X_{i,2n,f})$ is the data regarding locus $i$. Similarly, for the disease locus we use $Y_i$.
To specify the HMM we need to write down the transition matrices from $S_{i-1}$ to $S_i$ and the matrices $P(x_i \mid S_i)$. Note that these quantities have already been implicitly defined.

18 The transition matrix
Recall the transition probabilities between the selector variables of adjacent loci, such as $P(s_{23m} \mid s_{13m}, \theta)$. Note that $\theta$ depends on the locus index $i$, but this dependence is omitted from the notation.
In our example, where we have one non-founder ($n = 1$), the transition probability table has size $4 \times 4 = 2^{2n} \times 2^{2n}$, encoding the four options of recombination or non-recombination in the two parental meioses. It is written as a Kronecker product.
For $n$ non-founders, the transition matrix is the $n$-fold Kronecker product.
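The Kronecker construction can be sketched in a few lines. This sketch assumes the usual single-meiosis table, in which the selector keeps its value with probability $1 - \theta$ and switches with probability $\theta$; that table is an assumption here, since the slide's equations are not in the transcript. The helper names are hypothetical, and the full matrix is built as a $2n$-fold product of the $2 \times 2$ table, which is the same as an $n$-fold product of the $4 \times 4$ table for one non-founder.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def meiosis_matrix(theta):
    """Assumed 2x2 table P(s_i | s_{i-1}, theta) for a single meiosis."""
    return [[1.0 - theta, theta], [theta, 1.0 - theta]]


def transition_matrix(theta, n_nonfounders):
    """Transition matrix over inheritance vectors: one 2x2 factor per meiosis,
    two meioses (maternal and paternal) per non-founder."""
    result = [[1.0]]
    for _ in range(2 * n_nonfounders):
        result = kron(result, meiosis_matrix(theta))
    return result


# One non-founder and a hypothetical theta of 0.1 give a 4x4 matrix.
T = transition_matrix(0.1, 1)
for row in T:
    print(row)
```

Each row of the result sums to one, and the entry for "no recombination in either meiosis" is $(1 - \theta)^2$.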

19 Probability of data in one locus given an inheritance vector
[Figure: model for locus 2 with variables $L_{21m}$, $L_{21f}$, $X_{21}$, $L_{22m}$, $L_{22f}$, $X_{22}$, $L_{23m}$, $L_{23f}$, $X_{23}$, $S_{23m}$, $S_{23f}$.]
$P(x_{21}, x_{22}, x_{23} \mid s_{23m}, s_{23f}) = \sum_{l_{21m}, l_{21f}, l_{22m}, l_{22f}, l_{23m}, l_{23f}} P(l_{21m}) \, P(l_{21f}) \, P(l_{22m}) \, P(l_{22f}) \, P(x_{21} \mid l_{21m}, l_{21f}) \, P(x_{22} \mid l_{22m}, l_{22f}) \, P(x_{23} \mid l_{23m}, l_{23f}) \, P(l_{23m} \mid l_{21m}, l_{21f}, s_{23m}) \, P(l_{23f} \mid l_{22m}, l_{22f}, s_{23f})$
The last five terms are always zero or one; namely, they are indicator functions.
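Because the last five terms are indicators, the sum reduces to adding up founder allele frequencies over the assignments that are consistent with the data. The sketch below illustrates this for the mother, father and offspring trio, assuming a two-allele locus with hypothetical frequencies; the function names are also hypothetical. The offspring's alleles are fixed by the founders' alleles and the selectors, so only the four founder alleles need to be enumerated.

```python
from itertools import product

# Hypothetical allele frequencies at locus 2 (assumed for illustration only).
freq = {"A": 0.6, "a": 0.4}


def matches(x, l_m, l_f):
    """Indicator P(x | l_m, l_f): the unordered pair x equals {l_m, l_f}."""
    return sorted(x) == sorted((l_m, l_f))


def data_probability(x_mother, x_father, x_child, s_m, s_f):
    """P(x_21, x_22, x_23 | s_23m, s_23f) by enumerating founder alleles."""
    total = 0.0
    for m_m, m_f, f_m, f_f in product(freq, repeat=4):
        child_m = m_m if s_m == 0 else m_f  # allele received from the mother
        child_f = f_m if s_f == 0 else f_f  # allele received from the father
        if (matches(x_mother, m_m, m_f) and matches(x_father, f_m, f_f)
                and matches(x_child, child_m, child_f)):
            total += freq[m_m] * freq[m_f] * freq[f_m] * freq[f_f]
    return total


# Mother (A,a), father (a,a), offspring (A,a), selectors both 0.
print(data_probability(("A", "a"), ("a", "a"), ("A", "a"), 0, 0))
```

Evaluating this function for every value of the inheritance vector gives the emission table $P(x_i \mid S_i)$ that the HMM needs for locus $i$.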

