1 How many genes? Mapping mouse traits, cont. Lecture 2B, Statistics 246 January 22, 2004.



2 Let's estimate the recombination fraction r between D12Mit51 and D12Mit132

2-locus genotypes at D12Mit51 and D12Mit132:

132 | 51     A     H     B   Total
A           26    10     0      36
H           10    46     9      65
B            0     5    23      28
Total       36    61    32     129

129 offspring from H × H, where A × B → H.

3 Estimation of r
First note that we can't simply count recombinants. Why? Because recombination can occur in the paternal meiosis, the maternal meiosis, or both, and all we see are the genotypes of the offspring. In most cases, the parental origin of the recombination can be inferred, but not in every case. Denote the two markers by 1 and 2, the NOD alleles by a, and the B6 alleles by b. The parental haplotypes are then $a_1a_2$ on one chromosome and $b_1b_2$ on the other. Each parent passes on $a_1a_2$ with probability $(1-r)/2$, and similarly for $b_1b_2$, while each of the recombinant haplotypes $a_1b_2$ and $b_1a_2$ is passed on with probability $r/2$. In practice, recombination frequencies differ slightly between male and female meioses, but we ignore this refinement.

4 Probabilities of parentally transmitted haplotype combinations (× 4)
Haplotype combinations resulting from crossing doubly heterozygous parents, each $a_1/b_1$ at locus 1 and $a_2/b_2$ at locus 2. This table is for coupling: the parental haplotypes are $a_1a_2$ and $b_1b_2$, i.e. the mother and father are both $a_1a_2/b_1b_2$. Here P and M denote the paternally and maternally transmitted haplotypes, respectively.

P | M     a1a2       a1b2       b1a2       b1b2
a1a2      (1-r)^2    r(1-r)     r(1-r)     (1-r)^2
a1b2      r(1-r)     r^2        r^2        r(1-r)
b1a2      r(1-r)     r^2        r^2        r(1-r)
b1b2      (1-r)^2    r(1-r)     r(1-r)     (1-r)^2

5 From the Punnett square to the table of 2-locus genotype probabilities
Terms in the Punnett square table can be summed to build up a table of probabilities for the 9 different 2-locus genotypes. For example, we observe A ($= a_1/a_1$) at locus 1 and H ($= a_2/b_2$) at locus 2 if and only if the transmitted paternal and maternal haplotypes are the pairs $a_1a_2$ & $a_1b_2$ or $a_1b_2$ & $a_1a_2$, and this occurs with a combined probability of $2r(1-r)/4$. The other terms are built up similarly. The most complex case is the 2-locus genotype HH, where 4 different terms need to be considered, because a double heterozygote can result from 4 different combinations of parental or recombinant haplotypes.

6 Probabilities of 2-locus genotypes (× 4)

L1 | L2    A           H                    B
A          (1-r)^2     2r(1-r)              r^2
H          2r(1-r)     2[r^2 + (1-r)^2]     2r(1-r)
B          r^2         2r(1-r)              (1-r)^2

Looking at this table, we see that the number of recombinations (or none) can be inferred, though not the parent in which they occurred, in every case except HH. We can almost count recombinants.
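The summation described on slide 5 can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names (`gamete_probs`, `genotype_label`, `two_locus_probs`) are hypothetical, and it assumes coupling phase with equal recombination fractions in both parents, as stated on slides 3 and 4.

```python
from itertools import product

def gamete_probs(r):
    """Transmission probabilities for a coupling-phase a1a2/b1b2 parent.

    Keys are (allele at locus 1, allele at locus 2).
    """
    return {
        ("a", "a"): (1 - r) / 2,  # parental haplotype a1a2
        ("b", "b"): (1 - r) / 2,  # parental haplotype b1b2
        ("a", "b"): r / 2,        # recombinant haplotype a1b2
        ("b", "a"): r / 2,        # recombinant haplotype b1a2
    }

def genotype_label(x, y):
    """Single-locus genotype from the two transmitted alleles."""
    if x == y:
        return "A" if x == "a" else "B"
    return "H"

def two_locus_probs(r):
    """Sum the 16 Punnett-square cells into the 9 two-locus genotypes."""
    gametes = gamete_probs(r)
    table = {(g1, g2): 0.0 for g1 in "AHB" for g2 in "AHB"}
    for (pat, p_pat), (mat, p_mat) in product(gametes.items(), repeat=2):
        key = (genotype_label(pat[0], mat[0]), genotype_label(pat[1], mat[1]))
        table[key] += p_pat * p_mat
    return table
```

For any r, `two_locus_probs(r)[("A", "H")]` should agree with $2r(1-r)/4$ and the nine entries should sum to 1. These are true probabilities, so they are the tabulated (× 4) entries divided by 4.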

7 Estimation of r, cont.
Using the table of probabilities we can write down a log-likelihood function for any set of 2-locus genotype frequencies. Label the cells of the table 1, …, 9, and denote the corresponding probabilities by $p_1(r), \ldots, p_9(r)$ and the observed frequencies by $n_1, \ldots, n_9$. Then the log-likelihood for the resulting multinomial model is
$$\log L(r) = \sum_{i=1}^{9} n_i \log p_i(r).$$
The parameter r is then estimated by maximizing this function, and an approximate standard error or confidence interval is obtained using the Fisher information or the asymptotic chi-square approximation.
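As an illustration of maximizing log L(r) directly, here is a minimal Python sketch applied to the slide 2 counts. Several things in it are assumptions rather than the lecture's method: the cells are labelled row by row (AA, AH, AB, HA, HH, HB, BA, BH, BB), r is restricted to (0, 1/2], the maximizer is a golden-section search that presumes a single interior maximum, and the standard error comes from a finite-difference estimate of the observed information. All function names are hypothetical.

```python
import math

# Slide 2 counts read row by row: AA, AH, AB, HA, HH, HB, BA, BH, BB.
COUNTS_SLIDE2 = [26, 10, 0, 10, 46, 9, 0, 5, 23]

def cell_probs(r):
    """p_1(r), ..., p_9(r): the slide 6 table entries divided by 4."""
    none = (1 - r) ** 2 / 4                 # AA, BB: no recombinants
    one = 2 * r * (1 - r) / 4               # AH, HA, HB, BH: one recombinant
    two = r ** 2 / 4                        # AB, BA: two recombinants
    hh = 2 * (r ** 2 + (1 - r) ** 2) / 4    # HH: zero or two recombinants
    return [none, one, two, one, hh, one, two, one, none]

def log_lik(r, counts):
    """Multinomial log-likelihood, dropping the constant term."""
    return sum(n * math.log(p) for n, p in zip(counts, cell_probs(r)) if n > 0)

def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return (a + b) / 2

def fit_r(counts, h=1e-4):
    """Return (r_hat, approximate standard error); assumes an interior maximum."""
    f = lambda r: log_lik(r, counts)
    r_hat = golden_max(f, 1e-6, 0.5)
    # Observed information: minus the second derivative, by central differences.
    info = -(f(r_hat + h) - 2 * f(r_hat) + f(r_hat - h)) / h ** 2
    return r_hat, 1 / math.sqrt(info)

r_hat, se = fit_r(COUNTS_SLIDE2)
```

A rough 95% interval under the normal approximation would then be `r_hat ± 1.96 * se`. The finite-difference step breaks down if the maximum sits on the boundary of the interval.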

8 A frill: the M-step of an EM-algorithm
The function log L(r) can be maximized in a number of ways, but in general there is no closed-form expression for the maximum likelihood estimate $\hat{r}$. If we were able to decompose the count $n_5$ of HHs into the $n_5^P$ that are pairs of parental haplotypes and the $n_5^R$ that are pairs of recombinant haplotypes, with frequencies $(1-r)^2$ and $r^2$ respectively, then the recombinant haplotypes could be counted directly and the MLE would be
$$\hat{r} = \frac{2(n_3 + n_7 + n_5^R) + n_2 + n_4 + n_6 + n_8}{2n},$$
where n is the total number of offspring.
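One way to see where this closed form comes from is to write the complete-data likelihood. The sketch below assumes the cells are numbered row by row (so cells 3 and 7 are AB and BA) and introduces $R$ for the total number of recombinant haplotypes among the $2n$ transmitted haplotypes. Neither the symbol $R$ nor this derivation appears on the slides.

```latex
% Each offspring receives two haplotypes; with k of them recombinant,
% its complete-data probability is proportional to r^k (1-r)^{2-k}.
\[
R = 2\,(n_3 + n_7 + n_5^R) + n_2 + n_4 + n_6 + n_8 ,
\qquad
\log L_c(r) = R \log r + (2n - R)\log(1 - r) + \text{const}.
\]
% Setting the derivative to zero gives the binomial proportion:
\[
\frac{R}{r} - \frac{2n - R}{1 - r} = 0
\quad\Longrightarrow\quad
\hat{r} = \frac{R}{2n}.
\]
```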

9 The E-step
In general we don't know $n_5^R$, but we can estimate it using the following formula:
$$\hat{n}_5^R = n_5 \, \frac{r^2}{r^2 + (1-r)^2}.$$
In practice, we need a value of r to begin with. We use it in the above formula to estimate $n_5^R$, obtain the next value of r from the M-step, and iterate until the values of r stop changing.
Exercise: Prove the above formula, and that the iteration is an instance of the EM-algorithm.
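The iteration can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the lecture's code: the function name `em_recombination`, the starting value, and the stopping rule are hypothetical, the cells are taken row by row (AA, AH, AB, HA, HH, HB, BA, BH, BB), and the E-step uses the conditional expectation $n_5 r^2 / (r^2 + (1-r)^2)$ implied by the HH frequencies $(1-r)^2$ and $r^2$ on slide 8.

```python
def em_recombination(counts, r=0.25, tol=1e-10, max_iter=1000):
    """Iterate E- and M-steps for the recombination fraction r.

    counts: nine 2-locus genotype counts, row by row (AA, AH, ..., BB).
    r: starting value (an arbitrary choice strictly between 0 and 1/2).
    """
    n1, n2, n3, n4, n5, n6, n7, n8, n9 = counts
    n = sum(counts)
    for _ in range(max_iter):
        # E-step: expected number of HH offspring carrying two
        # recombinant haplotypes, given the current r.
        n5_rec = n5 * r ** 2 / (r ** 2 + (1 - r) ** 2)
        # M-step: proportion of recombinant haplotypes among all 2n.
        r_new = (2 * (n3 + n7 + n5_rec) + n2 + n4 + n6 + n8) / (2 * n)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r

# Slide 2 counts for D12Mit51 and D12Mit132.
r_em = em_recombination([26, 10, 0, 10, 46, 9, 0, 5, 23])
```

The starting value should not be exactly 0, since the E-step would then assign no HH offspring to the recombinant class and the iteration would ignore that possibility from the outset.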

10 2-locus genotype frequencies for D12Mit132 and D13Mit6

132 | 6      A     H     B   Total
A           10    21     7      38
H           15    29    17      61
B            5    21     6      32
Total       30    71    30     131

Exercise: Estimate r for these two loci. Is it different from 1/2?

