Published by Dana Dawson. Modified over 9 years ago.
1
1 How many genes? Mapping mouse traits, cont. Lecture 3, Statistics 246 January 27, 2004
2
2 Inferring linkage and mapping markers We now turn to deciding when two marker loci are linked, and if so, estimating the map distance between them. Then we go on and create a full (marker) map of each chromosome, relative to which we can map trait genes. With these preliminaries completed, we can map trait loci.
3
3 The LOD score Suppose that we have two marker loci, and we don’t know whether or not they are linked. A natural way to address this question is to carry out a formal test of the null hypothesis H: r = 1/2 against the alternative K: r < 1/2, using the marker data from our cross. The test statistic almost always used in this context is $\log_{10}$ of the ratio of the likelihood at the maximum likelihood estimate $\hat r$ to that at the null, r = 1/2, i.e. $\mathrm{LOD} = \log_{10}\,[L(\hat r)/L(1/2)]$.
4
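4 Calculating the LOD score Recall that the (log) likelihood here is based on the multinomial distribution for the allocation of n = 132 intercross mice into their nine 2-locus genotypic categories. As we saw earlier, it can be written, up to an additive constant, as $\log L(r) = \sum_{i=1}^{9} n_i \log p_i(r)$, where $n_i$ is the number of mice in category i and $p_i(r)$ is the probability of that category. We therefore take the difference between this function evaluated at $\hat r$ and at r = 1/2, which is $\mathrm{LOD} = \sum_{i=1}^{9} n_i \log_{10}\,[p_i(\hat r)/q_i]$, where $q_i$ is 1/16, 1/8 or 1/4, depending on i.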
5
5 Null probabilities of 2-locus genotypes
L1 \ L2    A      H      B
A          1/16   1/8    1/16
H          1/8    1/4    1/8
B          1/16   1/8    1/16
This is just putting r = 1/2 in an earlier table. Exercise: Suggest some different test statistics to discriminate between the null H and the alternative K. How do they perform in comparison to the LOD?
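The LOD calculation of the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard no-interference 2-locus intercross probabilities are used for $p_i(r)$, the maximization is a plain grid search rather than the iterative method used in practice, and both count tables are hypothetical, not the data from our cross.

```python
import math

def genotype_probs(r):
    """3x3 table of 2-locus intercross genotype probabilities.
    Rows: genotype at locus 1 (A, H, B); columns: genotype at locus 2."""
    p = (1 - r) ** 2      # both gametes non-recombinant
    q = r ** 2            # both gametes recombinant
    m = r * (1 - r)       # exactly one recombinant gamete
    return [
        [p / 4, m / 2, q / 4],
        [m / 2, (p + q) / 2, m / 2],
        [q / 4, m / 2, p / 4],
    ]

def log10_lik(counts, r):
    """Multinomial log10 likelihood, dropping the constant term."""
    probs = genotype_probs(r)
    return sum(
        counts[i][j] * math.log10(probs[i][j])
        for i in range(3) for j in range(3)
        if counts[i][j] > 0
    )

def lod_score(counts, grid_size=1000):
    """Grid search over 0 < r <= 1/2; returns (r_hat, LOD)."""
    null = log10_lik(counts, 0.5)
    best_r, best = 0.5, null
    for k in range(1, grid_size):
        r = 0.5 * k / grid_size
        ll = log10_lik(counts, r)
        if ll > best:
            best_r, best = r, ll
    return best_r, best - null

# Hypothetical counts for n = 132 mice (NOT the data from the cross).
linked_counts = [[30, 3, 0], [4, 58, 3], [0, 4, 30]]
unlinked_counts = [[8, 17, 8], [16, 33, 17], [8, 17, 8]]

r_linked, lod_linked = lod_score(linked_counts)
r_unlinked, lod_unlinked = lod_score(unlinked_counts)
```

Because the grid includes r = 1/2, the LOD returned is never negative; for the tightly linked example it is large, and for the example close to the null table it is near zero.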
6
6 Using the LOD score Normal statistical practice would have us setting a type 1 error in a given context (cross, sample size), and determining the cut-off for the LOD which would achieve approximately the desired error under the null hypothesis. This approach is rarely adopted in genetics, where tradition dictates the use of more stringent thresholds, which take into account the multiple testing common in linkage mapping. This tradition was originally motivated by a Bayesian argument, and in fact, Bayesian approaches to linkage analysis are increasingly popular. Let us use Bayes’ formula in the form $\log_{10}$ posterior odds = $\log_{10}$ prior odds + LOD, where the odds are for linkage. With 20 chromosomes, which we might assume are approximately the same size, and not too long, the prior probability of two random loci being on the same chromosome, and hence linked, is about 1/20. In order to overcome these prior odds against linkage, and achieve reasonable posterior odds, say 100:1, we would want a LOD of at least 3.
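The arithmetic behind the threshold of 3 can be written out as follows, taking the prior probability of 1/20 (prior odds of 1:19) and the target posterior odds of 100:1 from the slide:

```latex
\begin{align*}
\log_{10}(\text{prior odds}) &= \log_{10}\frac{1/20}{19/20} = \log_{10}\frac{1}{19} \approx -1.28,\\
\log_{10}(\text{posterior odds}) &= \log_{10} 100 = 2,\\
\mathrm{LOD}_{\text{required}} &= 2 - (-1.28) \approx 3.28 .
\end{align*}
```

So the required LOD is a little over 3, which is where the conventional threshold comes from.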
7
7 Linkage groups And so it has come to pass that a LOD must be > 3 to get people’s attention. We’ll be a little more precise later. The next step is to define what are called linkage groups. These partition the markers into classes, every pair of markers in a class being either closely linked (i.e. r ≈ 0), or being connected by a chain of markers, each consecutive pair of which is closely linked. In practice, we might define closely linked to be something like a) $\hat r < c_1$ and b) LOD > $c_2$, where e.g. $c_1$ = 0.2, $c_2$ = 3.
8
8 Forming linkage groups, cont. When one tries to form linkage groups, it is not unusual to have to vary c 1 and c 2 a little, until all markers fall into a group of more than just one marker. When this is done, it is hoped that the linkage groups correspond to chromosomes. If the chromosome number of the species is known, and that coincides with the number of linkage groups, this is a reasonable presumption. But much can happen to dash this hope: one may have two linkage groups corresponding to different arms of the same chromosome, and not know that; one can have a marker at the end of one chromosome “linked” to a marker at the end of another chromosome, though this should be rare if there is plenty of data; and so on.
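Forming linkage groups amounts to finding the connected components of a graph whose edges join closely linked marker pairs. The sketch below assumes that "closely linked" means an estimated recombination fraction of at most c1 together with a LOD of at least c2; the marker names and pairwise statistics are hypothetical, and the function name is illustrative only.

```python
def linkage_groups(markers, pair_stats, c1=0.2, c2=3.0):
    """Group markers by chains of closely linked pairs (union-find).

    pair_stats maps (marker1, marker2) -> (r_hat, lod).
    """
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (m1, m2), (r_hat, lod) in pair_stats.items():
        if r_hat <= c1 and lod >= c2:      # assumed definition of "closely linked"
            parent[find(m1)] = find(m2)

    groups = {}
    for m in markers:                       # keeps the input order within groups
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

# Hypothetical pairwise estimates (r_hat, LOD).
markers = ["M1", "M2", "M3", "M4", "M5"]
pair_stats = {
    ("M1", "M2"): (0.05, 12.0),
    ("M2", "M3"): (0.10, 8.0),
    ("M1", "M3"): (0.15, 2.5),   # fails the LOD cut, but M1-M2-M3 is still a chain
    ("M3", "M4"): (0.45, 0.2),
    ("M4", "M5"): (0.08, 9.0),
}
groups = linkage_groups(markers, pair_stats)
```

Varying c1 and c2, as the slide describes, simply changes which edges are present; a singleton group signals a marker that is not closely linked to anything at the current thresholds.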
9
9 Ordering linkage groups Next we want to order the markers in a linkage group (ideally, on a chromosome). How do we do that? An initial ordering can be obtained by starting with one of the markers, $M_1$ say, of the most distant pair, where distance means recombination fraction or map distance. Call $M_2$ the closest marker to $M_1$, and continue in this way. Now we want to confirm our ordering. One way is to calculate a (maximized) log likelihood for every ordering, and select the one with the largest log likelihood. But if we have (say) 11 markers on a chromosome, this is 11! ≈ 4 × 10^7 orders. What people often do is take moving k-tuples of markers, and optimize the order of each, e.g. with k = 3 or 4. Whichever strategy one adopts, multilocus (i.e. > 2 loci) methods are needed.
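The initial ordering just described is a greedy nearest-neighbour walk. A minimal sketch, assuming all pairwise distances are available in a dictionary and using hypothetical marker names and positions:

```python
import math

def greedy_order(markers, dist):
    """Start at one end of the most distant pair, then repeatedly
    append the unused marker closest to the last one placed.

    dist maps frozenset({m1, m2}) -> distance (recombination
    fraction or map distance).
    """
    def d(a, b):
        return dist[frozenset((a, b))]

    pairs = [(a, b) for i, a in enumerate(markers) for b in markers[i + 1:]]
    start, _ = max(pairs, key=lambda pair: d(*pair))
    order = [start]
    remaining = [m for m in markers if m != start]
    while remaining:
        nxt = min(remaining, key=lambda m: d(order[-1], m))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical markers at known positions (cM), listed out of order.
positions = {"Ma": 0.0, "Mb": 10.0, "Mc": 25.0, "Md": 31.0}
markers = ["Mc", "Ma", "Md", "Mb"]
dist = {
    frozenset((a, b)): abs(positions[a] - positions[b])
    for a in markers for b in markers if a != b
}
order = greedy_order(markers, dist)

# Size of the exhaustive search mentioned on the slide.
n_orders = math.factorial(11)
```

With noisy distance estimates the greedy walk can go wrong, which is why the resulting order is then checked by likelihood, either over all orders or in moving windows.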
10
10 Likelihoods for 3-locus data Suppose that we have 3 markers $M_1$, $M_2$ and $M_3$ in that order. How do we calculate the log likelihood of the associated 3-locus marker data from our intercross? Recalling the discussion preceding the Punnett square of the last lecture, the parental haplotypes here are $a_1a_2a_3$ and $b_1b_2b_3$, while there would be no fewer than 6 forms of recombinant haplotypes: the four single recombinants $a_1a_2b_3$, $a_1b_2b_3$, $b_1b_2a_3$ and $b_1a_2a_3$, and the two double recombinants $a_1b_2a_3$ and $b_1a_2b_3$. Proceeding as before, we calculate the probability of each of these in terms of the recombination fractions $r_1$ and $r_2$ across intervals $M_1$-$M_2$ and $M_2$-$M_3$, respectively. For simplicity, we assume the Poisson model, with independence of recombination across disjoint intervals. For example, $a_1a_2a_3$ would have probability $(1-r_1)(1-r_2)/2$, $a_1a_2b_3$ would have probability $(1-r_1)r_2/2$, while $a_1b_2a_3$ would have probability $r_1r_2/2$, so that the 8 haplotype probabilities sum to 1. We would do this for every one of the 8 paternal and 8 maternal haplotypes, and then collect them up to assign probabilities for each of the $3^3 = 27$ 3-locus genotypes (AAA, AAH, …, BBB), and maximize the multinomial likelihood in the parameters $r_1$ and $r_2$. This is just as in the 2-locus case.
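The bookkeeping for three loci can be sketched as follows, assuming independent recombination in the two intervals and a factor of 1/2 for the parental origin of the first allele, so that the 8 haplotype probabilities sum to 1. The function names are illustrative only.

```python
from itertools import product

def haplotype_probs(r1, r2):
    """Probabilities of the 8 three-locus haplotypes from an F1 parent."""
    probs = {}
    for h in product("ab", repeat=3):
        p = 0.5                                   # first allele is a or b
        p *= r1 if h[0] != h[1] else 1 - r1       # interval M1-M2
        p *= r2 if h[1] != h[2] else 1 - r2       # interval M2-M3
        probs["".join(h)] = p
    return probs

def three_locus_genotype_probs(r1, r2):
    """Collapse the 8 x 8 haplotype pairs into the 27 observable genotypes."""
    hap = haplotype_probs(r1, r2)
    geno = {}
    for (h1, p1), (h2, p2) in product(hap.items(), repeat=2):
        g = "".join(
            "A" if x == y == "a" else "B" if x == y == "b" else "H"
            for x, y in zip(h1, h2)
        )
        geno[g] = geno.get(g, 0.0) + p1 * p2
    return geno

hap = haplotype_probs(0.1, 0.2)
geno = three_locus_genotype_probs(0.1, 0.2)
```

The multinomial log likelihood is then the sum over the 27 genotypes of the observed count times the log of the corresponding entry of `geno`, to be maximized over r1 and r2.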
11
11 Multilocus linkage: #loci > 3 It should have become clear by now that the strategy just outlined is not going to work too easily when there are (say) 11 loci in a linkage group. In that case, haplotypes are strings of the form $a_1a_2b_3 \ldots a_{10}b_{11}$, of which there are $2^{11}$ in all: just 2 parental and $2^{11} - 2$ distinct recombinant haplotypes. The number of paternal-maternal haplotype combinations is the square of $2^{11}$, and they must be mapped into $3^{11}$ 11-locus genotypes, and a multinomial MLE carried out to estimate 10 recombination fractions. What can be done? In 1987 the first large scale human genetic map was published, and at the same time a new algorithm was announced for both human pedigrees and experimental crosses, such as our intercross. This algorithm made use of hidden Markov models, and for the first time allowed full likelihood calculations in our current context without the exponential blow-up just described.
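To give a flavour of how a hidden Markov model avoids the blow-up, here is a minimal forward recursion for one intercross mouse. It is a sketch under simplifying assumptions (fully informative markers, no genotyping errors, no interference), not the published algorithm itself. The hidden state at each marker is the pair (maternal allele, paternal allele), and only the unordered genotype A, H or B is observed.

```python
from itertools import product

# Hidden states: (maternal allele, paternal allele), 0 = a, 1 = b.
STATES = list(product((0, 1), repeat=2))

def emits(state, obs):
    """True if the hidden state is compatible with the observed genotype.
    obs is 'A', 'H', 'B', or None for a missing genotype."""
    if obs is None:
        return True
    return {0: "A", 1: "H", 2: "B"}[sum(state)] == obs

def transition(s, t, r):
    """The two meioses recombine independently, each with probability r."""
    p = 1.0
    for x, y in zip(s, t):
        p *= r if x != y else 1 - r
    return p

def likelihood(obs, rec_fracs):
    """Probability of one mouse's genotypes along a chromosome.
    rec_fracs[k] is the recombination fraction between markers k and k+1."""
    if len(rec_fracs) != len(obs) - 1:
        raise ValueError("need one recombination fraction per interval")
    fwd = {s: (0.25 if emits(s, obs[0]) else 0.0) for s in STATES}
    for o, r in zip(obs[1:], rec_fracs):
        fwd = {
            t: (sum(fwd[s] * transition(s, t, r) for s in STATES)
                if emits(t, o) else 0.0)
            for t in STATES
        }
    return sum(fwd.values())

# Sanity check against the 2-locus table: P(A, A) = (1 - r)^2 / 4.
p_aa = likelihood(["A", "A"], [0.1])
```

The work grows linearly in the number of markers (a fixed 4 × 4 update per interval) instead of exponentially, and the log likelihood for the cross is the sum of the log of this quantity over mice.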
12
12 Multilocus mapping: no details I’m not going to cover this topic in detail this year, as I discussed it a few years ago, and those interested can read it there: www.stat.berkeley.edu/users/terry/Classes/s260.1998/index.html We will meet hidden Markov models again pretty soon, as they have become a common feature of statistical genetics and computational biology since the early 1980s. Now suppose that we have ordered our marker loci as just described, either by maximizing the likelihood within linkage groups over all orders, or by doing so in moving windows of size 3-5. How do we look at the result?
13
13 Checking the map, after removal of bad markers est.rf, plot.rf (from an R package). Top triangle is a transform of the recombination fraction, namely $-4(1 + \log_2 r)$. Bottom triangle contains the LOD scores at the maximum likelihood estimate of the recombination fraction. Notice the “bad” bits in the top left-hand and bottom right-hand corners.
14
14 Checking existing genetic maps As indicated earlier, the markers in our cross came from MIT, and they were already mapped. Most researchers would simply use the pre-existing map, as this would usually (but not always) be based on many more recombinations than could be expected in a single cross. Why might we not just do the same? Well, existing maps are rarely completely error-free, and one should always look at one’s own data. An added benefit of looking at one’s own data in relation to an existing map is that this should bring to light markers with a large number of genotyping errors, assuming the map is correct.
15
15 Interplay between error detection and maps Genotyping errors in mouse crosses can usually only be detected through the appearance of unusual numbers of close recombination events. This depends entirely on the quality of the map. The availability of the mouse genome sequence allows us to check genetic maps against the physical maps: we locate the (unique) PCR primers for our microsatellite markers. This has brought a new era in the quality of maps (including human genetic maps!). The next slide depicts the genetic map we used.
16
16 Locations of our markers After a commercial, we move on to mapping coat color genes.
17
17 R
18
18 R/qtl Authors: Karl Broman, Hao Wu, Gary Churchill, Saunak Sen, & Brian Yandell
19
19 Benefits of using R/qtl
Lots of graphics
Good error detection with accompanying graphics
Single- and two-QTL mapping (and interaction terms)
Choice of several input formats
–Includes Mapmaker format
Many alternatives for mapping methods
Many different models for phenotypes, e.g. standard normal, nonparametric model, binary traits
20
20 Why map coat color genes in our C57/BL6 x NOD F2 intercross? The locations of these genes are known; even with a modest number of mice we should be able to map these genes easily; it is a useful check that everything is as it should be with our data; and finally, it is a good exercise for us. Exercise. Look up the agouti and albino loci at the Mouse Genome Informatics database.
21
21 Recall our earlier Punnett square
22
22 Segregation data at a “random” marker Phenotype by genotype at D12Mit51 (complete data only)
          A    B    H
Agouti   19   18   35
Black     8    3   18
White     9    7   12
23
23 Mapping a segregating trait We turn now to mapping the two coat color genes segregating in our cross, beginning with the albino locus, and then the agouti locus. To do so, we need a genetic model, that is, we need to know or guess the relation between genotypes at our trait loci and phenotypes, which is embodied in the notion of a penetrance function. Looking at the preceding table, the albino trait segregates just as though governed by a recessive gene, so we postulate a locus with a recessive and a dominant allele for it. Although this is not precisely the case for the non-agouti trait, it is almost, and we do likewise. Later we will consider their interaction.
24
24 Probabilities of albino-marker genotypes (× 4) Recall that the NOD mouse (A) is homozygous for the albino allele, while the C57/BL6 (B) is homozygous for the non-albino allele. We can collapse an earlier table to get:
Colour \ M    A                H                  B
Albino        $(1-r)^2$        $2r(1-r)$          $r^2$
Full color    $1-(1-r)^2$      $2-2r(1-r)$        $1-r^2$
Here r is the recombination fraction between a marker and the albino locus.
25
25 Segregation data at the marker closest to Tyr c Phenotype by genotype at D7Mit126 @ 50 cM (the Tyr c locus is at 44 cM)
          A    B    H
Agouti    3   19   47
Black     0   10   19
White    21    0    1
26
26 Mapping the albino locus Plot of LOD score at each marker along the genome
27
27 Chromosome 7 genotypes for the albino mice. Pale blue shading is conserved NOD haplotype. D7Mit128 is near the Tyr c locus, A: homozygous NOD, B: homozygous B6, H: heterozygote. Genotypes are read down.
28
28 Honesty in advertising, and LOD thresholds There is more material in preparation here. Please revisit this space in a day or so.
29
29 Approximate probabilities of agouti-marker genotypes (× 4) Recall that the C57/BL6 (B) is homozygous for non-agouti, while the NOD (A) is homozygous agouti. Ignoring the 1/16 of the intercross who would exhibit the non-agouti trait (and be black) if they weren’t albino, we get the following approximate table, where 1/16 of the mice will be misclassified. Here r is the recombination fraction between a marker and the agouti locus.
Colour \ M    A            H                B
Non-black     $1-r^2$      $2-2r(1-r)$      $1-(1-r)^2$
Black         $r^2$        $2r(1-r)$        $(1-r)^2$
30
30 Segregation data at the marker closest to the agouti locus Phenotype by genotype at D2Mit48 @ 87 cM (agouti locus is at 89 cM)
          A    B    H
Agouti   24    2   46
Black     0   28    1
White     5    6   14
31
31 Mapping the agouti locus Plot of LOD score at each marker along the genome
32
32 Chromosome 2 genotypes for the black progeny. Mauve shading indicates conserved C57/BL6 haplotype. Marker D2Mit48 is very close to the agouti locus.
33
33 Conclusion: single locus mapping
agouti locus (A, a alleles) on Chr 2 at 89.9 cM
albino locus (C, c alleles) on Chr 7 at 44 cM (now known as Tyr c gene)
In the data set:
–at 89 cM on Chr 2 with a LOD score > 20, Marker D2M48 (8th marker on Chr 2)
–at 43 cM on Chr 7 with a LOD score > 20, Marker D7M126 (4th marker on Chr 7)
The method worked for agouti, even though 1/16th of the mice were misclassified
34
34 Acknowledgement These last 3 lectures would not have been possible without the very substantial input of Melanie Bahlo and Tom Brodnicki of the Walter & Eliza Hall Institute of Medical Research, Melbourne, Australia. Tom (together with people from the WEHI mouse facility) carried out the cross and did all the phenotyping, while Melanie did all the data analysis presented and contributed a lot to the presentation. Overall, responsibility for the presentation (especially all the errors!) remains mine.
35
35 General exercise Go through the last 3 lectures and redo as many of the calculations as you can for the case of a backcross rather than an intercross. You will find it all simpler, and in every case closed form expressions appear where we needed iterative methods for the intercross.
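As a starting point for the exercise, here is a sketch of the 2-locus backcross case, where each mouse shows directly whether the gamete from the F1 parent is recombinant. The estimate is then a simple proportion, truncated at 1/2, and the LOD has a closed form. The counts used are hypothetical.

```python
import math

def backcross_lod(n_recombinant, n_total):
    """Closed-form MLE and LOD for two markers in a backcross.

    r_hat = k / n (capped at 1/2), and
    LOD = k log10(2 r_hat) + (n - k) log10(2 (1 - r_hat)).
    """
    k, n = n_recombinant, n_total
    r_hat = min(k / n, 0.5)
    lod = 0.0
    if k > 0:
        lod += k * math.log10(2 * r_hat)
    if n - k > 0:
        lod += (n - k) * math.log10(2 * (1 - r_hat))
    return r_hat, lod

# Hypothetical backcross: 10 recombinants among 100 mice.
r_hat, lod = backcross_lod(10, 100)
```

No grid search or iteration is needed here, in contrast to the intercross, where heterozygotes at both loci hide the number of recombinant gametes.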