The genomes of recombinant inbred lines

Slides:



Advertisements
Similar presentations
The genetic dissection of complex traits
Advertisements

Tutorial #2 by Ma’ayan Fishelson. Crossing Over Sometimes in meiosis, homologous chromosomes exchange parts in a process called crossing-over. New combinations.
Basics of Linkage Analysis
Meiosis and Genetic Variation
QTL Mapping R. M. Sundaram.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
Karl W Broman Department of Biostatistics Johns Hopkins University Gene mapping in model organisms.
Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.
Genetic Mapping Oregon Wolfe Barley Map (Szucs et al., The Plant Genome 2, )
Andrew Thomson on Generalised Estimating Equations (and simulation studies)
A Tale of Two Fishes Delving into genetic inheritance Continue here.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
The genomes of recombinant inbred lines
8 and 11 April, 2005 Chapter 17 Population Genetics Genes in natural populations.
Finding a gene based on phenotype Model organisms ’s of DNA markers mapped onto each chromosome – high density linkage map. 2. identify markers linked.
Lecture 17: Model-Free Linkage Analysis Date: 10/17/02  IBD and IBS  IBD and linkage  Fully Informative Sib Pair Analysis  Sib Pair Analysis with Missing.
Sequences Lecture 11. L62 Sequences Sequences are a way of ordering lists of objects. Java arrays are a type of sequence of finite size. Usually, mathematical.
Lecture 6 Genetic drift & Mutation Sonja Kujala
Genetic Algorithms 11-Oct-17.
Year 9 Mathematics Algebra and Sequences
Introduction to Genetic Algorithms
Ch 11.6: Series of Orthogonal Functions: Mean Convergence
Mendelian genetics in Humans: Autosomal and Sex- linked patterns of inheritance Obviously examining inheritance patterns of specific traits in humans.
Do we really need theory?
The Stream Model Sliding Windows Counting 1’s
The genomes of recombinant inbred lines
3. Genetic Drift The change in allele frequencies as a result of chance processes. Directly related to the population numbers. These changes are much more.
upstream vs. ORF binding and gene expression?
Genetic Variation Notes
Quantitative traits Lecture 13 By Ms. Shumaila Azam
Intro to Punnett Squares
Objective: I Can……. A) explain the differences between dominant and recessive traits. B) explain the differences between phenotypes and.
Modern Synthesis concepts from Laboratory Genetics
Gene mapping in mice Karl W Broman Department of Biostatistics
Dihybrid Crosses and Linked Genes
Recombination (Crossing Over)
The genomes of recombinant inbred lines
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Bellwork: What indicates that a population is evolving
Heredity Chapter 5.
Hidden Markov Models Part 2: Algorithms
The breakpoint process on RI chromosomes
The breakpoint process on RI chromosomes
Statistical issues in QTL mapping in mice
Conclusions of Hardy-Weinberg Law
Mapping Quantitative Trait Loci
How to Solve Linkage Map Problems
The genomes of recombinant inbred lines
Genes and Variation EQ: How is the gene pool affected by selection pressure? Read the lesson title aloud to students.
11.2 Applying Mendel’s Principles
Solving Linear Equations
The F2 Generation  1. F2 Population Mean and Variance (p = q = 0.5) 
Genetic Algorithms 25-Feb-19.
Sequential Steps in Genome Mapping
DIHYBRID CROSSES & GENE LINKAGE
A Gentle introduction Richard P. Simpson
Pedigree Analysis.
Lecture 9: QTL Mapping II: Outbred Populations
Super computing for “classical” genetics
The genomes of recombinant inbred lines
Chapter 7 Beyond alleles: Quantitative Genetics
Genetic Algorithms 26-Apr-19.
Biologically Inspired Computing: Operators for Evolutionary Algorithms
EGR 2131 Unit 12 Synchronous Sequential Circuits
The Inheritance of Spore Color
Lesson Overview 17.1 Genes and Variation.
Modern Synthesis concepts from Laboratory Genetics
A population shares a common gene pool.
Population Genetics: The Hardy-Weinberg Law
Presentation transcript:

The genomes of recombinant inbred lines Obsession of mine from a couple of years ago. Not of any practical use to any of you, but I hope you will find it interesting or at least entertaining. I’m in a school of public health, and so my underlying goal is to improve human health.

Inbred mice Inbred mice: - result of many generations of sibling mating - genetically identical - two genomes identical - reproducible Advantages of mice: cheap, inbred lines, controlled env’t, interventions, proof of cause (e.g. KO)

C57BL/6 To do genetics, you need a second strain that differs from the first. We look at some aspect of how this inbred strain differs from the previous one. (e.g., coat color, or cuteness, or survival time following bacterial infection) Can’t learn anything by just studying at these two strains; we do a cross.

The intercross Explain how to do mapping in intercross Advantages: - simple and easy - extensible Disadvantages: - must genotype each individual - can only do one nasty thing to each mouse

Recombinant inbred lines (by sibling mating) Mapping here just like with intercross Advantages: - reduce individual variation by phenotyping multiple ind’ls from each strain - multiple invasive phenotypes - longitudinal data - greater density of breakpoints Disadv: - time, cost - inadequate panels available - just two alleles, only homozygotes

The RIX design Create heterozygotes; no further genotyping needed

The Collaborative Cross 8-way recombinant inbred lines Hope to obtain 1000 such lines (with initial crosses in varied orders) A boon for mouse geneticists Complex Trait Consortium (2004) Nat Genet 36:1133-1137

Genome of an 8-way RI

The goal (for the rest of this talk) Characterize the breakpoint process along a chromosome in 8-way RILs. Understand the two-point haplotype probabilities. Study the clustering of the breakpoints, as a function of crossover interference in meiosis. Why? We’ll need to reconstruct the pattern of colors along each chromosome in each 8-way RIL. Our data is likely to be genotypes at diallelic markers (such as SNPs).

2 points in an RIL 1 2 r = recombination fraction = probability of a recombination in the interval in a random meiotic product. R = analogous thing for the RIL = probability of different alleles at the two loci on a random RIL chromosome. Start with the simplest problem. One idea I had: If you are A at one point, you’re more (or less) like to be B at the next point than to be H.

Haldane & Waddington 1931 Genetics 16:357-374 Haldane and Waddington solved this for 2-way RILs in 1931. The results are well known to those working with RILs, but the manuscript has not been widely read. You’ll see why in a moment.

Recombinant inbred lines (by selfing) Start with the simplest possible case: selfing. Not possible in mice, but can be done in some plants: Brush the pollen from a plant back onto the female parts of the same plant.

Pr(Xn+1 | X0, X1, …, Xn) = Pr(Xn+1 | Xn) Markov chain Sequence of random variables {X0, X1, X2, …} satisfying Pr(Xn+1 | X0, X1, …, Xn) = Pr(Xn+1 | Xn) Transition probabilities Pij = Pr(Xn+1=j | Xn=i) Here, Xn = “parental type” at generation n We are interested in absorption probabilities Pr(Xn  j | X0)

Equations for selfing 24=16 possible states. Reduce to 10 states by the usual symmetries. Reduce to 5 states by some further symmetries: - switch orientation of chromosome - switch A’s and B’s Here, they give the transition matrix of the Markov chain. Note that the result indicates a 2x map expansion.

Absorption probabilities Let Pij = Pr(Xn+1 = j | Xn = i) where Xn = state at generation n. Consider the case of absorption into the state AA|AA. Let hi = probability, starting at i, eventually absorbed into AA|AA. Then hAA|AA = 1 and hAB|AB = 0. Condition on the first step: hi = ∑k Pik hk For selfing, this gives a system of 3 linear equations.

Recombinant inbred lines (by sibling mating) Now we need to follow 4 chromosomes at each generation. 2 colors at each of 2 points on each of 4 chromosomes gives 2^8 =256 states

Equations for sib-mating Reduce the 256 states to 55 by the obvious symmetries Then to 22 states by switching A’s and B’s and the orientation of the chromosome. Here they give the transition matrix of the Markov chain. A footnote on the first page of the manuscript thanks a funding source for the mathematical typesetting of the paper.

Result for sib-mating Relatively famous equation. Given the previous slide, you may be a surprise to learn that tedious algebra was omitted.

The “Collaborative Cross” We now wish to do the same for the 8-way RILs. 8 possible colors a each of 2 points on each of 4 chromosomes 8^8, which is quite a bit bigger than 2^8

8-way RILs Autosomes X chromosome Pr(G1 = i) = 1/8 Pr(G2 = j | G1 = i) = r / (1+6r) for i  j Pr(G2  G1) = 7r / (1+6r) X chromosome Pr(G1=A) = Pr(G1=B) = Pr(G1=E) = Pr(G1=F) =1/6 Pr(G1=C) = 1/3 Pr(G2=B | G1=A) = r / (1+4r) Pr(G2=C | G1=A) = 2r / (1+4r) Pr(G2=A | G1=C) = r / (1+4r) Pr(G2  G1) = (14/3) r / (1+4r) And here’s the answer. Note the 7x map expansion. Note that the 2-point transition matrix has all off-diagonal elements the same: If you’re A at one point, and you’re to switch to something else, each of the other colors are equally likely. The X chromosome is a bit more complicated (explain on the next slide).

The X chromosome At generation G_2, one intact C chromosome plus recombinant AB and EF chromosomes. So you get 1/3 C and 1/6 each of A, B, E, F. No D, G, or H on X chromosome. The Y chromosome comes from the H strain.

Computer simulations Here’s how I got the results: - simulate the process for various values of r - results form a smooth curve. H&W had 2r/(1+2r) and 4r/(1+6r). Maybe our result is a r / (1 + b r). Use non-linear regression. Rather dissatisfying. But with a combination of perl and mathematica, I’ve since proven these results symbolically.

r  R, autosomes Selfing Sib-mating 2-way 2r / (1+2r) 4r / (1+6r) 2n-way 1 – (1 – r)n-1 / (1+2r) 1 – (1 – r)n-2 / (1+6r) Map expansion for 2^n way: Selfing: (n+1) for n ≥ 1 Sib-mating: (n+4) for n ≥ 2

3-point coincidence 1 3 2 rij = recombination fraction for interval i,j; assume r12 = r23 = r Coincidence = c = Pr(double recombinant) / r2 = Pr(rec’n in 23 | rec’n in 12) / Pr(rec’n in 23) No interference  = 1 Positive interference  < 1 Negative interference  > 1 Generally c is a function of r. But by that point, I was obsessed with the problem, and had to go on to the 3-point case. Crossover interference and the 3-point coincidence.

3-points in 2-way RILs 1 3 2 r13 = 2 r (1 – c r) R = f(r); R13 = f(r13) Pr(double recombinant in RIL) = { R + R – R13 } / 2 Coincidence (in 2-way RIL) = { 2 R – R13 } / { 2 R2 } This is included at the end of Haldane & Waddington (1931) (though I’m sure few readers got to this part; I read it only after rediscovering the trick). With no extra effort, can get the 3-pt coincidence.

Coincidence No interference Note these are entirely above 1, which indicates clustering of breakpoints: If there’s a recombination event in the first interval, recombination in the 2nd interval is more likely. Related to the banding pattern on RIL chromosomes (see slide 5).

Coincidence Even in the case of very strong crossover interference, RILs by sib mating show a clustering of breakpoints.

Why the clustering of breakpoints? The really close breakpoints occur in different generations. Breakpoints in later generations can occur only in regions that are not yet fixed. The regions of heterozygosity are, of course, surrounded by breakpoints.

Coincidence in 8-way RILs The trick that allowed us to get the coincidence for 2-way RILs doesn’t work for 8-way RILs. It’s sufficient to consider 4-way RILs. Calculations for 3 points in 4-way RILs is still astoundingly complex. 2 points in 2-way RILs by sib-mating: 55 parental types  22 states by symmetry 3 points in 4-way RILs by sib-mating: 2,164,240 parental types  137,488 states Even counting the states was difficult. Getting the numbers 2 million & 137k required 1.5 days of computer time.

Coincidence But I did it. Some clustering, but relatively close to 1, especially with high crossover interference. I like smooth curves, so the two green curves each have 250 points. Each point was about 1.5 days of computer time. So that’s two years of computer time. (But done in 3 months using 8 nodes of our Linux cluster.) (I did crash our server twice while trying to find faster approaches, due to some errors of multiple orders of magnitude in my estimates of memory and disk usage. It turns out that the number of files allowed in /tmp is a not-too-large finite number.)

3-point symmetry The 3-point results give us the full distribution of haplotypes, and so allow a further investigation. Recall that if you’re A at one point and you switch to something else at a 2nd point, each of the other 8 colors is equally likely. Suppose the first and third markers are both A, and the middle marker is not A. What’s the chance for each of the other possibilities? Clear assymetry: Tends not to be B; somewhat more likely to be E,F,G,H.

Markov property A more detailed investigation of whether the process is Markov: If you’re A at the middle point, how does information about the first point affect the chance that you’re A at the third point? If a Markov chain, this would be flat at 0.

Markov property If you’re B at the middle point, how does information about the first point affect The chance that you’re A at the third point. If high level of interference, very unlikely to go A-B-A.

Markov property Somewhat more likely to go A-C and then back to A. Unlikely to go B-C and then to A.

Markov property More likely to go A-E and then back to A. Unlikely to go B-C and then to A. Basic results: tend not to see A-B-A; Seem to get E in the midst of a segment of A rather often. (See slide 8.)

Whole genome simulations 2-way selfing, 2-way sib-mating, 8-way sib-mating Mouse-like genome, 1665 cM Strong positive crossover interference Inbreed to complete fixation 10,000 simulation replicates Some things are not amenable to analytic analysis: need simulations how many generations to fixation? How dense will markers need to be?

No. generations to fixation This was a bit of a surprise. Usual answer is 8 generations for selfing and 20 for sib-mating. Need 3 more for 8-way by sib-mating.

No. gen’s to 99% fixation This is probably where the 8 and 20 came from.

Percent genome not fixed

Number of breakpoints Number of breakpoints = 2x, 4x, 7x the genome size.

Segment lengths Spikes = whole chromosomes inherited intact.

Probability a segment is inherited intact

Length of smallest segment 95 % of the time you have a segment that is < 1/4 cM.

No. segments < 1 cM Thus we need very dense genotype data to discover all possible segments.

Summary The Collaborative Cross could provide “one-stop shopping” for gene mapping in the mouse. Use of such 8-way RILs requires an understanding of the breakpoint process. We’ve extended Haldane & Waddington’s results to the case of 8-way RILs: R = 7 r / (1 + 6 r). We’ve shown clustering of breakpoints in RILs by sib- mating, even in the presence of strong crossover interference. Broman KW (2005) The genomes of recombinant inbred lines. Genetics 169:1133-1146