Karl W Broman Department of Biostatistics Johns Hopkins University Gene mapping in model organisms.


2 Goal Identify genes that contribute to common human diseases.

3 Inbred mice

4 Advantages of the mouse
Small and cheap
Inbred lines
Large, controlled crosses
Experimental interventions
Knock-outs and knock-ins

5 The mouse as a model
Same genes?
– The genes involved in a phenotype in the mouse may also be involved in similar phenotypes in the human.
Similar complexity?
– The complexity of the etiology underlying a mouse phenotype provides some indication of the complexity of similar human phenotypes.
Transfer of statistical methods.
– The statistical methods developed for gene mapping in the mouse serve as a basis for similar methods applicable in direct human studies.

6 The intercross

7 The data
Phenotypes, $y_i$
Genotypes, $x_{ij}$ = AA/AB/BB, at genetic markers
A genetic map, giving the locations of the markers.

8 Phenotypes
133 females, (NOD × B6) × (NOD × B6)

9 NOD

10 C57BL/6

11 Agouti coat

12 Genetic map

13 Genotype data

14 Goals
Identify genomic regions (QTLs) that contribute to variation in the trait.
Obtain interval estimates of the QTL locations.
Estimate the effects of the QTLs.

15 Statistical structure
Missing data: markers ↔ QTL
Model selection: genotypes → phenotype

16 Models: recombination
No crossover interference
– Locations of breakpoints according to a Poisson process.
– Genotypes along chromosome follow a Markov chain.
Clearly wrong, but super convenient.
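The no-interference model can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the slides: it assumes the Haldane map function (which follows from Poisson-distributed crossovers), and the function names and marker positions are hypothetical.

```python
import math
import random

def haldane_rec_frac(d_cm):
    """Recombination fraction for a map distance in centiMorgans (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def simulate_gamete(marker_pos_cm, rng):
    """Simulate one meiotic product of an F1 (A/B) parent at the given marker positions.

    With no interference the allele sequence is a two-state Markov chain:
    between adjacent markers it switches with probability r, the recombination fraction.
    """
    allele = rng.choice("AB")
    alleles = [allele]
    for left, right in zip(marker_pos_cm, marker_pos_cm[1:]):
        if rng.random() < haldane_rec_frac(right - left):
            allele = "B" if allele == "A" else "A"
        alleles.append(allele)
    return alleles

def simulate_intercross_genotypes(marker_pos_cm, rng):
    """Combine two independent F1 gametes into intercross genotypes (AA/AB/BB)."""
    g1 = simulate_gamete(marker_pos_cm, rng)
    g2 = simulate_gamete(marker_pos_cm, rng)
    return ["".join(sorted(a + b)) for a, b in zip(g1, g2)]

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical marker positions, in cM, on one chromosome.
    print(simulate_intercross_genotypes([0, 10, 20, 30, 40], rng))
```

Closely spaced markers rarely switch state, so simulated genotypes come out in long runs, which is the Markov-chain structure the slide refers to.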

17 Models: gen → phe
Phenotype = $y$, whole-genome genotype = $g$
Imagine that $p$ sites are all that matter.
$E(y \mid g) = \mu(g_1, \dots, g_p)$, $\mathrm{SD}(y \mid g) = \sigma(g_1, \dots, g_p)$
Simplifying assumptions:
– $\mathrm{SD}(y \mid g) = \sigma$, independent of $g$
– $y \mid g \sim \mathrm{normal}(\mu(g_1, \dots, g_p), \sigma)$
– $\mu(g_1, \dots, g_p) = \mu + \sum_j \left( \alpha_j 1\{g_j = AB\} + \beta_j 1\{g_j = BB\} \right)$
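As a rough sketch of the simplified model (normal phenotype, constant SD, additive effects across sites), the following Python computes the expected phenotype and draws a simulated one. The effect names `alpha` (for AB) and `beta` (for BB) and all numeric values are assumptions made for illustration.

```python
import random

def expected_phenotype(genotypes, mu, alpha, beta):
    """Mean phenotype: baseline mu plus alpha[j] if site j is AB, or beta[j] if it is BB."""
    total = mu
    for g, a, b in zip(genotypes, alpha, beta):
        if g == "AB":
            total += a
        elif g == "BB":
            total += b
    return total

def simulate_phenotype(genotypes, mu, alpha, beta, sigma, rng):
    """Draw y | g from a normal distribution whose SD does not depend on g."""
    return rng.gauss(expected_phenotype(genotypes, mu, alpha, beta), sigma)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical effects at p = 3 sites.
    print(simulate_phenotype(["AA", "AB", "BB"], 10.0, [1, 2, 3], [4, 5, 6], 1.0, rng))
```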

18 Before you do anything…
Check data quality
Genetic markers on the correct chromosomes
Markers in the correct order
Identify and resolve likely errors in the genotype data

19 The simplest method
“Marker regression”
Consider a single marker
Split mice into groups according to their genotype at a marker
Do an ANOVA (or t-test)
Repeat for each marker
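The steps above can be sketched as a one-way ANOVA at each marker. This is an illustrative pure-Python version, not the code behind the slides: the function names are hypothetical, missing genotypes are assumed to be coded as `None`, and the LOD conversion assumes the normal model, for which LOD = (n/2) log10(RSS0/RSS1).

```python
import math

def marker_anova(phenotypes, genotypes):
    """Return (F statistic, LOD) for a single marker.

    Individuals with a missing genotype (None) are dropped.
    Assumes at least two genotype groups and a nonzero within-group sum of squares.
    """
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        if g is not None:
            groups.setdefault(g, []).append(y)
    n = sum(len(v) for v in groups.values())
    k = len(groups)
    grand_mean = sum(sum(v) for v in groups.values()) / n
    # RSS under "no QTL" (one common mean) and under "marker effect" (one mean per genotype).
    rss0 = sum((y - grand_mean) ** 2 for v in groups.values() for y in v)
    rss1 = sum((y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v)
    f_stat = ((rss0 - rss1) / (k - 1)) / (rss1 / (n - k))
    lod = (n / 2.0) * math.log10(rss0 / rss1)
    return f_stat, lod

def marker_regression_scan(phenotypes, genotypes_by_marker):
    """Repeat the single-marker ANOVA for each marker in turn."""
    return [marker_anova(phenotypes, g) for g in genotypes_by_marker]

if __name__ == "__main__":
    y = [1, 2, 3, 4, 5, 6]
    marker = ["AA", "AA", "AB", "AB", "BB", "BB"]
    print(marker_anova(y, marker))
```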

20 Marker regression
Advantages
+ Simple
+ Easily incorporates covariates
+ Easily extended to more complex models
Disadvantages
– Must exclude individuals with missing genotype data
– Imperfect information about QTL location
– Suffers in low density scans
– Only considers one QTL at a time

21 Interval mapping
Lander and Botstein 1989
Imagine that there is a single QTL, at position $z$.
Let $q_i$ = genotype of mouse $i$ at the QTL, and assume $y_i \mid q_i \sim \mathrm{normal}(\mu(q_i), \sigma)$
We won’t know $q_i$, but we can calculate (by an HMM) $p_{ig} = \Pr(q_i = g \mid \text{marker data})$
$y_i$, given the marker data, follows a mixture of normal distributions with known mixing proportions (the $p_{ig}$).
Use an EM algorithm to get MLEs of $\theta = (\mu_{AA}, \mu_{AB}, \mu_{BB}, \sigma)$.
Measure the evidence for a QTL via the LOD score, which is the $\log_{10}$ likelihood ratio comparing the hypothesis of a single QTL at position $z$ to the hypothesis of no QTL anywhere.
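A minimal sketch of the EM step at one putative QTL position follows. It assumes the genotype probabilities have already been computed elsewhere (for example by an HMM) and are passed in as one (AA, AB, BB) triple per mouse; the function names, starting values, and stopping rule are illustrative choices, not the procedure of any particular package.

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log10_lik(y, probs, mu, sigma):
    """log10 likelihood of the normal mixture with known mixing proportions probs[i][g]."""
    return sum(
        math.log10(sum(p[g] * normal_pdf(yi, mu[g], sigma) for g in range(3)))
        for yi, p in zip(y, probs)
    )

def interval_mapping_lod(y, probs, n_iter=200, tol=1e-8):
    """EM for (mu_AA, mu_AB, mu_BB, sigma) at one position; returns (LOD, mu, sigma).

    Assumes the fitted sigma stays above zero.
    """
    n = len(y)
    mean0 = sum(y) / n
    sigma0 = math.sqrt(sum((yi - mean0) ** 2 for yi in y) / n)
    # Null model: no QTL anywhere, one common mean.
    null_ll = sum(math.log10(normal_pdf(yi, mean0, sigma0)) for yi in y)

    mu, sigma = [mean0] * 3, sigma0
    weights = probs  # start from the marker-based probabilities
    prev_ll = None
    for _ in range(n_iter):
        # M-step: weighted means and a pooled SD.
        for g in range(3):
            wsum = sum(w[g] for w in weights)
            if wsum > 0:
                mu[g] = sum(w[g] * yi for w, yi in zip(weights, y)) / wsum
        sigma = math.sqrt(
            sum(w[g] * (yi - mu[g]) ** 2 for w, yi in zip(weights, y) for g in range(3)) / n
        )
        ll = log10_lik(y, probs, mu, sigma)
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # E-step: posterior genotype probabilities given phenotype and marker data.
        weights = []
        for yi, p in zip(y, probs):
            dens = [p[g] * normal_pdf(yi, mu[g], sigma) for g in range(3)]
            total = sum(dens)
            weights.append([d / total for d in dens])
    return ll - null_ll, mu, sigma
```

When every genotype is known with certainty (each triple is 0/1), the mixture collapses and the LOD reduces to the single-marker ANOVA value, which gives a convenient sanity check.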

22 Interval mapping
Advantages
+ Takes proper account of missing data
+ Allows examination of positions between markers
+ Gives improved estimates of QTL effects
+ Provides pretty graphs
Disadvantages
– Increased computation time
– Requires specialized software
– Difficult to generalize
– Only considers one QTL at a time

23 LOD curves

24 LOD thresholds To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere. The 95th percentile of this distribution is used as a significance threshold. Such a threshold may be estimated via permutations (Churchill and Doerge 1994).

25 Permutation test
Shuffle the phenotypes relative to the genotypes.
Calculate M* = max LOD*, with the shuffled data.
Repeat many times.
LOD threshold = 95th percentile of M*
P-value = Pr(M* ≥ M)
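The permutation procedure can be sketched as follows. For brevity the genome scan here is a simple single-marker LOD, (n/2) log10(RSS0/RSS1), with complete genotype data; that choice, the function names, and the use of the empirical 95th percentile are assumptions for illustration, and any scan that returns a genome-wide maximum LOD could be substituted.

```python
import math
import random

def marker_lod(y, genotypes):
    """LOD for one marker under a normal model; assumes complete genotypes and RSS1 > 0."""
    groups = {}
    for yi, g in zip(y, genotypes):
        groups.setdefault(g, []).append(yi)
    n = len(y)
    mean = sum(y) / n
    rss0 = sum((yi - mean) ** 2 for yi in y)
    rss1 = sum((yi - sum(v) / len(v)) ** 2 for v in groups.values() for yi in v)
    return (n / 2.0) * math.log10(rss0 / rss1)

def max_lod(y, genotypes_by_marker):
    """Genome-wide maximum LOD, M."""
    return max(marker_lod(y, g) for g in genotypes_by_marker)

def permutation_test(y, genotypes_by_marker, n_perm=1000, seed=0):
    """Return (observed M, 95% LOD threshold, genome-wide p-value)."""
    rng = random.Random(seed)
    observed = max_lod(y, genotypes_by_marker)
    shuffled = list(y)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the phenotype-genotype link, keep marker structure
        null_max.append(max_lod(shuffled, genotypes_by_marker))
    null_max.sort()
    threshold = null_max[min(n_perm - 1, math.ceil(0.95 * n_perm) - 1)]
    p_value = sum(m >= observed for m in null_max) / n_perm
    return observed, threshold, p_value
```

Shuffling whole phenotype vectors, rather than individual markers, preserves the correlation among linked markers, so the null distribution of M* reflects the actual marker density.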

26 Permutation distribution

27 Chr 9 and 11

28 Epistasis

29 Going after multiple QTLs
Greater ability to detect QTLs.
Separate linked QTLs.
Learn about interactions between QTLs (epistasis).

30 Multiple QTL mapping
Simplistic but illustrative situation:
– No missing genotype data
– Dense markers (so ignore positions between markers)
– No gene-gene interactions
Which $\beta_j \neq 0$? ⇒ Model selection in regression

31 Model selection
Choose a class of models
– Additive; pairwise interactions; regression trees
Fit a model (allow for missing genotype data)
– Linear regression; ML via EM; Bayes via MCMC
Search model space
– Forward/backward/stepwise selection; MCMC
Compare models
– $\mathrm{BIC}_\delta(\gamma) = \log L(\gamma) + (\delta/2)\,|\gamma| \log n$
Miss important loci ↔ include extraneous loci.
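One way to make the search-and-compare steps concrete is forward selection with a BIC-type penalty. The sketch below is an assumption-laden illustration: it uses a residual-sum-of-squares form of the criterion, log RSS + |model| · delta · log(n) / n, to be minimized (delta = 1 corresponds to the usual BIC for a normal linear model), with complete data, numeric covariates, and no collinearity. The helper names are hypothetical.

```python
import math

def rss_least_squares(columns, y):
    """RSS from regressing y on an intercept plus the given covariate columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations A b = c, solved by Gaussian elimination with partial pivoting.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][a] * b[a] for a in range(p))) ** 2 for i in range(n))

def bic_delta(rss, n_terms, n, delta):
    """Penalized criterion (smaller is better); delta scales the penalty per term."""
    return math.log(rss) + n_terms * delta * math.log(n) / n

def forward_selection(covariates, y, delta=1.0):
    """Add one covariate at a time while the criterion keeps decreasing."""
    n = len(y)
    chosen = []
    best = bic_delta(rss_least_squares([], y), 0, n, delta)
    while len(chosen) < len(covariates):
        scored = [
            (bic_delta(rss_least_squares([covariates[k] for k in chosen + [j]], y),
                       len(chosen) + 1, n, delta), j)
            for j in range(len(covariates)) if j not in chosen
        ]
        score, j = min(scored)
        if score >= best:
            break
        best = score
        chosen.append(j)
    return chosen, best
```

A larger delta penalizes each added locus more heavily, trading a higher chance of missing important loci for a lower chance of including extraneous ones.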

32 Special features
Relationship among the covariates
Missing covariate information
Identify the key players vs. minimize prediction error

33 Opportunities for improvements
Each individual is unique.
– Must genotype each mouse.
– Unable to obtain multiple invasive phenotypes (e.g., in multiple environmental conditions) on the same genotype.
Relatively low mapping precision.
⇒ Design a set of inbred mouse strains.
– Genotype once.
– Study multiple phenotypes on the same genotype.

34 Recombinant inbred lines

35 AXB/BXA panel

36 AXB/BXA panel

37 LOD curves

38 Chr 7 and 19

39 Pairwise recombination fractions
Upper-tri: rec. fracs.
Lower-tri: lik. ratios
Red = association
Blue = no association

40 RI lines
Advantages
Each strain is an eternal resource.
– Only need to genotype once.
– Reduce individual variation by phenotyping multiple individuals from each strain.
– Study multiple phenotypes on the same genotype.
Greater mapping precision.
Disadvantages
Time and expense.
Available panels are generally too small (10–30 lines).
Can learn only about 2 particular alleles.
All individuals homozygous.

41 The RIX design

42 The “Collaborative Cross”

43 Genome of an 8-way RI

44 The “Collaborative Cross”
Advantages
Great mapping precision.
Eternal resource.
– Genotype only once.
– Study multiple invasive phenotypes on the same genotype.
Barriers
Advantages not widely appreciated.
– Ask one question at a time, or ask many questions at once?
Time.
Expense.
Requires large-scale collaboration.

45 To be worked out
Breakpoint process along an 8-way RI chromosome.
Reconstruction of genotypes given multipoint marker data.
QTL analyses.
– Mixed models, with random effects for strains and genotypes/alleles.
Power and precision (relative to an intercross).

46 Haldane & Waddington 1931
$r$ = recombination fraction per meiosis between two loci
Autosomes
– $\Pr(G_1 = AA) = \Pr(G_1 = BB) = 1/2$
– $\Pr(G_2 = BB \mid G_1 = AA) = \Pr(G_2 = AA \mid G_1 = BB) = 4r / (1 + 6r)$
X chromosome
– $\Pr(G_1 = AA) = 2/3$, $\Pr(G_1 = BB) = 1/3$
– $\Pr(G_2 = BB \mid G_1 = AA) = 2r / (1 + 4r)$
– $\Pr(G_2 = AA \mid G_1 = BB) = 4r / (1 + 4r)$
– $\Pr(G_2 \neq G_1) = (8/3)\, r / (1 + 4r)$
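These formulas are easy to check numerically. The sketch below (function names are illustrative) evaluates the two-way RIL recombination fractions, confirms that the X-chromosome total is the weighted sum of the two conditional probabilities, and inverts the autosomal relation algebraically, R = 4r/(1+6r) giving r = R/(4−6R), which is how a per-meiosis recombination fraction would be recovered from one observed in RI lines.

```python
def ril_rec_frac_autosome(r):
    """Pr(G2 != G1) on an autosome of a two-way RIL: 4r / (1 + 6r)."""
    return 4.0 * r / (1.0 + 6.0 * r)

def ril_rec_frac_x(r):
    """Pr(G2 != G1) on the X chromosome of a two-way RIL: (8/3) r / (1 + 4r)."""
    return (8.0 / 3.0) * r / (1.0 + 4.0 * r)

def ril_rec_frac_x_from_conditionals(r):
    """Same quantity, built from Pr(G1) and the two conditional probabilities."""
    p_aa, p_bb = 2.0 / 3.0, 1.0 / 3.0
    bb_given_aa = 2.0 * r / (1.0 + 4.0 * r)
    aa_given_bb = 4.0 * r / (1.0 + 4.0 * r)
    return p_aa * bb_given_aa + p_bb * aa_given_bb

def meiotic_rec_frac_from_ril_autosome(R):
    """Invert R = 4r / (1 + 6r) to get r = R / (4 - 6R)."""
    return R / (4.0 - 6.0 * R)

if __name__ == "__main__":
    for r in (0.01, 0.1, 0.5):
        print(r, ril_rec_frac_autosome(r), ril_rec_frac_x(r))
```

At r = 1/2 (unlinked loci) the autosomal value is 1/2 and the X-chromosome value is 4/9 = 2 · (2/3) · (1/3), as expected for independent draws from the marginal genotype frequencies.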

47 8-way RILs
Autosomes
– $\Pr(G_1 = i) = 1/8$
– $\Pr(G_2 = j \mid G_1 = i) = r / (1 + 6r)$ for $i \neq j$
– $\Pr(G_2 \neq G_1) = 7r / (1 + 6r)$
X chromosome
– $\Pr(G_1 = AA) = \Pr(G_1 = BB) = \Pr(G_1 = EE) = \Pr(G_1 = FF) = 1/6$
– $\Pr(G_1 = CC) = 1/3$
– $\Pr(G_2 = AA \mid G_1 = CC) = r / (1 + 4r)$
– $\Pr(G_2 = CC \mid G_1 = AA) = 2r / (1 + 4r)$
– $\Pr(G_2 = BB \mid G_1 = AA) = r / (1 + 4r)$
– $\Pr(G_2 \neq G_1) = (14/3)\, r / (1 + 4r)$
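As a small consistency check on these expressions, the sketch below builds the autosomal two-locus transition matrix for an 8-way RIL and evaluates the X-chromosome total at r = 1/2. Only quantities stated on the slide are used; the function names are illustrative, and the full X-chromosome transition matrix is not reconstructed because the slide lists only some of its entries.

```python
def eight_way_autosome_transition(r):
    """8x8 matrix of Pr(G2 = j | G1 = i) on an autosome of an 8-way RIL."""
    off = r / (1.0 + 6.0 * r)   # switch to one particular other founder
    stay = 1.0 - 7.0 * off      # equals (1 - r) / (1 + 6r)
    return [[stay if i == j else off for j in range(8)] for i in range(8)]

def eight_way_x_rec_frac(r):
    """Pr(G2 != G1) on the X chromosome of an 8-way RIL: (14/3) r / (1 + 4r)."""
    return (14.0 / 3.0) * r / (1.0 + 4.0 * r)

# Marginal X-chromosome genotype probabilities as listed on the slide.
X_MARGINALS = {"AA": 1 / 6, "BB": 1 / 6, "CC": 1 / 3, "EE": 1 / 6, "FF": 1 / 6}

if __name__ == "__main__":
    print(eight_way_autosome_transition(0.5)[0])  # every entry 1/8 for unlinked loci
    print(eight_way_x_rec_frac(0.5), 1 - sum(p * p for p in X_MARGINALS.values()))
```

For unlinked loci the two genotypes are independent, so Pr(G2 ≠ G1) should equal one minus the sum of squared marginal probabilities; on the X chromosome that is 1 − (4/36 + 1/9) = 7/9, which matches (14/3) · (1/2) / 3.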

48 Areas for research
Model selection procedures for QTL mapping
Gene expression microarrays + QTL mapping
Combining multiple crosses
Association analysis: mapping across mouse strains
Analysis of multi-way recombinant inbred lines

49 References
Broman KW (2001) Review of statistical methods for QTL mapping in experimental crosses. Lab Animal 30:44–52
Jansen RC (2001) Quantitative trait loci in inbred lines. In Balding DJ et al., Handbook of statistical genetics, Wiley, New York, pp 567–597
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199
Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963–971
Kruglyak L, Lander ES (1995) A nonparametric approach for mapping quantitative trait loci. Genetics 139:
Broman KW (2003) Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163:1169–1175
Miller AJ (2002) Subset selection in regression, 2nd edition. Chapman & Hall, New York

50 More references
Broman KW, Speed TP (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J R Statist Soc B 64:
Zeng Z-B, Kao C-H, Basten CJ (1999) Estimating the genetic architecture of quantitative traits. Genet Res 74:
Mott R, Talbot CJ, Turri MG, Collins AC, Flint J (2000) A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci U S A 97:
Mott R, Flint J (2002) Simultaneous detection and fine mapping of quantitative trait loci in mice using heterogeneous stocks. Genetics 160:
The Complex Trait Consortium (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nature Genetics 36:
Broman KW. The genomes of recombinant inbred lines. Genetics, in press

51 Software
R/qtl
Mapmaker/QTL
Mapmanager QTX
QTL Cartographer
Multimapper