QTL Mapping
Quantitative trait loci (QTL): chromosomal segments that contribute to variation in a quantitative phenotype.


[Figure: maize, teosinte, and tb-1/tb-1 mutant maize]

Mapping Quantitative Trait Loci (QTL) in the F2 hybrids between maize and teosinte

Nature 432, (02 December 2004)
The role of barren stalk1 in the architecture of maize
ANDREA GALLAVOTTI (1,2), QIONG ZHAO (3), JUNKO KYOZUKA (4), ROBERT B. MEELEY (5), MATTHEW K. RITTER (1,*), JOHN F. DOEBLEY (3), M. ENRICO PÈ (2) & ROBERT J. SCHMIDT (1)
(1) Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, California, USA
(2) Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Milan, Italy
(3) Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706, USA
(4) Graduate School of Agriculture and Life Science, The University of Tokyo, Tokyo, Japan
(5) Crop Genetics Research, Pioneer-A DuPont Company, Johnston, Iowa 50131, USA
(*) Present address: Biological Sciences Department, California Polytechnic State University, San Luis Obispo, California 93407, USA

Effects of ba1 mutations on maize development
[Figure: mutant (no tassel) compared with wild type (tassel)]

A putative QTL affecting height in a backcross (BC)

| Sample | Height (cm, y) | QTL genotype |
|--------|----------------|--------------|
| 1 | 184 | Qq (1) |
| 2 | 185 | Qq (1) |
| 3 | 180 | Qq (1) |
| 4 | 182 | Qq (1) |
| 5 | 167 | qq (0) |
| 6 | 169 | qq (0) |
| 7 | 165 | qq (0) |
| 8 | 166 | qq (0) |

If the QTL genotypes are known for each sample, as in the table above, then a simple ANOVA can be used to test statistical significance.

Suppose a backcross design

Parents: QQ (P1) × qq (P2)
F1: Qq × qq (P2)
BC: Qq and qq

| | Qq | qq |
|---|---|---|
| Genetic effect | $a^*$ | 0 |
| Genotypic value | $\mu + a^*$ | $\mu$ |

QTL regression model

The phenotypic value of individual i affected by a QTL can be expressed as

$$y_i = \mu + a^* x^*_i + e_i,$$

where $\mu$ is the overall mean; $x^*_i$ is the indicator variable for the QTL genotype, defined as $x^*_i = 1$ for Qq and $x^*_i = 0$ for qq; $a^*$ is the "real" effect of the QTL; and $e_i$ is the residual error, $e_i \sim N(0, \sigma^2)$.

Because the QTL genotype is not observed, $x^*_i$ is missing.

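Data format for a backcross

| Sample | Height (cm, y) | Marker M1 | Marker M2 | QTL: Qq | QTL: qq |
|--------|----------------|-----------|-----------|---------|---------|
| 1 | 184 | Mm (1) | Nn (1) | ½ | ½ |
| 2 | 185 | Mm (1) | Nn (1) | ½ | ½ |
| 3 | 180 | Mm (1) | Nn (1) | ½ | ½ |
| 4 | 182 | Mm (1) | nn (0) | ½ | ½ |
| 5 | 167 | mm (0) | Nn (1) | ½ | ½ |
| 6 | 169 | mm (0) | nn (0) | ½ | ½ |
| 7 | 165 | mm (0) | nn (0) | ½ | ½ |
| 8 | 166 | mm (0) | nn (0) | ½ | ½ |

Complete data = observed data (height and marker genotypes) + missing data (QTL genotype)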

Two statistical models

I - Marker regression model

$$y_i = \mu + a x_i + e_i,$$

where $x_i$ is the indicator variable for the marker genotype, defined as $x_i = 1$ for Mm and $x_i = 0$ for mm; $a$ is the "effect" of the marker; and $e_i \sim N(0, \sigma^2)$. The marker itself has no effect on the trait: $a$ is nonzero only because a putative QTL is linked with the marker.

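Heights classified by markers (say marker 1)

| Marker group | Sample size | Sample mean | Sample variance |
|--------------|-------------|-------------|-----------------|
| Mm | $n_1 = 4$ | $m_1 = 182.75$ | $s_1^2 =$ |
| mm | $n_0 = 4$ | $m_0 = 166.75$ | $s_0^2 =$ |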

The hypothesis for the association between the marker and QTL

$H_0: m_1 = m_0$
$H_1: m_1 \neq m_0$

Calculate the test statistic

$$t = \frac{m_1 - m_0}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right)}}, \qquad s^2 = \frac{(n_1 - 1) s_1^2 + (n_0 - 1) s_0^2}{n_1 + n_0 - 2}.$$

Compare $|t|$ with the critical value $t_{df = n_1 + n_0 - 2}(0.05)$ from the t-table.
If $|t| > t_{df = n_1 + n_0 - 2}(0.05)$, we reject $H_0$ at the significance level 0.05: there is evidence for a QTL linked to the marker.
If $|t| < t_{df = n_1 + n_0 - 2}(0.05)$, we fail to reject $H_0$ at the significance level 0.05: there is no evidence for a QTL.
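As a rough illustration, the pooled t-test can be applied to the eight backcross heights grouped by marker 1. This is a minimal sketch using only the Python standard library; the function name `pooled_t` is made up for the example, and the critical value 2.447 is the usual two-sided 5% entry for 6 degrees of freedom in a t-table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group0):
    """Two-sample t statistic with a pooled variance, plus its degrees of freedom."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = mean(group1), mean(group0)
    # Pooled variance: s^2 = [(n1-1) s1^2 + (n0-1) s0^2] / (n1 + n0 - 2)
    s2 = ((n1 - 1) * variance(group1) + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    t = (m1 - m0) / sqrt(s2 * (1 / n1 + 1 / n0))
    return t, n1 + n0 - 2


# Heights (cm) of the eight backcross samples, split by marker 1 genotype.
heights_Mm = [184, 185, 180, 182]
heights_mm = [167, 169, 165, 166]

t_stat, df = pooled_t(heights_Mm, heights_mm)

# Two-sided 5% critical value for df = 6, taken from a standard t-table.
T_CRIT_DF6 = 2.447
reject_h0 = abs(t_stat) > T_CRIT_DF6

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

With so few samples the result is only illustrative, but it shows how the class means and variances feed into the statistic.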

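Why can the t-test probe a QTL?

Assume a backcross with two genes, one marker (alleles M and m) and one QTL (alleles Q and q). These two genes are linked with recombination fraction r.

| Genotype | MmQq | Mmqq | mmQq | mmqq |
|----------|------|------|------|------|
| Frequency | $(1-r)/2$ | $r/2$ | $r/2$ | $(1-r)/2$ |
| Mean effect | $\mu + a$ | $\mu$ | $\mu + a$ | $\mu$ |

Each marker class has total frequency ½, so within a marker class the two QTL genotypes occur with frequencies $1-r$ and $r$.

Mean of marker genotype Mm: $m_1 = (1-r)(\mu + a) + r\mu = \mu + (1-r)a$
Mean of marker genotype mm: $m_0 = r(\mu + a) + (1-r)\mu = \mu + ra$
The difference: $m_1 - m_0 = \mu + (1-r)a - \mu - ra = (1-2r)a$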

The difference between the marker genotype means reflects the size of the QTL effect, but this reflection is confounded with the recombination fraction. Based on the t-test alone, we cannot distinguish between two cases:
- a large QTL genetic effect but loose linkage with the marker;
- a small QTL effect but tight linkage with the marker.
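The confounding can be made concrete with a few lines of arithmetic. The sketch below computes the expected marker-class means in a backcross; the function names and the numerical values of the baseline mean, effect, and recombination fraction are hypothetical and chosen only for illustration.

```python
def marker_class_means(mu, a, r):
    """Expected trait means of marker classes Mm and mm in a backcross.

    mu: mean of QTL genotype qq, a: QTL effect, r: marker-QTL recombination fraction.
    """
    m1 = (1 - r) * (mu + a) + r * mu   # Mm: mostly Qq, a fraction r is qq
    m0 = r * (mu + a) + (1 - r) * mu   # mm: mostly qq, a fraction r is Qq
    return m1, m0


def marker_contrast(mu, a, r):
    """Difference between marker-class means; algebraically (1 - 2r) * a."""
    m1, m0 = marker_class_means(mu, a, r)
    return m1 - m0


# Hypothetical numbers: a large effect with loose linkage ...
loose_large = marker_contrast(mu=100.0, a=10.0, r=0.25)
# ... and a small effect with complete linkage give the same marker contrast.
tight_small = marker_contrast(mu=100.0, a=5.0, r=0.0)

print(loose_large, tight_small)
```

Both settings yield the same difference between marker classes, which is why a single-marker test cannot separate effect size from map distance.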

Example: marker analysis for body weight in a backcross of mice
[Table: one row per marker (Hmg1-rs, DXMit, Rps17-rs); columns $n_1$, $m_1$, $s_1^2$ for marker class 1, $n_0$, $m_0$, $s_0^2$ for marker class 0, then t and P value. The numerical entries are not recoverable.]

Marker analysis for the F2

In the F2 there are three marker genotypes, MM, Mm and mm, which allow for the test of additive and dominance genetic effects.

| Genotype | Mean | Variance |
|----------|------|----------|
| MM | $m_2$ | $s_2^2$ |
| Mm | $m_1$ | $s_1^2$ |
| mm | $m_0$ | $s_0^2$ |

Testing for the additive effect

$H_0: m_2 = m_0$
$H_1: m_2 \neq m_0$

$$t_1 = \frac{m_2 - m_0}{\sqrt{s^2 \left( \frac{1}{n_2} + \frac{1}{n_0} \right)}}, \qquad s^2 = \frac{(n_2 - 1) s_2^2 + (n_0 - 1) s_0^2}{n_2 + n_0 - 2}.$$

Compare it with $t_{df = n_2 + n_0 - 2}(0.05)$.

Testing for the dominance effect

$H_0: m_1 = (m_2 + m_0)/2$
$H_1: m_1 \neq (m_2 + m_0)/2$

$$t_2 = \frac{m_1 - (m_2 + m_0)/2}{\sqrt{s^2 \left[ \frac{1}{n_1} + \frac{1}{4 n_2} + \frac{1}{4 n_0} \right]}}, \qquad s^2 = \frac{(n_2 - 1) s_2^2 + (n_1 - 1) s_1^2 + (n_0 - 1) s_0^2}{n_2 + n_1 + n_0 - 3}.$$

Compare it with $t_{df = n_2 + n_1 + n_0 - 3}(0.05)$.
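The two F2 tests can be sketched together as follows. The function name `f2_marker_tests` and the six trait values are hypothetical, invented only to exercise the formulas; they are not data from the lecture.

```python
from math import sqrt
from statistics import mean, variance


def f2_marker_tests(y_MM, y_Mm, y_mm):
    """Return (t1, df1) for the additive test and (t2, df2) for the dominance test."""
    n2, n1, n0 = len(y_MM), len(y_Mm), len(y_mm)
    m2, m1, m0 = mean(y_MM), mean(y_Mm), mean(y_mm)
    v2, v1, v0 = variance(y_MM), variance(y_Mm), variance(y_mm)

    # Additive effect: compare the two homozygotes, pooling their variances.
    s2_add = ((n2 - 1) * v2 + (n0 - 1) * v0) / (n2 + n0 - 2)
    t1 = (m2 - m0) / sqrt(s2_add * (1 / n2 + 1 / n0))

    # Dominance effect: compare the heterozygote with the mid-homozygote value,
    # pooling the variances of all three genotype classes.
    s2_dom = ((n2 - 1) * v2 + (n1 - 1) * v1 + (n0 - 1) * v0) / (n2 + n1 + n0 - 3)
    t2 = (m1 - (m2 + m0) / 2) / sqrt(s2_dom * (1 / n1 + 1 / (4 * n2) + 1 / (4 * n0)))

    return (t1, n2 + n0 - 2), (t2, n2 + n1 + n0 - 3)


# Hypothetical trait values for the three marker classes (illustration only).
(t1, df1), (t2, df2) = f2_marker_tests(
    y_MM=[10.0, 12.0], y_Mm=[9.0, 11.0], y_mm=[6.0, 8.0]
)
print(f"additive: t1 = {t1:.3f} (df = {df1}); dominance: t2 = {t2:.3f} (df = {df2})")
```

Each statistic is then compared with the t-table value for its own degrees of freedom.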

Example: Marker analysis in an F2 of maize
[Table: one row per marker (M); columns $n_2$, $m_2$, $s_2^2$ for marker class 2, $n_1$, $m_1$, $s_1^2$ for marker class 1, $n_0$, $m_0$, $s_0^2$ for marker class 0, then $t_1$ and P for the additive test and $t_2$ and P for the dominance test. The numerical entries are not recoverable.]

II – QTL regression model based on markers (interval mapping)

Suppose the gene order is Marker 1 – QTL – Marker 2.

$$y_i = \mu + a^* z_i + e_i,$$

where $a^*$ is the "real" effect of the QTL; $z_i$ is a variable giving the probability that individual i carries QTL genotype Qq (rather than qq), given its observed marker genotype; and $e_i \sim N(0, \sigma^2)$.

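Indicators for a backcross

| Sample | Height (cm, $y_i$) | Markers (1, 2) | QTL $x^*_i$ | Marker $x_i$ | QTL given markers $z_i$ |
|--------|--------------------|----------------|-------------|--------------|-------------------------|
| 1 | 184 | 11 | ? | 1 | $P(1 \mid 11) \approx 1$ |
| 2 | 185 | 11 | ? | 1 | $P(1 \mid 11) \approx 1$ |
| 3 | 180 | 11 | ? | 1 | $P(1 \mid 11) \approx 1$ |
| 4 | 182 | 10 | ? | 1 | $P(1 \mid 10) \approx 1 - \theta$ |
| 5 | 167 | 01 | ? | 0 | $P(1 \mid 01) \approx \theta$ |
| 6 | 169 | 00 | ? | 0 | $P(1 \mid 00) \approx 0$ |
| 7 | 165 | 00 | ? | 0 | $P(1 \mid 00) \approx 0$ |
| 8 | 166 | 00 | ? | 0 | $P(1 \mid 00) \approx 0$ |

Here $x^*_i$ is the QTL indicator (missing), $x_i$ is the indicator for marker 1, and $z_i$ is the conditional probability of QTL genotype Qq given the two-marker genotype.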

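Conditional probabilities ($\omega_{1|i}$ or $\omega_{0|i}$) of the QTL genotypes (missing) based on marker genotypes (observed)

| Marker genotype | Frequency | QTL genotype Qq (1) | QTL genotype qq (0) |
|-----------------|-----------|---------------------|---------------------|
| 11 | $\tfrac{1}{2}(1-r)$ | $(1-r_1)(1-r_2)/(1-r) \approx 1$ | $r_1 r_2/(1-r) \approx 0$ |
| 10 | $\tfrac{1}{2}r$ | $(1-r_1) r_2 / r \approx 1 - \theta$ | $r_1 (1-r_2)/r \approx \theta$ |
| 01 | $\tfrac{1}{2}r$ | $r_1 (1-r_2)/r \approx \theta$ | $(1-r_1) r_2 / r \approx 1 - \theta$ |
| 00 | $\tfrac{1}{2}(1-r)$ | $r_1 r_2/(1-r) \approx 0$ | $(1-r_1)(1-r_2)/(1-r) \approx 1$ |

where $\theta = r_1/r$, so that $1 - \theta = 1 - r_1/r$.

$r$ is the recombination fraction between the two markers.
$r_1$ is the recombination fraction between marker 1 and the QTL.
$r_2$ is the recombination fraction between the QTL and marker 2.
Order: Marker 1 – QTL – Marker 2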
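The exact entries of this table can be evaluated numerically and compared with the approximation θ = r1/r. The sketch below assumes no crossover interference, so that r = r1 + r2 - 2·r1·r2; that relation, the function name, and the example recombination fractions are assumptions made for the illustration.

```python
def qtl_given_markers(r1, r2):
    """Exact P(Qq | markers) and P(qq | markers) for each two-marker genotype.

    Assumes gene order Marker 1 - QTL - Marker 2 and no interference,
    i.e. r = r1 + r2 - 2*r1*r2 (an assumption of this sketch).
    """
    r = r1 + r2 - 2 * r1 * r2
    return {
        "11": ((1 - r1) * (1 - r2) / (1 - r), r1 * r2 / (1 - r)),
        "10": ((1 - r1) * r2 / r, r1 * (1 - r2) / r),
        "01": (r1 * (1 - r2) / r, (1 - r1) * r2 / r),
        "00": (r1 * r2 / (1 - r), (1 - r1) * (1 - r2) / (1 - r)),
    }


# Hypothetical map: QTL close to marker 1 within a short interval.
r1, r2 = 0.02, 0.08
probs = qtl_given_markers(r1, r2)
theta = r1 / (r1 + r2 - 2 * r1 * r2)

for genotype, (p_Qq, p_qq) in probs.items():
    print(genotype, round(p_Qq, 4), round(p_qq, 4))
print("theta =", round(theta, 4))
```

For short intervals the exact values stay close to 1, 1 - θ, θ and 0, which is what justifies the simplified probabilities used in the likelihood later on.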

Interval mapping with regression approach

Consider a marker interval $M_1$-$M_2$. We assume that a QTL is located at a particular position between the two markers ($r_1$ and $\theta$ are fixed).

With response variable $y_i$ and independent variable $z_i$, a regression model is constructed as

$$y_i = \mu + a^* z_i + e_i.$$

Statistical software, such as SAS, can be used to estimate the parameters $(\mu, a^*, \sigma^2)$ of the regression model for a particular QTL position.

Move the QTL position every 2 cM from $M_1$ to $M_2$ and draw the profile of the F value. The peak of the profile corresponds to the best estimate of the QTL position.

[Figure: profile of the F value against testing position along markers M1 to M5]
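A minimal sketch of the regression scan, applied to the eight backcross samples with marker genotypes 11, 11, 11, 10, 01, 00, 00, 00. To keep it short, positions are indexed directly by θ on a grid rather than in 2 cM steps, and z_i uses the approximate conditional probabilities 1, 1 - θ, θ, 0; the function names are made up for the example.

```python
def f_value(z, y):
    """F statistic of the simple linear regression of y on z (1 and n-2 df)."""
    n = len(y)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = s_zy ** 2 / s_zz          # sum of squares explained by z
    ss_res = s_yy - ss_reg             # residual sum of squares
    return ss_reg / (ss_res / (n - 2))


def scan_interval(y, markers, thetas):
    """Return [(theta, F)] for each assumed QTL position theta = r1/r."""
    profile = []
    for theta in thetas:
        z_of = {(1, 1): 1.0, (1, 0): 1.0 - theta, (0, 1): theta, (0, 0): 0.0}
        z = [z_of[m] for m in markers]
        profile.append((theta, f_value(z, y)))
    return profile


HEIGHTS = [184, 185, 180, 182, 167, 169, 165, 166]
MARKERS = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

profile = scan_interval(HEIGHTS, MARKERS, [i / 10 for i in range(11)])
best_theta, best_f = max(profile, key=lambda point: point[1])
print(f"peak at theta = {best_theta}, F = {best_f:.2f}")
```

In this toy data set the profile peaks at θ = 0, that is, at marker 1 itself, because the two recombinant individuals have heights typical of their marker 1 class.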

Interval mapping with maximum likelihood

- Linear regression model for specifying the effect of a putative QTL on a quantitative trait
- Mixture model-based likelihood
- Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)
- Normal distributions of phenotypic values for each QTL genotype group
- Log-likelihood equations (via differentiation)
- EM algorithm
- Log-likelihood ratios
- The profile of log-likelihood ratios across a linkage group
- The determination of thresholds
- Result interpretations

Linear regression model for specifying the effect of a QTL on a quantitative trait

$$y_i = \mu + a^* z_i + e_i, \quad i = 1, \ldots, n \quad \text{(latent model)}$$

$a^*$ is the (additive) effect of the putative QTL on the trait; $z_i$ is the indicator variable, defined as 1 when the QTL genotype is Qq and 0 when the QTL genotype is qq; $e_i \sim N(0, \sigma^2)$.

Observed data: $y_i$ and marker genotypes M
Missing data: QTL genotypes
Parameters: $\Omega = (\mu, a^*, \sigma^2, \theta = r_1/r)$

Observed marker genotypes and missing QTL genotypes are connected through the conditional probability ($\omega_{1|i}$ or $\omega_{0|i}$) of the QTL genotypes (Qq or qq), conditional upon the marker genotypes (11, 10, 01 or 00).

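Mixture model-based likelihood without marker information

$$L(y \mid \Omega) = \prod_{i=1}^{n} \left[ \tfrac{1}{2} f_1(y_i) + \tfrac{1}{2} f_0(y_i) \right]$$

| Sample | Height (cm, y) | QTL genotype |
|--------|----------------|--------------|
| 1 | 184 | Qq (1) |
| 2 | 185 | Qq (1) |
| 3 | 180 | Qq (1) |
| 4 | 182 | Qq (1) |
| 5 | 167 | qq (0) |
| 6 | 169 | qq (0) |
| 7 | 165 | qq (0) |
| 8 | 166 | qq (0) |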

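Mixture model-based likelihood with marker information

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

| Sample | Height (cm, y) | Marker M1 | Marker M2 | QTL: Qq | QTL: qq |
|--------|----------------|-----------|-----------|---------|---------|
| 1 | 184 | Mm (1) | Nn (1) | ½ | ½ |
| 2 | 185 | Mm (1) | Nn (1) | ½ | ½ |
| 3 | 180 | Mm (1) | Nn (1) | ½ | ½ |
| 4 | 182 | Mm (1) | nn (0) | ½ | ½ |
| 5 | 167 | mm (0) | Nn (1) | ½ | ½ |
| 6 | 169 | mm (0) | nn (0) | ½ | ½ |
| 7 | 165 | mm (0) | nn (0) | ½ | ½ |
| 8 | 166 | mm (0) | nn (0) | ½ | ½ |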

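Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

$$
\begin{aligned}
L(y, M \mid \Omega) &= \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right] \\
&= \prod_{i=1}^{n_1} \left[ 1 \cdot f_1(y_i) + 0 \cdot f_0(y_i) \right] && \text{(conditional on 11)} \\
&\quad \times \prod_{i=1}^{n_2} \left[ (1-\theta) f_1(y_i) + \theta f_0(y_i) \right] && \text{(conditional on 10)} \\
&\quad \times \prod_{i=1}^{n_3} \left[ \theta f_1(y_i) + (1-\theta) f_0(y_i) \right] && \text{(conditional on 01)} \\
&\quad \times \prod_{i=1}^{n_4} \left[ 0 \cdot f_1(y_i) + 1 \cdot f_0(y_i) \right] && \text{(conditional on 00)}
\end{aligned}
$$

Here $n_1$, $n_2$, $n_3$ and $n_4$ are the numbers of individuals with marker genotypes 11, 10, 01 and 00.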

Normal distributions of phenotypic values for each QTL genotype group

$$f_1(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right], \quad \mu_1 = \mu + a^*$$

$$f_0(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_0)^2}{2\sigma^2} \right], \quad \mu_0 = \mu$$
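Putting the conditional probabilities and the two normal densities together, the log of the mixture likelihood can be evaluated directly. The sketch below does this for the eight backcross samples with marker genotypes 11, 11, 11, 10, 01, 00, 00, 00; the function names and the parameter values plugged in at the end are assumptions chosen for illustration.

```python
from math import exp, log, pi, sqrt

HEIGHTS = [184, 185, 180, 182, 167, 169, 165, 166]
MARKERS = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]


def normal_pdf(y, mean, sigma2):
    """Density of N(mean, sigma2) at y."""
    return exp(-((y - mean) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


def prior_weights(marker, theta):
    """(P(Qq | marker), P(qq | marker)) using the approximations 1, 1-theta, theta, 0."""
    table = {
        (1, 1): (1.0, 0.0),
        (1, 0): (1.0 - theta, theta),
        (0, 1): (theta, 1.0 - theta),
        (0, 0): (0.0, 1.0),
    }
    return table[marker]


def log_likelihood(y, markers, mu1, mu0, sigma2, theta):
    """log L(y, M | Omega) for the two-component backcross mixture."""
    total = 0.0
    for yi, marker in zip(y, markers):
        w1, w0 = prior_weights(marker, theta)
        total += log(w1 * normal_pdf(yi, mu1, sigma2) + w0 * normal_pdf(yi, mu0, sigma2))
    return total


# A model with two separated genotype means versus a single common mean.
ll_qtl = log_likelihood(HEIGHTS, MARKERS, mu1=182.75, mu0=166.75, sigma2=2.9375, theta=0.1)
ll_null = log_likelihood(HEIGHTS, MARKERS, mu1=174.75, mu0=174.75, sigma2=66.9375, theta=0.1)
print(round(ll_qtl, 2), round(ll_null, 2))
```

The two-mean model gives a much higher log-likelihood on this toy data set, which is the contrast the likelihood ratio test later formalizes.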

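Differentiating the log-likelihood with respect to each unknown parameter, setting the derivatives equal to zero, and solving the log-likelihood equations

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

$$\log L(y, M \mid \Omega) = \sum_{i=1}^{n} \log \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

Define the posterior probabilities of the QTL genotypes

$$\Pi_{1|i} = \frac{\omega_{1|i} f_1(y_i)}{\omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \quad (1)$$

$$\Pi_{0|i} = \frac{\omega_{0|i} f_0(y_i)}{\omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \quad (2)$$

Setting the derivatives to zero then gives

$$\mu_1 = \frac{\sum_{i=1}^{n} \Pi_{1|i}\, y_i}{\sum_{i=1}^{n} \Pi_{1|i}} \quad (3)$$

$$\mu_0 = \frac{\sum_{i=1}^{n} \Pi_{0|i}\, y_i}{\sum_{i=1}^{n} \Pi_{0|i}} \quad (4)$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \Pi_{1|i} (y_i - \mu_1)^2 + \Pi_{0|i} (y_i - \mu_0)^2 \right] \quad (5)$$

$$\theta = \frac{\sum_{i \in 10} \Pi_{0|i} + \sum_{i \in 01} \Pi_{1|i}}{n_2 + n_3} \quad (6)$$

where the sums in (6) run over individuals with marker genotypes 10 and 01, and $n_2$ and $n_3$ are the numbers of such individuals.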

EM algorithm

(1) Choose initial values $\Omega^{(0)} = (\mu_1, \mu_0, \sigma^2, \theta)^{(0)}$;
(2) Calculate $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ using Eqs. 1 and 2;
(3) Calculate $\Omega^{(1)}$ using $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$;
(4) Repeat (2) and (3) until convergence.
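The four steps can be sketched in a few lines for the eight backcross samples with marker genotypes 11, 11, 11, 10, 01, 00, 00, 00. The M-step below uses the standard weighted-mean and weighted-variance updates for a two-component normal mixture with common variance, and updates θ from the two recombinant marker classes; the starting values, tolerance, and function names are assumptions of this sketch rather than anything prescribed by the lecture.

```python
from math import exp, pi, sqrt

HEIGHTS = [184, 185, 180, 182, 167, 169, 165, 166]
MARKERS = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
RECOMBINANT = ((1, 0), (0, 1))


def normal_pdf(y, mean, sigma2):
    return exp(-((y - mean) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


def prior_Qq(marker, theta):
    """Approximate P(Qq | marker genotype): 1, 1-theta, theta, 0."""
    return {(1, 1): 1.0, (1, 0): 1.0 - theta, (0, 1): theta, (0, 0): 0.0}[marker]


def em_backcross(y, markers, max_iter=200, tol=1e-10):
    n = len(y)
    y_bar = sum(y) / n
    # Step 1: initial values (arbitrary but reasonable starting point).
    mu1, mu0 = y_bar + 1.0, y_bar - 1.0
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / n
    theta = 0.5
    n_rec = sum(1 for m in markers if m in RECOMBINANT)
    for _ in range(max_iter):
        # Step 2 (E-step): posterior probability that each individual is Qq.
        post1 = []
        for yi, m in zip(y, markers):
            w1 = prior_Qq(m, theta)
            num1 = w1 * normal_pdf(yi, mu1, sigma2)
            num0 = (1.0 - w1) * normal_pdf(yi, mu0, sigma2)
            post1.append(num1 / (num1 + num0))
        # Step 3 (M-step): update means, variance and theta.
        new_mu1 = sum(p * yi for p, yi in zip(post1, y)) / sum(post1)
        new_mu0 = sum((1 - p) * yi for p, yi in zip(post1, y)) / (n - sum(post1))
        new_sigma2 = sum(
            p * (yi - new_mu1) ** 2 + (1 - p) * (yi - new_mu0) ** 2
            for p, yi in zip(post1, y)
        ) / n
        # Genotype 10 contributes P(qq), genotype 01 contributes P(Qq).
        rec_sum = sum(
            (1 - p) if m == (1, 0) else p
            for p, m in zip(post1, markers) if m in RECOMBINANT
        )
        new_theta = rec_sum / n_rec if n_rec else theta
        change = max(abs(new_mu1 - mu1), abs(new_mu0 - mu0),
                     abs(new_sigma2 - sigma2), abs(new_theta - theta))
        mu1, mu0, sigma2, theta = new_mu1, new_mu0, new_sigma2, new_theta
        # Step 4: stop once the estimates no longer move.
        if change < tol:
            break
    return mu1, mu0, sigma2, theta


mu1_hat, mu0_hat, sigma2_hat, theta_hat = em_backcross(HEIGHTS, MARKERS)
print(round(mu1_hat, 2), round(mu0_hat, 2), round(sigma2_hat, 3), round(theta_hat, 4))
```

On this toy data set θ is driven towards 0, meaning the data place the QTL at marker 1; with real data and more recombinants the estimate would normally fall inside the interval.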

Two approaches for estimating the QTL position ($\theta$)

- View $\theta$ as a variable being estimated (derive the log-likelihood equation for the MLE of $\theta$);
- View $\theta$ as a fixed parameter by assuming that the QTL is located at a particular position.

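Log-likelihood ratio (LR) test statistics

$H_0$: There is no QTL ($\mu_1 = \mu_0$ or $a^* = 0$), the reduced model
$H_1$: There is a QTL ($\mu_1 \neq \mu_0$ or $a^* \neq 0$), the full model

Under $H_0$: $L_0 = L(y, M \mid \tilde{\mu}, a^* = 0, \tilde{\sigma}^2)$
Under $H_1$: $L_1 = L(y, M \mid \hat{\mu}_1, \hat{\mu}_0, \hat{\sigma}^2, \hat{\theta})$

$$LR = -2(\log L_0 - \log L_1)$$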
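As a simplified illustration of the LR statistic, the sketch below fixes the QTL at marker 1 (θ = 0), so that the QTL genotype coincides with the marker 1 genotype and both models have closed-form maximum likelihood estimates. This shortcut, and the function names, are assumptions made to keep the example short; at other positions the full model would be fitted with the EM algorithm.

```python
from math import log, pi

HEIGHTS = [184, 185, 180, 182, 167, 169, 165, 166]
MARKER1 = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = Mm, 0 = mm


def normal_loglik(y, means, sigma2):
    """Sum of log N(mean_i, sigma2) densities over the observations."""
    return sum(
        -0.5 * log(2 * pi * sigma2) - (yi - mi) ** 2 / (2 * sigma2)
        for yi, mi in zip(y, means)
    )


def lr_at_marker(y, x):
    """LR = -2 (log L0 - log L1) with the QTL assumed to sit on the marker."""
    n = len(y)
    # H0: one common mean, ML variance (divisor n).
    mu = sum(y) / n
    sigma2_0 = sum((yi - mu) ** 2 for yi in y) / n
    log_l0 = normal_loglik(y, [mu] * n, sigma2_0)
    # H1: one mean per genotype class, common ML variance.
    class_mean = {}
    for g in (0, 1):
        grp = [yi for yi, xi in zip(y, x) if xi == g]
        class_mean[g] = sum(grp) / len(grp)
    fitted = [class_mean[xi] for xi in x]
    sigma2_1 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    log_l1 = normal_loglik(y, fitted, sigma2_1)
    return -2 * (log_l0 - log_l1)


lr = lr_at_marker(HEIGHTS, MARKER1)
print(f"LR = {lr:.2f}")
```

The resulting value still has to be judged against a threshold, since its null distribution in a genome scan is not a simple chi-square.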

The profile of log-likelihood ratios across a linkage group
[Figure: LR plotted against testing position]

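The determination of thresholds

Permutation: the phenotypes are shuffled among the samples while the marker genotypes stay in place, and the LR is recomputed for each shuffled data set.

| Sample | Original | Perm. 1 | Perm. 2 | … | Perm. 1000 | M1 | M2 | QTL |
|--------|----------|---------|---------|---|------------|----|----|-----|
| 1 | x | x | x | … | x | Mm (1) | Nn (1) | ? |
| 2 | x | x | x | … | x | Mm (1) | Nn (1) | ? |
| 3 | x | x | x | … | x | Mm (1) | Nn (1) | ? |
| 4 | x | x | x | … | x | Mm (1) | nn (0) | ? |
| 5 | x | x | x | … | x | mm (0) | Nn (1) | ? |
| 6 | x | x | x | … | x | mm (0) | nn (0) | ? |
| 7 | x | x | x | … | x | mm (0) | nn (0) | ? |
| 8 | x | x | x | … | x | mm (0) | nn (0) | ? |
| LR statistic | LR | LR1 | LR2 | … | LR1000 | | | |

The critical value is the 95th or 99th percentile of the 1000 permutation LRs.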
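A permutation threshold can be sketched as follows. For brevity the LR is evaluated at a single position (the QTL placed on marker 1, where it reduces to n·log(SS_total / SS_within)) instead of taking the maximum over a whole linkage group; that simplification, the seed, and the function names are assumptions of this example.

```python
import random
from math import log

HEIGHTS = [184, 185, 180, 182, 167, 169, 165, 166]
MARKER1 = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = Mm, 0 = mm


def lr_at_marker(y, x):
    """LR statistic for a QTL assumed to sit on the marker: n * log(SS_total / SS_within)."""
    n = len(y)
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_within = 0.0
    for g in (0, 1):
        grp = [yi for yi, xi in zip(y, x) if xi == g]
        g_mean = sum(grp) / len(grp)
        ss_within += sum((yi - g_mean) ** 2 for yi in grp)
    return n * log(ss_total / ss_within)


def permutation_threshold(y, x, n_perm=1000, level=0.95, seed=0):
    """Empirical `level` quantile of the LR under random reshuffling of phenotypes."""
    rng = random.Random(seed)
    shuffled = list(y)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # break the phenotype-marker association
        stats.append(lr_at_marker(shuffled, x))
    stats.sort()
    index = min(n_perm - 1, round(level * n_perm))
    return stats[index]


observed_lr = lr_at_marker(HEIGHTS, MARKER1)
threshold_95 = permutation_threshold(HEIGHTS, MARKER1)
print(f"observed LR = {observed_lr:.2f}, 95% permutation threshold = {threshold_95:.2f}")
```

In a real scan the statistic recorded for each permutation would be the largest LR over all testing positions, so that the threshold accounts for the multiple positions tested.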

Result interpretations

A poplar genome project

Objectives:
- Identify QTL affecting stemwood growth and production using molecular markers;
- Develop fast-growing cultivars using marker-assisted selection.

Materials and Methods

Poplar hybrids
- F1 hybrids from eastern cottonwood (D) × euramerican poplar (E) (a hybrid between eastern cottonwood × black poplar)
- Four hundred fifty (450) F1 hybrids were planted in a field trial

DNA extraction and marker arrays
- A total of 560 markers were detected from a subset of F1 hybrids (90)

Genetic linkage map construction

[Figure: profile of the log-likelihood ratios across the length of a linkage group, with the critical value determined from permutation tests]

Advantages and disadvantages

Compared with single marker analysis, interval mapping has several advantages:
- The position of the QTL can be inferred by a support interval;
- The estimated position and effects of the QTL tend to be asymptotically unbiased if there is only one segregating QTL on a chromosome;
- The method requires fewer individuals than single marker analysis for the detection of QTL.

Disadvantages:
- The test is not an interval test (a test that can distinguish whether or not there is a QTL within a defined interval and that should be independent of the effects of QTL outside the defined region).
- Even when there is no QTL within an interval, the likelihood profile on the interval can still exceed the threshold (a ghost QTL) if there is a QTL in some nearby region on the chromosome.
- If there is more than one QTL on a chromosome, the test statistic at the position being tested will be affected by all of these QTL, and the estimated positions and effects of "QTL" identified by this method are likely to be biased.
- It is not efficient to use only two markers at a time for testing, since the information from other markers is not utilized.