Quantitative Trait Locus (QTL) Mapping


Advanced Algorithms and Models for Computational Biology -- a machine learning approach
Population Genetics: Quantitative Trait Locus (QTL) Mapping
Eric Xing
Lecture 17, March 22, 2006
Reading: DTW book, Chap 13

Phenotypical Traits
- Body measures
- Disease susceptibility and drug response
- Gene expression (microarray)

Backcross experiment

F2 intercross experiment

Trait distributions: a classical view

Another representation of a trait distribution
Note the equivalent of dominance in our trait distributions.

A second example
Note the approximate additivity in our trait distributions here.

QTL mapping
Data
- Phenotypes: $y_i$ = trait value for mouse i
- Genotypes: $x_{ij}$ = 1/0 (i.e., A/H) for mouse i at marker j (backcross); three states are needed for an intercross
- Genetic map: locations of the markers
Goals
- Identify the (or at least one) genomic region, called a quantitative trait locus (QTL), that contributes to variation in the trait
- Form confidence intervals for the QTL location
- Estimate QTL effects

QTL mapping (BC)

QTL mapping (F2)

Models: Recombination
We assume no chromatid or crossover interference. Consequently:
- points of exchange (crossovers) along chromosomes are distributed as a Poisson process with rate 1 in genetic distance (one expected crossover per Morgan);
- the marker genotypes $\{x_{ij}\}$ form a Markov chain along the chromosome for a backcross. What do they form in an F2 intercross?
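The no-interference assumption can be made concrete with a short simulation. The sketch below assumes the Haldane map function, which is what a Poisson crossover process implies: the recombination fraction between two loci d Morgans apart is the probability of an odd number of crossovers, (1 - exp(-2d))/2. The function names and the marker positions are illustrative choices, not part of the lecture.

```python
import math
import random

def haldane_recomb_fraction(d_cm):
    """Recombination fraction between two loci d_cm centimorgans apart.

    Assumes no interference (Haldane map function).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def simulate_backcross_chromosome(marker_pos_cm, rng):
    """Simulate 0/1 genotypes at ordered marker positions for one backcross mouse."""
    genotypes = [rng.randint(0, 1)]  # first marker: either state with probability 1/2
    for left, right in zip(marker_pos_cm, marker_pos_cm[1:]):
        r = haldane_recomb_fraction(right - left)
        prev = genotypes[-1]
        # Markov step: the genotype switches with probability r, else it is copied
        genotypes.append(1 - prev if rng.random() < r else prev)
    return genotypes

# Hypothetical marker map (positions in cM) used only for illustration
example = simulate_backcross_chromosome([0, 10, 20, 35, 60], random.Random(0))
```

Because each genotype depends only on its immediate neighbour, nearby markers are strongly correlated while markers far apart approach independence (r tends to 1/2).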

Models: Genotype → Phenotype
Let y = phenotype and g = whole-genome genotype.
Imagine a small number of QTL with genotypes $g_1, \dots, g_p$ ($2^p$ or $3^p$ distinct genotypes for BC and IC respectively; why?).
We assume $E(y \mid g) = \mu(g_1, \dots, g_p)$ and $\mathrm{var}(y \mid g) = \sigma^2(g_1, \dots, g_p)$.

Models: Genotype → Phenotype
- Homoscedasticity (constant variance): $\sigma^2(g_1, \dots, g_p) = \sigma^2$ (constant)
- Normality of residual variation: $y \mid g \sim N(\mu_g, \sigma^2)$
- Additivity: $\mu(g_1, \dots, g_p) = \mu + \sum_j \beta_j g_j$ ($g_j = 0/1$ for BC)
- Epistasis: any deviation from additivity, for example $\mu(g_1, \dots, g_p) = \mu + \sum_j \beta_j g_j + \sum_{i<j} w_{ij} g_i g_j$
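A small numerical illustration may help separate additivity from epistasis. The sketch below evaluates the mean phenotype for backcross genotypes coded 0/1, writing the per-locus effects as beta_j and the pairwise interaction weights as w_ij; the symbol beta and all numeric values are assumptions made for this example only.

```python
def mean_phenotype(g, mu, beta, w=None):
    """E(y | g) = mu + sum_j beta_j * g_j + sum_{i<j} w_ij * g_i * g_j.

    g    : tuple of 0/1 QTL genotypes (backcross coding)
    beta : per-QTL additive effects
    w    : optional dict {(i, j): w_ij} of pairwise epistatic terms
    """
    total = mu + sum(b * gj for b, gj in zip(beta, g))
    if w:
        for (i, j), wij in w.items():
            total += wij * g[i] * g[j]
    return total

# Hypothetical two-QTL example
mu, beta = 10.0, [2.0, 3.0]

# Additive model: the effect of QTL 1 does not depend on the genotype at QTL 2
effect_q1_when_q2_is_0 = mean_phenotype((1, 0), mu, beta) - mean_phenotype((0, 0), mu, beta)
effect_q1_when_q2_is_1 = mean_phenotype((1, 1), mu, beta) - mean_phenotype((0, 1), mu, beta)

# Epistatic model: the same contrast now changes with the genotype at QTL 2
w = {(0, 1): 1.5}
epi_effect_q2_is_0 = mean_phenotype((1, 0), mu, beta, w) - mean_phenotype((0, 0), mu, beta, w)
epi_effect_q2_is_1 = mean_phenotype((1, 1), mu, beta, w) - mean_phenotype((0, 1), mu, beta, w)
```

Under the additive model both contrasts equal beta_1; with the interaction term the second contrast is shifted by w_12.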

Additivity, or non-additivity (BC)
Additive QTLs: the effect of QTL 1 is the same irrespective of the genotype at QTL 2, and vice versa.
Epistatic QTLs: the effect of one QTL depends on the genotype at the other.

Additivity or non-additivity: F2

The simplest method: ANOVA
- Split subjects into groups according to genotype at a marker
- Do a t-test/ANOVA
- Repeat for each marker
The t-test/ANOVA tells us whether there is sufficient evidence that trait values in one genotype group differ significantly from those in another.
LOD score = $\log_{10}$ likelihood ratio, comparing the single-QTL model to the “no QTL anywhere” model.
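For a normal model with a common variance, the LOD score at a marker can be computed directly from residual sums of squares: LOD = (n/2) log10(RSS0/RSS1), where RSS0 comes from fitting a single mean and RSS1 from fitting one mean per genotype group. The following is a minimal sketch of that calculation; the function name and the toy data are illustrative, and it assumes complete genotype data and a non-zero within-group RSS.

```python
import math

def marker_lod(phenotypes, genotypes):
    """LOD score for one marker under a normal model with common variance."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)  # "no QTL" model

    # Group phenotypes by marker genotype (2 groups for BC, 3 for F2)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)

    rss_qtl = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss_qtl += sum((y - group_mean) ** 2 for y in ys)

    return (n / 2.0) * math.log10(rss_null / rss_qtl)

# Toy backcross data: four mice, one marker
lod = marker_lod([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

Repeating the call for every marker column gives the genome scan described above.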

ANOVA at marker loci
Advantages
- Simple
- Easily incorporates covariates (sex, environment, treatment, ...)
- Easily extended to more complex models
Disadvantages
- Must exclude individuals with missing genotype data
- Imperfect information about QTL location
- Suffers in low-density scans
- Only considers one QTL at a time

Interval mapping (IM)
- Consider any one position in the genome as the location of a putative QTL
- For a particular mouse, let z = 1/0 if the (unobserved) genotype at the QTL is AB/AA
- Calculate Pr(z = 1 | marker data of an interval bracketing the QTL)
  - Assume no meiotic interference
  - Need only consider the flanking typed markers
  - May allow for the presence of genotyping errors
- Given the genotype at the QTL, the phenotype is distributed as $y_i \mid z_i \sim \mathrm{Normal}(\mu_{z_i}, \sigma^2)$
- Given the marker data, the phenotype follows a mixture of normal distributions
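The conditional probability Pr(z = 1 | flanking markers) follows from the Markov property in a backcross: the QTL genotype depends on the left marker, and the right marker depends on the QTL genotype. Below is a minimal sketch under the assumptions of no interference and no genotyping errors; the recombination fractions between each flanking marker and the putative QTL are taken as inputs, and the function name is hypothetical.

```python
def qtl_prob_given_flanking(m_left, m_right, r_left, r_right):
    """Pr(z = 1 | flanking marker genotypes) in a backcross.

    m_left, m_right : observed 0/1 genotypes at the flanking markers
    r_left          : recombination fraction between left marker and QTL
    r_right         : recombination fraction between QTL and right marker
    """
    def transition(a, b, r):
        # Probability of genotype b at one locus given genotype a at its neighbour
        return 1.0 - r if a == b else r

    weight_1 = transition(m_left, 1, r_left) * transition(1, m_right, r_right)
    weight_0 = transition(m_left, 0, r_left) * transition(0, m_right, r_right)
    return weight_1 / (weight_1 + weight_0)

# Both flanking markers are 1: the QTL is very likely 1 as well
p_concordant = qtl_prob_given_flanking(1, 1, 0.1, 0.1)
# Markers disagree and the QTL sits midway: the two states are equally likely
p_discordant = qtl_prob_given_flanking(1, 0, 0.1, 0.1)
```

These per-mouse probabilities are the mixing weights of the normal mixture in the last bullet.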

IM: the mixture model
[Figure; panel labels: AA, AB, AB]

IM: estimation and LOD scores
- Use a version of the EM algorithm (an iterative algorithm) to obtain estimates of $\mu_{AA}$, $\mu_{AB}$, and $\sigma$
- Calculate the LOD score
- Repeat for all other genomic positions (in practice, at 0.5 cM steps along the genome)
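The EM step at a single putative QTL position can be sketched as follows. This is one plausible version, not necessarily the one used in the lecture: it assumes a backcross, takes as given the probabilities p_i = Pr(mouse i is AB | marker data), and assumes these are not all 0 or all 1 and that the phenotypes are not constant. All function and variable names are illustrative.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of Normal(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def interval_mapping_em(y, p, n_iter=100):
    """EM for a two-component normal mixture with known per-mouse weights.

    y : phenotypes; p : p[i] = Pr(genotype AB at the QTL | marker data).
    Returns (mu_aa, mu_ab, sigma, lod).
    """
    n = len(y)
    w = list(p)  # start the posterior weights at the marker-based probabilities
    for _ in range(n_iter):
        # M-step: weighted group means and a pooled (common) variance
        s_ab = sum(w)
        s_aa = n - s_ab
        mu_ab = sum(wi * yi for wi, yi in zip(w, y)) / s_ab
        mu_aa = sum((1 - wi) * yi for wi, yi in zip(w, y)) / s_aa
        var = sum(wi * (yi - mu_ab) ** 2 + (1 - wi) * (yi - mu_aa) ** 2
                  for wi, yi in zip(w, y)) / n
        sigma = math.sqrt(var)
        # E-step: posterior probability that mouse i is AB given y_i and markers
        w = []
        for yi, pi in zip(y, p):
            f_ab = pi * normal_pdf(yi, mu_ab, sigma)
            f_aa = (1 - pi) * normal_pdf(yi, mu_aa, sigma)
            w.append(f_ab / (f_ab + f_aa))

    # LOD: log10 likelihood of the mixture minus that of a single normal ("no QTL")
    ll_qtl = sum(math.log10(pi * normal_pdf(yi, mu_ab, sigma)
                            + (1 - pi) * normal_pdf(yi, mu_aa, sigma))
                 for yi, pi in zip(y, p))
    mean0 = sum(y) / n
    sd0 = math.sqrt(sum((yi - mean0) ** 2 for yi in y) / n)
    ll_null = sum(math.log10(normal_pdf(yi, mean0, sd0)) for yi in y)
    return mu_aa, mu_ab, sigma, ll_qtl - ll_null

# Fully informative markers (p is 0 or 1): EM reduces to the marker-based analysis
mu_aa, mu_ab, sigma, lod = interval_mapping_em([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
```

When the QTL position coincides with a fully typed marker, the weights are 0 or 1 and the result matches the single-marker ANOVA; between markers the weights are fractional and the mixture does the work.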

LOD score curves

LOD thresholds
To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere.
LOD threshold = 95th percentile of the distribution of the genome-wide maximum LOD score when there are no QTL anywhere.
Derivations:
- Analytical calculations (Lander & Botstein, 1989)
- Simulations
- Permutation tests (Churchill & Doerge, 1994)
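The permutation approach can be sketched in a few lines: shuffle the phenotypes relative to the genotypes (which breaks any genotype-phenotype association while keeping the marker correlation structure), record the genome-wide maximum LOD, and repeat. The sketch below uses single-marker LOD scores rather than interval mapping to stay short; that substitution, the function names, and the default of 1000 permutations are assumptions for illustration.

```python
import math
import random

def marker_lod(phenotypes, genotypes):
    """Single-marker LOD under a normal model with common variance."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss_qtl = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss_qtl += sum((y - m) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss_null / rss_qtl)

def permutation_threshold(y, genotype_rows, n_perm=1000, level=0.95, seed=0):
    """Estimate the genome-wide LOD threshold by permuting phenotypes.

    genotype_rows : one list of marker genotypes per mouse (n rows, M columns)
    """
    rng = random.Random(seed)
    marker_columns = list(zip(*genotype_rows))  # one tuple of genotypes per marker
    y_perm = list(y)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the genotype-phenotype link
        max_lods.append(max(marker_lod(y_perm, col) for col in marker_columns))
    max_lods.sort()
    # Empirical quantile of the null distribution of the maximum LOD
    index = min(n_perm - 1, math.ceil(level * n_perm) - 1)
    return max_lods[index]
```

An observed LOD peak is then declared significant at the genome-wide 5% level if it exceeds the returned threshold.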

Permutation distribution for trait4

Interval mapping
Advantages
- Takes proper account of missing data
- Can allow for the presence of genotyping errors
- Pretty pictures
- Higher power in low-density scans
- Improved estimate of QTL location
Disadvantages
- Greater computational effort
- Requires specialized software
- More difficult to include covariates?
- Only considers one QTL at a time

Multiple QTL methods
Why consider multiple QTL at once?
- To separate linked QTL. If two QTL are close together on the same chromosome, our one-at-a-time strategy may have problems finding either (e.g. if they work in opposite directions, or interact). Our LOD scores won’t make sense either.
- To permit the investigation of interactions. It may be that interactions greatly strengthen our ability to find QTL, though this is not clear.
- To reduce residual variation. If QTL exist at loci other than the one we are currently considering, they should be in our model; if they are not, their effects end up in the error term and reduce our ability to detect the current one. See below.

The problem
- n backcross subjects; M markers in all, with at most a handful expected to be near QTL
- $x_{ij}$ = genotype (0/1) of mouse i at marker j
- $y_i$ = phenotype (trait value) of mouse i
- Model: $y_i = \mu + \sum_{j=1}^{M} \beta_j x_{ij} + \epsilon_i$
- Which $\beta_j \neq 0$?
⇒ Variable selection in linear models (regression)

Finding QTL as model selection
- Select a class of models
  - Additive models
  - Additive plus pairwise interactions
  - Regression trees
- Compare models ($\gamma$)
  - $\mathrm{BIC}_\delta(\gamma) = \log \mathrm{RSS}(\gamma) + |\gamma| (\delta \log n / n)$, where $|\gamma|$ is the number of terms in model $\gamma$
  - Sequential permutation tests
- Search model space
  - Forward selection (FS)
  - Backward elimination (BE)
  - FS followed by BE
  - MCMC
- Assess performance
  - Maximize the number of QTL found; control the false positive rate
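As a rough illustration of the search step, the sketch below runs forward selection over additive models and keeps the model along the path with the smallest penalized criterion log RSS + k (delta log n / n), where k is the number of markers in the model. The penalty multiplier delta = 2, the function name, and the Gram-Schmidt style update are assumptions chosen to keep the example dependency-free; it covers only additive models with complete 0/1 genotype data.

```python
import math

def _dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def forward_selection_bic(y, X, delta=2.0, max_terms=None):
    """Forward selection of markers, scored by log RSS + k * delta * log(n) / n.

    y : phenotypes (length n); X : n rows of M marker genotypes (0/1).
    Returns the marker indices of the best model on the forward path.
    """
    n, M = len(y), len(X[0])
    if max_terms is None:
        max_terms = min(M, n - 2)
    # Centre y and every column so the intercept mu is absorbed
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    cols = []
    for j in range(M):
        col = [row[j] for row in X]
        m = sum(col) / n
        cols.append([v - m for v in col])

    rss = _dot(resid, resid)
    best_score, best_size = math.log(rss), 0  # score of the "no QTL" model
    path, remaining = [], set(range(M))
    while remaining and len(path) < max_terms:
        # Choose the marker giving the largest drop in RSS
        best_j, best_drop = None, 0.0
        for j in sorted(remaining):
            xx = _dot(cols[j], cols[j])
            if xx < 1e-12:
                continue  # column is (numerically) spanned by selected markers
            drop = _dot(cols[j], resid) ** 2 / xx
            if drop > best_drop:
                best_j, best_drop = j, drop
        if best_j is None:
            break
        q = cols[best_j]
        qq = _dot(q, q)
        coef = _dot(q, resid) / qq
        resid = [r - coef * v for r, v in zip(resid, q)]
        rss = _dot(resid, resid)
        if rss <= 0.0:
            break  # perfect fit; the log criterion is undefined
        remaining.remove(best_j)
        # Orthogonalise the remaining columns against the newly chosen one
        for j in remaining:
            c = _dot(cols[j], q) / qq
            cols[j] = [a - c * b for a, b in zip(cols[j], q)]
        path.append(best_j)
        score = math.log(rss) + len(path) * delta * math.log(n) / n
        if score < best_score:
            best_score, best_size = score, len(path)
    return path[:best_size]
```

Backward elimination or an MCMC search would explore the same model space differently; the scoring function stays the same.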

Acknowledgements
- Melanie Bahlo, WEHI
- Hongyu Zhao, Yale
- Karl Broman, Johns Hopkins
- Nusrat Rabbee, UCB
