Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Slides:



Advertisements
Similar presentations
The genetic dissection of complex traits
Advertisements

Planning breeding programs for impact
Basic Mendelian Principles
 Read Chapter 6 of text  Brachydachtyly displays the classic 3:1 pattern of inheritance (for a cross between heterozygotes) that mendel described.
Basics of Linkage Analysis
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
QTL Mapping R. M. Sundaram.
MALD Mapping by Admixture Linkage Disequilibrium.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Genetic Theory Manuel AR Ferreira Egmond, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Quantitative Genetics
Karl W Broman Department of Biostatistics Johns Hopkins University Gene mapping in model organisms.
Quantitative Genetics
 Read Chapter 6 of text  We saw in chapter 5 that a cross between two individuals heterozygous for a dominant allele produces a 3:1 ratio of individuals.
Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.
Broad-Sense Heritability Index
Genetic Mapping Oregon Wolfe Barley Map (Szucs et al., The Plant Genome 2, )
Population Genetics is the study of the genetic
Lecture 19: Association Studies II Date: 10/29/02  Finish case-control  TDT  Relative Risk.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Complex Traits Most neurobehavioral traits are complex Multifactorial
Joint Linkage and Linkage Disequilibrium Mapping Key Reference Li, Q., and R. L. Wu, 2009 A multilocus model for constructing a linkage disequilibrium.
Quantitative Genetics
Lecture 4: Statistics Review II Date: 9/5/02  Hypothesis tests: power  Estimation: likelihood, moment estimation, least square  Statistical properties.
Lecture 13: Linkage Analysis VI Date: 10/08/02  Complex models  Pedigrees  Elston-Stewart Algorithm  Lander-Green Algorithm.
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
1 B-b B-B B-b b-b Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Allele Frequencies: Staying Constant Chapter 14. What is Allele Frequency? How frequent any allele is in a given population: –Within one race –Within.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Population structure at QTL d A B C D E Q F G H a b c d e q f g h The population content at a quantitative trait locus (backcross, RIL, DH). Can be deduced.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Chapter 22 - Quantitative genetics: Traits with a continuous distribution of phenotypes are called continuous traits (e.g., height, weight, growth rate,
Why you should know about experimental crosses. To save you from embarrassment.
Types of genome maps Physical – based on bp Genetic/ linkage – based on recombination from Thomas Hunt Morgan's 1916 ''A Critique of the Theory of Evolution'',
HS-LS-3 Apply concepts of statistics and probability to support explanations that organisms with an advantageous heritable trait tend to increase in proportion.
Bayesian Variable Selection in Semiparametric Regression Modeling with Applications to Genetic Mappping Fei Zou Department of Biostatistics University.
Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor
Measuring Evolutionary Change Over Time
Identifying QTLs in experimental crosses
upstream vs. ORF binding and gene expression?
MENDEL AND THE GENE IDEA Gregor Mendel’s Discoveries
Relationship between quantitative trait inheritance and
Gene mapping in mice Karl W Broman Department of Biostatistics
Quantitative Traits in Populations
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Washington State University
Statistical issues in QTL mapping in mice
Mapping Quantitative Trait Loci
Statistical Methods for Quantitative Trait Loci (QTL) Mapping II
Basic concepts on population genetics
Maximum Likelihood Estimation & Expectation Maximization
Haplotype Reconstruction
General Animal Biology
Lecture 4: Testing for Departures from Hardy-Weinberg Equilibrium
Unsupervised Learning II: Soft Clustering with Gaussian Mixture Models
Linkage analysis and genetic mapping
MENDEL AND THE GENE IDEA Gregor Mendel’s Discoveries
Washington State University
General Animal Biology
Unit 5: Heredity Review Lessons 1, 3, 4 & 5.
Lecture 9: QTL Mapping II: Outbred Populations
The Evolution of Populations
Quantitative Trait Locus (QTL) Mapping
Presentation transcript:

Statistical Methods for Quantitative Trait Loci (QTL) Mapping Lectures 4 – Oct 10, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 Many of the techniques we will be discussing in class are based on the probabilistic models. Examples include Bayesian networks, Hidden markov models or Markov Random Fields. So, today, before we start discussing computational methods in active areas of research, we’d like to spend a couple of lectures for some basics on probabilistic models.

Outline Learning from data Basic concepts Maximum likelihood estimation (MLE) Maximum a posteriori (MAP) Expectation-maximization (EM) algorithm Basic concepts Allele, allele frequencies, genotype frequencies Hardy-Weinberg equilibrium Statistical methods for mapping QTL What is QTL? Experimental animals Analysis of variance (marker regression) Interval mapping (EM)

Continuous Space Revisited... Assuming sample x1, x2,…, xn is from a mixture of parametric distributions, X x1 x2 … xm xm+1 … xn x

A Real Example CpG content of human gene promoters GC frequency “A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters” Saxonov, Berg, and Brutlag, PNAS 2006;103:1412-1417

Mixture of Gaussians Parameters θ means variances mixing parameters P.D.F

A What-If Puzzle No closed form solution known for finding θ maximizing L. However, what if we knew the hidden data? Likelihood

EM as Chicken vs Egg IF zij known, could estimate parameters θ e.g., only points in cluster 2 influence μ2, σ2. IF parameters θ known, could estimate zij e.g., if |xi - μ1|/σ1 << |xi – μ2|/σ2, then zi1 >> zi2 BUT we know neither; (optimistically) iterate: E-step: calculate expected zij, given parameters M-step: do “MLE” for parameters (μ,σ), given E(zij) Overall, a clever “hill-climbing” strategy Convergence provable? YES

Simple Version: “Classification EM” If zij < 0.5, pretend it’s 0; zij > 0.5, pretend it’s 1 i.e., classify points as component 0 or 1 Now recalculate θ, assuming that partition Then recalculate zij , assuming that θ Then recalculate θ, assuming new zij , etc., etc.

EM summary Fundamentally an MLE problem EM steps E-step: calculate expected zij, given parameters M-step: do “MLE” for parameters (μ,σ), given E(zij) EM is guaranteed to increase likelihood with every E-M iteration, hence will converge. But may converge to local, not global, max. Nevertheless, widely used, often effective

Outline Basic concepts Statistical methods for mapping QTL Allele, allele frequencies, genotype frequencies Hardy-Weinberg equilibrium Statistical methods for mapping QTL What is QTL? Experimental animals Analysis of variance (marker regression) Interval mapping (Expectation Maximization)

Alleles Alternative forms of a particular sequence Each allele has a frequency, which is the proportion of chromosomes of that type in the population C, G and -- are alleles …ACTCGGTTGGCCTTAATTCGGCCCGGACTCGGTTGGCCTAAATTCGGCCCGG … …ACCCGGTAGGCCTTAATTCGGCCCGGACCCGGTAGGCCTTAATTCGGCCCGG … …ACCCGGTTGGCCTTAATTCGGCCGGGACCCGGTTGGCCTTAATTCGGCCGGG … …ACTCGGTTGGCCTTAATTCGGCCCGGACTCGGTTGGCCTAAATTCGGCCCGG … …ACCCGGTAGGCCTTAATTCGGCC--GGACCCGGTAGGCCTTAATTCGGCCCGG … …ACCCGGTTGGCCTTAATTCGGCCGGGACCCGGTTGGCCTTAATTCGGCCGGG … single nucleotide polymorphism (SNP) allele frequencies for C, G, --

Allele frequency notations For two alleles Usually labeled p and q = 1 – p e.g. p = frequency of C, q = frequency of G For more than 2 alleles Usually labeled pA, pB, pC ... … subscripts A, B and C indicate allele names

Genotype The pair of alleles carried by an individual Homozygotes If there are n alternative alleles … … there will be n(n+1)/2 possible genotypes In most cases, there are 3 possible genotypes Homozygotes The two alleles are in the same state (e.g. CC, GG, AA) Heterozygotes The two alleles are different (e.g. CG, AC)

Genotype frequencies Since alleles occur in pairs, these are a useful descriptor of genetic data. However, in any non-trivial study we might have a lot of frequencies to estimate. pAA, pAB, pAC,… pBB, pBC,… pCC …

The simple part Genotype frequencies lead to allele frequencies. For example, for two alleles: pA = pAA + ½ pAB pB = pBB + ½ pAB However, the reverse is also possible!

Hardy-Weinberg Equilibrium Relationship described in 1908 Hardy, British mathematician Weinberg, German physician Shows n allele frequencies determine n(n+1)/2 genotype frequencies Large populations Random union of the two gametes produced by two individuals

Random Mating: Mating Type Frequencies Denoting the genotype frequency of AiAj by pij, p112 2p11p12 2p11p22 p122 2p12p22 p222

Mendelian Segregation: Offspring Genotype Frequencies 1 0 0 2p11p12 0.5 0.5 0 2p11p22 0 1 0 p122 0.25 0.5 0.25 2p12p22 0 0.5 0.5 p222 0 0 1

Required Assumptions Diploid (2 sets of DNA sequences), sexual organism Autosomal locus Large population Random mating Equal genotype frequencies among sexes Absence of natural selection

Conclusion: Hardy-Weinberg Equilibrium Allele frequencies and genotype ratios in a randomly-breeding population remain constant from generation to generation. Genotype frequencies are function of allele frequencies. Equilibrium reached in one generation Independent of initial genotype frequencies Random mating, etc. required Conform to binomial expansion. (p1 + p2)2 = p12 + 2p1p2 + p22

Outline Basic concepts Statistical methods for mapping QTL Allele, allele frequencies, genotype frequencies Hardy-Weinberg Equilibrium Statistical methods for mapping QTL What is QTL? Experimental animals Analysis of variance (marker regression) Interval mapping

Quantitative Trait Locus (QTL) Definition of QTLs The genomic regions that contribute to variation in a quantitative phenotype (e.g. blood pressure) Mapping QTLs Finding QTLs from data Experimental animals Backcross experiment (only 2 genotypes for all genes) F2 intercross experiment

Backcross experiment X Inbred strains Advantage Disadvantage parental generation first filial (F1) generation Inbred strains Homozygous genomes Advantage Only two genotypes Disadvantage Relatively less genetic diversity X gamete AB AA Karl Broman, Review of statistical methods for QTL mapping in experimental crosses

F2 intercross experiment parental generation F1 generation X F2 generation gametes AA BB AB Karl Broman, Review of statistical methods for QTL mapping in experimental crosses

Trait distributions: a classical view X

QTL mapping Data Goals Phenotypes: yi = trait value for mouse i Genotypes: xik = 1/0 (i.e. AB/AA) of mouse i at marker k (backcross) Genetic map: Locations of genetic markers Goals Identify the genomic regions (QTLs) contributing to variation in the phenotype. Identify at least one QTL. Form confidence interval for QTL location. Estimate QTL effects.

The simplest method: ANOVA “Analysis of variance”: assumes the presence of single QTL For each marker: Split mice into groups according to their genotypes at each marker. Do a t-test/F-statistic Repeat for each typed marker t-test/F-statistic will tell us whether there is sufficient evidence to believe that measurements from one condition (i.e. genotype) is significantly different from another. LOD score (“Logarithm of the odds favoring linkage”) = log10 likelihood ratio, comparing single-QTL model to the “no QTL anywhere” model.

ANOVA at marker loci Advantages Disadvantages Simple. Easily incorporate covariates (e.g. environmental factors, sex, etc). Easily extended to more complex models. Disadvantages Must exclude individuals with missing genotype data. Imperfect information about QTL location. Suffers in low density scans. Only considers one QTL at a time (assumes the presence of a single QTL).

Interval mapping [Lander and Botstein, 1989] Consider any one position in the genome as the location for a putative QTL. For a particular mouse, let z = 1/0 if (unobserved) genotype at QTL is AB/AA. Calculate P(z = 1 | marker data). Need only consider nearby genotyped markers. May allow for the presence of genotypic errors. Given genotype at the QTL, phenotype is distributed as N(µ+∆z, σ2). Given marker data, phenotype follows a mixture of normal distributions.

IM: the mixture model Nearest flanking markers M1/M2 99% AB 0 7 20 M1 QTL M2 65% AB 35% AA Let’s say that the mice with QTL genotype AA have average phenotype µA while the mice with QTL genotype AB have average phenotype µB. The QTL has effect ∆ = µB - µA. What are unknowns? µA and µB Genotype of QTL 35% AB 65% AA 99% AA

References Prof Goncalo Abecasis (Univ of Michigan)’s lecture note Broman, K.W., Review of statistical methods for QTL mapping in experimental crosses Doerge, R.W., et al. Statistical issues in the search for genes affecting quantitative traits in experimental populations. Stat. Sci.; 12:195-219, 1997. Lynch, M. and Walsh, B. Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA, pp. 431-89, 1998. Broman, K.W., Speed, T.P. A review of methods for identifying QTLs in experimental crosses, 1999.