Functional Mapping A statistical model for mapping dynamic genes.

Recall: interval mapping for a univariate trait. Simple regression model: Phenotype = Genotype + Error, i.e., $y_i = x_i \mu_j + e_i$, where $x_i$ is the indicator for the QTL genotype, $\mu_j$ is the mean for genotype j, and $e_i \sim N(0, \sigma^2)$. Problem: the QTL genotype is unobservable (missing data).

A simulation example (F2). [Figure: trait distributions of the QTL genotypes qq, Qq and QQ, and the overall trait distribution.] The overall trait distribution is composed of three distributions, one for each of the three QTL genotypes QQ, Qq and qq.
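To make the mixture idea concrete, the sketch below simulates an F2 population under a single QTL. It is a minimal illustration, not the simulation behind the slide: the function name `simulate_f2_trait`, the genotypic means (12, 10.5, 8), σ = 1 and the sample size are assumed values. Genotypes are drawn with the Mendelian F2 proportions ¼, ½, ¼, so pooling all phenotypes yields a three-component mixture.

```python
import numpy as np

def simulate_f2_trait(n, mu_QQ, mu_Qq, mu_qq, sigma, seed=0):
    """Draw n F2 individuals and their phenotypes under a single-QTL model.

    QTL genotypes are coded 2 = QQ, 1 = Qq, 0 = qq and sampled with the
    Mendelian F2 proportions 1/4, 1/2, 1/4; phenotype = genotypic mean + N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    genotypes = rng.choice([2, 1, 0], size=n, p=[0.25, 0.5, 0.25])
    means_by_code = np.array([mu_qq, mu_Qq, mu_QQ])  # index 0, 1, 2 = qq, Qq, QQ
    y = means_by_code[genotypes] + rng.normal(0.0, sigma, size=n)
    return genotypes, y

# Hypothetical parameter values, chosen only for illustration
genotypes, y = simulate_f2_trait(2000, mu_QQ=12.0, mu_Qq=10.5, mu_qq=8.0, sigma=1.0)
# The pooled histogram of y is the "overall trait distribution": a 3-component mixture.
for code, label in [(2, "QQ"), (1, "Qq"), (0, "qq")]:
    group = y[genotypes == code]
    print(label, group.size, round(group.mean(), 2))
```

In real data the `genotypes` array is exactly what is missing, which is why the mixture model on the next slides is needed.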

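Solution: consider a finite mixture model with $\mu_{QQ} = \mu + a$, $\mu_{Qq} = \mu + d$, $\mu_{qq} = \mu - a$, where μ is the midpoint of the two homozygotes, a is the additive effect and d is the dominance effect.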
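Under this parameterization the three genotypic means determine μ, a and d uniquely. A small helper (the name `additive_dominance` is hypothetical) inverts the relations μ_QQ = μ + a, μ_Qq = μ + d, μ_qq = μ - a:

```python
def additive_dominance(mu_QQ, mu_Qq, mu_qq):
    """Return (mu, a, d) given the three genotypic means.

    Inverts mu_QQ = mu + a, mu_Qq = mu + d, mu_qq = mu - a.
    """
    mu = (mu_QQ + mu_qq) / 2.0  # midpoint of the two homozygotes
    a = (mu_QQ - mu_qq) / 2.0   # additive effect: half the homozygote difference
    d = mu_Qq - mu              # dominance effect: heterozygote deviation from midpoint
    return mu, a, d

print(additive_dominance(12.0, 10.5, 8.0))  # (10.0, 2.0, 0.5)
```

Estimating the three genotypic means and then applying this transformation is one way to report the genetic effects; the mixture can equally be parameterized in (μ, a, d) directly.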

We use a finite mixture model for estimating genotypic effects (F2):
$y_i \sim p(y_i \mid \omega, \Theta) = \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)$,
where the QTL genotypes QQ, Qq and qq are coded j = 2, 1, 0; $f_j(y_i)$ is a normal density with mean $\mu_j$ and variance $\sigma^2$; $\Theta = (\mu_2, \mu_1, \mu_0, \sigma^2)$; and $\omega = (\omega_{2|i}, \omega_{1|i}, \omega_{0|i})$ are the conditional probabilities of the QTL genotypes given the flanking markers.

Data structure (the last three columns are the conditional probabilities of the QTL genotypes)
Subject   M1       M2       … Mm   Phenotype (y)   QQ (2)    Qq (1)    qq (0)
1         AA (2)   BB (2)   …      y_1             ω_{2|1}   ω_{1|1}   ω_{0|1}
2         AA (2)   BB (2)   …      y_2             ω_{2|2}   ω_{1|2}   ω_{0|2}
3         Aa (1)   Bb (1)   …      y_3             ω_{2|3}   ω_{1|3}   ω_{0|3}
4         Aa (1)   Bb (1)   …      y_4             ω_{2|4}   ω_{1|4}   ω_{0|4}
5         Aa (1)   Bb (1)   …      y_5             ω_{2|5}   ω_{1|5}   ω_{0|5}
6         Aa (1)   bb (0)   …      y_6             ω_{2|6}   ω_{1|6}   ω_{0|6}
7         aa (0)   Bb (1)   …      y_7             ω_{2|7}   ω_{1|7}   ω_{0|7}
8         aa (0)   bb (0)   …      y_8             ω_{2|8}   ω_{1|8}   ω_{0|8}

Human Development Robbins 1928, Human Genetics, Yale University Press

Tree growth: it looks messy, but simple rules underlie the complexity.

The dynamics of gene expression: gene expression changes dynamically throughout an organism's lifetime. The development of an organism is governed by genetic factors, including:
- genes expressed constantly throughout the lifetime (called deterministic genes);
- genes expressed periodically (e.g., regulatory genes).
Environmental factors such as nutrition, light and temperature also play a role. We are interested in identifying which gene(s) govern(s) the dynamics of a developmental trait, using a procedure called Functional Mapping.

Stem diameter growth in poplar trees Ma et al. (2002) Genetics

Poplar tree - height & diameter

Mouse growth (A: male; B: female)

Developmental Pattern of Genetic Effects (Wu and Lin 2006, Nat. Rev. Genet.). [Figure: curves for QTL genotypes QQ and Qq.]

Data structure (longitudinal phenotypes; the last three columns are the conditional probabilities of the QTL genotypes)
Sample   Markers 1, 2, …, m   Phenotype y(1), y(2), …, y(T)   QQ (2)    Qq (1)    qq (0)
1        …                    y_1(1), y_1(2), …, y_1(T)       ω_{2|1}   ω_{1|1}   ω_{0|1}
2        …                    y_2(1), y_2(2), …, y_2(T)       ω_{2|2}   ω_{1|2}   ω_{0|2}
3        …                    y_3(1), y_3(2), …, y_3(T)       ω_{2|3}   ω_{1|3}   ω_{0|3}
4        …                    y_4(1), y_4(2), …, y_4(T)       ω_{2|4}   ω_{1|4}   ω_{0|4}
5        …                    y_5(1), y_5(2), …, y_5(T)       ω_{2|5}   ω_{1|5}   ω_{0|5}
6        …                    y_6(1), y_6(2), …, y_6(T)       ω_{2|6}   ω_{1|6}   ω_{0|6}
7        …                    y_7(1), y_7(2), …, y_7(T)       ω_{2|7}   ω_{1|7}   ω_{0|7}
8        …                    y_8(1), y_8(2), …, y_8(T)       ω_{2|8}   ω_{1|8}   ω_{0|8}
Genetic design: parents AA × aa produce the F1 Aa; intercrossing the F1 (Aa × Aa) produces the F2 with genotypes AA, Aa, aa in proportions ¼, ½, ¼.

Mapping methods for dynamic traits
- Traditional approach: treat the trait measured at each time point as a separate univariate trait and map it with traditional QTL mapping approaches such as interval or composite interval mapping.
- Limitation: a single-trait model ignores how gene expression changes over time, and it is too simple because it does not consider the underlying biological principles of development.
- A better approach: incorporate the biological principles into the mapping procedure to understand the dynamics of gene expression, using a procedure called Functional Mapping (pioneered by Wu and his group).

Functional Mapping (FunMap): a general framework, pioneered by Dr. Wu and his colleagues, for mapping QTLs that affect the pattern and form of development over time:
- Ma et al., Genetics 2002
- Wu et al., Genetics 2004 (highlighted in Nature Reviews Genetics)
- Wu and Lin, Nature Reviews Genetics 2006
While traditional genetic mapping combines classical genetics and statistics, functional mapping combines genetics, statistics and biological principles.

Data structure for an F2 population
Sample   y(1)   y(2)   …   y(T)   Marker 1   Marker 2   …   Marker m
1        y_11   y_21   …   y_T1   1          1          …   0
2        y_12   y_22   …   y_T2   -1         1          …   1
3        y_13   y_23   …   y_T3   -1         0          …   1
4        y_14   y_24   …   y_T4   1          -1         …   0
5        y_15   y_25   …   y_T5   1          1          …   -1
6        y_16   y_26   …   y_T6   1          0          …   -1
7        y_17   y_27   …   y_T7   0          -1         …   0
8        y_18   y_28   …   y_T8   0          1          …   1
⋮
n        y_1n   y_2n   …   y_Tn   1          0          …   -1
- There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10, 02, 01 and 00, with sample sizes n_22, n_21, …, n_00.
- Given these marker genotypes, the conditional probabilities of the QTL genotypes QQ (2), Qq (1) and qq (0) for individual i are $\omega_{2|i}$, $\omega_{1|i}$ and $\omega_{0|i}$.
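The conditional probabilities ω_{j|i} follow from the recombination fractions between the QTL and its two flanking markers. The sketch below is one way to compute them, under stated assumptions: an F1 in coupling phase (AQB/aqb, from AA QQ BB × aa qq bb parents), no crossover interference, and recombination fractions r1 (marker A to QTL) and r2 (QTL to marker B). It enumerates all pairs of F1 gametes instead of using closed-form tables; marker and QTL genotypes are coded 2, 1, 0 as in the nine two-marker groups listed on this slide. Both function names are hypothetical.

```python
from itertools import product

def gamete_prob(x, q, z, r1, r2):
    """Probability that an F1 gamete carries alleles (x, q, z) at marker A,
    QTL Q and marker B (1 = allele from the AA QQ BB parent, 0 = from aa qq bb).
    Assumes coupling phase and independent crossovers in the two intervals."""
    p = 0.5                            # choose which F1 haplotype the gamete starts from
    p *= r1 if x != q else 1.0 - r1    # recombination between A and Q
    p *= r2 if q != z else 1.0 - r2    # recombination between Q and B
    return p

def qtl_conditional_probs(r1, r2):
    """Map (gA, gB) marker genotype codes to (w_QQ, w_Qq, w_qq) in an F2."""
    joint = {}
    gametes = list(product([0, 1], repeat=3))
    for g1, g2 in product(gametes, repeat=2):       # ordered pair of F1 gametes
        p = gamete_prob(*g1, r1, r2) * gamete_prob(*g2, r1, r2)
        key = (g1[0] + g2[0], g1[2] + g2[2])         # marker genotype codes 2/1/0
        qtl = g1[1] + g2[1]                          # QTL genotype code 2/1/0
        joint.setdefault(key, [0.0, 0.0, 0.0])[qtl] += p
    table = {}
    for key, probs in joint.items():
        total = sum(probs)
        # reorder from list index (0, 1, 2) to the slide's order (QQ, Qq, qq)
        table[key] = (probs[2] / total, probs[1] / total, probs[0] / total)
    return table

# Hypothetical recombination fractions of 0.1 on each side of the QTL
for (gA, gB), probs in sorted(qtl_conditional_probs(0.1, 0.1).items(), reverse=True):
    print(gA, gB, [round(p, 3) for p in probs])
```

Brute-force enumeration keeps the logic transparent; production code would typically use precomputed formulas or a hidden Markov model over many markers.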

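Univariate interval mapping: $L(y) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$, with $f_j(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$, j = 2, 1, 0 for QQ, Qq, qq. The Lander-Botstein model estimates ($\mu_2, \mu_1, \mu_0, \sigma^2$, QTL position).
Multivariate interval mapping: the same likelihood, with vector $y_i = (y_i(1), y_i(2), \ldots, y_i(T))$ and $f_j(y_i) = (2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y_i - u_j)\Sigma^{-1}(y_i - u_j)^{\mathsf{T}}\right]$, with mean vectors $u_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jT})$ and the $T \times T$ residual variance-covariance matrix $\Sigma$. The unknown parameters are ($u_2, u_1, u_0, \Sigma$, QTL position) [3T + T(T-1)/2 + T parameters].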
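Given ω, the three mean vectors and Σ, the multivariate mixture likelihood can be evaluated directly. A minimal NumPy sketch, assuming phenotypes are stored as an n × T array and ω as an n × 3 array with columns ordered QQ, Qq, qq; the function names are hypothetical. It sums over genotypes on the log scale (log-sum-exp) to avoid underflow when T is large.

```python
import numpy as np

def mvn_logpdf(Y, mean, cov):
    """Row-wise log density of N(mean, cov) for an (n, T) array Y."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    mean = np.asarray(mean, dtype=float)
    T = mean.shape[0]
    diff = Y - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form diff_i Sigma^{-1} diff_i^T via a linear solve (no explicit inverse)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + maha)

def mixture_loglik(Y, omega, means, cov):
    """log L = sum_i log( sum_j omega[i, j] * f_j(y_i) ), with f_j = MVN(means[j], cov).

    omega: (n, 3) conditional QTL genotype probabilities, columns (QQ, Qq, qq);
    means: (3, T) mean vectors (u_2, u_1, u_0) in the same column order.
    """
    log_f = np.column_stack([mvn_logpdf(Y, m, cov) for m in means])  # (n, 3)
    with np.errstate(divide="ignore"):          # log(0) = -inf for impossible genotypes
        log_terms = np.log(omega) + log_f
    top = log_terms.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return float(np.sum(top[:, 0] + np.log(np.exp(log_terms - top).sum(axis=1))))

# Hypothetical toy data: 4 individuals, T = 3 time points
Y = np.array([[1.0, 2.0, 3.0], [0.5, 1.5, 2.5], [0.0, 1.0, 2.0], [1.2, 2.1, 2.9]])
omega = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9], [0.5, 0.5, 0.0]])
means = np.array([[1.0, 2.0, 3.0], [0.5, 1.5, 2.5], [0.0, 1.0, 2.0]])
print(mixture_loglik(Y, omega, means, 0.25 * np.eye(3)))
```

With an unstructured Σ every one of its T(T+1)/2 entries is a free parameter, which is the burden functional mapping reduces.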

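Functional mapping: the framework. Observed phenotype: $y_i = [y_i(1), \ldots, y_i(T)] \sim \mathrm{MVN}(u_j, \Sigma)$. Mean vector: $u_j = [\mu_j(1), \mu_j(2), \ldots, \mu_j(T)]$, j = 2, 1, 0. (Co)variance matrix: $\Sigma$, the $T \times T$ matrix of residual variances and covariances among the time points.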

Functional Mapping: an innovative model for the genetic dissection of complex traits that incorporates mathematical aspects of biological principles into a mapping framework. It provides a tool for cutting-edge research at the interplay between gene action and development. Instead of estimating $(u_2, u_1, u_0, \Sigma)$ directly, functional mapping estimates the biologically meaningful parameters that determine them.

The Finite Mixture Model raises three statistical issues:
- modeling the mixture proportions, i.e., the genotype frequencies at a putative QTL;
- modeling the mean vector;
- modeling the (co)variance matrix.

Modeling the Developmental Mean Vector
- Parametric approach: growth trajectories (logistic curve); HIV dynamics (bi-exponential function); biological clock (van der Pol equation); drug response (Emax model).
- Nonparametric approach: Legendre functions (orthogonal polynomials); spline techniques.
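As an illustration of the nonparametric option, the sketch below fits a curve with Legendre polynomials using NumPy's `numpy.polynomial.legendre.legfit` and `legval`. Rescaling time to [-1, 1] (where Legendre polynomials are orthogonal) and the helper name `fit_legendre_curve` are assumptions of this sketch; in functional mapping the coefficients would be genotype-specific and estimated within the mixture likelihood, not by a separate least-squares fit.

```python
import numpy as np
from numpy.polynomial import legendre

def fit_legendre_curve(t, y, degree):
    """Least-squares Legendre fit of y(t); returns (coefficients, curve function).

    Time is rescaled linearly to [-1, 1] before fitting.
    """
    t = np.asarray(t, dtype=float)
    t_min, t_max = t.min(), t.max()

    def rescale(s):
        return 2.0 * (np.asarray(s, dtype=float) - t_min) / (t_max - t_min) - 1.0

    coef = legendre.legfit(rescale(t), y, degree)
    return coef, lambda s: legendre.legval(rescale(s), coef)

# Hypothetical measurements at ten time points
times = np.arange(1, 11)
obs = np.array([1.2, 1.9, 2.4, 2.8, 3.0, 3.1, 3.0, 2.9, 2.6, 2.2])
coef, curve = fit_legendre_curve(times, obs, degree=3)
print(coef)
print(curve(5.5))
```

The polynomial degree plays the role of the number of curve parameters; a low degree keeps the model parsimonious.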

Example: Stem diameter growth in poplar trees (Ma et al., Genetics 2002)

Modeling the genotype-dependent mean vector with the logistic curve:
$u_j = [\mu_j(1), \mu_j(2), \ldots, \mu_j(T)] = \left[\frac{a_j}{1 + b_j e^{-r_j \cdot 1}}, \frac{a_j}{1 + b_j e^{-r_j \cdot 2}}, \ldots, \frac{a_j}{1 + b_j e^{-r_j T}}\right]$
Instead of estimating each $\mu_j(t)$, we estimate the curve parameters $\Omega_p = (a_j, b_j, r_j)$.
Number of parameters to be estimated in the mean vector:
Time points   Traditional approach   Our approach
5             3 × 5 = 15             3 × 3 = 9
10            3 × 10 = 30            3 × 3 = 9
50            3 × 50 = 150           3 × 3 = 9
Logistic curve of growth: a universal biological law (West et al., Nature 2001).
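The parameter saving is easy to see in code. The sketch below (hypothetical helper names) evaluates a logistic mean vector at integer time points and reproduces the counts in the table, assuming three genotypes and three curve parameters per genotype.

```python
import numpy as np

def logistic_mean_vector(a, b, r, times):
    """Genotype-specific mean vector mu_j(t) = a / (1 + b * exp(-r * t))."""
    times = np.asarray(times, dtype=float)
    return a / (1.0 + b * np.exp(-r * times))

def mean_parameter_counts(T, n_genotypes=3, n_curve_params=3):
    """(traditional, functional) counts of mean-vector parameters.

    Traditional: a free mean for every genotype at every time point.
    Functional: only (a_j, b_j, r_j) per genotype, whatever the value of T.
    """
    return n_genotypes * T, n_genotypes * n_curve_params

# Hypothetical curve parameters a = 10, b = 9, r = 0.8
print(logistic_mean_vector(10.0, 9.0, 0.8, [1, 2, 3, 4, 5]))
for T in (5, 10, 50):
    print(T, mean_parameter_counts(T))
```

At t = 0 the curve equals a / (1 + b), and as t grows it approaches the asymptote a, which is why a is read as asymptotic growth.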

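Modeling the Covariance Matrix
- Stationary parametric approach: autoregressive (AR) model with log transformation.
- Nonstationary parametric approach: structured antedependence (SAD) model; Ornstein-Uhlenbeck (OU) process.
For the first-order AR model, $\Sigma = \sigma^2 \left(\rho^{|t_1 - t_2|}\right)_{T \times T}$: a common variance $\sigma^2$ and a correlation that decays with the time lag.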
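For equally spaced measurement times, the AR(1) structure can be built from the matrix of lags |t1 - t2|. A minimal sketch; the function name `ar1_covariance` and the equal spacing are assumptions (for unequally spaced times the actual time differences would be used as exponents).

```python
import numpy as np

def ar1_covariance(sigma2, rho, T):
    """Stationary AR(1) covariance: Sigma[t1, t2] = sigma2 * rho ** |t1 - t2|.

    Two parameters (rho, sigma2) describe all T(T+1)/2 distinct entries.
    """
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])  # |t1 - t2| for every pair
    return sigma2 * rho ** lags

# Hypothetical values: unit variance, lag-one correlation 0.6, four time points
print(ar1_covariance(1.0, 0.6, 4))
```

Compared with an unstructured Σ (T(T+1)/2 free entries), this keeps the covariance parameters at two regardless of T; for |ρ| < 1 the matrix is positive definite.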

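Functional interval mapping: $L(y) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$, where $y_i = (y_i(1), y_i(2), \ldots, y_i(T))$ and $f_2$, $f_1$, $f_0$ are multivariate normal densities with covariance matrix $\Sigma$ and logistic mean vectors
$u_j = \left(\frac{a_j}{1 + b_j e^{-r_j \cdot 1}}, \frac{a_j}{1 + b_j e^{-r_j \cdot 2}}, \ldots, \frac{a_j}{1 + b_j e^{-r_j T}}\right)$, j = 2, 1, 0.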

Estimation

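The EM algorithm
- E step: calculate the posterior probability of QTL genotype j for individual i, who carries a known marker genotype: $P_{j|i} = \omega_{j|i} f_j(y_i) \big/ \sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)$.
- M step: solve the log-likelihood equations, using the posterior probabilities as weights.
Iterate between the E and M steps until convergence.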

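EM continued. The likelihood function: $L(\Omega \mid y) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$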

Statistical Derivations. M step: update the parameters (see Ma et al. 2002, Genetics, for details).
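A compact version of the E and M steps for the univariate interval mapping model is sketched below. It assumes ω has already been computed from the flanking markers at one fixed QTL position (in practice the likelihood is maximized at each candidate position); the function name is hypothetical. In functional mapping the M step would instead update the curve parameters (a_j, b_j, r_j) and the covariance parameters, as derived in Ma et al. (2002).

```python
import numpy as np

def em_interval_mapping(y, omega, n_iter=500, tol=1e-8):
    """EM for the univariate mixture y_i ~ sum_j omega[i, j] * N(mu_j, sigma2).

    y: (n,) phenotypes; omega: (n, 3) conditional QTL genotype probabilities
    given the flanking markers (columns QQ, Qq, qq), treated as known.
    Returns (mu, sigma2, loglik); loglik is evaluated at the last E step.
    """
    y = np.asarray(y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    mu = np.quantile(y, [0.75, 0.5, 0.25])  # crude start: QQ high, qq low
    sigma2 = y.var()
    prev = -np.inf
    for _ in range(n_iter):
        # E step: posterior P[i, j] = omega_ij f_j(y_i) / sum_j' omega_ij' f_j'(y_i)
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
        weighted = omega * dens
        total = weighted.sum(axis=1)
        post = weighted / total[:, None]
        loglik = np.log(total).sum()
        # M step: posterior-weighted genotypic means and pooled residual variance
        mu = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        sigma2 = (post * (y[:, None] - mu) ** 2).sum() / y.size
        if loglik - prev < tol:  # EM never decreases the likelihood
            break
        prev = loglik
    return mu, sigma2, loglik
```

Working with densities rather than log-densities keeps the sketch short; with extreme outliers a log-scale E step would be more robust.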

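Testing QTL effect: Global test. Instead of testing the mean difference between genotypes at every time point, we test the difference in the curve parameters. The existence of a QTL is tested with the hypotheses
$H_0: (a_j, b_j, r_j) \equiv (a, b, r)$ for j = 2, 1, 0 vs. $H_1$: at least one of the equalities above does not hold.
$H_0$ means that the three mean curves overlap and there is no QTL effect. The likelihood ratio test statistic is $LR = -2\left[\log L(\tilde{\Omega} \mid y) - \log L(\hat{\Omega} \mid y)\right]$, where the notation "~" and "^" indicates parameters estimated under the null and the alternative hypothesis, respectively. Significance is assessed by permutation.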
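The permutation step can be written generically: permute whole phenotype trajectories relative to the marker data, recompute the maximum LR, and take an upper quantile of the permutation distribution as the critical value. In this sketch `stat_fn` is a hypothetical callback that would run the full functional mapping scan and return the largest LR; the toy statistic in the usage lines and the defaults (1000 permutations, α = 0.05) are assumptions for illustration.

```python
import numpy as np

def permutation_threshold(stat_fn, y, markers, n_perm=1000, alpha=0.05, seed=0):
    """Empirical critical value for a scan statistic by permutation.

    Rows of y (whole phenotype trajectories) are shuffled, which breaks any
    marker-phenotype association but keeps each individual's time points
    together, preserving their correlation.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    null_stats = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(y.shape[0])
        null_stats[k] = stat_fn(y[perm], markers)
    return float(np.quantile(null_stats, 1.0 - alpha))

# Toy statistic (not an LR): |correlation| between the first time point and one marker
rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=100)
y = rng.normal(size=(100, 5))
toy_stat = lambda yy, m: abs(np.corrcoef(yy[:, 0], m)[0, 1])
threshold = permutation_threshold(toy_stat, y, markers, n_perm=200)
print(threshold, toy_stat(y, markers) > threshold)
```

Taking the maximum over all tested positions inside `stat_fn` is what makes the resulting threshold genome-wide rather than pointwise.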

Testing QTL effect: Regional test. To test in which time period $[t_1, t_2]$ the detected QTL has an effect, we can test the difference in the area under the curve (AUC) between QTL genotypes, i.e., $H_0: AUC_2 = AUC_1 = AUC_0$, where $AUC_j = \int_{t_1}^{t_2} \frac{a_j}{1 + b_j e^{-r_j t}}\,dt$. Permutation tests can be applied to assess statistical significance.
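For the logistic curve the AUC has a closed form, because (a/r) log(e^{rt} + b) is an antiderivative of a / (1 + b e^{-rt}). The sketch below (hypothetical helper name and parameter values) evaluates it for three genotypes; a regional test statistic could then be built from the differences between the genotype-specific AUCs.

```python
import numpy as np

def logistic_auc(a, b, r, t1, t2):
    """Area under g(t) = a / (1 + b * exp(-r * t)) between t1 and t2.

    Uses the antiderivative (a / r) * log(exp(r * t) + b); requires r != 0, b > 0.
    """
    return (a / r) * (np.log(np.exp(r * t2) + b) - np.log(np.exp(r * t1) + b))

# Hypothetical curve parameters (a_j, b_j, r_j) for QQ, Qq, qq
params = {"QQ": (12.0, 9.0, 0.8), "Qq": (11.0, 9.0, 0.8), "qq": (10.0, 9.0, 0.8)}
aucs = {g: logistic_auc(*p, t1=1.0, t2=5.0) for g, p in params.items()}
print(aucs)
```

The closed form avoids numerical integration inside each permutation, which matters when thousands of permutations are run.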

Applications Several real examples are used to show the utility of the functional mapping approach. Application I is about a poplar growth data set. Application II is about a mouse growth data set. Application III is about a rice tiller number growth data set.

Application I: A Genetic Study in Poplars. Genetic design: parents AA × aa produce the F1 Aa; the backcross Aa × AA produces BC progeny AA and Aa, each with frequency ½.

Stem diameter growth in poplar trees (Ma, Casella & Wu, Genetics 2002). Logistic curve parameters: a, asymptotic growth; b, initial growth; r, relative growth rate.

Differences in growth across ages in the poplar data. [Figure panels: untransformed; log-transformed.]

Modeling the covariance structure: stationary parametric approach
- First-order autoregressive model (AR(1)).
- Multivariate Box-Cox transformation to stabilize the variance (Box and Cox, 1964).
- Transform-both-sides (TBS) technique to preserve the interpretability of the growth parameters (Carroll and Ruppert, 1984; Wu et al., 2004).
For a log transformation (i.e., λ = 0), the covariance parameters are $\Omega_q = (\rho, \sigma^2)$.
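The transform-both-sides idea can be sketched as follows: apply the same Box-Cox transform to the observations and to the logistic curve, so that a, b and r keep their meaning on the original growth scale while the residuals are modeled on the transformed scale. The function names, observations and parameter values below are hypothetical, and λ = 0 (the log transform) is chosen only for illustration.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform (x**lam - 1) / lam, with the log limit at lam == 0 (x > 0)."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def tbs_residuals(y, times, a, b, r, lam):
    """Transform-both-sides residuals z(y_t) - z(g(t)) for the logistic curve g."""
    g = a / (1.0 + b * np.exp(-r * np.asarray(times, dtype=float)))
    return box_cox(y, lam) - box_cox(g, lam)

# Hypothetical observations at five ages and curve parameters (a, b, r)
times = np.arange(1, 6)
y_obs = np.array([2.0, 3.1, 4.9, 6.2, 7.8])
print(tbs_residuals(y_obs, times, a=10.0, b=9.0, r=0.8, lam=0.0))
```

These residual vectors are what an AR(1) covariance with parameters (ρ, σ²) would describe.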

Functional mapping incorporating logistic curves and an AR(1) model. [Figure panels: QTL results by FunMap; results by interval mapping.] FunMap has higher power to detect the QTL than the traditional interval mapping method (Ma, Casella & Wu, Genetics 2002).

Application II: Mouse Genetic Study for Detecting Growth Genes. Data supplied by Dr. Cheverud at Washington University.

Mouse Linkage Map

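Body Mass Growth in Mice: 510 individuals measured over 10 weeks. Genetic design: parents AA × aa produce the F1 Aa; intercrossing the F1 (Aa × Aa) produces the F2 with genotypes AA, Aa, aa in proportions ¼, ½, ¼.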

Functional mapping of the genetic control of body mass growth in mice (Zhao, Ma, Cheverud & Wu, Physiological Genomics 2004)

Application III: functional mapping of PCD QTL. Rice tiller development is thought to be controlled by genetic as well as environmental factors. During development, tiller number growth undergoes a process called programmed cell death (PCD).

Genetic design: parents AA × aa produce the F1 Aa, from which doubled haploid (DH) lines AA and aa are derived, each with frequency ½.

Joint model for the mean vector. We developed a joint modeling approach in which the growth and death phases are modeled by different functions. The growth phase is modeled by a logistic growth curve, following the universal growth law. The death phase is modeled by orthogonal Legendre polynomials to increase fitting flexibility.

Cui et al. (2006) Physiological Genomics

QTL trajectory plot

Advantages of Functional Mapping
- It incorporates biological principles of growth and development into genetic mapping, thus increasing the biological relevance of QTL detection.
- It provides a quantitative framework for testing hypotheses at the interplay between gene action and developmental pattern:
  - When does a QTL turn on?
  - When does a QTL turn off?
  - What is the duration of genetic expression of a QTL?
  - How does a growth QTL pleiotropically affect developmental events?
- The mean and covariance structures are modeled with parsimonious parameters, increasing the precision, robustness and stability of parameter estimation.

Functional Mapping: toward high-dimensional biology
- A new conceptual model for genetic mapping of complex traits
- A systems approach for studying sophisticated biological problems
- A framework for testing biological hypotheses at the interplay among genetics, development, physiology and biomedicine

Functional Mapping: Simplicity from complexity
- Estimating fewer, biologically meaningful parameters that model the mean vector.
- Modeling the structure of the covariance matrix with powerful statistical methods, leaving fewer parameters to be estimated.
- The reduction in dimension increases the power and precision of parameter estimation.