An integrative genomics approach to infer causal associations between gene expression and disease
Schadt, E. E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D., Sieberts, S. K., Monks, S., Reitman, M., Zhang, C., Lum, P. Y., Leonardson, A., Thieringer, R., Metzger, J. M., Yang, L., Castle, J., Zhu, H., Kash, S. F., Drake, T. A., Sachs, A., and Lusis, A. J. Nature Genetics (37):
Speaker: Yen-Yi Ho
Advisor: Giovanni Parmigiani
Department of Biostatistics, Johns Hopkins University
Outline
Introduction
–Background & Definitions
–Scientific Questions
Previous eQTL Studies
–Gene Expression Data in Humans
–Statistical Analytic Approaches
–Results
Schadt et al. 2005: An Integrative Approach
–Causality Models
–Application: Gene Expression in BXD Mice
–Results from Application
Discussion of New Approach
QTL (Quantitative Trait Locus)
Genetic locus (QTL; L) → Disease (D)
Genes controlling more than 1000 monogenic Mendelian diseases have been identified using traditional gene-mapping approaches.
For complex human traits (such as cancer, diabetes, and asthma), multiple genes, environmental factors, and their interactions have limited this success.
Introduction
We have more information …
DNA: genotype data (SNPs)
mRNA: gene expression data
Expression QTL (eQTL)
Goal: Identify genomic locations where genotype significantly affects gene expression.
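The goal stated above can be illustrated with a single-marker scan. The following is a minimal sketch, not the method of any study reviewed here: it assumes additive genotype coding (0, 1, 2), a simple linear regression of expression on genotype at each marker, and the usual regression LOD score, LOD = (n/2) log10(RSS_null / RSS_marker). The function names, marker names, and data are hypothetical and synthetic.

```python
import math
import random


def lod_score(expression, genotype):
    """LOD score for a linear regression of expression on one marker.

    Compares an intercept-only model with a model that adds an additive
    genotype effect: LOD = (n / 2) * log10(RSS_null / RSS_marker).
    """
    n = len(expression)
    mean_g = sum(genotype) / n
    mean_e = sum(expression) / n
    sgg = sum((g - mean_g) ** 2 for g in genotype)
    sge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotype, expression))
    rss_null = sum((e - mean_e) ** 2 for e in expression)
    if sgg == 0 or rss_null == 0:
        return 0.0  # monomorphic marker or constant trait: no information
    rss_marker = rss_null - sge ** 2 / sgg
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan(expression, genotypes_by_marker):
    """Return {marker name: LOD score} for one expression trait."""
    return {m: lod_score(expression, g) for m, g in genotypes_by_marker.items()}


# Synthetic data (assumption): 5 unlinked markers with F2-like 1:2:1
# genotype frequencies; only "marker2" affects the expression trait.
rng = random.Random(1)
n = 200
genotypes = {
    f"marker{j}": [rng.choice([0, 1, 1, 2]) for _ in range(n)] for j in range(5)
}
expression = [genotypes["marker2"][i] + rng.gauss(0, 1) for i in range(n)]

lods = scan(expression, genotypes)
best_marker = max(lods, key=lods.get)
print(best_marker, round(lods[best_marker], 1))
```

A real analysis would repeat this scan for every expression trait and apply a genome-wide significance threshold, which this sketch leaves out.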
Cis-, trans-, and master trans-eQTLs
[Figure panels: cis-eQTL; trans-eQTL; master trans-eQTL]
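One way to make the three categories concrete is sketched below. It treats an eQTL as cis when it lies on the same chromosome as the gene it affects and within a fixed distance of it, as trans otherwise, and calls a locus a master trans-eQTL when it is linked in trans to several genes. The 10 cM window and the three-gene cutoff are assumptions chosen for illustration, not values from the slides, and all gene names and positions are hypothetical.

```python
from collections import defaultdict

CIS_WINDOW_CM = 10.0    # assumed distance cutoff, not from the source
MASTER_MIN_TARGETS = 3  # assumed number of trans targets, not from the source


def classify_eqtl(gene_chr, gene_cm, locus_chr, locus_cm, window_cm=CIS_WINDOW_CM):
    """Label an eQTL 'cis' if it maps near its own gene, otherwise 'trans'."""
    if gene_chr == locus_chr and abs(gene_cm - locus_cm) <= window_cm:
        return "cis"
    return "trans"


def master_trans_loci(eqtls, min_targets=MASTER_MIN_TARGETS):
    """Return loci linked in trans to at least `min_targets` distinct genes.

    `eqtls` is a list of (gene, gene_chr, gene_cm, locus_chr, locus_cm).
    """
    targets = defaultdict(set)
    for gene, gene_chr, gene_cm, locus_chr, locus_cm in eqtls:
        if classify_eqtl(gene_chr, gene_cm, locus_chr, locus_cm) == "trans":
            targets[(locus_chr, locus_cm)].add(gene)
    return sorted(locus for locus, genes in targets.items() if len(genes) >= min_targets)


# Hypothetical eQTL calls.
eqtls = [
    ("geneA", 1, 20.0, 1, 22.0),   # same chromosome, 2 cM away
    ("geneB", 2, 50.0, 14, 60.0),
    ("geneC", 5, 10.0, 14, 60.0),
    ("geneD", 7, 33.0, 14, 60.0),
    ("geneE", 3, 80.0, 3, 5.0),    # same chromosome, but 75 cM away
]
labels = {g: classify_eqtl(gc, gp, lc, lp) for g, gc, gp, lc, lp in eqtls}
masters = master_trans_loci(eqtls)
print(labels)
print(masters)
```

Here the locus on chromosome 14 is flagged because it is linked to three genes on other chromosomes. In practice the window size and the target-count cutoff depend on marker density and on how significance is assessed.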
Constructing regulatory networks (hypothetical example)
[Figure axes: genetic locus, expression]
1. 1 (B) = cis
2. 2 (A) = cis, controlled by 1 (B)
3. No controls
4. 4 (D) = cis, controlled by 3 (F)
5. Not a cis, controlled by
6. Not a cis, controlled by all
Jansen, R.C. & Nap, J.P. Trends Genet, 2001, 17,
Scientific Questions
[Figure axes: genetic locus, expression]
What is the variation and heritability of gene expression?
Are there associations between genetic loci and target gene expression?
What is the proportion of cis- versus trans-eQTLs? How do we verify cis-eQTLs?
Are there any master trans-eQTLs?
Which annotations and functional categories (KEGG, GO, …) do cis-, trans-, and master trans-eQTLs fall into?
Scientific questions and goals
Transcript abundance may act as an intermediate phenotype between genetic loci and the clinical phenotype.
Secondary goal: Integrate genotype, expression, and clinical trait information to construct regulatory networks and to improve understanding of disease etiologies.
Data
The data
All of the studies measured expression in lymphoblastoid cell lines from CEPH families.
Differences:
1. Expression traits selected
2. Platforms used to measure expression, and preprocessing
3. SNP marker density
4. Statistical approaches
Statistical methods in human eQTL mapping studies
Linkage
–Nonparametric linkage analysis
1. Sib-pair analysis for quantitative traits (ASP)
2. Variance component analysis (VC)
Association (linkage disequilibrium)
–Family-based association analysis (QTDT)
–Population-based association analysis (GWA)
Generally, association approaches offer finer mapping resolution than linkage approaches.
Comparison of resolution between linkage and association analysis Literature Review
Genes with between / within individual variation > 1 Literature review
Heritability
eQTL findings from previous studies
Hit rate: the proportion of expression traits significantly linked to eQTLs (ranging from 0.8% to 4%)
The proportion of cis-eQTLs is about 30%
2 master trans-eQTLs were identified
Literature Review
Master trans-eQTLs: 14q32, 20q13
Literature Review
Genetic locus Expression An Integrative Approach: Schadt et al., Nature Genetics, 2005
Models for causality
–Causal Model: L → mRNA → Disease
–Reactive Model: L → Disease → mRNA
–Independent Model: L → mRNA and L → Disease
An integrative approach
New approach
Causal Model (M1): L → mRNA → Disease
–Joint Probability
–Likelihood
L: Genotype, R: mRNA level, D: Disease
Reactive Model (M2): L → Disease → mRNA
–Joint Probability
–Likelihood
L: Genotype, R: mRNA level, D: Disease
Independent Model (M3): L → mRNA and L → Disease
–Joint Probability
–Likelihood
L: Genotype, R: mRNA level, D: Disease
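The formulas on these three slides did not survive in this text. As a hedged reconstruction, the joint probabilities implied by the three graphs (with L: genotype, R: mRNA level, D: disease) would factor as shown below. This is the factorization that follows from the graph structure alone; the exact parameterization used in the paper, in particular for M3, may differ.

```latex
\begin{align*}
M_1\ (\text{causal},\ L \to R \to D):\quad
  & P(L, R, D) = P(L)\,P(R \mid L)\,P(D \mid R) \\
M_2\ (\text{reactive},\ L \to D \to R):\quad
  & P(L, R, D) = P(L)\,P(D \mid L)\,P(R \mid D) \\
M_3\ (\text{independent},\ R \leftarrow L \to D):\quad
  & P(L, R, D) = P(L)\,P(R \mid L)\,P(D \mid L)
\end{align*}

% Likelihood of model M_k for n individuals with data (l_i, r_i, d_i),
% shown here for M_1; M_2 and M_3 follow the same pattern.
\[
  \mathcal{L}(\theta_1 \mid \text{data})
    = \prod_{i=1}^{n} P(l_i)\,P(r_i \mid l_i; \theta_1)\,P(d_i \mid r_i; \theta_1)
\]
```

Because the factor P(L) is the same in all three models, it cancels when the models are compared.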
Model Selection
Likelihood-based Causality Model Selection (LCMS)
–Calculate the likelihood of each model from the data.
–The model best supported by the data is the one with the smallest AIC (Akaike Information Criterion).
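The selection rule can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each conditional distribution is modeled as a simple linear regression with Gaussian noise, genotype is coded additively (0, 1, 2), the shared P(L) term is dropped, and AIC = 2k − 2 log(likelihood) with k the number of fitted parameters. Function names and the simulated data are hypothetical.

```python
import math
import random


def conditional_loglik(y, x):
    """Maximized Gaussian log-likelihood of y given x (simple linear regression)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    sigma2 = (syy - sxy ** 2 / sxx) / n  # maximum-likelihood residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


PARAMS_PER_TERM = 3  # intercept, slope, residual variance


def lcms(L, R, D):
    """Return (name of the model with the smallest AIC, {model: AIC})."""
    # Each model is a list of (response, predictor) regression terms.
    models = {
        "causal": [(R, L), (D, R)],       # L -> R -> D
        "reactive": [(D, L), (R, D)],     # L -> D -> R
        "independent": [(R, L), (D, L)],  # R <- L -> D
    }
    aic = {}
    for name, terms in models.items():
        loglik = sum(conditional_loglik(y, x) for y, x in terms)
        k = PARAMS_PER_TERM * len(terms)
        aic[name] = 2 * k - 2 * loglik
    return min(aic, key=aic.get), aic


# Synthetic data generated under the causal model (assumption):
# F2-like genotypes, R depends on L, and D depends only on R.
rng = random.Random(0)
n = 1000
L = [rng.choice([0, 1, 1, 2]) for _ in range(n)]
R = [l + rng.gauss(0, 1) for l in L]
D = [r + rng.gauss(0, 1) for r in R]

best, aic = lcms(L, R, D)
print(best)
for name, value in aic.items():
    print(name, round(value, 1))
```

In this simplified parameterization all three models have the same number of parameters, so ranking by AIC is equivalent to ranking by likelihood. The sketch also returns the smallest AIC outright and does not test whether the difference between models is significant.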
Simulation study
The model whose AIC was significantly smaller than the AICs of the competing models was noted.
[Figure: L, T1]
Application to BXD mice data
The data
BXD mice: F2 offspring from C57BL/6J (B6) and DBA/2J (DBA).
–C57BL/6J: the ob mutation in the C57BL/6J mouse background (B6-ob/ob) causes obesity, but only mild and transient diabetes (Coleman and Hummel, 1973).
–DBA/2J: mice show a low susceptibility to developing atherosclerotic aortic lesions.
Gene expression
–Liver extracted at 16 months of age
–Expression of 23,574 genes measured using Agilent arrays
Genetic loci
–139 autosomal genetic loci (microsatellite markers, 13 cM)
Disease
–Omental fat pad mass (OFPM) trait
New approach
Filtering
[Figure: L, mRNA, Disease, with the links between them marked "?"]
1. Identify 4 candidate regions for the OFPM trait: chr1 at 95 cM, chr6 at 43 cM, chr9 at 8 cM, chr19 at 28 cM.
2. Select expression traits significantly correlated with OFPM: 440 intermediate expression traits were selected (P<0.001).
3. Keep expression traits with significant eQTL linkage at the candidate regions: 113 expression traits and 267 eQTLs were identified.
4. Perform LCMS model selection for the 113 expression traits and rank them by the percentage of genetic variation in OFPM causally explained by each trait.
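The first filtering stages can be sketched as a pair of simple checks. The candidate regions and the P<0.001 cutoff are taken from the slide; everything else is an assumption for illustration, including the 15 cM window used to decide whether an eQTL lies "at" a candidate region, the record layout, and the trait names and values.

```python
# Candidate OFPM regions as (chromosome, position in cM), from the slide.
CANDIDATE_REGIONS = [(1, 95.0), (6, 43.0), (9, 8.0), (19, 28.0)]
P_THRESHOLD = 0.001      # correlation cutoff from the slide
REGION_WINDOW_CM = 15.0  # assumed window, not from the source


def in_candidate_region(chrom, pos_cm, regions=CANDIDATE_REGIONS,
                        window_cm=REGION_WINDOW_CM):
    """True if an eQTL lies within the window of any candidate region."""
    return any(chrom == c and abs(pos_cm - p) <= window_cm for c, p in regions)


def filter_traits(traits):
    """Keep traits correlated with OFPM that have an eQTL in a candidate region.

    `traits` is a list of dicts with keys 'name', 'p_corr_ofpm', and
    'eqtls' (a list of (chromosome, cM) positions).
    """
    kept = []
    for trait in traits:
        if trait["p_corr_ofpm"] >= P_THRESHOLD:
            continue  # not significantly correlated with OFPM
        hits = [q for q in trait["eqtls"] if in_candidate_region(*q)]
        if hits:
            kept.append({"name": trait["name"], "eqtls": hits})
    return kept


# Hypothetical expression traits.
traits = [
    {"name": "traitA", "p_corr_ofpm": 0.0001, "eqtls": [(1, 90.0), (4, 12.0)]},
    {"name": "traitB", "p_corr_ofpm": 0.0400, "eqtls": [(6, 43.0)]},
    {"name": "traitC", "p_corr_ofpm": 0.0005, "eqtls": [(2, 30.0)]},
    {"name": "traitD", "p_corr_ofpm": 0.0002, "eqtls": [(19, 30.0), (9, 10.0)]},
]
selected = filter_traits(traits)
print([t["name"] for t in selected])
```

The traits that survive would then go through model selection and be ranked; that final stage is not shown here.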
Results from Application
Zfp90: zinc finger protein 90
Hsd11b1: 11-beta hydroxysteroid dehydrogenase isoform 1
C3ar1: complement component 3a receptor 1
Tgfbr2: transforming growth factor, beta receptor II
C3ar1 -/- knockout mice (n=5-7)
Tgfbr2 +/- knockout mice (n=5-7)
10 weeks of age
Discussion
The approach fails to discriminate between highly correlated traits.
Multiple filtering steps are involved.
More development is needed before it can be applied automatically to general data sets.
If the measurement error of mRNA exceeds that of D, an advantage from constructing eQTL networks is less likely.
[Figure: L, Disease; L, mRNA, Disease]
Reference
Morley, M.; Molony, C.M.; Weber, T.M.; Devlin, J.L.; Ewens, K.G.; Spielman, R.S. & Cheung, V.G. Genetic analysis of genome-wide variation in human gene expression. Nature, 2004, 430,
Monks, S.A.; Leonardson, A.; Zhu, H.; Cundiff, P.; Pietrusiak, P.; Edwards, S.; Phillips, J.W.; Sachs, A. & Schadt, E.E. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet, 2004, 75,
Cheung, V.G.; Spielman, R.S.; Ewens, K.G.; Weber, T.M.; Morley, M. & Burdick, J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature, 2005, 437,
Stranger, B.E.; Forrest, M.S.; Clark, A.G.; Minichiello, M.J.; Deutsch, S.; Lyle, R.; Hunt, S.; Kahl, B.; Antonarakis, S.E.; Tavaré, S.; Deloukas, P. & Dermitzakis, E.T. Genome-wide associations of gene expression variation in humans. PLoS Genet, 2005, 1, e78
Deutsch, S.; Lyle, R.; Dermitzakis, E.T.; Attar, H.; Subrahmanyan, L.; Gehrig, C.; Parand, L.; Gagnebin, M.; Rougemont, J.; Jongeneel, C.V. & Antonarakis, S.E. Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum Mol Genet, 2005, 14,
Jansen, R.C. & Nap, J.P. Genetical genomics: the added value from segregation. Trends Genet, 2001, 17,
Schadt, E.E.; Lamb, J.; Yang, X.; Zhu, J.; Edwards, S.; Guhathakurta, D.; Sieberts, S.K.; Monks, S.; Reitman, M.; Zhang, C.; Lum, P.Y.; Leonardson, A.; Thieringer, R.; Metzger, J.M.; Yang, L.; Castle, J.; Zhu, H.; Kash, S.F.; Drake, T.A.; Sachs, A. & Lusis, A.J. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet, 2005, 37,
Thank you ☺