Shaun Purcell Psychiatric & Neurodevelopmental Genetics Unit Center for Human Genetic Research Massachusetts General Hospital


Gene-environment & gene-gene interaction in association studies: a methodologic introduction

Finding disease-causing variation
[Figure labels: The Human Genome; chromosome 4; DNA sequence; SNP (single nucleotide polymorphism)]
…GGCGGTGTTCCGGGCCATCACCATTGCGGG CCGGATCAACTGCCCTGTGTACATCACCAAG GTCATGAGCAAGAGTGCAGCCGACATCATCG CTCTGGCCAGGAAGAAAGGGCCCCTAGTTTT TGGAGAGCCCATTGCCGCCAGCCTGGGGACC GATGGCACCCATTACTGGAGCAAGAACTGGG CCAAGGCTGCGGCGTTCGTGACTTCCCCTCC CCTGAGCCCGGACCCTACCACGCCCGACTA…

Rare disease, major gene effect
Genotype   Risk of disease
DD         0.001
Dd         0.001
dd         0.95
Disease prevalence ~1 in 1000
Individuals with dd are ~1000 times more likely to get disease
Frequency of d in controls ~5%
Frequency of d in cases ~96%

Common polygenic disease

Common disease, polygenic effects
Genotype   Risk of disease
DD         0.01
Dd         0.012
dd
Disease prevalence ~1 in 100
Each extra d allele increases risk by ~1.2 times
Frequency of d in controls ~5%
Frequency of d in cases ~6%
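The case and control allele frequencies on this slide can be approximately reproduced from the genotype risks. The sketch below is illustrative only: it assumes Hardy-Weinberg proportions, a population frequency of 0.05 for allele d, and a dd risk of 0.01 × 1.2² (the dd value is missing from the transcript, so this follows the "each extra d allele increases risk by ~1.2 times" statement). The function name is hypothetical.

```python
def case_control_allele_freq(q, risks):
    """q: population frequency of allele d; risks: disease risk for DD, Dd, dd."""
    p = 1.0 - q
    geno_freq = [p * p, 2 * p * q, q * q]   # Hardy-Weinberg proportions (assumed)
    d_share = [0.0, 0.5, 1.0]               # fraction of d alleles carried by each genotype
    prevalence = sum(f * r for f, r in zip(geno_freq, risks))
    in_cases = sum(f * r * s for f, r, s in zip(geno_freq, risks, d_share)) / prevalence
    in_controls = sum(
        f * (1 - r) * s for f, r, s in zip(geno_freq, risks, d_share)
    ) / (1 - prevalence)
    return prevalence, in_cases, in_controls


base_risk = 0.01
# dd risk is an assumption: two extra d alleles, each multiplying risk by 1.2
risks = [base_risk, base_risk * 1.2, base_risk * 1.2 ** 2]
prevalence, freq_cases, freq_controls = case_control_allele_freq(0.05, risks)
print(f"prevalence={prevalence:.4f} cases={freq_cases:.3f} controls={freq_controls:.3f}")
```

Under these assumptions the prevalence comes out near 1 in 100, with d at roughly 6% in cases and 5% in controls, in line with the slide.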

? Genotype Environment Phenotype

Gene-environment interaction
[Diagram labels: Gene effect; Environmental effect; Gene-environment correlation; G × E interaction; ?]
G × E interaction: the environment modifies the effect of a gene; equivalently, a gene modifies the effect of an environment

Gene × gene interaction
[Diagram labels: Gene effect; Linkage disequilibrium (LD); Epistasis]
Epistasis: one gene modifies the effect of another

Classical definition of epistasis
The aa genotype masks the effect of the bb genotype
[Figure: two-locus grid, AA / Aa / aa by BB / Bb / bb]

Separate analysis: locus A shows an association with the trait; locus B appears unrelated
[Figure panels: Marker A; Marker B]

Joint analysis: locus B modifies the effects of locus A

Two-locus genotypes
           Locus A
Locus B    AA     Aa     aa
BB         AABB   AaBB   aaBB
Bb         AABb   AaBb   aaBb
bb         AAbb   Aabb   aabb

Epistasis & haplotypes
Two-locus genotype A/a B/b (AaBb): A and B need not even be on same chromosome
Haplotype AB / ab: A and B on same chromosome; effect could appear as “interaction”
cis versus trans effects
[Figure: two scenarios, “AB haplotype causes disease” and “A and B interact to cause disease”, each showing arrangements of alleles A, a, B, b labelled disease or no disease]

Two-locus genotypes
           Locus A
Locus B    AA       Aa       aa       (margin)
BB         f_AABB   f_AaBB   f_aaBB   f_BB
Bb         f_AABb   f_AaBb   f_aaBb   f_Bb
bb         f_AAbb   f_Aabb   f_aabb   f_bb
(margin)   f_AA     f_Aa     f_aa
“Penetrance” = probability of developing disease given genotype

Common disease, polygenic effects
Genotype   Risk of disease
DD         0.01
Dd         0.012
dd
Disease prevalence ~1 in 100
Each extra d allele increases risk by ~1.2 times
Frequency of d in controls ~5%
Frequency of d in cases ~6%

Small single SNP effects might represent larger epistatic effects
[Figure: risk of developing disease for each two-locus genotype, AA / Aa / aa by BB / Bb / bb]
Frequency a = b = 0.1
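One way to see how a large epistatic effect can shrink to a small single SNP effect is to average a two-locus penetrance table over the second locus. The numbers below are hypothetical (the slide's risk values did not survive in the transcript): baseline risk 0.01 everywhere except a tenfold risk for the aabb genotype, with allele frequency 0.1 at locus B, Hardy-Weinberg proportions, and independent loci.

```python
def marginal_risks_locus_a(freq_b, penetrance):
    """Average penetrance[i][j] (i copies of a, j copies of b) over locus B genotypes."""
    # Hardy-Weinberg genotype frequencies at locus B for 0, 1, 2 copies of b (assumed)
    pb = [(1 - freq_b) ** 2, 2 * freq_b * (1 - freq_b), freq_b ** 2]
    return [sum(pb[j] * penetrance[i][j] for j in range(3)) for i in range(3)]


baseline = 0.01                       # hypothetical baseline risk
penetrance = [[baseline] * 3 for _ in range(3)]
penetrance[2][2] = 10 * baseline      # hypothetical: only aabb carries a tenfold risk

risks = marginal_risks_locus_a(0.1, penetrance)
single_snp_rr = risks[2] / risks[0]   # aa versus AA, ignoring locus B
print(f"marginal relative risk for aa: {single_snp_rr:.2f}")
```

With these illustrative values a tenfold joint effect appears as a marginal relative risk of about 1.09 at locus A, because only 1% of aa individuals are also bb.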

Interaction may be a common feature of genetic variation
Brem et al (2005) Nature
–gene expression phenotypes in yeast
–two-stage approach to find pairs of loci
65% of these pairs showed significant interaction
Many secondary loci would be missed by standard approaches, though

Examples of interactions?
Risk                                   Environment             Outcome
phenylalanine hydroxylase deficiency   dietary phenylalanine   mental retardation
debrisoquine metabolism                smoking                 lung cancer
fair skin                              sun exposure            skin cancer
Lewis blood group                      alcohol intake          coronary atherosclerosis
APOE genotype                          head injury             Alzheimer's disease

The rest of this talk…
Statistical issues
Study designs
Examples

[Figure: two study designs illustrated with A/C genotypes: family-based transmission disequilibrium test (TDT) and population-based case/control]

Marker 1 (“Locus”, “SNP”), Marker 2, Marker 3
Genotypes: A/C, G/G, T/G
Haplotypes (paternal / maternal)? AGT/CGG or AGG/CGT
In the population, 2 alleles implies 3 genotypes:
Allele     A     C
Frequency  p     q = 1 - p
Genotype   AA    AC    CC
Frequency  p^2   2pq   q^2
AA and CC are homozygotes; AC is the heterozygote
An “association study”: does allele/genotype/haplotype frequency differ between cases and controls?

Relative risk
       D+   D-
E+     a    b
E-     c    d
Risk in E+ = a / (a + b)
Risk in E- = c / (c + d)
Relative risk of exposure = (a / (a + b)) / (c / (c + d))

Odds ratio: measure of association
          A    a
Case      a    b
Control   c    d
Odds of A in cases = a/b
Odds of A in controls = c/d
Odds ratio = (a/b) / (c/d) = ad / bc
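The two measures defined on this slide and the previous one can be computed directly from the 2×2 cell counts. A minimal sketch, using made-up counts chosen only for illustration:

```python
def relative_risk(a, b, c, d):
    """Rows E+ (a, b) and E- (c, d); columns D+ and D-."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Rows case (a, b) and control (c, d); columns allele A and allele a."""
    return (a * d) / (b * c)


# Hypothetical counts, not taken from the slides
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
print(f"relative risk = {rr:.2f}, odds ratio = {odds:.2f}")
```

With these counts the relative risk is about 3.0 and the odds ratio about 3.86; the two diverge more as the outcome becomes more common.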

[Table: allele counts (A, a) in cases and controls, shown separately for E- and E+]
Odds ratio: E- = (80*20)/(80*20); E+ = (60*20)/(80*40)
Z = ( ln(OR_E-) - ln(OR_E+) ) / sqrt( V_E- + V_E+ )
V( ln(OR) ) = 1/a + 1/b + 1/c + 1/d
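The Z statistic above can be evaluated from the cell counts that appear in the two odds-ratio expressions. In the sketch below the assignment of 80 and 40 to cells b and c in the E+ stratum is an assumption (the table itself was lost), but it does not change the result, since both ad/bc and the variance are unaffected by swapping b and c.

```python
import math


def log_or_and_variance(a, b, c, d):
    """Return ln(OR) and its approximate variance for one 2x2 table."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def interaction_z(table_unexposed, table_exposed):
    """Z test for a difference in log odds ratios between E- and E+ strata."""
    ln_or_minus, v_minus = log_or_and_variance(*table_unexposed)
    ln_or_plus, v_plus = log_or_and_variance(*table_exposed)
    return (ln_or_minus - ln_or_plus) / math.sqrt(v_minus + v_plus)


# (a, b, c, d) per stratum, read off the odds-ratio expressions on the slide
z = interaction_z((80, 80, 20, 20), (60, 80, 40, 20))
print(f"Z = {z:.2f}")
```

Here the odds ratio is 1 in E- and 0.375 in E+, and Z comes out at roughly 2.05.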

Regression modeling of interaction
Y = b_X X + e
Y = b_X X + b_Z Z + b_I XZ + e
Y = ( b_X + b_I Z ) X + b_Z Z + e
b_I Z is the interaction component: the effect of X on Y is modified by Z

Y = b_0 + b_1 G + b_2 E + b_3 G×E
Linear regression for continuous outcomes
Logistic regression for yes/no outcomes
G = 0, 1, 2 copies of allele “A”
E = yes/no exposure (0/1), or a continuous measure
[Figure: Y against gene dosage, with separate lines for E- and E+]
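For a binary exposure, the interaction coefficient b_3 in the linear model is the difference between the slope of Y on gene dosage in exposed and unexposed individuals. The sketch below illustrates this with noise-free data generated from made-up coefficients; it is a stand-in for a proper regression fit, not the analysis used in the talk.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def interaction_estimate(records):
    """records: list of (G, E, Y) with G in {0, 1, 2} and E in {0, 1}."""
    slopes = {}
    for exposure in (0, 1):
        gs = [g for g, e, _ in records if e == exposure]
        ys = [y for _, e, y in records if e == exposure]
        slopes[exposure] = ols_slope(gs, ys)
    # Slope in E+ minus slope in E- estimates b_3
    return slopes[0], slopes[1], slopes[1] - slopes[0]


# Hypothetical coefficients, chosen only to generate example data
b0, b1, b2, b3 = 1.0, 0.5, 0.2, 0.3
records = [(g, e, b0 + b1 * g + b2 * e + b3 * g * e) for g in (0, 1, 2) for e in (0, 1)]
slope_unexposed, slope_exposed, b3_hat = interaction_estimate(records)
print(slope_unexposed, slope_exposed, b3_hat)
```

The unexposed slope recovers b_1, the exposed slope recovers b_1 + b_3, and their difference recovers b_3, which is the "effect of X on Y is modified by Z" reading of the model.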

Epistasis & dominance
Dominance as intralocus interaction
–dominance component can also interact, e.g. with an environment:
Y = b_0 + b_1 A + b_2 D + b_3 E + b_4 A×E + b_5 D×E
A coded { -1, 0, +1 }
D coded { 0, 1, 0 }
Epistasis as interlocus interaction
–additive × additive (two-way interactions)
–additive × dominance (three-way interactions)
–dominance × dominance (four-way interactions)

[Figure: Y by genotype (AA, Aa, aa), with separate lines for E- and E+]
Y = b_0 + b_1 A + b_2 D + b_3 E + b_4 A×E + b_5 D×E
A coded { -1, 0, +1 }
D coded { 0, 1, 0 }

The “Interactome”

Definitions of epistasis
Biological: individual-level phenomenon
Statistical: population-level phenomenon

Requires: 1) Variation between individuals 2) Effect on disease
Requires: 1) Correct statistical definition of effect

What do interactions mean?
TEST MAIN EFFECT
–Null hypothesis straightforward
TEST INTERACTION
–Null hypothesis is a mathematical model describing joint effects
       A-   A+
B-     1    a
B+     b    ?

Multiplicative risk ratios
       A-   A+      RR(A)
B-     1    a       a/1 = a
B+     b    ab      ab/b = a

Additive risk differences
       A-   A+      RD(A)
B-     1    a       a - 1
B+     b    a+b-1   (a+b-1) - b = a - 1

“…we defined interaction as departure from a multiplicative model…”
Multiplicative model (a × b)
–common, easy to implement, logistic regression
–additive on log-odds scale
–multiplicative on risk scale
Other common models (on risk)
–additive (a + b)
–heterogeneity model (a + b – ab)

[Figure: A- / A+ and B- / B+] LENGTH = A + B

[Figure: A- / A+ and B- / B+] AREA = A + B + A×B

[Figure panels: Original; Log-transform; Cubic-transform; Censored; 7-point scale. Axes: G1, G2]

OR(A) = 2, OR(B) = [value missing]
Joint effect: Additive (3.00), Multiplicative (4.00), ???

OR(A) = 1.2, OR(B) = [value missing]
Joint effect: Additive (1.40), Multiplicative (1.44), ?
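The additive and multiplicative joint effects quoted on these two slides follow from the null models shown earlier in the talk (a + b - 1 and a × b). The sketch below assumes OR(B) equals OR(A) on each slide; that value is missing from the transcript and is inferred only because it reproduces the quoted 3.00 / 4.00 and 1.40 / 1.44.

```python
def joint_additive(a, b):
    """Expected joint effect with no interaction on the additive scale."""
    return a + b - 1


def joint_multiplicative(a, b):
    """Expected joint effect with no interaction on the multiplicative scale."""
    return a * b


# OR(B) is assumed equal to OR(A); this is an inference, not a value from the slides
for effect_a, effect_b in [(2.0, 2.0), (1.2, 1.2)]:
    add = joint_additive(effect_a, effect_b)
    mult = joint_multiplicative(effect_a, effect_b)
    print(f"a={effect_a} b={effect_b}: additive={add:.2f} multiplicative={mult:.2f}")
```

For effects of 2 the two null models predict clearly different joint effects (3 versus 4), while for effects of 1.2 they are nearly indistinguishable (1.40 versus 1.44), which is why the choice of scale matters more for large effects.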

Choice of controls
No controls (case-only design)
Population-based controls
Family-based controls
More robust, fewer assumptions vs. more efficient, powerful

Case-only design
Detect interaction only, no main effects
Risk factors   Prevalence
G- E-          p_0
G+ E-          p_G
G- E+          p_E
G+ E+          p_GE = p_0 ∙ (p_G / p_0) ∙ (p_E / p_0)   (expected value under a multiplicative model, i.e. no interaction)
Leads to OR_INT = OR_GE / (OR_G ∙ OR_E)
It turns out, OR_INT = OR_Case / OR_Control, where OR_Case is the association of G and E in cases and OR_Control is the association of G and E in controls
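A short numerical check of the case-only idea: if G and E are independent in the population, the G-E odds ratio among cases equals p_GE ∙ p_0 / (p_G ∙ p_E), which is 1 under the multiplicative null. This is the ratio of relative risks, which approximates OR_INT when disease is rare. The frequencies and prevalences below are hypothetical.

```python
def case_only_odds_ratio(freq_g, freq_e, p0, p_g, p_e, p_ge):
    """G-E odds ratio among cases, assuming G and E are independent in the population."""
    population = {
        (1, 1): freq_g * freq_e,
        (1, 0): freq_g * (1 - freq_e),
        (0, 1): (1 - freq_g) * freq_e,
        (0, 0): (1 - freq_g) * (1 - freq_e),
    }
    prevalence = {(1, 1): p_ge, (1, 0): p_g, (0, 1): p_e, (0, 0): p0}
    # Expected share of cases in each (G, E) cell, up to a common constant
    cases = {cell: population[cell] * prevalence[cell] for cell in population}
    return (cases[(1, 1)] * cases[(0, 0)]) / (cases[(1, 0)] * cases[(0, 1)])


# Hypothetical prevalences for G-E-, G+E-, G-E+
p0, p_g, p_e = 0.01, 0.02, 0.03
null_p_ge = p0 * (p_g / p0) * (p_e / p0)    # multiplicative expectation
or_null = case_only_odds_ratio(0.3, 0.4, p0, p_g, p_e, null_p_ge)
or_interaction = case_only_odds_ratio(0.3, 0.4, p0, p_g, p_e, 2 * null_p_ge)
print(f"no interaction: {or_null:.2f}, doubled joint risk: {or_interaction:.2f}")
```

The result does not depend on the chosen G and E frequencies as long as they are independent, which is exactly the assumption that fails under chromosomal proximity or stratification.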

Case-only designs offer efficient detection of interaction
[Figure: % replicates significant at p=0.05, under no interaction and under interaction]

Case-only design isn’t always valid
Chromosomal proximity of Gene A and Gene B
Multiple ethnicities in case sample (stratification)

Epistasis: LD in cases ≠ LD in controls

Genes in 5q GABA cluster
[Figure panels: Cases (Scz); Controls]
Pamela Sklar, Tracey Petryshen, C&M Pato

TDT requires independence assumption
[Figure: families with genotypes at locus A (aa, Aa, AA), stratified for bb probands and for BB probands; transmission →100%, →0%, →100%]
If variants A and B are in LD (common haplotypes AB / ab) → false positive interactions (due to linkage or population stratification)

An “all pairs of SNPs” approach to epistasis does not scale well
[Table: # SNPs against # pairs; only the fragment “…,999,750,000” survives]
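The number of SNP pairs grows quadratically, as n(n - 1)/2. A minimal sketch; the SNP counts below are illustrative choices, since the original table values were lost:

```python
def n_pairs(n_snps):
    """Number of unordered pairs of distinct SNPs."""
    return n_snps * (n_snps - 1) // 2


# Illustrative panel sizes, not the rows of the original table
for n_snps in (100, 1_000, 100_000, 500_000):
    print(f"{n_snps:>9,} SNPs -> {n_pairs(n_snps):>18,} pairs")
```

A panel of 500,000 SNPs gives 124,999,750,000 pairs, which is consistent with the surviving fragment of the table.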

Multiple testing increases false positives
[Figure: P(at least 1 false positive) against number of independent tests performed, for per-test false positive rate 0.05 and per-test false positive rate = 0.05/50]
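For independent tests, the chance of at least one false positive is 1 - (1 - alpha)^n. The sketch below compares the two per-test rates named on the slide, taking 50 tests as the example (the 0.05/50 rate suggests that number, but this is an assumption).

```python
def prob_any_false_positive(alpha, n_tests):
    """P(at least 1 false positive) across n independent tests at per-test rate alpha."""
    return 1 - (1 - alpha) ** n_tests


n_tests = 50  # assumed from the 0.05/50 correction on the slide
uncorrected = prob_any_false_positive(0.05, n_tests)
corrected = prob_any_false_positive(0.05 / n_tests, n_tests)
print(f"uncorrected: {uncorrected:.3f}, corrected: {corrected:.3f}")
```

At a per-test rate of 0.05 the probability is about 0.92 after 50 tests; dividing the per-test rate by the number of tests holds it just under 0.05.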

Tests for interaction have low power
[Figure: statistical power against increasing sample N, for the epistasis test and the standard association test]

Dysbindin-1 (DTNBP1) & schizophrenia
DTNBP1 & 7 other genes encode proteins that make up the BLOC1 protein complex
–biogenesis of lysosome-related organelles complex 1
DTNBP1’s effect on Scz mediated via BLOC1?
–if so, an analysis including all 8 genes might help to resolve inconsistent studies
Derek Morris, Aiden Corvin, Michael Gill

DTNBP1 association studies
[Figure: exons and SNPs of DTNBP1 (markers P1328, P1333, P1287, P1655, P1635, P1325, P1765, P1757, P1320, P1763, P1578, P1792, P1795, P1583 and several rs-numbered SNPs), with the associated haplotype reported by each study]
Studies: Straub et al. (2002); Schwab et al. (2003); Van den Oord et al. (2003); Van den Bogaert et al. (2003); Tang et al. (2003); Kirov et al. (2004); Williams et al. (2004); Funke et al. (2004); Numakawa et al. (2004); Li et al. (2005)

Types of interaction
[Figure panels, each comparing G+ and G-: Direction of effect; Presence of effect; Magnitude of effect]

Duplicate gene action
Example: Kernel Color in Wheat
Only 1 dominant allele required, either A or B
A_B_   Normal
A_bb   Normal
aaB_   Normal
aabb   No product
[Figure: two-locus grid, AA / Aa / aa by BB / Bb / bb]

Complementary gene action
Example: Flower color in sweet pea
One recessive genotype at either gene would increase disease risk, i.e. genes A and B required
A_B_   Normal
A_bb   No product
aaB_   No product
aabb   No product
[Figure: two-locus grid, AA / Aa / aa by BB / Bb / bb]
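The two classical models differ only in whether a dominant allele is needed at either locus or at both. A small sketch that reproduces the two tables as 3×3 genotype grids; the function names and genotype strings are illustrative:

```python
def duplicate_gene_action(geno_a, geno_b):
    """Normal if at least one dominant allele is present at either locus."""
    return "Normal" if ("A" in geno_a or "B" in geno_b) else "No product"


def complementary_gene_action(geno_a, geno_b):
    """Normal only if a dominant allele is present at both loci."""
    return "Normal" if ("A" in geno_a and "B" in geno_b) else "No product"


for model in (duplicate_gene_action, complementary_gene_action):
    print(model.__name__)
    for geno_b in ("BB", "Bb", "bb"):
        # One row per locus B genotype, one column per locus A genotype
        row = [model(geno_a, geno_b) for geno_a in ("AA", "Aa", "aa")]
        print(f"  {geno_b}: {row}")
```

Under duplicate gene action only the aabb cell differs from the rest; under complementary gene action the whole aa column and bb row do.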

[Figure: two-locus genotype grids (AA / Aa / aa by BB / Bb / bb) for four models: Complementary gene action; Duplicate gene action; Heterogeneity model; “Checkerboard” model]

Negative feedback: a common biological mechanism

Negative feedback: simple model of dysregulation
[Figure: two-locus grid with genotypes -/-, +/-, +/+ at each locus]

Negative feedback: single marker analysis leads to the “opposite allele” problem
[Figure: single marker relative risk for genotypes -/-, +/-, +/+ against frequency of one locus (other locus fixed p=0.4)]

Standard single SNP analyses
Dysbindin-1 by itself shows no evidence of association with Scz
373 Irish schizophrenics, 812 controls
[Figure: -log10(p-value) per SNP with a p=0.05 threshold, for DTNBP1, MUTED, PLDN, SNAPAP, CNO, BLOC1S1, BLOC1S2, BLOC1S3]

[Figure: genes A to J, each with SNPs 1 to 8 (A1 … A8, B1 … B8, …, J6 J7 J8)]
A single gene-based test
80 allele-based tests

An independent replication? DTNBP1 × MUTED epistasis (Straub et al. WCPG meeting Oct 2005.)
Known protein interactions in BLOC-1 complex: DTNBP1, MUTED, BLOC1S2, CNO, PLDN, SNAPAP, BLOC1S1, BLOC1S3
[Figure: DTNBP1 odds ratio by MUTED genotype]
Gene-based p = [value missing]
Correcting for multiple tests, p = 0.025

DTNBP1 & MUTED
DTNBP1 × MUTED gene-based test: p = [value missing]; corrected, p = 0.025
Most significant DTNBP1 × MUTED allele-based result: (rs × rs )
Odds ratio (nominal p-value)
              Single marker   Joint
DTNBP1        1.02 (0.794)    0.77 (0.07)
MUTED         0.93 (0.549)    0.93 (0.54)
INTERACTION   n/a             1.54 (0.009)

Methylenetetrahydrofolate reductase (MTHFR) polymorphisms and serum folate interact to influence negative symptoms and cognitive impairment in schizophrenia
Joshua Roffman, Donald Goff, et al
Folic acid deficiency may contribute to negative symptoms and cognitive impairment in schizophrenia
–underlying mechanism remains uncertain
A cohort of 159 outpatients with schizophrenia, measured on:
–negative symptoms
–frontal lobe deficits

[Figure panels, each comparing C/C & C/T with T/T genotypes: PANSS Negative Symptoms; Verbal Fluency; WCST % Perseverative Errors]
Interaction of low serum folic acid and homozygosity for the MTHFR 677T allele confers risk. Patients homozygous for the MTHFR 677T allele may therefore benefit specifically from folic acid supplementation.

Further reading
Cordell HJ (2002) Human Molecular Genetics 11:
–a statistical review of epistasis, methods and definitions
Clayton D & McKeigue P (2001) The Lancet, 358,
–a critical appraisal of GxE research
Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics, 37,
–epistasis in whole-genome association studies