Beyond GWAS
Outline
– Multiple testing
– Gene-environment interaction
– Gene-gene interaction
– Rare variants
– Pharmacogenetics, Pharmacogenomics
Multiple testing
Recall that we are testing ~1 million markers, more or less.
Several strategies adjust the p-values for doing so many tests:
– Bonferroni
– False Discovery Rate (FDR)
– Permutation
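Multiple testing - Bonferroni
Bonferroni adjustment:
– Significance threshold: 0.05 / M, where M is the number of tests (i.e., the number of markers)
– Most widely used in practice
– Controls the family-wise error rate: Pr(reject at least one test | all null hypotheses true) ≤ 0.05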
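To make the arithmetic concrete, here is a minimal sketch of the Bonferroni threshold for a GWAS-sized scan. The marker count of 1,000,000 is the rough figure quoted earlier in the slides; the function names are illustrative only.

```python
ALPHA = 0.05
N_MARKERS = 1_000_000  # "~1 million markers", the rough figure from the slides


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate <= alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every test is null and run at level alpha."""
    return alpha * n_tests


threshold = bonferroni_threshold(ALPHA, N_MARKERS)
unadjusted_false_positives = expected_false_positives(ALPHA, N_MARKERS)

print(f"Bonferroni per-marker threshold: {threshold:.1e}")
print(f"Expected false positives at unadjusted 0.05: {unadjusted_false_positives:,.0f}")
```

With one million markers the per-marker threshold works out to 5e-8, while testing each marker at an unadjusted 0.05 would be expected to produce about 50,000 false positives when no marker is truly associated.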
Multiple testing - FDR
– False Discovery Rate (FDR) control limits the expected proportion of false positives among the tests declared significant
– Less stringent control than Bonferroni, e.g. “Another way to look at the difference is that a p-value of 0.05 implies that 5% of all tests will result in false positives. An FDR adjusted p-value (or q-value) of 0.05 implies that 5% of significant tests will result in false positives. The latter is clearly a far smaller quantity.” values.aspx (Your textbook)
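The slide does not name a specific FDR procedure. As an assumption, the sketch below uses the Benjamini-Hochberg step-up rule, which is the most common choice, and compares it with Bonferroni on ten made-up p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first

    # Find the largest rank k such that p_(k) <= k * q / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bh_rejected = benjamini_hochberg(p_values, q=0.05)
bonferroni_rejected = [p <= 0.05 / len(p_values) for p in p_values]

print("BH rejections:        ", sum(bh_rejected))
print("Bonferroni rejections:", sum(bonferroni_rejected))
```

On these toy values the step-up rule rejects two tests while Bonferroni rejects one, which is the sense in which FDR control is less stringent.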
Multiple testing - Permutation
– Many of the tested genotype markers are correlated with each other (in LD), so the tests are correlated
– Bonferroni adjusts as if the tests were completely independent
– Permutation will be more powerful, but… [max(T) in plink, --mperm]
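The max(T) idea can be sketched in a few lines: shuffle the phenotype labels, recompute every marker's statistic, and keep only the largest one per permutation. The code below is a toy illustration and not PLINK's implementation. The test statistic (absolute case-control difference in mean allele count), the simulated data, and all function names are assumptions made for the example.

```python
import random


def mean_diff_stat(genotype_row, phenotype):
    """Toy statistic: absolute case-control difference in mean minor-allele count."""
    cases = [g for g, y in zip(genotype_row, phenotype) if y == 1]
    controls = [g for g, y in zip(genotype_row, phenotype) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_t_adjusted_p(genotypes, phenotype, n_perm=500, seed=0):
    """Family-wise adjusted p-values from the permutation distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = [mean_diff_stat(row, phenotype) for row in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link, keep LD between markers
        max_stat = max(mean_diff_stat(row, shuffled) for row in genotypes)
        for j, t_obs in enumerate(observed):
            if max_stat >= t_obs:
                exceed[j] += 1
    # Add-one correction so that no adjusted p-value is exactly zero.
    return [(1 + e) / (n_perm + 1) for e in exceed]


def simulate_marker(rng, allele_freqs):
    """Draw a 0/1/2 minor-allele count per person from per-person allele frequencies."""
    return [int(rng.random() < f) + int(rng.random() < f) for f in allele_freqs]


# Simulated data: 100 cases then 100 controls, 5 markers.
rng = random.Random(1)
phenotype = [1] * 100 + [0] * 100
associated = simulate_marker(rng, [0.5] * 100 + [0.2] * 100)
genotypes = [associated, list(associated)]  # two markers in perfect LD
genotypes += [simulate_marker(rng, [0.3] * 200) for _ in range(3)]  # three null markers

adjusted_p = max_t_adjusted_p(genotypes, phenotype)
print([round(p, 3) for p in adjusted_p])
```

Because the phenotype is permuted as a whole, the correlation between markers is preserved in every permutation, so two markers in perfect LD are not penalized as if they were two independent tests.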
Summary: Multiple testing
– Most people just use the Bonferroni correction
– Other methods are more powerful (and people have reasonable arguments for them)
– Nan Laird comments (text for the course): “Given the many false positive findings in the history of genetic association studies, one rather errs on being too conservative.”
– Initial GWAS had a lot of false positives (recall: replication, replication, replication...)
Gene environment interaction ● Need strong initial hypothesis about the environment ● e.g., Chronic Obstructive Pulmonary Disease (COPD) and smoking (DeMeo et al., AJHG 2006, SERPINE2 gene) ● Environmental exposures can be difficult to characterize (e.g., pollution)
Gene-Environment Interaction Example – Phenylketonuria (PKU) (Gene) (Environment)
Gene-Environment Interaction
Odds Ratio (OR) by group, with G-E- as the reference:
– G+E+: ah / bg
– G+E-: ch / dg
– G-E+: eh / fg
– G-E-: 1
● OR Interaction = OR G+E+ / (OR G+E- x OR G-E+)
● If OR Interaction = 1, multiplicative effects
● Example: OR Interaction = 15 / (5 x 3) = 1
Example 2: Factor V Leiden Mutations, Oral Contraceptive Use, and Venous Thrombosis (Vandenbroucke et al., The Lancet 1994)
OR by group:
– G+E+: 34.7
– G+E-: 6.9
– G-E+: 3.7
– G-E-: Reference
OR Interaction = OR G+E+ / (OR G+E- x OR G-E+) = 34.7 / (6.9 x 3.7) ≈ 1.4
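The interaction odds ratio in both examples is the same one-line calculation. A small sketch, using only the odds ratios quoted on the slides (the function name is illustrative):

```python
def interaction_odds_ratio(or_both: float, or_gene_only: float, or_env_only: float) -> float:
    """OR(G+E+) divided by the product OR(G+E-) * OR(G-E+); 1 means purely multiplicative effects."""
    return or_both / (or_gene_only * or_env_only)


# First example from the slides: 15 / (5 x 3)
toy_interaction = interaction_odds_ratio(15, 5, 3)

# Factor V Leiden and oral contraceptive use: 34.7 / (6.9 x 3.7)
fvl_interaction = interaction_odds_ratio(34.7, 6.9, 3.7)

print(f"Toy example:     {toy_interaction:.2f}")
print(f"Factor V Leiden: {fvl_interaction:.2f}")
```

A value of 1 means the joint effect equals the product of the separate effects. The Factor V Leiden value is somewhat above 1, so the joint odds ratio is a little larger than the product of the two single-factor odds ratios.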
Testing for GxE in regression
$\operatorname{logit}\{P(Y=1 \mid g, E)\} = \beta_0 + \beta_g X(g) + \beta_e E + \beta_{ge} X(g) E$
– E could also be continuous, as could Y (then linear regression instead of logistic)...
– Tricky! The interaction is scale dependent
– Continuous environmental exposure: what if we modeled E differently, e.g., log(E), or added in $E^2$, etc.? Can also adjust for $E^2$, $E^3$ to make sure the effect is truly an interaction
– Can model $X(g) = (I_{g=AA}, I_{g=AB})$
– Tricky! Statistical interaction ≠ biological interaction
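Scale dependence is easy to see with numbers. The sketch below contrasts interaction on the multiplicative scale with interaction on the additive scale, using the relative excess risk due to interaction (RERI) as the additive measure. RERI is not mentioned on the slides and is brought in here only as an illustration; the relative effects 15, 5 and 3 are the toy values from the earlier odds ratio example.

```python
def multiplicative_interaction(rr_both: float, rr_gene: float, rr_env: float) -> float:
    """Ratio scale: equals 1 when the joint effect is the product of the separate effects."""
    return rr_both / (rr_gene * rr_env)


def additive_interaction(rr_both: float, rr_gene: float, rr_env: float) -> float:
    """RERI: equals 0 when the excess risks simply add up."""
    return rr_both - rr_gene - rr_env + 1


# Case 1: joint effect is exactly multiplicative (15 = 5 x 3).
mult_case = (15, 5, 3)
# Case 2: joint effect is exactly additive (excess risks 4 + 2 give a joint effect of 7).
add_case = (7, 5, 3)

for label, effects in [("multiplicative data", mult_case), ("additive data", add_case)]:
    print(
        f"{label}: ratio-scale interaction = {multiplicative_interaction(*effects):.2f}, "
        f"additive-scale interaction = {additive_interaction(*effects):.2f}"
    )
```

The same data can show "no interaction" on one scale and a clear interaction on the other, which is one reason a statistical interaction should not be read directly as a biological one.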
Gene-gene interaction Similar to gene-environment interaction, in terms of scale, etc. Also called epistasis
Gene-gene interaction
$\operatorname{logit}\{P(Y=1 \mid g_1, g_2)\} = \beta_0 + \beta_1 X(g_1) + \beta_2 X(g_2) + \beta_{12} X(g_1) X(g_2)$
– Usually test when $g_1$ is from one gene and $g_2$ is from another gene
– OR, from a GWAS, take the hits
– Feasible to do all pairwise: plink: --fast-epistasis
– “4.5 billion two-locus tests generated from a 100K data set took just over 24 hours to run”
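A quick count shows why all-pairs testing is a serious multiple testing problem. The sketch below counts unordered marker pairs and the matching Bonferroni threshold. The marker counts are the round figures used in these slides, and applying Bonferroni to the pairs is an assumption for illustration.

```python
import math


def n_pairwise_tests(n_markers: int) -> int:
    """Number of unordered marker pairs, i.e. n choose 2."""
    return math.comb(n_markers, 2)


pairs_100k = n_pairwise_tests(100_000)
pairs_1m = n_pairwise_tests(1_000_000)
pairwise_threshold_1m = 0.05 / pairs_1m

print(f"Pairs among 100K markers: {pairs_100k:,}")
print(f"Pairs among 1M markers:   {pairs_1m:,}")
print(f"Bonferroni threshold for all pairs among 1M markers: {pairwise_threshold_1m:.1e}")
```

A 100K marker set gives just under 5 billion pairs, the same order of magnitude as the quoted 4.5 billion tests. Going to one million markers multiplies the number of pairs by about 100.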
Gene-Gene Interaction Models Marchini et al. Nature Genetics 2005
Example: GWAS of Psoriasis Strange et al. Nature Genetics 2010 Take the hits, and follow up with gene-gene interaction tests
Gene-Gene Interaction Strange et al. Nature Genetics 2010 The only example I am currently aware of where GWAS hits were followed up for interactions and something was found.
Minor Allele Frequency (MAF) for Rare variants
– “Common”: MAF > 0.05
– “Less common”: 0.01 < MAF < 0.05
– “Rare”: MAF < 0.01
– SNP (Single Nucleotide Polymorphism): MAF > 0.01
– SNV (Single Nucleotide Variant): MAF < 0.01
Rare variants Previous GWAS focused on chips designed for MAF > 0.05 (most powered for MAF > 0.10) Sequencing (de novo) Exome arrays How do we analyze them?
Analysis of rare variants
Still an open area of research:
– One-at-a-time analysis
– Multi-marker tests
– Cohort Allelic Sums Test (CAST)
– Combined multivariate and collapsing (CMC)
– More flexible methods...
One-at-a-time analysis
– The standard univariate test we’ve been talking about
– Univariate analysis will have low power unless the sample size is very large
– Nejentsev et al., Science 2009 MAF = ( ) / [ *( )] =
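One way to see why single-variant tests struggle is to count how many copies of a rare allele a study can expect to observe at all. The sketch below assumes chromosomes are sampled independently (roughly Hardy-Weinberg conditions). The MAF of 0.001 and the sample size of 1,000 are made-up illustration values, not figures from the cited study.

```python
def expected_minor_alleles(maf: float, n_individuals: int) -> float:
    """Expected number of minor-allele copies among 2N sampled chromosomes."""
    return 2 * n_individuals * maf


def prob_no_copies(maf: float, n_individuals: int) -> float:
    """Probability that the minor allele is not observed at all in the sample."""
    return (1 - maf) ** (2 * n_individuals)


MAF = 0.001          # hypothetical rare variant
N_INDIVIDUALS = 1000  # hypothetical sample size

expected_copies = expected_minor_alleles(MAF, N_INDIVIDUALS)
p_unobserved = prob_no_copies(MAF, N_INDIVIDUALS)

print(f"Expected minor-allele copies: {expected_copies:.1f}")
print(f"Probability the variant is never observed: {p_unobserved:.3f}")
```

With only about two expected copies, split between cases and controls, no single-variant test can say much, which motivates pooling variants.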
Standard Multi-marker tests
– Evaluate multiple rare variants simultaneously in a single model:
$\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_M x_M$, with $H_0: \beta_1 = \beta_2 = \dots = \beta_M = 0$
– Standard approaches (likelihood ratio, score test) may have difficulty fitting the model due to sparse data (e.g., what is the OR for a singleton SNP seen only in a case?)
– (Recap: this is also one of the approaches we brought up last time to analyze groups of common variants)
Cohort Allelic Sums Test (CAST)
– Collapsing method: group rare variants, e.g., within a gene
– Assumes the same effect size for each variant in a group:
$\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \beta \sum_{k=1}^{M} x_k$
– Like regressing on the count of minor alleles across multiple loci
– Cohen et al., Science 2004; Morgenthaler Mut Res 2007
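The collapsing step itself is just a row sum. The sketch below builds the per-person allele count that would serve as the single covariate in the model above and compares its mean between cases and controls. The genotype matrix and phenotypes are made up for illustration. A real analysis would put the score into a logistic regression rather than compare means.

```python
def burden_scores(genotypes):
    """Per-person count of minor alleles summed over all variants in the group."""
    return [sum(person) for person in genotypes]


def mean_by_status(scores, phenotype, status):
    """Mean burden score among individuals with the given case (1) or control (0) status."""
    selected = [s for s, y in zip(scores, phenotype) if y == status]
    return sum(selected) / len(selected)


# Made-up data: rows are individuals, columns are 4 rare variants in one gene (0/1/2 minor alleles).
genotypes = [
    [0, 1, 0, 0],  # case
    [1, 0, 0, 1],  # case
    [0, 0, 1, 0],  # case
    [0, 0, 0, 0],  # case
    [0, 0, 0, 0],  # control
    [0, 1, 0, 0],  # control
    [0, 0, 0, 0],  # control
    [0, 0, 0, 0],  # control
]
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]

scores = burden_scores(genotypes)
case_mean = mean_by_status(scores, phenotype, 1)
control_mean = mean_by_status(scores, phenotype, 0)

print("Burden scores:", scores)
print(f"Mean burden in cases: {case_mean:.2f}, in controls: {control_mean:.2f}")
```

No single variant here appears in more than two people, but the pooled count separates cases from controls more visibly than any one column could.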
Combined multivariate and Collapsing (CMC)
– Test rare and common together? Only rare? Only common?
– Combines the previous two approaches, but simultaneously models rare and common variants
– Rare variants are collapsed together per MAF and treated as a single variant:
$\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \sum_{k \in \{\text{common variants}\}} \beta_k x_k + \beta_{\text{rare}} \sum_{k=1}^{M} x_k$
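As a rough sketch of how the covariates for such a model might be assembled: common variants keep their own column, and rare variants are summed into one extra column, following the formula on the slide. The MAF cutoff, the toy genotypes and the function names are assumptions for the example. Using an indicator for carrying any rare allele instead of the sum is another option.

```python
def minor_allele_frequency(column):
    """MAF estimate from 0/1/2 minor-allele counts: copies divided by 2N chromosomes."""
    return sum(column) / (2 * len(column))


def cmc_design(genotypes, maf_cutoff):
    """Build rows of [common variant columns..., collapsed rare-allele count]."""
    n_variants = len(genotypes[0])
    columns = [[person[k] for person in genotypes] for k in range(n_variants)]
    mafs = [minor_allele_frequency(col) for col in columns]
    common = [k for k, f in enumerate(mafs) if f >= maf_cutoff]
    rare = [k for k, f in enumerate(mafs) if f < maf_cutoff]
    design = []
    for person in genotypes:
        row = [person[k] for k in common]          # one covariate per common variant
        row.append(sum(person[k] for k in rare))  # single collapsed rare-variant covariate
        design.append(row)
    return design, common, rare


# Made-up data: 6 individuals, 3 variants. A cutoff of 0.1 is used only because the toy sample is tiny.
genotypes = [
    [2, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]

design, common_idx, rare_idx = cmc_design(genotypes, maf_cutoff=0.1)
print("Common variant columns:", common_idx)
print("Rare variant columns:  ", rare_idx)
print("Design rows:", design)
```

The resulting rows would then go into a multivariate test, so the common variants are modeled individually while the rare ones share a single coefficient.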
Other rare variant approaches
– Many, many other rare variant methods are out there
– Different assumptions (or lack thereof) on how rare variants affect disease, e.g., how smoothed together, prior knowledge,…
– A common approach with fewer assumptions is SKAT, a more flexible multivariate test (Wu et al., AJHG, 2011)
Summary: Rare variants Need to aggregate rare variants for increased efficiency Difficult to choose aggregation a priori, more data-driven approaches may be more useful
What is Pharmacogenetics? The study of the role of inheritance in the individual variation in drug response. Efficacy Toxicity
Phillips et al. JAMA 2001 Adverse Drug Reactions are common
Pharmacodynamics How a drug acts Drug target
Pharmacokinetics How a drug is processed ADME o Absorption o Distribution o Metabolism o Excretion Drug Levels (dosage) o Efficacy o Toxicity
Measure drug levels in the body Plasma concentration Metabolic Ratio o Compare blood vs. urine o Can be measured over time
Example: TPMT
● TPMT gene: thiopurine methyltransferase gene
● TPMT controls metabolism of the thiopurine drugs azathioprine, 6-mercaptopurine, and 6-thioguanine
● These are chemotherapeutic agents and immunosuppressive drugs; sensitivity and toxicity are altered by the variant
Standard TPMT Dosing
Standard Dosing: Drug Exposure and Toxicity
Genotype Specific TPMT Dosing
Genotype Specific: Drug Exposure and Toxicity
Outline Gene-Environment Interaction Gene-Gene Interaction Pharmacogenetics Pharmacogenomics
What is Pharmacogenomics and how is it different from Pharmacogenetics? Genomic scale Array based platforms
Pharmacogenomics Evans and Relling Nature 2004
Challenges for Pharmacogenomics How predictive is a test? Does the test apply to all groups? Is a test superior to current clinical practice? Will testing improve outcomes? Is testing cost effective?