Molecular & Genetic Epi 217 Association Studies: Indirect John Witte
Homework, Question 4: Haplotypes

ID    MTHFR_C677T  MTHFR_A1298C  Haplotypes?
959   CC           AA            C-A / C-A
1044  CC           AC            C-A / C-C
147   CT           AA            C-A / T-A
123   CT           AC            C-A / T-C or C-C / T-A

Genotypes 677TT and 1298CC are never observed together. This suggests the T-C haplotype is rare or absent, so C-C / T-A is the most probable haplotype pair for the double heterozygote. The absence may reflect selection or chance.
Rare variants are not necessarily lethal, especially those associated with late-onset diseases.
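The phase ambiguity in the homework can be checked by brute force. The following is a minimal sketch, not part of the original homework: the function name `compatible_haplotype_pairs` is hypothetical, and it simply enumerates every way of splitting unphased genotypes across two chromosomes.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of two-character strings, one per SNP, e.g. ["CT", "AC"].
    Returns a sorted list of (hap1, hap2) tuples with hap1 <= hap2.
    """
    pairs = set()
    # For each SNP, choose which allele sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "-".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "-".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Genotype combinations from the homework table (C677T, A1298C).
for genos in (["CC", "AA"], ["CC", "AC"], ["CT", "AA"], ["CT", "AC"]):
    print(genos, compatible_haplotype_pairs(genos))
```

Only the double heterozygote (CT, AC) yields more than one compatible pair, which is why its phase has to be inferred from population data.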
3 SNPs in the TAS2R38 Gene
[Figure: amino acid haplotypes formed by the three SNPs: PAV, AVI, PAI, AAV, PVI, PVV, AAI, AVV]
TAS2R38: 3 SNPs form Haplotypes
PAV: Taster
AVI: Non-taster
TAS2R38 Haplotype Function
TAS2R38 Genotyping Results

ID  Taster  rs  rs  rs  Haplotypes  Amino Acid
10  0       CT  AG  CG  CGG*/TAC    PAV/AVI
12  1       CT  AG  CG  CGG*/TAC    PAV/AVI
            CC  GG      CGG/CGG     PAV/PAV
19  1       CT  AG  CG  CGG*/TAC    PAV/AVI
20  1       CT  AG  CG  CGG*/TAC    PAV/AVI
22  .       TT  AA  CC  TAC/TAC     AVI/AVI
24  1       CC  GG      CGG/CGG     PAV/PAV
26  .       CT  AG  CG  CGG*/TAC    PAV/AVI
28  1       CT  AG  CG  CGG*/TAC    PAV/AVI
29  1       CC  GG  CG  CGG/CGC     PAV/PAI
30  0       TT  AA  CC  TAC/TAC     AVI/AVI
31  1       CC  GG      CGG/CGG     PAV/PAV
Too many MTHFR SNPs
Solution: Tag SNP Selection
SNPs are correlated (aka Linkage Disequilibrium)
Carlson et al. (2004) AJHG 74:106
[Figure: haplotypes across six SNPs, with SNPs in high r² marked. SNP alleles: 1 A/T, 2 G/A, 3 G/C, 4 T/C, 5 G/C, 6 A/C]
Pairwise tagging: SNP 1, SNP 3, SNP 6 (3 tags in total)
Test for association: SNP 1, SNP 3, SNP 6
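Pairwise tagging can be sketched as a greedy binning procedure. This is a simplified illustration in the spirit of the cited approach, not the published algorithm or the Haploview implementation. The function name `greedy_pairwise_tags` and all r² values below are hypothetical, chosen only so that six SNPs collapse into three bins.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8):
    """Pick tag SNPs greedily so every SNP has r^2 >= threshold with some tag.

    snps: list of SNP names.
    r2: dict mapping frozenset({a, b}) -> pairwise r^2 (missing pairs count as 0).
    Returns a dict: tag SNP -> list of SNPs it captures (itself included),
    in the order the SNPs were supplied.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = list(snps)
    bins = {}
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the SNP listed first.
        best = max(untagged, key=lambda s: sum(linked(s, t) for t in untagged))
        captured = [t for t in untagged if linked(best, t)]
        bins[best] = captured
        untagged = [t for t in untagged if t not in captured]
    return bins


# Hypothetical pairwise r^2 values (assumption, not data from the slide).
snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
r2 = {
    frozenset(("SNP1", "SNP2")): 1.0,
    frozenset(("SNP3", "SNP4")): 0.9,
    frozenset(("SNP3", "SNP5")): 1.0,
    frozenset(("SNP4", "SNP5")): 0.5,
    frozenset(("SNP1", "SNP3")): 0.1,
}
bins = greedy_pairwise_tags(snps, r2)
print(sorted(bins))  # the chosen tag SNPs
```

Only the tags are then genotyped and tested; the untyped SNPs in each bin are assessed indirectly through their tag.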
Coverage: Measurement Error in TagSNPs
Common Measures of Coverage
Threshold measures
–e.g., 73% of SNPs in the complete set are in LD with at least one SNP in the genotyping set at r² > 0.8
Average measures
–e.g., average maximum r² = 0.84
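Both coverage measures reduce to taking, for each SNP in the complete set, its best r² with any genotyped SNP. A minimal sketch follows; the function name `coverage_measures` and the example r² values are hypothetical.

```python
def coverage_measures(r2_rows, threshold=0.8):
    """Summarise how well a genotyping set covers a complete SNP set.

    r2_rows: one list per SNP in the complete set, holding its r^2 with
             each SNP in the genotyping set.
    Returns (fraction of SNPs whose best r^2 exceeds the threshold,
             average of the best r^2 values).
    """
    best = [max(row) for row in r2_rows]
    fraction = sum(b > threshold for b in best) / len(best)
    average = sum(best) / len(best)
    return fraction, average


# Hypothetical r^2 values: 4 SNPs in the complete set, 2 genotyped SNPs.
example = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.3, 0.85],
    [0.4, 0.5],
]
fraction, average = coverage_measures(example)
print(f"threshold coverage: {fraction:.0%}, average maximum r^2: {average:.2f}")
```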
Coverage and Sample Size
Sample size required for direct association: n
Sample size required for indirect association: n* = n / r²
For r² = 0.8, the increase is 25%
For r² = 0.5, the increase is 100%
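The inflation follows directly from n* = n / r². A short sketch of the arithmetic, with a hypothetical function name and a hypothetical direct-association sample size of 1000:

```python
def indirect_sample_size(n_direct, r2):
    """Return (n*, percent increase) for indirect association, n* = n / r^2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    n_star = n_direct / r2
    percent_increase = (n_star / n_direct - 1) * 100
    return n_star, percent_increase


# Hypothetical n = 1000 for the direct test (assumption for illustration).
for r2 in (0.8, 0.5):
    n_star, pct = indirect_sample_size(1000, r2)
    print(f"r^2 = {r2}: n* = {n_star:.0f} ({pct:.0f}% increase)")
```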
Tag SNPs Database Resources
HapMap
Re-sequencing to discover millions of additional SNPs; deposited to dbSNP
SNPs from dbSNP were genotyped
Looked for 1 SNP every 5 kb
SNP validation
–Polymorphic
–Frequency
Haplotype and linkage disequilibrium estimation
–LD tagging SNPs
HapMap Phase III Populations
ASW  African ancestry in Southwest USA
CEU  Utah residents with Northern and Western European ancestry from the CEPH collection
CHB  Han Chinese in Beijing, China
CHD  Chinese in Metropolitan Denver, Colorado
GIH  Gujarati Indians in Houston, Texas
JPT  Japanese in Tokyo, Japan
LWK  Luhya in Webuye, Kenya
MEX  Mexican ancestry in Los Angeles, California
MKK  Maasai in Kinyawa, Kenya
TSI  Toscani in Italia
YRI  Yoruba in Ibadan, Nigeria
Tag SNPs: HapMap
Tag SNPs: HapMap & Haploview
Tag SNPs: HapMap Summary
Identified 33 common MTHFR SNPs (MAF > 5%) among Caucasians
Forced in 3 potentially functional/previously associated SNPs
Identified tags based on pairwise tagging
15 tag SNPs could capture all 33 MTHFR SNPs (mean r² = 0.97)
Note: the number of SNPs required varies from gene to gene and from population to population
1K Genomes Project
Genome-wide Association Studies (GWAS)
One- and Two-Stage GWA Designs
[Figure: grids of samples (1, 2, 3, ..., N) by SNPs (1, 2, 3, ..., M) for the one-stage design and for the two-stage design (Stage 1, Stage 2; axes: samples, markers)]
[Figure: SNPs-by-samples diagrams comparing the one-stage design with two analyses of the two-stage design (Stage 1, Stage 2): replication-based analysis and joint analysis]
Multistage Designs
Joint analysis has more power than replication
The p-value threshold in Stage 1 must be liberal
Lower cost, but no gain in power
Complex diseases
[Figure: causal web linking genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, vulnerable plaques, and MI]
Complex diseases: many causes = many causal pathways!
Pathways
Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways.
Example: BioCarta:
We may be interested in the potential joint and/or interaction effects of multiple genes in one pathway.
Interactions
“The interdependent operation of two or more causes to produce or prevent an effect”
“Differences in the effects of one or more factors according to the level of the remaining factor(s)”
Last, 2001

Genotype columns: AA, Aa, aa
BB: At risk | No risk
Bb: At risk | No risk
bb: No risk
Why look for interactions?
Improve detection of genetic (& environmental) risks
Understand etiology/biology
New hypotheses?
Diagnostics
Prevention and interventions
Dilution of effects
[Figure: OR for Gene A within subgroups defined by drinking status, micronutrient X, environmental exposure Y, and other gene Z]
Within particular subgroups, the effect of a gene may be quite high or low
Statistical vs. Biological Interactions
The two are not identical.
One hypothesizes a biological interaction, but ‘tests’ for a statistical interaction.
Does the statistical evidence support our biological hypothesis?
Multiplicative vs. Additive Interactions
[Figure: three 2×2 tables of ORs by genotype (g, G) and exposure (e, E)]
Multiplicative “effect” (ORs, RRs): OR(E,G) = OR(E,g) × OR(e,G) = 2.0 × 1.4 = 2.8
Multiplicative interaction (ORs, RRs): OR(E,G) / (OR(E,g) × OR(e,G)) = 2.8 / (2.0 × 1.4) = 1.0
Departure from 1 is a multiplicative interaction
Additive “effect”: RER = (OR(E,G)-1)/((OR(E,g)-1)+(OR(e,G)-1)) = (2.4-1)/((2.0-1)+(1.4-1)) = 1.0
RER = relative excess risk
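The two scales can be compared numerically. The sketch below uses the RER formula as written on the slide, together with the standard multiplicative ratio (joint OR divided by the product of the single-factor ORs). The function names are hypothetical; the ORs are the slide's illustrative values, with 1.0 as the reference group.

```python
def multiplicative_ratio(or_joint, or_e_only, or_g_only):
    """Joint OR divided by the product of the single-factor ORs.

    A value of 1 means no interaction on the multiplicative scale.
    """
    return or_joint / (or_e_only * or_g_only)


def relative_excess_risk(or_joint, or_e_only, or_g_only):
    """RER as defined on the slide: joint excess over summed single excesses.

    A value of 1 means no interaction on the additive scale.
    """
    return (or_joint - 1) / ((or_e_only - 1) + (or_g_only - 1))


# Single-factor ORs: exposure only = 2.0, gene only = 1.4.
for joint in (2.8, 2.4):
    ratio = multiplicative_ratio(joint, 2.0, 1.4)
    rer = relative_excess_risk(joint, 2.0, 1.4)
    print(f"joint OR {joint}: multiplicative ratio {ratio:.2f}, RER {rer:.2f}")
```

A joint OR of 2.8 shows no interaction on the multiplicative scale but a departure on the additive scale, while a joint OR of 2.4 shows the reverse, so the choice of scale determines whether an interaction is declared.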
Brennan, P. Carcinogenesis :
Two possible causal pathways: additive and multiplicative interaction for colorectal cancer
Additive interaction: G1 and E5: independent risk factors
Multiplicative interaction: G2 and E2: work through same pathway
If factors are not known to act independently, use multiplicative.
Analysis of Multiple Genes Joint / Additive Multiplicative Increasing complexity
More Complex Modeling
Multifactor-dimensionality reduction
–(Moore & Williams, Ann Med 2002)
Logic regression
–(Kooperberg & Ruczinski, Genetic Epi 2005)
Multi-loci analysis
–(Marchini, Donnelly, Cardon, Nat Genet 2005)
Bayesian epistasis association mapping
–(Zhang & Liu, Nat Genet 2007)