
Molecular and Genetic Epidemiology Kathryn Penney, ScD January 5, 2012.


1 Molecular and Genetic Epidemiology Kathryn Penney, ScD January 5, 2012

2 Definitions  Genetic Epidemiology  ‘a science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations’ - Morton, 1982  Molecular Epidemiology (www.aacr.org)  seeks to identify human (cancer) risk and (carcinogenic) mechanisms to improve (cancer) prevention strategies  is multi-disciplinary and translational, going from the bench to the field and back  uses biomarkers and state-of-art technologies to gain mechanistic information from epidemiological studies

3 Genetic and Molecular Epidemiology [Diagram: Genetic variation, Exposure, Biological Factors/Mechanism, Disease. Association?]

4 Genetic Studies

5 Twin studies  Determine if a disease has a genetic component  Estimate the genetic contribution to disease (heritability)  Genetics (heritable component)  Shared environment  Unique environment  Twins  Monozygotic (MZ) share 100% of their genes  Dizygotic (DZ) share ~50% of their genes  Use correlation of trait/disease  R_MZ = genetics + shared environment  R_DZ = ½ genetics + shared environment  Genetics = 2 × (R_MZ – R_DZ)
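The twin-correlation arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name and the example correlations are hypothetical, and the unique-environment term assumes the three components sum to 1.

```python
def twin_variance_components(r_mz: float, r_dz: float) -> dict:
    """Split trait variance using the slide's model.

    Model from the slide: r_mz = G + C and r_dz = G/2 + C.
    Added assumption: G + C + E = 1, so E = 1 - r_mz.
    """
    genetics = 2 * (r_mz - r_dz)      # heritability estimate
    shared_env = r_mz - genetics      # what remains of the MZ correlation
    unique_env = 1 - r_mz             # variance not shared by MZ twins
    return {
        "genetics": genetics,
        "shared_environment": shared_env,
        "unique_environment": unique_env,
    }


# Hypothetical correlations, chosen only for illustration.
components = twin_variance_components(r_mz=0.6, r_dz=0.4)
for name, value in components.items():
    print(f"{name}: {value:.2f}")
```

With these made-up inputs, the MZ correlation exceeds the DZ correlation by 0.2, so the genetic estimate is twice that difference.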

6 Heritability Lichtenstein et al, 2000

7 Association studies  Family based  Parent-child trios, siblings  Population based  Case-control  Types of studies  Candidate gene/SNPs  Genome-wide association study (GWAS)  Single nucleotide polymorphisms (SNPs) vs. mutations/rare variants  Germline variation  SNPs > 1% population frequency [Figure: genotypes (A/A, A/C) compared between cases and controls]

8 Samples  Blood  DNA, RNA, biomarkers (dietary, hormones)  Tissue  Tumor and normal  DNA, RNA, proteins

9 Candidate genes  Select a gene of interest  Select SNPs to genotype  Literature  tagSNPs  Haplotype tagSNPs [Figure: five haplotypes (CGAACG, CGAACG, CGACCG, CTACCA, CTACCA) varying at three SNPs: G/T, A/C, G/A]
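To see why one SNP can "tag" another, the pairwise correlation (r²) between the three variable sites in the haplotypes shown on this slide can be computed directly. The sketch below is an illustration only: it assumes the five haplotypes are equally frequent, and the function name is hypothetical.

```python
from itertools import combinations

# Haplotypes as printed on the slide; assumed equally frequent.
haplotypes = ["CGAACG", "CGAACG", "CGACCG", "CTACCA", "CTACCA"]
# Zero-based positions of the G/T, A/C and G/A sites.
snp_positions = [1, 3, 5]


def r_squared(haps, i, j):
    """Squared correlation between the alleles at sites i and j.

    Assumes both sites are polymorphic (otherwise the denominator is zero).
    """
    ref_i, ref_j = haps[0][i], haps[0][j]   # arbitrary reference alleles
    n = len(haps)
    p_i = sum(h[i] == ref_i for h in haps) / n
    p_j = sum(h[j] == ref_j for h in haps) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    d = p_ij - p_i * p_j                     # linkage disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


for i, j in combinations(snp_positions, 2):
    print(f"sites {i + 1} and {j + 1}: r^2 = {r_squared(haplotypes, i, j):.2f}")
```

Under these assumptions the G/T and G/A sites always travel together, so genotyping either one captures the other, while the A/C site is only partly correlated with them and would need its own assay.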

10 Candidate genes  The International HapMap Project  Catalog of common genetic variants  Describes what these variants are, where they occur, and how they are distributed among people within populations and among populations

11 Candidate genes  www.hapmap.org  Haploview – visualize correlations between SNPs in HapMap or study data  Tagger – method to select tagSNPs in HapMap or study data

12 Candidate genes  Are the SNPs associated with outcome?  Are the SNPs associated with intermediate phenotypes/biomarkers/tumor markers?

13 Genotyping technology  TaqMan  PCR-based fluorescent assay  Single SNP assay  Sequenom  PCR-based single-base extension  MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization – Time Of Flight)  Multiplex assay (up to 36-40 SNPs)

14 Genome-wide Association Study (GWAS)  Estimated 10 million SNPs in the genome  Genotype 350k – 1 million SNPs across entire genome  Test association of each SNP with outcome  Adjust for the number of tests performed  p < 5 × 10^-8 considered “genome-wide” significant  Replicate findings in a different population  Same SNP, same direction, approximately the same magnitude of effect
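One common way to arrive at the genome-wide threshold is a Bonferroni correction: divide the usual 0.05 by roughly one million tests. The sketch below illustrates that arithmetic; the function names, SNP labels and p-values are hypothetical, and Bonferroni is assumed here as the adjustment method since the slide does not name one.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


def genome_wide_hits(p_values: dict, threshold: float) -> list:
    """Return the SNP labels whose p-value falls below the threshold."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)


# Assumed: about one million tests, matching the upper end of the slide's chip sizes.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

# Made-up p-values for illustration only.
example_p_values = {"snp_a": 3e-9, "snp_b": 4e-8, "snp_c": 1e-6, "snp_d": 0.03}
hits = genome_wide_hits(example_p_values, threshold)

print(f"threshold = {threshold:.1e}")
print("genome-wide significant:", hits)
```

Note that a p-value of 1e-6, which would look striking in a single-SNP study, does not pass once a million tests are accounted for.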

15 GWAS results Amundadottir et al, 2009

16 Published Genome-Wide Associations through 6/2010: 904 published GWA at p < 5 × 10^-8 for 165 traits (NHGRI GWA Catalog, www.genome.gov/GWAStudies)

17 Genotyping technology  Illumina  1 million SNP chip  tagSNPs selected from HapMap data  Affymetrix  1 million SNP chip  Selected based on distance http://www.illumina.com/Documents/products/technotes/technote_intelligent_snp_selection.pdf

18 Whole Genome Sequencing  Human Genome Project  First draft of the genome announced in 2000; project completed 2003  1000 Genomes Project  Goal: to create a complete and detailed catalogue of human genetic variation  Knome (founded by George Church and Harvard University)  knomeDiscovery – sequencing (30x) and interpretation for ~$5,000  The Personal Genome  Interpretation (counseling?)  Screening?  High-risk groups?  Drug efficacy?  May help individuals alter behavior – but for now, we can’t do anything about our genes!

19 Bias in Genetic Studies

20 [Diagram: Genetic polymorphism and Disease, both linked to an unnamed third factor (???): CONFOUNDING]

21 Bias in Genetic Studies [Diagram: Genetic polymorphism and Disease, both linked to Race/Ethnicity: CONFOUNDING]

22 Population Stratification  Example:  Prostate cancer is more common in African Americans than in Caucasians  Frequency of many SNPs is different in African American and Caucasian populations  If we ignored race/ethnicity, what might happen in our study?

23 Population Stratification Figure 1. The effects of population structure at a SNP locus. If the study population consists of subpopulations that differ genetically, and if disease prevalence also differs across these subpopulations, then the proportions of cases and controls sampled from each subpopulation will tend to differ, as will allele or genotype frequencies between cases and controls at any locus at which the subpopulations differ. The figure shows an example of this scenario with two populations in which the cases have an excess of individuals from population 2 and population 2 has a lower frequency of allele A than population 1. In this example, the structure mimics the signal of association in that there is a significant difference in allele and genotype frequencies between cases and controls. Marchini, 2004 Caucasian African American
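The scenario in the figure caption can be reproduced with simple arithmetic. In the sketch below all counts are invented for illustration: within each population, carrying allele A is unrelated to disease (odds ratio of 1), but population 2 contributes more cases and has a lower frequency of allele A, so pooling the two creates an apparent association.

```python
def odds_ratio(case_carrier, case_non, control_carrier, control_non):
    """Odds ratio for carrying the allele, from a 2x2 case-control table."""
    return (case_carrier * control_non) / (case_non * control_carrier)


# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers)
population_1 = (16, 4, 64, 16)   # allele A common (80%), few cases
population_2 = (24, 56, 6, 14)   # allele A less common (30%), many cases

# Ignore population membership by adding the tables cell by cell.
pooled = tuple(a + b for a, b in zip(population_1, population_2))

print(f"population 1 OR: {odds_ratio(*population_1):.2f}")
print(f"population 2 OR: {odds_ratio(*population_2):.2f}")
print(f"pooled OR:       {odds_ratio(*pooled):.2f}")
```

The pooled odds ratio moves well away from 1 even though neither population shows any association, which is the spurious signal the caption describes.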

24 Adjusting for Ethnicity  Defining & measuring ethnicity  Self-report  Ancestry (where are your grandparents from?)  Genotype many (hundreds) “ancestry informative markers”  Control for ethnicity  In design  Restrict to one ethnicity  Match on ethnicity  In analysis  Stratify by ethnicity  Include ethnicity in regression model
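As one illustration of "stratify by ethnicity" in the analysis, the sketch below computes a Mantel-Haenszel summary odds ratio across strata and compares it with the crude (unstratified) estimate. The Mantel-Haenszel estimator is a standard choice that is assumed here; the slide does not name a specific method, and all counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Summary odds ratio pooled across strata, weighting each by its size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata (one per ethnic group); each has a true OR of 2.
strata = [
    (20, 10, 40, 40),
    (10, 30, 5, 30),
]

# Crude table: add the strata cell by cell, ignoring ethnicity.
crude_table = tuple(sum(cells) for cells in zip(*strata))

print(f"crude OR:           {odds_ratio(*crude_table):.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(strata):.2f}")
```

In this made-up example the crude estimate is pulled toward 1, while the stratified summary recovers the odds ratio shared by both strata. Including ethnicity as a covariate in a logistic regression model is the other analysis option the slide lists.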

25 Misclassification  Non-differential  Of exposure: the degree of misclassification is the same according to disease status  Likelihood that exposure is wrong is similar among those who do and do not develop disease  Differential  Of exposure: The degree of misclassification varies according to the disease status

26 Misclassification  Laboratory tests do not always work perfectly – some % of samples may fail genotyping  Missing or incorrect exposure information  Non-differential or differential misclassification?  What can we do to ensure that the misclassification is non-differential?
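A small worked example shows why the distinction matters. The sketch below applies genotyping error to a hypothetical case-control table, first with the same error rates in cases and controls (non-differential) and then with errors in cases only (differential). All counts and error rates are invented for illustration, and the error model (a fixed sensitivity and specificity for calling "exposed") is an assumption.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure (genotype) measurement."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


# Hypothetical true counts: 60/40 exposed/unexposed cases, 40/60 controls.
true_cases, true_controls = (60, 40), (40, 60)
true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: the same 10% error rates in cases and controls.
nd_cases = misclassify(*true_cases, sensitivity=0.9, specificity=0.9)
nd_controls = misclassify(*true_controls, sensitivity=0.9, specificity=0.9)
nondifferential_or = odds_ratio(*nd_cases, *nd_controls)

# Differential: false positives occur in cases only; controls measured perfectly.
d_cases = misclassify(*true_cases, sensitivity=1.0, specificity=0.8)
differential_or = odds_ratio(*d_cases, *true_controls)

print(f"true OR:             {true_or:.2f}")
print(f"non-differential OR: {nondifferential_or:.2f}")
print(f"differential OR:     {differential_or:.2f}")
```

In this example the non-differential error pulls the odds ratio toward 1, while the differential error pushes it away from 1. Differential error can bias in either direction depending on the pattern, which is why handling case and control samples identically in the laboratory matters.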

27 Gene x Environment Interaction: An Example of Effect Modification Given equal exposure to the same risk factor, individuals may have different risk of disease depending on their genetic background  The effect of an exposure on a disease outcome is modified by genotype

28 Gene-environment interaction
All genotypes combined: OR = 1
      D+    D-
E+   100   100
E-   100   100
Stratify on genotype
AA genotype: OR = 1
      D+    D-
E+    40    20
E-    80    40
AT/TT genotype: OR = 2.25
      D+    D-
E+    60    80
E-    20    60
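The odds ratios on this slide can be checked from the counts. The sketch below assumes the tables are read with rows E+/E- and columns D+/D-, and that the 40/20/80/40 table belongs to the AA genotype and the 60/80/20/60 table to AT/TT; the variable names are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Exposure-disease odds ratio from a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Counts as read from the slide: (E+ D+, E+ D-, E- D+, E- D-)
tables = {
    "AA": (40, 20, 80, 40),
    "AT/TT": (60, 80, 20, 60),
}

# Combining the two genotype strata gives the table for everyone.
combined = tuple(sum(cells) for cells in zip(*tables.values()))

print(f"all genotypes: OR = {odds_ratio(*combined):.2f}")
for genotype, counts in tables.items():
    print(f"{genotype}: OR = {odds_ratio(*counts):.2f}")
```

The combined table shows no association, yet one genotype stratum does: the effect of the exposure depends on genotype, which is what effect modification means here.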

29 Effect Modification is Biological DNA damage Lung Cancer CYP1A1 GSTM1 Metabolism

30 GWAS follow-up

31 -Dozens of GWAS for many diseases have now been performed -Thousands of samples and hundreds of thousands of SNPs -Replication is necessary to determine which significant results are real -Once we know the results are real, then what??? Eeles RA et al. (2008)

32 GWAS follow-up  Risk prediction model development  Understand biological function  candidate genes/regions!  Some associated SNPs are not in gene regions  Many types of biological data and techniques can be employed to determine the function of the risk SNPs  Fine mapping  Expression (RNA and protein)  Enhancer activity

33 GWAS follow-up – 8q24 story Ghoussaini et al. A) Haploview output of the 1.18-Mb 8q24 "desert" showing the five cancer-specific regions reported to date

34 GWAS follow-up – 8q24 story Pomerantz et al, 2009 8q24 variation not associated with MYC mRNA expression in prostate tumor or normal tissue

35 GWAS follow-up – 8q24 story (a) ChIP assay on Colo205, demonstrating a pattern consistent with enhancer activity. (b) Luciferase reporter assay demonstrating enhancer activity in two CRC lines. Error bars denote one standard deviation from the mean of replicate assays. (c) Representative luciferase assay showing increased enhancer activity of G over T alleles, performed on a total of 18 clones (nine G and nine T over 3 d) (P = 0.024). Error bars denote one standard deviation from the mean of assays performed in triplicate. (d) Mass spectrometry plots from Sequenom analysis showing preferential binding of TCF7L2 to risk allele (G) in immunoprecipitated DNA, as evidenced by differential peak heights (right panel) compared to control input DNA (left panel) (P = 1.1 × 10^-5). Pomerantz et al, 2009

36 GWAS follow-up (and beyond) GWAS results mRNA expression

37 Thank you! Questions?

