Consideration for Planning a Candidate Gene Association Study With TagSNPs Shehnaz K. Hussain, PhD, ScM skhussain@ucla.edu Epidemiology 243: Molecular.


1 Consideration for Planning a Candidate Gene Association Study With TagSNPs
Shehnaz K. Hussain, PhD, ScM Epidemiology 243: Molecular Epidemiology

2 Objectives
Molecular genetics primer
Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
Factors influencing statistical power

3 Central dogma
DNA (A, T, C, G) → mRNA → Protein

4 What are SNPs?
More than 99% of all nucleotides are the same in all humans
Fewer than 1% of nucleotides are polymorphic
SNPs are far more common than insertion-deletions
Bi-allelic (two nucleotides at a site), e.g. T (80%), A (20%)
Where do SNPs occur? Exons, introns, flanking regions

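5 What are haplotypes?
A haplotype is the pattern of nucleotides on a single chromosome
Each person carries two "copies" of each chromosome
The haplotype inference problem: unphased genotypes do not show which alleles sit together on the same chromosome
[Figure: haplotypes TTCGTA and ATGGAA; observed genotypes TA TT CG GG TA AA; haplotypes to infer ?T?G?A and ?T?G?A]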
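The haplotype inference problem on slide 5 can be sketched in a few lines of Python. This is only an illustration, assuming the figure text reads as six unphased genotypes (TA TT CG GG TA AA); the function name `compatible_haplotype_pairs` is hypothetical and is not taken from the slides or from any phasing program.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """List every pair of haplotypes consistent with unphased genotypes.

    genotypes: list of two-letter strings, one per site (e.g. "TA").
    Homozygous sites are unambiguous; each heterozygous site can be
    phased two ways, so k heterozygous sites give 2**(k-1) distinct pairs.
    """
    het_sites = [i for i, g in enumerate(genotypes) if g[0] != g[1]]
    pairs = set()
    for flips in product((0, 1), repeat=len(het_sites)):
        h1 = [g[0] for g in genotypes]
        h2 = [g[1] for g in genotypes]
        for site, flip in zip(het_sites, flips):
            if flip:
                h1[site], h2[site] = h2[site], h1[site]
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


# Genotypes as read from the slide 5 figure text (an assumption).
slide_genotypes = ["TA", "TT", "CG", "GG", "TA", "AA"]
for pair in compatible_haplotype_pairs(slide_genotypes):
    print(pair)
```

With three heterozygous sites there are four candidate pairs, and genotype data alone cannot say which one is real. Phasing programs resolve this using population haplotype frequencies, which this sketch does not attempt.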

6 What is linkage disequilibrium?
Linkage disequilibrium (LD) describes the non-random association of nucleotides on the same chromosome in a population
One nucleotide at one position (locus) predicts the occurrence of another nucleotide at another locus
[Figure: two panels, No LD and LD]
Notes: LD is a population measure, so it is not unique to an individual. In the no-LD panel, four chromosomes are drawn as blue lines, each carrying two SNPs. The variant (minor) allele is shown as a purple dot at position 1 and a red dot at position 2. The four possible combinations occur at equal frequencies, which indicates no LD. In the high-LD panel, whenever the variant allele at position 1 is present, so is the variant allele at position 2.
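As a rough illustration of the two scenarios on slide 6, the standard pairwise LD measures D, D' and r2 can be computed from haplotype counts for two biallelic SNPs. The function name `ld_stats` and the example counts are assumptions made for this sketch; they do not come from the slides.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r2) for two biallelic SNPs from haplotype counts.

    n_AB is the number of chromosomes carrying allele A at SNP 1 and
    allele B at SNP 2, and so on for the other three combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")

    d = p_AB - p_A * p_B          # deviation from independence
    r2 = d * d / denom            # squared allelic correlation
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    return d, d_prime, r2


# No LD: all four haplotypes equally frequent.
print(ld_stats(1, 1, 1, 1))
# Complete LD: the two variant alleles always travel together.
print(ld_stats(2, 0, 0, 2))
```

The r2 value is the quantity used later in the deck for tagSNP thresholds and for the N/r2 sample size adjustment.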

7 What are markers?
Test for association between phenotype and marker loci
Test for genetic association between the phenotype and the DSL
[Figure: candidate gene with marker loci (SNPs) in LD with the Disease Susceptibility Locus (DSL), linked to the disease phenotype]

8 What are tagSNPs?
TagSNPs are a subset of all SNPs in a gene that mark groups of SNPs in LD
Avoids redundant genotyping
[Figure: marker loci (SNPs) in LD with the Disease Susceptibility Locus]

9 The joint effect of tagSNPs in cytokine genes and cigarette smoking in cervical cancer risk

10 T-cell proliferation
[Figure: activated T cell with IL-2 gene, IL-2 receptor, and IFNγ gene; proliferation of TH1 cells]

11 Background
Cigarette smoking ↑ 1.5- to 3-fold cancer risk
Cigarette smoking ↓ levels of IL-2 and IFNγ (cervical and circulating)
↓ levels of IL-2 and IFNγ: HPV persistence in the cervix, cervical neoplasia, decreased survival from invasive cervical cancer

12 Model Cigarette smoking HPV-associated squamous cell cervical cancer
SNPs in IL-2, IL-2R, and IFNG

13 Methods
Study design: population-based case-only study
Subjects: 308 diagnosed Caucasian squamous cell cervical cancer cases residing in 3 western Washington counties
Data collection: structured in-person interviews; DNA isolated from buffy coats

14 Objectives
Molecular genetics primer
Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
Factors influencing statistical power

15 Multi-stage tagSNP design
1. Select reference panel
2. Re-sequence panel, identify SNPs (many markers, few subjects)
3. Choose tagSNPs
4. Genotype tagSNPs in main study (few markers, many subjects)

16 1. Select reference panel
Definition
A sample of your study population: most representative
Samples from the Coriell Repository: ability to integrate your data with other resources
[Figure legend: candidate gene SNPs, HapMap SNPs]

17 2. Re-sequence reference panel
Amplify and sequence DNA
[Figure: gene; PolyPhred (Nickerson, 1997); Phred and Phrap (Ewing, 1998)]

18 Alternatives to re-sequencing
Program for Genomic Applications (PGA)
SeattleSNPs: inflammation
NIEHS SNPs: environmental response
Innate Immunity
International HapMap Project
5 million SNPs in four ethnically distinct populations

19 3. Choose tagSNPs (LD)
Options: LDSelect (Carlson, 2002), Tagger (de Bakker, 2005)
r2 threshold (0.80): LDSelect Yes, Tagger Yes
SNP exclusions/inclusions: LDSelect No
SNP design score
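The idea of choosing tagSNPs with an r2 threshold can be sketched as a greedy binning pass over pairwise r2 values. This is a simplified illustration in the spirit of LD-based binning, not the LDSelect or Tagger implementation; the function name `greedy_ld_bins`, the SNP labels, and the r2 values are all hypothetical.

```python
def greedy_ld_bins(snps, pairwise_r2, threshold=0.80):
    """Group SNPs into LD bins and pick tagSNPs for each bin.

    pairwise_r2 maps (snp_a, snp_b) tuples to r2; missing pairs count as 0.
    Each round takes the SNP that exceeds the threshold with the most
    remaining SNPs, bins it with those SNPs, and removes the bin.
    A tagSNP is a bin member at or above the threshold with every member.
    """
    def r2(a, b):
        if a == b:
            return 1.0
        return pairwise_r2.get((a, b), pairwise_r2.get((b, a), 0.0))

    remaining = list(snps)
    bins = []
    while remaining:
        seed = max(
            remaining,
            key=lambda s: sum(r2(s, t) >= threshold for t in remaining),
        )
        members = [t for t in remaining if r2(seed, t) >= threshold]
        tags = [s for s in members
                if all(r2(s, t) >= threshold for t in members)]
        bins.append({"sites": members, "tagSNPs": tags})
        remaining = [t for t in remaining if t not in members]
    return bins


# Hypothetical SNP labels and r2 values, for illustration only.
example_r2 = {
    ("snp1", "snp2"): 0.95,
    ("snp1", "snp3"): 0.85,
    ("snp2", "snp3"): 0.60,
    ("snp3", "snp4"): 0.10,
}
for i, b in enumerate(greedy_ld_bins(["snp1", "snp2", "snp3", "snp4"], example_r2), 1):
    print(i, len(b["sites"]), b["tagSNPs"])
```

Genotyping one tagSNP per bin then covers every site in that bin at the chosen threshold, which is what makes the design "few markers, many subjects" in the main study.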

20 LDSelect output for IL-2
SeattleSNPs, r2 ≥ 0.80, MAF ≥ 0.05, Caucasians
[Table: Bin, Total Number of Sites, TagSNPs, for bins 1 to 4; rs identifiers truncated in the source]

21 Genomic context
Exons (cSNPs): SIFT (Ng, 2002), PolyPhen (Ramensky, 2002)
Upstream flanking region
Intron-exon junctions

22 Sequence conservation
UCSC Genome Browser, PhastCons (Siepel, 2005)
[Figure: conservation score across a repeat region and a unique region]

23 Objectives
Molecular genetics primer
Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
Factors influencing statistical power

24 Minor allele frequency and genetic model
300 cases, 300 controls, alpha=0.05
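To see how minor allele frequency drives power at the slide's settings (300 cases, 300 controls, alpha = 0.05), here is a minimal sketch using a normal approximation for a two-sided allelic (two-proportion) test. It is not the calculation behind the slide's figure: the function name `allelic_power`, the odds ratio of 1.5, and the choice of a per-allele model with 2N independent alleles per group are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(maf_controls, odds_ratio, n_cases=300, n_controls=300,
                  alpha=0.05):
    """Approximate power of a two-sided allelic test.

    Assumes a per-allele odds ratio and treats the 2N alleles in each
    group as independent draws (a simplification).
    """
    p0 = maf_controls
    # Allele frequency in cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)

    a1, a0 = 2 * n_cases, 2 * n_controls      # allele counts per group
    p_bar = (a1 * p1 + a0 * p0) / (a1 + a0)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / a1 + 1 / a0))
    se_alt = sqrt(p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0)

    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = abs(p1 - p0)
    upper = norm.cdf((diff - z_crit * se_null) / se_alt)
    lower = norm.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


for maf in (0.05, 0.10, 0.20, 0.30):
    print(maf, round(allelic_power(maf, odds_ratio=1.5), 2))
```

Under these assumptions power rises steeply with MAF for a fixed odds ratio. Dominant or recessive models would need genotype-level calculations, which dedicated tools such as Quanto handle.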

25 Sample size requirement
SNPs genotyped | SNPs not genotyped | r2 | Sample size requirement
S1 and S2 | - | - | 600
S1 | S2 | 1.00 | 600
S1 | S2 | 0.85 | 706
S1 | S2 | r2 | N/r2
(Pritchard, 2001)
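The N/r2 rule on slide 25 is simple enough to check directly. The helper below is a small sketch; the name `sample_size_with_tag` is hypothetical, and rounding up to a whole subject is an assumption that happens to reproduce the slide's value of 706.

```python
from math import ceil


def sample_size_with_tag(n_direct, r2):
    """Sample size needed when a tagSNP stands in for an untyped SNP.

    n_direct: sample size required if the causal SNP were genotyped.
    r2: LD between the genotyped tagSNP and the untyped SNP.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return ceil(n_direct / r2)


print(sample_size_with_tag(600, 1.00))  # perfect proxy, no inflation
print(sample_size_with_tag(600, 0.85))  # 600 / 0.85 = 705.9, rounded up
```

At the r2 threshold of 0.80 used for tagSNP selection, the same rule gives 750 subjects, a 25% increase over direct genotyping.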

26 Genotype error
Generally non-differential
Reduces your power
Every 1% increase in the genotyping error rate requires a 2-8% increase in sample size (Zou et al, 2004, Genetic Epidemiology)
Depends on error model

27 Power calculators
Quanto: G, E, G x E, G x G; case-control, case-sibling, case-parent, and case-only designs; quantitative or binary outcome
htPowercc: r2
Power for Association With Error (PAWE): genotyping errors

28 TagSNP summary
Efficient yet comprehensive coverage of the genetic variation in our candidate genes
Reduce costs
Preference should be given to putatively functional variants: literature, gene context, sequence conservation
Influences on statistical power: MAF, genetic model, LD, and genotyping error

29 Programs for Genomic Applications
SeattleSNPs, NIEHS, Innate Immunity
International HapMap
Coriell cell repository
cSNP predictive analysis: SIFT, PolyPhen
Vista
The following programs can be found at the Rockefeller site
Tagger
LDSelect
PAWE
Quanto

