Selecting TagSNPs in Candidate Genes for Genetic Association Studies
Shehnaz K. Hussain, PhD, ScM
Assistant Professor, Department of Epidemiology, UCLA
Epidemiology 244: Cancer Epidemiology Methods
Objectives
  Molecular genetics primer
  Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
Central dogma
  DNA → mRNA → Protein
  Nucleotides: A, T, C, G
What are SNPs?
  More than 99% of all nucleotides are the same in all humans
  Less than 1% of nucleotides are polymorphic
  SNPs are far more common than insertions-deletions
  Bi-allelic (two alternative nucleotides), e.g. T (80%) and A (20%)
Where do SNPs occur?
  Exons
  Introns
  Flanking regions
What are haplotypes?
  A haplotype is the pattern of nucleotides on a single chromosome
  Two “copies” of each chromosome
  The haplotype inference problem
    Observed genotypes: TA TT CG GG TA AA
    Phase unknown: ? T ? G ? A
    Example haplotypes: A T G G A A and T T C G T A
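The haplotype inference problem can be made concrete with a short enumeration. The sketch below is an illustration only: the function name `compatible_haplotype_pairs` is hypothetical, and the input is the six unphased genotypes listed on this slide. It lists every pair of haplotypes that could have produced those genotypes.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Return every unordered haplotype pair consistent with unphased genotypes.

    genotypes: list of 2-character strings, one per SNP; "TA" means one T
    allele and one A allele, with no information about which chromosome
    carries which.
    """
    pairs = set()
    # At each site, choose which of the two alleles sits on chromosome 1;
    # the other allele then sits on chromosome 2.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Genotypes taken from the slide.
genotypes = ["TA", "TT", "CG", "GG", "TA", "AA"]
pairs = compatible_haplotype_pairs(genotypes)
for hap1, hap2 in pairs:
    print(hap1, hap2)
```

Only the three heterozygous sites are ambiguous, so there are 2^(3-1) = 4 compatible pairs. Genotype data alone cannot say which of the four is the true pair, which is why haplotypes must be inferred statistically or resolved in the laboratory.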
What is linkage disequilibrium?
  Linkage disequilibrium (LD) describes the non-random association of nucleotides on the same chromosome in a population
  One nucleotide at one position (locus) predicts the occurrence of another nucleotide at another locus
  Figure panels: No LD; LD
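LD between two bi-allelic loci is usually summarized with D, D' and r². The sketch below computes the standard definitions from haplotype and allele frequencies; the function name `ld_stats` and the example frequencies are hypothetical and chosen only to contrast the "No LD" and "LD" situations.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two bi-allelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b
    # The largest |D| the allele frequencies allow depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: both minor alleles at 20%.
examples = {
    "No LD": ld_stats(p_ab=0.04, p_a=0.2, p_b=0.2),        # AB exactly as expected
    "Complete LD": ld_stats(p_ab=0.20, p_a=0.2, p_b=0.2),  # A always travels with B
}
for label, (d, d_prime, r2) in examples.items():
    print(f"{label}: D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

An r² near 1 means that genotyping one locus tells you almost everything about the other, which is the property tagSNP selection relies on.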
What are markers?
  A candidate gene contains the disease susceptibility locus (DSL) and marker loci (SNPs)
  Direct approach: test for genetic association between the disease phenotype and the DSL
  Indirect approach: test for association between the phenotype and marker loci that are in LD with the DSL
What are tagSNPs?
  TagSNPs are a subset of all SNPs in a gene that mark groups of SNPs in LD
  Avoids redundant genotyping
  Figure labels: disease susceptibility locus; LD; marker loci (SNPs)
The joint effect of tagSNPs in cytokine genes and cigarette smoking in cervical cancer risk
T-cell proliferation
  Figure labels: activated T-cell; IL-2 gene; IFNγ gene; IL-2; IFNγ; IL-2 receptor; proliferation of TH1-cells
Background
  Cigarette smoking
    ↑ 1.5- to 3-fold cancer risk
    ↓ levels of IL-2 and IFNγ (cervical and circulating)
  ↓ levels of IL-2 and IFNγ
    HPV persistence in the cervix
    Cervical neoplasia
    Decreased survival from invasive cervical cancer
Model
  Cigarette smoking → HPV-associated squamous cell cervical cancer
  SNPs in IL-2, IL-2R, and IFNG
Methods
  Study design
    Population-based case-only study
  Subjects
    308 Caucasian squamous cell cervical cancer cases diagnosed
    Residing in 3 western Washington counties
  Data collection
    Structured in-person interviews
    DNA isolated from buffy coats
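One common way to analyze a case-only design is to cross-classify cases by genotype and exposure. If genotype and exposure are independent in the source population, the odds ratio in that table estimates the multiplicative gene-environment interaction. The sketch below shows this calculation with a Woolf (log-based) 95% confidence interval. It is not taken from the study: the function name is hypothetical and the counts are invented for illustration.

```python
import math


def case_only_interaction_or(a, b, c, d, z=1.96):
    """Case-only interaction odds ratio with a Woolf confidence interval.

    Counts among cases only:
      a: smokers carrying the variant      b: smokers not carrying it
      c: non-smokers carrying the variant  d: non-smokers not carrying it
    Assumes genotype and smoking are independent in the source population.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT the study's data.
odds_ratio, lower, upper = case_only_interaction_or(a=40, b=60, c=30, d=70)
print(f"Interaction OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 would suggest that the joint effect of the variant and smoking exceeds the product of their separate effects. The estimate is only valid if the independence assumption holds.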
Multi-stage tagSNP design
  1. Select reference panel
  2. Re-sequence panel, identify SNPs (many markers, few subjects)
  3. Choose tagSNPs
  4. Genotype tagSNPs in main study (few markers, many subjects)
1. Select reference panel
  A sample of your study population
    Most representative
  Samples from the Coriell Repository
    Ability to integrate your data with other resources
  Figure legend: candidate gene SNPs; HapMap SNPs
2. Re-sequence reference panel
  Gene
  Amplify and sequence DNA
  PolyPhred (Nickerson, 1997)
  Phred/Phrap (Ewing, 1998)
Alternatives to re-sequencing
  Program for Genomic Applications (PGA)
    SeattleSNPs – inflammation
    NIEHS SNPs – environmental response
    Innate Immunity
  International HapMap Project
    5 million SNPs in four ethnically distinct populations
3. Choose tagSNPs
  Option                      LDSelect (Carlson, 2002)   Tagger (de Bakker, 2005)
  r² threshold                Yes                        Yes
  SNP exclusions/inclusions   No                         Yes
  SNP design score            No                         Yes
LDSelect output for IL-2
  SeattleSNPs, r² ≥ 0.80, MAF ≥ 0.05, Caucasian
  Columns: Bin | Total Number of Sites | TagSNPs
  12 rs rs rs rs rs rs rs
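The idea behind r²-based binning can be sketched as a greedy loop: drop rare SNPs, pick the SNP that exceeds the r² threshold with the most remaining SNPs, place those SNPs in one bin, and repeat. The code below is a simplified illustration in that spirit, not the LDSelect or Tagger implementation. The function name, SNP labels, r² values and allele frequencies are all hypothetical; only the thresholds (r² ≥ 0.80, MAF ≥ 0.05) come from the slide.

```python
def greedy_tag_bins(snps, r2, maf, r2_min=0.80, maf_min=0.05):
    """Group SNPs into LD bins, each represented by one tagSNP.

    snps: list of SNP identifiers
    r2:   dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2
          (pairs that are missing are treated as r^2 = 0)
    maf:  dict mapping SNP identifier -> minor allele frequency
    Returns a list of (tag_snp, members) tuples.
    """
    # Rare SNPs are excluded before binning.
    remaining = [s for s in snps if maf[s] >= maf_min]

    def linked(a, b):
        return a != b and r2.get(frozenset((a, b)), 0.0) >= r2_min

    bins = []
    while remaining:
        # The next tag is the SNP linked to the most unbinned SNPs;
        # ties go to the SNP listed first.
        tag = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [tag] + [t for t in remaining if linked(tag, t)]
        bins.append((tag, members))
        remaining = [s for s in remaining if s not in members]
    return bins


# Hypothetical inputs for five SNPs in one gene.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
maf = {"snp1": 0.30, "snp2": 0.28, "snp3": 0.25, "snp4": 0.10, "snp5": 0.02}
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.70,
    frozenset(("snp3", "snp4")): 0.10,
}
bins = greedy_tag_bins(snps, r2, maf)
for tag, members in bins:
    print(f"tag {tag} covers {members}")
```

With these inputs, four common SNPs are covered by genotyping two tags, and the rare SNP is not considered at all. Real tools also handle details this sketch ignores, such as choosing among several equally good tags within a bin.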
Genomic context
  Exons (cSNPs)
    SIFT (Ng, 2002)
    PolyPhen (Ramensky, 2002)
  Upstream flanking region
  Intron-exon junctions
Sequence conservation
  UCSC Genome Browser, PhastCons score (Siepel, 2005)
  Figure labels: repeat region; unique region
TagSNP summary
  Efficient yet comprehensive coverage of the genetic variation in our candidate genes
  Reduce costs
  Preference should be given to putatively functional variants
    Literature, gene context, sequence conservation
Thanks for your attention! Questions?