Considerations for Planning a Candidate Gene Association Study With TagSNPs
Shehnaz K. Hussain, PhD, ScM (skhussain@ucla.edu), Epidemiology 243: Molecular Epidemiology



Objectives: (1) molecular genetics primer; (2) databases and tools to conduct in silico analyses for tagSNP selection and prioritization; (3) factors influencing statistical power.

Central dogma: DNA, written in the four nucleotides A, T, C, and G, is transcribed into mRNA, which is translated into protein.

What are SNPs? More than 99% of nucleotides are identical in all humans; about 1% are polymorphic. SNPs are far more common than insertion-deletion polymorphisms. A SNP is usually bi-allelic: for example, T on 80% of chromosomes and A on 20%. Where do SNPs occur? In exons, introns, and flanking regions.
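
As a rough illustration of the 80%/20% example, here is a minimal Python sketch that computes allele frequencies for a bi-allelic SNP from genotype counts. The function name and the counts (64 TT, 32 TA, 4 AA among 100 hypothetical people) are assumptions made up for illustration, chosen so the frequencies come out at 80% T and 20% A. The minor allele frequency (MAF) is the smaller of the two.

```python
def allele_frequencies(n_hom_major: int, n_het: int, n_hom_minor: int) -> tuple:
    """Return (major, minor) allele frequencies for a bi-allelic SNP.

    Each person carries two copies of the chromosome, so N people
    contribute 2N alleles: homozygotes add two copies of one allele,
    heterozygotes add one copy of each.
    """
    n_people = n_hom_major + n_het + n_hom_minor
    if n_people == 0:
        raise ValueError("need at least one genotyped person")
    total_alleles = 2 * n_people
    minor_count = 2 * n_hom_minor + n_het
    minor = minor_count / total_alleles
    return 1 - minor, minor


# Hypothetical counts: 64 TT, 32 TA, 4 AA among 100 people.
major, minor = allele_frequencies(64, 32, 4)
print(f"T: {major:.2f}, A: {minor:.2f}")  # T: 0.80, A: 0.20
```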

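What are haplotypes? A haplotype is the pattern of nucleotides along a single chromosome. Each person carries two copies of each chromosome, and genotyping reports the two alleles at each locus without saying which copy each allele sits on; recovering the haplotypes is the haplotype inference problem. Example: the haplotypes TTCGTA and ATGGAA together give the unphased genotypes TA, TT, CG, GG, TA, AA. From the genotypes alone, only the homozygous positions can be assigned to each copy (? T ? G ? A); the phase at heterozygous positions 1, 3, and 5 is unknown.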
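
The haplotype inference problem can be made concrete with a brute-force Python sketch: for one person, list every pair of haplotypes consistent with the unphased genotypes in the example. The function name is hypothetical. With k heterozygous sites there are 2^(k-1) distinct pairs (four here), and the genotypes alone cannot say which pair is correct. In practice phase is estimated statistically across many people, for example with the EM algorithm, which this sketch does not attempt.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List every unordered haplotype pair consistent with unphased genotypes.

    genotype: one two-letter string per locus, e.g. ["TA", "TT", "CG"].
    Homozygous loci are the same on both copies; at each heterozygous
    locus either allele could sit on either chromosome copy.
    """
    het_sites = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    for flips in product((False, True), repeat=len(het_sites)):
        copy1 = [g[0] for g in genotype]
        copy2 = [g[1] for g in genotype]
        for site, flip in zip(het_sites, flips):
            if flip:  # swap which copy carries which allele at this site
                copy1[site], copy2[site] = copy2[site], copy1[site]
        # Sort the two strings so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted(("".join(copy1), "".join(copy2)))))
    return sorted(pairs)


genotype = ["TA", "TT", "CG", "GG", "TA", "AA"]
for h1, h2 in compatible_haplotype_pairs(genotype):
    print(h1, h2)
```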

What is linkage disequilibrium? Linkage disequilibrium (LD) describes the non-random association of nucleotides at different loci on the same chromosome in a population: the nucleotide at one position (locus) predicts the nucleotide at another locus. LD is a population measure, not a property of any individual. Figure, no LD: four chromosomes, each with two SNPs; the variant (minor) allele is marked by a purple dot at position 1 and a red dot at position 2. All four combinations occur at equal frequencies, so one SNP tells us nothing about the other. Figure, high LD: whenever the variant allele at position 1 is present, so is the variant allele at position 2.
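
LD between two SNPs is commonly summarized by D, D', and r². The Python sketch below computes all three from haplotype counts for two bi-allelic SNPs, with A/a the alleles at SNP 1 and B/b at SNP 2. The function name and the mapping of the figure onto counts (each combination seen once for "no LD"; two chromosomes carrying both variants and two carrying neither for "high LD") are illustrative assumptions.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two bi-allelic SNPs.

    Inputs are haplotype counts (or frequencies): A/a are the alleles at
    SNP 1 and B/b the alleles at SNP 2.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB = n_AB / total, n_Ab / total, n_aB / total
    p_A = p_AB + p_Ab          # allele A frequency at SNP 1
    p_B = p_AB + p_aB          # allele B frequency at SNP 2
    D = p_AB - p_A * p_B       # departure from independence
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


print(ld_measures(1, 1, 1, 1))  # (0.0, 0.0, 0.0): no LD
print(ld_measures(2, 0, 0, 2))  # (0.25, 1.0, 1.0): variants always together
```

r² is the measure used for tagSNP selection and sample size later in these slides; D' can equal 1 even when r² is low, for example when one SNP is much rarer than the other.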

What are markers? In a candidate gene, we test for association between the disease phenotype and marker loci (SNPs). If a marker is in LD with the disease susceptibility locus (DSL), this indirectly tests for genetic association between the phenotype and the DSL itself, which may not be genotyped.

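What are tagSNPs? TagSNPs are a subset of all SNPs in a gene that mark groups of SNPs in LD with one another. Genotyping one tagSNP per group captures the other SNPs in that group, which avoids redundant genotyping. Figure: marker loci (SNPs) in LD with each other and with the disease susceptibility locus.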

The joint effect of tagSNPs in cytokine genes and cigarette smoking in cervical cancer risk

T-cell proliferation (figure): an activated T cell expresses the IL-2 gene; IL-2 acts through the IL-2 receptor to drive proliferation of TH1 cells, which express the IFNγ gene.

Background: Cigarette smoking increases cancer risk 1.5- to 3-fold. Cigarette smoking also lowers levels of IL-2 and IFNγ, both in the cervix and in circulation. Lower levels of IL-2 and IFNγ are linked to HPV persistence in the cervix, cervical neoplasia, and decreased survival from invasive cervical cancer.

Model: cigarette smoking and SNPs in IL-2, IL-2R, and IFNG jointly influence the risk of HPV-associated squamous cell cervical cancer.

Methods. Study design: population-based case-only study. Subjects: 308 Caucasian squamous cell cervical cancer cases diagnosed 1986-2004 and residing in 3 western Washington counties. Data collection: structured in-person interviews; DNA isolated from buffy coats.
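
In a case-only design, gene-environment interaction is commonly estimated as the odds ratio relating genotype to exposure among cases. This approximates the multiplicative interaction odds ratio only if genotype and exposure are independent in the source population and the disease is rare. The Python sketch below computes that odds ratio with a Woolf (log-based) 95% confidence interval. The function name and the counts in the usage line are hypothetical and are not results from this study.

```python
import math


def case_only_interaction(g1_e1, g1_e0, g0_e1, g0_e0, z=1.96):
    """Case-only G x E odds ratio with a Woolf 95% confidence interval.

    All four counts are among cases only:
      g1_e1 = carriers, exposed       g1_e0 = carriers, unexposed
      g0_e1 = non-carriers, exposed   g0_e0 = non-carriers, unexposed
    """
    cells = (g1_e1, g1_e0, g0_e1, g0_e0)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ge = (g1_e1 * g0_e0) / (g1_e0 * g0_e1)
    se_log = math.sqrt(sum(1 / c for c in cells))  # SE of log(OR)
    lower = math.exp(math.log(or_ge) - z * se_log)
    upper = math.exp(math.log(or_ge) + z * se_log)
    return or_ge, lower, upper


# Hypothetical counts, not data from this study.
or_ge, lower, upper = case_only_interaction(40, 60, 70, 130)
print(f"case-only OR = {or_ge:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```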

Objectives: (1) molecular genetics primer; (2) databases and tools to conduct in silico analyses for tagSNP selection and prioritization; (3) factors influencing statistical power.

Multi-stage tagSNP design:
1. Select a reference panel.
2. Re-sequence the panel and identify SNPs (many markers, few subjects).
3. Choose tagSNPs.
4. Genotype the tagSNPs in the main study (few markers, many subjects).

1. Select reference panel. Ideally the reference panel is a sample of your study population, which is the most representative choice. Alternatively, samples from the Coriell Repository make it possible to integrate your data with other resources. Figure legend: candidate gene SNPs vs. HapMap SNPs.

2. Re-sequence reference panel: amplify and sequence the gene's DNA, then call bases with Phred (Ewing, 1998), assemble the reads with Phrap, and detect SNPs with PolyPhred (Nickerson, 1997).

Alternatives to re-sequencing: public data from the Programs for Genomic Applications (PGA), including SeattleSNPs (inflammation), NIEHS SNPs (environmental response), and Innate Immunity; and the International HapMap Project (5 million SNPs in four ethnically distinct populations).

3. Choose tagSNPs (LD-based). Options compared: LDSelect (Carlson, 2002) and Tagger (de Bakker, 2005).
r² threshold (0.80): LDSelect, yes; Tagger, yes
SNP exclusions/inclusions: LDSelect, no; Tagger, yes
SNP design score: LDSelect, no; Tagger, yes
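
The core idea behind LD-based tagging can be sketched as a greedy loop: among SNPs above the MAF cutoff, pick the SNP that tags the most untagged SNPs at r² at or above the threshold, put it and its partners in a bin, and repeat. This Python sketch is a simplified illustration in the spirit of LDSelect, not its actual code; LDSelect additionally checks that a tagSNP exceeds the threshold with every other SNP in its bin. The function name, SNP ids, and r² values are hypothetical.

```python
def greedy_tag_bins(maf, r2, r2_min=0.80, maf_min=0.05):
    """Greedy LD binning in the spirit of LDSelect (simplified).

    maf: {snp_id: minor allele frequency}
    r2:  {(snp_a, snp_b): pairwise r^2}; pairs may be listed in either order.
    Returns a list of (tag_snp, [bin members]).
    """
    def pair_r2(a, b):
        return r2.get((a, b), r2.get((b, a), 0.0))

    def captured(snp, pool):
        # SNPs in the pool that this SNP tags at or above the threshold.
        return {t for t in pool if t == snp or pair_r2(snp, t) >= r2_min}

    remaining = {s for s, f in maf.items() if f >= maf_min}
    bins = []
    while remaining:
        # Pick the SNP that tags the most still-untagged SNPs
        # (ties go to the smallest id, since max keeps the first maximum).
        tag = max(sorted(remaining), key=lambda s: len(captured(s, remaining)))
        members = captured(tag, remaining)
        bins.append((tag, sorted(members)))
        remaining -= members
    return bins


# Hypothetical SNPs and r^2 values; s5 falls below the MAF cutoff.
maf = {"s1": 0.20, "s2": 0.25, "s3": 0.15, "s4": 0.30, "s5": 0.01}
r2 = {("s1", "s2"): 0.90, ("s1", "s3"): 0.85, ("s2", "s3"): 0.70, ("s1", "s4"): 0.10}
for tag, members in greedy_tag_bins(maf, r2):
    print(tag, members)
```

Here two tagSNPs (s1 and s4) stand in for four common SNPs, which is the redundancy saving the tagSNP approach aims for.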

LDSelect output for IL-2 (SeattleSNPs; r² ≥ 0.80; MAF ≥ 0.05; Caucasians). Columns: Bin, Total Number of Sites, TagSNPs. Values: 1 2 rs2069763 rs2069772 rs2069776 rs2069778 3 rs2069777 rs2069779 4 rs2069762

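Genomic context: consider where each SNP lies. Coding SNPs (cSNPs) in exons can be assessed with SIFT (Ng, 2002) and PolyPhen (Ramensky, 2002), which predict whether an amino acid change is likely to affect protein function. SNPs in the upstream flanking region and at intron-exon junctions may also be functional.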

Sequence conservation: the UCSC Genome Browser displays phastCons conservation scores (Siepel, 2005). Figure: conservation score track contrasting a repeat region with a unique region.
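
When several SNPs in an LD bin are interchangeable as tags, genomic context and conservation can break the tie in favor of putatively functional variants. The Python sketch below shows one way to encode that choice. The context ranking, the function name, and the candidate SNPs are all hypothetical: the slides list these contexts but do not rank them.

```python
# Hypothetical ranking: lower numbers are preferred. The slides list these
# contexts but do not rank them; this order is an illustrative assumption.
CONTEXT_RANK = {
    "coding": 0,
    "intron_exon_junction": 1,
    "upstream_flanking": 2,
    "other": 3,
}


def pick_tag(candidates):
    """Choose one tag from SNPs that are interchangeable in an LD bin.

    candidates: list of (snp_id, context, conservation_score) tuples.
    Order by context rank, then by higher conservation, then by id.
    """
    def sort_key(c):
        snp_id, context, score = c
        return (CONTEXT_RANK.get(context, CONTEXT_RANK["other"]), -score, snp_id)

    return min(candidates, key=sort_key)[0]


bin_candidates = [
    ("snpA", "other", 0.9),
    ("snpB", "coding", 0.1),
    ("snpC", "coding", 0.6),
]
print(pick_tag(bin_candidates))  # snpC
```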

Objectives: (1) molecular genetics primer; (2) databases and tools to conduct in silico analyses for tagSNP selection and prioritization; (3) factors influencing statistical power.

Minor allele frequency and genetic model (figure): statistical power by MAF and genetic model, for 300 cases, 300 controls, and α = 0.05.
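
To see why MAF matters, the Python sketch below approximates the power of a two-sided allelic test for 300 cases and 300 controls at α = 0.05. It assumes a multiplicative (per-allele) model, Hardy-Weinberg equilibrium, and a normal approximation that uses the alternative-hypothesis variance throughout, a common simplification. It does not reproduce the slide's figure, and dominant or recessive models would need a genotype-based calculation (as in Quanto). The function name and the odds ratio of 1.5 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def allelic_test_power(maf_controls, odds_ratio, n_cases=300, n_controls=300, alpha=0.05):
    """Approximate power of a two-sided allelic (2 x 2 allele count) test.

    Assumes a multiplicative (per-allele) model, Hardy-Weinberg
    equilibrium, and a normal approximation; each person adds 2 alleles.
    """
    p0 = maf_controls
    # Case allele frequency implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = (p1 - p0) / se
    cdf = NormalDist().cdf
    # Probability of landing in either rejection region.
    return cdf(shift - z_crit) + cdf(-shift - z_crit)


for maf in (0.05, 0.10, 0.20, 0.30):
    print(f"MAF {maf:.2f}: power {allelic_test_power(maf, odds_ratio=1.5):.2f}")
```

Under this model, rarer variants give lower power for the same odds ratio, which is one reason MAF cutoffs appear in tagSNP selection.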

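Sample size requirement (Pritchard, 2001):
SNPs genotyped: S1 and S2; SNPs not genotyped: none; sample size requirement: 600
SNPs genotyped: S1; SNPs not genotyped: S2; LD 1.00, r² = 0.85; sample size requirement: 706
In general, if N subjects are needed when a SNP is genotyped directly, about N/r² are needed when it is captured only through a genotyped SNP in LD with it at r².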
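
The N/r² rule is simple arithmetic; this Python sketch reproduces the 600 to 706 example by rounding up to whole subjects. The function name is hypothetical.

```python
import math


def indirect_sample_size(n_direct, r2):
    """Sample size needed when a SNP is captured only through a marker.

    n_direct: subjects needed if the SNP itself were genotyped.
    r2: LD (r^2) between the genotyped marker and that SNP.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


print(indirect_sample_size(600, 1.00))  # 600
print(indirect_sample_size(600, 0.85))  # 706
```

This is also why an r² threshold such as 0.80 is used when choosing tagSNPs: it caps the sample size penalty at roughly 25%.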

Genotype error: errors are generally non-differential (equally likely in cases and controls) and reduce power. Every 1% increase in the genotyping error rate requires a 2-8% increase in sample size, depending on the error model (Zou et al., 2004, Genetic Epidemiology).
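
A back-of-the-envelope Python sketch of the 2-8% rule: it returns the range of sample sizes after allowing for a given genotyping error rate, starting from a hypothetical 600 subjects and a hypothetical 2% error rate. Treating the rule as linear in the error rate is an assumption; the actual inflation depends on the error model.

```python
import math


def error_adjusted_n(n, error_pct, low_pct=2.0, high_pct=8.0):
    """Range of sample sizes after allowing for genotyping error.

    Linear rule of thumb (an assumption): each 1 percentage point of
    error inflates the required sample size by low_pct to high_pct percent.
    """
    low = math.ceil(n + n * low_pct * error_pct / 100)
    high = math.ceil(n + n * high_pct * error_pct / 100)
    return low, high


print(error_adjusted_n(600, 2))  # (624, 696)
```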

Power calculators: Quanto handles G, E, G×E, and G×G effects; case-control, case-sibling, case-parent, and case-only designs; and quantitative or binary outcomes. htPowercc computes power as a function of r². Power for Association With Error (PAWE) accounts for genotyping errors.

TagSNP summary: TagSNPs give efficient yet comprehensive coverage of the genetic variation in candidate genes while reducing costs. Preference should be given to putatively functional variants, identified from the literature, genomic context, and sequence conservation. Statistical power is influenced by MAF, genetic model, LD, and genotyping error.

Programs for Genomic Applications: SeattleSNPs, http://pga.mbt.washington.edu; NIEHS, http://egp.gs.washington.edu/; Innate Immunity, http://innateimmunity.net/
International HapMap: http://www.hapmap.org/
Coriell cell repository: www.coriell.org
cSNP predictive analysis: SIFT, http://blocks.fhcrc.org/sift/SIFT.html; PolyPhen, http://coot.embl.de/PolyPhen
Vista: http://genome.lbl.gov/vista/index.shtml
Tagger, LDSelect, PAWE, and Quanto can be found at the Rockefeller site, http://linkage.rockefeller.edu/soft/