SNP Discovery and Genotyping Workshop

SNP Discovery and Genotyping Workshop
SNP discovery strategies (Debbie Nickerson)
Identifying SNPs by association for genotype-phenotype analysis of candidate genes (Chris Carlson)
Identifying haplotypes for genotype-phenotype analysis of candidate genes (Dana Crawford)
SNP genotyping strategies

SNP Discovery and Genotyping Strategies
Debbie Nickerson - debnick@u.washington.edu
Overview of Variation in the Human Genome
SNP Discovery Strategies and Status
SNP Data in the PGAs
Genotyping SNPs

Total sequence variation in humans
Population size: 6×10^9 (diploid)
Mutation rate: 2×10^-8 per bp per generation
Expected “hits”: 240 for each bp
Every variant compatible with life exists in the population
BUT: Most are vanishingly rare
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409:928-933 (2001)
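The "240 hits" figure follows directly from the two numbers on the slide. A minimal sketch of the arithmetic, assuming each diploid person contributes two copies of every base pair per generation (the variable names are illustrative only):

```python
# Figures taken from the slide; only the arithmetic is added here.
population_size = 6e9      # people (diploid)
mutation_rate = 2e-8       # mutations per bp per generation

# Assumption: each diploid person carries two copies of every base pair.
chromosome_copies = 2 * population_size

# Expected number of new mutations at any single bp, per generation.
expected_hits_per_bp = chromosome_copies * mutation_rate
print(round(expected_hits_per_bp))      # 240

# Comparing two haploid genomes: 1 SNP per 1331 bp.
pairwise_difference_rate = 1 / 1331
print(f"{pairwise_difference_rate:.2e} differences per bp")
```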

Strategies to Find SNPs
Mine them from Existing Genome Resources
Targeted SNP Discovery in Candidate Genes
  Berkeley PGA - http://pga.lbl.gov
  CardioGenomics - http://www.cardiogenomics.org
  InnateImmunity - http://innateimmunity.net
  SeattleSNPs - http://pga.mbt.washington.edu
  Southwestern - http://pga.swmed.edu

Sequence-based SNP Mining
[Figure labels: genomic DNA, mRNA; BAC library, RRS library, cDNA library or sampling; BAC overlap, shotgun overlap, EST overlap; sequence overlap; SNP discovery. Example aligned reads: GTTTAAATAATACTGATCA and GTTTAAATAGTACTGATCA, differing at one base]
~4.1 Million SNPs Available: http://www.ncbi.nlm.nih.gov/SNP/

Mining Finds Only A Small Fraction of the SNPs
[Figure: Fraction of SNPs Discovered (0.0 to 1.0) versus Minor Allele Frequency (0.0 to 0.5), with curves labeled 2, 8, 16, 24, 48, and 96]

Total Estimated SNPs and Fraction in dbSNP
minimal allele   expected SNPs   expected SNP     expected %
frequency        (millions)      frequency (bp)   in database
1%               11.0            290              11-12
5%               7.1             450              15-17
10%              5.3             600              18-20
20%              3.3             960              21-25
30%              2.0             1570             23-27
40%              0.97            3280             24-28
L. Kruglyak and D. Nickerson, Nat Genet 27:234-236 (2001)

Surfactant B - Locus Link dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)

Surfactant B - dbSNP

Confirmation of SNP Resource in New Sample: Potential Pitfalls
[Figure: bar chart of percent confirmed (0% to 100%) by discovery method (BAC, RRS, EST, PCR, Other, Any Multiple Report, BRE Multiple Report); series: Confirmed Multiple Method Report in dbSNP, Confirmed Unique Method Report in dbSNP]

Sequence-based SNP Identification
Amplify DNA, then sequence each end of the fragment (5’ and 3’)
Phred: base-calling, quality determination
Phrap: contig assembly, final quality determination
PolyPhred: polymorphism detection (e.g. ATAGACG versus ATACACG; homozygotes and heterozygote)
Consed: sequence viewing, polymorphism tagging
Analysis: polymorphism reporting, individual genotyping, phylogenetic analysis

Sequence-Based Detection and Genotyping of SNPs Jim Sloan, Tushar Bhangle (PolyPhred) Matthew Stephens, Paul Scheet (Quality Scores for SNPs) Phil Green, Brent Ewing, David Gordon (Phred, Phrap, Consed)

PGA SNPs
The PGAs provide a validated SNP resource (Allele Frequency Data)
Novel Views of the Variation Data
Emerging Pathway Interfaces
Color Fasta Formats
Gene Structure Views
Visual Genotypes
Linkage Disequilibrium Views
TagSNPs
Haplotypes
Many New Formats Under Development

Toward comprehensive association studies
5-7 million common variants exist in genome
Testing all for association is impractical today
Can the list be reduced without loss of power?
  SNPs in Coding (Amino Acid Changes)
  Linkage disequilibrium (SNPs in other functional regions, i.e. regulatory elements)

cSNPs - Both Deep and Average Coverage Available from the PGAs
CD36 - Southwestern PGA - Deep cSNP Discovery Strategy - Healthy, High Cholesterol, High Triglycerides, Congenital Cardiac Abnormalities, Left Ventricular Hypertrophy …
CD36 - SeattleSNPs PGA - Average cSNP Discovery Strategy - Healthy only

SIFT (Sorting Intolerant From Tolerant)
Coding Changes: CYP4F2
  Trp (W) → Gly (G): predicted to be tolerated
  Val (V) → Gly (G): predicted not to be tolerated
Ng and Henikoff, Genome Res. 2002

SNP-Based Association Studies
Indirect: Use dense map of SNPs and test for linkage disequilibrium (use association to find sites with function anywhere in the sequence, including non-coding regions)
[Figure: gene drawn 5’ to 3’ with example coding SNPs Arg-Cys and Val-Val]
Collins, Guyer, Chakravarti, Science 278:1580-81 (1997)

Selecting SNPs for Genotype-Phenotype Analysis Using Allelic Association (Linkage Disequilibrium)
Christopher Carlson - csc47@u.washington.edu

Candidate Gene Association Analysis
Describe existing genetic variation
  Rare SNPs (deep exonic resequencing)
  Common SNPs (complete resequencing)
Select a subset of SNPs for genotyping
  cSNPs (amino acid changes)
  htSNPs (resolve haplotypes)
  tagSNPs (patterns of genotype)
Test for genotype/phenotype correlations

SeattleSNPs Resequencing Strategy I
Resequence the complete genomic region of each gene
  2000 bp upstream of first exon
  1500 bp downstream of poly-A signal
  All exons and introns for genes below 35 kbp
Image courtesy of GeneSNPs

VG2 (Visual Genotype 2)
Web interface
Visualize genotypes
View SNPs by frequency
Sort on similarity between sites
Sort on similarity between samples
Visualize LD

SeattleSNPs Resequencing Strategy II
Resequence candidate genes from inflammation and coagulation pathways
Resequence 47 individuals
  24 African American
  23 European American
[Figure legend: homozygote common, heterozygote, homozygote rare, missing data]

Preliminary Analyses
Hardy-Weinberg equilibrium
Population specificity
Nucleotide diversity
Population genetics statistics (e.g. Tajima’s D)
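As an illustration of the first check in this list, the following is a minimal sketch of a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at one biallelic SNP. The function name and the genotype counts are hypothetical, and with samples as small as the 23 to 24 individuals per population used here an exact test would usually be preferred over the chi-square approximation.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg
    equilibrium, given counts of the three genotypes at one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele a
    q = 1 - p                            # frequency of allele b
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic site).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: a site in exact HWE, and one with no heterozygotes.
print(hwe_chi_square(25, 50, 25))   # 0.0
print(hwe_chi_square(50, 0, 50))    # 100.0, far above the 3.84 cutoff for p < 0.05
```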

SNP Selection: cSNPs
Genotype SNPs which change amino acids
Genotype other “good story” SNPs
  SNPs in known regulatory elements
  SNPs in Conserved Noncoding Sequences
Image courtesy of GeneSNPs

SNP Selection: htSNPs Genotype “haplotype tagging” SNPs which resolve existing common haplotypes

SNP Selection: tagSNPs
Resequence a modest number of samples
Describe patterns of genotype at all common SNPs
Genotype tagSNPs which efficiently capture existing patterns of genotype

Linkage Disequilibrium
[Figure: two SNPs, A and B]
Haplotype is the pattern of alleles on a single chromosome
  4 possible haplotypes
Linkage Disequilibrium (LD) describes the allelic association between two SNPs
Two popular LD statistics: D′ and r²

Complete LD
[Figure: two SNPs, A and B]
Unequal allele frequency
Allelic association is as strong as possible
3 haplotypes observed
No detected recombination between SNPs
Genotype is not perfectly correlated
D′ = 1, r² < 1

Perfect LD
[Figure: two SNPs, A and B]
Equal allele frequency
Allelic association is as strong as possible
2 haplotypes observed
No detected recombination between SNPs
Genotype is perfectly correlated
D′ = 1, r² = 1
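The difference between complete and perfect LD can be checked numerically. The sketch below computes |D′| and r² from the four haplotype frequencies of two biallelic SNPs using the standard definitions; the function name and the example frequencies are illustrative assumptions, not data from the talk.

```python
def ld_statistics(p_AB, p_Ab, p_aB, p_ab):
    """Return (|D'|, r^2) for two biallelic SNPs from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    p_AB, p_Ab, p_aB, p_ab = (x / total for x in (p_AB, p_Ab, p_aB, p_ab))
    p_A = p_AB + p_Ab            # frequency of allele A at the first SNP
    p_B = p_AB + p_aB            # frequency of allele B at the second SNP
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both SNPs must be polymorphic")
    D = p_AB - p_A * p_B         # raw disequilibrium coefficient
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


# Complete LD: three haplotypes, unequal allele frequencies -> D' = 1, r^2 < 1.
print(ld_statistics(0.5, 0.0, 0.3, 0.2))
# Perfect LD: two haplotypes, equal allele frequencies -> D' = 1, r^2 = 1.
print(ld_statistics(0.6, 0.0, 0.0, 0.4))
```

In the first case one of the four possible haplotypes is missing, so D′ reaches 1, but because the allele frequencies differ (0.5 versus 0.8) one SNP cannot fully predict the other and r² is only about 0.25.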

Rational SNP Selection
Select SNPs to genotype on the basis of LD
Some SNPs are in LD with many other SNPs
Some SNPs are in LD with no other SNPs
SNPs between a pair of associated SNPs are not necessarily associated with the flanking SNPs

LD SNP Selection Example
CSF3 in European Americans
5200 bp
17 SNPs
10 common SNPs (above 10% minor allele frequency)

LD Site Selection Algorithm
Find minimal set of SNPs for assay, such that each SNP is either assayed directly or above r² threshold with an assayed SNP
Calculate all pairwise r² values
Set r² threshold based on power estimates for study

CSF3 Site Selection
Threshold LD: r² > 0.64
Bin 1: 4 sites
Bin 2: 4 sites
Bin 3: 2 sites
Genotype 1 SNP from each bin, chosen for biological intuition or ease of assay design
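The binning described above can be sketched as a greedy procedure: repeatedly pick the SNP that exceeds the r² threshold with the most still-unbinned SNPs, form a bin from it and its partners, and record which bin members could serve as the tagSNP. This is a simplified illustration and not necessarily the speaker's exact implementation; a greedy pass does not guarantee the true minimal set. The SNP names and r² values are hypothetical, and `ld_select` is a name chosen for this example.

```python
def ld_select(snps, r2, threshold=0.64):
    """Greedy LD binning. `r2` maps unordered SNP pairs to pairwise r^2.
    Returns a list of (tag_candidates, bin_members) tuples."""
    def linked(a, b):
        return a == b or r2.get((a, b), r2.get((b, a), 0.0)) > threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # SNP exceeding the threshold with the most still-unbinned SNPs.
        best = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [t for t in remaining if linked(best, t)]
        # A tag candidate must exceed the threshold with every bin member.
        tags = [s for s in members if all(linked(s, t) for t in members)]
        bins.append((tags, members))
        remaining = [t for t in remaining if t not in members]
    return bins


# Hypothetical pairwise r^2 values for six SNPs (unlisted pairs are 0).
pairwise_r2 = {
    ("s1", "s2"): 0.90, ("s1", "s3"): 0.80, ("s2", "s3"): 0.50,
    ("s4", "s5"): 0.95,
}
for tags, members in ld_select(["s1", "s2", "s3", "s4", "s5", "s6"], pairwise_r2):
    print("bin:", members, "genotype one of:", tags)
```

In this toy data only s1 can tag the first bin, because s2 and s3 fall below the threshold with each other; s6 ends up in a bin of its own and must be assayed directly.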

Power and LD
Given:
  All common SNPs described
  Patterns of LD between common SNPs are known
Select SNPs such that every SNP is either
  Directly assayed
  Associated with an assayed SNP
Test for disease associations with assayed SNPs
Power to detect disease associations at unassayed SNPs depends on r² between assayed and unassayed SNPs
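One common way to make the dependence on r² concrete is the standard approximation that testing a tagSNP with N samples has about the same power as testing the unassayed SNP directly with N·r² samples, so the sample size must grow by roughly 1/r². The slide does not state this formula, so treat it as an assumption of this sketch; the function name is hypothetical.

```python
import math


def samples_needed_at_tag(n_direct, r2):
    """Approximate sample size needed when genotyping a tagSNP with the
    given r^2 to match the power of n_direct samples typed at the
    unassayed SNP itself (uses the N / r^2 rule of thumb)."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


# At the r^2 > 0.64 threshold used for CSF3, about 56% more samples are needed.
print(samples_needed_at_tag(1000, 0.64))   # 1563
print(samples_needed_at_tag(1000, 1.0))    # 1000
```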

LD Selection and Haplotype
LD selected SNPs provide the highest possible haplotype diversity for a given number of SNPs assayed
LD selection is robust to recombination and hotspot structure
LD selection is sensitive to population stratification

SNP Selection Summary
It is possible to test all common variants in a candidate gene directly for risk association (main effects), so that negative (null) results are meaningful
Caveat: Higher order risks unaddressed
  Haplotype (G × G effects within a locus)
  Epistasis (G × G effects between loci)
  Environment (G × E effects)

Identifying Haplotypes for Genotype-Phenotype Analysis Dana C. Crawford dcrawfo@gs.washington.edu

Outline of discussion
Constructing or inferring haplotypes
Haplotype tools available in PGA
Description of haplotypes in SeattleSNPs genes
Use of VH1 tool to visually inspect
  Haplotype blocks
  Haplotype diversity
  Hotspots of recombination
Summary of SeattleSNPs haplotype data

What is a Diplotype?
Humans are diploid
At each SNP there are two alleles, which are observed as a genotype
At each gene there are two haplotypes, which are observed as a multi-site genotype, or diplotype
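Why haplotypes have to be constructed or inferred at all can be seen by enumerating the haplotype pairs consistent with an unphased multi-site genotype: with k heterozygous sites (k ≥ 1) there are 2^(k-1) possible pairs. The sketch below is illustrative; the function name and the genotype encoding (one two-letter string per SNP) are assumptions of this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List every unordered haplotype pair consistent with an unphased
    genotype such as ["CT", "AG"] (two alleles per SNP, phase unknown)."""
    het_sites = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        pick = dict(zip(het_sites, choice))
        # At homozygous sites both haplotypes get the same allele.
        h1 = "".join(g[pick.get(i, 0)] for i, g in enumerate(genotype))
        h2 = "".join(g[1 - pick.get(i, 0)] for i, g in enumerate(genotype))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


# One heterozygous site: the phase is unambiguous.
print(compatible_haplotype_pairs(["CC", "AG"]))   # [('CA', 'CG')]
# Two heterozygous sites: two possible diplotypes.
print(compatible_haplotype_pairs(["CT", "AG"]))   # [('CA', 'TG'), ('CG', 'TA')]
```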

What is a Haplotype? A: “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997 VH1 – haplotype visualization tool

How Do You Construct Haplotypes?
1. Collect extended family members
[Figure: pedigree with two-SNP genotypes C/T, A/G; C/C, A/G; T/T, G/G; C/T, A/A and the haplotypes resolved from them]

How Do You Construct Haplotypes?
2. Go from diploid to haploid via somatic cell hybrids, e.g. Patil et al 2001

How Do You Construct Haplotypes? 3. Allele-specific PCR SNP 1 SNP 2 C/T A/G

How Do You Construct Haplotypes?
Statistical inference
  Clark Algorithm
  EM (Arlequin)
  Partition Ligation (HAPLOTYPER)
  PHASE

Clark Algorithm
Find unambiguous haplotypes
  Homozygotes
  Single heterozygotes

Clark Algorithm Find ambiguous diplotypes formed from two unambiguous genotypes

Clark Algorithm Find ambiguous diplotypes formed from one unambiguous genotype and one new genotype

Clark Algorithm Iterate until either all haplotypes resolve, or ambiguous haplotypes are inconsistent with any inferred haplotype
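The four steps above can be sketched in a few lines. This is a simplified illustration of Clark's approach, not a reference implementation: it ignores missing data, and like the original algorithm its result can depend on the order in which samples and known haplotypes are tried. The function name, genotype encoding, and example genotypes are assumptions of this sketch.

```python
def clark_phase(genotypes):
    """Each genotype is a tuple of two-letter strings, one per SNP,
    e.g. ("CT", "AG"). Returns (resolved, unresolved), where resolved maps
    sample index -> (haplotype1, haplotype2)."""
    known = []        # haplotypes inferred so far, in discovery order
    resolved = {}

    # Step 1: homozygotes and single heterozygotes are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            h1 = "".join(a for a, _ in g)
            h2 = "".join(b for _, b in g)
            resolved[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)

    def complement(g, h):
        """Haplotype that pairs with h to give genotype g, or None."""
        mate = []
        for (a, b), x in zip(g, h):
            if x == a:
                mate.append(b)
            elif x == b:
                mate.append(a)
            else:
                return None
        return "".join(mate)

    # Steps 2-4: explain ambiguous genotypes with a known haplotype plus
    # its complement, adding new haplotypes, until nothing changes.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                mate = complement(g, h)
                if mate is not None:
                    resolved[i] = (h, mate)
                    if mate not in known:
                        known.append(mate)
                    progress = True
                    break
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


# Hypothetical three-SNP genotypes: one homozygote and two double heterozygotes.
samples = [("CC", "AA", "GG"), ("CT", "AG", "GG"), ("CT", "GG", "GT")]
resolved, unresolved = clark_phase(samples)
print(resolved)     # sample 2 is resolved only via TGG, inferred from sample 1
print(unresolved)   # []
```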

Haplotype Algorithm Comparison
Clark: intuitive; fast
EM: complete solution; slightly more accurate than Clark; robust to ambiguity
PHASE: complete solution; slightly more accurate than EM; slow (version 2 is faster)
Haplotyper (Ligation): fast; better than Clark; less accurate than EM or PHASE

Haplotype Tools in the PGA: InnateImmunity
25 genes re-sequenced in innate immunity pathway
4 populations: European and African-Americans, Hispanics, Asthmatics
PHASE and Haplotyper results posted on website
http://innateimmunity.net

Haplotype Tools in the PGA: SeattleSNPs
120 genes re-sequenced in inflammation response
2 populations: European- and African-Americans
PHASE results posted on website
Interactive tool (VH1) to visualize and sort haplotypes
http://pga.gs.washington.edu

Distribution of Haplotypes in 100 SeattleSNPs Genes AD ED

Common Haplotypes in 100 SeattleSNPs Genes (Frequency >5%)
              >5% MAF
Population    Average    Range
ED            4.54       1 - 8
AD            4.99       0 - 11

Haplotype Sharing Between Populations in 100 SeattleSNPs Genes

Number of Haplotypes From Two Different Discovery Strategies
The average number of inferred haplotypes per gene we are observing in SeattleSNPs is greater than what has been previously described in the literature from other large surveys. One possible reason for this difference is that we are employing a different discovery strategy from other surveys. For example, another discovery strategy is re-sequencing only the coding regions of the gene rather than the entire gene (which is what we did). To compare our data with a coding variation discovery strategy, we inferred haplotypes from coding SNPs with a MAF >5%. In general, we observed approximately half the average number of haplotypes per gene compared with inferring haplotypes using all SNPs with MAF >5%. The average number of haplotypes per gene inferred from coding SNPs is more similar to estimates from other large surveys in the literature. It is not surprising that there are fewer haplotypes when coding variation is used rather than all common variation, because the number of haplotypes per gene is related to the number of SNPs per gene. Using all common SNPs, we observed an average density of 4.71 and 2.79 SNPs/kb in the AD and ED populations, respectively. Using coding SNPs, the density was lower (less than 1 SNP/kb in both populations), and thus there were fewer haplotypes. Because it is costly to genotype all common sites, many people are interested in strategies that require fewer sites but still retain the information observed using all common sites. Is this possible?

Haplotype Structures Are Similar Across Discovery Strategies… FGB – African-Americans 13 SNPs >5% Coding SNPs 29 SNPs >5%

…But, Not For All Genes F10 – African-Americans 13 SNPs >5% Coding SNPs 48 SNPs >5%

Are Blocks Preserved Using Different Discovery Strategies?
Four-gamete test (gametes formed from alleles A/a and B/b); HaploBlockFinder, Zhang and Jin 2003
Yes*, for some:
  10% of genes in AD
  25% of genes in ED
*>75% of the blocks are preserved
Fewer “blocks” with fewer SNPs/kb
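The four-gamete test itself is simple to state in code: if all four combinations of alleles at two biallelic sites are observed among the haplotypes, a recombination (or recurrent mutation) must have occurred between them. The sketch below applies the test with a simple left-to-right block partition; it is an illustration under that assumption and not the HaploBlockFinder implementation, and the function names and haplotypes are hypothetical.

```python
def has_four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at biallelic sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def four_gamete_blocks(haplotypes):
    """Partition sites left to right into (start, end) blocks such that no
    pair of sites inside a block shows all four gametes."""
    n_sites = len(haplotypes[0])
    blocks, start = [], 0
    for j in range(1, n_sites):
        if any(has_four_gametes(haplotypes, i, j) for i in range(start, j)):
            blocks.append((start, j - 1))   # close the block before site j
            start = j
    blocks.append((start, n_sites - 1))
    return blocks


# Hypothetical phased haplotypes over three SNPs.
haps = ["AAG", "AAT", "GGG", "GGT"]
print(has_four_gametes(haps, 0, 1))   # False: only AA and GG are seen
print(has_four_gametes(haps, 0, 2))   # True: AG, AT, GG, GT are all seen
print(four_gamete_blocks(haps))       # [(0, 1), (2, 2)]
```

With fewer SNPs per kb there are fewer site pairs that can reveal a fourth gamete, which is consistent with the slide's observation that sparser discovery strategies yield fewer blocks.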

Using Visualization Tools (VH1) To Identify Haplotype Blocks IL10: Rare sites removed Sorted by related sites “Block” structure evident

Using VH1 to Identify Highly Divergent Haplotypes Some haplotypes are highly divergent More likely to have functional consequences? Mixed Blessing: Easier to detect Harder to dissect

Using Haplotypes To Identify Hotspots Of Recombination CD36 haplotypes, sorted by sample

Linkage Disequilibrium and Hotspots
Hotspot in between: sites need to be typed from both ends
[Figure: associated sites in CD36]

Detection of Recombination Hotspots In Candidate Genes
HOTSPOTTER
  Developed by Na Li and Matthew Stephens
  Multilocus model for LD:
    Does not rely on “block-like” patterns
    Relates LD to underlying recombination process
  Incorporated into new version of PHASE (v2.0)
students.washington.edu/lina/software/

CD36 – combined population

CD36 – AD and ED populations

HOTSPOTTER Preliminary Results
15 out of 100 genes have evidence of a hotspot:
AGTR1, APOB, CD36, IL1B, IL21R, IL4, NOS3, PLAUR, PON1, SERPINA5, SELP, SFTPA2, SFTPB, VCAM1, VEGF

SeattleSNPs Haplotype Summary
More haplotypes per gene than previously described
<50% of African-American chromosomes are represented by common shared haplotypes
Block structure is preserved across discovery strategies for only a fraction of the genes
Evidence for hotspots of recombination in human genes

Ideals for SNP Genotyping
High Sensitivity - PCR but moving towards direct genomic DNA detection
High Specificity - Accurate
Simple process - Easy to automate - High Throughput
Multiplexing - Perform many assays at once - decrease costs
Cheap

SNP Genotyping
[Figure: matched versus mismatched probe and target for the C allele and T allele under each chemistry]
Allele-Specific Hybridization: hybridize / fail to hybridize
Polymerase Extension (+ ddCTP): C incorporated / fails to incorporate C
Oligonucleotide Ligation: ligate / fail to ligate
Invader: cleave / fail to cleave
Taqman: degrade / fail to degrade
Allele-Specific PCR: amplify / fail to amplify

SNP Typing Formats
Microtiter Plates - Fluorescence, e.g. Taqman - Good for a few markers - lots of samples - PCR
Size Analysis by Mass or Electrophoresis, e.g. Sequenom or SNaPshot - Moderate Multiplexing reducing costs
Arrays - Custom or Universal, e.g. Affymetrix, Illumina or ParAllele - Highly multiplexed - High Throughput - Genotype directly on genomic DNA

Taqman
Genotyping with fluorescence-based homogeneous assays (single-tube assay)
[Figure: allele-specific probes for A and G, each carrying a reporter and a quencher]

Genotype Calling - Cluster Analysis

Genotyping by Mass Spectrometry Multiplex ~ 5 SNPs

Comparative Genotyping in Populations
For each population (Population 1, Population 2): Pooled DNA → PCR → Quantitative Assay → Estimate Allele Frequency of the Polymorphism
[Figure: example allele frequency estimates of 60/40 and 85/15]

Pooled Genotyping
Advantages: Speed, Cost
Major Disadvantages:
  Loss of haplotype information
  Loss of stratification by phenotype or environmental factors

SNP Genotyping Custom SNP Genotyping Chips:

Multiplexed Genotyping - Universal Tag Readouts
[Figure: locus 1 and locus 2 specific sequences (alleles C/T and A/G) carry tag 1 and tag 2 sequences, which hybridize to complementary tag sequences (cTag 1, cTag 2) on a substrate (bead or chip); bead array and chip array with Tag 1 to Tag 4]
Multiplex ~1,000 SNPs
Not dependent on primary PCR
ParAllele, Illumina

Illumina Genotyping - Gap Ligation

1,000 SNPs Assayed on 96 Samples

SNP Genotyping
Lots of systems - Still costly but dropping
Offering Moderate to High throughputs
Systems vary in price: $$ - $$$$
Laboratory Information Management Systems (Key: Track - Samples - Assays - Completion rate - Reproducibility/Error Analysis)