Xiaole Shirley Liu STAT115/STAT215/

Presentation transcript:

Haplotypes and GWAS
Xiaole Shirley Liu, STAT115/STAT215/

Haplotype
Haplotype block: a cluster of linked SNPs.
Haplotype boundary: blocks show strong LD (linkage disequilibrium) within a block and little or no LD between blocks; the boundaries reflect recombination hotspots.
Association studies using haplotypes can be more accurate than those using individual SNPs.
[Figure: haplotype size distribution]

SNP Profiling
Example sequence with four biallelic SNPs: [C/T] [A/G] T X C [A/C] [T/A]
Possible haplotypes: 2^4 = 16
In reality, a few common haplotypes explain about 90% of the variation.
Tagging SNPs: SNPs that capture most of the variation in haplotypes; using only these removes the redundant SNPs.
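The counting argument and the idea of tagging SNPs can be sketched in a few lines of Python. This is only an illustration: the list of common haplotypes and the helper name `tag_snps` are hypothetical, and the brute-force search is practical only for a handful of SNPs.

```python
from itertools import combinations, product

# Four biallelic SNPs, as on the slide: [C/T] [A/G] [A/C] [T/A].
snp_alleles = [("C", "T"), ("A", "G"), ("A", "C"), ("T", "A")]
n_possible = len(list(product(*snp_alleles)))  # 2**4 possible haplotypes

# Hypothetical common haplotypes observed in a population (assumption).
common = ["CAAT", "CGAT", "TACA", "TGCA"]

def tag_snps(haplotypes):
    """Return the smallest set of SNP positions that still distinguishes
    every haplotype in the list (exhaustive search over subsets)."""
    n = len(haplotypes[0])
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            reduced = {tuple(h[i] for i in subset) for h in haplotypes}
            if len(reduced) == len(set(haplotypes)):
                return subset
    return tuple(range(n))

tags = tag_snps(common)
print(n_possible, tags)
```

In this toy data, SNPs 0, 2 and 3 always change together, so genotyping SNP 0 and SNP 1 is enough; SNPs 2 and 3 are redundant.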

SNP Genotyping
One SNP at a time, or genome-wide (SNP array).

40 Probes Used Per SNP
Allele call: AA, BB, or AB
Signal: theoretically 1A+1B, 2A, or 2B, but could be 1A+3B (amplified!)

Haplotype Inference
Genotyping only tells us that an individual is, e.g., Aa BB Cc; it does not tell us whether the haplotype pair is ABC + aBc or ABc + aBC.
Haplotypes can often be inferred if the parental genotypes are known.
  Similar to blood typing: e.g. phenotypes F: A, M: AB, C: B imply genotypes F: AO, M: AB, C: BO.
Otherwise, look at the population genotypes and infer the common haplotypes.
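The ambiguity in the Aa BB Cc example can be made concrete by enumerating every haplotype pair consistent with an unphased genotype. A minimal sketch; the function name `compatible_pairs` and the two-letter genotype encoding are assumptions made for illustration.

```python
from itertools import product

def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased
    genotype given as two-letter strings, e.g. ["Aa", "BB", "Cc"]."""
    het = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    for choice in product([0, 1], repeat=len(het)):
        h1 = [g[0] for g in genotype]
        h2 = [g[1] for g in genotype]
        for flip, i in zip(choice, het):
            if flip:  # swap which chromosome carries which allele
                h1[i], h2[i] = h2[i], h1[i]
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return pairs

pairs = compatible_pairs(["Aa", "BB", "Cc"])
print(sorted(pairs))
```

A genotype with k heterozygous sites has 2^(k-1) distinct phasings for k >= 1, which is why genotypes with at most one heterozygous site are unambiguous.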

Haplotype Inference: Clark's Algorithm
1. Construct haplotypes from unambiguous individuals.
2. Remove samples that can be explained as combinations of haplotypes already discovered.
3. Propose the haplotype that would explain the most remaining samples.
4. Iterate steps 2 and 3 until finished.
Disadvantages:
  Depends on the number of ambiguous subjects
  Cannot get started when n is small
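The four steps can be sketched as follows. This is a simplified illustration, not a reference implementation: the 0/1/2 genotype coding (1 = heterozygous), the function names, and the toy genotypes are all assumptions.

```python
from collections import Counter

def complement(g, h):
    """Haplotype that pairs with h to give genotype g, or None if impossible."""
    c = tuple(gi - hi for gi, hi in zip(g, h))
    return c if all(x in (0, 1) for x in c) else None

def clark(genotypes):
    known, resolved = [], {}

    def learn(h):
        if h not in known:
            known.append(h)

    # Step 1: individuals with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(x == 1 for x in g) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = complement(g, h1)
            resolved[i] = (h1, h2)
            learn(h1)
            learn(h2)
    while True:
        # Step 2: resolve samples explained by two known haplotypes.
        for i, g in enumerate(genotypes):
            if i not in resolved:
                for h in known:
                    c = complement(g, h)
                    if c in known:
                        resolved[i] = (h, c)
                        break
        # Step 3: propose the new haplotype that explains the most samples.
        votes = Counter()
        for i, g in enumerate(genotypes):
            if i not in resolved:
                for c in {complement(g, h) for h in known} - {None}:
                    votes[c] += 1
        if not votes:  # nothing left to propose: done, or stuck
            return resolved, known
        learn(votes.most_common(1)[0][0])  # Step 4: iterate

genotypes = [(0, 0, 0), (2, 2, 0), (2, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]
resolved, known = clark(genotypes)
print(resolved)
```

If no individual is unambiguous, `known` stays empty and nothing is resolved, which mirrors the slide's point that the method may not get started.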

EM and Gibbs Sampling in the Motif Finding Problem
Observed: sequence S
Unknown: motif θ and site location A (alignment); given one, we can infer the other
EM and Gibbs sampler:
  Initialize a random motif θ
  Iterate:
    Given θ and sequence S, update site location A
    Given A and S, update θ
  EM updates by weighted average; Gibbs sampling updates by sampling

Statistical Model for Haplotype
Haplotype pool:
  T T A C C --- 1
  T T A C G --- 2
  T T A G C --- 3
  T T A G G --- 4
  T T C C C --- 5
  T T C C G --- 6
  T T C G C --- 7
  T T C G G --- 8
[Figure: haplotype frequencies and haplotype pool]
Each individual's two haplotypes are treated as random draws from a pool of haplotypes with certain frequencies, restricted to pairs that are consistent with the observed genotype.

Haplotype Inference: EM and Gibbs Sampler
Observe genotype Y; estimate the haplotype pair Z for each individual and the haplotype frequencies θ.
Initialize the haplotype frequencies θ.
Iterate:
  Estimate Z given Y and θ
  Estimate θ given Y and Z
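A minimal sketch of the EM variant of this iteration (the Gibbs sampler would draw one pair per individual instead of averaging over all of them). The 0/1/2 genotype coding (1 = heterozygous), the function names, the uniform initialization, and the toy data are assumptions; enumerating every compatible pair is feasible only for a few SNPs.

```python
from collections import defaultdict
from itertools import product

def haplotype_pairs(g):
    """All unordered haplotype pairs consistent with genotype g."""
    het = [i for i, x in enumerate(g) if x == 1]
    base = [x // 2 for x in g]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for b, i in zip(bits, het):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotypes(genotypes, n_iter=50):
    cand = [haplotype_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in cand for pair in pairs for h in pair})
    theta = {h: 1.0 / len(haps) for h in haps}  # initialize frequencies
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in cand:
            # E-step: weight of each pair Z given Y and theta
            # (factor 2 because a pair of distinct haplotypes has two orders).
            w = [theta[a] * theta[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: update theta from the expected haplotype counts.
        theta = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return theta

theta = em_haplotypes([(0, 0), (0, 0), (2, 2), (1, 1)])
print(theta)
```

In this toy data the double heterozygote is explained almost entirely by the two haplotypes already seen in homozygous individuals, so their estimated frequencies absorb it.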

Haplotype Inference: Partition-Ligation
When the number of SNPs is large, the number of possible haplotypes is too large to handle at once, so divide and conquer.
Treat each inferred sub-haplotype as one allele.

HapMap of the Human Genome
HapMap: a catalog of common genetic variants in humans
  What are these variants?
  Where do they occur in our DNA?
  How are they distributed within populations and between populations around the world?
Goals:
  Define haplotype "blocks" across the genome
  Enable unbiased, genome-wide association studies

1000 Genomes Project
Characterization of human genome sequence variation
Foundation for investigating the relationship between genotype and phenotype
Break

Association Studies
Association between genetic markers and phenotype
E.g. cystic fibrosis: ~70% of cystic fibrosis patients have a deletion of 3 base pairs in the CFTR gene, resulting in the loss of a phenylalanine at position 508 of the CFTR protein.
In particular, find disease genes and SNP / haplotype markers for susceptibility prediction and diagnosis.

SNPs in Pharmacogenomics: Warfarin and CYP2C9
Warfarin is an anticoagulant drug; the CYP2C9 gene product metabolizes warfarin.
Patients requiring a low warfarin dose, compared with the normal population, have an odds ratio of 6.21 for having ≥ 1 variant allele.
The subgroup of patients who are poor metabolizers of warfarin is potentially at higher risk of bleeding.
Aithal et al., 1999, Lancet.

Influences individual decisions on lifestyle, prevention, screening, and treatment

Genome-Wide Association Studies
Quality control:
  Unusual similarity between individuals
  Wrong sex
  Trio with non-Mendelian inheritance
  Genotyping quality
Two strategies:
  Family-based association studies
  Population-based case-control association studies

Quality Control: SNP Calls
[Figure: % SNPs called; SNP calls from all the samples at a locus, with examples of good calls and bad calls]

Family-based Association Studies
Look at allele transmission in unrelated families with one affected child in each.
Like a coin toss: how likely is the observed outcome if the coin is fair?
[Figure: parental alleles A a, A a]

TDT: Transmission Disequilibrium Test
Only heterozygous parents matter; compare observed with expected transmissions.
Could also compare allele frequencies between affected and unaffected children in the same family.
Break
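The coin-toss comparison can be sketched with the usual McNemar-style TDT statistic, (b - c)^2 / (b + c), where b and c count how often heterozygous parents transmit allele A or allele a to the affected child. The counts below are hypothetical, and the p-value uses the chi-square distribution with 1 degree of freedom via `math.erfc`.

```python
import math

def tdt(transmitted_A, transmitted_a):
    """TDT statistic and p-value from transmissions by heterozygous (Aa)
    parents to affected children. Under no association each allele is
    transmitted with probability 1/2, like a fair coin."""
    b, c = transmitted_A, transmitted_a
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 100 heterozygous parents, 60 transmit A and 40 transmit a.
stat, p_value = tdt(60, 40)
print(stat, p_value)
```

Homozygous parents are left out because they always transmit the same allele and so carry no information about preferential transmission.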

Case-Control Studies
Compare the SNP / haplotype marker frequency in a sample of affected cases with that in an age-, sex-, and population-matched sample of unaffected controls.

From Genotyping to Allele Counts

Test Significant Associations
Expected counts:
  (24 + 278) × (24 + 86) / (24 + 278 + 86 + 296) ≈ 49
  (278 + 296) × (86 + 296) / (24 + 278 + 86 + 296) ≈ 321
χ² = 27.5, 1 df, p < 0.001
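The expected counts and the test statistic can be reproduced with a short sketch. The 2x2 layout below (24 and 278 in one row, 86 and 296 in the other) is inferred from the expected-count formulas and is an assumption, as is the function name. With unrounded expected counts the statistic comes out near 26.5; the slide's 27.5 appears to be based on rounded expected counts, and the conclusion p < 0.001 is the same either way.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    # Expected count = row total * column total / grand total.
    expected = [[r * col / n for col in cols] for r in rows]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

expected, chi2, p_value = chi_square_2x2([[24, 278], [86, 296]])
print(expected, chi2, p_value)
```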

Association of Alleles and Genotypes of rs1333049 ('3049) with Myocardial Infarction
            Allele C, n (%)   Other allele, n (%)   χ² (1 df)   P-value
Cases       2,132 (55.4)      1,716 (44.6)          55.1        1.2 × 10^-13
Controls    2,783 (47.4)      3,089 (52.6)
Allelic odds ratio = 1.38
  OR = 1: no disease association
  OR > 1: allele C increases risk of disease
  OR < 1: allele C decreases risk of disease
Samani N et al., N Engl J Med 2007; 357:443-453.
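The allelic odds ratio follows directly from the four allele counts. A minimal sketch, assuming the first count in each row refers to allele C; the function names are hypothetical.

```python
def allelic_odds_ratio(case_c, case_other, control_c, control_other):
    """Odds of carrying allele C in cases divided by the odds in controls."""
    return (case_c / case_other) / (control_c / control_other)

def interpret(odds_ratio):
    """Read the odds ratio the way the slide does."""
    if odds_ratio > 1:
        return "allele C increases risk"
    if odds_ratio < 1:
        return "allele C decreases risk"
    return "no association"

odds_ratio = allelic_odds_ratio(2132, 1716, 2783, 3089)
print(round(odds_ratio, 2), interpret(odds_ratio))
```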

Multiple Hypothesis Testing? GWAS P-values

GWAS P-values for Type II Diabetes
Manhattan plot
Bonferroni correction: most common; typically p < 10^-7 or 10^-8
How many SNPs were tested?
McCarthy et al., Nat Rev Genetics, 2008
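The Bonferroni threshold is the family-wise error rate divided by the number of tests, which is how thresholds on the order of 10^-7 or 10^-8 arise. A minimal sketch; the SNP names, p-values, and the choice of one million tests are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_snps(pvalues, alpha, n_tests):
    """Names of SNPs whose p-value passes the Bonferroni-corrected cutoff."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return [snp for snp, p in pvalues.items() if p < cutoff]

# Hypothetical p-values for three of one million tested SNPs.
pvalues = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}
threshold = bonferroni_threshold(0.05, 1_000_000)
hits = significant_snps(pvalues, 0.05, 1_000_000)
print(threshold, hits)
```

With one million tests and alpha = 0.05 the cutoff is about 5 × 10^-8, so only the first SNP in this toy example would be reported.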

Reproducibility of Association Studies
Most reported associations have not been consistently reproduced.
Hirschhorn et al., Genetics in Medicine, 2002: review of association studies
  603 associations of polymorphisms and disease
  166 studied in at least three populations
  Only 6 seen in > 75% of studies

Size Matters Visscher, AJHG 2012

How to Improve Statistical Power?
Without increasing samples?
  Test association of disease with haplotypes instead of individual SNPs; this also reduces the effect of genotyping errors
  Split samples:
    First half: narrow down promising SNPs / haplotypes
    Second half: refine hits (far fewer multiple hypotheses)
Increase sample size: Precision Medicine Initiative cohort, ~1 million volunteers

P < 9.9 × 10^-7 (P ≤ 10^-6)
Manolio et al., Clin Invest 2008

Summary
Haplotype inference
  Clark's: resolve unambiguous individuals first, then propose new haplotypes to maximize explanation
  EM & Gibbs: iteratively infer haplotype frequencies and individuals' haplotypes
Tagging SNPs and GWAS
  Family-based association studies: TDT, transmission of alleles to the affected child
  Case-control studies: chi-square test (allele frequency difference between cases and controls) and OR

Acknowledgement
Jun Liu & Tim Niu
Cheng Li & Yuhyun Park
Kenneth Kidd, Judith Kidd and Glenys Thomson
Joel Hirschhorn
Greg Gibson & Spencer Muse
Jim Stankovich
Teri Manolio
David Evans
Guodong Wu
Stefano Monti
Bo Li