Xiaole Shirley Liu STAT115/STAT215/

1 Haplotypes and GWAS
Xiaole Shirley Liu STAT115/STAT215/

2 Haplotype
Haplotype block: a cluster of linked SNPs
Haplotype boundary: blocks of sequence with strong LD within blocks and no LD between blocks; boundaries reflect recombination hotspots
Association studies using haplotypes are more accurate than those using individual SNPs
Haplotype size distribution
STAT115

3 SNP Profiling
[C/T] [A/G] T X C [A/C] [T/A]
Possible haplotypes: 2^4 = 16
In reality, a few common haplotypes explain 90% of the variation
Tagging SNPs: SNPs that capture most of the variation in haplotypes; tagging removes redundant SNPs
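
The haplotype count above can be checked by enumeration. The sketch below assumes the four bracketed sites on the slide are the only variable positions and that each has two alleles; the variable names are illustrative only.

```python
from itertools import product

# Alleles at the four variable sites shown on the slide: [C/T] [A/G] [A/C] [T/A]
snp_alleles = [("C", "T"), ("A", "G"), ("A", "C"), ("T", "A")]

# Every choice of one allele per SNP is a possible haplotype.
possible_haplotypes = ["".join(h) for h in product(*snp_alleles)]

print(len(possible_haplotypes))  # 16, i.e. 2**4
```

In real data only a few of these 16 combinations are common, which is why a small set of tagging SNPs can stand in for the rest.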

4 SNP Genotyping
One SNP at a time or genome-wide (SNP array)
Figure labels: 2.5kb; 0.30

5 40 Probes Used Per SNP
Allele call: AA, BB, AB
Signal: theoretically 1A+1B, 2A, 2B, but could have 1A+3B (amplified!)

6 Haplotype Inference
Genotyping only tells us that an individual is, e.g., Aa BB Cc; it does not tell whether the haplotypes are ABC + aBc or ABc + aBC
Haplotypes can often be inferred if the parental genotypes are known
Similar to blood typing, e.g. F: A, M: AB, C: B → F: AO, M: AB, C: BO
Otherwise, look at the population genotypes and infer common haplotypes
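
The phase ambiguity described on this slide can be made concrete with a small enumeration. This is a minimal sketch, assuming a genotype is written as one two-letter string per locus (as in "Aa BB Cc"); the function name is hypothetical.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of two-character strings, one per locus, e.g. ["Aa", "BB", "Cc"].
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on the first haplotype.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

pairs = compatible_haplotype_pairs(["Aa", "BB", "Cc"])
print(pairs)  # [('ABC', 'aBc'), ('ABc', 'aBC')]
```

With k heterozygous loci there are 2^(k-1) candidate pairs, so genotype data alone cannot pick one once k is 2 or more.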

7 Haplotype Inference Clark’s Algorithm
1. Construct haplotypes from unambiguous individuals
2. Remove samples that can be explained as combinations of haplotypes discovered already
3. Propose the haplotype that would explain the most remaining samples
4. Iterate 2 & 3 until finished

8 Haplotype Inference Clark’s Algorithm
1. Construct haplotypes from unambiguous individuals
2. Remove samples that can be explained as combinations of haplotypes discovered already
3. Propose the haplotype that would explain the most remaining samples
4. Iterate 2 & 3 until finished
Disadvantages:
  Depends on the number of ambiguous subjects
  Cannot get started when n is small
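
The four steps above can be sketched in a few lines. This is an illustrative sketch of the procedure as the slide describes it, not a reference implementation. The genotype coding is an assumption: one character per SNP, with '0' or '1' for a homozygous site and '2' for a heterozygous site. The function names are hypothetical.

```python
from collections import Counter

def complement(genotype, hap):
    """Haplotype that pairs with `hap` to give `genotype`, or None if incompatible."""
    other = []
    for g, h in zip(genotype, hap):
        if g == "2":                 # heterozygous: partner carries the other allele
            other.append("1" if h == "0" else "0")
        elif g == h:                 # homozygous: both haplotypes carry allele g
            other.append(g)
        else:
            return None
    return "".join(other)

def clark(genotypes):
    known, resolved = set(), {}
    # Step 1: individuals with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if g.count("2") <= 1:
            h1, h2 = g.replace("2", "0"), g.replace("2", "1")
            known.update((h1, h2))
            resolved[i] = (h1, h2)
    while True:
        # Step 2: remove samples explained by two already-known haplotypes.
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in sorted(known):
                c = complement(g, h)
                if c in known:
                    resolved[i] = (h, c)
                    break
        # Step 3: propose the new haplotype that explains the most remaining samples.
        votes = Counter()
        for i, g in enumerate(genotypes):
            if i not in resolved:
                for c in {complement(g, h) for h in known} - {None}:
                    votes[c] += 1
        if not votes:                # nothing left that known haplotypes can help with
            return resolved, known
        known.add(max(sorted(votes), key=votes.get))

resolved, known = clark(["0000", "1111", "2222", "2200"])
stuck, _ = clark(["2222"])  # no unambiguous individual: nothing gets resolved
```

The second call illustrates the "cannot get started" disadvantage: with no unambiguous individual, the set of known haplotypes stays empty and no sample is ever resolved.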

9 EM and Gibbs Sampling in Motif Finding
Problem
  Observe: sequence S
  Unknown: motif θ and site location A (alignment), but given one, can infer the other
EM and Gibbs Sampler
  Initialize random motif θ
  Iterate:
    Given θ and sequence S, update site location A
    Given A and S, update θ
  EM updates by weighted average
  Gibbs sampling updates by sampling

10 Statistical Model for Haplotype
T T A C C --- 1
T T A C G --- 2
T T A G C --- 3
T T A G G --- 4
T T C C C --- 5
T T C C G --- 6
T T C G C --- 7
T T C G G --- 8
Figure labels: Haplotype Frequency; Haplotype Pool
Each individual’s two haplotypes are treated as random draws from a pool of haplotypes with certain frequencies, restricted to pairs consistent with the observed genotype

11 Haplotype Inference EM and Gibbs Sampler
Observe genotype Y; estimate the haplotype pair Z for each individual and the haplotype frequencies θ
Initialize haplotype frequencies
Iteration:
  Estimate Z given Y, θ
  Estimate θ given Y, Z
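
The iteration above can be sketched for the EM variant (the Gibbs sampler would sample one phasing per individual instead of averaging over all of them). This is a minimal sketch under stated assumptions: genotypes are strings with '0' or '1' for homozygous sites and '2' for heterozygous sites, haplotype pairs are drawn independently from the pool, and `theta` is the name used here for the haplotype frequencies. The function names are hypothetical.

```python
from itertools import product
from collections import defaultdict

def phasings(genotype):
    """All unordered haplotype pairs (Z) consistent with a genotype string (Y)."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=50):
    options = [phasings(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    theta = {h: 1.0 / len(haps) for h in haps}      # initialize uniformly
    for _ in range(n_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: posterior weight of each phasing Z given Y and theta.
            weights = [theta[a] * theta[b] * (1 if a == b else 2) for a, b in opts]
            total = sum(weights)
            for (a, b), w in zip(opts, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: theta is the expected haplotype count over 2N chromosomes.
        theta = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return theta

theta = em_haplotype_frequencies(["00", "00", "11", "22"])
```

In this toy input the double heterozygote "22" could be 00 + 11 or 01 + 10; because 00 and 11 are also seen in the homozygous individuals, the EM weight shifts toward the 00 + 11 phasing.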

13 Haplotype Inference Partition-Ligation
When the number of SNPs is large, the number of possible haplotypes is too large, so divide and conquer
Consider an inferred sub-haplotype as one allele

14 HapMap of Human Genome
HapMap: catalog of common genetic variants in humans
  What are these variants
  Where do they occur in our DNA
  How are they distributed within populations and between populations around the world
Goals:
  Define haplotype “blocks” across the genome
  Enable unbiased, genome-wide association studies

15 1000 Genomes Project
Characterization of human genome sequence variation
Foundation for investigating the relationship between genotype and phenotype
Break

16 Association Studies
Association between genetic markers and phenotype
E.g. cystic fibrosis: ~70% of cystic fibrosis patients have a deletion of 3 base pairs, resulting in the loss of a phenylalanine at position 508 of the CFTR protein
Especially, find disease genes and SNP / haplotype markers for susceptibility prediction and diagnosis

17 SNPs in Pharmacogenomics
Warfarin and CYP2C9
Warfarin is an anticoagulant drug; the enzyme encoded by the CYP2C9 gene metabolizes warfarin
A patient requiring a low warfarin dosage, compared to the normal population, has an odds ratio of 6.21 for having ≥ 1 variant allele
The subgroup of patients who are poor metabolisers of warfarin is potentially at higher risk of bleeding
Aithal et al., 1999, Lancet.

18 Influences individual decisions on lifestyle, prevention, screening, and treatment

19 Genome-Wide Association Studies
Quality control:
  Unusual similarity between individuals
  Wrong sex
  Trio has non-Mendelian inheritance
  Genotyping quality
Two strategies:
  Family-based association studies
  Population-based case-control association studies

20 Quality Control: SNP calls
Figure labels: % SNP called; SNP calls from all the samples at a locus; Good calls! Bad calls!

21 Family-based Association Studies
Look at allele transmission in unrelated families, with one affected child in each
Like a coin toss: how likely are the observed transmissions under a fair coin?
Figure labels: A a; A a

22 TDT: Transmission Disequilibrium Test
Only heterozygous parents matter; calculate observed over expected
Could also compare allele frequency between affected and unaffected children in the same family
Break
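
The slide does not spell out the test statistic. A common form of the TDT is the McNemar-type statistic (b - c)^2 / (b + c), where b and c count how often heterozygous parents transmitted or did not transmit the allele of interest to the affected child. The sketch below uses that form; the function name and the counts are hypothetical.

```python
def tdt_statistic(n_transmitted, n_untransmitted):
    """McNemar-type TDT chi-square (1 df) from heterozygous-parent transmissions.

    Under the null (a fair coin), each heterozygous parent transmits the allele
    with probability 1/2, so the expected count in each cell is (b + c) / 2.
    """
    b, c = n_transmitted, n_untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (b - c) ** 2 / (b + c)

# Hypothetical counts for illustration only: 60 transmissions vs 40 non-transmissions.
stat = tdt_statistic(60, 40)
print(stat)  # 4.0
```

Homozygous parents always transmit the same allele, so they carry no information about preferential transmission and do not enter the counts.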

23 Case Control Studies
SNP/haplotype marker frequency in a sample of affected cases compared to that in an age-, sex-, and population-matched sample of unaffected controls

24 From Genotyping to Allele Counts

25 Test Significant Associations
Expected: ( ) * ( ) / ( ) = 49
( ) * (86+296) / ( ) = 321
χ^2 = 27.5, 1 df, p < 0.001
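
The expected counts on this slide follow the usual rule for a contingency table: row total × column total / grand total. Because the slide's full table did not survive in this transcript, the sketch below uses clearly hypothetical allele counts rather than the lecture's data; the function name is also hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical allele counts (NOT the slide's data).
# Rows: cases, controls. Columns: allele A, allele a.
hypothetical_counts = [[30, 70], [10, 90]]
stat = chi_square_2x2(hypothetical_counts)
print(stat)  # 12.5
```

The statistic is compared against a chi-square distribution with 1 degree of freedom to obtain the p-value.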

26

27 Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial Infarction
            Allele C, n (%)   Other allele, n (%)   χ^2 (1 df)   P-value
Cases       2,132 (55.4)      1,716 (44.6)          55.1         1.2 x 10^-13
Controls    2,783 (47.4)      3,089 (52.6)
Allelic odds ratio = 1.38
OR = 1: no disease association
OR > 1: allele C increases risk of disease
OR < 1: allele C decreases risk of disease
Samani N et al, N Engl J Med 2007; 357:
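
The allelic odds ratio on this slide can be reproduced from the four counts. The sketch assumes the first count in each row is allele C (consistent with OR > 1 for allele C) and the second is the other allele; the function name is hypothetical.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk * control_other) / (case_other * control_risk)

# Counts from the slide: cases 2,132 vs 1,716; controls 2,783 vs 3,089.
odds_ratio = allelic_odds_ratio(2132, 1716, 2783, 3089)
print(round(odds_ratio, 2))  # 1.38
```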

28 Multiple hypotheses testing?
GWAS Pvalues

29 GWAS P-values for Type II Diabetes
Bonferroni correction: most common, typically p < 10^-7 or 10^-8
Manhattan plot
How many SNPs were tested?
McCarthy et al, Nat Rev Genetics, 2008
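
The thresholds quoted here follow from dividing the family-wise error rate by the number of tests. A minimal sketch, assuming a family-wise alpha of 0.05 and hypothetical SNP counts (the slide does not state either):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Hypothetical numbers of SNPs tested, for illustration only.
thresholds = {n: bonferroni_threshold(0.05, n) for n in (500_000, 1_000_000)}
for n_snps, threshold in thresholds.items():
    print(n_snps, threshold)
```

With half a million to a million SNPs, the per-SNP threshold lands around 10^-7 to 5 × 10^-8, which matches the range on the slide.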

30 Reproducibility of Association Studies
Most reported associations have not been consistently reproduced
Hirschhorn et al, Genetics in Medicine, 2002: review of association studies
  603 associations of polymorphisms and disease
  166 studied in at least three populations
  Only 6 seen in > 75% of studies

31 Size Matters Visscher, AJHG 2012

32 How to Improve Statistical Power?
Without increasing samples?
  Test association of disease with haplotypes instead of individual SNPs
    Also reduces genotyping errors
  Split samples:
    First half: narrow down promising SNPs / haplotypes
    Second half: refine hits (far fewer multiple hypotheses)
Increase sample size: Precision Medicine Initiative cohort, ~1 million volunteers

33 Manolio et al., Clin Invest 2008
P < 9.9 × 10^-7 (P <= 10^-6)

34 Summary
Haplotype inference
  Clark’s: resolve unambiguous individuals first, then propose new haplotypes to maximize explanation
  EM & Gibbs: iteratively infer haplotype frequencies and individuals’ haplotypes
Tagging SNPs and GWAS
  Family-based association studies: TDT, transmitted allele to affected child
  Case-control studies: χ^2 (allele frequency difference between cases and controls) and OR

35 Acknowledgement
Jun Liu & Tim Niu
Cheng Li & Yuhyun Park
Kenneth Kidd, Judith Kidd and Glenys Thomson
Joel Hirschhorn
Greg Gibson & Spencer Muse
Jim Stankovich
Teri Manolio
David Evans
Guodong Wu
Stefano Monti
Bo Li

