The HAP webserver: Tools for the Discovery of Genetic Basis of Human Disease

HYUN MIN KANG, Computer Science and Engineering, University of California, San Diego
NOAH ZAITLEN, Bioinformatics Program, University of California, San Diego
TAURIN TAN-ATICHAT, Electrical and Computer Engineering, University of California, San Diego
EDWARD SHYU, Computer Science and Engineering, University of California, San Diego
GRACE SHAW, Computer Science and Engineering, University of California, San Diego
DAFNA BITTON, Computer Science and Engineering, University of California, San Diego
ELAD HAZAN, Department of Computer Science, Princeton University
ERAN HALPERIN, International Computer Science Institute, Berkeley
ELEAZAR ESKIN, Computer Science and Engineering, University of California, San Diego

1. Introduction

Understanding the structure of human variation is important for understanding the genetic basis of human diseases. Recent advances in high-throughput genotyping technology are generating a tremendous amount of high-density single nucleotide polymorphism (SNP) data, which holds great promise for discovering genetic risk factors associated with disease. To identify associations between disease and variations in an individual's chromosomes, the genotype data must be phased into haplotypes.

The HAP webserver builds on HAP, a very efficient tool for haplotype resolution based on imperfect phylogeny. It provides an integrated method to reconstruct haplotype structure and to identify genetic variants associated with complex phenotypes, which can give insight into the genetic factors of complex diseases. Our methods leverage the interplay between genotype phasing, haplotype phylogeny, association analysis, and functional SNP prediction. They also leverage new insights into the structure of human variation, which allow us to observe phenotype associations directly from genotype and phenotype data.

We demonstrate our methods via an analysis of two genes implicated in hypertension. Our methods are easily accessible via the webserver, which provides complete results of the association analysis, including graphical visualizations. We expect that our methods will facilitate current association studies.

2. HAP – haplotype resolution

HAP is a haplotype analysis system aimed at helping geneticists perform disease association studies. Its main feature is a phasing method based on the assumption of imperfect phylogeny. The phasing method is very efficient, which allows HAP to work with very large data sets and to perform other operations, such as finding a partition of the region into blocks of limited diversity or performing association tests on each of these blocks.

HAP takes as input a set of genotypes over a region, taken from a population, and returns the haplotype phase of each individual's genotype. In our studies, we observed that HAP is very accurate when the sample contains at least a couple of dozen individuals.

In addition to phasing, HAP produces a partition of the region into blocks of correlated SNPs. The block partition is chosen to minimize the number of tag SNPs. HAP leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated “blocks” (Daly et al 01, Patil et al 01).

HAP has been shown to have accuracy competitive with state-of-the-art software (such as PHASE and HAPLOTYPER), while being extremely fast and usable on very large datasets.
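To make the input and output of phasing concrete, the sketch below enumerates the haplotype pairs that are compatible with a single genotype. It is not HAP's imperfect-phylogeny algorithm; it only illustrates why phasing is ambiguous, since a genotype with k heterozygous SNPs admits 2^(k-1) unordered haplotype pairs. The 0/1/2 genotype encoding and the function name compatible_phasings are assumptions made for this illustration.

```python
from itertools import product


def compatible_phasings(genotype):
    """Return every unordered haplotype pair consistent with one genotype.

    genotype: sequence over {0, 1, 2}, read as the number of copies of the
    alternate allele at each SNP (a hypothetical encoding for this sketch).
    Haplotypes are returned as strings of '0' (reference) and '1' (alternate).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]

    if not het_sites:
        hap = "".join(map(str, base))
        return [(hap, hap)]

    pairs = []
    # Pin the first heterozygous site to allele 0 on the first haplotype so
    # that each unordered pair is generated exactly once.
    for rest in product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = base[:], base[:]
        for site, allele in zip(het_sites, (0,) + rest):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append(("".join(map(str, h1)), "".join(map(str, h2))))
    return pairs
```

For the genotype [1, 1, 0, 2], this yields two candidate pairs, (0001, 1101) and (0101, 1001). A population-based method such as HAP has to choose among such candidates using the genotypes of the other individuals in the sample.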
Recently, HAP has been used successfully to reveal whole-genome haplotype structure (Hinds et al. 05).

Figure 1. HAP webserver. (a) HAP was used to reveal whole-genome haplotype structure. The article “Whole-Genome Patterns of Common DNA Variation in Three Human Populations” was featured on the cover of Science. (b) Screenshot of the HAP webserver main page, available at

3. Inferring Phylogenetic Relationships between Haplotypes

Recent studies have shown that within short regions there is limited genetic variability, and only a small number of haplotypes account for the entire population. In a typical region of 20kb, three or four common haplotypes account for 80% of the population. Furthermore, most rare variants appear to be minor variants of common haplotypes. Using these results, the phylogeny is inferred by identifying the most likely ancestor of each rare haplotype among the frequent ones. Ancestral haplotypes are then found by searching for similar common variants.

Figure 2. Predicted CHGA phylogeny. Each symbol denotes a haplotype variant of the CHGA promoter. Each haplotype variant is classified into one of three groups: ancestral, common, or recent haplotype. A solid line denotes mutation, and a dashed line denotes recombination. This figure is automatically generated by our webserver.

4. Identifying Association via Statistical Tests

- Leveraging haplotype structure
- Quantitative phenotypes & dose-effects
- Nonparametric tests
- Covariates

Table 1. Haplotype analysis between CHGA promoter region and CHGA plasma levels. Statistical p-values for the association between the haplotypes in the CHGA promoter region and CHGA plasma levels in 221 African Americans under various statistical tests. Each haplotype ID and its sequence is identical to that of Figure 2. The p-values are evaluated by permutation tests with 10^5 random shufflings of the phenotypes. The p-values are also adjusted for multiple comparisons, so no further conservative adjustments are required. The plus or minus sign next to each p-value denotes whether the haplotype variant shows a positive or negative effect on the phenotype for each statistical test. Single and double asterisks denote that the p-value is less than 0.05 and 0.01, respectively. This table is automatically generated by our webserver.

Columns: CHGA haplotype ID; nucleotide at position; p-values for the linear regression, unpaired t-test, Mann-Whitney, and Jonckheere-Terpstra tests. Rows B, E, and F show fewer than four p-values in the source, so their values are listed in source order.

  A  GATTGTCC  .948 (+)    .963 (+)   .969 (+)  .963 (+)
  B  AATTGTCC  .977 (-)    .999 (-)   .996 (-)
  C  GACGATAC  .175 (-)    .209 (-)   .505 (-)  .485 (-)
  D  GATTGCCC  .999 (-)    .990 (+)   .983 (+)  .997 (+)
  E  GTTTGCCT  .004 (+)**  .011 (+)*
  F  GACGATCC  .836 (-)    .978 (-)   .986 (-)

Figure 3. Linkage disequilibrium plot. Results of running the HAP webserver with linkage disequilibrium data. The example data is available via the webserver. The axes represent SNP positions. Red regions indicate high disequilibrium, while blue regions indicate low disequilibrium.

Figure 4. CHGA association visualization. A histogram of CHGA levels grouped by the number of copies of haplotype E in Table 1. The x-axis represents plasma levels, and the y-axis represents the fraction of individuals with the given plasma level. It can be observed that the haplotype is significantly associated with increased plasma levels. This figure is automatically generated by our webserver.

5. Functional SNPs Prediction

Once associated haplotypes are identified using rigorous statistical tests, our methods estimate the likelihood that each SNP contributes to the association. To make this prediction, we iterate over several groupings of the haplotypes in an attempt to isolate the functional SNPs. The outcome of this second step is a score distribution over the SNPs that estimates how likely each SNP is to be functional.

Figure 5. CHGA functional SNPs prediction. Results of predicting how each SNP contributes to the association identified in Table 1. The y-axis is a score that represents the degree of functional contribution. The SNP at position -89 makes the highest functional contribution, and those at positions -1014, -988, and -462 share the second highest score. This result is consistent with previously published in vitro experiments. This figure is automatically generated by our webserver.

6. Whole Genome Association Studies with HDL

Mouse Phenome Database

Figure 6. HDL Phenotype. The association test results for the level of HDL cholesterol in the different mouse strains.

Figure 7. Random. The association test results for the randomly permuted HDL phenotype of Figure 6.
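The Table 1 caption describes p-values obtained by shuffling phenotypes and adjusted for multiple comparisons. A minimal sketch of one common way to do this follows. It assumes a single-step max-statistic adjustment and uses the absolute Pearson correlation between haplotype copy number and phenotype as the test statistic. Both choices, along with the function names and the data layout, are assumptions for illustration; the source does not specify how the webserver computes or adjusts its p-values.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat_permutation_pvalues(copies_by_haplotype, phenotype,
                                 n_perm=1000, seed=0):
    """Permutation p-values adjusted across haplotypes via the max statistic.

    copies_by_haplotype: dict mapping a haplotype ID to a list giving, for
        each individual, the number of copies (0, 1, or 2) carried.
    phenotype: list of quantitative phenotype values, one per individual.
    """
    rng = random.Random(seed)
    observed = {h: abs_correlation(c, phenotype)
                for h, c in copies_by_haplotype.items()}
    exceed = {h: 0 for h in observed}
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        # Largest statistic over all haplotypes under this shuffling.
        max_stat = max(abs_correlation(c, shuffled)
                       for c in copies_by_haplotype.values())
        for h, obs in observed.items():
            if max_stat >= obs:
                exceed[h] += 1
    # Add-one correction keeps every estimated p-value strictly positive.
    return {h: (exceed[h] + 1) / (n_perm + 1) for h in observed}
```

Comparing each observed statistic against the permutation maximum over all haplotypes is what makes the resulting p-values adjusted for the number of haplotypes tested. The default of 1000 permutations is only for speed; the caption reports 10^5 shufflings.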