The use of short-read next-generation sequences to recover the evolutionary histories in multi-individual samples

Presentation transcript:

The use of short-read next-generation sequences to recover the evolutionary histories in multi-individual samples
Systematic biology presentation
Yuantong Ding, Dec. 6

Outline: Background; Workflow; Sequence comparison; Tree comparison; Summary & future work

Can short reads successfully recover phylogeny?
Next-generation sequencing (NGS): low-cost, high-throughput, short-read.
Multi-individual sample → short reads → reconstructed sequences → phylogeny?

Simulation process
Original genealogy and original haplotypes: simulated by SerialSimCoal with a coalescent model.
Short reads: simulated by MetaSim with the 454 error model.
Consensus sequence and mapping: alignment built by SHRiMP and SSAHA.
Reconstructed haplotypes: reconstructed by ShoRAH.
NJ trees: built by PAUP*.
Comparisons: tree topology; number and similarity of haplotypes.

6 parameters used
Effective population size N; sample size n; mutation rate μ; sequence length l; number of short reads Sr_N; length of short reads Sr_l.
[Table of parameter values (columns N, n, μ, l, Sr_N, Sr_l): entries not recoverable from the transcript.]
All 486 combinations of these parameters were simulated.

Different numbers of haplotypes

Similar sequences

Can reconstructed haplotypes still capture some phylogenetic information?
Different numbers of haplotypes → impossible to recover the true phylogenetic trees.
Assuming the true number of haplotypes in the sample is known, two approaches:
Approach 1: select the most similar reconstructed sequences to build the phylogenetic tree, then calculate the symmetric difference.
Approach 2: cluster (k-means) the reconstructed haplotypes into n groups, build a tree with the consensus sequence of each group, then calculate tree balance statistics.
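The two selection steps above can be sketched in a few lines. This is an illustration only, under stated assumptions: the slides do not say how similarity was measured, so Hamming distance on aligned, equal-length sequences is assumed here, and the function names (`hamming`, `closest_matches`, `consensus`) are hypothetical. The k-means grouping itself is not shown; `consensus` only illustrates how one group of sequences could be collapsed by majority rule.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_matches(true_haps, reconstructed):
    """For each true haplotype, return the index of the reconstructed
    haplotype with the smallest Hamming distance (assumed metric).
    Ties go to the lowest index."""
    return [
        min(range(len(reconstructed)),
            key=lambda j: hamming(t, reconstructed[j]))
        for t in true_haps
    ]


def consensus(seqs):
    """Majority-rule consensus of a group of aligned sequences
    (assumed rule; the slides only say 'consensus sequence')."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    true_haps = ["ACGT", "AGGT"]
    reconstructed = ["ACGA", "TGGT", "AGGT"]
    print(closest_matches(true_haps, reconstructed))
    print(consensus(["ACGT", "ACGA", "ACGT"]))
```

Note that nothing here prevents two true haplotypes from being matched to the same reconstructed sequence; whether the original analysis allowed that is not stated in the slides.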

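Method for tree comparison
Symmetric difference, for rooted and labeled trees: count the clades found in one tree but not in the other. Example: tree 1 on tips A, B, C has clades (BC) and (ABC); tree 2 has clades (AC) and (ABC); symmetric difference = 2.
Tree balance statistics, for rooted and unlabeled trees: N_i is the number of internal nodes between tip i and the root, and N_bar is the mean of N_i over all tips. Example from the slide figure (five tips): for i = A, N_A = 2, and N_bar = (sum of the five N_i)/5 = 2.4.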
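Both measures are simple to compute once a tree is written down explicitly. The sketch below is a minimal illustration, not the code used for the presentation: trees are represented as nested Python tuples (an assumed encoding), and the function names are hypothetical. The three-tip trees reproduce the clade example above. The five-tip tree is an assumed topology, chosen because it gives N_A = 2 and N_bar = 2.4 when the root is counted as an internal node; the original slide figure is not available.

```python
def clades(tree):
    """Return the set of clades of a rooted tree given as nested tuples,
    e.g. ("A", ("B", "C")). Each clade is a frozenset of tip labels."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)                    # one clade per internal node
        return tips

    walk(tree)
    return found


def symmetric_difference(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


def tip_depths(tree, depth=0):
    """N_i for every tip: internal nodes on the path from the tip to the
    root, counting the root itself."""
    if isinstance(tree, str):
        return [depth]
    depths = []
    for child in tree:
        depths.extend(tip_depths(child, depth + 1))
    return depths


def n_bar(tree):
    """Mean of N_i over all tips."""
    depths = tip_depths(tree)
    return sum(depths) / len(depths)


if __name__ == "__main__":
    t1 = ("A", ("B", "C"))   # clades (BC), (ABC)
    t2 = ("B", ("A", "C"))   # clades (AC), (ABC)
    print(symmetric_difference(t1, t2))

    # Assumed five-tip topology, consistent with N_A = 2 and N_bar = 2.4.
    t5 = (("A", "B"), (("C", "D"), "E"))
    print(tip_depths(t5), n_bar(t5))
```

Because the balance statistic ignores tip labels, it can still be compared between trees whose tips cannot be matched one to one, which is why it suits the clustered (consensus) trees.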

Different topology of most similar sequence tree

Different balance statistics of k-means cluster tree
[Table: columns n, N_bar (org, rec, P) and I_c (org, rec, P); numeric entries not recoverable from the transcript.]

Summary & future work
Haplotype reconstruction typically failed to estimate the correct number of haplotypes. Consequently, it was not possible to recover the true phylogenetic trees.
Even assuming the true number of haplotypes is known, the chance of recovering the true tree topology is still small.
Future work: other reconstruction methods, use of multiple reference sequences when mapping…

References
Anderson, C.N.K., Ramakrishnan, U. et al. Serial SimCoal: A population genetic model for data from multiple populations and points in time. Bioinformatics 21.
Johnson, P.L., Slatkin, M. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Res 16.
Richter, D.C., Ott, F. et al. MetaSim - A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3.
Suzuki, S., Ono, N., Furusawa, C., Ying, B.-W., Yomo, T. Comparison of Sequence Reads Obtained from Three Next-Generation Sequencing Platforms. PLoS ONE 6, e
Zagordi, O., Bhattacharya, A. et al. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119.
Metei D., Misko D., et al. SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics 27, 7.
Ning Z, Cox AJ and Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Research.