Quan Zou (Ph.D. & Prof.), Tianjin Univ, School of Computer. Reconstructing phylogenetic trees for ultra-large unaligned DNA sequences with Hadoop

Presentation transcript:

Quan Zou (Ph.D. & Prof.), Tianjin Univ, School of Computer. Reconstructing phylogenetic trees for ultra-large unaligned DNA sequences with Hadoop

Background: why

Phylogenetic Tree: Genome-Genome, Gene-Gene, Population, Model, Computation

Background: challenge
–Multiple sequence alignment
–Phylogenetic tree

Flow

Flow---Clustering

Sampling

Flow---MSA

A Trie Tree for a Sequence
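
The slide gives only its title, so the following is a minimal sketch of one way a trie over a sequence could be used, assuming the sequence is cut into fixed-length, non-overlapping segments that are then located in a second sequence. The names `TrieNode`, `build_segment_trie`, `find_matches` and the `seg_len` parameter are hypothetical and are not taken from HAlign.

```python
class TrieNode:
    """One trie node; `starts` holds offsets of segments that end here."""

    def __init__(self):
        self.children = {}
        self.starts = []


def build_segment_trie(center, seg_len):
    """Insert every non-overlapping seg_len-long segment of `center` (assumed scheme)."""
    root = TrieNode()
    for start in range(0, len(center) - seg_len + 1, seg_len):
        node = root
        for ch in center[start:start + seg_len]:
            node = node.children.setdefault(ch, TrieNode())
        node.starts.append(start)
    return root


def find_matches(root, query, seg_len):
    """Return (query_offset, center_offset) pairs where a stored segment occurs in `query`."""
    matches = []
    for q in range(len(query) - seg_len + 1):
        node = root
        for ch in query[q:q + seg_len]:
            node = node.children.get(ch)
            if node is None:
                break  # no stored segment starts with this prefix
        else:
            matches.extend((q, c) for c in node.starts)
    return matches


# Hypothetical toy data: segments "ACGT" (offset 0) and "ACGG" (offset 4) are stored.
trie = build_segment_trie("ACGTACGGTT", seg_len=4)
hits = find_matches(trie, "TTACGGA", seg_len=4)
```

Exact matches found this way could serve as anchors, so that only the stretches between them need dynamic-programming alignment. That is a common design choice for highly similar sequences, not a claim about what the slide showed.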

More tricks in MSA
input sequences → trie trees → search → sum up → update → final result
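
The slide lists these steps without detail. One reading that fits the center star strategy named on the Software slide is that the gaps each pairwise alignment inserts into the center sequence are combined ("sum up"), and every pairwise alignment is then padded to match ("update"). The sketch below assumes that reading; `gap_profile` and `merge_center_star` are hypothetical names, and the per-position maximum is the textbook center star rule, not necessarily the rule HAlign uses.

```python
def gap_profile(aligned_center, n):
    """gaps[p] = number of '-' placed before center residue p (p == n means trailing gaps)."""
    gaps = [0] * (n + 1)
    p = 0
    for ch in aligned_center:
        if ch == "-":
            gaps[p] += 1
        else:
            p += 1
    return gaps


def merge_center_star(center, pairwise):
    """Merge pairwise alignments (aligned_center, aligned_other) into one MSA.

    Returns the padded center row followed by one padded row per input pair.
    """
    n = len(center)
    profiles = [gap_profile(c, n) for c, _ in pairwise]
    # Combine the gap requirements of all pairs: keep the largest at each position.
    merged = [max(col) for col in zip(*profiles)] if profiles else [0] * (n + 1)

    def expand(aligned_other, profile):
        out, i = [], 0
        for p in range(n + 1):
            out.append("-" * (merged[p] - profile[p]))   # columns this pair lacks
            out.append(aligned_other[i:i + profile[p]])  # columns it already has
            i += profile[p]
            if p < n:
                out.append(aligned_other[i])             # column of center residue p
                i += 1
        return "".join(out)

    center_row = "".join(
        "-" * merged[p] + (center[p] if p < n else "") for p in range(n + 1)
    )
    return [center_row] + [expand(o, prof) for (_, o), prof in zip(pairwise, profiles)]


# Hypothetical toy input: two pairwise alignments against the center "ACGT".
rows = merge_center_star("ACGT", [("AC-GT", "ACTGT"), ("ACGT", "A-GT")])
```

Because each pairwise alignment is touched only once during the padding step, that step parallelizes naturally over sequences.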

Experiments
Data
–Human mtGenome
–16S rRNA
Measurement
–Running time
–Average SP score (for MSA)

dataset             max length   min length   average length   sequence number   file size
mt genome (1x)                                                                   MB
mt genome (20x)                                                                  MB
mt genome (50x)                                                                  MB
mt genome (100x)                                                                 GB
16S rRNA (small)                                                                 MB
16S rRNA (big)                                                                   GB
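
The transcript does not say how the average SP (sum-of-pairs) score is computed. As a rough sketch, the code below assumes a penalty-style scheme (match 0, mismatch 1, residue against gap 2, gap against gap 0, so lower is better) and assumes "average" means dividing by the number of sequence pairs. Both the penalty values and the averaging rule are assumptions, as are the function names.

```python
from itertools import combinations


def pair_score(a, b, match=0, mismatch=1, gap=2):
    """Penalty for one pair of characters in a column (values are assumed)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(rows):
    """Sum pair_score over every pair of rows in every alignment column."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(
        pair_score(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


def average_sp(rows):
    """SP score divided by the number of sequence pairs (assumed definition)."""
    n_pairs = len(rows) * (len(rows) - 1) // 2
    return sp_score(rows) / n_pairs


# Hypothetical toy alignment of three sequences.
alignment = ["AC-GT", "ACTGT", "A--GT"]
total = sp_score(alignment)
```

Enumerating all pairs costs time quadratic in the number of sequences. For very large alignments the same total can be obtained from per-column character counts instead.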

Experiments---phylogenetic tree

                   1x              20x             50x          100x
HPTree             1 m 12 s        3 m 18 s        14 m 28 s    44 m 17 s
IQ-TREE            13 m 7 s        18 m 4 s        39 m 43 s    67 m 3 s
IQ-TREE (8-core)   9 m 39 s        12 m 27 s       26 m         56 m 7 s
phangorn           40 s            More than 3 h   ---
RAxML              33 m 3 s        More than 8 h   ---
STELLS             More than 1 h   ---

          SmallSet      BigSet
HPTree    207 m 44 s    More than 24 h
IQ-TREE   ---

Experiments---MSA (mtDNA)

                     10 M (1X)    213 M (20X)   532 M (50X)   1.1 G (100X)
HAlign (Trie Tree)   3 m 16 s
HAlign (Hadoop)      2 m 21 s     10 m 53 s     14 m 14 s     28 m 28 s
MAFFT                1 m 41 s     175 m         984 m
KAlign               170 m 44 s

                     10 M (1X)    213 M (20X)   532 M (50X)   1.1 G (100X)
HAlign (Trie Tree)
HAlign (Hadoop)      191
MAFFT
KAlign

Experiments---MSA (16S rRNA)

         M             1.4 G
HAlign   54 m 32 s     199 m 35 s
MAFFT    3584 m 52 s

                 M     1.4 G
HAlign
MAFFT
Best Alignment

Experiments
Running time comparison between aligned and unaligned data

Software
Quan Zou, et al. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment based on Center Star Strategy. Bioinformatics. Doi: /bioinformatics/btv177.

Discussion
Summary
–MSA with Hadoop
–NJ phylogenetic tree with Hadoop
From DNA to Protein
RNA secondary structure is ignored
Several complex issues in evolution are ignored
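
The summary names an NJ (neighbor-joining) tree built with Hadoop, but the transcript does not show the algorithm. For orientation, here is a minimal single-machine sketch of textbook neighbor joining. It is not the Hadoop implementation from the talk. The function name, the Newick output, and the choice to place the final edge's midpoint at the root are all illustrative assumptions.

```python
def neighbor_joining(names, matrix):
    """Textbook neighbor joining on a symmetric distance matrix.

    names  -- list of unique taxon labels (at least two)
    matrix -- matrix[i][j] is the distance between names[i] and names[j]
    Returns a Newick string; the last edge is split in half to pick a root.
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(names) if i != j}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        rest = [c for c in nodes if c not in (a, b)]
        d[new] = {}
        for c in rest:
            # Distance from the new parent to every remaining node.
            d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = rest + [new]
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:g},{b}:{half:g});"


# Hypothetical five-taxon example (a small additive distance matrix).
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, dist)
```

Each round scans all pairs, so the whole run is cubic in the number of taxa. That cost is one reason very large inputs are usually split into clusters first, as in the Flow---Clustering step.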
