Published by Irma Summers. Modified over 9 years ago.
1
Quan Zou (Ph.D. & Prof.)
Tianjin Univ, School of Computer
zouquan@nclab.net
http://cs.tju.edu.cn/faculty/zouquan/
Reconstructing phylogenetic trees for ultra-large unaligned DNA sequences with Hadoop
2
Background: why
2016-3-7, 2/15
3
Phylogenetic Tree
Genome-Genome
Gene-Gene
Population
Model
Computation
4
Background: challenge
Multiple sequence alignment
Phylogenetic tree
5
Flow
6
Flow---Clustering
7
Sampling
9
Flow---MSA
11
A Trie Tree for a Sequence
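The slide only names the data structure, so the following is a minimal sketch of how a trie built from one sequence can be used to locate shared segments in another. It assumes fixed-length, non-overlapping segments and a naive scan of the query; the function names `build_segment_trie` and `find_matches` and the parameter `seg_len` are hypothetical and are not taken from the HAlign software.

```python
def build_segment_trie(center, seg_len):
    """Insert non-overlapping segments of `center` into a nested-dict trie.

    Assumption: segments have a fixed length `seg_len`; each leaf stores
    the start positions of that segment in `center` under the key "$".
    """
    root = {}
    for start in range(0, len(center) - seg_len + 1, seg_len):
        node = root
        for base in center[start:start + seg_len]:
            node = node.setdefault(base, {})
        node.setdefault("$", []).append(start)
    return root


def find_matches(trie, query, seg_len):
    """Return (query_pos, center_pos) pairs where a stored segment occurs.

    This is a plain window-by-window scan, chosen for brevity; it is not
    an Aho-Corasick automaton.
    """
    hits = []
    for q in range(len(query) - seg_len + 1):
        node = trie
        for base in query[q:q + seg_len]:
            node = node.get(base)
            if node is None:
                break
        else:
            # All seg_len bases were found, so `node` is a leaf.
            hits.extend((q, c) for c in node["$"])
    return hits


trie = build_segment_trie("ACGTACGGTTCA", 4)
print(find_matches(trie, "TTACGGAACGT", 4))
```

Matched segments like these can anchor a pairwise alignment, so that only the regions between anchors need dynamic programming.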
12
More tricks in MSA
Diagram labels: input sequences, trie trees, search, sum up, update, final result
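The diagram itself is not recoverable, but "sum up" and "update" plausibly correspond to the usual merge step of a center star alignment: collect the gaps each pairwise alignment inserts into the center sequence, combine them, and pad every pairwise result to the combined layout. The sketch below illustrates that standard step under this assumption; `gap_profile`, `merge_profiles`, and `expand` are hypothetical names, not functions from HAlign.

```python
def gap_profile(aligned_center):
    """Count gaps inserted before each center base (last slot: trailing gaps)."""
    profile, run = [], 0
    for ch in aligned_center:
        if ch == "-":
            run += 1
        else:
            profile.append(run)
            run = 0
    profile.append(run)
    return profile


def merge_profiles(profiles):
    """'Sum up': keep the largest gap run required at each slot."""
    return [max(col) for col in zip(*profiles)]


def expand(aligned_center, aligned_seq, merged):
    """'Update': pad one pairwise alignment to match the merged gap profile."""
    out, slot, run = [], 0, 0
    for c, s in zip(aligned_center, aligned_seq):
        if c == "-":
            out.append(s)          # column already inserted by this pair
            run += 1
        else:
            out.append("-" * (merged[slot] - run))  # extra gap columns
            out.append(s)
            slot += 1
            run = 0
    out.append("-" * (merged[slot] - run))          # trailing gap columns
    return "".join(out)


# Two pairwise alignments against the center sequence "ACGT".
pairs = [("AC-GT", "ACTGT"), ("ACGT-", "A-GTT")]
merged = merge_profiles([gap_profile(c) for c, _ in pairs])
for c, s in pairs:
    print(expand(c, s, merged))
```

Because each pair only needs its own gap profile and the merged profile, the per-sequence work is independent, which is what makes this step easy to distribute.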
14
Experiments
Data
– Human mtGenome
– 16s rRNA
Measurement
– Running time
– Average SP score (for MSA)

dataset            max length  min length  average length  sequence number  file size
mt genome (1x)     16579       16556       16569.7         672              10 MB
mt genome (20x)                                            13440            213 MB
mt genome (50x)                                            33600            532 MB
mt genome (100x)                                           67200            1.1 GB
16s rRNA (small)   1599        807         1442.8          108453           156 MB
16s rRNA (big)     1629        807         1388.5          1011621          1.4 GB
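The slide lists the average SP (sum-of-pairs) score as the MSA quality measure without giving the scoring scheme. As a rough illustration, the sketch below computes a penalty-style SP score and averages it over sequence pairs. The penalties (mismatch 1, base-versus-gap 2, gap-versus-gap 0) and the averaging over pairs are assumptions made for this example, and `pair_penalty`, `sp_score`, and `average_sp` are hypothetical names.

```python
from itertools import combinations


def pair_penalty(a, b, mismatch=1, gap=2):
    """Penalty for one pair of aligned characters (assumed scheme)."""
    if a == b:
        return 0               # match, or gap aligned with gap
    if a == "-" or b == "-":
        return gap
    return mismatch


def sp_score(rows):
    """Sum the pairwise penalties over every column of the alignment."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            total += pair_penalty(a, b)
    return total


def average_sp(rows):
    """SP score divided by the number of sequence pairs."""
    n_pairs = len(rows) * (len(rows) - 1) // 2
    return sp_score(rows) / n_pairs


alignment = ["ACTGT-", "A--GTT", "AC-GT-"]
print(sp_score(alignment), average_sp(alignment))
```

Under a penalty scheme like this one, a lower value indicates a better alignment.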
15
Experiments---phylogenetic tree

mt genome
                  1x           20x            50x        100x
HPTree            1 m 12 s     3 m 18 s       14 m 28 s  44 m 17 s
IQ-TREE           13 m 7 s     18 m 4 s       39 m 43 s  67 m 3 s
IQ-TREE(8-core)   9 m 39 s     12 m 27 s      26 m       56 m 7 s
phangorn          40 s         More than 3 h  ---        ---
RAxML             33 m 3 s     More than 8 h  ---        ---
STELLS            More than 1 h  ---          ---        ---

16s rRNA
                  SmallSet     BigSet
HPTree            207 m 44 s   More than 24 h
IQ-TREE           ---          ---
16
Experiments---MSA (mtDNA)

Running time
                    10 M (1X)   213 M (20X)  532 M (50X)  1.1 G (100X)
HAlign(Trie Tree)   3 m 16 s    ---          ---          ---
HAlign(Hadoop)      2 m 21 s    10 m 53 s    14 m 14 s    28 m 28 s
MAFFT               1 m 41 s    175 m        984 m        ---
KAlign              170 m 44 s  ---          ---          ---

Average SP score
                    10 M (1X)   213 M (20X)  532 M (50X)  1.1 G (100X)
HAlign(Trie Tree)   183.7       ---          ---          ---
HAlign(Hadoop)      191
MAFFT               152         ---          ---          ---
KAlign              238562      ---          ---          ---
17
Experiments---MSA (16s rRNA)

Running time
                 153 M        1.4 G
HAlign           54 m 32 s    199 m 35 s
MAFFT            3584 m 52 s  ---

Average SP score
                 153 M        1.4 G
HAlign           15660        32079
MAFFT            26743        ---
Best Alignment   12889        13100
18
Experiments
Running time comparison between aligned and unaligned data
19
Software
http://datamining.xmu.edu.cn/software/halign/
http://datamining.xmu.edu.cn/software/Phylogenetic_tree/
Quan Zou, et al. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment based on Center Star Strategy. Bioinformatics. doi:10.1093/bioinformatics/btv177.
20
Discussion
Summary
– MSA with Hadoop
– NJ phylogenetic tree with Hadoop
From DNA to Protein
RNA secondary structure is ignored
Several complex issues in evolution are ignored
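For readers unfamiliar with the NJ (neighbor-joining) method named in the summary, the sketch below shows one textbook NJ iteration on a small distance matrix: choose the pair that minimizes the Q criterion, compute the two branch lengths, and shrink the matrix. It is a single-machine illustration only and says nothing about how the Hadoop version distributes the work; `nj_pick_pair` and `nj_reduce` are hypothetical names.

```python
def nj_pick_pair(d):
    """Return (i, j, branch_i, branch_j) for the pair minimizing Q.

    Q(i, j) = (n - 2) * d[i][j] - r[i] - r[j], where r[k] is the row sum.
    Requires n > 2.
    """
    n = len(d)
    r = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return i, j, branch_i, branch_j


def nj_reduce(d, i, j):
    """Replace taxa i and j by their new parent node (placed first)."""
    keep = [k for k in range(len(d)) if k not in (i, j)]
    du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
    new = [[0.0] + du]
    for idx, k in enumerate(keep):
        new.append([du[idx]] + [d[k][m] for m in keep])
    return new


# Five taxa a..e with pairwise distances.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j, bi, bj = nj_pick_pair(dist)
print(i, j, bi, bj)
print(nj_reduce(dist, i, j))
```

Repeating these two functions until three nodes remain yields the full tree; the Q-matrix scan is the O(n^2) part of each iteration and dominates the cost on large inputs.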
21
2016-3-7
Quan Zou (Ph.D. & Prof.)
Tianjin Univ, School of Computer
zouquan@nclab.net
http://cs.tju.edu.cn/faculty/zouquan/