Protein Sequence Classification Using the Neighbor-Joining Method
Bo Liu
Overview
Given: a group of sequences that share some similarity with one another and have the same protein function.
Input: one sequence of unknown function.
Output: whether this sequence belongs to the protein cluster.
Representation of the Sequence Group: Distance Matrix
Matrix calculation:
- Pairwise alignment
- Multiple sequence alignment
- Alignment-free: relative Lempel-Ziv complexity
[Figure: example for sequences A, B, C, D]
Otu and Sayood, Bioinformatics, 2003
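As an illustration of the alignment-free option, here is a minimal Python sketch that computes the Lempel-Ziv (LZ76) complexity c(S) of a sequence and turns it into a distance. It assumes the normalized form max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(S), c(Q)), where SQ is the concatenation of S and Q. This is one member of the LZ-based family of distances. The slides do not say which variant was used, so treat the formula and the names `lz76_complexity` and `lz_distance` as assumptions.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s.

    Each component is the longest prefix of the remaining text that can be
    copied from a start position earlier in s (overlap allowed), plus one
    new symbol. Naive substring search keeps it short, not fast.
    """
    n, pos, c = len(s), 0, 0
    while pos < n:
        length = 1
        # grow while s[pos:pos+length] is reproducible from earlier text
        while pos + length <= n and s[pos:pos + length] in s[:pos + length - 1]:
            length += 1
        c += 1          # close the component (reproducible part + new symbol)
        pos += length
    return c


def lz_distance(s: str, q: str) -> float:
    """Assumed normalized LZ distance between two non-empty sequences."""
    cs, cq = lz76_complexity(s), lz76_complexity(q)
    extra = max(lz76_complexity(s + q) - cs, lz76_complexity(q + s) - cq)
    return extra / max(cs, cq)


print(lz76_complexity("0001101001000101"))  # 6: 0 . 001 . 10 . 100 . 1000 . 101
```

Filling the group's distance matrix then means calling `lz_distance` on every pair of sequences. Note that this measure is not exactly zero for identical inputs: `lz_distance("MKVL", "MKVL")` is 0.25.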
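Correlating the Input Sequence with the Group
Neighbor-joining (NJ) method: at each step, join the pair of nodes that gives the smallest sum of branch lengths.
[Figure: NJ pairing example for sequences A, B, C, D]
Saitou and Nei, Mol. Biol. Evol., 1987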
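The sketch below makes "smallest sum of branch lengths" concrete. For a candidate pair (i, j), it computes Saitou and Nei's S_ij: the total branch length of the tree in which i and j are joined and every other node hangs off one central node. The 5-sequence matrix is made up for illustration and is not the slides' data. The function names are hypothetical.

```python
from itertools import combinations


def sum_of_branch_lengths(D, i, j):
    """Saitou-Nei S_ij for joining nodes i and j (D is a symmetric matrix)."""
    r = len(D)
    others = [k for k in range(r) if k not in (i, j)]
    to_pair = sum(D[i][k] + D[j][k] for k in others) / (2 * (r - 2))
    within_pair = D[i][j] / 2
    among_others = sum(D[k][m] for k, m in combinations(others, 2)) / (r - 2)
    return to_pair + within_pair + among_others


def best_pair(D):
    """Pair (i, j), i < j, whose joining gives the smallest S_ij."""
    return min(combinations(range(len(D)), 2),
               key=lambda p: sum_of_branch_lengths(D, *p))


# made-up distances between five sequences a, b, c, d, e
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(best_pair(D))  # (0, 1): join a and b first
```

Minimizing S_ij selects the same pair as minimizing the Q-criterion (r - 2)D_ij - R_i - R_j, where R_i is the row sum of D. This holds because S_ij equals Q_ij / (2(r - 2)) plus a term that does not depend on the pair.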
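NJ Method
Each join step computes:
- Leaf length: the branch length from each joined node to the new node
- Distance to node: the distance from every remaining node to the new node
- New distance matrix: the joined pair is replaced by the new node
[Figure: A and B joined into node AB; updated distance matrix over AB, C, D]
Studier and Keppler, Mol. Biol. Evol., 1988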
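The three per-step quantities can be coded as in the sketch below, which follows the Studier and Keppler formulation:
- Leaf lengths: δ(i,u) = D_ij/2 + (R_i - R_j)/(2(r - 2)).
- Distances to the new node: d(u,k) = (D_ik + D_jk - D_ij)/2.
- Matrix update: the matrix shrinks by one row per join.

The sketch also records the step at which each original sequence is attached, which the classification criteria refer to. The names `neighbor_joining` and `join_step`, and the "u0, u1, ..." labels for internal nodes, are hypothetical.

```python
def neighbor_joining(D, names):
    """Sketch of NJ on a symmetric distance matrix D (list of lists, >= 3 taxa).

    Returns (edges, join_step):
      edges     - list of (child, parent, branch_length)
      join_step - original name -> index of the join step that attached it
    Internal nodes get hypothetical labels "u0", "u1", ...
    """
    D = [list(row) for row in D]  # work on a copy
    nodes, leaves = list(names), set(names)
    edges, join_step = [], {}
    step = 0
    while len(nodes) > 3:
        r = len(nodes)
        R = [sum(row) for row in D]  # row sums R_i
        # pick the pair minimizing Q_ij = (r - 2) D_ij - R_i - R_j
        i, j = min(((a, b) for a in range(r) for b in range(a + 1, r)),
                   key=lambda p: (r - 2) * D[p[0]][p[1]] - R[p[0]] - R[p[1]])
        # leaf lengths from i and j to the new node u
        li = D[i][j] / 2 + (R[i] - R[j]) / (2 * (r - 2))
        lj = D[i][j] - li
        u = f"u{step}"
        edges += [(nodes[i], u, li), (nodes[j], u, lj)]
        for k in (i, j):
            if nodes[k] in leaves:
                join_step[nodes[k]] = step
        # distance from u to every remaining node, then the reduced matrix
        keep = [k for k in range(r) if k not in (i, j)]
        du = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        D.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [u]
        step += 1
    # the last three nodes meet at one central node
    center = f"u{step}"
    for x, y, z in ((0, 1, 2), (1, 0, 2), (2, 0, 1)):
        edges.append((nodes[x], center, (D[x][y] + D[x][z] - D[y][z]) / 2))
        if nodes[x] in leaves:
            join_step[nodes[x]] = step
    return edges, join_step
```

On a made-up 5-sequence matrix (rows a to e: [0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]), the sketch gives leaf lengths a: 2, b: 3, c: 4, d: 2, e: 1. Sequences a and b are joined at step 0, c at step 1, and d and e at the final step 2.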
Classification Criteria
- Node with the longest leaf length: the sequence evolves too fast.
- Last node joined into the tree: the sequence costs the most to join the tree.
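Once NJ has produced leaf lengths and a join order, the two criteria can be checked directly. The sketch below makes two assumptions:
- The query is flagged as outside the group when either criterion holds.
- Every leaf attached at the final join step counts as "last".

Both choices, as well as the function name and the toy values, are assumptions and are not taken from the slides.

```python
def outside_group(query, leaf_length, join_step):
    """Hypothetical rule: True if the query has the longest leaf length
    (evolves too fast) or was attached at the final NJ step (costs the
    most to join the tree)."""
    longest = leaf_length[query] == max(leaf_length.values())
    last = join_step[query] == max(join_step.values())
    return longest or last


# toy values (made up): q sits on a long branch and is joined last
leaf_length = {"s1": 1.0, "s2": 1.2, "s3": 0.9, "s4": 1.1, "q": 3.5}
join_step = {"s1": 0, "s2": 0, "s3": 1, "s4": 2, "q": 2}
print(outside_group("q", leaf_length, join_step))   # True
print(outside_group("s1", leaf_length, join_step))  # False
```

Under this rule, s4 would also be flagged because it is attached at the same final step. A stricter reading of "last node joined" may therefore be preferable in practice.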
Running Time (n = number of sequences in the group, l = sequence length)
Preprocessing:
- Distance matrix calculation: $O(n^2 l^2)$
Query sequence classification:
- Distance calculation: $O(n l^2)$
- NJ construction: $O(n^3)$
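The $l^2$ factor in both distance bounds comes from filling one dynamic-programming table per pairwise alignment. The sketch below shows this with unit-cost edit distance as a stand-in for a scored protein alignment; a real pipeline would more likely use a substitution matrix and gap penalties. The function names are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost global alignment (Levenshtein) distance: O(len(s) * len(t))."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (a != b)))   # match / substitution
        prev = cur
    return prev[-1]


def query_distances(query, group):
    """One alignment per group member: O(n * l^2) in total."""
    return [edit_distance(query, g) for g in group]


def group_distance_matrix(group):
    """All n(n - 1)/2 pairs: O(n^2 * l^2) in total (preprocessing)."""
    n = len(group)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(group[i], group[j])
    return D
```

For example, `query_distances("MKVL", ["MKVL", "MKIL"])` returns `[0, 1]`. The resulting row is appended to the precomputed group matrix before the $O(n^3)$ NJ run.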
Thank you