Protein Sequence Classification Using Neighbor-Joining Method

Bo Liu

Overview
Given: a group of sequences that are somewhat similar to one another and share the same protein function.
Input: one sequence of unknown function.
Output: whether this sequence belongs to the group (protein cluster).

Representation of Sequences
Group distance matrix. Ways to calculate the matrix:
  Pairwise alignment
  Multiple sequence alignment
  Alignment-free: relative Lempel-Ziv complexity
Example distance matrix for sequences A, B, C, D:
       A    B    C    D
  A    0
  B    7    0
  C   11    6    0
  D   14    9    7    0
Otu and Sayood, Bioinformatics, 2003
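A minimal Python sketch of the alignment-free option, assuming the classic Lempel-Ziv (1976) complexity count and a normalized distance of the form max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(SQ), c(QS)), where c(.) is the complexity and SQ is the concatenation of S and Q. The function names are hypothetical, and the exact normalization used by Otu and Sayood should be checked against their paper.

```python
def lz_complexity(s: str) -> int:
    """Lempel-Ziv (1976) complexity: number of components in the
    exhaustive history of s (works for DNA or amino-acid strings)."""
    n = len(s)
    if n == 0:
        return 0
    c, i = 1, 1  # the first symbol is always its own component
    while i < n:
        k = 1
        # Grow the current component while it can still be copied from
        # the text before its last symbol (overlapping copies allowed).
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        c += 1  # a new symbol ends the component (or the string ended)
        i += k
    return c


def lz_distance(s: str, q: str) -> float:
    """Assumed normalized LZ-based distance: how much new complexity one
    sequence adds when appended to the other (0 = nothing new)."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    csq, cqs = lz_complexity(s + q), lz_complexity(q + s)
    return max(csq - cs, cqs - cq) / max(csq, cqs)


def distance_matrix(seqs):
    """Symmetric pairwise matrix over the group (n*(n-1)/2 distances)."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            D[a][b] = D[b][a] = lz_distance(seqs[a], seqs[b])
    return D


print(lz_complexity("0001101001000101"))  # 6
```

For the textbook binary string 0001101001000101 the parse is 0, 001, 10, 100, 1000, 101, so the complexity is 6. A sequence that adds little new complexity to another gets a distance near 0, and a matrix built this way is the input the NJ steps consume.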

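Correlation of Input Sequence with Group
NJ method: at each step, join the pair of nodes that gives the smallest sum of branch lengths.
[Figure: pair scores for sequences A, B, C, D (values shown: -40 and -34)]
Saitou and Nei, Mol. Biol. Evol., 1987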
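The pair-selection step can be sketched with Studier and Keppler's Q-criterion, Q(i,j) = (n - 2) d(i,j) - r_i - r_j, where r_i is the sum of row i; minimizing Q picks the same pair as Saitou and Nei's smallest sum of branch lengths. The sketch assumes the four-sequence example matrix with d(C,D) = 7, the value that appears unchanged in the updated matrix after A and B are joined. The names DIST, d and q_scores are illustrative only.

```python
from itertools import combinations

# Example distances; d(C,D) = 7 is assumed from the updated matrix
# (joining A and B does not change the C-D distance).
NAMES = ["A", "B", "C", "D"]
DIST = {
    ("A", "B"): 7, ("A", "C"): 11, ("A", "D"): 14,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 7,
}


def d(x, y):
    """Symmetric lookup into the pairwise distance table."""
    return 0 if x == y else DIST.get((x, y), DIST.get((y, x)))


def q_scores(names):
    """Q(i,j) = (n-2) d(i,j) - r_i - r_j for every pair of nodes."""
    n = len(names)
    r = {x: sum(d(x, y) for y in names) for x in names}
    return {(x, y): (n - 2) * d(x, y) - r[x] - r[y]
            for x, y in combinations(names, 2)}


scores = q_scores(NAMES)
best = min(scores, key=scores.get)  # first minimal pair wins ties
print(scores)
print("join first:", best)
```

This gives Q(A,B) = -40 and Q(A,C) = -34, the two values shown on the slide. C and D also score -40; the sketch breaks the tie by taking the first pair in input order, so A and B are joined first.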

NJ Method
Leaf length: distance from each joined node to the new internal node.
New distance matrix after joining A and B into node AB:
       AB    C    D
  AB    0
  C     5    0
  D     8    7    0
Studier and Keppler, Mol. Biol. Evol., 1988
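The leaf lengths and the updated matrix follow from the standard NJ update rules: for joined nodes i and j, l_i = d(i,j)/2 + (r_i - r_j) / (2(n - 2)), l_j = d(i,j) - l_i, and d(u,k) = (d(i,k) + d(j,k) - d(i,j)) / 2 for every other node k. Below is a small Python sketch on the same four-sequence example, again assuming d(C,D) = 7; the helper name join and the concatenated label "AB" are hypothetical.

```python
# Example matrix (d(C,D) = 7 assumed, as read off the updated matrix).
D = {
    "A": {"A": 0, "B": 7, "C": 11, "D": 14},
    "B": {"A": 7, "B": 0, "C": 6, "D": 9},
    "C": {"A": 11, "B": 6, "C": 0, "D": 7},
    "D": {"A": 14, "B": 9, "C": 7, "D": 0},
}


def join(D, i, j):
    """Join i and j into a new node; return (leaf lengths, new matrix)."""
    n = len(D)
    r = {x: sum(D[x].values()) for x in D}  # row sums
    # Leaf length from each joined node to the new internal node.
    li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj = D[i][j] - li
    u = i + j  # label for the new node, e.g. "AB"
    rest = [x for x in D if x not in (i, j)]
    new = {x: {y: D[x][y] for y in rest} for x in rest}
    new[u] = {u: 0}
    for k in rest:
        # Distance from the new node to every remaining node.
        new[u][k] = new[k][u] = (D[i][k] + D[j][k] - D[i][j]) / 2
    return (li, lj), new


(lA, lB), D2 = join(D, "A", "B")
print(lA, lB)  # 6.0 1.0
print(D2["AB"])
```

A gets leaf length 6 and B gets 1; the new AB row gives 5 to C and 8 to D, matching the slide, while the C to D distance stays 7.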

Classification Criteria
Signs that the query sequence does not belong to the group:
  It is the node with the longest leaf length: it has evolved too fast.
  It is the last node joined into the tree: it costs the most to join the tree.
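Putting the pieces together, a hedged sketch of the classification step: add the query to the group's distance matrix, run NJ, record every sequence's leaf length and the order in which sequences are first joined, then apply the two criteria. Several choices are assumptions not fixed by the slides: the query is rejected if either criterion holds, NJ runs until two nodes remain, ties go to the first pair in input order, and a leaf still unjoined at the end takes the final edge as its leaf length. All function names are hypothetical.

```python
from itertools import combinations


def nj_trace(names, dist):
    """Run neighbor-joining until two nodes remain. Return each original
    sequence's leaf length and the order in which leaves were first joined."""
    labels = list(names)
    is_leaf = [True] * len(labels)
    D = [list(map(float, row)) for row in dist]
    leaf_len, order = {}, []
    while len(labels) > 2:
        n = len(labels)
        r = [sum(row) for row in D]
        # Smallest Q(i,j) = (n-2) d(i,j) - r_i - r_j; first pair wins ties.
        i, j = min(combinations(range(n), 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        for idx, length in ((i, li), (j, D[i][j] - li)):
            if is_leaf[idx]:
                leaf_len[labels[idx]] = length
                order.append(labels[idx])
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        D.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [f"({labels[i]},{labels[j]})"]
        is_leaf = [is_leaf[k] for k in keep] + [False]
    # Assumption: a leaf still unjoined at the end gets the final edge.
    for idx in range(len(labels)):
        if is_leaf[idx]:
            leaf_len[labels[idx]] = D[0][1] if len(labels) == 2 else 0.0
            order.append(labels[idx])
    return leaf_len, order


def belongs_to_group(group_names, group_dist, query_name, query_dists):
    """query_dists[k] is the distance from the query to group_names[k]."""
    names = list(group_names) + [query_name]
    dist = [list(row) + [query_dists[k]] for k, row in enumerate(group_dist)]
    dist.append(list(query_dists) + [0.0])
    leaf_len, order = nj_trace(names, dist)
    longest = max(leaf_len, key=leaf_len.get)
    # Assumption: reject if either criterion holds.
    # Criterion 1: the query has the longest leaf (evolved too fast).
    # Criterion 2: the query is the last leaf joined (costliest to join).
    return longest != query_name and order[-1] != query_name


GROUP = ["A", "B", "C", "D"]
GROUP_DIST = [[0, 7, 11, 14], [7, 0, 6, 9], [11, 6, 0, 7], [14, 9, 7, 0]]
print(belongs_to_group(GROUP, GROUP_DIST, "E", [11, 6, 2, 7]))    # True
print(belongs_to_group(GROUP, GROUP_DIST, "X", [20, 20, 20, 20]))  # False
```

In this toy data, E sits next to C on a short branch and is accepted, while X is 20 away from every member, receives a leaf length of 15 and is joined last, so it is rejected. Each query rebuilds the whole tree, which is why classification carries the O(n^3) NJ cost.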

Running Time (n: number of sequences in the group; l: sequence length)
Preprocessing:
  Distance matrix calculation: O(n^2 l^2)
Query sequence classification:
  Distance calculation: O(n l^2)
  NJ construction: O(n^3)

Thank you