1
Tree Pattern Matching in Phylogenetic Trees: Automatic Search for Orthologs or Paralogs in Homologous Gene Sequence Databases. By: Jean-François Dufayard, Laurent Duret, Simon Penel, Manolo Gouy, François Rechenmann, and Guy Perrière. Presented by: Jean Yeh
2
Background Information: The authors have created three databases that gather genes into homologous families: HOVERGEN (vertebrates), HOBACGEN (prokaryotes), and HOGENOM (organisms with completely sequenced genomes). Among homologous genes, users need to be able to differentiate orthologs from paralogs.
3
Homologous Sequences. Homologs: two genes related by descent from a common ancestral DNA sequence. Orthologs: two genes in different species that evolved from a single ancestral gene by speciation. Paralogs: two genes related by duplication within a genome.
4
Orthologs and Paralogs http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
5
Gene Function: Gene function tends to change after gene duplication. Orthologs are more reliable predictors of gene function than paralogs. Evolutionary distance also plays a role: closely related paralogs are probably more similar than distantly related orthologs.
6
Goal: Create algorithms that allow automatic searching for orthologs or paralogs in the authors' databases: one algorithm for tree reconciliation and one for tree pattern matching. Implement them under the architecture used to query the databases.
7
Tree Reconciliation: Infers speciation and duplication events. Compares a gene tree G with a species tree S to give a reconciled tree R. Algorithm: start with R = S; step through G and R simultaneously; if nodes are incongruent, insert a duplication node in R and annotate gene losses.
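The idea of comparing G with S can be sketched in a few lines. This is not the authors' algorithm: it uses the standard LCA-mapping formulation instead, which only labels duplication nodes and does not build R or annotate gene losses. The nested-tuple tree encoding, the function names, and the convention that gene-tree leaves carry species names are all assumptions made for illustration.

```python
def species_sets(tree):
    """Return the set of species names found under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(species_sets(child) for child in tree))


def lca(species_tree, wanted):
    """Return the smallest clade of species_tree that contains all `wanted` species."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if wanted <= species_sets(child):
                node = child  # a single child still covers everything: descend
                break
        else:
            return node  # no single child covers `wanted`: this is the LCA
    return node


def reconcile(gene_tree, species_tree):
    """Return (species clade the gene node maps to, list of duplication nodes)."""
    if isinstance(gene_tree, str):
        return gene_tree, []
    duplications, child_maps = [], []
    for child in gene_tree:
        mapped, found = reconcile(child, species_tree)
        child_maps.append(mapped)
        duplications.extend(found)
    here = lca(species_tree, species_sets(gene_tree))
    # A gene node mapping to the same species clade as one of its children
    # cannot be a speciation, so it is labelled a duplication.
    if any(species_sets(m) == species_sets(here) for m in child_maps):
        duplications.append(gene_tree)
    return here, duplications


S = (("human", "mouse"), "chicken")
G = ((("human", "mouse"), "chicken"), ("human", "chicken"))
print(len(reconcile(G, S)[1]))  # number of inferred duplication nodes
```

In this toy example both children of the root of G span the same species clade as the root itself, so the root is the only node labelled a duplication.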
8
Tree Reconciliation
9
Tree Pattern Matching: A tree pattern is a particular tree structure with taxonomic and evolutionary parameters contained in its nodes and leaves. It can be considered a subtree that we want to match against a target tree. E.g., pattern (X, Y, Z) matches ((X, Y), Z), (X, (Y, Z)), and ((X, Z), Y).
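The (X, Y, Z) example can be reproduced with a very small sketch. It assumes a much simpler notion of matching than the authors' tool: leaf labels are unique, there are no taxonomic levels or branch constraints, and a pattern matches a target when every group (clade) in the pattern also appears in the target. The nested-tuple encoding and the function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set of the tree, set of leaf sets of all its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)
    return leaves, found


def matches(pattern, target):
    """True if the target has the same leaves and contains every clade of the pattern."""
    p_leaves, p_clades = clades(pattern)
    t_leaves, t_clades = clades(target)
    return p_leaves == t_leaves and p_clades <= t_clades


pattern = ("X", "Y", "Z")
for target in ((("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")):
    print(target, matches(pattern, target))
```

Because the unresolved pattern (X, Y, Z) only requires the group {X, Y, Z}, any resolution of it matches. A resolved pattern such as ((X, Y), Z) is stricter and would reject (X, (Y, Z)).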
10
Tree Pattern Matching: Uses a recursive algorithm that takes into account different taxonomic levels as well as the specific branch constraints. Cuts down on run time by comparing the number of leaves in the pattern and in the target tree. Allows users to search for orthologs/paralogs.
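The leaf-count check can be illustrated as a pruning step: a subtree with fewer leaves than the pattern cannot contain a match, so the search need not descend into it. The sketch below shows only this pruning idea under an assumed nested-tuple encoding; the slides do not say how the authors' implementation performs the check, and the function names are hypothetical.

```python
def count_leaves(tree):
    """Number of leaves under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def candidate_subtrees(target, min_leaves):
    """Yield subtrees with at least min_leaves leaves, skipping smaller ones entirely."""
    if count_leaves(target) < min_leaves:
        return  # too small to hold the pattern: prune without descending
    yield target
    if not isinstance(target, str):
        for child in target:
            yield from candidate_subtrees(child, min_leaves)


target = ((("A", "B"), "C"), ("D", "E"))
pattern = ("X", "Y", "Z")
for subtree in candidate_subtrees(target, count_leaves(pattern)):
    print(subtree)
```

For clarity the sketch recounts leaves at every call; a real implementation would more likely store the leaf count on each node once.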
11
FamFetch Interface: User interface to access the databases. Incorporates both algorithms. The pattern editor has two frames, tool and pattern. Pattern frame: interactive editor to construct, load, save, and match patterns against a tree database. Tool frame: tools used in the pattern frame.
12
FamFetch
13
Tree Rooting: For tree reconciliation, the trees must be rooted. The authors use their reconciliation algorithm to find the most parsimonious rooting, i.e. the one that requires the fewest gene duplications. The reconciliation algorithm is relatively fast.
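Rooting by parsimony can be sketched as: try a root on every branch of the gene tree, count the duplications each rooting implies, and keep the rooting with the fewest. The sketch below counts duplications with a plain LCA mapping rather than the authors' reconciliation algorithm; the nested-tuple encoding, the species-named gene leaves, and all function names are assumptions.

```python
import itertools


def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Leaf sets of every node of the species tree, leaves included."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    return [leaf_set(tree)] + [c for child in tree for c in all_clades(child)]


def count_duplications(gene_tree, clade_sets):
    def mapping(node):  # smallest species clade containing the node's species
        return min((c for c in clade_sets if leaf_set(node) <= c), key=len)

    def dups(node):
        if isinstance(node, str):
            return 0
        here = mapping(node)
        own = 1 if any(mapping(child) == here for child in node) else 0
        return own + sum(dups(child) for child in node)

    return dups(gene_tree)


def rootings(tree):
    """Yield one rooted tree per branch of the (implicitly unrooted) input tree."""
    adj, labels, ids = {}, {}, itertools.count()

    def walk(node):  # build an undirected adjacency list
        nid = next(ids)
        adj[nid] = []
        if isinstance(node, str):
            labels[nid] = node
        else:
            for child in node:
                cid = walk(child)
                adj[nid].append(cid)
                adj[cid].append(nid)
        return nid

    def build(node, parent):  # re-hang the tree away from `parent`
        if node in labels:
            return labels[node]
        kids = [build(n, node) for n in adj[node] if n != parent]
        return kids[0] if len(kids) == 1 else tuple(kids)  # drop the old root

    walk(tree)
    for u in adj:
        for v in adj[u]:
            if u < v:
                yield (build(u, v), build(v, u))


def best_rooting(gene_tree, species_tree):
    clade_sets = all_clades(species_tree)
    return min(rootings(gene_tree), key=lambda r: count_duplications(r, clade_sets))


S = (("human", "mouse"), "chicken")
print(best_rooting(("human", ("mouse", "chicken")), S))
```

In the example, the input rooting implies one duplication, while rooting on the chicken branch implies none, so that rooting is returned. Ties are broken arbitrarily here.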
14
Tree Pattern Search: By framing their algorithm as a tree pattern search, the authors increased the range of queries available to users. Users can search for gene duplication or speciation events, not just orthologs and paralogs. The algorithm is also relatively fast, though it loses the flexibility of human pattern matching.
15
Automatic Search for Orthologs: Previously done with pairwise BLAST searches and reciprocal hits. That approach needs all the genes of the compared species; if the gene set is incomplete or wrong, the results may be wrong. Classifying genes into clusters of orthologs depends on the evolutionary distance between species.
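For comparison, the reciprocal-hit approach mentioned above can be sketched as follows. The sketch does not run BLAST: it assumes the pairwise scores are already available as dictionaries, and the gene names and score values are made up for illustration.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genes of species A (a1, a2) and species B (b1, b2).
a_vs_b = {("a1", "b1"): 950, ("a1", "b2"): 400, ("a2", "b1"): 420, ("a2", "b2"): 310}
b_vs_a = {("b1", "a1"): 940, ("b1", "a2"): 415, ("b2", "a1"): 405, ("b2", "a2"): 300}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

With these scores only (a1, b1) is reported. If b1 were absent from the data, a1 and b2 would become reciprocal best hits, which shows why an incomplete gene set can produce wrong ortholog calls.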
16
Possible Improvement: Have the program estimate the reliability of the reconciliation. While the tool allows for easier comparative sequence analysis, it was designed solely for the databases the authors had already created; it might be improved if it could be generalized to more databases.