P-POD-PANTHER: update


Kara Dolinski (P-POD/Princeton)
Paul Thomas (PANTHER/SRI)

Current status
OrthoMCL clusters from P-POD are incorporated in PANTHER families. Colors indicate different OrthoMCL families, and mouseover displays the OrthoMCL ID (it will soon hyperlink to P-POD).

Currently crunching away at new protein sets generated by PANTHER
Diagram labels: updated protein sets; P-POD OrthoMCL; InParanoid/MultiParanoid; PANTHER trees; compare; consensus clusters.
Next step: incorporate TreeFam in PANTHER and in the consensus clusters via sequence mapping. We cannot simply run the TreeFam analysis on our own protein sets, so this is more complicated than OrthoMCL/InParanoid.
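The slide does not say how the consensus clusters are computed. As a minimal sketch of one possible approach, the Python below counts how many methods place each pair of genes in the same cluster and keeps the pairs supported by at least `min_support` methods. The function names, the `min_support` parameter, and the toy gene IDs are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cluster_pairs(clusters):
    """Return every unordered gene pair that shares a cluster."""
    pairs = set()
    for cluster in clusters:
        members = sorted(set(cluster))
        pairs.update(frozenset(p) for p in combinations(members, 2))
    return pairs


def consensus_pairs(methods, min_support=2):
    """Keep gene pairs co-clustered by at least min_support methods.

    methods maps a method name to its list of clusters (hypothetical layout).
    """
    support = Counter()
    for clusters in methods.values():
        support.update(cluster_pairs(clusters))
    return {pair for pair, count in support.items() if count >= min_support}


# Toy input with made-up gene IDs, not real P-POD or PANTHER data.
toy_methods = {
    "OrthoMCL": [["a", "b", "c"]],
    "InParanoid": [["a", "b"], ["c", "d"]],
    "PANTHER": [["a", "b", "c", "d"]],
}
agreed_by_two = consensus_pairs(toy_methods, min_support=2)
agreed_by_all = consensus_pairs(toy_methods, min_support=3)
```

Working on pairs rather than whole clusters means the methods only need to agree on the genes they share, which matters when one method is mapped onto the protein sets by sequence rather than rerun on them.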

Large-scale comparison of trees with OrthoMCL clusters
An algorithm compares each OrthoMCL cluster to a tree and classifies it as a perfect match to the tree, consistent with the tree, or inconsistent with the tree.
Inconsistencies are reviewed manually with the aim of improving the trees.
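The slides name the three classes but do not define them, so the following Python is only a sketch under assumed definitions: a cluster is "perfect" if it equals the full leaf set of some subtree, "inconsistent" if any two of its members are separated by a duplication node, and "consistent" otherwise. The `Node` class, the `event` labels, and the toy tree topology (including the leaf names "S.p. 1" and "S.p. 2") are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    name: str = ""    # leaf label; empty for internal nodes
    event: str = ""   # "speciation" or "duplication" (internal nodes only)
    children: list = field(default_factory=list)


def leaves(node):
    """Return the set of leaf labels below node."""
    if not node.children:
        return {node.name}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def mrca(node, members):
    """Return the smallest subtree of node whose leaves include all members."""
    for child in node.children:
        if members <= leaves(child):
            return mrca(child, members)
    return node


def classify(root, cluster):
    """Classify a cluster against a tree using the assumed definitions above."""
    cluster = set(cluster)
    if not cluster <= leaves(root):
        raise ValueError("cluster contains genes that are not in the tree")
    top = mrca(root, cluster)
    if leaves(top) == cluster:
        return "perfect"
    for a, b in combinations(sorted(cluster), 2):
        if mrca(top, {a, b}).event == "duplication":
            return "inconsistent"
    return "consistent"


# Toy tree loosely inspired by the example family; the topology is invented.
fungi_a = Node(event="speciation", children=[Node("S.p. 1"), Node("S.c. MET13")])
fungi_b = Node(event="speciation", children=[Node("S.p. 2"), Node("S.c. MET12")])
fungi = Node(event="duplication", children=[fungi_a, fungi_b])
animals = Node(event="speciation", children=[Node("D.m."), Node("H.s. MTHFR")])
toy_tree = Node(event="speciation", children=[fungi, animals])
```

For example, `classify(toy_tree, {"S.c. MET13", "H.s. MTHFR"})` would be "consistent" in this sketch because the two genes meet at a speciation node, while adding "S.c. MET12" brings in a pair separated by the duplication. Recomputing leaf sets on every call is slow for large trees; a production version would cache them.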

Clusters from different “orthology” methods
Figure: protein family tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, S.p., S.c. MET12, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m. OrthoMCL clusters are shown in red, PhiGs in blue, and InParanoid in green.
An “ortholog cluster” is made by one or more “slices” through the protein family tree. The slices reflect some combination of evolutionary rates and history of duplications. A cluster might miss genes that have inherited some but not all functions from the MRCA.

Perfect agreement
Figure: tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m.


Consistent
Figure: tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, S.p., C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m.

Inconsistent (blue)
Figure: tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m.

OrthoMCL clusters overlaid on PANTHER trees
14695 non-singleton clusters from P-POD spanning 12 RefGenomes, compared against 4815 trees from PANTHER.
Chart labels: 62%, 20%, 18%.

Validating trees by comparing with other tree methods (TreeFam)
Compare tree topology: Robinson-Foulds “symmetric difference distance” (requires an exact match of all leaf nodes).
Compare ortholog and within-species paralog predictions: requires only a match of a subset of leaf nodes.
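To make the topology comparison concrete, here is a minimal Python sketch of a Robinson-Foulds style symmetric difference. It is an assumption-laden illustration, not the code behind these slides: trees are written as nested tuples of leaf names, the rooted-clade variant of the distance is used rather than unrooted bipartitions, and the optional normalization divides by the total number of non-trivial clades in both trees.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def rf_distance(tree_a, tree_b, normalized=True):
    """Symmetric difference of the clade sets of two rooted trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        # Mirrors the requirement for an exact match of all leaf nodes.
        raise ValueError("trees must have identical leaf sets")
    difference = len(clades_a ^ clades_b)
    if not normalized:
        return difference
    total = len(clades_a) + len(clades_b)
    return difference / total if total else 0.0


# Toy trees with made-up leaf names.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "C"), "B"), "D")
toy_distance = rf_distance(tree_1, tree_2)
```

Here the two toy trees share the clade {A, B, C} but disagree on {A, B} versus {A, C}, so two of the four non-trivial clades differ. The `ValueError` on mismatched leaf sets is the practical reason the slide pairs this measure with a second comparison that only needs a subset of leaves to match.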

GIGA trees on “full” TreeFam alignments are more similar to “clean” TreeFam trees
Plot of Robinson-Foulds tree distance: GIGA-full vs. TreeFam-clean (red), TreeFam-full vs. TreeFam-clean (blue).

GIGA trees are robust to addition of more sequences
Plot of Robinson-Foulds tree distance: GIGA-full vs. GIGA-clean (red), TreeFam-full vs. TreeFam-clean (blue).

Next steps
Start annotation of trees using PAINT.
Review the first trees with all GO curators to work out the process.
Begin quantitatively tracking progress, e.g. number of families annotated, number of homology annotations inferred, and number of homology annotations inferred per experimental annotation.
Compare consistency with OrthoMCL using the same dataset.
Review and correct trees if necessary before GO annotation of a tree.
Compare the tree algorithm with TreeFam curated seed trees (incorporate subtrees from TreeFam if they are superior).
Map additional orthology methods to trees: InParanoid, TreeFam.