Published by Győző Papp. Modified over 6 years ago.
1
PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations
Paul Thomas, Ph.D.
Division of Bioinformatics, Department of Preventive Medicine
Keck School of Medicine, University of Southern California
2
Outline
- Brief history of PANTHER
- What are the main features of PANTHER?
- How are trees used in PANTHER?
- What annotations can be obtained from PANTHER?
- How do I use PANTHER in InterPro?
3
Brief History
Years marked on the timeline: 1998, 2001, 2003, 2005, 2008, 2010, 2013, 2017
Milestones shown:
- Initial papers on PANTHER database and tools
- New tree algorithm (Thomas PD. BMC bioinformatics 2010)
- PANTHER 12 launched - HMMER3
- PANTHER joins InterPro
- Integrated into GO Reference Genome curation pipeline (2009, PLoS Comput. Biology; Gaudet P, 2011, Brief Bioinformatics)
- Start of the project
- Human genome
- Use of UniProt reference proteomes
4
PANTHER features
Comprehensive
- Covers eukaryotes and prokaryotes
- Genome coverage depends on species:
  - Vertebrates: 82% - 97%
  - Invertebrates: 50% - 80%
  - Fungi: 45% - 81%
  - Plants: 54% - 89%
  - Bacteria: 40% - 82%
5
PANTHER features
- Built on entire sequences, not single domains
- Designed to classify at two distinct levels
  - Family: homologous over at least part of the sequence
  - Subfamily: generally orthologous, i.e. descending from the same gene in a common ancestral genome
- Extensively annotated
  - Protein function (Gene Ontology)
  - Protein class (classifications “above” family level)
6
PANTHER Library Building
[Diagram labels]
- ~1,300,000 sequences from 103 organisms
- ~1,000,000 sequences in ~15,000 family clusters
- Curation
- Annotated families and subfamilies
- Web services
There are a number of tools available from PANTHER; this talk focuses on classification of protein sequences.
8
PANTHER is a collaborative project
9
Families, Trees and Subfamilies
[Figure: family tree. Labels: duplication, speciation, subfamily nodes]
11
Annotations of HMMs: Gene Ontology
Gene/protein function
PANTHER “GO-slim”
- Shows only selected terms from GO
- Currently 550 terms vs ~45,000 for entire GO
Examples:
- Molecular function (molecular level): INSR + ‘receptor activity’
- Cellular component (where the molecule functions): INSR + ‘plasma membrane’
- Biological process (what larger processes, i.e. biological programs, it is used in): INSR + ‘insulin receptor signaling pathway’
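The idea of showing "only selected terms" can be illustrated with a small sketch that maps a specific term up to the nearest term in a chosen slim subset. This is only an illustration: the hierarchy below is a toy, the intermediate term and the `map_to_slim` helper are hypothetical, and nothing here reproduces the actual GO graph or the way PANTHER builds its GO-slim.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents). Illustrative only; NOT the real GO graph.
PARENTS = {
    "insulin receptor activity": ["transmembrane receptor activity"],
    "transmembrane receptor activity": ["receptor activity"],
    "receptor activity": ["molecular function"],
    "molecular function": [],
}

# Hypothetical slim: the subset of terms chosen for display.
SLIM = {"receptor activity"}


def map_to_slim(term, parents=PARENTS, slim=SLIM):
    """Return the nearest slim terms at or above `term` (breadth-first search)."""
    found, seen, queue = set(), {term}, deque([term])
    while queue:
        current = queue.popleft()
        if current in slim:
            found.add(current)  # stop climbing once a slim term is reached
            continue
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


print(map_to_slim("insulin receptor activity"))
```

A term that has no slim term at or above it maps to an empty set, which is one plausible way to signal that it falls outside the slim.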
12
GO Phylogenetic Annotation Project
- Review experimental GO annotations
- Build a model of function evolution: gain and loss of specific functions along tree branches
- Model predicts functions for unannotated genes
PAINT: Phylogenetic Annotation and Inference Tool
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi: /bib/bbr042
13
[Figure: annotated tree. Function labels shown: carboxylic ester hydrolase, cholinesterase]
Legend:
- Green indicates an experimental annotation
- Black dot indicates direct experimental data
- White dot indicates a more general functional class inferred from the ontology
- Red indicates a NOT annotation (the gene does not have the function)
14
[Figure: annotated tree. Labels: node with gain of function (carboxylic ester hydrolase), node with gain of function (cholinesterase), node with loss of function (carboxylic ester hydrolase), neuroligins]
- Functions are inherited through the tree, unless following a loss
- If a corresponding experimental annotation is not already present, these are PREDICTIONS
Gaudet, P., et al. (2011).
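The inheritance rule on this slide (functions pass down the tree unless a loss intervenes) can be sketched as a simple top-down traversal. This is a minimal sketch, not PAINT itself: the `Node` class, the `propagate` function, the node names, and the placement of the gain and loss events are all assumptions loosely modeled on the slide's labels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gains: set = field(default_factory=set)    # functions gained on the branch leading here
    losses: set = field(default_factory=set)   # functions lost on the branch leading here
    children: list = field(default_factory=list)


def propagate(node, inherited=frozenset(), out=None):
    """Assign each node its inherited functions, plus gains, minus losses."""
    if out is None:
        out = {}
    functions = (set(inherited) | node.gains) - node.losses
    out[node.name] = functions
    for child in node.children:
        propagate(child, frozenset(functions), out)
    return out


# Hypothetical tree; event placement is illustrative only.
tree = Node(
    "root",
    gains={"carboxylic ester hydrolase"},
    children=[
        Node("clade_A", gains={"cholinesterase"}, children=[Node("gene_A1")]),
        Node("clade_B", losses={"carboxylic ester hydrolase"}, children=[Node("gene_B1")]),
    ],
)

for name, functions in propagate(tree).items():
    print(name, sorted(functions))
```

In this toy tree, `gene_A1` ends up with both functions and `gene_B1` with none, because it sits below the loss event. For leaves without experimental annotations, such inherited sets would be predictions.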
15
Annotations of HMMs: Protein class
- ~250 distinct terms arranged into a DAG (directed acyclic graph)
- Groupings of protein families and subfamilies by:
  - Related functions, e.g. homeodomain vs. C2H2 zinc finger transcription factor
  - Enzymatic mechanism, e.g. cysteine protease vs. serine protease
  - Substrate specificity, e.g. potassium channel vs. sodium channel
- Protein class groupings rest on diverse criteria, drawn from the classifications used by experts in a given area
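To make "arranged into a DAG" concrete: a term may have more than one parent, but no term may be its own ancestor. The sketch below checks that property with Python's standard `graphlib` module (Python 3.9+). The term hierarchy is hypothetical; only the protease examples come from the slide, and the parent terms and the two-parent "receptor kinase" entry are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical protein class hierarchy (term -> parent terms). Illustrative only.
CLASS_PARENTS = {
    "protein class": [],
    "hydrolase": ["protein class"],
    "protease": ["hydrolase"],
    "cysteine protease": ["protease"],
    "serine protease": ["protease"],
    "receptor": ["protein class"],
    "kinase": ["protein class"],
    "receptor kinase": ["receptor", "kinase"],  # two parents: allowed in a DAG
}


def parents_first_order(class_parents):
    """Return the terms ordered so that every parent precedes its children."""
    return list(TopologicalSorter(class_parents).static_order())


def is_dag(class_parents):
    """True if the parent relation contains no cycles."""
    try:
        parents_first_order(class_parents)
    except CycleError:
        return False
    return True


print(is_dag(CLASS_PARENTS))
print(parents_first_order(CLASS_PARENTS))
```

A parents-first ordering is convenient when rolling annotations up from specific classes to broader ones, since each term is visited after everything above it.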
16
PANTHER HMM scoring
17
PANTHER HMM scoring
PANTHER Hit
- PANTHER family, and subfamily (if the subfamily HMM score > the family HMM score)
- name (e.g., WNT-RELATED)
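The reporting rule on this slide can be sketched as a comparison between the family HMM score and the best subfamily HMM score. This is a sketch under stated assumptions: it reads the slide as "report the subfamily only when its HMM outscores the family HMM", treats higher scores as better, and uses a hypothetical `assign_hit` helper with made-up identifiers and scores. It is not the scoring code used by PANTHER or InterProScan.

```python
def assign_hit(family_id, family_score, subfamily_scores):
    """Return the best subfamily if it outscores the family HMM, else the family.

    subfamily_scores maps subfamily id -> HMM score (higher is better; assumed).
    """
    best_sf, best_score = None, float("-inf")
    for sf_id, score in subfamily_scores.items():
        if score > best_score:
            best_sf, best_score = sf_id, score
    if best_sf is not None and best_score > family_score:
        return best_sf
    return family_id


# Made-up identifiers and scores, for illustration only.
print(assign_hit("FAM_X", 250.0, {"FAM_X:SF1": 310.5, "FAM_X:SF2": 120.0}))
print(assign_hit("FAM_X", 250.0, {"FAM_X:SF1": 200.0}))
```

With these inputs the first call reports the subfamily `FAM_X:SF1` and the second falls back to the family `FAM_X`, since no subfamily HMM beats the family score.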
18
PANTHER family HMM annotation
22
PANTHER subfamily HMM annotation
23
Summary
- PANTHER uses phylogenetic trees to classify protein sequences
- Sequences in 103 genomes have already been classified at pantherdb.org
- New sequences can be classified using PANTHER HMMs in InterProScan
- Detailed annotation can be derived from PANTHER HMM hits:
  - Gene Ontology
  - Protein Class
  - Pathways
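Once sequences have been run through InterProScan, the PANTHER hits can be pulled out of its tab-separated output. The sketch below assumes the commonly documented TSV column order (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, ...); that layout should be checked against the documentation for the InterProScan version in use. The `panther_hits` helper and the sample rows are made up for illustration.

```python
import csv
import io


def panther_hits(tsv_text):
    """Collect PANTHER rows from InterProScan-style TSV text (column layout assumed)."""
    hits = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Column 4 (index 3) is assumed to hold the analysis name.
        if len(row) >= 8 and row[3] == "PANTHER":
            hits.append(
                {
                    "protein": row[0],
                    "signature": row[4],
                    "description": row[5],
                    "start": int(row[6]),
                    "stop": int(row[7]),
                }
            )
    return hits


# Made-up sample rows; accessions and values are placeholders, not real results.
SAMPLE = (
    "prot1\tmd5a\t350\tPANTHER\tFAM_X:SF1\tEXAMPLE SUBFAMILY\t10\t340\t1.0E-50\tT\t01-01-2020\n"
    "prot1\tmd5a\t350\tPfam\tPF_X\tExample domain\t20\t120\t3.0E-10\tT\t01-01-2020\n"
)

for hit in panther_hits(SAMPLE):
    print(hit)
```

Only the first sample row is returned, since the second comes from a different analysis.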
24
Acknowledgements
- USC: Huaiyu Mi, Anushya Muruganujan, Xiaosong Huang, John Casagrande, Sagar Poudel
- GO Consortium: Suzanna Lewis, Pascale Gaudet, Marc Feuermann
- Quest for Orthologs Consortium: Brigitte Boeckmann, Christophe Dessimoz, Adrian Altenhoff
- UniProt (Reference Proteomes Project): Maria Martin, Alan Wilter de Sousa
- InterPro: Rob Finn, Lorna Richardson, Alex Mitchell, Neil Rawlings
pantherdb.org