PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations Paul Thomas, Ph.D. Division of Bioinformatics.

1 PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations Paul Thomas, Ph.D. Division of Bioinformatics Department of Preventive Medicine Keck School of Medicine University of Southern California

2 Outline
Brief history of PANTHER
What are the main features of PANTHER?
How are trees used in PANTHER?
What annotations can be obtained from PANTHER?
How do I use PANTHER in InterPro?

3 Brief History
[Timeline figure; year markers: 1998, 2001, 2003, 2005, 2008, 2010, 2013, 2017. Events shown:]
Start of the project
Human genome
Initial papers on PANTHER database and tools
PANTHER joins InterPro
Integrated into GO Reference Genome curation pipeline (2009, PLoS Comput. Biology; Gaudet P, 2011, Brief Bioinformatics)
New tree algorithm (Thomas PD. BMC bioinformatics 2010)
Use of UniProt reference proteomes
PANTHER 12 launched - HMMER3

4 PANTHER features
Comprehensive: covers eukaryotes and prokaryotes
Genome coverage depends on species:
Vertebrates: 82% - 97%
Invertebrates: 50% - 80%
Fungi: 45% - 81%
Plants: 54% - 89%
Bacteria: 40% - 82%

5 PANTHER features
Built on entire sequences, not single domains
Designed to classify at two distinct levels:
Family: homologous over at least part of the sequence
Subfamily: generally orthologous, i.e. descending from the same gene in a common ancestral genome
Extensively annotated:
Protein function (Gene Ontology)
Protein class (classifications “above” family level)

6 PANTHER Library Building
Curation
Annotated families and subfamilies
~1,300,000 sequences from 103 organisms
Web services
~1,000,000 sequences in ~15,000 family clusters
There are a number of tools available from PANTHER: I will focus on classification of protein sequences

8 PANTHER is a collaborative project
Curation
Annotated families and subfamilies
~1,300,000 sequences from 103 organisms
Web services
~1,000,000 sequences in ~15,000 family clusters

9 Families, Trees and Subfamilies

10 Families, Trees and Subfamilies
[Tree figure labels: duplication, speciation, subfamily nodes]

11 Annotations of HMMs Gene Ontology Gene/protein function
PANTHER “GO-slim”: shows only selected terms from GO
Currently 550 terms vs ~45,000 for the entire GO
Examples:
Molecular function (molecular level): INSR + ‘receptor activity’
Cellular component (where the molecule functions): INSR + ‘plasma membrane’
Biological process (what larger processes, or biological programs, it is used in): INSR + ‘insulin receptor signaling pathway’
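One way to picture how a specific GO term could be reduced to a "slim" term is to walk up the ontology until a term in the slim set is reached. The sketch below is only an illustration of that idea: the hierarchy is a simplified toy (not the real GO graph), the slim set is hypothetical, and `map_to_slim` is a name introduced here, not a PANTHER function.

```python
from collections import deque

# Toy "is a" hierarchy (child -> parents). Simplified for illustration;
# this is NOT the real GO graph.
PARENTS = {
    "insulin receptor activity": ["transmembrane receptor activity"],
    "transmembrane receptor activity": ["receptor activity"],
    "receptor activity": ["molecular function"],
}

# Hypothetical slim: the small set of terms kept for display.
SLIM_TERMS = {"receptor activity"}


def map_to_slim(term, parents, slim_terms):
    """Return the nearest slim terms at or above `term` in the hierarchy."""
    found, seen = set(), set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing once a slim term is reached
            continue
        queue.extend(parents.get(current, []))
    return found


slim_hits = map_to_slim("insulin receptor activity", PARENTS, SLIM_TERMS)
print(slim_hits)
```

Stopping at the first slim term on each upward path keeps the most specific slim term available, which matches the intent of showing only selected terms.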

12 GO Phylogenetic Annotation Project
Review experimental GO annotations
Build a model of function evolution: gain and loss of specific functions along tree branches
Model predicts functions for unannotated genes
PAINT: Phylogenetic Annotation and Inference Tool
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi: /bib/bbr042

13 carboxylic ester hydrolase
Green indicates an experimental annotation
Black dot indicates direct experimental data
White dot indicates a more general functional class inferred from the ontology
Red indicates a NOT annotation (the gene does not have the function)
[Figure labels: carboxylic ester hydrolase, cholinesterase]

14 carboxylic ester hydrolase
Inheritance of functions through the tree, unless following a loss
If a corresponding experimental annotation is not already present, these are PREDICTIONS
[Figure labels: node with loss of function; node with gain of function; carboxylic ester hydrolase; cholinesterase; neuroligins]
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi: /bib/bbr042
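The inheritance rule on this slide (functions pass down the tree from a gain node unless a loss intervenes) can be sketched as a simple top-down traversal. This is a minimal illustration, not the PAINT implementation: the `Node` class, the `propagate` function, the leaf names, and the toy topology (including where gains and losses are placed) are all invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    gains: set[str] = field(default_factory=set)    # functions gained at this node
    losses: set[str] = field(default_factory=set)   # functions lost at this node


def propagate(node, inherited=frozenset()):
    """Return {leaf name: functions inherited along the path from the root}."""
    current = (set(inherited) | node.gains) - node.losses
    if not node.children:
        return {node.name: current}
    result = {}
    for child in node.children:
        result.update(propagate(child, frozenset(current)))
    return result


# Invented toy tree: a gain at the root, a second gain in one clade,
# and a loss in another clade.
tree = Node(
    "root",
    gains={"carboxylic ester hydrolase"},
    children=[
        Node("clade1", gains={"cholinesterase"},
             children=[Node("leaf_a"), Node("leaf_b")]),
        Node("clade2", losses={"carboxylic ester hydrolase"},
             children=[Node("leaf_c")]),
        Node("leaf_d"),
    ],
)

predictions = propagate(tree)
for leaf, functions in sorted(predictions.items()):
    print(leaf, sorted(functions))
```

In this toy case, leaves below the loss node inherit nothing, while every other leaf inherits the root function; any function a leaf receives without its own experimental annotation would count as a prediction.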

15 Annotations of HMMs Protein class
~250 distinct terms arranged into a DAG
Groupings of protein families and subfamilies by:
Related functions, e.g. homeodomain vs. C2H2 zinc finger transcription factor
Enzymatic mechanism, e.g. cysteine protease vs. serine protease
Substrate specificity, e.g. potassium channel vs. sodium channel
Protein class groups rest on diverse criteria, drawn from the classifications used by experts in a given area

16 PANTHER HMM scoring

17 PANTHER HMM scoring
PANTHER Hit
-PANTHER family, plus subfamily if the subfamily HMM score is greater than the family HMM score
-name (e.g., WNT-RELATED)
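The hit-reporting rule on this slide can be read as a small decision: report the family, and add a subfamily only when a subfamily HMM scores higher than the family HMM. The sketch below encodes that reading under stated assumptions: scores are treated as "higher is better", the function name `classify` and the identifiers `PTHR00000` and `PTHR00000:SF1` are hypothetical, and real PANTHER scoring may apply further thresholds not shown on the slide.

```python
def classify(family_id, family_score, subfamily_scores):
    """Return (family, subfamily or None) for one query sequence.

    Assumes higher scores are better. `subfamily_scores` maps subfamily
    identifiers to the score of the query against each subfamily HMM.
    """
    best_subfamily, best_score = max(
        subfamily_scores.items(),
        key=lambda item: item[1],
        default=(None, float("-inf")),
    )
    if best_subfamily is not None and best_score > family_score:
        return family_id, best_subfamily
    return family_id, None


# Hypothetical identifiers and scores, for illustration only.
with_subfamily = classify(
    "PTHR00000", 250.0, {"PTHR00000:SF1": 310.5, "PTHR00000:SF2": 120.0}
)
family_only = classify("PTHR00000", 250.0, {"PTHR00000:SF1": 200.0})
print(with_subfamily)
print(family_only)
```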

18 PANTHER family HMM annotation

19 PANTHER family HMM annotation

20 PANTHER family HMM annotation

21 PANTHER HMM scoring
PANTHER Hit
-PANTHER family, plus subfamily if the subfamily HMM score is greater than the family HMM score
-name (e.g., WNT-RELATED)

22 PANTHER subfamily HMM annotation

23 Summary
PANTHER uses phylogenetic trees to classify protein sequences
Sequences in 103 genomes have already been classified at pantherdb.org
New sequences can be classified using PANTHER HMMs in InterProScan
Detailed annotation can be derived from PANTHER HMM hits: Gene Ontology, Protein Class, Pathways
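As a rough illustration of working with PANTHER hits from InterProScan, the sketch below filters tab-separated output for PANTHER rows and splits each accession into family and subfamily. It assumes the usual InterProScan TSV layout (protein accession in column 1, analysis name in column 4, signature accession in column 5) and the `PTHRnnnnn:SFn` accession style; check both against the InterProScan version in use. The sample rows, accessions, and the `panther_hits` helper are made up for the example.

```python
import csv
import io

# Made-up rows in the assumed TSV layout: protein, MD5, length, analysis,
# signature accession, description, start, stop, score, status, date.
SAMPLE_TSV = "\n".join("\t".join(fields) for fields in [
    ["query1", "md5a", "350", "PANTHER", "PTHR00000:SF1",
     "hypothetical subfamily", "10", "340", "1.0E-50", "T", "01-01-2017"],
    ["query1", "md5a", "350", "Pfam", "PF00000",
     "hypothetical domain", "20", "120", "1.0E-10", "T", "01-01-2017"],
    ["query2", "md5b", "200", "PANTHER", "PTHR00001",
     "hypothetical family", "5", "190", "1.0E-30", "T", "01-01-2017"],
])


def panther_hits(handle):
    """Yield one dict per PANTHER row, with family and subfamily separated."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 5 or fields[3] != "PANTHER":
            continue  # skip short lines and hits from other analyses
        accession = fields[4]
        family, separator, _ = accession.partition(":")
        yield {
            "protein": fields[0],
            "family": family,
            # Keep the full accession for subfamily hits, None otherwise.
            "subfamily": accession if separator else None,
        }


hits = list(panther_hits(io.StringIO(SAMPLE_TSV)))
for hit in hits:
    print(hit)
```

The family and subfamily identifiers extracted this way are the keys one would then use to look up Gene Ontology, protein class, or pathway annotations.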

24 Acknowledgements
USC: Huaiyu Mi, Anushya Muruganujan, Xiaosong Huang, John Casagrande, Sagar Poudel
GO Consortium: Suzanna Lewis, Pascale Gaudet, Marc Feuermann
Quest for Orthologs Consortium: Brigitte Boeckmann, Christophe Dessimoz, Adrian Altenhoff
UniProt (Reference Proteomes Project): Maria Martin, Alan Wilter de Sousa
InterPro: Rob Finn, Lorna Richardson, Alex Mitchell, Neil Rawlings
pantherdb.org

