PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations Paul Thomas, Ph.D. Division of Bioinformatics.

PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations
Paul Thomas, Ph.D.
Division of Bioinformatics, Department of Preventive Medicine
Keck School of Medicine, University of Southern California

Outline
  Brief history of PANTHER
  What are the main features of PANTHER?
  How are trees used in PANTHER?
  What annotations can be obtained from PANTHER?
  How do I use PANTHER in InterPro?

Brief History
  Timeline axis on the slide: 1998, 2001, 2003, 2005, 2008, 2010, 2013, 2017
  Milestones labeled on the timeline (year placement is not preserved in the transcript):
    Start of the project
    Human genome
    Initial papers on PANTHER database and tools
    PANTHER joins InterPro
    Integrated into GO Reference Genome curation pipeline (2009, PLoS Comput. Biology; Gaudet P, 2011, Brief Bioinformatics)
    New tree algorithm (Thomas PD, BMC Bioinformatics, 2010)
    Use of UniProt reference proteomes
    PANTHER 12 launched (HMMER3)

PANTHER features
  Comprehensive
    Covers eukaryotes and prokaryotes
    Genome coverage depends on species:
      Vertebrates: 82%-97%
      Invertebrates: 50%-80%
      Fungi: 45%-81%
      Plants: 54%-89%
      Bacteria: 40%-82%

PANTHER features
  Built on entire sequences, not single domains
  Designed to classify at two distinct levels
    Family: homologous over at least part of the sequence
    Subfamily: generally orthologous, i.e. descending from the same gene in the common ancestral genome
  Extensively annotated
    Protein function (Gene Ontology)
    Protein class (classifications "above" family level)

PANTHER Library Building
  Diagram labels:
    ~1,300,000 sequences from 103 organisms
    ~1,000,000 sequences in ~15,000 family clusters
    Curation
    Annotated families and subfamilies
    Web services
  A number of tools are available from PANTHER; this talk focuses on classification of protein sequences.

PANTHER is a collaborative project
  Diagram labels:
    ~1,300,000 sequences from 103 organisms
    ~1,000,000 sequences in ~15,000 family clusters
    Curation
    Annotated families and subfamilies
    Web services

Families, Trees and Subfamilies
  Figure: a family tree with nodes labeled as duplication or speciation events, and with subfamily nodes marked
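
The tree figure labels internal nodes as speciation or duplication events and marks subfamily nodes. As a rough illustration of how a "generally orthologous" group could follow from those labels, the sketch below cuts a toy gene tree into its largest duplication-free clades. The Node class, the event labels, and the gene names are hypothetical, and this simplified rule is an assumption made for illustration, not PANTHER's actual subfamily procedure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation", "duplication", or None for a leaf
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Collect the leaf (gene) names under a node."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def has_duplication(node):
    """True if this node or any descendant is a duplication event."""
    return node.event == "duplication" or any(has_duplication(c) for c in node.children)


def subfamilies(node):
    """Return the largest clades that contain no duplication event."""
    if not has_duplication(node):
        return [sorted(leaves(node))]
    groups = []
    for child in node.children:
        groups.extend(subfamilies(child))
    return groups


# Toy family: one ancestral duplication followed by a speciation in each copy.
toy_tree = Node("root", "duplication", [
    Node("copyA", "speciation", [Node("humanA"), Node("mouseA")]),
    Node("copyB", "speciation", [Node("humanB"), Node("mouseB")]),
])

print(subfamilies(toy_tree))
```

In this toy case the duplication at the root separates the family into two groups, each containing one human and one mouse gene related only through speciation.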

Annotations of HMMs
  Gene Ontology: gene/protein function
  PANTHER "GO-slim"
    Shows only selected terms from GO
    Currently 550 terms vs ~45,000 for the entire GO
  Examples:
    Molecular function (molecular level): INSR + 'receptor activity'
    Cellular component (where the molecule functions): INSR + 'plasma membrane'
    Biological process (what larger processes, or biological programs, it is used in): INSR + 'insulin receptor signaling pathway'
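
A GO-slim keeps only selected terms, so a detailed annotation has to be reported under a broader term that is in the selection. One common way to do that is to walk up the ontology until a slim term is reached. The sketch below shows this on a toy hierarchy; the PARENTS table, the SLIM set, and the function name are hypothetical and are not taken from the real GO or from PANTHER's GO-slim.

```python
from collections import deque

# Toy is_a hierarchy: term -> list of parent terms (illustrative only).
PARENTS = {
    "insulin receptor activity": ["transmembrane receptor activity"],
    "transmembrane receptor activity": ["receptor activity"],
    "receptor activity": ["molecular function"],
    "molecular function": [],
}

# Toy slim: the subset of terms we are willing to display.
SLIM = {"receptor activity", "molecular function"}


def map_to_slim(term, parents, slim):
    """Return the nearest slim term(s) at or above `term`, searching breadth-first."""
    found = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in slim:
            found.add(current)
            continue  # stop climbing once a slim term is reached on this path
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


print(map_to_slim("insulin receptor activity", PARENTS, SLIM))
```

Stopping at the first slim term on each path keeps the most specific allowed term rather than collapsing everything to the root.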

GO Phylogenetic Annotation Project
  Review experimental GO annotations
  Build a model of function evolution
    Gain and loss of specific functions along tree branches
  Model predicts functions for unannotated genes
  PAINT: Phylogenetic Annotation and Inference Tool
  Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042

Figure legend (functions shown in the example: carboxylic ester hydrolase, cholinesterase)
  Green indicates an experimental annotation
    Black dot: direct experimental data
    White dot: a more general functional class inferred from the ontology
  Red indicates a NOT annotation (the gene does not have the function)

Inheritance of functions through the tree
  Functions are inherited through the tree, unless following a loss
  If a corresponding experimental annotation is not already present, these are PREDICTIONS
  Figure labels: carboxylic ester hydrolase; node with loss of function; carboxylic ester hydrolase; node with gain of function (cholinesterase); neuroligins
  Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
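
The inheritance rule on this slide (a function gained at a node is passed to all descendants unless a later node loses it) can be sketched as a top-down tree traversal. The tree layout, node names, and the placement of the gain and loss events below are assumptions chosen to echo the slide's example; they are not read from the actual PAINT annotation, and the function is not PAINT's implementation.

```python
def propagate(tree, root, gains, losses):
    """Propagate functions from the root to every node.

    tree:   dict mapping a node to the list of its children
    gains:  dict mapping a node to the set of functions gained there
    losses: dict mapping a node to the set of functions lost there
    Returns a dict mapping each node to the set of functions it carries.
    """
    result = {}
    stack = [(root, frozenset())]
    while stack:
        node, inherited = stack.pop()
        # Apply gains first, then losses, at this node.
        functions = (set(inherited) | gains.get(node, set())) - losses.get(node, set())
        result[node] = functions
        for child in tree.get(node, []):
            stack.append((child, frozenset(functions)))
    return result


# Hypothetical toy tree: one clade gains cholinesterase, the other loses the hydrolase.
tree = {
    "root": ["che_ancestor", "nlgn_ancestor"],
    "che_ancestor": ["ACHE", "BCHE"],
    "nlgn_ancestor": ["NLGN1", "NLGN2"],
}
gains = {
    "root": {"carboxylic ester hydrolase"},
    "che_ancestor": {"cholinesterase"},
}
losses = {"nlgn_ancestor": {"carboxylic ester hydrolase"}}

predicted = propagate(tree, "root", gains, losses)
print(sorted(predicted["ACHE"]))
print(sorted(predicted["NLGN1"]))
```

Leaves that lack an experimental annotation for an inherited function would receive it as a prediction; leaves below the loss node receive nothing for that function.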

Annotations of HMMs
  Protein class
    ~250 distinct terms arranged into a DAG
    Groupings of protein families and subfamilies by:
      Related functions, e.g. homeodomain vs. C2H2 zinc finger transcription factor
      Enzymatic mechanism, e.g. cysteine protease vs. serine protease
      Substrate specificity, e.g. potassium channel vs. sodium channel
    Protein classes are grouped by diverse criteria, drawn from the classifications used by experts in a given area

PANTHER HMM scoring
  A PANTHER hit reports:
    The PANTHER family, plus the subfamily if the subfamily HMM score exceeds the family HMM score
    The name (e.g., WNT-RELATED)
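
Reading "(if HMM score > family)" as "report the subfamily only when its HMM scores higher than the family HMM", the decision can be sketched as below. The identifiers, the scores, and the function name are hypothetical, scores are assumed to be higher-is-better, and this is one interpretation of the slide rather than the actual PANTHER scoring code.

```python
def panther_hit(family_id, family_score, subfamily_scores):
    """Return 'FAMILY:SUBFAMILY' if the best subfamily HMM outscores the family HMM,
    otherwise return the family ID alone.

    subfamily_scores: dict mapping a subfamily ID to its HMM score
    """
    if subfamily_scores:
        best_id, best_score = max(subfamily_scores.items(), key=lambda item: item[1])
        if best_score > family_score:
            return f"{family_id}:{best_id}"
    return family_id


# Made-up identifiers and scores for illustration.
print(panther_hit("PTHR00001", 120.0, {"SF1": 150.0, "SF2": 90.0}))
print(panther_hit("PTHR00001", 120.0, {"SF2": 90.0}))
```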

PANTHER family HMM annotation

PANTHER subfamily HMM annotation

Summary
  PANTHER uses phylogenetic trees to classify protein sequences
  Sequences in 103 genomes have already been classified at pantherdb.org
  New sequences can be classified using PANTHER HMMs in InterProScan
  Detailed annotation can be derived from PANTHER HMM hits:
    Gene Ontology
    Protein Class
    Pathways
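
As a sketch of how PANTHER hits might be pulled out of InterProScan results, the snippet below filters tab-separated output for rows produced by the PANTHER analysis and splits the signature accession into family and subfamily. It assumes the InterProScan 5 TSV layout (protein accession in column 1, analysis name in column 4, signature accession in column 5, description in column 6) and a FAMILY:SUBFAMILY accession form; check both against the InterProScan documentation for your version. The sample rows are made up.

```python
def parse_panther_hits(tsv_text):
    """Extract PANTHER rows from InterProScan-style TSV text.

    Assumed 0-based columns: 0 protein accession, 3 analysis name,
    4 signature accession, 5 signature description.
    """
    hits = []
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) < 6 or cols[3] != "PANTHER":
            continue
        # Accessions are assumed to look like "PTHR00001" or "PTHR00001:SF1".
        family, _, subfamily = cols[4].partition(":")
        hits.append({
            "protein": cols[0],
            "family": family,
            "subfamily": subfamily or None,
            "name": cols[5],
        })
    return hits


# Made-up rows in the assumed layout; real output would come from a run such as
#   interproscan.sh -i proteins.fasta -appl PANTHER -f tsv
SAMPLE = "\n".join([
    "seq1\tmd5a\t350\tPANTHER\tPTHR00001:SF1\tEXAMPLE SUBFAMILY\t1\t350\t1.0E-50\tT\t01-01-2020",
    "seq1\tmd5a\t350\tPfam\tPF00001\tExample domain\t10\t200\t1.0E-20\tT\t01-01-2020",
    "seq2\tmd5b\t120\tPANTHER\tPTHR00002\tEXAMPLE FAMILY\t1\t120\t1.0E-10\tT\t01-01-2020",
])

for hit in parse_panther_hits(SAMPLE):
    print(hit)
```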

Acknowledgements
  USC: Huaiyu Mi, Anushya Muruganujan, Xiaosong Huang, John Casagrande, Sagar Poudel
  GO Consortium: Suzanna Lewis, Pascale Gaudet, Marc Feuermann
  Quest for Orthologs Consortium: Brigitte Boeckmann, Christophe Dessimoz, Adrian Altenhoff
  UniProt (Reference Proteomes Project): Maria Martin, Alan Wilter de Sousa
  InterPro: Rob Finn, Lorna Richardson, Alex Mitchell, Neil Rawlings
  pantherdb.org
  feedback@pantherdb.org