Published by Dwain Hodges. Modified over 9 years ago.
1
Understanding proteins: resources for identification and annotation
2
The Gene Ontology: Annotating protein function, role and localization Contact: Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk
3
What is an ontology?
4
A definition...
“A controlled representation of ideas, concepts or events in a given domain and the relationships between them.”
Example hierarchy: Collectibles & art → Stamps → UK (Great Britain) → Victoria → 1884 GREAT BRITAIN 10S SCOTT (11,999.99$)
5
Why do we need ontologies?
Help with data retrieval: allow grouping of annotations
  Annotation counts: brain 20, hindbrain 15, rhombomere 10
  Query ‘brain’ without ontology: 20
  Query ‘brain’ with ontology: 45
  Adapted from Barry Smith: http://ontology.buffalo.edu/smith/BioOntology_Course.html
Make data (re-)usable through standards
  Common structure and terminology (controlled vocabulary)
  Avoid redundancies (single data source)
  Allow common tools, techniques, training, validation...
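The retrieval gain on the slide above (20 hits without the ontology, 45 with it) can be sketched in a few lines of Python. This is only an illustration, assuming the three terms form a single chain (rhombomere under hindbrain under brain). The names `CHILDREN`, `DIRECT_COUNTS` and both functions are invented for this sketch and do not come from any GO tool.

```python
# Toy hierarchy (assumed for illustration): each term maps to its direct children.
CHILDREN = {
    "brain": ["hindbrain"],
    "hindbrain": ["rhombomere"],
    "rhombomere": [],
}

# Number of annotations made directly to each term (figures from the slide).
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def count_without_ontology(term):
    """Plain keyword query: only annotations made directly to the term."""
    return DIRECT_COUNTS.get(term, 0)


def count_with_ontology(term):
    """Ontology-aware query: the term plus all of its descendants."""
    total = 0
    stack = [term]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        total += DIRECT_COUNTS.get(current, 0)
        stack.extend(CHILDREN.get(current, []))
    return total


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```

The ontology-aware count is larger because annotations to hindbrain and rhombomere are also treated as annotations to brain.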
6
Gene ontology
What is the gene ontology?
Organized, controlled vocabulary of terms that describe gene product characteristics. http://geneontology.org/
Represents gene product properties, not gene products themselves
Three branches (domains):
  Cellular component
  Molecular function
  Biological process
Species-independent (with taxonomic restrictions)
Represents physiological processes
Goes up to the level of the cell
7
How does GO work?
The Gene Ontology is like a dictionary
term: transcription initiation
definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter.
id: GO:0006352
8
GO tree and annotations
Relationship types shown: is_a, part_of
Clark et al., 2005
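To make the is_a and part_of relationships concrete, here is a minimal sketch of how annotations can be followed up a GO-like tree. The graph, the protein names and the function names are all invented for illustration. The sketch treats is_a and part_of edges alike, which is a simplification of how real GO tools handle relationship types.

```python
# Toy graph (assumed for illustration): child -> list of (relationship, parent).
PARENTS = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}

# Hypothetical direct annotations: protein -> terms it is annotated with.
ANNOTATIONS = {"protein_A": ["nucleolus"], "protein_B": ["nucleus"]}


def ancestors(term):
    """Return every term reachable from `term` via is_a or part_of edges."""
    found = set()
    stack = [term]
    while stack:
        for _relation, parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def proteins_for(term):
    """Proteins annotated to `term` directly or to any of its descendants."""
    return sorted(
        protein
        for protein, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    )


print(sorted(ancestors("nucleolus")))
print(proteins_for("nucleus"))
```

A protein annotated to a specific term is therefore also found when querying any of that term's ancestors.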
9
An annotation example…
GO terms for Caspase 9
10
Which processes are up- or down-regulated?
[Figure labels: attacked, control, time. GO term groups shown: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
11
QuickGO: browsing GO
Term definition
http://www.ebi.ac.uk/QuickGO/
12
QuickGO: browsing GO
Term relationships (ancestors)
13
QuickGO: browsing GO
Term relationships (children)
14
QuickGO: browsing GO
Proteins annotated to term
15
Annotation and ontology files
www.geneontology.org/GO.downloads.shtml
Ontology files:
  Hold ontology terms and structure
  Species-independent
  You can get GO-slims
Annotation files:
  Hold list of terms and the proteins annotated with them
  You can get species-specific files or the whole annotation.
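As a rough sketch of what reading an annotation file involves, the snippet below groups GO term IDs by protein. It assumes a tab-separated, GAF-like layout in which lines starting with `!` are comments, the second column is the protein identifier and the fifth column is the GO ID. The sample rows are invented, truncated to nine columns, and use placeholder identifiers (only GO:0006352 appears in this presentation).

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in a GAF-like layout (tab-separated, '!' marks a comment).
SAMPLE = (
    "! invented sample, not real annotation data\n"
    "UniProtKB\tX00001\tGENE1\t\tGO:0006352\tREF:0\tIEA\t\tP\n"
    "UniProtKB\tX00001\tGENE1\t\tGO:9999998\tREF:0\tIEA\t\tF\n"
    "UniProtKB\tX00002\tGENE2\t\tGO:0006352\tREF:0\tIEA\t\tP\n"
)


def terms_by_protein(handle):
    """Map each protein identifier (column 2) to its set of GO IDs (column 5)."""
    result = defaultdict(set)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("!"):
            continue  # skip blank lines and comment/header lines
        result[row[1]].add(row[4])
    return dict(result)


annotations = terms_by_protein(io.StringIO(SAMPLE))
for protein, terms in sorted(annotations.items()):
    print(protein, sorted(terms))
```

For a downloaded file, the same function could be passed an open file handle instead of the in-memory sample.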
16
More about GO: EBI train online
www.ebi.ac.uk/training/online/course/go-quick-tour
www.ebi.ac.uk/training/online/course/uniprot-goa-quick-tour
17
Acknowledgements & questions Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk
18
UniProt: A repository of annotated protein sequences Contact: Duncan Legge UniProt Content Team EBI-EMBL help@uniprot.org dlegge@ebi.ac.uk
19
Background of UniProt
Since 2002, a merger and collaboration of three databases: Swiss-Prot, TrEMBL and PIR-PSD
Funded mainly by NIH (US) to be the highest quality, most thoroughly annotated protein sequence database
20
We Aim To Provide…
o A high quality protein sequence database
  A non-redundant protein database, with maximal coverage including splice isoforms, disease variants and PTMs. Sequence archiving essential.
o Easy protein identification
  Stable identifiers and consistent nomenclature / controlled vocabularies
o Thorough protein annotation
  Detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources
21
The Two Sides of UniProtKB
UniProtKB/Swiss-Prot: non-redundant, high-quality manual annotation (reviewed); 1 entry per protein
UniProtKB/TrEMBL: redundant, automatically annotated (unreviewed); 1 entry per nucleotide submission
22
UniProtKB/Swiss-Prot: manually annotated
UniProtKB/TrEMBL: computationally annotated
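In practice the two sections can be told apart from a downloaded sequence file, because UniProtKB FASTA headers begin with `sp|` for Swiss-Prot entries and `tr|` for TrEMBL entries. The helper below is a small sketch of that check. The function name and the example headers (accessions, entry names, descriptions) are invented for illustration.

```python
def review_status(fasta_header):
    """Classify a UniProtKB-style FASTA header as reviewed or unreviewed."""
    prefix = fasta_header.lstrip(">").split("|", 1)[0]
    if prefix == "sp":
        return "reviewed (UniProtKB/Swiss-Prot)"
    if prefix == "tr":
        return "unreviewed (UniProtKB/TrEMBL)"
    raise ValueError(f"not a UniProtKB-style header: {fasta_header!r}")


# Invented headers, used only to exercise the function.
print(review_status(">sp|X00001|EXAMPLE_HUMAN Example protein"))
print(review_status(">tr|X00002|X00002_HUMAN Uncharacterized protein"))
```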
23
Data sources of UniProtKB
[Diagram: sources feeding UniProt/TrEMBL: ENA (EMBL) DNA database, Ensembl, VEGA (Sanger), WormBase, FlyBase, PDB, patent data, mRNA data, Sub/Peptide data]
24
Curation of a UniProt/SwissProt entry
[Diagram: UniProt/TrEMBL to UniProt/SwissProt. Labels: sequence, sequence variants, nomenclature, sequence features, ontologies, literature, annotations, references]
25
UniProt Website www.uniprot.org
26
UniProt layout
28
Annotation comments
FUNCTION, SUBCELLULAR LOCATION, ALTERNATIVE PRODUCTS, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, INDUCTION, SIMILARITY, CATALYTIC ACTIVITY, COFACTOR, ENZYME REGULATION, BIOPHYSICOCHEMICAL PROPERTIES, PATHWAY, SUBUNIT, INTERACTION, PTM, RNA EDITING, MASS SPECTROMETRY, DOMAIN, POLYMORPHISM, DISRUPTION PHENOTYPE, ALLERGEN, DISEASE, TOXIC DOSE, BIOTECHNOLOGY, PHARMACEUTICAL, MISCELLANEOUS, CAUTION, SEQUENCE CAUTION, WEB RESOURCE
29
Controlled vocabularies used whenever possible Evidence tags to show source
31
Proteomes in UniProt
Complete proteomes: complete sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced.
Reference proteomes: some complete proteomes have been selected as reference proteome sets. These cover the proteomes of well-studied model organisms and other proteomes of interest for biomedical research.
32
Obtaining Proteomes
33
Help / Feedback
Stuck? Just ask: there is an active help and support team.
Feedback: if you find something incorrect, outdated or missing, please tell us.
help@uniprot.org
34
Find out more: EBI online courses
www.ebi.ac.uk/training/online/course/uniprot-quick-tour/
35
Acknowledgements & questions Duncan Legge UniProt Content Team EBI-EMBL dlegge@ebi.ac.uk
36
InterPro: An integrated protein sequence analysis resource Contact: Amaia Sangrador InterPro curation Team EBI-EMBL interhelp@ebi.ac.uk amaia@ebi.ac.uk
37
What is InterPro?
InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites.
It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites.
38
The aim of InterPro
[Diagram labelled: InterPro]
39
Protein annotation: a predictive approach
This is the approach taken by protein signature databases.
Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment.
We can use these models to infer relationships with the characterised sequences from which the alignment was constructed.
40
Three (4) different protein signature approaches
Single motif methods: Patterns
Multiple motif methods: Fingerprints
Full alignment methods: Profiles & Hidden Markov models (HMMs)
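Of the approaches listed above, a single-motif pattern is the simplest to illustrate in code. The sketch below translates a small subset of PROSITE-style pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends) into a Python regular expression. The function name, the pattern and the test sequences are invented for illustration and are not real PROSITE entries. Fingerprints, profiles and HMMs need scoring models and are not covered here.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset of the syntax) into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..}: any residue except these
        # [..] groups and single letters are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(("^" if anchored_start else "") + core + ("$" if anchored_end else ""))
    return "".join(parts)


# Invented pattern and sequences, for illustration only.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVM]-{P}-H")
print(regex)
print(bool(re.search(regex, "AACAACKKKLGHAA")))
print(bool(re.search(regex, "AACAACKKKLPHAA")))
```

A pattern either matches or it does not, so this kind of signature gives a yes/no answer rather than a score.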
41
InterPro Consortium
[Diagram: member databases grouped by focus (structural domains; functional annotation of families/domains; protein features (sites)) and by method (hidden Markov models, fingerprints, profiles, patterns); HAMAP is also shown]
42
Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
ProDom | Sequence clustering | PRABI : Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | http://prodom.prabi.fr/prodom/current/html/home.php
43
InterPro signature integration process
Signatures are provided by member databases
They are scanned against the UniProt database to see which sequences they match
Curators manually inspect the matches before integrating the signatures into InterPro
Signatures representing the same entity are integrated together
Relationships between entries are traced, where possible
Curators add literature-referenced abstracts, cross-refs to other databases, and GO terms
44
http://www.ebi.ac.uk/interpro/
45
Using InterPro
Let’s find some information about T-cell surface antigen CD4 in InterPro
Search using the key word: CD4
46
Results from the “CD4” key word search
47
Family-centered view
Page sections: Type, Name, Identifier, Contributing signatures, Description, GO terms, References
48
Using InterPro
Search using human CD4 protein sequence
49
Protein-centered view
Page sections: Type, Name, Identifier, Domains, Family
50
Domain-centered view
Page sections: Type, Name, Identifier, Contributing signatures, Description, References
51
Using InterPro with unknown sequences: InterProScan
Search with unknown protein sequence
InterProScan is the software package that allows sequences to be scanned against InterPro's signatures
52
InterPro entries and contributing signatures Unintegrated signatures (not reviewed)
53
InterPro usage
Within the EBI:
  Used by UniProtKB curators in their annotation of Swiss-Prot proteins
  Forms part of the automated system that adds annotation to UniProtKB/TrEMBL
  Provides matches to over 80% of UniProtKB
  Source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences
Outside the EBI:
  50,000 unique visitors to the web site per month
  >2 million sequences searched online per month
  Plus offline searches with downloadable version
54
Remember!
We are using biologically-unaware search tools and probabilistic models
Probabilistic models != biological certainty
Ask questions, weigh the evidence
55
Caveats
We need your feedback! (interhelp@ebi.ac.uk)
  missing/additional references
  reporting problems
  requests
Sheer amount of data can be overwhelming
Member databases do not always agree!
InterPro entries are based on signatures supplied to us by our member databases... this means no signature, no entry!
56
Find out more: EBI online courses
www.ebi.ac.uk/training/online/course-list/introduction-protein-classification-ebi
www.ebi.ac.uk/training/online/course/interpro-quick-tour
www.ebi.ac.uk/training/online/course/interpro-functional-and-structural-analysis-protei
57
Acknowledgements & questions Amaia Sangrador InterPro curation team EBI-EMBL amaia@ebi.ac.uk
58
PDBe: Protein Data Bank in Europe Contact: Gary Battle Project Leader Outreach PDBe battle@ebi.ac.uk http://www.facebook.com/proteindatabank http://twitter.com/PDBeurope
59
PDBe overview
Mission: Bringing Structure to Biology
Major activities:
  Deposition and annotation site for structural data on biomacromolecules (X-ray, NMR, EM)
  Integration of macromolecular structure data with important biological and chemical data resources
  Provide tools and services for accessing, exploiting and disseminating structural data to the wider biomedical community
60
Worldwide Protein Data Bank (wwPDB)
63
PDBeXplore Browse the PDB using familiar classification systems (enzymes, folds, families, compounds, taxonomy, sequence). Latest structures: pdbe.org/pdbexplore
64
PDBePISA Exploration of macromolecular (protein, DNA/RNA and ligand) interfaces and prediction of probable quaternary structures. Predict quaternary structure: pdbe.org/pisa
65
PDBeFold Interactive comparison, alignment and superposition based on protein secondary structure. Find similar structures: pdbe.org/fold
66
PDBeMotif Flexible 3D search and analysis of protein-ligand interactions, binding environments and structural motifs. Analyse binding sites and motifs: pdbe.org/motif
67
NMR resources and services Visualisation and validation of NMR models and data. NMR resources: pdbe.org/nmr
68
EM resources and services Comprehensive search and analysis tools for EMDB entries. EM resources: pdbe.org/em
69
Electron Microscopy Data Bank (EMDB)
Global public repository for EM density maps of macromolecular complexes and subcellular structures
Founded at EBI in 2002
Jointly operated by PDBe, RCSB and NCMI
PDBe EM portal provides advanced search, visualisation and analysis services.
http://pdbe.org/emdb
70
Educational resources: Quips Interactive exploration of interesting structures from the PDB Quite interesting PDB structures: pdbe.org/quips
71
Stay informed… http://www.facebook.com/proteindatabank http://twitter.com/PDBeurope
72
Find out more: EBI online courses
www.ebi.ac.uk/training/online/course/pdbe-quick-tour/
73
Acknowledgements & questions Gary Battle EBI-EMBL battle@ebi.ac.uk