1
Demo: Protein Information Resource
October 10, 2003
NIH Proteomics Workshop, Bethesda, MD
Raja Mazumder, Ph.D.
Scientific Coordinator and Senior Protein Scientist, PIR
2
Database Demo
NREF Database: NREF Entry (NF )
iProClass Database: iProClass Sequence (A58910), Motif (PCM00487)
PIR-PSD Database: PIR Entry (A58910)
Other Molecular Databases
  Function: KEGG Enzyme (EC ), KEGG Pathway (MAP00230); BRENDA (EC )
  Structure: PDB (1AK5), SCOP (Alanine Racemase), CATH (1AK5)
  Domain: Pfam (PF00478), CDD (HemL)
  Classification: COGs (COG0001)
3
PIR Web Site (http://pir.georgetown.edu)
4
Text Search Result
5
Text Search Result with NULL/NOT NULL
6
Peptide Search Results
7
PIR-NREF Search Result (I)
Test Sequence: ftp://nbrfa.georgetown.edu/pir/misc/test.seq
8
PIR-NREF Search Result (II)
9
PIR Pattern Search
10
PIR Pattern Search Result (I)
Pattern Match: Sequence vs. PROSITE
11
PIR Pattern Search Result (II)
Search a query pattern against a sequence database.
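The pattern search on these two slides can be illustrated with a minimal sketch in Python. It assumes PROSITE-style pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats) and translates it to a regular expression. The function names `prosite_to_regex` and `find_matches` are hypothetical and are not part of the PIR site; the actual PIR pattern search may be implemented quite differently.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative only: rarer syntax (such as '>' inside brackets) is not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # '<' anchors the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        elif core.startswith("[") and core.endswith("]"):
            pass                                # allowed residues: already a regex class
        elif not (len(core) == 1 and core.isalpha()):
            raise ValueError(f"unsupported pattern element: {core!r}")
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start position, matched text) for every match, including overlaps."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern scanned against a made-up peptide.
    print(find_matches("N-{P}-[ST]-{P}.", "MKNGSAANPTL"))
```

Scanning one sequence against many patterns, or one pattern against many sequences, is then a loop over `find_matches`; the peptide in the usage example is invented for illustration.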
12
PIR Domain Display
13
PIR-NREF Database (http://pir.georgetown.edu/pirwww/search/pirnref
14
PIR-NREF Report
15
Related Sequences
16
PIR-iProClass Database
17
iProClass Sequence Report
18
PDB Structure of Molecule: Inosine-5'-Monophosphate Dehydrogenase
19
Development of protein sequence databases
Atlas of Protein Sequence and Structure, Dayhoff (1966): the first sequence database (pre-bioinformatics), currently known as the Protein Information Resource (PIR).
Protein Data Bank (PDB), structural database (1972): remains the most widely used database of structures.
SWISS-PROT, protein sequence database (1987): still in use; not exhaustive but heavily annotated.
UniProt, the United Protein Databases (2003): will create a central database of protein sequence and function by joining the forces of the SWISS-PROT, TrEMBL and PIR protein database activities.
20
Protein Family Classification
Discovery of new knowledge by using information embedded within families of homologous sequences and their structures.
Superfamily and Domain Classification
  Superfamily concept: end-to-end similarity and the same overall domain architecture
Significance
  Improve sensitivity of protein identification
  Provide complete clustering for database organization
  Detect and correct genome annotation errors systematically
  Drive other annotations
  Stimulate evolution, genomics and proteomics research
Classification allows new identifications that predict functional properties of genome-derived sequences, helps correct annotation errors, and provides a framework for organizing sequence space.
21
Protein Family/Superfamily Definitions
Family
A set of protein sequences that:
  Share a common evolutionary ancestor, with end-to-end sequence similarity (no major discrepancy by standard multiple alignment methods)
  Have the same domain architecture (except incomplete or alternatively spliced sequences)
  Overall sequence identity: ?%
Superfamily
A set of protein families that:
  Share a common evolutionary ancestor from end to end
  Have the same domain architecture
  Best-hit rule
Closely related sequences, usually at least 50% identical in sequence and about the same length, are grouped into families. These proteins are expected to have very similar structure, activity, and properties. A superfamily may consist of several or many families. Within a superfamily, structures are recognizably similar, but functional properties and catalytic activities can differ.
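The rule of thumb on this slide (at least 50% sequence identity and about the same length) can be sketched as a simple check on a pairwise alignment. This is an illustration under stated assumptions, not PIR's classification procedure: identity is computed over aligned, non-gap columns only, the function names are hypothetical, and the length-ratio cutoff of 1.2 is an arbitrary stand-in for "about the same length".

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_family_candidate(aligned_a: str, aligned_b: str,
                          min_identity: float = 50.0,
                          max_length_ratio: float = 1.2) -> bool:
    """Apply the '>= 50% identical and about the same length' rule of thumb.

    max_length_ratio is a hypothetical cutoff chosen for illustration.
    """
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    if min(len_a, len_b) == 0:
        return False
    length_ok = max(len_a, len_b) / min(len_a, len_b) <= max_length_ratio
    return length_ok and percent_identity(aligned_a, aligned_b) >= min_identity


if __name__ == "__main__":
    # Two made-up aligned fragments: 9 of 10 compared columns are identical.
    print(percent_identity("MKT-AYIAKQR", "MKTLAYLAKQR"))
    print(same_family_candidate("MKT-AYIAKQR", "MKTLAYLAKQR"))
```

A real family assignment would also require the same domain architecture and end-to-end similarity, which this check does not test.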
22
Protein Domain Definition
Domains can be described as discrete, structurally conserved units in proteins that are evolutionarily mobile. They typically correspond to discrete globular folding units in the structure of a protein and may often occur independently of other domains in the protein.
A domain is a recognizable region of similarity that:
  Has a common ancestry
  Is found in diverse protein sequences (in >= 2 superfamilies)
A sequence can belong to only one protein family and superfamily, but may contain more than one domain.
23
Network structure of protein classification
Structural fold: P-loop NTPase
Domain superfamilies: AAA+ ATPases; DNA pumping ATPases; RecA/SF1/SF2 helicase lineage
Homeomorphic families: Replicative DNA helicase; ATPase; Nucleic acid helicase
Member proteins:
  VACa-D5Rb, MCV-MC094R, SFV-gp080R, FPV-FPV058, MSV-MSV089, AMV-AMV087
  VAC-A32L, MCV-MC140L, SFV-gp120L, FPV-FPV197, MSV-MSV171, AMV-AMV150
  VAC-A18R, MCV-MC123R, SFV-gp108R, FPV-FPV183, MSV-MSV148, AMV-AMV059
24
Network structure of protein classification
25
Superfamily-Domain-Motif Relationship
Here, for example, are diagrammatic representations of 5 superfamilies containing the calcineurin-like phosphoesterase domain, in several cases in association with other known types of domains.
26
iProClass Superfamily List
All superfamilies containing PF00001
Superfamily-domain relationship: ~6000 SFs have >= 1 domain
Superfamily for domain architecture
Provide more annotation and select seeds for the 6000 SFs for initial InterPro xref
SF assignment: based on global score and domain match
27
iProClass Superfamily Report
28
Alignment and Tree View
29
PIR-Protein Sequence Database
30
PIR-PSD Entry
31
BLAST/FASTA Search
32
PIR FASTA Search Result
33
PIR Searches and Alignment
BLAST Search
Multiple Alignment & Tree View
34
PIR Hidden Markov Model
HMM Model Building & Sequence Search
One Protein Against All HMMs
All Proteins Against One HMM
35
PIR Bibliography Submission
View Bibliography Information
View Protein Entry
Submit Citation with Optional Categorization
36
PIR Bibliography Submission
37
Bibliography Information Display (I)
From PIR-NREF
From Other Curated Database
38
Bibliography Information Display (II)
From User Submission
From Computer-Mapping (e.g. Gene Symbol)
39
Proteomic Bioinformatics
Large-Scale Analysis of Proteomic Data: Homology Search for Pathways
40
PIR Batch Retrieval
41
PIR Batch Retrieval Results
42
Pairwise Alignments
43
PIR Pairwise Alignment
44
Composition & Molecular Weight Calculation
45
Composition & Molecular Weight Calculation
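As a rough illustration of what a composition and molecular weight calculation involves, the sketch below counts residues and sums commonly tabulated average residue masses, adding one water for the free N- and C-termini. The function names are hypothetical, and the mass table is an assumption of this sketch; the PIR tool may use different values or handle modified and ambiguous residues differently.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
# Commonly tabulated values, assumed here for illustration.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once per chain for the terminal H and OH


def composition(sequence: str) -> dict:
    """Count each of the 20 standard residues; reject any other symbol."""
    counts = Counter(sequence.upper())
    unknown = set(counts) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue symbols: {sorted(unknown)}")
    return dict(counts)


def molecular_weight(sequence: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in daltons."""
    counts = composition(sequence)
    residue_total = sum(AVERAGE_RESIDUE_MASS[aa] * n for aa, n in counts.items())
    return residue_total + WATER_MASS


if __name__ == "__main__":
    peptide = "GG"  # glycylglycine, used only as a small sanity check
    print(composition(peptide), round(molecular_weight(peptide), 2))
```

Rejecting unknown symbols (such as X or B) is a design choice of this sketch; a production tool would more likely report them separately.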
46
PIR support center