1
Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/~huz/class/bioinfo_resource.html)
Bio-Trac 25 (Proteomics: Principles and Methods), March 24, 2006
Zhang-Zhi Hu, M.D.
Senior Bioinformatics Scientist, Protein Information Resource
Research Assistant Professor, Department of Biochemistry and Molecular Biology
Georgetown University Medical Center
2
What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
NIH Biomedical Information Science and Technology Initiative (BISTI) working definition (2000): research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3
Molecular Biology Database Collection: 858 key databases in 15 categories (http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)
4
Database Collection in Nucleic Acids Research
5
Online Access to Database Collection
NAR 2006 collection: http://www.oxfordjournals.org/nar/database/cap/
2005 database update: http://pir.georgetown.edu/~huz/class/2005_database_update.html
6
Overview: Database Contents, Search and Retrieval
I. Text search / information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases
7
Text Searches: Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
8
PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
9
UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch)
Google-type search vs. Boolean searches: AND, OR, NOT
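The two query styles can be contrasted with a minimal Python sketch. It runs over a hypothetical four-record toy corpus, not the real UniProt search service: the Boolean search returns exactly the records that satisfy the AND/OR/NOT set logic, while the Google-type search returns every record matching at least one term, ranked here by a naive count of matched terms (real engines score more elaborately). All names are illustrative.

```python
# Hypothetical toy corpus: record ID -> annotation text (not real UniProt data).
RECORDS = {
    "rec1": "alpha crystallin A chain eye lens",
    "rec2": "alpha crystallin B chain heat shock",
    "rec3": "argininosuccinate lyase delta crystallin enzyme",
    "rec4": "gamma crystallin eye lens",
}


def build_index(records):
    """Inverted index: lower-cased term -> set of record IDs containing it."""
    index = {}
    for rec_id, text in records.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(rec_id)
    return index


def boolean_search(index, all_ids, must=(), should=(), must_not=()):
    """Exact Boolean retrieval expressed as set operations."""
    result = set(all_ids)
    for term in must:  # AND: every term must be present
        result &= index.get(term.lower(), set())
    if should:  # OR: at least one of these terms must be present
        result &= set().union(*(index.get(t.lower(), set()) for t in should))
    for term in must_not:  # NOT: drop records containing the term
        result -= index.get(term.lower(), set())
    return result


def ranked_search(index, terms):
    """Google-type search: any match counts; rank by number of matched terms."""
    scores = {}
    for term in terms:
        for rec_id in index.get(term.lower(), set()):
            scores[rec_id] = scores.get(rec_id, 0) + 1
    return sorted(scores, key=lambda rec_id: (-scores[rec_id], rec_id))


INDEX = build_index(RECORDS)
print(boolean_search(INDEX, RECORDS, must=["crystallin", "lens"], must_not=["gamma"]))  # {'rec1'}
print(ranked_search(INDEX, ["alpha", "crystallin", "lens"]))  # ['rec1', 'rec2', 'rec4', 'rec3']
```

The Boolean form is precise but all-or-nothing; the ranked form still returns partial matches, which makes it more forgiving for exploratory searches.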
10
PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html)
Search: alpha crystallin A chain. Which protein family does it belong to?
11
PIR Text Search (II)
Can you find which crystallins have a determined 3D structure?
Search: which crystallins are enzymes?
12
II. Sequence & Genomics Databases
GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
RefSeq: NCBI's non-redundant set of reference sequences, including genomic DNA, transcripts (RNA), and protein products.
UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
Entrez Gene: Gene-centered information at NCBI.
UniGene: Unified clusters of ESTs and full-length mRNA sequences.
OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: Integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database
13
UniProt Consortium Databases (http://www.uniprot.org)
Universal Protein Resource: UniProtKB, UniRef, UniParc
2.85 million
14
UniProt Sequence Report (I) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
What’s the difference between CRYAA_RABIT and CYRBAA?
15
UniProt Sequence Report (II)
UniRef100: http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489
UniRef90: http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489
16
Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq)
17
OMIM: Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
18
III. Protein Family Databases
Whole proteins:
PIRSF: A Network Classification System of Protein Families
COG (Clusters of Orthologous Groups) of Complete Genomes
ProtoNet: Automated Hierarchical Classification of Proteins
Protein domains:
Pfam: Alignments and HMM Models of Protein Domains
SMART: Protein Domain Families
CDD: Conserved Domain Database
Protein motifs:
PROSITE: Protein Patterns and Profiles
BLOCKS: Protein Sequence Motifs and Alignments
PRINTS: Protein Sequence Motifs and Signatures
Integrated family databases:
iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY
19
Protein Clustering: COGs (http://www.ncbi.nlm.nih.gov/COG/)
20
KOGs: Eukaryotic Clusters (http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)
21
Domain Classification
iProClass: http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT
Pfam: http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT
22
Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
23
Integrated Family Classification: InterPro (http://www.ebi.ac.uk/interpro/search.html)
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.
24
PIRSF: Full-Length Classification
iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
25
Protein Motifs: PROSITE (http://us.expasy.org/prosite/)
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
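A PROSITE pattern is written as one-letter residues separated by '-', with x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for the N- and C-termini, and a closing period. A minimal Python sketch translating that syntax into a regular expression follows; it assumes well-formed input and skips rarer forms such as '>' inside brackets. The pattern below is an illustrative zinc-finger-like pattern written in PROSITE syntax, and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a well-formed PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue spec from an optional repeat such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a single residue or a [..] class is already valid regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


zinc_like = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(zinc_like)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

The translated string can then be passed to re.search to scan a sequence for the motif.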
26
IV. Databases of Protein Functions
Metabolic pathways, enzymes, and compounds:
Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
EcoCyc: Encyclopedia of E. coli Genes and Metabolism
MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
BRENDA: Enzyme Database
UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Cellular regulation and gene networks:
EpoDB: Genes Expressed during Human Erythropoiesis
BIND: Descriptions of interactions, molecular complexes and pathways
DIP: Catalogs experimentally determined interactions between proteins
BioCarta: Biological pathways of human and mouse
GO: Gene Ontology Consortium Database
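EC numbers encode the IUBMB hierarchy in four dot-separated fields: class, subclass, sub-subclass and serial number. The KEGG pathway link in this tutorial uses EC 4.3.2.1 (argininosuccinate lyase). A minimal Python sketch that validates an EC number and expands it into its ancestor classes follows; the function name is illustrative, and the class table lists the seven top-level IUBMB classes (class 7, translocases, was added in 2018, after this tutorial).

```python
# Top-level enzyme classes in the IUBMB EC scheme.
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",  # added by IUBMB in 2018
}


def ec_lineage(ec):
    """Expand an EC number such as '4.3.2.1' into its ancestor classes."""
    fields = ec.split(".")
    if len(fields) != 4 or not all(f.isdigit() or f == "-" for f in fields):
        raise ValueError(f"not an EC number: {ec!r}")
    if fields[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC class: {fields[0]!r}")
    lineage = []
    for depth in range(1, 5):
        if fields[depth - 1] == "-":  # partially specified, e.g. '3.4.-.-'
            break
        lineage.append(".".join(fields[:depth] + ["-"] * (4 - depth)))
    return lineage


print(EC_CLASSES["4.3.2.1".split(".")[0]])  # Lyases
print(ec_lineage("4.3.2.1"))  # ['4.-.-.-', '4.3.-.-', '4.3.2.-', '4.3.2.1']
```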
27
KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
28
BioCyc (EcoCyc/MetaCyc Metabolic Pathways) (http://biocyc.org/)
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases.
29
BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)
30
Protein-Protein Interaction: BIND (http://www.bind.ca/)
31
Gene Ontology (http://www.geneontology.org/)
Three ontologies: Molecular Function, Biological Process, Cellular Component
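Each of the three ontologies is a directed acyclic graph, and GO's "true path rule" means a gene product annotated to a term is implicitly annotated to all of that term's ancestors. A minimal Python sketch of that propagation follows, over a hypothetical, simplified term graph: the term names are illustrative, real terms carry GO:nnnnnnn identifiers, and the sketch ignores relation types such as is_a versus part_of.

```python
# Hypothetical, simplified GO fragment: term -> set of parent terms.
PARENTS = {
    "lyase activity": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
    "lens development": {"eye development"},
    "eye development": {"biological_process"},
    "cytoplasm": {"cellular_component"},
}
ROOTS = {"molecular_function", "biological_process", "cellular_component"}


def ancestors(term):
    """All terms reachable by following parent links (depth-first)."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_terms):
    """True path rule: add every ancestor of each directly annotated term."""
    closed = set()
    for term in direct_terms:
        closed |= {term} | ancestors(term)
    return closed


def split_by_ontology(terms):
    """Group terms under the root of the ontology they belong to."""
    groups = {root: set() for root in ROOTS}
    for term in terms:
        for root in ROOTS & ({term} | ancestors(term)):
            groups[root].add(term)
    return groups


annotated = propagate({"lyase activity", "lens development", "cytoplasm"})
for root, terms in sorted(split_by_ontology(annotated).items()):
    print(root, sorted(terms))
```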
32
V. Databases of Protein Structures
Protein structure:
PDB: Structures Determined by X-ray Crystallography and NMR
PDBsum: Summaries and analyses of PDB structures
MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
SWISS-MODEL Repository: Database of annotated protein 3D models
ModBase: Annotated comparative protein structure models
Structure classification:
CATH: Hierarchical Classification of Protein Domain Structures
SCOP: Familial and Structural Protein Relationships
FSSP: Protein Fold Classification Based on Structure-Structure Alignment
33
PDB: Experimental 3D Structure Repository (http://www.rcsb.org/pdb/)
Example: rat gamma-crystallin, chains A and B. Can you find this entry with a text search at PIR?
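The chains A and B of an entry are labeled in its coordinate records. A minimal Python sketch, assuming the classic fixed-column PDB text format (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27), counts residues per chain. The sample records are synthetic, produced by a hypothetical helper atom_record with zeroed coordinates, and the sketch ignores HETATM records, alternate locations and multiple models.

```python
def atom_record(serial, name, res_name, chain, res_seq):
    """Hypothetical helper: a minimal ATOM line in fixed PDB columns (zero coords)."""
    return (f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def residues_per_chain(pdb_lines):
    """Count distinct residues per chain in fixed-column PDB ATOM records."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):  # skip HETATM, TER, HEADER, ...
            continue
        chain_id = line[21]  # column 22: chain identifier
        res_seq = int(line[22:26])  # columns 23-26: residue sequence number
        ins_code = line[26]  # column 27: insertion code
        chains.setdefault(chain_id, set()).add((res_seq, ins_code))
    return {chain: len(residues) for chain, residues in sorted(chains.items())}


SAMPLE = [
    atom_record(1, "N", "GLY", "A", 1),
    atom_record(2, "CA", "GLY", "A", 1),
    atom_record(3, "N", "SER", "A", 2),
    atom_record(4, "N", "GLY", "B", 1),
    "HETATM    5  O   HOH A 101",  # water record, ignored by the counter
]
print(residues_per_chain(SAMPLE))  # {'A': 2, 'B': 1}
```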
34
PDBsum: Summary and Analysis (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
Search; 3-D structure summary; 2-D structure
35
Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
36
Protein Structural Classification (2)
SCOP: comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
37
SWISS-MODEL Repository (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)
A database of annotated three-dimensional comparative protein structure models.
38
VI. Proteomic Resources
GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes
SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
Expression profiling databases:
PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database
GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases
39
2D-Gel Image Databases (1)
SWISS-2DPAGE index: http://us.expasy.org/ch2d/2d-index.html
Entry P02489: http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489
40
2D-Gel Image Databases (2) (http://gelbank.anl.gov/2dgels/index.asp)
41
GPMdb MS Data Search (http://gpmdb.thegpm.org/)
Craig et al., J Proteome Res. 2004, 3:1234-42.
42
iProLINK: Protein Literature Mining Resource (http://pir.georgetown.edu/iprolink/)
Text mining of protein phosphorylation
Gene/protein name thesaurus: synonyms, ambiguous names…
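A name thesaurus maps every synonym to the entries it can denote, which lets a text-mining tool tell a unique name from an ambiguous one. A minimal Python sketch with case-insensitive exact matching follows; the names come from the lab list in this tutorial, and one hypothetical entry (HYPO1_HUMAN) is added only so that one name becomes ambiguous. Function names are illustrative, not iProLINK's interface.

```python
from collections import defaultdict

# Names from the lab list, plus one HYPOTHETICAL entry to create an ambiguity.
ENTRY_NAMES = {
    "ARLY2_ANAPL": ["Delta crystallin II", "Argininosuccinate lyase"],
    "CRYAA_RABIT": ["Alpha crystallin A"],
    "HYPO1_HUMAN": ["Argininosuccinate lyase"],  # hypothetical entry
}


def build_thesaurus(entry_names):
    """Invert entry -> names into a case-insensitive name -> entries map."""
    thesaurus = defaultdict(set)
    for entry_id, names in entry_names.items():
        for name in names:
            thesaurus[name.lower()].add(entry_id)
    return thesaurus


def resolve(thesaurus, name):
    """Return ('unique' | 'ambiguous' | 'unknown', sorted matching entries)."""
    ids = sorted(thesaurus.get(name.lower(), set()))
    if not ids:
        return "unknown", ids
    return ("unique" if len(ids) == 1 else "ambiguous"), ids


THESAURUS = build_thesaurus(ENTRY_NAMES)
print(resolve(THESAURUS, "delta crystallin II"))  # ('unique', ['ARLY2_ANAPL'])
print(resolve(THESAURUS, "Argininosuccinate lyase"))  # ('ambiguous', ['ARLY2_ANAPL', 'HYPO1_HUMAN'])
```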
43
Lab:
Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to.
Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058)
Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)
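Each lab ID pairs a UniProtKB entry name, built from a protein mnemonic and a species mnemonic (CRYAA + RABIT), with an accession number (P02493). A minimal Python sketch that splits and checks such pairs follows; the regular expression is the 6- and 10-character accession format documented by UniProt, while the 'NAME/ACCESSION' input is simply this slide's notation and the function name is illustrative.

```python
import re

# UniProtKB accession format (6 or 10 characters), as documented by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_lab_id(pair):
    """Split 'ENTRYNAME/ACCESSION' (the notation used in the lab list)."""
    entry_name, accession = pair.split("/")
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a UniProtKB accession: {accession!r}")
    protein_code, species_code = entry_name.split("_", 1)
    return {
        "accession": accession,
        "entry_name": entry_name,
        "protein_code": protein_code,
        "species_code": species_code,
    }


for pair in ["ARLY2_ANAPL/P24058", "CRYAA_RABIT/P02493"]:
    print(parse_lab_id(pair))
```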