1 Bio-Trac 25 (Proteomics: Principles and Methods) October 5, 2007 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department.

1 Bio-Trac 25 (Proteomics: Principles and Methods), October 5, 2007
Zhang-Zhi Hu, M.D., Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center
Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)

2 What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

3 Molecular Biology Database Collection (http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D3/DC1): 968 key databases in 14 categories

4 Database Collection in Nucleic Acids Res.

5 Online Access to Database Collection http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html http://www.oxfordjournals.org/nar/database/cap/ 2007

6 Overview: Database Contents, Search and Retrieval
I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases

7 Entrez Text Searches (http://www.ncbi.nlm.nih.gov/Entrez/) Lab

8 PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed) Literature mining Lab
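The same kind of PubMed text search can also be scripted. The sketch below only builds a query URL and sends nothing over the network. It assumes NCBI's E-utilities `esearch` endpoint, which is not mentioned on the slide; the function name `build_pubmed_query` and the example search term are illustrative.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not part of the original slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, retmax: int = 20) -> str:
    """Return an esearch URL for a PubMed text query; no request is sent."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    # urlencode escapes the term (spaces become '+').
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_query("alpha crystallin AND phosphorylation")
print(url)
```

Opening the printed URL in a browser should return an XML list of matching PubMed IDs, which can then be passed to other E-utilities calls.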

9 iProLINK: Protein Literature Mining Resource http://pir.georgetown.edu/iprolink/ Text mining for protein phosphorylation Gene/protein name thesaurus: synonyms, ambiguous names… Lab

10 BioThesaurus: Gene/protein name searches - synonyms, ambiguous names… http://pir.georgetown.edu/iprolink/biothesaurus Synonyms: CRYAA crystallin, alpha A CRYA1 HSPB4… Lab

11 RLIMS-P: Text mining for protein phosphorylation http://pir.georgetown.edu/iprolink/rlimsp/ Lab

12 UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch) Google-type search vs. Boolean searches: AND, OR, NOT Lab
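To make the three Boolean operators concrete, here is a minimal sketch over a toy keyword index. The records and keywords are made up for illustration and do not come from UniProt; the helper names are hypothetical.

```python
# Hypothetical mini-index: record ID -> set of lowercase keywords (toy data).
RECORDS = {
    "rec1": {"alpha", "crystallin", "lens"},
    "rec2": {"delta", "crystallin", "lyase"},
    "rec3": {"argininosuccinate", "lyase", "liver"},
}


def match_all(terms):
    """AND: keep records that contain every term."""
    return sorted(r for r, kw in RECORDS.items() if set(terms) <= kw)


def match_any(terms):
    """OR: keep records that contain at least one term."""
    return sorted(r for r, kw in RECORDS.items() if set(terms) & kw)


def match_not(include, exclude):
    """NOT: keep records that contain `include` but lack `exclude`."""
    return sorted(
        r for r, kw in RECORDS.items() if include in kw and exclude not in kw
    )


and_hits = match_all(["crystallin", "lyase"])  # crystallin AND lyase
or_hits = match_any(["crystallin", "lyase"])   # crystallin OR lyase
not_hits = match_not("crystallin", "lyase")    # crystallin NOT lyase
print(and_hits, or_hits, not_hits)
```

AND narrows the result set, OR widens it, and NOT removes unwanted hits, which is why a Boolean query can be more precise than a free-form, Google-type search.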

13 PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html) Search: which alpha crystallin A chains are classified in protein families? Search for synonyms Lab

14 PIR Text Search (II) Search: which crystallins are enzymes, and which families do they belong to? Can you find which crystallins have a determined 3D structure? Lab

15 II. Sequence & Genomics Databases
GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
Entrez Gene: Gene-centered information at NCBI.
UniGene: Unified clusters of ESTs and full-length mRNA sequences.
OMIM: Online Mendelian Inheritance in Man: a catalog of human genetic and genomic disorders.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: Integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database; International HapMap Project: Genes associated with human disease
(http://www.oxfordjournals.org/nar/database/cap/)

16 UniProt Consortium Databases (http://www.uniprot.org) Universal Protein Resource New! http://beta.uniprot.org/ 5.1 million

17 UniProt Sequence Report (I) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT) What’s the difference between CRYAA_RABIT & CYRBAA? UniProtKB Lab
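As a rough aid for telling identifier types apart, the sketch below classifies a string as a UniProtKB accession or an entry name. The accession pattern follows the regular expression documented by UniProt; the entry-name pattern is a simplified assumption (a mnemonic, an underscore, and a species code of up to five characters), and `classify` is a hypothetical helper.

```python
import re

# Accession pattern as documented by UniProt (matches e.g. P02493).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Simplified assumption for entry names such as CRYAA_RABIT.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify(identifier: str) -> str:
    """Return 'accession', 'entry name', or 'other' for a candidate ID."""
    if ACCESSION_RE.match(identifier):
        return "accession"
    if ENTRY_NAME_RE.match(identifier):
        return "entry name"
    return "other"


for candidate in ("P02493", "CRYAA_RABIT", "CYRBAA"):
    print(candidate, "->", classify(candidate))
```

Under these patterns CYRBAA matches neither UniProtKB form, which is a useful hint when working through the question on the slide.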

18 UniProt Report (II): UniRef100 & 90
UniRef90 (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
UniRef100 (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)
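UniRef100 merges identical sequences into one record, and UniRef90 groups sequences that are at least 90% identical. The toy sketch below illustrates only that idea with a greedy, first-match clustering. It is not the procedure UniProt uses: the identity measure works only on equal-length strings, and the sequences and IDs are made up.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; toy measure, equal lengths only."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: dict, threshold: float) -> dict:
    """Put each sequence in the first cluster whose representative it matches."""
    clusters = {}  # representative ID -> list of member IDs
    for sid, seq in seqs.items():
        for rep in clusters:
            if identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no match: start a new cluster
    return clusters


# Made-up sequences for illustration only.
SEQS = {
    "seqA": "MDVTIQHPWF",
    "seqB": "MDVTIQHPWF",  # identical to seqA
    "seqC": "MDVTIQHPWY",  # 9 of 10 positions match seqA
    "seqD": "MKLAAQHPGG",  # 4 of 10 positions match seqA
}

clusters_100 = greedy_cluster(SEQS, 1.0)
clusters_90 = greedy_cluster(SEQS, 0.9)
print(clusters_100)
print(clusters_90)
```

Lowering the threshold from 100% to 90% pulls seqC into the seqA cluster, which mirrors why a UniRef90 report typically lists more members than the corresponding UniRef100 report.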

19 Entrez Gene – Gene-centric information http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq

20 OMIM: Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

21 III. Protein Family Databases
Whole Proteins
–PIRSF: Network Classification Based on Evolutionary Relationship of Whole Proteins
–COG (Clusters of Orthologous Groups) of Complete Genomes
–PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
–ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
–Pfam: Alignments and HMM Models of Protein Domains
–SMART: Protein Domain Families
–CDD: Conserved Domain Database
Protein Motifs
–PROSITE: Protein Patterns and Profiles
–BLOCKS: Protein Sequence Motifs and Alignments
–PRINTS: Compendium of Protein Fingerprints (a fingerprint is a group of conserved motifs)
Integrated Family Databases
–InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily…

22 Protein Clustering COGs: (http://www.ncbi.nlm.nih.gov/COG/) Initial version New version: Includes Eukaryotic Clusters - KOGs

23 PIRSF: Full Length Classification iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280) Lab

24 Domain Classification – Pfam Domain (http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493) (http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

25 Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

26 Protein Motifs: PROSITE – A database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://us.expasy.org/prosite/)
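A PROSITE pattern can be read as a compact regular expression. The sketch below translates the basic pattern syntax into a Python regex and scans a sequence with it. The translator `prosite_to_regex` is a hypothetical helper covering only the common syntax elements, the example pattern is the N-glycosylation site pattern N-{P}-[ST]-{P}, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
hits = re.findall(regex, "MKNGSAANPSA")  # made-up sequence
print(regex, hits)
```

Here the first N-x-S stretch matches, while the second is rejected because proline sits in a position where the pattern forbids it.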

27 Integrated Family Classification: InterPro. An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html) Mapping of families

28 IV. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
–Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
–KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
–LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
–EcoCyc: Encyclopedia of E. coli Genes and Metabolism
–MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
–BRENDA: Enzyme Database
–UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Inter-Molecular Interactions and Regulatory Pathways
–IntAct: Protein interaction data from literature and user submission
–BIND: Descriptions of interactions, molecular complexes and pathways
–DIP: Catalogs experimentally determined interactions between proteins
–Reactome: A curated knowledgebase of biological pathways
–BioCarta: Biological pathways of human and mouse
–GO: Gene Ontology Consortium Database
Pathway Resources - Pathguide

29 Biological Pathway Resource Collection http://www.pathguide.org/
–Protein-protein interactions
–Metabolic pathways
–Signaling pathways
–Pathway diagrams
–Transcription factors / gene regulatory networks
–Protein-compound interactions
–Genetic interaction networks

30 KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1) KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) Lab
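The query string in the pathway URL above packs several pieces of information: a KEGG organism code (hsa, Homo sapiens), a pathway map number, and an EC number to highlight. The sketch below splits it apart and looks up the top-level EC class. The helper names are hypothetical, and the class table lists only the six classes in use at the time of this tutorial.

```python
import re

# Top-level EC classes (first digit of an EC number); six-class table.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_pathway_query(query: str) -> dict:
    """Split e.g. 'hsa00220+4.3.2.1' into organism, map number, highlights."""
    pathway, *highlighted = query.split("+")
    match = re.fullmatch(r"([a-z]+)(\d+)", pathway)
    if match is None:
        raise ValueError(f"unrecognized pathway identifier: {pathway!r}")
    return {
        "organism": match.group(1),
        "map": match.group(2),
        "highlighted": highlighted,
    }


def ec_class(ec_number: str) -> str:
    """Return the top-level class name for an EC number such as '4.3.2.1'."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


parsed = parse_pathway_query("hsa00220+4.3.2.1")
print(parsed, ec_class(parsed["highlighted"][0]))
```

EC 4.3.2.1 is argininosuccinate lyase, the enzyme named for delta crystallin II in the lab examples, so it falls in the lyase class.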

31 BioCyc: EcoCyc/MetaCyc Metabolic Pathways The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

32 BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

33 Reactome: http://www.reactome.org/
–Collaboration of CSHL, EBI and GO Consortium
–Curated resource of core pathways and reactions in human biology
–Authored by biological researchers who are experts in their fields
–Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…
–Inferred orthologous events in 22 non-human species (mouse, rat…)

34 Reactome: events and objects (including modified forms and complexes)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
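The alternating events and objects listed for this pathway can be modeled as a simple chain of records. The sketch below is an illustrative data structure, not Reactome's actual schema: it assumes that each object is the output of the event listed just before it, and the `Event` class and `trace` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stable_id: str    # event identifier as shown on the slide
    name: str
    output_id: str    # assumed: object produced by this event
    output_name: str


# IDs and names copied from the slide; the event -> output pairing is assumed.
EVENTS = [
    Event("REACT_6879.1",
          "Activated type I receptor phosphorylates R-SMAD directly",
          "REACT_7364.1", "Phospho-R-SMAD [cytosol]"),
    Event("REACT_6760.1",
          "Phospho-R-SMAD forms a complex with CO-SMAD",
          "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex [cytosol]"),
    Event("REACT_6726.1",
          "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
          "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]"),
]


def trace(events):
    """Return one 'event -> object' line per step of the pathway."""
    return [f"{e.stable_id} -> {e.output_id}" for e in events]


steps = trace(EVENTS)
print("\n".join(steps))
```

Note that the same complex appears as two distinct objects, one in the cytosol and one in the nucleoplasm, because modified forms and locations are tracked separately.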

35 Protein-Protein Interaction Database - IntAct (http://www.ebi.ac.uk/intact/)

36 Gene Ontology (GO) (http://www.geneontology.org/)
–Molecular Function
–Biological Process
–Cellular Component

37 V. Databases of Protein Structures
Protein Structure
–PDB: Structure Determined by X-ray Crystallography and NMR
–PDBsum: Summaries and analyses of PDB structures
–MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
–SWISS-MODEL Repository: Database of annotated protein 3D models
–ModBase: Annotated comparative protein structure models
Structure Classification
–CATH: Hierarchical Classification of Protein Domain Structures
–SCOP: Familial and Structural Protein Relationships
–FSSP: Protein Fold Classification Based on Structure-Structure Alignment

38 PDB: Experimental 3D Structure Repository (http://www.rcsb.org/pdb/) Rat gamma-crystallin (chains A, B). Can you do a text search at PIR to find this (CRGE_RAT)? Lab

39 PDBsum: Pictorial Database to Provide Summary and Analysis to PDB Entries Search 3-D structure summary 2-D structure (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

40 Protein Structural Classification (1) CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/latest/index.html)

41 Protein Structural Classification (2) SCOP: Comprehensive description of structural and evolutionary relationships between all proteins whose structure is known. (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

42 SWISS-MODEL Repository: A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

43 VI. Proteomic Resources
GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels
PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database Expression Profiling databases
GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

44 2D-Gel Image Databases (http://us.expasy.org/swiss-2dpage/ac=P02489) Part of WORLD-2DPAGE: index to 2-D PAGE databases and services (http://us.expasy.org/ch2d/) Lab

45 GPMdb: MS Data Search (http://gpmdb.thegpm.org/) Craig, et al., J Proteome Res. 2004, 3:1234-42.

46 PRIDE: centralized, standards-compliant, public data repository for proteomics data http://www.ebi.ac.uk/pride/ HUPO Plasma Proteome Project

47 Protein Examples
–Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)
–Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)
–Any additional proteins of your interest for search and retrieval
Lab:
I. Text search / Information retrieval
1. Literature search and text mining
–Finding synonyms (BioThesaurus)
–Information extraction (e.g., protein phosphorylation sites)
2. Find the sequence for the rabbit alpha crystallin A chain
3. Find all alpha crystallin A chains classified in protein families
4. Search for crystallins that have enzymatic activity
5. Find crystallins that have determined 3D structures
II. Database contents (reports)
1. Sequence & genomics databases (UniProt)
2. Protein family databases (PIRSF)
3. Database of protein functions (KEGG)
4. Databases of protein structures (PDB)
5. Proteomics databases (Swiss-2D)

