BIO-TRAC 25 (Proteomics: Principles and Methods) October 10, 2003 NIH, Bethesda, MD Zhang-Zhi Hu, M.D. Senior Bioinformatics Scientist, Protein Information.

1 BIO-TRAC 25 (Proteomics: Principles and Methods), October 10, 2003, NIH, Bethesda, MD. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation, GUMC. Tutorial: Bioinformatics Resources

2 What is Bioinformatics? NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002): "Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data." Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

3 Bioinformatics Resources. The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources, 2003 update (http://www3.oup.co.uk/nar/database/). Nucleic Acids Research Database Issues, published every January (2003 issue: http://nar.oupjournals.org/content/vol31/issue1/). DBcat: A Catalog of > 500 Biological Databases (http://www.infobiogen.fr/services/dbcat/).

4 Molecular Biology Database Collection (http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)

5 The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.), an online resource of 386 key databases in 18 categories: Major Sequence Repositories; Comparative Genomics; Gene Expression; Gene Identification and Structure; Genetic and Physical Maps; Genomic Databases; Intermolecular Interactions; Metabolic Pathways and Cellular Regulation; Mutation Databases; Pathology; Protein Sequence Motifs; Proteome Resources; Retrieval Systems and Database Structure; RNA Sequences; Structure; Transgenics; Varied Biomedical Content.

6 Overview. Protein Sequence Analysis: I. Sequence Similarity Search and Alignment; II. Family Classification Methods; III. Structure Prediction Methods. Molecular Biology Databases: IV. Protein Family Databases; V. Databases of Protein Functions; VI. Databases of Protein Structures. Proteomic Resources: VII. 2D-gel databases; VIII. Proteomic analyses.

7 I. Sequence Similarity Search. Find a protein sequence: text search. Based on Pair-Wise Comparisons: BLOSUM scoring matrix; PAM scoring matrix. Dynamic Programming Algorithms: Global Similarity, Needleman-Wunsch (GAP); Local Similarity, Smith-Waterman (BestFit, SSEARCH). Heuristic Algorithms (Sequence Database Searching): FASTA, based on k-tuples (2 amino acids); BLAST, based on words of 3 amino acids (triples) that score above a threshold against the query; Gapped-BLAST, allows gaps in segment pairs (NREF); PHI-BLAST, Pattern-Hit Initiated search (NCBI); PSI-BLAST, iterative search (NCBI).
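The BLOSUM and PAM matrices listed above hold scaled log-odds scores: how much more often a residue pair is seen in trusted alignments than chance pairing of the background frequencies would predict. A minimal Python sketch of that formula, assuming half-bit scaling (as used for BLOSUM62) and made-up frequencies; real matrices derive the frequencies from curated alignment blocks (BLOSUM) or from extrapolated mutation data (PAM).

```python
import math

def log_odds_score(q_ab, p_a, p_b, same_residue, scale=2.0):
    """Scaled log-odds score for aligning residue a with residue b.

    q_ab: observed frequency of the (a, b) pair in trusted alignments
    p_a, p_b: background frequencies of a and b
    same_residue: True when a == b (expected frequency p_a * p_b);
                  False otherwise (expected 2 * p_a * p_b, because the
                  unordered pair can arise as a/b or as b/a)
    scale: 2.0 gives half-bit units; the frequencies used below are hypothetical
    """
    expected = p_a * p_b if same_residue else 2 * p_a * p_b
    return round(scale * math.log2(q_ab / expected))
```

For example, `log_odds_score(0.02, 0.05, 0.05, same_residue=True)` returns 6: the pair occurs 8 times more often than expected, which is 3 bits or 6 half-bits. Rarely observed pairs get negative scores.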

8 Sequence Search by Text or Unique ID: Entrez (http://www.ncbi.nlm.nih.gov/Entrez/), PIR (http://pir.georgetown.edu/pirwww/search/textsearch.html)

9 Pair-Wise Comparisons. Scoring matrix; global and local similarity by dynamic programming (Needleman-Wunsch, Smith-Waterman) (http://www.ebi.ac.uk/emboss/align/)
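Needleman-Wunsch and Smith-Waterman fill the same dynamic-programming matrix and differ only in how the matrix is initialized, whether cells may drop below zero, and where the final score is read. A minimal sketch that returns the optimal score only (no traceback), assuming a simple match/mismatch scheme and a linear gap penalty in place of the substitution matrix and affine gap penalties used by the EMBOSS alignment tools.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score by dynamic programming (linear gap penalty).

    local=False: Needleman-Wunsch global alignment, score read at the
                 bottom-right cell.
    local=True:  Smith-Waterman local alignment, cells floored at 0 and the
                 best cell anywhere in the matrix is the answer.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps along the first row/column.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       F[i - 1][j] + gap,    # gap in b
                       F[i][j - 1] + gap)    # gap in a
            if local:
                cell = max(cell, 0)          # a local alignment may restart here
                best = max(best, cell)
            F[i][j] = cell
    return best if local else F[rows - 1][cols - 1]
```

For example, the global score of "ACGT" against "ACT" is 1 (three matches, one gap), while the local score of "XXXACGTXXX" against "YYACGTYY" is 4, coming from the shared "ACGT" alone.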

10 FASTA Search: EBI (http://www.ebi.ac.uk/fasta33/), PIR (http://pir.georgetown.edu/pirwww/search/fasta.html)
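FASTA's first step looks up short identical words (k-tuples) shared by the query and a database sequence and counts them along each diagonal; the densest diagonals are then rescored and joined. A minimal sketch of only the lookup-and-count step with ktup = 2; FASTA's later rescoring with a substitution matrix and its final banded alignment are not shown.

```python
from collections import defaultdict

def ktuple_diagonals(query, subject, k=2):
    """Count identical k-tuples shared by two sequences on each diagonal.

    The diagonal of a hit is the offset i - j between the word's position i
    in the query and its position j in the subject.
    """
    index = defaultdict(list)          # k-tuple -> start positions in query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = defaultdict(int)            # diagonal -> number of k-tuple hits
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[i - j] += 1
    return dict(hits)

def best_diagonal(query, subject, k=2):
    """(diagonal, hit count) of the densest diagonal, or None if no shared words."""
    hits = ktuple_diagonals(query, subject, k)
    return max(hits.items(), key=lambda kv: kv[1]) if hits else None
```

Looking words up in a table built once from the query is what keeps this step fast compared with filling a full dynamic-programming matrix against every database sequence.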

11 Gapped-BLAST Search: PIR-NREF (http://pir.georgetown.edu/pirwww/search/pirnref.shtml), NCBI (http://www.ncbi.nlm.nih.gov/BLAST/)
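BLAST starts from short word hits between the query and a database sequence and extends them; an extension stops once its running score falls too far (the X-drop) below the best score seen so far. A minimal sketch of the ungapped X-drop extension in both directions, assuming a simple match/mismatch score instead of BLOSUM62 and an already-found word hit; Gapped-BLAST goes on to a gapped extension from high-scoring hits, which is not shown.

```python
def xdrop_extend(query, subject, qi, si, w=3, match=1, mismatch=-1, xdrop=2):
    """Ungapped X-drop extension of the word hit query[qi:qi+w] / subject[si:si+w].

    Returns (score, q_start, q_end) of the extended segment, q_end exclusive.
    """
    def pair(i, j):
        return match if query[i] == subject[j] else mismatch

    seed = sum(pair(qi + t, si + t) for t in range(w))

    # Extend to the right, remembering the best-scoring prefix.
    run = best_r = right = t = 0
    while qi + w + t < len(query) and si + w + t < len(subject):
        run += pair(qi + w + t, si + w + t)
        t += 1
        if run > best_r:
            best_r, right = run, t
        elif best_r - run > xdrop:     # fell too far below the best: stop
            break

    # Extend to the left in the same way.
    run = best_l = left = 0
    t = 1
    while qi - t >= 0 and si - t >= 0:
        run += pair(qi - t, si - t)
        if run > best_l:
            best_l, left = run, t
        elif best_l - run > xdrop:
            break
        t += 1

    return seed + best_r + best_l, qi - left, qi + w + right
```

A single mismatch inside an otherwise conserved stretch does not stop the extension, because the score dips by less than the X-drop before recovering.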

12 A BLAST Result

13 PSI-BLAST Iterative Search (http://www.ncbi.nlm.nih.gov/BLAST/)

14 PSI-BLAST
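Each PSI-BLAST iteration turns the significant hits of the previous round into a position-specific scoring matrix (PSSM) and searches the database again with it. A minimal sketch of the matrix-building step, assuming the hits are already available as gap-free aligned segments of equal length, a uniform background and a flat pseudocount; PSI-BLAST itself weights the sequences and derives its pseudocounts from a substitution matrix.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned, pseudocount=1.0, background=None):
    """Log-odds PSSM from gap-free aligned segments.

    Returns one dict per column mapping each amino acid to log2(q / p), where q
    is the pseudocount-smoothed column frequency and p the background frequency.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("segments must have equal length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm
```

The pseudocount keeps unseen residues from scoring minus infinity, so the next iteration can still pick up family members with substitutions not yet observed.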

15 II. Family Classification Methods. Multiple Sequence Alignment and Phylogenetic Analysis: ClustalW Multiple Sequence Alignment; Alignment Editor & Phylogenetic Trees. Searches Based on Family Information: PROSITE Pattern Search; Motif and Profile Search; Hidden Markov Models (HMMs).

16 Multiple Sequence Alignment: ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)
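ClustalW first compares every pair of sequences, turns the comparisons into distances, builds a guide tree from them and then aligns progressively along that tree. A minimal sketch of the distance step as 1 minus the fraction of identical residues, assuming the pairs are given as already-aligned gapped strings (ClustalW computes its own pairwise alignments first); columns with a gap in either sequence are left out of the count.

```python
def identity_distance(seq1, seq2):
    """1 - fraction of identical residues over columns with no gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        return 1.0                      # nothing comparable: maximally distant
    identical = sum(a == b for a, b in pairs)
    return 1.0 - identical / len(pairs)

def distance_matrix(aligned):
    """Symmetric matrix of identity distances for a list of aligned sequences."""
    return [[identity_distance(s, t) for t in aligned] for s in aligned]
```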

17 Alignment Editor (Jalview) (http://www.ebi.ac.uk/clustalw/)

18 Alignment Editor (GeneDoc) (http://www.psc.edu/biomed/genedoc/)

19 Phylogenetic Analysis. Tree programs (http://evolution.genetics.washington.edu/phylip.html); tree searches (http://pauling.mbu.iisc.ernet.in/~pali/index.html)

20 Phylogenetic Trees (IGFBP Superfamily): Radial Tree; Phylogram
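Distance-based tree programs, such as those in PHYLIP, build a tree by repeatedly joining the closest clusters of sequences. A minimal sketch of UPGMA (average linkage), the simplest such method, assuming a symmetric distance matrix such as one computed from a multiple alignment; it writes the tree in Newick format without branch lengths. Neighbor-joining, which ClustalW uses for its guide tree and which does not assume a constant evolutionary rate, needs a different update rule.

```python
def upgma(names, dist):
    """UPGMA clustering of a symmetric distance matrix into a Newick string.

    names: leaf labels; dist[i][j]: distance between names[i] and names[j].
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (newick, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to every other one.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (dik * si + djk * sj) / (si + sj)
        del d[(i, j)]
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

With four hypothetical sequences where A and B are closest, C joins them next and D is farthest, the result is `(D,(C,(A,B)));`.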

21 PROSITE Pattern Search (http://pir.georgetown.edu/pirwww/search/patmatch.html)
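PROSITE patterns use a compact syntax ('x' for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, '<' and '>' for the N- and C-terminus) that maps directly onto regular expressions. A minimal Python sketch of that translation, covering only these common elements (for example, not a '>' inside brackets); the N-glycosylation pattern N-{P}-[ST]-{P} serves as the test case.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.lstrip("<").rstrip(">")
        # Split off an optional repeat count: 'x(2,4)' -> core 'x', low '2', high '4'.
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # '[ST]' and single letters are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)
```

`re.search(prosite_to_regex("N-{P}-[ST]-{P}."), "MKNGSAL")` finds the site starting at the asparagine at position 2 (0-based).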

22 Profile Search (http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
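A profile search scores every window of the query against a position-specific table of residue scores instead of demanding an exact pattern match. A minimal sketch with a tiny hand-written, hypothetical three-position profile; residues not listed for a position get a fixed penalty (an assumption of this sketch), and real profile methods also score insertions and deletions, which this gap-free scan ignores.

```python
def scan_profile(sequence, profile, default=-1):
    """Best-scoring gap-free placement of a profile along a sequence.

    profile: list of dicts, one per profile position, mapping residue -> score;
    residues absent from a position's dict score `default`.
    Returns (best_score, best_start) with a 0-based start, or None if the
    sequence is shorter than the profile.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(column.get(sequence[start + k], default)
                    for k, column in enumerate(profile))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical profile: C, then G (or, scoring lower, A), then H.
PROFILE = [{"C": 3}, {"G": 2, "A": 1}, {"H": 3}]
```

Unlike a pattern, the profile still gives a good score to "CAH", just a lower one than to "CGH".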

23 Hidden Markov Model Search: Pfam (http://www.sanger.ac.uk/Software/Pfam/search.shtml), SMART (http://smart.embl-heidelberg.de)
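Pfam and SMART score sequences against profile hidden Markov models using dynamic programming over the hidden states. A minimal sketch of the Viterbi algorithm on a toy, hypothetical two-state model (background 'B' versus domain 'D', each emitting only 'G' or 'C'), rather than a real profile HMM with match, insert and delete states; it recovers the most probable state path and works in log space to avoid numerical underflow.

```python
import math

# Toy, hypothetical model: 'D' prefers C, 'B' prefers G, and both tend to persist.
STATES = ("B", "D")
START = {"B": 0.9, "D": 0.1}
TRANS = {"B": {"B": 0.9, "D": 0.1}, "D": {"B": 0.1, "D": 0.9}}
EMIT = {"B": {"G": 0.9, "C": 0.1}, "D": {"G": 0.1, "C": 0.9}}

def viterbi(seq, states, start, trans, emit):
    """Most probable hidden-state path for seq, computed with log probabilities."""
    V = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}]
    back = []                            # back[t][s]: best predecessor of s at t+1
    for ch in seq[1:]:
        prev = V[-1]
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            col[s] = prev[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][ch])
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For "GGGCCCCGGG" the decoded path is B, B, B, D, D, D, D, B, B, B: the C-rich stretch is assigned to the domain state.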

24 III. Structural Prediction Methods. Signal Peptide: SIGFIND, SignalP. Transmembrane Helix: TMHMM, TMAP. 2D Prediction (α-helix, β-sheet, coiled-coils): PHD, JPred. 3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab initio Prediction.

25 Structure Prediction: A Guide (http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)

26 Protein Prediction Server (http://www.cbs.dtu.dk/services/)

27 Signal Peptide Prediction: SIGFIND (http://www.stepc.gr/~synaptic/sigfind.html), SignalP (http://www.cbs.dtu.dk/services/SignalP-2.0)

28 Transmembrane Helix (http://www.cbs.dtu.dk/services/TMHMM/)
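TMHMM predicts transmembrane helices with a hidden Markov model. An older and much simpler approach, shown here as a minimal sketch and not as TMHMM's method, averages the Kyte-Doolittle hydropathy scale over a sliding window; Kyte and Doolittle suggested that a 19-residue window averaging above about 1.6 marks a likely membrane-spanning segment.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(seq, window=19, threshold=1.6):
    """(start, mean) for every window whose mean hydropathy exceeds threshold.

    Starts are 0-based; overlapping windows over one hydrophobic stretch are all
    reported, so neighbouring starts usually belong to the same candidate helix.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits
```

A 20-residue leucine stretch flanked by lysines, for instance, yields a run of qualifying windows centred on the leucines and none over the charged ends.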

29 Protein Structure Prediction (http://cmgm.stanford.edu/WWW/www_predict.html) (http://restools.sdsc.edu/biotools/biotools9.html)

30 Structure Prediction Server: PredictProtein (http://cubic.bioc.columbia.edu/predictprotein/), JPred (http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

31 3D-Modelling: Modeller (http://www.salilab.org/modeller/modeller.html), SWISS-MODEL (http://www.expasy.ch/swissmod/SWISS-MODEL.html)
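Homology models are commonly compared with a reference structure by the root-mean-square deviation (RMSD) of corresponding atoms, for example the C-alpha atoms. A minimal sketch, assuming the two coordinate sets are already paired atom by atom and superposed; finding the optimal superposition (for example with the Kabsch algorithm) is a separate step not shown here.

```python
import math

def rmsd(coords1, coords2):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed; no rotation is searched for.
    """
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2))
    return math.sqrt(total / len(coords1))
```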

32 IV. Protein Family Databases. Whole Proteins: PIR, Superfamilies and Families; COG (Clusters of Orthologous Groups) of Complete Genomes; ProtoNet, Automated Hierarchical Classification of Proteins. Protein Domains: Pfam, Alignments and HMM Models of Protein Domains; SMART, Protein Domain Families. Protein Motifs: PROSITE, Protein Patterns and Profiles; BLOCKS, Protein Sequence Motifs and Alignments; PRINTS, Protein Sequence Motifs and Signatures. Integrated Family Databases: iProClass, Superfamilies/Families, Domains, Motifs, Rich Links; InterPro, integrates Pfam, PRINTS, PROSITE, ProDom, SMART.

33 Protein Clustering: COG (http://www.ncbi.nlm.nih.gov/COG/)
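The COG approach builds on genome-specific best hits, the best-matching protein of one genome within another genome. A minimal sketch of the pairwise building block, reciprocal best hits between two genomes, assuming hypothetical precomputed similarity scores keyed by (query, subject); the COG procedure itself groups consistent best hits across three or more genomes, which is not shown.

```python
def best_hits(scores):
    """Best-scoring subject for each query; scores maps (query, subject) -> score."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Sorted pairs (a, b) where each protein is the other's best hit."""
    forward, backward = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

In the hypothetical example used for testing, a2 and a3 both hit b2 best, but only a3 is b2's own best hit, so only (a3, b2) survives alongside (a1, b1).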

34 Protein Domains: Pfam (http://www.sanger.ac.uk/Software/Pfam/), SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)

35 Protein Motifs: PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles (http://www.expasy.ch/prosite/).

36 Integrated Family Classification: InterPro, an integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs and PIRSF (http://www.ebi.ac.uk/interpro/search.html)

37 V. Databases of Protein Functions. Metabolic Pathways, Enzymes, and Compounds: Enzyme Classification, Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB); KEGG (Kyoto Encyclopedia of Genes and Genomes), Metabolic Pathways; LIGAND (at KEGG), Chemical Compounds, Reactions and Enzymes; EcoCyc, Encyclopedia of E. coli Genes and Metabolism; MetaCyc, Metabolic Encyclopedia (Metabolic Pathways); WIT, Functional Curation and Metabolic Models; BRENDA, Enzyme Database; UM-BBD, Microbial Biocatalytic Reactions and Biodegradation Pathways; Klotho, Collection and Categorization of Biological Compounds. Cellular Regulation and Gene Networks: EpoDB, Genes Expressed during Human Erythropoiesis; BIND, descriptions of interactions, molecular complexes and pathways; DIP, a catalog of experimentally determined interactions between proteins; RegulonDB, Escherichia coli Pathways and Regulation.

38 KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874). KEGG is a suite of databases and associated software that integrates current knowledge of molecular interaction networks, of genes and proteins, and of chemical compounds and reactions (http://www.genome.ad.jp/kegg/kegg2.html).

39 BioCyc (EcoCyc/MetaCyc Metabolic Pathways): the BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
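Pathway databases such as KEGG and BioCyc can be viewed as networks of compounds linked by enzyme-catalysed reactions. A minimal sketch of one question such a network answers, the shortest chain of reactions from one compound to another, using breadth-first search over a hypothetical, hand-entered fragment of glycolysis (hexokinase, glucose-6-phosphate isomerase, 6-phosphofructokinase) with edges in one direction only; it does not query KEGG or BioCyc.

```python
from collections import deque

# Hypothetical toy network: substrate -> [(product, EC number of the enzyme)].
REACTIONS = {
    "glucose": [("glucose 6-phosphate", "2.7.1.1")],
    "glucose 6-phosphate": [("fructose 6-phosphate", "5.3.1.9")],
    "fructose 6-phosphate": [("fructose 1,6-bisphosphate", "2.7.1.11")],
}

def shortest_route(network, source, target):
    """Fewest-reaction route as [(substrate, EC number, product), ...], or None."""
    queue = deque([source])
    came_from = {source: None}           # compound -> (previous compound, EC number)
    while queue:
        compound = queue.popleft()
        if compound == target:
            steps = []
            while came_from[compound] is not None:
                prev, ec = came_from[compound]
                steps.append((prev, ec, compound))
                compound = prev
            return steps[::-1]
        for product, ec in network.get(compound, []):
            if product not in came_from:
                came_from[product] = (compound, ec)
                queue.append(product)
    return None
```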

40 Protein-Protein Interactions: DIP (http://dip.doe-mbi.ucla.edu/)

41 Protein-Protein Interactions: BIND (http://www.bind.ca/)
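Interaction databases such as DIP and BIND record binary protein-protein interactions, which can be treated as an undirected graph in which highly connected proteins ("hubs") are often of interest. A minimal sketch that counts distinct partners per protein from a list of hypothetical interacting pairs; duplicate records are counted once and self-interactions are skipped, both choices of this sketch rather than of either database.

```python
def interaction_degrees(pairs):
    """Number of distinct interaction partners per protein from binary pairs."""
    partners = {}
    for a, b in pairs:
        if a == b:
            continue                      # skip self-interactions in this count
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(others) for protein, others in partners.items()}

def top_hubs(degrees, n):
    """The n proteins with most partners (ties broken alphabetically)."""
    return sorted(degrees, key=lambda p: (-degrees[p], p))[:n]
```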

42 BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

43 VI. Databases of Protein Structures. Protein Structure and Classification: PDB, Structures Determined by X-ray Crystallography and NMR; CATH, Hierarchical Classification of Protein Domain Structures; SCOP, Familial and Structural Protein Relationships; FSSP, Protein Fold Family Database. Protein Sequence-Structure Relationship: PIR-NRL3D, Protein Sequence-Structure Database; PIR-RESID, Protein Structure/Post-Translational Modifications; HSSP, Families and Alignments of Structurally-Conserved Regions.

44 PDB Structure Data (http://www.rcsb.org/pdb/)
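PDB coordinate files store each atom on a fixed-column ATOM record. A minimal sketch that extracts the C-alpha atoms from such records using the standard column positions (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54); it ignores HETATM records, alternate locations and multiple models.

```python
def read_ca_atoms(lines):
    """C-alpha atoms from PDB ATOM records.

    Returns a list of (chain, residue_number, residue_name, (x, y, z)).
    Column slices are 0-based versions of the 1-based PDB column numbers.
    """
    atoms = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append((line[21],                         # chain identifier
                          int(line[22:26]),                 # residue sequence number
                          line[17:20].strip(),              # residue name
                          (float(line[30:38]),              # x
                           float(line[38:46]),              # y
                           float(line[46:54]))))            # z
    return atoms
```

The resulting coordinate lists can be fed directly to RMSD or distance calculations between structures.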

45 PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)

46 Protein Structural Classification. CATH: hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
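CATH classifies each domain at four levels, Class, Architecture, Topology (fold) and Homologous superfamily, written as a dotted code. A minimal sketch that reports the deepest level at which two such codes agree; the codes in the test are only illustrative inputs.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

def shared_cath_level(code1, code2):
    """Deepest CATH level at which two dotted codes agree, or None if none."""
    shared = None
    for level, a, b in zip(CATH_LEVELS, code1.split("."), code2.split(".")):
        if a != b:
            break                         # deeper levels cannot match either
        shared = level
    return shared
```

Two domains that share a topology but not a homologous superfamily have the same fold without CATH asserting an evolutionary relationship between them.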

47 Protein Structural Classification: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/). The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.

48 VII. Proteomic Resources. GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/). PEP, Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): summarized analyses of protein sequences. Proteome BioKnowledge Library (http://www.proteome.com): detailed information on human, mouse and rat proteomes. Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Expression profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi), human and mouse transcriptome; SMD (http://genome-www5.stanford.edu/MicroArray/SMD/), Stanford microarray data analysis; EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html), managing, storing and analyzing microarray data.

49 2D-Gel Image Databases (1) (http://gelbank.anl.gov/2dgels/index.asp)

50 2D-Gel Image Databases (2) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
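2D gels separate proteins by isoelectric point and molecular weight, and gel spots are commonly compared with theoretical values computed from the sequence. A minimal sketch of the molecular-weight half, assuming the commonly tabulated average residue masses in daltons and one water added for the free termini; post-translational modifications and the pI calculation are not covered.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of H2O, added once for the N- and C-termini

def average_mass(sequence):
    """Theoretical average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
```

For a single glycine this gives about 75.07 Da, the mass of the free amino acid.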

51 VIII. Proteome Analysis (http://www.ebi.ac.uk/proteome)

52 Expression Profiling: Human and Mouse Transcriptome (http://expression.gnf.org/cgi-bin/index.cgi) (http://genome-www.stanford.edu/serum/)
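Expression profiling data are commonly mined by comparing genes' profiles across conditions, since co-regulated genes tend to have correlated profiles. A minimal sketch of the Pearson correlation often used for this, assuming two equally long lists of expression values (for example, log ratios); the values in the test are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)
```

Values near 1 indicate profiles that rise and fall together across conditions, and values near -1 indicate opposite behaviour.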

53 Lab: Visit selected websites and analyze some protein sequences of your own choice. A list of the bioinformatics resources in this tutorial is available at http://pir.georgetown.edu/~huz/bioinfo_resource.html. Try some of the following sequences for analysis: 1) well-characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1); 2) less-characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH); 3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7.
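Most of the servers above accept query sequences in FASTA format: a '>' header line followed by one or more sequence lines. A minimal sketch of a parser for local files in that format, assuming the record identifier is the first word of the header; the records in the test are hypothetical, not the PIR or SWISS-PROT entries listed above.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited word after '>';
    sequence lines are concatenated and blank lines are ignored.
    """
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)   # close the previous record
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```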

