BIO-TRAC 25 (Proteomics: Principles and Methods) October 10, 2003 NIH, Bethesda, MD Zhang-Zhi Hu, M.D. Senior Bioinformatics Scientist, Protein Information.


Tutorial: Bioinformatics Resources
BIO-TRAC 25 (Proteomics: Principles and Methods), October 10, 2003, NIH, Bethesda, MD
Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation, GUMC

2 What is Bioinformatics?
NIH Biomedical Information Science and Technology Initiative (BISTI) working definition (2002): research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

3 Bioinformatics Resources
- The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources (2003 update)
- Nucleic Acids Research Database Issues (published annually in January)
- DBcat: A Catalog of > 500 Biological Databases

4 Molecular Biology Database Collection

5 The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)
An online resource of 386 key databases in 18 categories: Major Sequence Repositories; Comparative Genomics; Gene Expression; Gene Identification and Structure; Genetic and Physical Maps; Genomic Databases; Intermolecular Interactions; Metabolic Pathways and Cellular Regulation; Mutation Databases; Pathology; Protein Sequence Motifs; Proteome Resources; Retrieval Systems and Database Structure; RNA Sequences; Structure; Transgenics; Varied Biomedical Content.

6 Overview
- Protein Sequence Analysis
  - I. Sequence Similarity Search and Alignment
  - II. Family Classification Methods
  - III. Structure Prediction Methods
- Molecular Biology Databases
  - IV. Protein Family Databases
  - V. Databases of Protein Functions
  - VI. Databases of Protein Structures
- Proteomic Resources
  - VII. 2D-Gel Databases
  - VIII. Proteomic Analyses

7 I. Sequence Similarity Search
- Find a protein sequence: text search
- Based on pair-wise comparisons
  - Scoring matrices: BLOSUM, PAM
  - Dynamic programming algorithms
    - Global similarity: Needleman-Wunsch (GAP)
    - Local similarity: Smith-Waterman (BestFit, SSEARCH)
  - Heuristic algorithms (sequence database searching)
    - FASTA: based on k-tuples (2 amino acids)
    - BLAST: triples of conserved amino acids
    - Gapped BLAST: allows gaps in segment pairs (NREF)
    - PHI-BLAST: pattern-hit initiated search (NCBI)
    - PSI-BLAST: iterative search (NCBI)
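
Both BLOSUM and PAM tables hold log-odds scores: how much more often two residues are seen aligned in related sequences than they would pair by chance. The sketch below derives BLOSUM-style scores (twice the base-2 logarithm, i.e. half-bit units as in BLOSUM62) from a hypothetical table of aligned-pair counts over a toy three-letter alphabet. Real BLOSUM matrices start from counts over conserved BLOCKS segments and cluster similar sequences first, which this sketch omits; PAM matrices are built by a different (evolutionary-model) procedure.

```python
import math

def log_odds_scores(pair_counts, scale=2.0):
    """BLOSUM-style log-odds scores from counts of aligned residue pairs.

    pair_counts maps an unordered pair (a, b) to the number of times a was
    seen aligned with b; list each pair once. scale=2 gives half-bit units.
    """
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies
    # Background frequency of each residue: identical pairs count fully,
    # and each mixed pair contributes half its frequency to each residue.
    p = {}
    for (a, b), freq in q.items():
        p[a] = p.get(a, 0.0) + freq / 2
        p[b] = p.get(b, 0.0) + freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Chance expectation: p_a^2 for identical pairs, 2 * p_a * p_b for mixed ones.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = scores[(b, a)] = round(scale * math.log2(freq / expected))
    return scores

# Hypothetical counts for a toy alphabet {A, B, C}.
counts = {("A", "A"): 40, ("A", "B"): 10, ("A", "C"): 5,
          ("B", "B"): 30, ("B", "C"): 5, ("C", "C"): 10}
scores = log_odds_scores(counts)
print(scores[("A", "A")], scores[("C", "C")], scores[("A", "B")])  # 2 4 -4
```

The rare residue C earns the highest self-score because its identities are the least likely to occur by chance, the same effect that gives W and C large diagonal values in BLOSUM62.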

8 Sequence Search by Text or Unique ID
Entrez; PIR text search (n.edu/pirwww/search/textsearch.html)

9 Pair-Wise Comparisons
Scoring matrix; global and local similarity by dynamic programming (Needleman-Wunsch, Smith-Waterman)
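
Both algorithms fill a table whose cell (i, j) holds the best score for aligning the first i residues of one sequence with the first j of the other. They differ only in initialization, in whether negative running scores are reset to zero, and in where the final score is read. A minimal sketch, assuming hypothetical scores (match +2, mismatch -1, linear gap -2) in place of a BLOSUM/PAM matrix and the affine gap penalties real tools use; it returns the score only, without the traceback that recovers the alignment.

```python
def align_score(s, t, match=2, mismatch=-1, gap=-2, local=False):
    """Dynamic-programming alignment score with a linear gap penalty.

    local=False gives the Needleman-Wunsch global score;
    local=True gives the Smith-Waterman local score.
    """
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                # Smith-Waterman: a negative running score is reset to zero,
                # and the answer is the best cell anywhere in the table.
                score = max(score, 0)
                best = max(best, score)
            F[i][j] = score
    # Needleman-Wunsch: the answer is the bottom-right cell.
    return best if local else F[-1][-1]

print(align_score("ACGT", "AGT"))                          # 4 (ACGT vs A-GT)
print(align_score("XXXACGTXXX", "YYACGTYY", local=True))   # 8 (ACGT vs ACGT)
```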

10 FASTA Search (ac.uk/fasta33/)
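
FASTA begins by looking up short exact words (k-tuples, two residues long by default for proteins) shared by the query and a library sequence, and tallies the hits by diagonal; diagonals dense with hits mark regions worth rescoring. A sketch of that lookup step only, with hypothetical function names; FASTA's later stages (rescoring the best diagonal regions with a substitution matrix, joining them, and a final banded Smith-Waterman alignment) are not shown.

```python
from collections import defaultdict

def ktup_diagonals(query, subject, ktup=2):
    """Count exact k-tuple matches per diagonal (subject position - query position)."""
    # Index every k-tuple of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        # .get avoids adding empty entries to the index for unseen words.
        for i in index.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return dict(hits)

hits = ktup_diagonals("HEAGAWGHEE", "PAWHEAE")
print(hits)                      # {-3: 1, 3: 2, -4: 1}
print(max(hits, key=hits.get))   # 3: the diagonal carrying both HE and EA
```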

11 Gapped-BLAST Search
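
BLAST-family searches start from short word hits (triples of amino acids for proteins) and extend each hit into a high-scoring segment pair, abandoning an extension once its score falls too far below the best seen (the X-drop rule). A sketch of that seed-and-extend idea under loud simplifications: seeds here are exact three-letter matches, scores are a hypothetical +1/-1, and `xdrop=2` is arbitrary. Real BLAST also accepts neighborhood words scoring above a threshold under a substitution matrix, and Gapped BLAST goes on to a gapped dynamic-programming extension, which this ungapped sketch stops short of.

```python
def seed_and_extend(query, subject, word=3, match=1, mismatch=-1, xdrop=2):
    """Best ungapped segment pair grown from exact word hits (BLAST-style).

    Returns (score, query_start, subject_start, length) with 0-based starts.
    """
    def pair(a, b):
        return match if a == b else mismatch

    # Index each word of the query by its start positions.
    words = {}
    for i in range(len(query) - word + 1):
        words.setdefault(query[i:i + word], []).append(i)

    best = (0, 0, 0, 0)
    for j in range(len(subject) - word + 1):
        for i in words.get(subject[j:j + word], []):
            seed = sum(pair(query[i + k], subject[j + k]) for k in range(word))
            # Extend right; stop once the score drops more than xdrop below its peak.
            score = peak = seed
            right, k = 0, word
            while i + k < len(query) and j + k < len(subject):
                score += pair(query[i + k], subject[j + k])
                if score > peak:
                    peak, right = score, k - word + 1
                elif peak - score > xdrop:
                    break
                k += 1
            # Extend left from the seed start with the same X-drop rule.
            score = total = peak
            left, k = 0, 1
            while i - k >= 0 and j - k >= 0:
                score += pair(query[i - k], subject[j - k])
                if score > total:
                    total, left = score, k
                elif total - score > xdrop:
                    break
                k += 1
            best = max(best, (total, i - left, j - left, left + word + right))
    return best

print(seed_and_extend("PPTAYIAKQPP", "GGTAYIAKQGG"))  # (7, 2, 2, 7): TAYIAKQ
```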

12 A BLAST Result

13 PSI-BLAST Iterative Search
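
PSI-BLAST repeats a search, builds a position-specific scoring matrix (PSSM) from the significant hits, and searches again with that matrix, so each round can pick up more distant family members. A sketch of the matrix-building step only, assuming gap-free aligned hits, a uniform background and a flat pseudocount of 1; these are simplifications (PSI-BLAST weights sequences and derives its pseudocounts from a substitution matrix), and the toy four-letter alphabet and hits are hypothetical.

```python
import math

def build_pssm(aligned, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Position-specific log-odds scores (bits) from gap-free aligned hits."""
    background = 1.0 / len(alphabet)  # uniform background: a simplification
    pssm = []
    for col in range(len(aligned[0])):
        # Start every residue at the pseudocount so unseen residues are not -infinity.
        counts = {a: pseudocount for a in alphabet}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({a: math.log2((counts[a] / total) / background) for a in alphabet})
    return pssm

def score(pssm, segment):
    """Sum the column scores for a segment as long as the profile."""
    return sum(column[res] for column, res in zip(pssm, segment))

hits = ["ACD", "ACE", "ACD"]          # hypothetical aligned hits
pssm = build_pssm(hits, alphabet="ACDE")
print(round(score(pssm, "ACD"), 2), round(score(pssm, "ACE"), 2))  # 3.16 2.58
```

The third column favors D over E because D occurs in two of the three hits; in a real run this matrix would replace BLOSUM62 for the next search round.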

14 PSI-BLAST

15 II. Family Classification Methods
- Multiple sequence alignment and phylogenetic analysis
  - ClustalW multiple sequence alignment
  - Alignment editors and phylogenetic trees
- Searches based on family information
  - PROSITE pattern search
  - Motif and profile search
  - Hidden Markov models (HMMs)

16 Multiple Sequence Alignment: ClustalW

17 Alignment Editor (Jalview)

18 Alignment Editor (GeneDoc)

19 Phylogenetic Analysis
- Tree programs: genetics.washington.edu/phylip.html
- Tree searches: mbu.iisc.ernet.in/~pali/index.html
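
Tree programs such as PHYLIP turn aligned sequences into trees. A minimal distance-based sketch: compute the p-distance (fraction of differing aligned sites) for every pair of sequences, then cluster with UPGMA, repeatedly merging the two closest clusters and averaging their distances to the rest. UPGMA is a simple stand-in chosen for brevity (it assumes a constant evolutionary rate, unlike neighbor-joining or likelihood methods), and the four aligned sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)

def upgma(names, seqs):
    """Cluster aligned sequences into a rooted tree written as nested tuples."""
    size = {name: 1 for name in names}  # cluster -> number of leaves it contains
    dist = {frozenset((n1, n2)): p_distance(s1, s2)
            for (n1, s1), (n2, s2) in combinations(zip(names, seqs), 2)}
    while len(size) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        na, nb = size.pop(a), size.pop(b)
        merged = (a, b)
        for c in size:
            # Distance to the new cluster: leaf-count-weighted average.
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))

names = ["A", "B", "C", "D"]
seqs = ["MKVLAAGIVG", "MKVLAAGIVA", "MKVIAEGLVG", "MRVIAEGLVG"]  # hypothetical alignment
print(upgma(names, seqs))  # (('A', 'B'), ('C', 'D'))
```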

20 Phylogenetic Trees of the IGFBP Superfamily (radial tree and phylogram)

21 PROSITE Pattern Search
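
PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden ones, `(n)` or `(n,m)` for repeats, and `<` / `>` for N- and C-terminal anchors. A sketch that translates that subset into a Python regular expression and reports every start position, overlapping matches included. The helper names are hypothetical, forms such as `>` inside brackets are not handled, and the test sequence is made up; the example pattern is PROSITE's N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re

ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex string."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                               # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"        # any residue except these
        if low and high:
            residue += "{" + low + "," + high + "}"
        elif low:
            residue += "{" + low + "}"
        regex.append(prefix + residue + suffix)
    return "".join(regex)

def find_sites(pattern, sequence):
    """1-based start positions of all matches; the lookahead allows overlaps."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))           # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "MNGTANPSANVSP"))  # [2]
```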

22 Profile Search

23 Hidden Markov Model Search (-heidelberg.de)
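
HMM search tools score a sequence against a probabilistic model by dynamic programming; the Viterbi algorithm finds the single most probable path of hidden states. A toy sketch, not a Pfam profile HMM (which has match, insert and delete states for every alignment column): two hypothetical states, background `B` and a hydrophobic "domain" state `D`, emit a two-letter alphabet (`h` hydrophobic, `p` polar) with made-up probabilities.

```python
import math

def viterbi(seq, states, start, trans, emit):
    """Most probable hidden-state path for seq, computed in log space."""
    # V[s]: best log-probability of a path that ends in state s at the current symbol.
    V = {s: math.log(start[s] * emit[s][seq[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at symbol t + 1
    for sym in seq[1:]:
        ptr, nxt = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[r] + math.log(trans[r][s]))
            ptr[s] = prev
            nxt[s] = V[prev] + math.log(trans[prev][s] * emit[s][sym])
        back.append(ptr)
        V = nxt
    # Trace back from the best final state.
    last = max(states, key=V.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), V[last]

states = "BD"  # hypothetical: B = background, D = hydrophobic "domain"
start = {"B": 0.9, "D": 0.1}
trans = {"B": {"B": 0.9, "D": 0.1}, "D": {"B": 0.1, "D": 0.9}}
emit = {"B": {"h": 0.3, "p": 0.7}, "D": {"h": 0.8, "p": 0.2}}
path, logp = viterbi("pppphhhhhhpppp", states, start, trans, emit)
print(path)  # BBBBDDDDDDBBBB
```

Six consecutive `h` symbols earn enough emission gain to pay for the two unlikely state switches, so the decoded path marks them as the domain.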

24 III. Structure Prediction Methods
- Signal peptide: SIGFIND, SignalP
- Transmembrane helix: TMHMM, TMAP
- 2D (secondary structure) prediction (α-helix, β-sheet, coiled coils): PHD, JPred
- 3D modeling: homology modeling (Modeller, SWISS-MODEL), threading, ab initio prediction

25 Structure Prediction: A Guide (heidelberg.de/gtsp/flowchart2.html)

26 Protein Prediction Server (dtu.dk/services/)

27 Signal Peptide Prediction (k/services/SignalP-2.0)

28 Transmembrane Helix
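
A membrane-spanning helix needs a run of roughly 20 hydrophobic residues, so a sliding-window average of residue hydropathy gives a first rough screen. A sketch of the classic Kyte-Doolittle approach (their published scale, a 19-residue window, and the average-above-1.6 cutoff they suggested for membrane-spanning segments); it is a simpler, swapped-in method, not how TMHMM (a hidden Markov model) or TMAP work. Residues outside the 20 standard amino acids raise a `KeyError`, and the test sequence is made up.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(seq, window=19):
    """Mean Kyte-Doolittle hydropathy of each window, keyed by 1-based start."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return {i + 1: sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)}

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge windows whose mean hydropathy exceeds threshold into 1-based spans."""
    spans = []
    for start, mean in hydropathy_windows(seq, window).items():
        if mean <= threshold:
            continue
        end = start + window - 1
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # overlaps the previous span: extend it
        else:
            spans.append((start, end))
    return spans

print(candidate_tm_segments("K" * 10 + "L" * 20 + "K" * 10))  # [(6, 35)]
```

The reported span runs a few residues past the hydrophobic core into the flanks, a known coarseness of window averaging that model-based predictors reduce.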

29 Protein Structure Prediction (biotools/biotools9.html)

30 Structure Prediction Server (dee.ac.uk/WWW_Servers/JPred/jpred.html)

31 3D Modelling (ch/swissmod/SWISS-MODEL.html)

32 IV. Protein Family Databases
- Whole proteins
  - PIR: Superfamilies and Families
  - COG (Clusters of Orthologous Groups) of Complete Genomes
  - ProtoNet: Automated Hierarchical Classification of Proteins
- Protein domains
  - Pfam: Alignments and HMM Models of Protein Domains
  - SMART: Protein Domain Families
- Protein motifs
  - PROSITE: Protein Patterns and Profiles
  - BLOCKS: Protein Sequence Motifs and Alignments
  - PRINTS: Protein Sequence Motifs and Signatures
- Integrated family databases
  - iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  - InterPro: integrates Pfam, PRINTS, PROSITE, ProDom, SMART

33 Protein Clustering

34 Protein Domains
Pfam; SMART (smart.embl-heidelberg.de/smart/show_motifs.pl)

35 Protein Motifs
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.

36 Integrated Family Classification
InterPro: an integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs and PIRSF.

37 V. Databases of Protein Functions
- Metabolic pathways, enzymes, and compounds
  - Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  - KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  - LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  - EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  - MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  - WIT: Functional Curation and Metabolic Models
  - BRENDA: Enzyme Database
  - UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
  - Klotho: Collection and Categorization of Biological Compounds
- Cellular regulation and gene networks
  - EpoDB: Genes Expressed during Human Erythropoiesis
  - BIND: Descriptions of interactions, molecular complexes and pathways
  - DIP: Catalogs experimentally determined interactions between proteins
  - RegulonDB: Escherichia coli Pathways and Regulation

38 KEGG Metabolic & Regulatory Pathways (bin/show_pathway?hsa)
KEGG is a suite of databases and associated software that integrates current knowledge of molecular interaction networks, genes and proteins, and chemical compounds and reactions.

39 BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases.

40 Protein-Protein Interactions: DIP

41 Protein-Protein Interactions: BIND

42 BioCarta Cellular Pathways

43 VI. Databases of Protein Structures
- Protein structure and classification
  - PDB: Structures Determined by X-ray Crystallography and NMR
  - CATH: Hierarchical Classification of Protein Domain Structures
  - SCOP: Familial and Structural Protein Relationships
  - FSSP: Protein Fold Family Database
- Protein sequence-structure relationship
  - PIR-NRL3D: Protein Sequence-Structure Database
  - PIR-RESID: Protein Structure/Post-Translational Modifications
  - HSSP: Families and Alignments of Structurally-Conserved Regions

44 PDB Structure Data

45 PDBsum: Summary and Analysis (ac.uk/bsm/pdbsum)

46 Protein Structural Classification
CATH: hierarchical domain classification of protein structures (ucl.ac.uk/bsm/cath_new/)

47 Protein Structural Classification: SCOP (cam.ac.uk/scop/)
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.

48 VII. Proteomic Resources
- GELBANK: 2D-gel patterns from completed genomes
- SWISS-2DPAGE
- PEP: Predictions for Entire Proteomes (pep/): summarized analyses of protein sequences
- Proteome BioKnowledge Library: detailed information on human, mouse and rat proteomes
- Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
- Expression profiling databases:
  - GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
  - SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
  - EBI Microarray Informatics (index.html): managing, storing and analyzing microarray data

49 2D-Gel Image Databases (1)

50 2D-Gel Image Databases (2) (bin/nice2dpage.pl?P06493)

51 VIII. Proteome Analysis

52 Expression Profiling: Human and Mouse Transcriptome (stanford.edu/serum/)

53 Lab
Visit selected websites and analyze some protein sequences of your own choice.
- A list of the bioinformatics resources in this tutorial is available at:
- Try some of the following sequences for analysis:
  1) Well-characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1)
  2) Less-characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH)
  3) Hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7