Tutorial: Bioinformatics Resources (http://pir. georgetown

Presentation transcript:

Tutorial: Bioinformatics Resources (http://pir. georgetown Bio-Trac 25 (Proteomics: Principles and Methods) March 25, 2005 Zhang-Zhi Hu, M.D. Senior Bioinformatics Scientist Protein Information Resource National Biomedical Research Foundation, GUMC

What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Molecular Biology Database Collection (http://nar. oupjournals -- 719 key databases of 14 categories

Database Collection in Nucleic Acids Res.

http://pir.georgetown.edu/~huz/class/2005_database_update.html

Overview
Database Contents, Search and Retrieval
  Text search / Information retrieval
  Sequence & genomics databases
  Protein family databases
  Databases of protein functions
  Databases of protein structures
  Proteomics databases

Entrez Text Searches (http://www.ncbi.nlm.nih.gov/Entrez/)
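Entrez text searches can also be issued programmatically through NCBI's E-utilities. The following is a minimal sketch, assuming the public `esearch.fcgi` endpoint and its `db`, `term`, and `retmax` parameters. The helper name `build_esearch_url` and the example query are illustrative only, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service (assumed here).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for a text query against one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the query term.
    return ESEARCH + "?" + urlencode(params)


# Illustrative query: search PubMed for a crystallin-related term.
url = build_esearch_url("pubmed", "alpha crystallin")
print(url)
```

Changing `db` to another Entrez database name (for example `gene` or `protein`) reuses the same query against a different collection.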

PubMed Literature Database (http://www. ncbi. nlm. nih

UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch)

PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html)
What is the difference between CRAA_RABIT and CYRBAA? Try the search: Crystallin and SuperFamily.

PIR Text Search (II)
Using PIR text search, can you find which crystallin has a determined 3D structure?

I. Sequence & Genomics Databases
  GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
  RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
  UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
  Entrez Gene: Gene-centered information at NCBI.
  UniGene: Unified clusters of ESTs and full-length mRNA sequences.
  OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
  Model Organism Genome Databases: MGD, RGD, SGD, Flybase…
  GeneCards: Integrated database of human genes, maps, proteins and diseases.
  SNP Consortium Database

UniProt Consortium Database UniProtKB (knowledgebase) UniRef (100,90,50) UniParc (archive) (http://www.uniprot.org)

UniProt Sequence Report (I) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRAA_RABIT)

UniProt Sequence Report (II) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
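The report ID `UniRef90_P02489` combines the UniRef identity level (100, 90, or 50, as listed on the UniProt Consortium slide) with an identifier for the cluster. A minimal sketch of splitting such an ID, assuming the part after the underscore is the identifier of the cluster's representative entry; the function name `parse_uniref_id` is hypothetical.

```python
import re

# UniRef cluster IDs: "UniRef" + identity level + "_" + member identifier.
UNIREF_ID = re.compile(r"^UniRef(100|90|50)_(\S+)$")


def parse_uniref_id(cluster_id):
    """Split a UniRef cluster ID into (identity level, member identifier)."""
    match = UNIREF_ID.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    return int(match.group(1)), match.group(2)


level, member = parse_uniref_id("UniRef90_P02489")
print(level, member)
```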

Entrez Gene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq

OMIM: Online Mendelian inheritance in man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

II. Protein Family Databases
Whole Proteins
  PIRSF: A Network Classification System of Protein Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families
  CDD: Conserved Domain Database
Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures
Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily

Protein Clustering COGs: (http://www.ncbi.nlm.nih.gov/COG/)

KOGs: Eukaryotic Clusters (http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)

Domain Classification (http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRAA_RABIT) (http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRAA_RABIT)

Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

Integrated Family Classification
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

PIRSF: Full Length Classification iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

Protein Motifs PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://us.expasy.org/prosite/)

III. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  WIT: Functional Curation and Metabolic Models
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  BioCarta: Biological pathways of human and mouse
  GO: Gene Ontology Consortium Database
MetaCyc is a metabolic-pathway database. The database describes pathways, reactions, and enzymes of a variety of organisms, with a microbial focus.

KEGG Metabolic & Regulatory Pathways KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html) (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
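The second KEGG URL packs a pathway identifier and an EC number into one query string (`hsa00220+4.3.2.1`). A minimal sketch of pulling it apart, assuming the identifier is an organism code followed by a five-digit map number and that `+` separates the objects to highlight on the map. The helper names are hypothetical, and the class table covers only the six top-level EC classes in use when this tutorial was given.

```python
# Top-level classes of the Enzyme Classification (first digit of an EC number).
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_kegg_query(query):
    """Split 'hsa00220+4.3.2.1' into (organism code, map number, highlights)."""
    pathway, *highlights = query.split("+")
    # Assumption: the last five characters are the numeric map identifier.
    organism, map_number = pathway[:-5], pathway[-5:]
    return organism, map_number, highlights


def ec_class(ec_number):
    """Return the top-level enzyme class name for an EC number string."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


organism, map_number, highlights = parse_kegg_query("hsa00220+4.3.2.1")
print(organism, map_number, highlights, ec_class(highlights[0]))
```

The class returned for 4.3.2.1 agrees with the enzyme name used in the lab exercise, argininosuccinate lyase.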

BioCyc (EcoCyc/MetaCyc Metabolic Pathways) The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

Protein-Protein Interaction: BIND (http://www.bind.ca/)

Gene Ontology (http://www.geneontology.org/) Three GOs: Molecular Function Biological Process Cellular Component
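The three ontologies can be treated as separate namespaces when sorting a protein's annotations. A minimal sketch, assuming the one-letter aspect codes F, P, and C used in GO annotation files; the sample input contains only the three root terms, and the function name `group_by_aspect` is hypothetical.

```python
# Aspect code -> (ontology namespace, root term ID of that ontology).
GO_ASPECTS = {
    "F": ("molecular_function", "GO:0003674"),
    "P": ("biological_process", "GO:0008150"),
    "C": ("cellular_component", "GO:0005575"),
}


def group_by_aspect(annotations):
    """Group (go_id, aspect_code) pairs by ontology namespace."""
    grouped = {namespace: [] for namespace, _root in GO_ASPECTS.values()}
    for go_id, aspect in annotations:
        namespace, _root = GO_ASPECTS[aspect]
        grouped[namespace].append(go_id)
    return grouped


# Illustrative input: the root term of each ontology with its aspect code.
sample = [("GO:0008150", "P"), ("GO:0003674", "F"), ("GO:0005575", "C")]
print(group_by_aspect(sample))
```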

IV. Databases of Protein Structures
  PDB: Structure Determined by X-ray Crystallography and NMR
  PDBsum: Summaries and analyses of PDB structures
  MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
  SWISS-MODEL Repository: Database of annotated protein 3D models
  ModBase: Annotated comparative protein structure models
Structure Classification
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Classification Based on Structure-Structure Alignment

PDB 3D Structure Rat gamma-crystallin, chain A, B. Can you do a text search at PIR to find this? (http://www.rcsb.org/pdb/)

PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)

Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

Protein Structural Classification (2) SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known. (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

SWISS-MODEL Repository A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

V. Proteomic Resources
  GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes
  SWISS-2DPAGE (http://www.expasy.org/ch2d/)
  PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences
  Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
  Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
  Expression Profiling databases:
    GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
    SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
    EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html): managing, storing and analyzing microarray data

2D-Gel Image Databases (1) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)

2D-Gel Image Databases (2) (http://gelbank.anl.gov/2dgels/index.asp)

Expression Profiling (http://genome-www.stanford.edu/serum/)
Human and Mouse Transcriptome (http://expression.gnf.org/cgi-bin/index.cgi)

Lab: Choose additional protein IDs to browse the variety of molecular biology databases that each sequence report links to.
  Delta crystallin II (Argininosuccinate lyase) (UniProt: CRD2_ANAPL)
  Alpha crystallin (UniProt: CRAA_RABIT)
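The lab IDs follow the UniProt entry-name pattern of a protein mnemonic and a species mnemonic joined by an underscore. A small sketch for working through a list of IDs, assuming that pattern and reusing the sequence-report URL form from the UniProt Sequence Report slide, which may no longer resolve. The function names are hypothetical.

```python
# Sequence-report URL form taken from the UniProt Sequence Report slide.
REPORT_URL = "http://www.pir.uniprot.org/cgi-bin/unipEntry?id={}"


def split_entry_name(entry_name):
    """Split an entry name such as 'CRAA_RABIT' into (protein, species)."""
    protein, sep, species = entry_name.partition("_")
    if not (protein and sep and species):
        raise ValueError(f"not a UniProt entry name: {entry_name!r}")
    return protein, species


def report_url(entry_name):
    """Build the sequence-report URL for a validated entry name."""
    split_entry_name(entry_name)  # raises ValueError on malformed names
    return REPORT_URL.format(entry_name)


for name in ("CRD2_ANAPL", "CRAA_RABIT"):
    print(split_entry_name(name), report_url(name))
```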