Archives and Information Retrieval

Archives and Information Retrieval
Lecture by: Ms. AQSAD RASHDA
BIOINFORMATICS

Database indexing and specification of search terms
- An index is a set of pointers to information in a database
- Search terms
- Entries: discrete, coherent parcels of information
- The information retrieval software
- Keywords, combined with AND, NOT
- Follow-up questions
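The idea of an index as a set of pointers, queried with AND and NOT, can be sketched in a few lines of Python. This is only an illustration: the entries, their ids, and the function names `build_index` and `search` are made up for the example and do not describe any real retrieval software.

```python
from collections import defaultdict

# Hypothetical toy entries: entry id -> free-text description.
ENTRIES = {
    "E1": "bovine pancreatic trypsin inhibitor gene",
    "E2": "human trypsin precursor",
    "E3": "bovine insulin gene",
}


def build_index(entries):
    """Map each keyword to the set of entry ids (the 'pointers') containing it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index


def search(index, all_of=(), none_of=()):
    """Return ids matching every term in all_of (AND) and no term in none_of (NOT)."""
    if not all_of:
        return set()
    # AND: intersect the pointer sets of all required keywords.
    hits = set.intersection(*(index.get(t.lower(), set()) for t in all_of))
    # NOT: drop every entry that contains an excluded keyword.
    for term in none_of:
        hits -= index.get(term.lower(), set())
    return hits


INDEX = build_index(ENTRIES)
```

A follow-up question then amounts to another call on the same index, for example `search(INDEX, all_of=["trypsin"], none_of=["human"])`.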

The archives
Primary data collections related to biological macromolecules include:
- Nucleic acid sequences, including whole-genome projects
- Amino acid sequences of proteins
- Protein and nucleic acid structures
- Small-molecule crystal structures
- Protein functions
- Expression patterns of genes
- Publications

Nucleic acid sequence databases
Triple partnership of:
- National Center for Biotechnology Information (USA)
- EMBL Data Library (European Bioinformatics Institute, UK)
- DNA Data Bank of Japan (National Institute of Genetics, Japan)
Entries have a life cycle

Nucleotide sequence databases
EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases:
- EMBL: www.ebi.ac.uk/embl/
- GenBank: www.ncbi.nlm.nih.gov/Genbank/
- DDBJ: www.ddbj.nig.ac.jp

The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene
Line types labelled in the entry:
- Identification
- Accession
- Date
- Description
- Keyword
- Organism Source
- Organism Classification
- Reference Number

Line types labelled in the entry (continued):
- Reference Position
- Reference Author
- Reference Title
- Reference Location
- Feature Table Header

FT: The feature table may indicate regions that
1. perform or affect function
2. interact with other molecules
3. affect replication
4. are involved in recombination
5. are a repeated unit
6. have secondary or tertiary structure
7. are revised or corrected
Sequence Header
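The line types labelled on the EMBL entry correspond to two-letter line codes at the start of each line of the flat file (ID, AC, DT, DE, KW, OS, OC, RN, RP, RA, RT, RL, FH, FT, SQ). A minimal sketch of grouping an entry by these codes follows. The sample text is an invented placeholder, not the real bovine pancreatic trypsin inhibitor entry, and `group_by_line_code` is a hypothetical helper name.

```python
from collections import defaultdict

# Two-letter EMBL line codes and the labels used on the slides.
LINE_CODES = {
    "ID": "Identification",
    "AC": "Accession",
    "DT": "Date",
    "DE": "Description",
    "KW": "Keyword",
    "OS": "Organism Source",
    "OC": "Organism Classification",
    "RN": "Reference Number",
    "RP": "Reference Position",
    "RA": "Reference Author",
    "RT": "Reference Title",
    "RL": "Reference Location",
    "FH": "Feature Table Header",
    "FT": "Feature Table",
    "SQ": "Sequence Header",
}

# Invented placeholder entry, for illustration only.
SAMPLE_ENTRY = """ID   EXAMPLE1; hypothetical entry
AC   X00000;
DE   Example gene for a trypsin inhibitor
KW   inhibitor; example.
OS   Bos taurus
FT   CDS             10..30
SQ   Sequence 30 BP;
//"""


def group_by_line_code(text):
    """Collect the content of each recognised line type, in file order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, content = line[:2], line[2:].strip()
        if code in LINE_CODES and content:
            fields[LINE_CODES[code]].append(content)
    return dict(fields)


FIELDS = group_by_line_code(SAMPLE_ENTRY)
```

Lines with unrecognised codes, such as the `//` terminator, are skipped rather than treated as errors, which keeps the sketch tolerant of line types it does not list.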

Protein sequence databases
- SWISS-PROT: The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT.
- PIR: Another protein sequence database is produced by the PIR International

http://pir.georgetown.edu/

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROT: ENZYME DB and PROSITE
The ENZYME DB stores the following information about enzymes:
- EC Number: a numerical identifier assigned by the Enzyme Commission (authorized by the International Union of Biochemistry and Molecular Biology; see http://www.chem.qmw.ac.uk/iubmb/enzyme/)
- Recommended name
- Alternative names, if any
- Catalytic activity
- Cofactors, if any
- Pointers to SWISS-PROT and other data banks
- Pointers to diseases associated with enzyme deficiency, if any are known
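The fields listed for an ENZYME DB entry map naturally onto a small record type. The sketch below is an assumption about how one might hold such an entry in memory, not the database's own schema: the class name `EnzymeEntry` and its attribute names are hypothetical. The one structural fact it relies on is that an EC number consists of four dot-separated fields; the example uses trypsin, EC 3.4.21.4, and leaves the other fields empty.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EnzymeEntry:
    """Hypothetical in-memory record mirroring the fields listed on the slide."""

    ec_number: str
    recommended_name: str
    alternative_names: List[str] = field(default_factory=list)
    catalytic_activity: str = ""
    cofactors: List[str] = field(default_factory=list)
    swiss_prot_pointers: List[str] = field(default_factory=list)
    disease_pointers: List[str] = field(default_factory=list)

    def ec_levels(self) -> Tuple[str, str, str, str]:
        """Split the EC number into its four dot-separated fields."""
        parts = self.ec_number.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four EC fields, got {self.ec_number!r}")
        return (parts[0], parts[1], parts[2], parts[3])


# Trypsin is EC 3.4.21.4; the remaining fields are left empty on purpose.
TRYPSIN = EnzymeEntry(ec_number="3.4.21.4", recommended_name="Trypsin")
```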

A Sample Entry in ENZYME DB

The PIR and associated databases
The PIR maintains several databases about proteins:
1. PIR-PSD: the main protein sequence database
2. iProClass: classification of proteins according to structure and function
3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
4. PIR-NREF: a comprehensive non-redundant collection of over 800 000 protein sequences merged from all available sources
5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
6. ALN: a database of protein sequence alignments
7. RESID: a database of covalent protein structure modifications (recall that important structural features of proteins, such as disulphide bridges, are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)

Databases of structures
Structure databases archive, annotate and distribute sets of atomic coordinates.
Protein Data Bank (PDB). The information contained includes:
- What protein is the subject of the entry, and what species it came from
- Who solved the structure, and references to publications describing the structure determination
- Experimental details about the structure determination, including information related to the general quality of the result, such as the resolution of an X-ray structure determination and stereochemical statistics
- The amino acid sequence
- What additional molecules appear in the structure, including cofactors, inhibitors, and water molecules
- Assignments of secondary structure: helix, sheet
- Disulphide bridges
- The atomic coordinates
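The atomic coordinates in a PDB-format file are stored in fixed-column ATOM records, so they can be read by slicing each line at known column positions. The sketch below builds one invented ATOM line piece by piece, to make the columns visible, and parses it. The coordinate values are placeholders and are not taken from entry 2TRX or any other real structure. `parse_atom_line` is a hypothetical helper that reads only a subset of the columns.

```python
# One invented ATOM record, assembled column by column (1-based columns).
SAMPLE_ATOM = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  atom serial number
    " "           # 12    blank
    " N  "        # 13-16 atom name
    " "           # 17    alternate location indicator
    "SER"         # 18-20 residue name
    " "           # 21    blank
    "A"           # 22    chain identifier
    "   1"        # 23-26 residue sequence number
    " "           # 27    insertion code
    "   "         # 28-30 blank
    "  11.000"    # 31-38 x coordinate (angstroms)
    "  22.500"    # 39-46 y coordinate
    "  -3.250"    # 47-54 z coordinate
    "  1.00"      # 55-60 occupancy
    " 20.00"      # 61-66 temperature factor
)


def parse_atom_line(line):
    """Extract identifying fields and coordinates from one ATOM record."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


ATOM = parse_atom_line(SAMPLE_ATOM)
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields in real files can touch without an intervening space.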

Protein data bank entry 2TRX, E. coli thioredoxin

Indicators of structure quality
- X-ray crystal structure analysis
- Nuclear Magnetic Resonance
Web Resource: Protein and Nucleic Acid Structures
- Home page of Protein Data Bank: http://www.rcsb.org
- Home page of EBI macromolecular structure database: http://msd.ebi.ac.uk/
- Home page of BioMagResBank: http://www.bmrb.wisc.edu/
- Searching the Protein Data Bank:
- Home page of SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop/
- List of browsers: http://pdb-browsers.ebi.ac.uk/browse_it.shtml
- OCA: http://oca.ebi.ac.uk/oca-bin/ocamain

Crystallisation
Hanging drop method / vapour diffusion method
Figure labels: microscope; microscope slide; 1 - dilute protein solution; 2 - concentrated salt solution; crystal
Many different conditions of 1 and 2 must be tried
Slide courtesy of Shoshana Wodak

Determination of protein structure
Figure labels: diffraction pattern; atomic model
Slide courtesy of Shoshana Wodak

The resolution problem
A high-resolution protein structure: 1.5 - 2.0 Å resolution
Slide courtesy of Shoshana Wodak

Nuclear Magnetic Resonance (NMR)
Source: Branden & Tooze (1991)

Classifications of protein structures
- SCOP: Structural Classification of Proteins
- CATH: Class/Architecture/Topology/Homology
- DALI: Classification of protein domains, based on extraction of similar structures from distance matrices [http://www.ebi.ac.uk/dali/domain/]
- CE: A database of structural alignments

CATH - A protein domain classification
In CATH, protein domains are classified according to a tree with four hierarchical levels:
- Class
- Architecture
- Topology
- Homology
Figure from Shoshana Wodak
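The four-level CATH tree can be illustrated with a short Python sketch. It assumes that a domain's position in the tree is written as four dot-separated integers, one per level. The helper names `parse_cath` and `deepest_shared_level` are hypothetical, and the codes used are examples of the notation only.

```python
from typing import NamedTuple, Optional


class CathNode(NamedTuple):
    """Position of a domain in the four-level CATH tree."""

    klass: int
    architecture: int
    topology: int
    homology: int


LEVELS = ("Class", "Architecture", "Topology", "Homology")


def parse_cath(code: str) -> CathNode:
    """Parse a dotted code such as '3.40.50.720' into its four levels."""
    parts = [int(p) for p in code.split(".")]
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {code!r}")
    return CathNode(*parts)


def deepest_shared_level(a: CathNode, b: CathNode) -> Optional[str]:
    """Name the deepest level at which two domains still share a tree node."""
    shared = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = name
    return shared


A = parse_cath("3.40.50.720")
B = parse_cath("3.40.50.300")
C = parse_cath("1.10.8.10")
```

Because the levels are nested, comparison stops at the first mismatch: two domains that differ in Class share nothing further down the tree, whatever their later numbers are.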

Specialized, or 'boutique' databases
VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses.
In the field of immunology: IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in Immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species. The IMGT server provides common access to all immunogenetics data. At present, it includes two databases:
- IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences
- IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens)

Web Resource: Databases for Specific Protein Families
- Protein kinases: http://www.sdsc.edu/kinases/
- HIV proteases: http://www-fbsc.ncifcrf.gov/HIVdb/
- Icosahedral viruses: http://mmtsb.scripps.edu/viper/main.html
- Immunology:
  - IMGT: http://imgt.cines.fr
  - KABAT: http://immuno.bme.nwu.edu/
  - MHCPEP: http://wehih.wehi.edu.au/mhcpep/
- Collections of links to databases on specific protein families: http://www2.ebi.ac.uk/msd/Links/family.shtml
KABAT - Database of Sequences of Proteins of Immunological Interest - Northwestern University (USA)
MHCPEP - Major Histocompatibility Complex Binding Peptides Database - Walter and Eliza Hall Institute (Melbourne, Australia)

Expression and proteomics databases
Expression databases record measurements of mRNA levels, usually via ESTs (expressed sequence tags: short terminal sequences of cDNA synthesized from mRNA).
Comparisons of expression patterns give clues to:
(1) the function and mechanism of action of gene products,
(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions,
(3) the variations in mobilization of genes at different stages of the cell cycle, or of the development of an organism,
(4) mechanisms of antibiotic resistance in bacteria, and the consequent suggestion of targets for drug development,
(5) the response to challenge by a parasite,
(6) the response to medications of different types and dosages, to guide effective therapy.
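One common way to compare expression patterns between two conditions is a log2 fold change per gene. The slides do not prescribe a method, so the sketch below is an assumption chosen for illustration: the gene names and expression levels are invented toy values, and the pseudocount of 1 is an arbitrary choice to avoid division by zero.

```python
import math

# Invented toy expression levels (arbitrary units) under two conditions.
AEROBIC = {"geneA": 7.0, "geneB": 15.0, "geneC": 31.0}
ANAEROBIC = {"geneA": 31.0, "geneB": 15.0, "geneC": 7.0}


def log2_fold_change(level_a, level_b, pseudocount=1.0):
    """log2 of the ratio b/a, with a pseudocount added to both levels."""
    return math.log2((level_b + pseudocount) / (level_a + pseudocount))


# Positive values: higher under anaerobic conditions; negative: lower.
CHANGES = {
    gene: log2_fold_change(AEROBIC[gene], ANAEROBIC[gene])
    for gene in AEROBIC
}

UP_REGULATED = sorted(g for g, change in CHANGES.items() if change >= 1.0)
```

Working on a log scale makes a fourfold increase and a fourfold decrease symmetric (+2 and -2), which is convenient when ranking genes by the size of their response.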