Macromolecular Structure Database Project (EMSD): Infrastructure Services for Europe
To develop an autonomous structural database capability in Europe



[Diagram: EBI-MSD projects, collaborators and funding. Labels: Temblor, Spine, Autostruct, NMRQual, CCPN, EHTPX, CCP4, IIMS, E-science; Oxford, York, Utrecht, Cambridge, Daresbury, Sanger Inst, CLRC; EMBL, Wellcome Trust, BBSRC, MRC, EU; USA data exchange with BMRB and RCSB; integration with SCOP, CATH and Pfam; harvesting, validation, advanced search, structural genomics, electron microscopy. Legend: grant & co-ordinator, grant funding, core funding, data exchange.]

E-MSD provides
- clean biological data
- integrated data
- a single web access point
- query interfaces for different users
- interconnected views of the data relating structure, sequence, text and experimental details

[Diagram: web interface for the biologist, chemist, structural biologist and teacher. Inputs: keyword search, or a query by ligand, active site, structure or sequence. Outputs: sorted hit list; atlas page (structure, sequence, active site, experimental data); query results and interactive viewer. Data: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB. Methods: SSM, FastA.]

Web services
- Data APIs
- Methods, offered as web services
[Diagram labels. Data: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB. Methods: SSM, FastA.]

Web-based pages
- Search interfaces
- Interactive visualisation

DATA INTEGRATION

A database for all? MSD SEARCH DATABASE

Data integration
- We want to include all types of biological data:
  - structure, sequence, textual
  - observed biochemistry (BRENDA)
  - sequence annotation (PRINTS)
  - DNA: ORFs, SNPs
- But we can't do everything! So can the Grid allow the integration of data from other sources?
[Diagram labels: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB.]

Problems for Grid (1: Provenance)
- We are a funded institute. We have to be seen to be useful or we do not get funded!
- Industry needs to be seen too, by its shareholders.
- Origin of the distributed information: the user and the funding body need to see who provided the information. How do we retain and present the detail of this?
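One way to picture the provenance requirement is a record that carries its source with it through integration. This is a minimal sketch, not how MSD or any Grid middleware does it: the `Annotation` class, the `integrate` and `credits` helpers, the entry identifier and the values are all hypothetical; only the resource names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    entry_id: str  # hypothetical identifier of the structure described
    field: str     # what is being annotated
    value: str     # the annotation itself (made-up values below)
    source: str    # resource that provided the value


def integrate(*batches):
    """Merge batches into {entry_id: [Annotation, ...]}, keeping every source."""
    merged = {}
    for batch in batches:
        for ann in batch:
            merged.setdefault(ann.entry_id, []).append(ann)
    return merged


def credits(merged, entry_id):
    """List, in alphabetical order, who provided information for an entry."""
    return sorted({a.source for a in merged.get(entry_id, [])})


# Hypothetical records from three of the resources named on the slides.
structure = [Annotation("entry-1", "coordinates", "example value", "PDB")]
sequence = [Annotation("entry-1", "sequence", "example value", "SwissProt")]
text = [Annotation("entry-1", "citation", "example value", "Medline")]

merged = integrate(structure, sequence, text)
print(credits(merged, "entry-1"))
```

Because the source travels with each value rather than being dropped at merge time, a results page can show the user and the funding body who provided each item.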

Problems for Grid (2)
- We do not know "best practice" in much of biology:
  - Methods: structure alignment, secondary structure...
  - Data: multiple coordinates, multiple sequence data...
- There will be conflicting information. Data and methods have associated validity information, and the different data or methods may be inconsistent only in part.
- How is conflicting information going to be presented to, and filtered for, a user?
- Who is going to assign data validity?
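The "inconsistent only in part" point can be illustrated by comparing two methods residue by residue and flagging only the positions where they disagree. This is a sketch under stated assumptions: the `Assertion` class, the method names, the labels and the 0 to 1 validity scores are invented for illustration and do not come from any real resource.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    residue: int
    value: str       # e.g. a secondary-structure label
    method: str      # hypothetical method name
    validity: float  # hypothetical score between 0 and 1


def summarise(assertions, min_validity=0.0):
    """Group assertions per residue, rank by validity, flag disagreements."""
    by_residue = defaultdict(list)
    for a in assertions:
        if a.validity >= min_validity:
            by_residue[a.residue].append(a)
    report = {}
    for residue, items in by_residue.items():
        ranked = sorted(items, key=lambda a: a.validity, reverse=True)
        report[residue] = {
            "conflict": len({a.value for a in ranked}) > 1,
            "ranked": ranked,
        }
    return report


assertions = [
    Assertion(10, "helix", "method_a", 0.9),
    Assertion(10, "helix", "method_b", 0.8),
    Assertion(11, "helix", "method_a", 0.9),
    Assertion(11, "strand", "method_b", 0.4),
]

report = summarise(assertions)
filtered = summarise(assertions, min_validity=0.5)
```

Here the two methods agree at residue 10 and disagree at residue 11. Raising `min_validity` filters the weaker claim away, which shows why the question of who assigns validity matters: the threshold decides what the user sees.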

Problems for Grid (3: Data access control)
- Bioinformatics is fashionable at the moment. There is a "problem" when something is perceived to be useful. For example, there are about 60,000 patents in the US for the ~30,000 human genes: not a problem yet, but...
- This is more than data security:
  - Will Grid employ some good lawyers?
  - Will Grid hide information on request? (cf. the PDB, which has a "hold" status)
  - Will Grid "modify" information on request? (cf. Google, whose search result order has been "updated")
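The "hold" case can be sketched as a visibility filter applied before any result is shown. This is an illustration only: the `Entry` class, the status strings, the `hold_until` field and the dates are assumptions made for the example, not the PDB's actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entry_id: str                      # hypothetical identifier
    status: str                        # "released" or "hold" (assumed values)
    hold_until: Optional[date] = None  # assumed optional release date


def is_visible(entry, today):
    """Released entries are visible; held entries only once the hold expires."""
    if entry.status == "released":
        return True
    if entry.status == "hold" and entry.hold_until is not None:
        return today >= entry.hold_until
    return False


def visible_ids(entries, today):
    return [e.entry_id for e in entries if is_visible(e, today)]


entries = [
    Entry("entry-1", "released"),
    Entry("entry-2", "hold"),
    Entry("entry-3", "hold", hold_until=date(2003, 1, 1)),
]

print(visible_ids(entries, date(2002, 6, 1)))
print(visible_ids(entries, date(2003, 6, 1)))
```

In a distributed setting every node that caches or forwards an entry would have to apply the same rule, which is part of why this is more than a data security question.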

Summary
- We want to be able to provide a scientific service: web pages and web services.
- We would like to be able to expand the results to include information from other data resources.
- The 3 issues above are only a few of many, but they represent fundamental problems.

CLEAN DATA: Quaternary structure
[Diagram: from the X-ray experiment to the biology. Labels: atoms, residues, chains, sub-assembly, assembly.]

CLEAN DATA: Example of experimental result
Authors would know the structure; we have to derive it at submission.
M. Bochtler et al., Nature 403, 800 (2000)
[Figure: asymmetric unit]

The asymmetric unit contains 3 separate molecules: 2 copies of a dodecamer and 1 hexamer.
[Figure labels: hexamer, dodecamer, assembly]
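The assembly, chain, residue and atom hierarchy from the slides can be sketched as nested records, using the example above as data. This is a minimal sketch: the class names and chain identifiers are hypothetical, and it assumes one chain per subunit, which the slides do not state.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    chain_id: str
    residues: list = field(default_factory=list)  # residues would hold atoms


@dataclass
class Assembly:
    name: str
    chains: list


def build_assembly(name, n_subunits):
    """Create an assembly with one placeholder chain per subunit (assumption)."""
    chains = [Chain(f"{name}/{i + 1}") for i in range(n_subunits)]
    return Assembly(name, chains)


# The asymmetric unit of the example: 2 dodecamers and 1 hexamer.
asymmetric_unit = [
    build_assembly("dodecamer-1", 12),
    build_assembly("dodecamer-2", 12),
    build_assembly("hexamer", 6),
]

total_chains = sum(len(a.chains) for a in asymmetric_unit)
print(len(asymmetric_unit), "assemblies,", total_chains, "chains")
```

Under the one-chain-per-subunit assumption the deposited coordinates hold 2 x 12 + 6 = 30 chains, yet nothing in a flat chain list says which chains belong together. That grouping is the information that has to be derived at submission.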

RESOLUTION SLIDING SCALE FOR RULES (clean data)
[Figure: electron density for phenylalanine at different resolutions]
The ring is correctly placed into the 1.2 Å data. This can still be done with confidence in the 2 Å case. But at 3 Å we already observe a deviation of the centroid of the ring from the correct model.

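Zscore = (Fit - mean(Fit)) / sigma
Here mean(Fit) and sigma are taken over residues of the same type in structures of similar resolution. A large positive spike indicates a residue that is worse than the average for that residue type in structures of similar resolution.
[Plot labels: 1qi3, 1rmg, 1f83; Good, Terrible]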
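The Z-score above can be worked through on a few numbers. This is a sketch under assumptions: the fit values are made up, a larger fit value is taken to mean a worse fit, the reference set stands in for "same residue type, similar resolution", and the threshold of 2 for calling a spike is arbitrary.

```python
from statistics import mean, pstdev


def z_scores(fits, reference):
    """Z = (fit - mean(reference)) / sigma(reference) for each fit value."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        raise ValueError("reference values have zero spread")
    return [(f - mu) / sigma for f in fits]


def spikes(residue_names, fits, reference, threshold=2.0):
    """Return the residues whose Z-score exceeds the (arbitrary) threshold."""
    scores = z_scores(fits, reference)
    return [name for name, z in zip(residue_names, scores) if z > threshold]


# Made-up reference fits for one residue type at similar resolution.
reference = [0.10, 0.12, 0.14, 0.16, 0.18]

# Made-up fits for two residues of that type in the structure being checked.
names = ["PHE 12", "PHE 87"]
fits = [0.14, 0.30]

print(z_scores(fits, reference))
print(spikes(names, fits, reference))
```

The first residue sits at the reference mean, so its score is close to zero; the second lies several standard deviations above it and is reported as a spike.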

PHENYLALANINE Geometric outliers

Loader LIGAND DB

Site environment DB
- Covalent bonds
- Coordinate bonds
- Hydrogen bonds
- Planes
- Non-bonding
- Electrostatics
- Disulphide bonds
[Figure labels: PHE, ASP, VAL; O, N, S]
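A few of the interaction types listed above can be illustrated with a purely distance-based classifier. This is a rough sketch, not the rules used to build the site environment database: the cutoffs are illustrative assumptions, atoms are reduced to an element and a position, and covalent bonds, coordinate bonds, planes and electrostatics are not handled.

```python
import math

# Illustrative cutoffs in angstroms (assumptions, not the database's rules).
DISULPHIDE_MAX = 2.5
HBOND_MAX = 3.5
CONTACT_MAX = 4.0


def classify(atom_a, atom_b):
    """Label an atom pair from its elements and separation only.

    Each atom is a tuple (element, (x, y, z)).
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    distance = math.dist(xyz_a, xyz_b)
    pair = {elem_a, elem_b}
    if pair == {"S"} and distance <= DISULPHIDE_MAX:
        return "disulphide bond candidate"
    if pair <= {"N", "O"} and distance <= HBOND_MAX:
        return "hydrogen bond candidate"
    if distance <= CONTACT_MAX:
        return "non-bonding contact"
    return None


print(classify(("S", (0.0, 0.0, 0.0)), ("S", (2.0, 0.0, 0.0))))
print(classify(("N", (0.0, 0.0, 0.0)), ("O", (2.9, 0.0, 0.0))))
print(classify(("C", (0.0, 0.0, 0.0)), ("C", (3.8, 0.0, 0.0))))
```

A real classifier would also need bond connectivity, donor and acceptor roles and geometry, which is why the results here are labelled as candidates.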