Macromolecular Structure Database Project (E-MSD): Infrastructure Services for Europe. Aim: to develop an autonomous structural database capability in Europe.
[Diagram: E-MSD collaborations and funding. Legend: grant and co-ordinator; grant funding; core funding; data exchange. Labels as extracted: Temblor, EBI-MSD, Spine, Oxford, Autostruct, York, NMRQual, Utrecht, EMBL, Wellcome Trust, CCPN, Cambridge, EHTPX, Daresbury, BBSRC, CCP4, IIMS, EU, MRC, Integration, Sanger Inst, SCOP, CATH, Pfam, harvesting, E-science, Advanced search, CLRC, USA, BMRB, RCSB, Validation, Structural Genomics, Electron Microscopy.]
E-MSD provides: clean biological data; integrated data; a single web access point; query interfaces for different users; interconnected views of the data relating structure, sequence, text and experimental details.
Web interface for biologist, chemist, structural biologist, teacher. [Diagram labels. Keyword search: PDB, ligand, active site, structure, sequence. Data resources: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB. Methods: SSM, FastA. Output: query, sorted hit list, Atlas page (structure, sequence, active site, experimental data), query results and interactive viewer.]
Web services: data APIs; methods as web services. [Diagram labels. Data resources: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB. Methods: SSM, FastA.]
Web-based pages: search interfaces; interactive visualisation.
DATA INTEGRATION
A database for all? The MSD search database.
Data integration. We want to include all types of biological data: structure, sequence, textual; observed biochemistry (BRENDA); sequence annotation (PRINTS); DNA (ORFs, SNPs). But we can't do everything! So can the Grid allow the integration of data from other sources? [Diagram labels: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB.]
Problems for Grid (1: Provenance). We are a funded institute: we have to be seen to be useful or we do not get funded! Industry also needs to be seen, by its shareholders. Origin of the distributed information: users and funding bodies need to see who provided the information. How do we retain and present the details of this?
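One way to picture the provenance question is to keep the provider attached to every value when records from several sources are combined. The following is a minimal sketch only, not a description of how E-MSD or any Grid middleware works; the `Annotation` class, the function names and all sample values ("Provider A", "example-fold", and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One piece of information plus a record of who supplied it (hypothetical model)."""
    name: str       # what is being described, e.g. "fold"
    value: str      # the value reported by the provider
    provider: str   # institute or company that supplied the value
    resource: str   # database or method the value came from


def merge_annotations(*sources):
    """Group annotations by name, keeping every provider's entry side by side."""
    merged = {}
    for source in sources:
        for annotation in source:
            merged.setdefault(annotation.name, []).append(annotation)
    return merged


def providers(merged):
    """Return the sorted set of providers that contributed to a merged record."""
    return sorted({a.provider for group in merged.values() for a in group})


# Hypothetical sample data from two distributed sources.
source_a = [Annotation("fold", "example-fold", "Provider A", "structure database")]
source_b = [
    Annotation("fold", "example-fold", "Provider B", "fold classification"),
    Annotation("keyword", "example-keyword", "Provider B", "sequence database"),
]

merged = merge_annotations(source_a, source_b)
print(providers(merged))  # ['Provider A', 'Provider B']
```

Because nothing is overwritten during the merge, a results page built from `merged` could credit each provider next to the value it supplied.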
Problems for Grid (2). We do not know "best practice" in much of biology. Methods: structure alignment, secondary structure... Data: multiple coordinates, multiple sequence data... There will be conflicts of information. Data and methods have associated validity information, and different data or methods may be inconsistent only in part. How is conflicting information going to be presented to, and filtered for, a user? Who is going to assign data validity?
Problems for Grid (3: Data access control). Bioinformatics is fashionable at the moment. There is a "problem" when something is perceived to be useful, e.g. there are about 60,000 patents in the US for the ~30,000 human genes: not a problem yet, but... This is more than data security. Will Grid employ some good lawyers? Will Grid hide information on request (cf. the PDB "hold" status)? Will Grid "modify" information on request (cf. Google search result order, which has been "updated")?
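The "hide information on request" case can be illustrated with a small filter that withholds entries marked as on hold. This is a sketch under stated assumptions: the `Entry` class, the status strings "released" and "hold", and the entry identifiers are all hypothetical and do not reproduce the PDB's actual status codes or any Grid policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A database entry with a release status (hypothetical model)."""
    entry_id: str
    status: str  # assumed values: "released" or "hold"


def visible_entries(entries, allowed_statuses=frozenset({"released"})):
    """Return only the entries whose status permits public display."""
    return [entry for entry in entries if entry.status in allowed_statuses]


# Hypothetical sample data.
entries = [
    Entry("entry-A", "released"),
    Entry("entry-B", "hold"),
    Entry("entry-C", "released"),
]

public_ids = [entry.entry_id for entry in visible_entries(entries)]
print(public_ids)  # ['entry-A', 'entry-C']
```

The filter itself is trivial; the open questions on the slide are who may set the status and whether every node of a distributed system will honour it.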
Summary. We want to be able to provide a scientific service: web pages and web services. We would like to be able to expand the results to include information from other data resources. The three issues raised here are only a few of many, but they represent fundamental problems.
CLEAN DATA: Quaternary structure. [Diagram labels: Chains, Residues, Atoms, X-ray Experiment, Assembly, Sub-Assembly, Biology.]
CLEAN DATA: Example of an experimental result. The authors would know the structure; we have to derive it at submission. M. Bochtler et al., Nature 403, 800 (2000). [Figure: asymmetric unit.]
The asymmetric unit contains 3 separate molecules: 2 copies of a dodecamer and 1 hexamer. [Figure labels: Hexamer, Dodecamer, Assembly.]
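The composition described here can be written down as a small data structure, which makes the bookkeeping explicit. This is only an illustrative sketch: the class names are hypothetical, and it assumes one polypeptide chain per subunit, so a dodecamer contributes 12 chains and a hexamer 6.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assembly:
    """A biological assembly, described only by its name and subunit count."""
    name: str
    subunits: int  # assumption: one chain per subunit


@dataclass(frozen=True)
class AsymmetricUnit:
    """The deposited asymmetric unit as a collection of assemblies."""
    assemblies: Tuple[Assembly, ...]

    def total_chains(self):
        """Total number of chains across all assemblies."""
        return sum(assembly.subunits for assembly in self.assemblies)

    def counts(self):
        """Number of copies of each kind of assembly."""
        return dict(Counter(assembly.name for assembly in self.assemblies))


asu = AsymmetricUnit((
    Assembly("dodecamer", 12),
    Assembly("dodecamer", 12),
    Assembly("hexamer", 6),
))

print(asu.counts())        # {'dodecamer': 2, 'hexamer': 1}
print(asu.total_chains())  # 30
```

Under the one-chain-per-subunit assumption the arithmetic is 2 x 12 + 6 = 30 chains, none of which is labelled by assembly in the raw coordinates, which is why the grouping has to be derived at submission.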
RESOLUTION SLIDING SCALE FOR RULES (clean data). [Figure: electron density for phenylalanine at different resolutions.] The ring is correctly placed into the 1.2 Å data. This can still be done with confidence in the 2 Å case. But at 3 Å we already observe a deviation of the centroid of the ring from the correct model.
[Plots: PDB entries 1qi3, 1rmg and 1f83; panel labels: Good, Terrible.] Z-score = (Fit - mean Fit) / sigma. A large positive spike indicates a residue that is worse than the average for that residue type in structures of similar resolution.
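The Z-score on this slide can be worked through in a few lines. The sketch below assumes that the term missing from the extracted formula is the mean fit for the same residue type at similar resolution, and that a larger fit value means a worse residue, as the slide's wording implies. The reference values, residue identifiers, fit values and the threshold of 2.0 are all invented for illustration.

```python
def z_score(fit, reference_mean, reference_sigma):
    """Z-score of one residue's fit against reference statistics."""
    if reference_sigma <= 0:
        raise ValueError("reference_sigma must be positive")
    return (fit - reference_mean) / reference_sigma


def flag_spikes(residue_fits, reference, threshold=2.0):
    """Return IDs of residues whose Z-score exceeds the threshold.

    residue_fits: list of (residue_id, residue_type, fit) tuples.
    reference: dict mapping residue_type to (mean, sigma) for structures
               of similar resolution (hypothetical values here).
    """
    flagged = []
    for residue_id, residue_type, fit in residue_fits:
        mean, sigma = reference[residue_type]
        if z_score(fit, mean, sigma) > threshold:
            flagged.append(residue_id)
    return flagged


# Hypothetical reference statistics and per-residue fit values.
reference = {"PHE": (10.0, 2.0)}
residue_fits = [("A12", "PHE", 11.0), ("A47", "PHE", 18.0)]

print(z_score(11.0, 10.0, 2.0))              # 0.5
print(z_score(18.0, 10.0, 2.0))              # 4.0
print(flag_spikes(residue_fits, reference))  # ['A47']
```

Taking the mean and sigma from a reference set, instead of from the structure being checked, is what lets the same score be compared across entries of similar resolution.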
PHENYLALANINE Geometric outliers
Loader LIGAND DB
Site environment DB: covalent bonds; coordinate bonds; hydrogen bonds; planes; non-bonding; electrostatics; disulphide bonds. [Diagram labels: PHE, O, N, S, ASP, VAL.]