
Macromolecular Structure Database Project EMSD Infra-structure Services for Europe To develop an autonomous structural database capability in Europe


1 Macromolecular Structure Database Project (E-MSD): Infrastructure Services for Europe. Aim: to develop an autonomous structural database capability in Europe. http://www.ebi.ac.uk/msd

2 Diagram of E-MSD partners, funding and data exchange. Partners and projects: Temblor, EBI-MSD, Spine (Oxford), Autostruct (York), NMRQual (Utrecht), EMBL, Wellcome Trust, CCPN (Cambridge), EHTPX (Daresbury), BBSRC, CCP4, IIMS, EU, MRC, CLRC, Sanger Institute, SCOP, CATH, Pfam; BMRB and RCSB (USA). Activities: integration, harvesting, e-science, advanced search, validation, structural genomics, electron microscopy. Legend: grant & co-ordinator, grant funding, core funding, data exchange.

3 E-MSD provides: clean biological data; integrated data; a single web access point; query interfaces for different users; interconnected views of the data relating structure, sequence, text and experimental details.

4 Web interface for biologists, chemists, structural biologists and teachers. Data sources: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB; methods: SSM, FastA. Search by ligand, active site, structure, sequence or keyword → query → sorted hit list → Atlas page (structure, sequence, active site, experimental data) → query results and interactive viewer.
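The query-to-hit-list step on slide 4 can be illustrated with a toy ranking. This is a sketch under stated assumptions, not the E-MSD search engine: the `Entry` record, its fields and the scoring rule (number of distinct query terms found in an entry's title or keywords) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical summary record for one structure entry."""
    code: str
    title: str
    keywords: list[str]


def keyword_search(entries: list[Entry], query: str) -> list[tuple[int, str]]:
    """Return (score, code) pairs for entries matching any query term.

    The score is the number of distinct query terms found in the entry's
    title words or keyword list (an illustrative ranking only)."""
    terms = {t.lower() for t in query.split()}
    hits = []
    for entry in entries:
        words = set(entry.title.lower().split()) | {k.lower() for k in entry.keywords}
        score = len(terms & words)
        if score > 0:
            hits.append((score, entry.code))
    # Highest score first; ties broken alphabetically by entry code.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits
```

With three hypothetical entries titled "ATP dependent protease", "protease inhibitor complex" and "DNA polymerase", the query "protease inhibitor" ranks the second entry first (two matching terms) and omits the third.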

5 Web services: data APIs; methods as web services. Data sources: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB; methods: SSM, FastA.
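In its simplest form, a "data API" exposed as a web service could return one entry as JSON over HTTP. This minimal sketch uses Python's standard WSGI interface; the `/entry?id=` route, the in-memory `ENTRIES` table and the `demo1` record are hypothetical and do not describe the real E-MSD services.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory table standing in for a real database.
ENTRIES = {"demo1": {"code": "demo1", "title": "Example entry", "resolution": 1.8}}


def entry_api(environ, start_response):
    """Minimal WSGI data API: GET /entry?id=<code> returns that entry as JSON."""
    path = environ.get("PATH_INFO", "")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    entry_id = params.get("id", [""])[0].lower()
    if path == "/entry" and entry_id in ENTRIES:
        status, body = "200 OK", ENTRIES[entry_id]
    else:
        status, body = "404 Not Found", {"error": "entry not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To try it locally, `entry_api` can be passed to `wsgiref.simple_server.make_server(host, port, app)`; a production service would sit behind a real database and web server.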

6 Web-based pages: search interfaces; interactive visualisation.

7 DATA INTEGRATION

8 A database for all? MSD search database

9 Data integration. We want to include all types of biological data: structure, sequence and text; observed biochemistry (BRENDA); sequence annotation (PRINTS); DNA (ORFs, SNPs). But we can't do everything! So can the Grid allow the integration of data from other sources? Diagram: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB.

10 Problems for Grid (1: Provenance). We are a funded institute: we have to be seen to be useful or we do not get funded! Industry also needs to be seen, by its shareholders. Origin of the distributed information: users and funding bodies need to see who provided the information. How do we retain and present this detail?
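One way to retain provenance when integrating distributed data is never to store a bare value: each value carries a tag naming the resource that supplied it. The sketch below assumes simple key/value records; the `Provenance` fields (`source`, `retrieved`) and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Who supplied a value, and when (hypothetical fields)."""
    source: str     # name of the providing resource
    retrieved: str  # ISO date on which the value was fetched


@dataclass
class AttributedValue:
    value: object
    provenance: Provenance


def integrate(records_by_source: dict[str, dict[str, object]],
              retrieved: str) -> dict[str, list[AttributedValue]]:
    """Merge key/value records from several sources without discarding
    their origin: every value keeps its Provenance tag."""
    merged: dict[str, list[AttributedValue]] = {}
    for source, record in records_by_source.items():
        for key, value in record.items():
            merged.setdefault(key, []).append(
                AttributedValue(value, Provenance(source, retrieved)))
    return merged


def contributors(merged: dict[str, list[AttributedValue]]) -> dict[str, int]:
    """Count the values each source contributed, e.g. for an acknowledgement panel."""
    counts: dict[str, int] = {}
    for values in merged.values():
        for v in values:
            counts[v.provenance.source] = counts.get(v.provenance.source, 0) + 1
    return counts
```

Keeping all attributed values per key, rather than overwriting, also leaves room to show users where sources disagree.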

11 Problems for Grid (2). We do not know "best practice" in much of biology. Methods: structure alignment, secondary structure… Data: multiple coordinates, multiple sequence data… There will be conflicting information: data and methods have associated validity information, and different data or methods may be inconsistent only in part. How is conflicting information going to be presented to, and filtered for, a user? Who is going to assign data validity?
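The point that sources may be inconsistent only in part suggests reporting disagreement rather than silently choosing a winner. This sketch assumes each claim already carries a numeric validity score in [0, 1] supplied by someone else (who assigns it is exactly the open question on the slide); the `Claim` type and the default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One source's value for a property, with an externally assigned validity."""
    source: str
    value: str
    validity: float  # assumed to lie in [0, 1]


def resolve(claims: list[Claim], min_validity: float = 0.5) -> dict:
    """Drop claims below a user-chosen validity threshold, then report whether
    the remaining sources agree instead of picking one value."""
    kept = [c for c in claims if c.validity >= min_validity]
    distinct = {c.value for c in kept}
    return {
        "values": sorted(distinct),
        "conflict": len(distinct) > 1,
        "sources": {c.source: c.value for c in kept},
    }
```

Letting the user move `min_validity` is one way to "filter for a user": a strict threshold may remove the conflict, a permissive one exposes it.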

12 Problems for Grid (3: Data access control). Bioinformatics is fashionable at the moment, and there is a "problem" when something is perceived to be useful: e.g. there are about 60,000 patents in the US for the ~30,000 human genes. Not a problem yet, but… This is more than data security. Will the Grid employ some good lawyers? Will the Grid hide information on request (cf. the PDB "hold" status)? Will the Grid "modify" information on request (cf. Google search result order having been "updated")?
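A "hold" status is essentially an access rule tied to a release date. The sketch below assumes a held record is visible only to its owner until its hold date passes; the `Record` fields and the rule itself are hypothetical and are not the PDB's actual policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    code: str
    owner: str
    hold_until: Optional[date] = None  # None means released immediately


def visible_records(records: list[Record], requester: str, today: date) -> list[str]:
    """Return the codes the requester may see: released records, plus held
    records that the requester owns (hypothetical access rule)."""
    out = []
    for r in records:
        released = r.hold_until is None or r.hold_until <= today
        if released or r.owner == requester:
            out.append(r.code)
    return out
```

On a distributed Grid, the hard part is that every node serving the data would have to apply the same rule.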

13 Summary. We want to be able to provide a scientific service through web pages and web services. We would like to be able to expand the results to include information from other data resources. These three issues are only a few of those involved, but they represent fundamental problems.

14 Clean data: quaternary structure. Diagram relating the X-ray experiment (chains, residues, atoms) to the biology (assembly, sub-assembly).
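The two views of quaternary structure on slide 14 can be modelled as a small hierarchy. This sketch assumes the experimental model is stored as chains → residues → atoms and that the biological view groups chains into assemblies that may nest sub-assemblies; the class names are illustrative, not the MSD schema.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str
    number: int
    atoms: list[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list[Residue] = field(default_factory=list)


@dataclass
class Assembly:
    """Biological unit: chains from the experimental model, grouped and
    optionally nested into sub-assemblies."""
    name: str
    chains: list[Chain] = field(default_factory=list)
    sub_assemblies: list["Assembly"] = field(default_factory=list)

    def all_chains(self) -> list[Chain]:
        """Chains in this assembly and, recursively, in its sub-assemblies."""
        result = list(self.chains)
        for sub in self.sub_assemblies:
            result.extend(sub.all_chains())
        return result

    def atom_count(self) -> int:
        return sum(len(r.atoms) for c in self.all_chains() for r in c.residues)
```

Because assemblies only reference `Chain` objects, the same experimental chain can appear in more than one biological grouping without copying coordinates.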

15 Clean data: example of an experimental result. The authors would know the structure; we have to derive it at submission. M. Bochtler et al., Nature 403, 800 (2000). Figure: asymmetric unit.

16 The asymmetric unit contains 3 separate molecules: 2 copies of a dodecamer and 1 hexamer. Figure labels: hexamer, dodecamer, assembly. http://pqs.ebi.ac.uk
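Deriving the biological assembly at submission can be approximated by treating chains as graph nodes, linking two chains when their interface is large enough, and taking connected components. This is a simplified stand-in, not the PQS method itself; the interface areas and the `min_area` cut-off are assumed inputs.

```python
def assemblies_from_contacts(chains: list[str],
                             contacts: list[tuple[str, str, float]],
                             min_area: float) -> list[list[str]]:
    """Group chains into candidate assemblies. Two chains are linked if their
    interface area (in square Angstroms) is at least min_area; assemblies are
    the connected components of the resulting graph, found with union-find."""
    parent = {c: c for c in chains}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for a, b, area in contacts:
        if area >= min_area:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: dict[str, list[str]] = {}
    for c in chains:
        groups.setdefault(find(c), []).append(c)
    # Largest assemblies first, then alphabetical for a stable order.
    return sorted(groups.values(), key=lambda g: (-len(g), g))
```

For example, with contacts A–B (900), B–C (50) and C–D (800) and a cut-off of 400, chains A to D split into two assemblies, [A, B] and [C, D].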

17 Clean data: a resolution sliding scale for rules. Electron density for phenylalanine at different resolutions: the residue is correctly placed into the 1.2 Å data, and this can still be done with confidence in the 2 Å case, but at 3 Å we already observe a deviation of the ring centroid from the correct model.
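The centroid deviation on slide 17 is simple arithmetic: average the six ring-atom positions in each model and take the distance between the two averages. This sketch assumes atoms are supplied as a mapping from standard PDB atom names to (x, y, z) coordinates in Å.

```python
import math

# PDB atom names of the six phenylalanine ring atoms.
RING_ATOMS = ("CG", "CD1", "CD2", "CE1", "CE2", "CZ")


def ring_centroid(atoms: dict[str, tuple[float, float, float]]) -> tuple[float, ...]:
    """Mean position of the six ring atoms."""
    pts = [atoms[name] for name in RING_ATOMS]
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))


def centroid_deviation(model: dict[str, tuple[float, float, float]],
                       reference: dict[str, tuple[float, float, float]]) -> float:
    """Distance (in Å) between the ring centroids of two models of one residue."""
    return math.dist(ring_centroid(model), ring_centroid(reference))
```

A rigid shift of the whole ring by (0.3, 0, 0.4) Å gives a deviation of 0.5 Å; a real comparison would first require both models to be in the same coordinate frame.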

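18 Z-score: $Z = (\mathrm{Fit} - \langle \mathrm{Fit} \rangle)/\sigma$, where $\langle \mathrm{Fit} \rangle$ and $\sigma$ are the mean and standard deviation of the fit for that residue type in structures of similar resolution. A large positive spike indicates a residue that is worse than the average for its residue type. Figure: per-residue plots for 1qi3, 1rmg and 1f83, labelled from Good to Terrible.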
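The Z-score on slide 18 can be computed per residue once reference statistics are available. This sketch assumes a fit measure where larger values mean a worse fit (so a positive spike flags a bad residue, as on the slide) and that reference fit values per residue type have already been collected from structures of similar resolution; the data layout is hypothetical.

```python
from statistics import mean, stdev


def fit_zscores(fits: dict[str, tuple[str, float]],
                reference: dict[str, list[float]]) -> dict[str, float]:
    """Z = (Fit - <Fit>) / sigma for each residue.

    fits maps a residue identifier to (residue type, fit value); reference
    maps a residue type to fit values from comparable structures (at least
    two values per type, as required by statistics.stdev)."""
    stats = {rtype: (mean(vals), stdev(vals)) for rtype, vals in reference.items()}
    return {res_id: (fit - stats[rtype][0]) / stats[rtype][1]
            for res_id, (rtype, fit) in fits.items()}
```

With reference values [1.0, 2.0, 3.0] for PHE (mean 2.0, standard deviation 1.0), a PHE residue with fit 5.0 scores Z = 3.0 and one with fit 2.0 scores 0.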

19 PHENYLALANINE Geometric outliers

20 Loader LIGAND DB

21 Site environment DB: covalent bonds; coordinate bonds; hydrogen bonds; planes; non-bonding interactions; electrostatics; disulphide bonds. Figure labels: PHE, O, N, S, ASP, VAL.
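A first-pass classifier for a few of the interaction types listed on slide 21 can be purely distance-based. The cut-offs below are common rules of thumb chosen for illustration (not E-MSD's actual criteria); covalent, coordinate, planar and electrostatic terms are deliberately omitted.

```python
import math

# Illustrative distance cut-offs in Å (assumptions, not E-MSD's rules).
DISULPHIDE_MAX = 2.2  # S-S distance; a disulphide bond is about 2.05 Å
HBOND_MIN = 2.5       # N/O donor to N/O acceptor, lower bound
HBOND_MAX = 3.5       # N/O donor to N/O acceptor, upper bound
CONTACT_MAX = 4.0     # generic non-bonded contact


def classify_pair(elem1: str, xyz1: tuple[float, float, float],
                  elem2: str, xyz2: tuple[float, float, float]) -> str:
    """Crude distance-based label for an atom pair in a site environment."""
    d = math.dist(xyz1, xyz2)
    pair = {elem1, elem2}
    if pair == {"S"} and d <= DISULPHIDE_MAX:
        return "disulphide"
    if pair <= {"N", "O"} and HBOND_MIN <= d <= HBOND_MAX:
        return "hydrogen bond"
    if d <= CONTACT_MAX:
        return "non-bonded contact"
    return "none"
```

A real classifier would also check angles for hydrogen bonds and use element-specific covalent radii; distance alone is only a starting point.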

