Presentation is loading. Please wait.

Presentation is loading. Please wait.

EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton.

Similar presentations


Presentation on theme: "EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton."— Presentation transcript:

1 eBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton

2 eBankII Workshop 2 Scientific Data Overload!

3 eBankII Workshop 3 CombeChem: eScience testbed Properties X-Ray e-Lab Analysis Properties e-Lab Simulation Video Diffractometer Grid Middleware Structures Database

4 eBankII Workshop 4 Chemistry Publications Ideas and interpretationsHooks into the literature Results & derived data Raw data!

5 eBankII Workshop 5

6 eBankII Workshop 6 Establishing common ground Understand the data creation process Terminology and definitions –Data –Metadata –Datafile –Dataset –Data holding Different views –Digital library researchers, computer scientists, chemists –Generic vs specific –Modeller vs practitioner Aim for a common ontology Modelling the domain Creating a metadata schema

7 eBankII Workshop 7 Crystallography workflow RAW DATADERIVED DATARESULTS DATA

8 eBankII Workshop 8 Crystallography datasets Initialisation: mount new sample on diffractometer & set up data collection Collection: collect data Processing: process and correct images Solution: solve structure Refinement: refine structure CIF: produce CIF (Crystallographic Information File format) Report: generate Crystal Structure Report Validation: generate report from structure checks Within a dataholding are the following datasets:

9 eBankII Workshop 9 Publishing, Informatics & Schemas Current schema is for publishing / advertising only eCrystals publishing requires lightweight schema only eBank harvesting requires lightweight schema only Aggregation and Linking requires a comprehensive schema Data management, Information delivery and Searching services require a very rich schema

10 eBankII Workshop 10 Deposition into the archive

11 eBankII Workshop 11 An Archive entry ecrystals.chem.soton.ac.uk

12 eBankII Workshop 12 Access to the underlying data

13 eBankII Workshop 13 Some metadata issues Using simple and qualified Dublin Core Additional chemical information in schema for harvesting e.g. empirical formula Schema contains International Chemical Identifier (InChI) Specifies which datasets are present in an entry Links to ePrints (and other published literature) derived from the data Using vocabularies specific to crystallography

14 eBankII Workshop 14 Harvesting: OAIster

15 eBankII Workshop 15 Linking and aggregating

16 eBankII Workshop 16 Embedded in a science portal

17 eBankII Workshop 17 Current situation Version 2.0 eBank metadata schema Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software Exports records as ebank_dc and oai_dc Validation of schema & discussion with International Union of Crystallography for developments (and wider deployment) Pilot eBank UK aggregator service Developing search interface Version 1.0 Testing with PSIgate physical sciences portal – embedding eBank UK

18 eBankII Workshop 18 Whats next? Generic metadata schema vs Subject specific schema Validation against other schema (CCLRC Model) (Eprints.org software: allow for more generic scientific data and schemas?) Metadata enhancement: keywords based on knowledge of keywords in related publications? Investigate identifiers: International Chemical Identifier Explore context sensitive linking Embedding into chemical and crystallographic research and publishing e-Learning embedding and pedagogic evaluation Feasibility study in related domains

19 eBankII Workshop 19 Crystallography Schema Breakout Describing non dc: terms –METS –SET container Rights –IPR –Copyright –Publisher –Funder Linking –DOI –Keyword ontology –Identifiers Data validation - Add validation dataset - Other forms of validation: Mogul Chemical representation - Naming conventions - Empirical formula representation Relationship between repositories and harvesters - Registration / subscription Syndication - FRIENDS container - RSS feeds


Download ppt "EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton."

Similar presentations


Ads by Google