Download presentation
Presentation is loading. Please wait.
Published byAbigail Roberts Modified over 10 years ago
1
eBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton
2
eBankII Workshop 2 Scientific Data Overload!
3
eBankII Workshop 3 CombeChem: eScience testbed Properties X-Ray e-Lab Analysis Properties e-Lab Simulation Video Diffractometer Grid Middleware Structures Database
4
eBankII Workshop 4 Chemistry Publications Ideas and interpretationsHooks into the literature Results & derived data Raw data!
5
eBankII Workshop 5
6
eBankII Workshop 6 Establishing common ground Understand the data creation process Terminology and definitions –Data –Metadata –Datafile –Dataset –Data holding Different views –Digital library researchers, computer scientists, chemists –Generic vs specific –Modeller vs practitioner Aim for a common ontology Modelling the domain Creating a metadata schema
7
eBankII Workshop 7 Crystallography workflow RAW DATADERIVED DATARESULTS DATA
8
eBankII Workshop 8 Crystallography datasets Initialisation: mount new sample on diffractometer & set up data collection Collection: collect data Processing: process and correct images Solution: solve structure Refinement: refine structure CIF: produce CIF (Crystallographic Information File format) Report: generate Crystal Structure Report Validation: generate report from structure checks Within a dataholding are the following datasets:
9
eBankII Workshop 9 Publishing, Informatics & Schemas Current schema is for publishing / advertising only eCrystals publishing requires lightweight schema only eBank harvesting requires lightweight schema only Aggregation and Linking requires a comprehensive schema Data management, Information delivery and Searching services require a very rich schema
10
eBankII Workshop 10 Deposition into the archive
11
eBankII Workshop 11 An Archive entry ecrystals.chem.soton.ac.uk
12
eBankII Workshop 12 Access to the underlying data
13
eBankII Workshop 13 Some metadata issues Using simple and qualified Dublin Core Additional chemical information in schema for harvesting e.g. empirical formula Schema contains International Chemical Identifier (InChI) Specifies which datasets are present in an entry Links to ePrints (and other published literature) derived from the data Using vocabularies specific to crystallography
14
eBankII Workshop 14 Harvesting: OAIster
15
eBankII Workshop 15 Linking and aggregating
16
eBankII Workshop 16 Embedded in a science portal
17
eBankII Workshop 17 Current situation Version 2.0 eBank metadata schema Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software Exports records as ebank_dc and oai_dc Validation of schema & discussion with International Union of Crystallography for developments (and wider deployment) Pilot eBank UK aggregator service Developing search interface Version 1.0 Testing with PSIgate physical sciences portal – embedding eBank UK
18
eBankII Workshop 18 Whats next? Generic metadata schema vs Subject specific schema Validation against other schema (CCLRC Model) (Eprints.org software: allow for more generic scientific data and schemas?) Metadata enhancement: keywords based on knowledge of keywords in related publications? Investigate identifiers: International Chemical Identifier Explore context sensitive linking Embedding into chemical and crystallographic research and publishing e-Learning embedding and pedagogic evaluation Feasibility study in related domains
19
eBankII Workshop 19 Crystallography Schema Breakout Describing non dc: terms –METS –SET container Rights –IPR –Copyright –Publisher –Funder Linking –DOI –Keyword ontology –Identifiers Data validation - Add validation dataset - Other forms of validation: Mogul Chemical representation - Naming conventions - Empirical formula representation Relationship between repositories and harvesters - Registration / subscription Syndication - FRIENDS container - RSS feeds
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.