Published by Toby Phillips. Modified over 9 years ago.
CombeDay 2005 1 Making Data Openly Available Simon Coles
Data Overload!

CombeChem: eScience testbed
[Diagram labels: Properties; X-Ray e-Lab; Analysis; Properties e-Lab; Simulation; Video; Diffractometer; Grid Middleware; Structures Database]

Chemistry Publications
Ideas and interpretations
Hooks into the literature
Results & derived data
Raw data!

[Diagram labels:]
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: eBank UK
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Presentation services: subject, media-specific, data, commercial portals
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Data analysis, transformation, mining, modelling
Peer-reviewed publications: journals, conference proceedings
Learning object creation, re-use
Quality assurance bodies
Flow labels: Validation; Harvesting metadata; Resource discovery, linking, embedding; Deposit / self-archiving; Publication; Searching, harvesting, embedding; Linking

Establishing common ground…
Understand the data creation process
Terminology and definitions
– Data
– Metadata
– Datafile
– Dataset
– Data holding
Different views
– Digital library researchers, computer scientists, chemists
– Generic vs specific
– Modeller vs practitioner
Aim for a common ontology
Modelling the domain
Creating a metadata schema

Crystallography workflow
Initialisation: mount new sample on diffractometer & set up data collection
Collection: collect data
Processing: process and correct images
Solution: solve structures
Refinement: refine structure
CIF: produce CIF (Crystallographic Information File format)
Report: generate Crystal Structure Report
Data produced along the workflow: RAW DATA, DERIVED DATA, RESULTS DATA
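
The seven workflow stages and the three data categories named on this slide can be sketched as a small lookup structure. This is only an illustration: the stage names and their order come from the slide, but the slide text does not say which stages yield which category, so the grouping below is an assumption, and `DataKind`, `WORKFLOW` and `stages_for` are hypothetical names.

```python
from enum import Enum


class DataKind(Enum):
    """The three data categories named on the slide."""
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


# Stage names and order follow the slide.
# ASSUMPTION: the stage-to-category grouping is a guess for illustration;
# the slide text does not state where the boundaries fall.
WORKFLOW = [
    ("Initialisation", DataKind.RAW),
    ("Collection", DataKind.RAW),
    ("Processing", DataKind.DERIVED),
    ("Solution", DataKind.DERIVED),
    ("Refinement", DataKind.DERIVED),
    ("CIF", DataKind.RESULTS),
    ("Report", DataKind.RESULTS),
]


def stages_for(kind):
    """Return the stage names assigned to one data category, in workflow order."""
    return [name for name, stage_kind in WORKFLOW if stage_kind is kind]
```

Calling `stages_for(DataKind.DERIVED)` returns the stages grouped under derived data, which is the kind of question a deposit form would need to answer when deciding which files belong to which part of a dataset.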
Deposition into the archive
An Archive entry: ecrystals.chem.soton.ac.uk
Access to the underlying data

Some metadata issues
Using simple and qualified Dublin Core
Additional chemical information in schema for harvesting, e.g. empirical formula
Schema contains International Chemical Identifier (InChI)
Specifies which ‘parts’ of a dataset are present
Links to eprints (and other published literature) derived from the data
Using vocabularies specific to crystallography
Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
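
To make the idea of Dublin Core plus chemistry-specific fields concrete, here is a minimal sketch that builds such a record with Python's standard library. The Dublin Core namespace is the real one; the `CHEM` namespace and the `empiricalFormula` and `inchi` element names are hypothetical stand-ins, since the slides do not show the actual eBank schema. The benzene values are illustrative sample data, not an entry from the archive.

```python
import xml.etree.ElementTree as ET

# Real Dublin Core element namespace.
DC = "http://purl.org/dc/elements/1.1/"
# HYPOTHETICAL namespace for the chemistry-specific fields; the real
# eBank schema is not shown in the slides.
CHEM = "http://example.org/hypothetical-chem-terms/"


def build_record(identifier, title, formula, inchi):
    """Build a small XML record: simple Dublin Core plus chemical fields."""
    record = ET.Element("record")

    def add(namespace, tag, text):
        # ElementTree writes namespaced tags in {namespace}tag form.
        element = ET.SubElement(record, f"{{{namespace}}}{tag}")
        element.text = text

    add(DC, "identifier", identifier)
    add(DC, "title", title)
    add(DC, "type", "CrystalStructure")      # type value quoted in the slides
    add(CHEM, "empiricalFormula", formula)   # hypothetical element name
    add(CHEM, "inchi", inchi)                # hypothetical element name
    return record


# Illustrative sample values only.
example = build_record(
    "https://repository.example.org/dataset/1",
    "Benzene",
    "C6H6",
    "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
)
xml_text = ET.tostring(example, encoding="unicode")
```

Keeping the chemical fields in their own namespace means a harvester that only understands Dublin Core can ignore them, while a chemistry-aware service can search on formula or InChI.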

Data flow in eBank
[Diagram; model input: Andy Powell, UKOLN. Labels:]
Data side: Crystal structure (data holding); Dataset; Crystal structure report (HTML); ebank_dc record (XML) with dc:identifier, dcterms:references, dc:type=“CrystalStructure” and/or “Collection”
Eprint side: Eprint oai_dc record (XML) with dc:identifier, dcterms:isReferencedBy, dc:type=“Eprint” and/or “Text”; Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF)
Services: Institutional repository; eBank UK aggregator service; ePrint UK aggregator service; Subject service
Flows: Deposit; Harvesting via OAI-PMH (ebank_dc); Harvesting via OAI-PMH (oai_dc); Linking
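
The two harvesting flows on this slide differ only in the metadata format requested. A minimal sketch of how an aggregator might form the corresponding OAI-PMH requests follows. `ListRecords`, `metadataPrefix` and `from` are standard OAI-PMH request parameters, and `ebank_dc` and `oai_dc` are the prefixes named on the slide; the base URL and the function name `list_records_url` are assumptions for illustration, not the real eBank endpoint.

```python
from urllib.parse import urlencode

# HYPOTHETICAL repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# One request per format named on the slide.
dataset_request = list_records_url(BASE_URL, "ebank_dc")
eprint_request = list_records_url(BASE_URL, "oai_dc", from_date="2005-01-01")
```

Fetching and parsing the responses is left out here; the point is that the same repository can serve both audiences through one protocol by exposing two metadata prefixes.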
Harvesting: OAIster
Linking and aggregating
Embedded in a science portal

Current situation
Version 2.0 eBank metadata schema
Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software
Exports records as ebank_dc and oai_dc
Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment
Pilot eBank UK aggregator service
Developing search interface Version 1.0
Testing with PSIgate physical sciences portal – embedding eBank UK

What’s next?
Progress towards generic metadata schemas
Validation against other schema (CCLRC Model)
Eprints.org software: allow for more generic scientific data and schemas?
Metadata enhancement: keywords based on knowledge of keywords in related publications?
Investigate identifiers: International Chemical Identifier
Explore context sensitive linking
Full embedding into chemical and crystallographic research and publishing
e-Learning embedding and pedagogic evaluation
Feasibility study in related domains