Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

Slides:



Advertisements
Similar presentations
A centre of expertise in digital information management The OAI Protocol for Metadata Harvesting Andy Powell UKOLN,
Advertisements

IATUL Porto, May 21, 2006 DOI and e-Science Dr Anne E Trefethen Oxford e-Research Centre
ePrints UK: progress so far Ruth Martin UKOLN, University of Bath
Open Archives Forum Workshop University of Bath, September 2003 Workshop summing up Rachel Heery UKOLN, University of Bath
Creating Institutional Repositories Stephen Pinfield.
AHM, Nottingham, September eBank UK : linking research data, scholarly communication and learning. Dr Liz Lyon, UKOLN, University of Bath Dr Simon.
S.J. Coles a*, M.B. Hursthouse a, R.A. Stephenson a, P. Cliff b, E. Lyon b, M. Patel b J. Downing c & P. Murray-Rust.
© S.J. Coles 2006 Usability WS, NeSC Jan 06 Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing.
Crystal Structure EPrints: Source Through the Open Archive Initiative S.J. Coles a*, J.G. Frey a, M.B. Hursthouse a, L. Carr b & C.J. Gutteridge.
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
Linking Data and Publications: the Chemistry Way Simon Coles School of Chemistry, University of Southampton, U.K. CLADDIER workshop.
S.J. Coles a*, J.G. Frey a, M.B. Hursthouse a, L. Carr b & C.J. Gutteridge b. a School of Chemistry, University of Southampton, UK.; b School of Electronics.
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
RCUK, Octiber Archiving research data and research publications. Dr Leslie Carr, Intelligence, Agents Multimedia, University of Southampton Dr Simon.
© S.J. Coles 2006 eCrystals: A Route for Open Access to Small Molecule Crystal Structure Data Simon Coles School of Chemistry, University of Southampton,
Supporting education and research Repositories in Context Digital repositories as components of an integrated infrastructure for education Leona Carpenter.
E-Prints What are they?! How do they relate to CombeChem? Simon Coles CombeDay (08/01/2004)
The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
UKOLN is supported by: Realising the scholarly knowledge cycle: The experience of eBank UK Dr Liz Lyon, UKOLN, University of Bath, UK CNI Task Force Meeting.
UKOLN is supported by: From research data to new knowledge: a lifecycle approach. Dr Liz Lyon, Director UKOLN, University of Bath, UK JISC/SURF/CNI Conference.
Digital | Curation | Centre Adding value to open access research data: reflections on the process of data curation Dr Liz Lyon, DCC Associate Director.
An overview of collection-level metadata Applications of Metadata BCS Electronic Publishing Specialist Group, Ismaili Centre, London, 29 May 2002 Pete.
UKOLN is supported by: Digital Libraries and e-Research: a UK perspective on a changing landscape. Dr Liz Lyon, Director UKOLN, University of Bath, UK.
UKOLN is supported by: eBank UK : linking research data, scholarly communications and learning. Dr Liz Lyon, UKOLN, University of Bath, UK JISC CNI Conference.
UKOLN is supported by: Data, information and knowledge repositories: developing infrastructure to support the e-Research landscape. Dr Liz Lyon, Director.
JISC Joint Programmes Meeting eBank UK : linking research data, learning and scholarly communications. Dr Liz Lyon, UKOLN, University of Bath Dr.
A centre of expertise in digital information management UKOLN is supported by: Digital repositories as research infrastructure: a UK perspective.
UKOLN is supported by: Adding value to open access research data: the eBank UK Project. Dr Liz Lyon, Director UKOLN, University of Bath, UK OAI4, CERN.
A centre of expertise in digital information management UKOLN is supported by: British Academy e-Resources Policy Review: UKOLN Report.
UKOLN is supported by: Emergent technologies & digitisation: the institutional impact. Liz Lyon & Kevin Edge VCs Retreat, October a.
A centre of expertise in digital information management UKOLN is supported by: UK Perspectives on the Curation and Preservation of Scientific.
Federation eCrystals Federation: Open Repositories for Data-driven Science Dr Liz Lyon, UKOLN, University of Bath, UK Dr Simon Coles, University of Southampton,
UKOLN is supported by: The JISC Information Environment Metadata Schema Registry (IEMSR): Update DC-2006, Manzanillo, Mexico October 3-6, 2006 Rachel Heery.
UKOLN is supported by: JISC Information Environment update Repositories and Preservation Programme meeting, October 24-25, 2006 Rachel Heery UKOLN
UKOLN is supported by: Enhancing access to research data: the challenge of crystallography Rachel Heery, Monica Duke, Michael Day UKOLN, University of.
© S.J. Coles 2006 Institutional Data Repositories for Chemistry Simon Coles School of Chemistry, University of Southampton, U.K.
EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton.
UKOLN is supported by: Developing e-Infrastructure to support new research and learning paradigms. Dr Liz Lyon, Director UKOLN, University of Bath, UK.
OAI and Publishers metadata Using the static repositories approach to disclose small journals.
EBank UK CCLRC Workshop February eBank and CCLRC Workshop February 2005 University of Bath.
Digital Repositories: interoperability & common services Closing Remarks Dr Liz Lyon, UKOLN, University of Bath, UK
The Discovery Landscape in Crystallography UKOLN is supported by: Monica Duke UKOLN, University of Bath, UK – eBank UK project A centre.
A centre of expertise in data curation and preservation DigCCur2007 Symposium, Chapel Hill, N.C., April 18-20, 2007 Co-operation for digital preservation.
Collection-level description in practice Collection-Level Description & NOF-digitise projects NOF-digitise programme seminar, London, 22 February 2002.
The Data Lifecycle and the Curation of Laboratory Experimental Data Tony Hey Corporate VP for Technical Computing Microsoft Corporation.
University of Southampton, U.K.
EPrints Workshop, January eBank UK: Dissemination of research data using EPrints Simon Coles, School of Chemistry, University of Southampton.
© S.J. Coles 2005 ACS 2005, San Diego Furthering Chemoinformatics through ‘Crystalloinformatics’ Simon J. Coles EPSRC National Crystallography Service.
© S.J. Coles 2006 Data Management in the Chemistry Domain Simon Coles School of Chemistry, University of Southampton, U.K.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
© S.J. Coles 2005 eChemInfo2005 Open Archives as a Route for Capture, Dissemination and Access to Chemical Data and Information Simon Coles School of Chemistry,
CEN/ISSS DC workshop, January The UK approach to subject gateways Rachel Heery UKOLN University of Bath UKOLN is.
EBank UK: linking scientific data, scholarly communication and learning Michael Day and Rachel Heery UKOLN, University of Bath
A centre of expertise in digital information management RDN, e-Prints UK and NOF- Digitise: a (very) small sample of UK OAI activity Andy.
Accessing a national digital library: an architecture for the UK DNER Andy Powell ELAG 2001, Prague 7 June 2001 UKOLN, University of Bath
Open Archive Initiative – Protocol for metadata Harvesting (OAI-PMH) Surinder Kumar Technical Director NIC, New Delhi
UKOLN is supported by: Enhancing access to research data: the e-Science project eBank UK A centre of expertise in digital information management.
UKOLN is supported by: Introduction to UKOLN Dr Liz Lyon, Director UKOLN, University of Bath, UK Grand Challenge Meeting, June a centre.
CombeDay Making Data Openly Available Simon Coles.
A centre of expertise in digital information management UKOLN is supported by: Functional Requirements Eprints Application Profile Working.
Open Archive Forum Rachel Heery UKOLN, University of Bath UKOLN is funded by Resource: The Council for Museums, Archives.
Metadata-based Discovery: Experience in Crystallography UKOLN is supported by: Monica Duke UKOLN, University of Bath, UK A centre of.
UKOLN is supported by: Library futures in the new research landscape. Dr Liz Lyon, UKOLN, University of Bath, UK CURL Members Meeting October 2004, London.
eCrystals Federation: Open Repositories for global Open Science
VI-SEEM Data Repository
Realising the scholarly knowledge cycle:
‘The eCrystals Federation’ Management and Publication of Small Molecule Structure Data for the Whole Crystallographic Community S.J. Colesa*, J.G. Freya,
JISC Joint Programmes Meeting 2005
Developing Institutional Data Repositories
eCrystals Federation: Open Repositories for global Open Science
Presentation transcript:

Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath PV-2004, ESRIN Centre, Frascati, 5-7 October 2004

Overview eScience agenda –Imperative to re-use data –Publication at source Innovations in scholarly communications –Open Access –Institutional repositories eBank UK –Integrating research data and journal articles –Information architecture and data flow –Data model and schemas Challenges for the future More effective curation by integrating research data and publications

eBank project team UKOLN, University of Bath Michael Day Monica Duke Rachel Heery Liz Lyon University of Southampton Les Carr Simon Coles Jeremy Frey Chris Gutteridge Mike Hursthouse University of Manchester John Blunden-Ellis

Imperative to re-use research data

The next generation of research breakthroughs will rely upon new ways of handling the immense amounts of data that are being produced by modern research methods and equipment, such as telescopes, particle accelerators, genome sequencers and biological imagers….Similar developments are having an impact in the arts and humanities, and in the social sciences. A Vision for Research, Research Councils UK, December 2003

It is envisaged that the sharing of primary data would prevent unnecessary repetition of experiments and enable scientists to build directly on each others work, creating greater efficiencies and productivity in the research process. UK Parliamentary Committee report

Current chemistry publishing protocols Ideas and interpretations Results & derived data Hooks into the literature Raw data!

Calls for new modes of curation for digital data Publication Discovery Re-use Preservation

eBank motivation Publication bottleneck in many scientific communities Small percentage of data referenced in literature Limited amount of results data Publication at source Open repositories Link data to research literature More timely access

eBank focus on crystallography Computer controlled instruments Generates large quantities of digital data and metadata automatically Requirement for curaton of data Strict workflow Data formatted to international standard –Crystallographical Information File (CIF) maintained by the International Union of Crystallography CombeChem: funded by UK eScience programme

CombeChem: an eScience project X-Ray e-Lab Analysis Properties Properties e-Lab Simulation Video Diffractometer Grid Middleware Structures Database

Emerging infrastructure to support curation of digital data

Improving access to research publications Repositories –Subject based (arXiv, CogPrints) –Institutional (CDL, MIT) –Supporting technology (DSpace, eprints.org) Open Access –Self archiving peer reviewed journal articles –Toll free journals (free at point of use) –Supporting technology (OAI-PMH) Potential for integrating access to data and publications

Supporting technology: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Architecture of the OAI-PMH Harvest available metadata from Data Providers Place aggregated metadata in a repository Expose aggregated metadata via a Web interface Potential for added value services…

Architecture of the OAI PMH Consistent interfaces for data provider and service provider Low barrier protocol / effortless implementation Based on existing standards (e.g. HTTP, XML, DC) Harvester Repository Requests (based on HTTP) Metadata (encoded in XML) Metadata and DataMetadata Data Provider Service Service Provider

eBank in a nutshell Create institutional repository of Crystallography Data (at Southampton) Modify repository software to handle datasets (eprints.org at Southampton) Demonstrate eBank search service linked to ePrints UK, indexing harvested descriptions of datasets and journal articles (at UKOLN) Embed eBank service into PSIgate subject gateway (at Manchester) To develop pilot service linking journal articles and scientific datasets (September October 2005)

Institutional repository (Southampton repository) eBank UK aggregator service (metadata describing datasets) ePrint UK aggregator service (metadata describing journal articles) Harvesting OAI-PMH ebank_dc Harvesting OAI-PMH oai_dc Searching, linking and embedding PSIgate portal eBank architecture

Institutional repositories at various sites – providing links to data and journal articles, providing metadata for harvesting Various aggregators of metadata describing datasets – international subject based services, publishers services etc Various aggregators of metadata describing journal articles – international subject based services, publishers services etc Harvesting OAI-PMH ebank_dc Harvesting OAI-PMH oai_dc Searching, linking and embedding Embedded services in various specialist portals Potential extended architecture

First steps: establishing common ground… Understand the data creation process Terminology and definitions –Data –Metadata –Datafile –Dataset –Data holding Different views –Digital library researchers, computer scientists, chemists –Generic vs specific –Modeller vs practitioner Data modelling Defining metadata schema

Crystallographic data workflow Set up data collection Collect data Process + correct images Solve structure Refine structure CIF RAW DATA DERIVED DATA RESULTS DATA

Crystallographic data workflow Set up data collection Collect data Process + correct images Solve structure Refine structure CIF RAW DATA DERIVED DATA RESULTS DATA

Linking Crystallograpy data and journal ePrints EPrint (Local) eBank World JOURNAL PUBLICATION DATA HOLDING INVESTIGATION RAW DERIVED RESULTS CIF STRUCTURE REPORT DATASET (Contains DATAFILES) REPORT (EPrint) EBank REPORT

Crystallography data model

Metadata approach Extended Dublin Core for structure reports within institutional repository Both simple Dublin Core and extended Dublin Core are offered as alternative schemas for harvesting using OAI-PMH Exploring use of extended DC schema within DCMI –impact on aggregator service Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge

Extended Dublin Core schema Additional chemical information in schema for harvesting e.g. empirical formula Schema contains International Chemical Identifier (InChI) Links to all datasets associated with an experiment Links to individual datasets within an experiment Links to eprints (and other published literature) derived from the data Using vocabularies specific to crystallography

Structure reports link back to the underlying data…

eBank aggregator : search

Ebank aggregator: browse

And finally… eBank search embedded in a science portal

ebank_dc record (XML) Crystal structure (data holding) Crystal structure report (HTML) Dataset Institutional repository eBank UK aggregator service ePrint UK aggregator service Subject service Harvesting OAI-PMH ebank_dc Harvesting OAI-PMH oai_dc Searching, linking and embedding dc:identifier dcterms:references Linking dc:type= CrystalStructure and/or Collection Model input Andy Powell, UKOLN. PSIgate portal Eprint oai_dc record (XML) dcterms:isReferencedBy dc:type=Eprint and/or Text Eprint jump-off page (HTML) Eprint manifestation (e.g. PDF) Linking dc:identifier

Challenges for the future

Progress update Version 2.0 eBank metadata schema Enhanced ePrints.org software Pilot institutional e-data repository for harvesting (raw, derived, results data) Exports records as ebank_dc and oai_dc Pilot eBank UK aggregator service Developing search interface Version 1.0 Testing with PSIgate physical sciences portal – embedding eBank UK

Plans for eBank Phase 2 Progress towards generic data model for description of research datasets –Validate eBank schema against other schema –CLRC Scientific Metadata Model Modify eprints.org software to allow for more varied scientific data and schemas Investigate identifiers e.g. International Chemical Identifier (InChI code)

Plans for eBank Phase 2…….(contd.) Explore embedding in chemistry workflow Potential to expand remit to wider range of crystallography data other chemistry sub-domains broader physical sciences

eBank (potential) links with eLearning Provide access to primary research data within learning materials –in the taught postgraduate curriculum in chemistry, undergraduate project work, chemical informatics courses Inclusion of e-research data in e-learning courses. –through links in reading lists, through essay assignments, through analytical problems, through practical work, through RDN PsiGATE links

In conclusion eBank demonstrates benefits to research community Potential for integration into digital library services –Moving from demonstrator to service, need to involve publishers and specialist services

The end… Questions?