Enhanced support for eScience: the role of Digital Libraries
Digital Libraries Go eScience, ECDL, Alicante, September 2006
Rachel Heery, Deputy Director R&D, UKOLN
UKOLN: a centre of expertise in digital information management
Summary
New modes of scholarship
  – eScience service portfolio
  – Emerging eResearch ecology
Infrastructural elements
Data creation and capture
Data curation and preservation
Data citation, discovery and use
Adding value and knowledge extraction
Vision 2010 Richer scholarly communication based on open access to and re-use of scholarly materials Integrated life-cycle of knowledge from research to learning Access and re-use of scholarly materials Added value services on scholarly materials (involving HE and commercial sectors)
More repositories and more content! Working papers, primary data, audiovisual, images Hardware in research labs will automatically deposit experimental data Desktop tools will deposit content Rich data flow between networks of repositories Rich data flows between repositories and other components in information landscape National and institutional preservation strategies in place!
Repositories interworking with other eResearch components repository Repositories Experimental equipment Authoring tools Name authority services Field study capture tools Terminology services Content Packaging tools
Where are we now?
Scholarship today? OA landscape
23 June 2006 Architecture of Participation?
Data-centric 2020 vision Reference datasets as infrastructure?
New forms of publication: integration of data and journals
Emerging ecology
Defining workflows and dataflows
Analyse roles and interactions within and between repositories
What does the user want?
Identify and define services
  – Potential for shared services, re-use of services
Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement
Dataflows and Workflows
How is primary research data captured in faculty and academic departments?
Where and how is primary research data stored? Made accessible?
What are the processes for deriving further data, and how is this structured and stored? Made accessible?
How is data curated for the long term?
Understanding the research process
Project StORe: Source-to-Output Repositories (Edinburgh)
  – Primary data : research publications
  – Survey questionnaire
RepoMMan: Repository Metadata and Management (Hull)
  – Survey questionnaire and interviews
  – Activity diagram and workflow
DCC SCARP
  – Curation staff working within research teams
Repository ecology Institutional Repository Departmental repository Authoring tool Subject repositories Institutional research system Data Centres Learned society repositories Laboratory repository Experimental machine Aggregators: OAIster, Google Regional, national Text mining tools Terminology services Research council repositories
Digital libraries & eScience Infrastructure
Data capture
Digital repositories, OA & preservation
Long-term access: trust, responsibility, policy
Trusted DR Audit Checklist for Certification (Draft), Research Libraries Group-NARA Taskforce 2005
Defined criteria under 4 categories
  – Organisation
  – Functions, processes & procedures
  – Designated community & usability
  – Technologies & technical infrastructure
UK Digital Curation Centre: advice, tools & services
RepInfo Registry
EU CASPAR Integrated Project
Task Force on the Permanent Access to the Records of Science
Data, metadata and discovery
Validation, publication & discovery of data models & schema
Metadata packaging standards
  – METS, MPEG 21 DIDL
  – Complex object model?
Semantic descriptions
  – Formal high-level and domain ontologies
  – Inter-disciplinary discovery
ePrints DC Application Profile
UK Intute IR search service (eprints)
Informal social network approaches: folksonomies
What data models and metadata schema are in place? Have librarians been involved in their development?
Persistent identifiers for data citation
How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?
Schemes: DOI, Handle, ARK, PURL
Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany
STD-DOI Project: DOI registry for datasets
What persistent identifiers have been assigned to your data? Is there a data citation policy? Was the Library involved?
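To make the scheme list concrete, here is a minimal Python sketch of how an identifier from each scheme could be turned into a resolvable URL. The resolver prefixes are the public ones in general use today (doi.org, hdl.handle.net, n2t.net); the function name `resolver_url` and the Handle, ARK and PURL example values are hypothetical and are not taken from the slides.

```python
from urllib.parse import quote

# Public resolver prefixes for three of the schemes named on the slide.
# A PURL is already a URL, so it needs no resolver prefix.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:/",
}


def resolver_url(scheme: str, identifier: str) -> str:
    """Return a URL that should resolve the identifier (illustrative helper)."""
    scheme = scheme.lower()
    if scheme == "purl":
        return identifier
    if scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier scheme: {scheme}")
    # Keep path separators readable; percent-encode anything unusual.
    return RESOLVERS[scheme] + quote(identifier, safe="/:;()")


doi_link = resolver_url("doi", "10.1000/182")            # DOI of the DOI Handbook
handle_link = resolver_url("hdl", "12345/example")       # hypothetical handle
ark_link = resolver_url("ark", "12345/x6np1wh8k")        # 12345 is the example NAAN
purl_link = resolver_url("purl", "https://purl.org/example/dataset")  # hypothetical
```

A citation policy could then record the scheme and identifier separately and generate the link at display time, so that a change of resolver does not invalidate stored citations.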
Adding value: repository services
Tools: for deposit, normalisation, manipulation, transformation…
Linking, annotation, visualisation
Aggregators: generic, (sub-) disciplinary
Knowledge extraction:
  – Mining (data, text, structures)
  – Modelling (economic, climate, mathematical, biological…)
  – Analysis (statistical, lexical, gene…)
Is your data OA? How is your data being used and re-used?
Nature 23 March 2006 OTMI: Open Text Mining Interface NaCTeM Emerging tools: TerMine, GENIA, Cafetiere
A Case Study in Crystallography
Data capture
R4L Deposit scenario (…part of….)
1. Produce strategy for synthesis (=idea)
2. Submit plan to SmartTea system (incl. identifiers)
3. Retrieve and follow instructions (sub-workflow?)
4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)
5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system
6. Run spectral analyses on sample capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)
7. Save spectrum in native and common formats
8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…
RAW DATA, DERIVED DATA, RESULTS DATA
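The final steps of the scenario (capture analysis metadata, save files, deposit in a laboratory repository) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `SampleDeposit`, `DataFile` and `LaboratoryRepository`, the field names, and the in-memory storage are all hypothetical and do not describe the actual R4L software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The three kinds of data named on the slide.
STAGES = ("raw", "derived", "results")


@dataclass
class DataFile:
    name: str
    stage: str        # one of STAGES
    file_format: str  # e.g. a native instrument format or a common format


@dataclass
class SampleDeposit:
    """Files plus the analysis metadata listed in step 6 (hypothetical fields)."""
    sample_id: str
    researcher: str
    analysis_software_version: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    files: list = field(default_factory=list)

    def add_file(self, name: str, stage: str, file_format: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.files.append(DataFile(name, stage, file_format))


class LaboratoryRepository:
    """In-memory stand-in for a laboratory repository."""

    def __init__(self):
        self._deposits = {}

    def deposit(self, package: SampleDeposit) -> dict:
        if not package.files:
            raise ValueError("nothing to deposit")
        self._deposits[package.sample_id] = package
        # Return a summary of file names grouped by stage.
        return {
            stage: [f.name for f in package.files if f.stage == stage]
            for stage in STAGES
        }


package = SampleDeposit("sample-001", "A. Researcher", "1.0")
package.add_file("spectrum.native", "raw", "native")
package.add_file("spectrum.csv", "derived", "csv")
summary = LaboratoryRepository().deposit(package)
```

Recording the time-stamp and software version on the package rather than on each file keeps the metadata that the instrument captures automatically in one place.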
eBank UK Project
Promote open access crystallography data
Aggregator service harvests OAI metadata from institutional data repository (e-Crystals archive)
Service linking from data to derived research publication
Embedding eBank service in learning workflows: pedagogy
Future federation plans for crystallography data repositories
UKOLN (lead), University of Southampton, University of Manchester
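As an illustration of what harvesting OAI metadata involves, the sketch below builds an OAI-PMH `ListRecords` request and extracts Dublin Core titles from a response. The verb, the `metadataPrefix` parameter and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL, the sample record and the function names are hypothetical, and this is not the eBank aggregator's actual code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return every dc:title found inside the harvested records."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        for title in record.iterfind(".//dc:title", NS):
            titles.append(title.text)
    return titles


# Hypothetical response fragment, embedded so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical crystal structure entry</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("https://repository.example.org/oai")
harvested_titles = extract_titles(SAMPLE_RESPONSE)
```

A real harvester would also follow `resumptionToken` elements to page through large result sets, which this sketch omits.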
A data repository entry ecrystals.chem.soton.ac.uk
Access to the underlying data: complex objects
eBank Metadata Publication
Using simple Dublin Core
  – Crystal structure
  – Title (Systematic IUPAC Name)
  – Authors
  – Affiliation
  – Creation Date
Additional chemical information through Qualified Dublin Core
  – Empirical formula
  – International Chemical Identifier (InChI)
  – Compound Class & Keywords
Specifies which datasets are present in an entry
Application Profile
DOIs from TIB
Data citation policy
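The split between simple Dublin Core fields and additional chemical fields can be sketched as follows. The `dc` namespace is the standard Dublin Core element set; the `ex` namespace, its element names (`empiricalFormula`, `inchi`), the author names and the date are hypothetical placeholders and do not reproduce the actual eBank application profile. Water is used as the example compound because its systematic name and InChI are well known.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/ebank-sketch/"  # hypothetical namespace for chemical terms

ET.register_namespace("dc", DC)
ET.register_namespace("ex", EX)


def build_record(simple_dc, chemical):
    """Build one metadata record from simple DC fields and chemical extensions."""
    root = ET.Element("record")
    for name, values in simple_dc.items():
        for value in values:  # DC elements are repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    for name, value in chemical.items():
        ET.SubElement(root, f"{{{EX}}}{name}").text = value
    return root


record = build_record(
    simple_dc={
        "type": ["Crystal structure"],
        "title": ["oxidane"],  # systematic IUPAC name of water
        "creator": ["Researcher, A.", "Researcher, B."],  # placeholder authors
        "date": ["2006-01-01"],  # placeholder creation date
    },
    chemical={
        "empiricalFormula": "H2O",
        "inchi": "InChI=1S/H2O/h1H2",
    },
)

record_xml = ET.tostring(record, encoding="unicode")
```

Keeping the chemical terms in their own namespace lets a generic harvester ignore them while a chemistry-aware service can index the InChI for structure searching.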
Discovering data:
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), DOI: /b502828k
Domain identifier: International Chemical Identifier (InChI) code
Google molecule using InChI
Slide from Simon Coles
Adding value: eBank linking data to publications
Linking research to learning - embedding eBank aggregator service in a science portal for student learners
Integration into the curriculum and e-Learning workflows
MChem course
Assess role in Undergraduate Chemical Informatics courses
Pedagogic evaluation April – June 2006
Report to follow.
Roles & responsibilities: new challenges?
Workforce development and capacity building
NSF Draft Report 2005
Data scientist - hybrid skills
Facilitate collaboration
  – Multidisciplinary teams: computer scientists, domain scientists, digital library experts, statisticians/modellers e.g. eBank project
  – Lessons learnt: e-Science Human Factors Audit Report (to be published 2006), Roy Kalawsky, Loughborough
CURL/SCONUL e-Research Taskforce
Has your (digital) library engaged with the e-Research agenda?
Repositories roadmap: vision 2010
Richer scholarly communication based on open access to and re-use of scholarly materials
Integrated life-cycle of knowledge from research to learning
Available metadata about scholarly materials
Added value services on scholarly materials (involving HE and commercial sectors)
Repository interworking with other components repository Repository Virtual Learning Environment Authoring tool Name authority service Institutional research system Automated classification service Packaging tool
Defining workflows and dataflows
Analyse roles and interactions within and between repositories
What does the user want?
Identify and define services
  – Potential for shared services, re-use of services
  – In context of JISC e-Framework
Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement
Deposit a priority!
To enable users to populate repositories simply, effectively and preferably automatically
To capture content from desktop applications, experimental equipment (smart labs), learning content development tools etc.
To enable repository of deposit to exchange data with further repositories in predictable manner
To hide complexity from end-user
To be compatible with follow-on added value services layered on repository content
Deposit API Working group meeting July 11/12, Warwick
Thank you!