Published by Nelson James. Modified over 9 years ago.
1
Digital | Curation | Centre
Digital Curation Centre
www.dcc.ac.uk
Peter Burnhill, Michael Day, David Giaretta, Liz Lyon, Robin Rice, Bridget Robinson and Seamus Ross
Funded by: (logos not reproduced)
2
Session Overview
1. Introduction & Briefing
2. Towards a Technical Model of Digital Curation: our R&D
3. Planning Delivery of Services & the Associates Network
3
1. Introduction & Briefing
Background story on the DCC: ‘So who’s that new kid on the block?’
What is digital curation anyway?
–‘adding value’ & ‘ensuring longevity’
Aims & objectives for the DCC
–‘improving the quality of what is done’
Our planning & our progress
–timelines & deliverables
How does this relate to the JISC Programme?
4
Background to the DCC (1)
Two parallel policy concerns
1. Neglect of digital heritage, especially given investment in digitisation programmes
JISC Continuing Access and Digital Preservation Strategy, 2002-2005
–eLib Programme, eLib3, Circular 5/97: Digital Preservation
Digital Preservation Coalition formed in 2002
2. Differing data sharing practices in eScience, especially given huge data volumes
Links between eScience Programme and JISC
Report commissioned by JISC Committee for Support of Research (Lord & Macdonald, May 2003)
–twin drivers: Digital Preservation & Continuing Access (e-Science)
–identified need for national digital curation centre
5
Interpretation of JISC policy
JISC plays 3 roles
1. promotes, supports & develops management & preservation of institutional and community digital materials for UK benefit
2. partner to Research Council/AHRB & other national/international bodies
3. as an organisation: appropriate grant conditions for JISC-funded creation of digital resources; good practice for JISC created/managed materials
“escalating scale and complexity of digital resources to be curated and the subsequent urgency of developing a critical mass of expertise, shared services and tools, for long-term digital preservation … require a step change in investment and approaches.”
–“Over the next three years a greater emphasis on development of production services and tools … needed to build on previous research studies and projects.”
“Digital preservation remains a challenging area in which techniques, costs, and skills are still in development: advocacy, dissemination and training, to embed preservation needs as appropriate in JISC funding programmes.”
6
Interpreting the implementation plan
Risk assessment studies, eg ePrints
–Calls to implement studies’ recommendations for services and integration of preservation activity & standards into repositories funded by JISC.
Series of community calls to support records management and digital preservation in institutions - cf FOI compliance.
Establish Digital Curation Centre to:
–Provide central focus of skilled staff & research links to wider network of development activity, researchers, & services
–Develop set of central services, standards, and tools for a range of distributed digital data centres & preservation services, across the Information Environment & Research Grid.
JISC Partnership funding,
–eg Web-archiving study: jointly funded by JCIE and Wellcome Trust
–Digital Preservation Coalition as an independent entity with JISC membership and sector activity supported by JISC.
National preservation of e-journals, through RLN/RSLG
7
Back to the DCC Background (2)
JISC Circular 6/03, initially issued June 2003
–Call postponed, revised & re-issued with more significant research component
–Joint funding: JISC and e-Science Core Programme
–£750K pa (outreach, services & development); £250K pa (research)
–Unlikely that any single organisation could do what’s expected
–Expressions of Interest & Full Proposals from Consortia
–Final selection made in December 2003
–Negotiations & clarification in January 2004
8
Designation of DCC
Task entrusted to Consortium of four institutional partners
–Universities of Edinburgh (lead), Glasgow & Bath together with CCLRC (Rutherford Appleton and Daresbury Laboratories)
–brought together through the National eScience Centre jointly managed by Universities of Edinburgh & Glasgow
Two 3-year awards made:
–JISC funding started on 1st March 2004
–EPSRC grant funding starts on 1st September 2004
Phase One set-up
–some ‘early deliverables’ of website & helpdesk
–preparation for full operation & launch of services in October
–planning formal opening for early November 2004
9
Responsibilities across the DCC
Them with titles …
–Peter Burnhill, Director (Phase One), with Robin Rice, Phase One Project Co-ordinator (from EDINA & Data Library, University of Edinburgh)
–Peter Buneman, Research Director (& PI on EPSRC grant); Professor of Informatics, University of Edinburgh
–Liz Lyon, Associate Director (Community Support & Outreach); Director of UKOLN, University of Bath
–Seamus Ross, Associate Director (Service Definition & Delivery); Director of HATII [ERPANET], University of Glasgow
–David Giaretta, Associate Director (Development); Head of Astronomical Software & Services, CCLRC
Two significant & well known ‘Ex Portfolio’ names
–Malcolm Atkinson, Director, NeSC
–Chris Rusbridge, Director, Information Services, University of Glasgow
10
[Diagram: functional management & collaboration. Labels: Industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; community support & outreach; research; development co-ordination; service definition & delivery; management & admin support; curation organisations eg DPC; Collaborative Associates Network of Data Organisations]
11
What is this digital curation anyway?
The term Digital Curation is a new invention.
Digital Data Curation Task Force - Report of Strategy Discussion Day (2002)
–citing Tony Hey citing use by Dr John Taylor, Director General of the Research Councils, to distinguish the actions involved in caring for digital data beyond its original use, from digital preservation.
The concept’s reach extends beyond libraries.
The e-Science Curation Report (2003) proposed the following distinctions:
–Curation: managing & promoting the use of data from point of creation, to ensure fit-for-contemporary-purpose, available for discovery & re-use. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Higher levels of curation will involve maintaining links with annotation & with other published materials.
–Archiving: curation activity which ensures that data are properly selected and stored, can be accessed, and that logical and physical integrity is maintained over time, including security and authenticity.
–Preservation: activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.
12
Digital curation redefined...
digital curation: ... digital objects and data, over their life-cycle, for current & future generations of use ...
= f(data curation & digital preservation)
data curation [when high current/ongoing interest]
–actions needed to maintain and utilise digital data & research results over entire life-cycle
–data creation & management; adding value; generating new sources of information & knowledge, for use
digital preservation [for longevity; fall off in interest]
–long-run technological/legal accessibility & usability
–storage, maintenance & accessibility of information content in digital material over the long-term, for use
OAIS concept of designated community
13
Data curation in action
Astronomy
–integrating and analysing distributed data (AstroGrid)
–publishing multi-TB sky surveys (SuperCOSMOS & WFCAM)
–interoperability standards (IVO Alliance)
BioInformatics
–data publishing: generic tools for XML export (EBI Biomart)
–annotation tools for massive data sets (Pubmed, VOTable)
–archiving tools for dynamic data sets (biological DBs)
Environmental sciences
–spatio-temporal annotation (OS Mastermap/Mouse Atlas)
Document management
–repository certification (RLG Task Force)
14
Digital preservation approaches
Migration & Refreshment
Emulation & Encapsulation
Digital Archaeology & Rescue
Document Format Specification
Robin Rice & Najla Semple, http://www.lib.ed.ac.uk/sites/digpres/
15
Communities of Practice: Social Sciences (IASSIST)
History of sharing – economical in terms of both data collector and respondent
Data about humans – problems of confidentiality confronted early on
Mixed blessing of agreed proprietary formats (OSIRIS, SPSS, etc.): allows migration
‘Future-proofing’ - 30 years of data advocacy!
–Tradition of data archiving & data citation
–Building new data standards out of common experience
data archivists & data librarians: the new digital curators?
www.iassistdata.org
16
Unifying Themes for DCC
‘data as evidence’
–for one or more designated communities
‘archival responsibility’
–at one or more institutional levels
–with institutional policies & individuals’ competence
engage/discover communities of practice, to invoke/provoke good practices
–appraisal & retention/disposal
–logical & physical integrity: authenticity/security
research problems in productive research domains
–eg Informatics, Law School
17
Aims & Objectives for the DCC
‘quality improvement in data curation & digital preservation’
–Initial focus: data as evidence for scholarly conclusions
–Wider remit: worlds of scholarly communication & eLearning
twin aims: excellence in research & excellence in service
need to bridge across communities:
–universities & research institutes
–scientific data tradition & document tradition
–multi-sectoral, international
18
We are all curators now...
The term “curation” builds on our understanding of the word “curator”
–who keeps something for the public good, value of which often needs to be brought out by the curator.
1. this open context implies more support for explicit policies with regard to data sharing, and it has major implications for structuring and tools.
2. the digital curator as ‘store-keeper’ closely linked to promoting new science, looking forward to identify new ways to serve present and future researchers.
digital curator should take an active role in promoting and adding value to holdings
–manage the value of collection
–adding links and annotation to provide context
–recording provenance of changes made
19
Planning & Progress
We must plan for the long term, with our 2020 Vision - 15 yrs
–we have large territory, and large expectation
multi-disciplinary, multi data type, multi tradition/profession
national and international, but also local and hidden from view
a lot is going on
–how to ensure that we do something sensible with the ££’s and the trust we have been given?
–who/what should we plan to affect/effect?
policy-makers; ‘responsible curators’; (researchers?)
how do we wish to be judged, and when?
collaboration & win-win-win scenarios
20
foci of attention in set-up phase
Users: client, peer and policy communities
–outreach & community support; service definition/delivery; development co-ordination; research agenda
–user requirements analysis: Leona Carpenter (Focus Groups)
Consortium: ‘organisation’ from partner participation
–roles; commitment; norming/performing; operational communication; consortium agreement (IPR)
Employers: institutional settings
–re-deployment/appointments; accommodation; commitment/reporting
-> Project Plan, as living document
21
Phase One Progress, March -
weekly AccessGrid/telecon; two face2face meetings
–defining programme of deliverables; re-deploying & recruiting staff; planning appointment of full time director in time for Launch
early ‘deliverables’:
–www.dcc.ac.uk with links, presentations & progress updates
–digitalcuration@ed.ac.uk for contacts & offers of collaboration
project plan submitted to JISC, late May 2004
defining R & D programme & services for delivery
–eg curation architecture; repository of tools & technical information
engaging curators in existing community of practice
22
Towards a Technical Model of Digital Curation: our R&D
David Giaretta
Funded by: (logos not reproduced)
23
What can we rely on in the Long Term?
The bits - BIT PRESERVATION
Paper documents that people can read
–ISO standards
The information we collect – either in the far future DCC or its successor
Some kind of remote access
Some kind of computers
People?
24
Preservation “vs” Current Use
There are already very many architectures to support immediate use of information
–Including JISC architecture
–Aim to support these
Therefore chose to be guided by
–long-term preservation aspects
–to promote this we should emphasise “interoperability” and “automated use” as far as possible.
–based initially on OAIS Reference Model – but add other ideas later
–bear e-Science in mind
25
OAIS Reference Model – Functional Model
26
OAIS – Preservation Planning - key aspects
Representation Net
Designated Communities & Knowledge Base
27
Representation Net
28
Preservation Issues
Given a file or a stream of bits, how does one know what Representation Information is needed? (This question applies to Representation Information itself as well as to the digital objects we are primarily interested in preserving and using.) How does one know, for example, if this thing is in FITS format?
–Someone may simply “know” what it is and how to deal with it, i.e. the bits are within the Knowledge Base
–One may be able to recognise the format by looking for various types of patterns.
–One may feed the bits into all available interpreters to see which accept the data as valid
–Other means…
The only safe way: have an associated label which points to the appropriate Representation Information
–Note this does not exclude the other methods, e.g. for data rescue
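As an illustration of the pattern-recognition route versus the labelling route, here is a minimal Python sketch. It is not a DCC tool: the function names, the label string format and the small signature table are assumptions made for this example. The byte patterns themselves are the well-known leading bytes of FITS, PDF and PNG files.

```python
from typing import Optional

# Leading byte patterns ("magic numbers") for a few well-known formats.
# This table is illustrative only; a real registry would be far larger.
SIGNATURES = {
    "FITS": b"SIMPLE  =",          # first header card of a FITS file
    "PDF": b"%PDF-",
    "PNG": b"\x89PNG\r\n\x1a\n",
}


def guess_format(data: bytes) -> Optional[str]:
    """Guess a format name by matching leading bytes; None if unrecognised."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def identify(data: bytes, label: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit label (hypothetical identifier pointing to
    Representation Information); fall back to pattern matching only
    when no label is attached, e.g. for data rescue."""
    if label is not None:
        return label
    return guess_format(data)
```

Pattern matching can fail or be fooled by data that merely starts with the right bytes, which is why the slide treats the attached label as the only safe method and the other routes as fallbacks.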
29
High Level View
Example of use of Representation Information Labelling
30
Implications
A label must be attached to each digital object as a necessary (but not sufficient) condition for long-term preservation
–logical attachment or packaging TBD by the DCC.
The label should at least identify Representation Information.
For long-term preservation this label must therefore be a DCC persistent identifier.
–allow some normalisation
For the Representation Information to be persistent, it should either be held with the data object itself or be part of a central repository – part of the DCC.
Thus the DCC needs a DCC Representation Information Repository. This repository would include
–a Format Repository (covering structural information)
*automated use would be supported by use of formal description languages such as EAST (ISO 15889, http://east.cnes.fr/) or DFDL (http://forge.gridforum.org/projects/dfdl-wg/)
–a Semantic Repository with, for example, Data Dictionaries and Ontologies
–a Software Repository – with appropriate emulation capabilities
Each piece of digital RI is also a digital object – which is understood either by the users’ Knowledge Base OR by further Representation Information. Therefore each piece of RI also has a label pointing to further RI.
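The recursion described here (each piece of RI carries a label pointing to further RI, until the user's Knowledge Base takes over) can be sketched as a graph walk. The following Python is a minimal sketch under stated assumptions: the registry contents, the identifier scheme ("ri:...") and the function name are all hypothetical, and a real repository would resolve persistent identifiers rather than look up an in-memory dictionary.

```python
from typing import Dict, List, Set

# Hypothetical registry: identifier -> identifiers of the further
# Representation Information needed to understand that item.
REGISTRY: Dict[str, List[str]] = {
    "ri:fits-standard": ["ri:pdf-spec", "ri:english"],
    "ri:pdf-spec": ["ri:english"],
    "ri:english": [],
}


def required_ri(label: str, knowledge_base: Set[str],
                registry: Dict[str, List[str]] = REGISTRY) -> List[str]:
    """Return every RI item that must be fetched to understand `label`.

    The walk stops at anything already in the Knowledge Base, and the
    `seen` set guards against cycles in the Representation Net.
    """
    needed: List[str] = []
    seen: Set[str] = set()
    stack = [label]
    while stack:
        current = stack.pop()
        if current in seen or current in knowledge_base:
            continue
        seen.add(current)
        needed.append(current)
        # Identifiers missing from the registry are treated as leaves.
        stack.extend(registry.get(current, []))
    return needed
```

For a user whose Knowledge Base already contains "ri:english", resolving "ri:fits-standard" yields the FITS standard and the PDF specification it is written in, and nothing further.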
31
Designated Community
Techniques must be created for
–defining a Knowledge Base
–linking a Knowledge Base to a Designated Community
–linking Representation Information to a Knowledge Base if possible
32
Representation Information (1)
Structure – including Formats
–Distinguish formats which are used mainly for rendering – to be followed by human inspection, and formats used for automated processing
Implications:
–Representation Information Repository should define selected file formats using EAST and DFDL
–Definitions should include scientific objects and humanities objects
33
Representation Information (2)
Semantics
–Hard problem: start with Data Dictionaries
–Implications: the Representation Information Repository should include Data Dictionaries, followed by more general semantics
34
Representation Information (3)
Time Dependent Information
–Many, perhaps most, datasets change over time and the state at each particular moment in time may be important. It may be useful to break the issue into separate parts.
At each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Net.
Efficient storage of a series of snapshots may lead one to store differences or include time tags in the data (see for example P. Buneman, S. Khanna, and Wang-Chiew Tan. On the Propagation of Deletions and Annotations through Views. Proc. 21st ACM Sym. on Principles of Database Systems.).
–Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.
–Also applies to ANNOTATION – who said what and when did they say it
–Implications: These are areas of active research within the consortium and the DCC should be able to provide
–advice and well tested tools for certain forms of efficient encoding of time dependent information
–advice on annotation
–identifiers and Representation, perhaps in the form of software, for the associated encodings
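To make the "store differences" idea concrete, here is a small Python sketch of delta encoding for a series of snapshots. It is a plain key-by-key difference scheme chosen for illustration, not the technique of the cited paper; the names `diff` and `snapshot_at` and the tuple layout are assumptions. The reconstruction function plays the role of the additional Representation Information that says how to recover a given time's snapshot from the encoded version.

```python
from typing import Any, Dict, List, Tuple

# A delta records what changed at a given time:
# (time, keys added or updated, keys deleted).
Delta = Tuple[int, Dict[str, Any], List[str]]


def diff(old: Dict[str, Any], new: Dict[str, Any], time: int) -> Delta:
    """Encode the change from `old` to `new` as a delta stamped with `time`."""
    updates = {k: v for k, v in new.items() if k not in old or old[k] != v}
    deleted = [k for k in old if k not in new]
    return (time, updates, deleted)


def snapshot_at(base: Dict[str, Any], deltas: List[Delta],
                time: int) -> Dict[str, Any]:
    """Rebuild the dataset as it stood at `time` by replaying deltas in order."""
    state = dict(base)  # never mutate the stored base snapshot
    for t, updates, deleted in sorted(deltas, key=lambda d: d[0]):
        if t > time:
            break
        state.update(updates)
        for key in deleted:
            state.pop(key, None)
    return state
```

Only the base snapshot and the deltas are stored; every intermediate state is recoverable, at the cost of replaying changes on access.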
35
Representation Information (4)
Actions and Processes (Behaviour?)
–Some information has, as an integral part of its content, an implicit or explicit process associated with it – this could be argued to be a type of semantics, however it is probably sufficiently different to need special classification. An example of this is a database or other time dependent or reactive system such as a Neural Net.
–Emulations – Universal Virtual Computer (UVC)
–Implications:
Support Software emulation via a UVC (possibly based on JVM)
Support time dependent or reactive systems
36
Persistent Ids
Implications:
–Use of existing, or creation of new, infrastructure (standards, protocols, servers etc) for persistent IDs with adequate flexibility and longevity
As part of the succession planning, agreement would be needed with an appropriate organisation to act as backup and inheritor of DCC data.
37
Archival Information Package
38
Preservation Description Info
39
AIP implications – PDI
define standard Preservation Metadata – based initially on OCLC work – including Michael Day’s work and also CCLRC work etc
define adequate Packaging technique – almost certainly XML based
define recommended tools and procedures for creating Fixity Information such as checksums and digests, together with associated Representation Information
investigate authentication systems
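Fixity Information in the sense used here can be sketched with Python's standard `hashlib` module. This is an illustrative sketch, not a recommended DCC procedure: the record layout and function names are assumptions. Recording the algorithm name next to the digest is a small instance of keeping "associated Representation Information", since a bare digest cannot be re-checked without knowing how it was produced.

```python
import hashlib
from typing import Dict


def make_fixity(data: bytes, algorithm: str = "sha256") -> Dict[str, str]:
    """Compute a digest and record which algorithm produced it."""
    h = hashlib.new(algorithm)  # raises ValueError for unknown algorithms
    h.update(data)
    return {"algorithm": algorithm, "digest": h.hexdigest()}


def verify_fixity(data: bytes, fixity: Dict[str, str]) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    return make_fixity(data, fixity["algorithm"])["digest"] == fixity["digest"]
```

A changed byte anywhere in the object makes `verify_fixity` return False, which detects corruption but does not by itself establish who produced the object; that is the separate authentication question raised on the slide.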
40
Audit and Certification
Implications:
–facilitate production of standard(s) on which a certification program can be based
–work to establish accreditation and certification body in preparation for offering audit and certification services
–audit, certification and accreditation are potential sources of long term funding for the DCC
–software certification will require testbeds and testing procedures.
Hardware and software systems will need to be purchased, hired or borrowed.
The DCC associates would be useful partners.
We might expect commercial software to be offered to us by the manufacturer for testing.
Testing commercial software could be fee based.
41
Implications for Research
Research needed on Representation Information (Structure and Semantics) e.g.
–Investigate fundamental limitations of bit-level descriptions and existing tools.
–Contribute to DFDL definition
–Investigate capabilities needed to describe rendered format (including Word, PDF etc)
Data Virtualisation – define Science objects and “Humanities” objects
Research is needed to:
–Support Software emulation via a UVC (possibly based on JVM)
–Support time dependent or reactive systems
Research is needed to provide a solid basis on which we can develop persistent IDs with adequate flexibility and longevity
Research is needed to allow the DCC to:
–define standard Preservation Metadata – based initially on OCLC work
–define adequate Packaging technique – almost certainly XML based
–investigate authentication systems with a view to preparing recommendations for users and consider offering, for example, a (fee-based) key storage service.
A rigorous theoretical basis must be put in place from which we can create techniques for:
–defining a Knowledge Base
–linking a Knowledge Base to a Designated Community
–linking Representation Information to a Knowledge Base if possible
42
Curation Manual
Put in place quickly using international experts
Updates annually
Build to “curation encyclopaedia”
43
Document format specification
Borrowed from the records management tradition: institutions create documents in standard or open formats, which are easier to preserve.
Much easier to do in a strict records management environment with a published policy of retention schedules and a clear knowledge of internally produced records.
Stipulating a specific file format is harder in a research environment where a wide range of digital materials are produced and have to be preserved.
The move to the DDI DTD in the social science data world may be seen as an example of this preservation technique.
44
Services & Development
Turns Research into ‘Products for Research’ that our communities can use with confidence
–tracking and testing tools and standards that are correct, usable, reliable, well documented
e.g. for ingest, repository management, data exchange, ontologies
working with tool developers wherever possible
developing testbeds & interworking with other testbeds
–aim to gain leverage
formats: working with other projects worldwide using generic tools and techniques
–to develop strategies for emerging digital formats
Metadata standards: long-term viability of metadata
Registries underpin, to provide basis of Advisory Service
45
[Diagram: Level 1 curation. © Philip Lord, 2003. Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Web Content; Patent data; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; Pre-prints & e-Prints; Publication archives; Library - Peers - Public - Industry]
46
[Diagram: Level 2 curation. © Philip Lord, 2003. Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Web Content; Patent data; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; e-Prints; Publication archives; Library - Peers - Public - Industry; Research based on data; Metadata; Archivist; Archived data]
47
[Diagram: Level 3 curation. © Philip Lord, 2003. Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Web Content; Patent data; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; e-Prints; Publication archives; Library - Peers - Public - Industry; Research based on data; Metadata; Curation; Curator; Curation Process; Data repositories; Archived data]
48
Faith in the medium?
49
Faith in the technology