1 Long term preservation: an overview
Michael Day
Digital Curation Centre
UKOLN, University of Bath
http://www.ukoln.ac.uk/
Joint Workshop on Electronic Publishing, Lund, Sweden, 15 April 2005
Digital Curation Centre: a centre of expertise in data curation and preservation
2 Session overview
–Quick introduction
–A fifteen year view
–Overview of current issues
3 What is digital preservation?
–Dealing with the potential technical problems that impede continued access to digital resources (of all types)
–No longer possible to place a physical artefact on a shelf and ignore it for 100+ years
–Not just a technical problem: "... The planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable" - Margaret Hedstrom (1998)
4 What is digital curation?
–New(ish) term, from science data world (e.g. bioinformatics)
–Reflects those extra things that need to be done to facilitate access and reuse
–"The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse" - Philip Lord, et al. (2004)
5 Why is it a problem? (1)
–An increasing flood of 'born-digital' data
  –The World Wide Web
    –Comprises billions of pages + "deep Web"
    –Internet Archive = >1 petabyte, and growing @ 20 TB per month (http://www.archive.org/)
  –Data deluge in science and engineering
    –Petabytes generated by high throughput instruments, streamed from sensors and satellites, etc.
  –5 exabytes of new information created in 2002:
    –http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
6 Why is it a problem? (2)
–Need for (open) access to this data
  –Results in added scientific value
  –New analytic techniques
  –2004 - OECD member states endorsed the principle that publicly funded research data should be openly available to the maximum extent possible
7 Technical problems
–Media longevity
  –Estimated lifetimes are short compared to paper or good quality microform
  –Solutions: more durable media, 'refreshing' regimes
–Hardware and software obsolescence
  –Relatively short obsolescence cycles for hardware, peripherals, media, and software
  –For example, BBC Domesday Project (1986) - hybrid videodisc
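A 'refreshing' regime, in the sense used on this slide, copies the same bit stream onto newer media before the old carrier degrades. The following is a minimal sketch of one such step in Python, assuming files on mounted storage; the function names `sha256_of` and `refresh` are hypothetical and are not taken from the presentation or from any particular tool. It only addresses media longevity and does nothing about hardware or software obsolescence.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file to new storage and verify the copy bit for bit.

    Hypothetical helper: raises OSError if the checksums differ,
    otherwise returns the checksum so it can be recorded.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"refresh of {source} failed: checksum mismatch")
    return after
```

Recording the returned checksum alongside the object lets a later refresh cycle confirm that the bits have not changed in the meantime.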
8 Preservation strategies (1)
–Technology preservation
  –The preservation of an information object together with all of the hardware and software needed to interpret it
  –But will lead to museums of "ageing and incompatible computer hardware" - Mary Feeney (1999)
  –Has key role in the rescue of digital objects (digital archaeology)
–Emulation
  –The preservation of original application software, which is then run on emulators that mimic the behaviour of obsolete hardware and operating systems
  –Development of 'virtual machines' that will be migrated to work on different platforms (Jeff Rothenberg, 1998)
  –Universal Virtual Computer (UVC) concept
9 Preservation strategies (2)
–Migration
  –Managed transformations
  –The periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996)
  –Widely used strategy, e.g. on ingest into a repository
  –Problems with preserving the integrity of an object
–Encapsulation
  –Self-describing objects, e.g. information package in OAIS model, METS, Buckets, Universal Preservation Format
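The idea of encapsulation, a self-describing object that carries its own description, can be illustrated with a small Python sketch. This is an assumption-laden toy: the ZIP layout, the `manifest.json` field names, and the functions `build_package` and `open_package` are all hypothetical, and the result is not an OAIS information package, a METS document, or any of the other formats named on the slide.

```python
import hashlib
import io
import json
import zipfile
from typing import Tuple


def build_package(name: str, payload: bytes, fmt: str, note: str) -> bytes:
    """Bundle content and a description of it into one ZIP container.

    The manifest fields are illustrative only, not a metadata standard.
    """
    manifest = {
        "file": name,
        "format": fmt,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "note": note,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        archive.writestr("content/" + name, payload)
    return buffer.getvalue()


def open_package(package: bytes) -> Tuple[dict, bytes]:
    """Read the manifest and content back, checking the stored checksum."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        payload = archive.read("content/" + manifest["file"])
    if hashlib.sha256(payload).hexdigest() != manifest["sha256"]:
        raise ValueError("content does not match the manifest checksum")
    return manifest, payload
```

Because the description travels inside the same container as the content, a future reader needs only the container format to learn what the object is and whether it is intact.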
10 Preservation strategies (3)
–Metadata and documentation
  –All digital preservation strategies depend - to some extent - on the creation, capture and maintenance of metadata
  –"Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
  –The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, 2001)
  –Reference Model for an Open Archival Information System (OAIS) - ISO 14721:2003
  –PREMIS working group
11 A fifteen year retrospective
–Based on my dissertation: "Preservation problems of electronic text and data" - Loughborough University (1989)
  –Overview of the state of the art in digital preservation in the late 1980s
  –Hardware and software used = IBM PC XT, MS DOS, 5¼" floppy disks, shareware word processing program (Galaxy)
12 The 1980s - contexts
–Still faith in the "paperless" future
–Electronic publishing in its infancy
  –Online databases (mainly bibliographic)
  –Viewdata systems (e.g., Minitel, Prestel)
  –Experiments with electronic journals (e.g. BLEND, Project Quartet) and electronic document delivery systems (ADONIS)
  –CD-ROM databases
13 The 1980s - issues (1)
–Digital preservation issues:
  –Major focus on the longevity of media
    –e.g., BNB Research Fund funded comparison of microfilm, magnetic media, and optical disks for archival storage (1983)
    –Interest in the potential value of new types of optical media, e.g. videodisc, Compact Disc (CD-ROM, CD-R)
    –No promising results from initial research
14 The 1980s - issues (2)
–Knowledge that media longevity was not the only issue
  –"The problem with machine-readable records is the long term availability of the machines rather than the physical decay of the recording mechanism" - John Mallinson (1986)
  –Brief consideration of COM (microform) for long-term storage
15 The 1980s - experiences (1)
–National archives:
  –A focus in some countries on machine-readable records from the 1960s
  –The principle that machine-readable records should be treated in the same manner as conventional records was established very early on, e.g. by Meyer Fishbein (1972)
  –Also, there was an early recognition of the importance of documentation and economic factors
16 The 1980s - experiences (2)
–Data archives:
  –Storage of social science survey data started in the punched-card era (1940s)
  –ESRC Data Archive established 1967
    –Recognised the importance of developing procedures to manage data (e.g., migration on ingest) and of standardised descriptions (metadata)
–National libraries:
  –Were considering legal deposit obligations
17 The 1980s - summing up
–Some differences with the position today, e.g.:
  –General lack of awareness
  –Focus on media longevity, 'refreshing' strategies
  –Little practical experience (except for data archives)
–Some continuity, e.g. it was recognised:
  –That the obsolescence of hardware (and software environments) was a serious problem
  –That data management strategies and documentation/metadata were important
  –That digital resources were not conceptually different to non-digital ones
18 The current context (1)
–The World Wide Web
–Changes in scholarly communication, e.g.:
  –Increased use of electronic journals, e-print repositories
  –Changes in scientific practice: data-intensive science, Grid computing, petabyte-scale storage, e-research
  –Current focus on open access
–Similar developments elsewhere, e.g.:
  –Broadcasting, e-commerce, e-government, ...
19 The current context (2)
–Task Force on Archiving of Digital Information (1996)
  –In the UK, led to influential research projects like Cedars and eventually to the Digital Preservation Coalition (DPC)
–Major current initiatives:
  –US National Digital Information Infrastructure and Preservation Program (NDIIPP)
  –ERPANET, NESTOR, KB's e-Depot, etc.
  –UK Digital Curation Centre
20 Digital Curation Centre (1)
–Funded from 2004 for three years by the JISC and the e-Science Core Programme
–Main aim: "continuing improvement in the quality of data curation and digital preservation"
–Will focus on all aspects of the research process, e.g. from data creation to publication and beyond, also on the work of repositories and data archives
–Not itself a digital repository, but offering outreach and practical services to assist those who curate data …
21 Digital Curation Centre (2)
–Main activities:
  –Advisory services and outreach
  –Development
    –Registries of Representation Information, testing of tools, …
  –Research programme
    –Role of annotation, legal and socioeconomic issues, …
  –Collaborative network of associates
–Partners: Universities of Edinburgh (lead), Glasgow and Bath (UKOLN), CCLRC
–http://www.dcc.ac.uk/
22 Key developments (1)
–Greater awareness of the issues
  –Digital preservation now beginning to be taken seriously by governments and NGOs (e.g. Unesco Charter on the Preservation of Digital Heritage, World Summit on the Information Society)
–More experience with developing systems and tools, e.g.:
  –DIAS (IBM), DSpace, Fedora, Internet Archive, LOCKSS, OCLC Digital Archive, PANDAS, PubMed Central, Storage Resource Broker, etc.
  –Journal publishers co-operating with KB on e-Depot
23 Key developments (2)
–Standards
  –Reference Model for an Open Archival Information System (OAIS) - ISO 14721:2003
    –A reference model, not a blueprint - but increasingly influential
  –Preservation metadata
    –Current focus on PREMIS working group, supported by OCLC and Research Libraries Group
    –Other activity ongoing, e.g. in scientific research domains
24 Research (1)
–Some key requirements identified in:
  –It's about time: research challenges in digital archiving and long-term preservation, National Science Foundation and Library of Congress (2003): http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf
  –Invest to Save: report and recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation (2003): http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf
25 Research (2)
–DELOS preservation cluster:
  –Frameworks for the analysis of preservation strategies
  –Building preservation functionality into digital libraries
  –File formats and metadata
  –Workshop on Digital repositories: interoperability and common services, Crete, 11-13 May 2005: http://www.ukoln.ac.uk/events/delos-rep-workshop/
26 Research (3)
–Current JISC research programmes:
  –Supporting Digital Preservation and Asset Management in Institutions
    –Relatively small-scale projects: assessment tools, training, user guides, etc.
  –Digital Repositories (deadline last week)
    –Building on Focus on Access to Institutional Resources (FAIR) programme
  –http://www.jisc.ac.uk/
27 Some issues (1)
–Open access repositories and preservation:
  –Exact role of repositories still evolving:
    »Some advocates of open access treat digital preservation concerns as a distraction from the primary task of "filling up the archives"
    »But the recent National Institutes of Health public access policy requests grantees to submit publications to PubMed Central - emphasising its role for permanent preservation
  –Disaggregated model proposed, whereby not all repositories will have preservation responsibilities
    »Possible need for mechanisms for transferring content to third parties, e.g. national libraries
28 Some issues (2)
–Trusted repositories:
  –Attributes and responsibilities of 'trusted repositories' defined by RLG and OCLC working group (2002)
    –Builds on 1996 Task Force report and OAIS model
    –Attributes include the viability and financial sustainability of the organisation, and the need for accountability
    –Whether these (and other criteria) could be used as a basis for certification is being explored by the Task Force on Digital Repository Certification, supported by RLG and the National Archives and Records Administration (NARA)
29 Some issues (3)
–Collection development:
  –Selection/appraisal, storage, access, 'de-selection'
  –Preservation issues need to be considered early in an object's life-cycle (the traditional 'transfer to repository' model will not work)
    –Rethinking concept of 'custody'
  –Cannot be done in isolation
    –Sharing responsibilities across repositories while maintaining useful redundancy
30 Some issues (4)
–Legal issues:
  –Repositories need the legal right to copy, migrate, reverse engineer software, etc.
  –Problems with identifying rights holders
  –Access - are "dark archives" the answer?
31 Some issues (5)
–Economic issues:
  –Still very little known about costs over the long term
  –No widely used economic models
  –Research-type funding is not long-term
    –Recent draft report for National Science Foundation asks whether digital collections should be treated like scientific facilities
32 Summing up (1)
–Major differences from the late 1980s
  –Problem has grown, but awareness of it is now much higher
  –Many research projects, vendors, services, etc. now investigating this problem - not always particularly co-ordinated
  –Encouraging signs in funding of NDIIPP, DCC and other recent initiatives
33 Summing up (2)
–Co-operation is essential
  –Some progress, e.g. DPC, ERPANET
  –Need to work out how trusted repositories will work together in a distributed network
  –Need for training
–Many problems remain to be resolved
  –Research (e.g. into provenance of data, the role of file format registries)
  –Development of tools
  –Integrating existing work
34 More information
–National Library of Australia's Preserving Access to Digital Information (PADI) gateway: http://www.nla.gov.au/padi/
–Joint DPC and PADI bulletin What's New in Digital Preservation: http://www.dpconline.org/graphics/whatsnew/
–UK Digital Curation Centre: http://www.dcc.ac.uk/
35 Acknowledgements
The Digital Curation Centre is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils and the Core e-Science Programme of the UK research councils. The consortium comprises the University of Edinburgh (lead partner), the University of Glasgow, the Council for the Central Laboratory of the Research Councils, and the University of Bath (UKOLN). http://www.dcc.ac.uk/
UKOLN is funded by the Council for Museums, Libraries and Archives (MLA) and the JISC, as well as by project funding from the JISC, the European Union and other sources. UKOLN also receives support from the University of Bath, where it is based. http://www.ukoln.ac.uk/