Preserving webharvests at the National Library of New Zealand Te Puna Mātauranga o Aotearoa Peter McKinney Digital Preservation Policy Analyst National.

Slides:



Advertisements
Similar presentations
National Library of New Zealand : strategies for interoperability: metadata projects and activities Karen Rollitt Douglas Campbell DCMI Localisation and.
Advertisements

Canada Managing the Government of Canadas Grey Information: Progress and Challenges Special Libraries Association 9 June 2004 Fay Hjartarson Library and.
E-Content Service Group Virtual Meeting Digital Preservation: How to Get Started.
Recent developments in digital archiving and preservation Jan Fullerton Director General National Library of Australia.
DRIVER Long Term Preservation for Enhanced Publications in the DRIVER Infrastructure 1 WePreserve Workshop, October 2008 Dale Peters, Scientific Technical.
Curating Research: problems and policy Dale Peters Scientific Technical Manager DRIVER II.
A centre of expertise in data curation and preservation DigCCur2007 Symposium, Chapel Hill, N.C., April 18-20, 2007 Co-operation for digital preservation.
Joint Information Systems Committee 11/03/07 | | Slide 1 Joint Information Systems CommitteeSupporting education and research JISC Conference 2007 Managing.
Action Programmes KAG / HoPS / PAD Workshop 3 rd March 2014.
Panel: What Changes With Digital? Web Archiving ARL Forum 2009 Tracy Seneca – California Digital Library.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
Providing collections, tools and services for digital humanities A national library perspective Clément Oury Head of Digital Legal Deposit Bibliothèque.
Bibliothèque nationale de France Tallinn, BnF update: production and development priorities in 2015.
BnF projects and priorities On the collection side – Perform broad and focused crawls with a maximum of 100TB – Set up the legal deposit of ebooks.
BORN DIFFICULT? MICHELLE LIGHT Director, UNLV Libraries Special Collections (formerly Head of Special Collections, Archives, and Digital Scholarship at.
DRS 2 one in a series of periodic updates Harvard University Library Andrea Goethals October 21, 2009 DRS = Digital Repository Service.
0 Jim Suderman Member Canadian Research Team, InterPARES 2 / Archives of Ontario Jim Suderman Member Canadian Research Team, InterPARES 2 / Archives of.
BUILDING DIGITAL WEB ARCHIVES FOR FUTURE SCHOLARS Jani Stenvall
Applying Theoretical Archival Principles and Policies to Actual Born Digital Collections LEIGH ROSIN | Digital Archivist | National Library of New Zealand.
Selecting Preservation Strategies for Web Archives Stephan Strodl, Andreas Rauber Department of Software.
Preservation Metadata Extraction and Collection : Tools and Techniques Mat Black National Library of New Zealand Te Puna Matauranga o Aotearoa.
Ethics and Information in the Digital Age Rafael Capurro University of Applied Sciences, Germany LIDA 2001, Dubrovnik, Croatia, May, 2001.
Developing PANDORA Mark Corbould Director, IT Business Systems.
WORLD BANK Publications The reference of choice on development The Promise, and Challenge, of Implementing Open Access at the World Bank Carlos Rossel.
Use of METS in CDL Digital Special Collections Brian Tingle.
Copyright Sayeed Choudhury 2004 This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial,
Elizabeth Newbold and Samantha Tillett GL8 New Orleans, December 2006
Supporting further and higher education Digital Preservation: Legal Issues Chinese National Academy of Sciences July04 Neil Beagrie, BL/JISC Partnership.
The capture and preservation of websites at the National Library of New Zealand Gillian Lee Alexander Turnbull Library.
Annick Le Follic Bibliothèque nationale de France Tallinn,
Joanne Archer University of Maryland Kate Odell Archive-It Abbie Grotke Library of Congress Tessa Fallon Columbia University Creating and Maintaining Web.
WebArchiv Czech Web Archive IIPC 2007, Paris.
Examples of problems with teacher/school site violations: A company’s logo and link on footer of homepage when company is not their business partner—only.
our digital memory accessible tomorrow A future with no history meets a history with no future: perspectives on how much we should know.
Social Science Data and ETDs: Issues and Challenges Joan Cheverie Georgetown University Myron Gutmann ICPSR – University of Michigan Austin McLean ProQuest.
Digital Library Collections (DLC) Website A platform for integrated access to CUL/IS specialized, digital collections September 2014 Status Report.
Research Data Management Services Katherine McNeill Social Sciences Librarians Boot Camp June 1, 2012.
Annick Le Follic Bibliothèque nationale de France Tallinn,
27. August Kyung-Ho Choi Manager of Digital Archiving Division The National Library of Korea Sang-hoon Oh Secretary of General in.
Digital Preservation System ExLibris Rosetta OAI6 | Geneva | June 2009 Dr. Axel Kaschte, Strategy Director Europe.
Preservation – Why the Urgency? “A National Library is a place where a nation nourishes its memory and exerts its imagination – where it connects with.
Caring and Sharing Collaboration in Digital Curation outside North America Ross Harvey Simmons College, Boston Curation Matters: 17 June 2010.
THE ROAD TO OPEN ACCESS A guide to the implementation of the Berlin Declaration Frederick J. Friend OSI Open Access Advocate JISC Consultant Honorary Director.
The Real At Risk E-Content: University Web Resources EDUCAUSE Joanne Kaczmarek University of Illinois at Urbana-Champaign Taylor Surface OCLC October 12,
Preserving Digital Culture: Tools & Strategies for Building Web Archives : Tools and Strategies for Building Web Archives Internet Librarian 2009 Tracy.
Netarkivet RESAW seminar, Dec 2-3, 2013 Day 1. Who are we today □Birgit N. Henriksen, head of digital preservation, KB □Bjarne Andersen, head of digital.
IFAP Special Event: Information and Knowledge for All, Emerging Trends and Challenges Information Preservation 4000 Years of Traditions Challenged by Digital.
Metadata in a distributed information environment: Interoperability as recombinant potential Lorcan Dempsey OCLC/SCURL pre-IFLA conference, 15/16 Aug 02.
Digitization An Introduction to Digitization Projects and to Using the Montana Memory Project.
International Seminary on Digitisation: Experience and Technology 11 th May 2004 | National Library | Lisbon – Portugal DIGITAL ARCHIVE OF PORTUGUESE ART.
Webarchivering in het Audiovisuele Domein Web archiving in the audiovisual Domain Julia Vytopil- Nederlands Instituut voor Beeld en Geluid Netherlands.
The KB e-Depot long-term preservation of scientific publications in practice Marcel Ras, National library of The Netherlands.
Uganda Scholarly Digital Library (USDL) Makerere University’s Institutional Repository By Margaret Nakiganda URL:
UK LOCKSS Alliance: Investigation into Private LOCKSS Networks Adam Rusbridge EDINA, University of Edinburgh.
Symposium on Global Scientific Data Infrastructures Panel Two: Stakeholder Communities in the DWF Ann Wolpert, Massachusetts Institute of Technology Board.
Funded by: © AHDS Preservation in Institutional Repositories Preliminary conclusions of the SHERPA DP project Gareth Knight Digital Preservation Officer.
Digital Preservation across the technologies, strategies, open standards & interoperability aspects including the legal issues Pratik Shrivastava Scientist.
Defining Submission Agreements and Policies DigCCurr Professional Institute May 16-21, 2010 & January 5-6, 2011 Chapel Hill, North Carolina, USA Carolyn.
Institutional Repositories: the DSpace Experience Ann J. Wolpert Director of Libraries Massachusetts Institute of Technology.
Building Collections on the Web BCWeb. What’s BCWeb ? BCWeb was developped entirely by the BnF for the content curators to replace its old selection tools.
Digital Preservation Initiatives in the United States A Summary Deanna B. Marcum.
Use cases for BnF broad crawls Annick Lorthios. 2 Step by step, the first in-house broad crawl The 2010 broad crawl has been performed in-house at the.
Working with personal digital archives Susan Thomas Project Manager & Digital Archivist project Manuscripts Matter, Electronica panel London, October.
Building A Repository for Digital Objects
László Drótos – Márton Németh National Széchényi Library Department of Electronic Library Services Web archiving Planning a new pilot project.
Data Stewardship Interest Group WGISS-45 Meeting
Objectives, activities, and results of the database Lituanistika
The Bentley Digital Media Library
Current use and trends of
Presentation transcript:

Preserving webharvests at the National Library of New Zealand Te Puna Mātauranga o Aotearoa Peter McKinney Digital Preservation Policy Analyst National Library of New Zealand Te Puna Mātauranga o Aotearoa

Agenda Relationship with National Library of China National Library mandate Collection analysis Preservation of those collections Future plans

Agreement on Cooperation: NLZ and NLNZ -Drawn up in the spirit of friendship -Principles of equity and reciprocity - Purpose is to enhance the ability to contribute to improvement of cultural and economic life of their respective nations - Cooperation on digital library matters Share knowledge and information in relation to Digital Preservation

Mandate “The Minister may, by notice in the Gazette, authorise the National Librarian to make a copy, at any time or times and at his or her discretion, of public documents that are internet documents in accordance with any terms and conditions as to format, public access, or other matters that are specified in the notice.” National Library Act 2003 Te Wehenga, by Cliff Whiting, CAC-readingroomdetail_

Two web archiving programmes Domain Harvesting of the entire “New Zealand Internet” (2008, 2010, 2013) Selective harvesting of specific websites or parts of websites (since 1999)

Collections Selective Harvests (since 1999) No. of Websites14,943 No. of ARC files71,374 Size on disk (Tb)5.63 Total number of files in harvest IE (excluding those within the ARC) c.291,000 No. of Files contained within ARCs87,112,551

Whole of Domain Harvests Collections

Digital preservation Thoughts for today 1.How much information? 2.Whole of domain into preservation system 3.ARC to WARC migration

Digital preservation Simply: 1. What are the technical characteristics of the content? -What format is it in? -What are the properties the file exhibits? -Is it a valid expression of the specification? 2. What is the institution’s capability in terms of rendering the content?

Digital preservation We use that information for: Repository analysis Risk assessment Preservation planning Preservation action Evaluation Rice, Anne Estelle, Artist unknown :[Embroidered Chinese silk shawl belonging to Katherine Mansfield. Made ca Detail of bird in flight]. Artist unknown :[Embroidered Chinese silk shawl belonging to Katherine Mansfield] [made ca 1900]. Ref: D detail-2. Alexander Turnbull Library, Wellington, New Zealand.

Validation stack - current

Thoughts 1.Mimetype is insufficient 2.Not enough information about the website files to support preservation management

Extracted information - mimetypes Total number of mime types across selective harvests= 543

Validation stack – proposed

Thoughts 1.METS file is ‘large’ 2.Take only format and fixity? 3.Take only format summary at ARC level?

NDHA Overview Publishing Delivery Search Tools Eg. Find / Tapuhi / Voyager / Beta PERMANENT REPOSITORY DEPOSIT Management UIBack-Office UI STAGING TA Assessor Arranger Approver Web Curator Tool EnrichmentValidation Stack Rosetta Domain Harvests into preservation system

Whole of Domain legal considerations - Films, Videos, and Publications Classifications Act 1993; - Criminal Procedure Act 2011; - Defamation Act 1992; - Human Rights Act 1993; - Copyright Act 1994; - Privacy Act 1993

Legal -Objectionable/illegal material -Copyright -Is giving access “republishing” Technical/Intellectual -One corpus? -Full text search? -Themed? -What is the data model? Whole of Domain legal considerations

ARC to WARC conversion Why? Take advantage of most up to date standard Normalise all web content Critical considerations - Stakeholder engagement - Development of tools - Proofs of conversion

Conclusions Scale of harvests challenge processes for description, access and preservation Preservation of harvests at NLNZ is at a basic level Want to move to fuller preservation management of the website files