How to Face the Challenges of Web Archiving? The experiences of a small library on the edge.
Chloe Martin, Internet Memory
Catherine Ryan, National Library of Ireland
LIBER
Context: National Library of Ireland
Beginnings: Established by the Dublin Science and Art Museum Act, 1877
Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”
The Digital Record: Born Digital Programme established in 2010, covering web archiving
Web Archive Projects: 2 pilot projects in 2011
Context: Internet Memory
European Archive / Internet Memory Foundation:
–Established in 2004 in Amsterdam (offices also in Paris)
–Mission: to preserve Web content as a new medium for current and future generations
–Actions: awareness raising, partnerships, R&D
–Open Access Collections: UK National Archives & Parliament, PRONI, CERN and the National Library of Ireland
Internet Memory Research:
–Spin-off of IM established in June 2011 in Paris
–Missions: to operate large-scale or selective crawls and develop new technologies (crawl, access, processing and extraction)
Web Archiving Project: Project Origins
National Library of Ireland
Building a 21st Century Library:
–Born Digital
–Digitisation
–Single Integrated Catalogue
–Digital Repository
–OSCAIL, the Digital Library Programme
Web Archiving Project: Project Origins
National Library of Ireland
Born Digital Materials:
–Natural progression for NLI’s strong political, cultural and historical collections
–How best to approach this in a time of unprecedented financial difficulty?
–Born Digital Programme established to examine requirements and produce a policy document for the next steps
Web Archiving Project: Project Origins
National Library of Ireland
The Hand of History:
–Snap General Election
–Five Weeks
Web Archiving Project: Project Origins
National Library of Ireland
Just do it
Web Archiving Project: Project Origins
National Library of Ireland
Just do it
How?
Web Archiving Project: Project Origins
National Library of Ireland
Collaborative Partnership: a partner that suited our requirements and had experience with others in the cultural sector
Requirements:
–Technical skills existed in the NLI but were committed to other projects, so these skills had to come from the partner
–Leverage NLI’s own strong curatorial experience, esp. in politics
–Fast!
Web Archiving Project: Project Origins
National Library of Ireland
Project phases:
–Project scoping and contract
–Site selection
–Permissions gathering
–QA (look and feel)
–Publication and promotion
Site Selection and Permissions
National Library of Ireland
Selection Criteria:
–Website presence
–Technical reasons
–Cut-off date
–Women candidates
Permissions:
–All sites contacted and provided with a brief
–Pressurised but necessary phase
Scope of projects
National Library of Ireland
General Election:
–Crawl: 200 snapshots
–Scope: 100 seeds
–Frequency: 2 times
–Date: Feb
Presidential Election:
–Crawl: 80 snapshots
–Scope: 70 seeds
–Frequency: 3 times
–Date: Oct-Nov
Crawl
Internet Memory
Seeds Validation: URLs, duplication, redirection, external links, dynamic websites
Scope Parameters: domain, host and path; social Web content; frequency; robots.txt files exclusion; politeness
Specific incidents, handled with technical changes on the fly: modification of scope; pending crawls; adaptation of the politeness
Improvement of second crawl
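The seed validation and scope parameters listed on this slide can be illustrated with a minimal Python sketch. It is not Internet Memory's crawler configuration: the function names (`normalize_seed`, `validate_seeds`, `in_scope`), the normalization rules and the `example.org` URLs are assumptions made for illustration, and the "domain" level is simplified to "the seed host plus its subdomains".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_seed(url: str) -> str:
    """Return a canonical form of a seed URL so duplicates can be detected."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare hostnames default to http
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"seed has no host: {url!r}")
    host = parts.hostname.lower()  # hostnames are case-insensitive
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"  # treat an empty path as the site root
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def validate_seeds(urls):
    """Split raw seeds into unique normalized seeds and duplicate raw entries."""
    seen, unique, duplicates = set(), [], []
    for raw in urls:
        seed = normalize_seed(raw)
        if seed in seen:
            duplicates.append(raw)
        else:
            seen.add(seed)
            unique.append(seed)
    return unique, duplicates


def in_scope(url: str, seed: str, level: str = "host") -> bool:
    """Check whether url falls inside the seed's scope.

    level is one of "domain", "host" or "path" (hypothetical names that
    mirror the slide's "domain, host and path" scope parameters).
    """
    u = urlsplit(normalize_seed(url))
    s = urlsplit(normalize_seed(seed))
    if level == "domain":
        # Simplification: the seed host counts as the domain; subdomains are in scope.
        return u.hostname == s.hostname or u.hostname.endswith("." + s.hostname)
    if level == "host":
        return u.hostname == s.hostname
    if level == "path":
        return u.hostname == s.hostname and u.path.startswith(s.path)
    raise ValueError(f"unknown scope level: {level!r}")
```

For example, `validate_seeds(["http://Example.org", "example.org/#top"])` keeps a single seed and reports the second entry as a duplicate, while `in_scope("http://blog.example.org/post", "http://example.org", "host")` is false but becomes true at the "domain" level. Redirections and robots.txt handling need network access and are left out of this sketch.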
Quality Assurance (QA)
National Library of Ireland
–Manual QA
–Jira software
–IM: technical QA
–NLI: ‘look and feel’ QA
–Multiple browsers
–Communication with site owners (building relationships and promotion)
Quality Assurance (QA)
Internet Memory
–Why?
–How?
–Manual and visual method: homepage + 2
–Resolution of issues
–Temporal coherence
Access
National Library of Ireland
–Available to the public
–Full text search
–IM website: search by keyword, URL
–NLI catalogue: keyword search via widget developed by NLI IS team and IM
–Future: access through NLI’s own interfaces; issue of integrating results
Publication and Promotion
National Library of Ireland
–NLI social media initiative (Twitter and blog)
–Project participants
–Print media (esp. in area of technology)
–And IM!
Usage figures have increased, but the real value will be more apparent in 5-10 years
Usage Statistics of Web Archive
National Library of Ireland
–21/09/2011: Official launch of NLI Web archives (Tweets)
–26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie
–25/11/2011: Article on irishtimes.com
–20/01/2012: Article on irishtimes.com
–17/03/2012: Post on soundofthearchives.wordpress.com
–04/05/2012: Article on irisheconomy.ie
Advantages of Web Archiving
National Library of Ireland
Web archiving:
–New opportunities for delivery of materials to users
–Work with existing users’ expectations that content be online
–Reach new audiences
Advantages of Web Archiving
National Library of Ireland
Political web archives; Irish General Election:
–Researchers can compare online content pre- and post-election
–Facilitates research into how ‘online’ this election was
–Assess impact of technological developments in campaign communications
–Record of campaign information
Benefits of Working Together
National Library of Ireland
Pilot project for a long-term activity:
–Allowed us to enter a new collecting area despite lack of tech expertise
–Facilitated collection of important material that no one else was collecting
–Collect material quickly
–Leverage curatorial skills
–Gained new technical skills
Benefits of Working Together
Internet Memory
–To support the development of Web archiving initiatives
–To operate rapid deployment of Web archives
–To address new challenges in this area: social media content, QA automation
Conclusion
General Election: 18,495,771 URLs 1.14 TB 10,405 ARCs
Presidential Election: 7,333,399 URLs GB 2,513 ARCs
View the NLI collections at: collections.aspx
View the Web archive blog entry at: /26/general-election-2011-web-archiving/
View Internet Memory Collections at:
To be continued…
Questions?
Thanks for your attention!
Chloe Martin, Internet Memory, http://internetmemory.org
Catherine Ryan, National Library of Ireland