Caught in the Web: Web Archiving at U of A Libraries Geoff Harder and Kenton Good Digital Preservation Seminar | March 5, 2010 | University of Alberta
Official children’s site of the 2000 Sydney Olympics - MIA:
GeoCities: ocities_we_forgot_you_still_existed.html
Mind the Gap - UK “If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics - perhaps exacerbated by the current economic climate that is killing companies - the memory of the nation disappears too. Historians and citizens of the future will find a black hole in the knowledge base of the 21st century.” Quote: ernet-heritage
“New definitions need to be created for determining the scope of digital special collections, so that stakeholders can understand the nature of special collections professionals’ responsibilities. These include a responsibility for harvesting and preserving endangered web sites, wikis and other dynamic information resources.” Digital Special Collections Special Collections in ARL Libraries – March 2009 A Discussion Report from the ARL Working Group on Special Collections
Looking ahead…
234 million – The number of websites as of December
47 million – Added websites in
126 million – The number of blogs on the Internet (as tracked by BlogPulse).
27.3 million – Number of tweets on per day (November, 2009)
350 million – People on
4 billion – Photos hosted by (October 2009).
12.2 billion – Videos viewed per month on in the US (November 2009).
Does the web matter? Only if our cultural, historical, political, economic, and social memories matter. Valuable BUT vulnerable – e.g. a foundation loses funding; can only afford digital publishing. Research and analysis – a longitudinal view requires a complete picture. SOMEONE needs to take responsibility for it.
Web Archiving Web Archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. Due to the massive size of the Web, web archivists typically employ web crawlers for automated collection. Wikipedia, “Web Archiving”
How web archiving works A web crawler (ant, bot) is a computer program that browses and harvests (captures, collects) the World Wide Web in a methodical, automated manner.
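To make the crawler idea concrete, here is a minimal sketch of the browse-and-harvest loop. It is an illustration only, not how Archive-It or any production crawler is implemented. The pages, URLs, and the names `PAGES`, `LinkExtractor`, and `crawl` are all hypothetical, and an in-memory dictionary stands in for real HTTP fetching so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML); a real crawler would fetch over HTTP.
PAGES = {
    "http://example.org/": '<a href="/about.html">About</a> <a href="news.html">News</a>',
    "http://example.org/about.html": '<a href="/">Home</a>',
    "http://example.org/news.html": '<a href="/about.html">About</a> '
                                    '<a href="http://other.example/">Elsewhere</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, in_scope):
    """Breadth-first harvest starting from a seed URL.

    fetch(url) returns the page HTML or None; in_scope(url) decides
    whether a discovered URL belongs to the collection.
    """
    archive = {}            # harvested pages: URL -> HTML
    queue = deque([seed])   # URLs waiting to be visited
    while queue:
        url = queue.popleft()
        if url in archive or not in_scope(url):
            continue        # already captured, or outside the collection
        html = fetch(url)
        if html is None:
            continue        # page could not be retrieved
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return archive


archive = crawl(
    "http://example.org/",
    PAGES.get,
    lambda u: u.startswith("http://example.org/"),
)
print(sorted(archive))
```

The seed and the scope rule are the two inputs a collection manager would control: the crawl starts at the seed, follows links methodically, and keeps only what falls inside the scope.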
ARCHIVE-IT
Web Archive Admin Screen
HCF Collection
Seed Management
Reports
Reports
File Type Report
Blocked Content Robots.txt
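One common reason a crawler skips content is a robots.txt exclusion rule on the target site. The sketch below shows how such a check might look using Python's standard `urllib.robotparser`. The rules, the user agent name `example-archiver`, and the URLs are hypothetical, and this is not a description of Archive-It's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


private_ok = is_allowed(ROBOTS_TXT, "example-archiver",
                        "http://example.org/private/report.html")
public_ok = is_allowed(ROBOTS_TXT, "example-archiver",
                       "http://example.org/index.html")
print(private_ok, public_ok)
```

A crawler that honours these rules would leave everything under `/private/` out of the archive, which is the kind of gap a blocked-content report brings to light.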
Web Archive Launch Page
Exposing Hidden Content
U of A Web Archive
Partner with Internet Archive on the use of Archive-It
Three targets (criteria: thematic, regional, event-based, organizational):
1) Heritage Community Foundation (collection at risk)
2) University of Alberta websites
3) Western Canadian materials (e.g. political websites)
A few resources
University of Alberta Web Archive:
Archive-it! and Wayback Machine
IIPC – International Internet Preservation Consortium
Use Cases for Access to Internet Archives, IIPC Access Working Group
Special Collections in ARL Libraries, Report March 2009
GoC Web Archive
Thanks
Geoff Harder, Digital Initiatives Coordinator
Kenton Good, Web Development Librarian