Published by Garry Holmes. Modified over 9 years ago.
1
Caught in the Web: Web Archiving at U of A Libraries
Geoff Harder and Kenton Good
Digital Preservation Seminar | March 5, 2010 | University of Alberta
2
Official children’s site of the 2000 Sydney Olympics - MIA: http://www.olympics.com/eng/kids/index.html?/eng/kids/home.html
3
GeoCities: 1995-2009
http://www.pcworld.com/article/163765/so_long_geocities_we_forgot_you_still_existed.html
4
Mind the Gap - UK
“If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics - perhaps exacerbated by the current economic climate that is killing companies - the memory of the nation disappears too. Historians and citizens of the future will find a black hole in the knowledge base of the 21st century.”
Quote: http://www.guardian.co.uk/technology/2009/jan/25/internet-heritage
5
Digital Special Collections
“New definitions need to be created for determining the scope of digital special collections, so that stakeholders can understand the nature of special collections professionals’ responsibilities. These include a responsibility for harvesting and preserving endangered web sites, wikis and other dynamic information resources.”
Special Collections in ARL Libraries: A Discussion Report from the ARL Working Group on Special Collections, March 2009
6
Looking ahead…
234 million – The number of websites as of December 2009.
47 million – Added websites in 2009.
126 million – The number of blogs on the Internet (as tracked by BlogPulse).
27.3 million – Number of tweets on per day (November, 2009)
350 million – People on
4 billion – Photos hosted by (October 2009).
12.2 billion – Videos viewed per month on in the US (November 2009).
http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/
7
Does the web matter?
Only if our cultural, historical, political, economic, and social memories matter.
Valuable BUT vulnerable, e.g. a foundation loses funding and can only afford digital publishing.
Research and analysis: a longitudinal view requires a complete picture.
SOMEONE needs to take responsibility for it.
8
Web Archiving
“Web archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. Due to the massive size of the Web, web archivists typically employ web crawlers for automated collection.”
Wikipedia, “Web Archiving”
9
How web archiving works
A web crawler (ant, bot) is a computer program that browses and harvests (captures, collects) the World Wide Web in a methodical, automated manner.
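To make "methodical, automated" concrete, here is a minimal sketch of a breadth-first crawl. It is not how Archive-It or any production crawler is implemented. The `crawl` function, the `LinkParser` class, and the in-memory `PAGES` dictionary are hypothetical names invented for this illustration, and the tiny fake "web" stands in for real HTTP requests so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Harvest pages breadth-first, starting from a seed URL.

    `fetch(url)` is supplied by the caller and returns the page's HTML,
    or None when the page cannot be retrieved.
    """
    queue = deque([seed])   # URLs waiting to be visited
    seen = {seed}           # URLs already queued, so nothing is visited twice
    captured = {}           # url -> HTML, in capture order
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return captured


# Hypothetical three-page site used instead of the live web.
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a>',
    "http://example.org/b.html": '<a href="/a.html#top">A again</a>',
}

captured = crawl("http://example.org/", PAGES.get)
print(list(captured))
```

The seed URL plays the same role as a seed in a collection: the crawl starts there and follows links outward, and the `seen` set keeps the harvest from looping between pages that link to each other.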
10
ARCHIVE-IT
11
Web Archive Admin Screen
12
HCF Collection
13
Seed Management
14
Reports
15
Reports
16
File Type Report
17
Blocked Content Robots.txt
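As a rough illustration of why some content shows up as blocked, the sketch below checks a list of URLs against a robots.txt file using Python's standard `urllib.robotparser` module. The robots.txt rules, the `example-crawler` user agent, the URLs, and the `blocked_urls` helper are all made up for this example; they are not taken from the report shown on the slide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt rules forbid for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls = [
    "http://example.org/index.html",
    "http://example.org/private/report.pdf",
]
blocked = blocked_urls(ROBOTS_TXT, "example-crawler", urls)
print(blocked)
```

A crawler that honours robots.txt skips every URL in `blocked`, so those pages are absent from the archive unless the site owner changes the rules.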
18
Web Archive Launch Page
23
Exposing Hidden Content
24
U of A Web Archive
Partner with Internet Archive on the use of Archive-It
Three targets (criteria: thematic, regional, event-based, organizational):
1) Heritage Community Foundation (collection at risk)
2) University of Alberta websites
3) Western Canadian materials (e.g. political websites)
25
A few resources
University of Alberta Web Archive:
Archive-it! and Wayback Machine
IIPC – International Internet Preservation Consortium
Use Cases for Access to Internet Archives, IIPC Access Working Group
Special Collections in ARL Libraries, Report March 2009
GoC Web Archive
26
Thanks
Geoff Harder, Digital Initiatives Coordinator, geoffrey.harder@ualberta.ca
Kenton Good, Web Development Librarian, kenton.good@ualberta.ca