Archiving & Preserving Digital Content


1 Archiving & Preserving Digital Content
Archive-It: Archiving & Preserving Digital Content

2 Internet Archive: We Are a Digital Library
Founded in 1996 by Brewster Kahle. Located in San Francisco, California. This is our office. It is very beautiful. It was once an old church; today it is our library. It is 1,800 square meters.

3 361 Billion pages saved
www.archive.org
Largest publicly available web archive in existence. Accessible starting in 2001. 400 billion+ URLs. 80+ million websites. Content in 40+ languages. Collects a snapshot of the web every … days.
We started the Wayback Machine project in 1996 with a big goal: archive the internet and preserve it forever. This year we looked at some smaller goals: archiving a single page on request, making pages available more quickly, and letting you get information out of the Wayback Machine in an automated way. I'd like to thank the engineers who built the Wayback Machine and who crawl the content, and our partners who have donated crawls and guided specialized crawls. We've spent 17 years building this amazing collection. Let's use it to make the web a better place. Thank you.
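Getting information out of the Wayback Machine "in an automated way" can be illustrated with the public Wayback Machine availability API, which answers "what is the closest archived copy of this URL?" with a small JSON document. The sketch below is a minimal example, assuming the endpoint `https://archive.org/wayback/available` with `url` and optional `timestamp` (YYYYMMDDhhmmss, may be truncated) parameters, and a response containing an `archived_snapshots.closest` object with `available` and `url` fields. The helper names `build_query`, `closest_snapshot`, and `lookup` are hypothetical, not part of any Internet Archive library.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: endpoint of the public Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the availability-API query URL for a page (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Timestamp in YYYYMMDDhhmmss form; shorter prefixes such as YYYYMMDD are allowed.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from a parsed JSON response, or None."""
    # Assumption: response shape {"archived_snapshots": {"closest": {...}}};
    # "archived_snapshots" is empty when nothing has been archived.
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timestamp: Optional[str] = None, timeout: float = 10.0) -> Optional[str]:
    """Query the live API over the network and return the closest snapshot URL, if any."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `lookup("example.com", "20060101")` asks for the capture closest to 1 January 2006 and returns its `web.archive.org` URL, or `None` if no capture exists. It needs network access, so the exact result depends on what the archive holds. Keeping query building and response parsing separate from the network call makes the parsing logic easy to check against a saved response.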

4 Web Archiving Service: Archive-It
Archive-It is a subscription service launched in February 2006. It is a web-based application that allows users to create, manage, access, and store collections of digital content. The service is a fully hosted solution and includes access and storage. It provides tools for selection and scoping, including cataloging with metadata. Content can be captured at 10 different frequencies. Archived content includes HTML, text, video, audio, social media, PDFs, images, and online newspapers. Archived content can be browsed 24 hours after a capture is complete, and full-text search is available within 7 days. Restricted access options are available.

5 Archive-It Partners

6 What is Web Archiving?
Web archiving is the process of collecting portions of web content, preserving the collections, and then providing access to the archives for use and reuse. A web archive is a collection of archived URLs grouped by theme, event, subject area, or web address.

7 Challenge: a lot of data
The amount of content being archived. The amount of data being created by content providers.

8 Challenge: What to archive?
How do you decide what collections to build and which websites to archive? You will discuss this in your classes throughout the year, but to get you started, think about what information will help people in the future learn what people your age were interested in and what was important to you in 2012. What is important to you? What do you want people to know about? What are your organization's collecting activities? What is its vision?

9 Archive-It Use Cases
Create a thematic or topical web archive on a specific subject or event, capturing different perspectives and social commentary (tweets, blogs, comments). Such collections can include spontaneous events and are often related to traditional collecting activity around the same focus. Fulfill a mandate to capture and preserve institutional memory and history. Construct a historical record of an institution's web presence over time. Support an electronic records system to meet records retention requirements. Capture publications that aren't being deposited in print form. Closure crawls.

10 Access to Public Collections
Partners: can view collections through a private web application with a login and password. General public: can view collections on the Archive-It website, or through landing pages, which are branded pages on the organization's own website that link back to the data hosted by Archive-It. Collections can also be integrated with existing systems and catalogs.

11 Storage & Preservation
There are multiple ways to store and preserve collections.
Storage: Two copies of the archived data (primary and backup) are stored at the San Francisco data center. Collections are transferred to the General Archive as a third copy. A copy of the archived data can be shipped on a hard drive. Files can be downloaded from Internet Archive servers.
Digital preservation: LOCKSS (2008); DuraCloud (2013).

12 Web Archiving Life Cycle Model

13 Questions & Answers
Lori Donovan
Thank you!

