The Web is a Mess: or How I Learned to Stop Worrying and Love Web Archiving Lori Donovan, Internet Archive.

Presentation transcript:

About Internet Archive
We are a Digital Library
Mission Statement: Universal access to all knowledge
o Founded by Brewster Kahle in San Francisco, California in 1996
o Largest publicly available web archive in existence
o Officially designated a Library by the State of California in 2007

What is Web Archiving? The goal of web archiving is to document changes to web resources over time, archive them and make them accessible.

What is a Web Archive?
A web archive is a collection of archived URLs grouped by theme, event, subject area, or web address. It contains as much as possible of the original resources, and a priority is to recreate the experience a user would have had when visiting the live site.
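
The idea of a themed collection of captures that can be viewed as of a given date can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Capture`, `Collection`, and `as_of` are hypothetical and are not part of Archive-It or any Internet Archive software.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Capture:
    """One archived snapshot of a URL (hypothetical record type)."""
    url: str
    captured_at: datetime
    content: str


@dataclass
class Collection:
    """A themed group of captures, kept sorted by capture time per URL."""
    theme: str
    _by_url: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, capture: Capture) -> None:
        # Keep each URL's captures in chronological order.
        captures = self._by_url[capture.url]
        captures.append(capture)
        captures.sort(key=lambda c: c.captured_at)

    def as_of(self, url: str, when: datetime):
        """Return the latest capture of `url` taken at or before `when`, or None."""
        captures = self._by_url.get(url, [])
        times = [c.captured_at for c in captures]
        index = bisect_right(times, when)
        return captures[index - 1] if index else None


# Usage: two captures of the same (made-up) address, viewed as of early 2009.
site = Collection(theme="elections")
site.add(Capture("http://example.org/", datetime(2008, 6, 1), "before"))
site.add(Capture("http://example.org/", datetime(2009, 7, 1), "during"))
print(site.as_of("http://example.org/", datetime(2009, 1, 1)).content)  # prints "before"
```

Keeping every capture, rather than overwriting the previous one, is what lets a reader see how a page changed over time.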

Why Web Archiving?
o Billions of people around the world have grown accustomed to using the web as their primary resource for acquiring information.
o The web is a crucial part of our culture and our social fabric, and we don’t want to lose any of it, so it is essential that we collect and preserve these digital resources and make them accessible in creative ways.
o The availability of this digital information is taken for granted, and it is a fallacy that if something is on the web it will be there forever.

Limited lifespan of a webpage
It is a fairly common misconception that content on the web will remain there forever. Published estimates of the lifespan of a webpage vary:
o A report in Scientific American claims 44 days.
o A subsequent academic study in IEEE suggests 75 days.
o A Washington Post article indicates 100 days.
Over 95% of government information today is born-digital, but less than 50% is being maintained with an active preservation plan. (State of the Federal Web Report)

Historically important events for researchers and scholars Much of the record of any historic event in today’s world is “born digital.” And many items born in print are also available in digital form, or soon will be. To understand major world events—not only disasters but political upheavals—and to keep a record and a memory of them for survivors, for scholars, for policy-makers, and for a wider public, it is simply essential that we collect and preserve these digital resources and make them accessible in creative ways. Andrew Gordon, Harvard University.

It’s a requirement.
o Records Retention policy. Several state and federal laws or policies require universities to maintain various statistics and reports.
o Responsibility: preserve things like course information, course roster information and policies, documents now showing up only as digital content.

The Role of Libraries
o Libraries and archives have long collected information that serves scholars and the general public in understanding history, culture, and society.
o So much of today's information is easily (and only) found on the World Wide Web: web pages have replaced hard copy records and documents, blogs are today's diaries, and newspapers and socio-political commentary exist solely online.
o As part of an effort to appropriately document and capture today's information for tomorrow's use, institutions must adopt a web archiving strategy.
o However, for many institutions, capturing and storing web pages, websites, or entire web domains is a daunting prospect.

About Archive-It
o First deployed in February 2006
o Web-based application allowing users to create, manage and preserve collections of digital content
o Includes tools for selection and scoping, harvesting, cataloging with metadata, full text search, and QA
o Ability to capture content using 10 different crawl frequencies
o Archived content includes: HTML, videos, audio, PDF, images, social networking sites, online newspapers
o View archived content within 24 hours after a capture is complete
o Annual subscription service, includes hosting, access and storage (primary and back-up)

Who Uses Archive-It?
205 partners around the world in 43 U.S. states and 15 countries

How Partners Use Archive-It

Archive-It Use Cases
o Essential part of a mandate to capture and preserve institutional memory and history. Construct a historical record of an institution’s web presence over time.
o Capture state/local agency publications that aren’t being deposited in print form. Collect and aggregate state/local government websites and presence.
o Capture websites that relate to historical/traditional collections and link them with existing collections around the same thematic focus.
o Create a thematic/topical web archive on a specific subject or event, including different perspectives and social commentary (tweets, blogs, comments). Gather thematically related resources of value to researchers and scholars.
o Support an electronic records system to meet records retention requirements.
o Closure crawls

Stanford University/New York University Islamic & Middle Eastern Collection
Purpose: harvest and preserve Iranian blogs
o Archiving 300+ blogs written by and for Iran and the Iranian people
o Includes coverage of 2009 Iranian elections and the current Middle East unrest

Stanford University/New York University Islamic & Middle Eastern Collection

University of Texas at Austin: LAGDA
Purpose: Archive documents from 18 different countries, 300 government ministries/presidencies.
Content includes:
o Full-text versions of official documents
o Original video and audio recordings of key regional leaders
o Thousands of annual and "state of the nation" reports
o Specific collections for Latin American elections and political parties

University of Texas at Austin: LANIC Honduras Presidential site 2008 (before the Coup)

University of Texas at Austin: LANIC Honduras Presidential site 2009 (during the Coup)

University of Texas at Austin: LANIC Honduras Presidential site (after the Coup)

Electronic Literature Organization
Purpose: archive born-digital literature, that is, works created explicitly for the computer.
o ELO seeks to foster and promote the reading, writing, teaching, and understanding of literature as it develops in a digital environment
o Content includes: individual works, collections and journals, poems and stories

Indiana University
Purpose: archive all university records to maintain strong electronic records systems
o Main university website, 8 different campus websites and other organizations on campus university culture, teacher blogs, student groups, and online publications

Indiana University Main University website

Columbia University
Purposes:
o Archive copies of its university web presence in order to meet required mandates
o Archive websites on thematic/topical subjects

Columbia University Human Rights Collection

Columbia University Avery Architectural & Fine Arts Library

Columbia University Archives Collection

North Carolina State Archives & State Library of North Carolina
Purpose: archive state agency websites and publications
o Includes pages in a variety of formats: text, images, audio, video and social networking sites

North Carolina State Archives & State Library of North Carolina

Access to Collections
Partners:
o Can view through private web application with login/password
General Public:
o Can view from Archive-It website:
o Can view from organization’s website from a landing page that links back to Archive-It hosted data
o Host from organization’s own servers
- Restricted and private access options are available

What’s next for Archive-It
Collaboration and Partnerships
Web application development
o Continue to develop features and functionalities requested by partners
o Enhance our preservation policy/access model
o Integrate our data with partners’ external services, systems and catalogs

Thank you! Lori Donovan Partner Specialist Questions?