Joanne Archer University of Maryland Kate Odell Archive-It Abbie Grotke Library of Congress Tessa Fallon Columbia University Creating and Maintaining Web Archives

Presentation transcript:

Joanne Archer University of Maryland Kate Odell Archive-It Abbie Grotke Library of Congress Tessa Fallon Columbia University Creating and Maintaining Web Archives

Session Goals
– Provide an overview of web archiving and the tasks involved
– Discuss workflow management and copyright issues
– Talk about collection strategies and collection development for web archives
– Analyze the different options for web archiving
– Discuss some of the commonly encountered technical challenges and problems
– Examine methods of access and description

What is web archiving? Web Archiving is the capture, management, and preservation of websites and web resources.

Web Archiving Initiatives
Prominent Web Archiving Initiatives include:
– Internet Archive
– International Internet Preservation Consortium
– Large National Libraries:
  – Australia
  – United Kingdom
  – United States
  – Denmark
– Web at Risk Project

Workflow Management
– Resource planning
– Determine crawling approach
– Identify services and tools to use
– Collection development and planning
– Determine permissions approach
– Monitor along the way
– Access for researchers

Copyright/Permissions
– Legal deposit requirement only applies to “published works” (§ 407)
– § 108 of the Copyright Act provides library exceptions but doesn’t address digital preservation and web archiving
– Varying approaches taken:
  – Crawl permissions
  – Access permissions
  – Notification of crawling
  – Respecting robots.txt (or not!)
– Risk and web archiving policies should be determined by each institution. Talk to your lawyers!
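As a rough illustration of what "respecting robots.txt" means in practice, the sketch below uses Python's standard `urllib.robotparser` to test URLs against a set of exclusion rules. The robots.txt content, the `example.org` URLs, and the `example-archive-bot` agent name are all made up for the example; whether a crawl should honor such rules is the institutional policy decision described above, not something the code decides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so that no network
# request is made. A real site publishes this file at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="example-archive-bot"):
    """Return True if the rules above permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


home_ok = allowed("http://example.org/index.html")
private_ok = allowed("http://example.org/private/report.html")
```

A crawler that respects robots.txt would skip any URL for which `allowed` returns False; one that does not would simply ignore the check.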

Collection Strategies
– Whole Domain: used by some national libraries and by the Internet Archive. Capture everything within a geographic domain; in the case of Sweden, for example, all sites within the .se domain.
– Selective Archiving: capture certain portions of the web based on predefined criteria or collection policies.
– Thematic: event driven (September 11) or theme driven (human rights).
– Deposit
– Combination
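To make the whole-domain strategy concrete, here is a minimal sketch of a test for whether a URL belongs to a country-code domain such as .se. The function name `in_domain` and the sample URLs are illustrative assumptions, not part of any particular crawler.

```python
from urllib.parse import urlparse


def in_domain(url, suffix=".se"):
    """Return True if the URL's host falls under the given domain suffix."""
    host = urlparse(url).hostname or ""
    # Match only on the host name, so a path that happens to contain
    # ".se" does not count.
    return host.endswith(suffix)


swedish = in_domain("http://www.example.se/page")
not_swedish = in_domain("http://example.com/about.se")
```

A selective or thematic collection would replace this single rule with a curated list of seeds.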

Collection Development: Topical Distributed Survey Nomination Targeted Domain Nomination forms Delicious social bookmarks Survey/forms Bookmarklet Subject/general Project-specific Collaborative Institutional history Event-specific Data set Finite/Ongoing Active/Inactive Public Organization Academic Subject specialists Curators Collaborators SCOPE FOCUS SELECTION TOOLS

Collection Development: Technical
Technical considerations:
– Flash
– JavaScript
– Databases
– Hidden content
– Multiple domains
– Social media
– Copyright
– Languages
– Storage

Collection Development Policies/Guidelines
Collection Development Policies or Similar Documents:
– Center for Human Rights Documentation and Research, Human Rights Web Archive
– Library of Congress
– Tamiment Library Web Archive
– University of Michigan Bentley Historical Library
– National Library of Ireland general election 2011 web archive

Tools: HTTrack

Tools: In-House Program Web Curator Tool

Tools: In-House Program DigiBoard

Tools: Subscriptions, Web Archiving Service

Tools: Subscriptions, Archive-It

How does web archiving work?
– Curator selects websites (seeds) to archive
– Curator specifies scope (how much of the websites is archived)
– Seeds and scoping are sent to the crawler (usually Heritrix)
– Crawler visits seed sites and archives the URLs that are discovered (following the scoping rules)
– Archived content is processed and stored (.warc format)
– Access tools (Wayback) allow archived content to be viewed and browsed
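The seed-and-scope cycle can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Heritrix is implemented: the `LINKS` dictionary stands in for the live web, the scoping rule ("stay on the seed's host") is just one possible rule, and nothing is actually fetched or written to a WARC file.

```python
from collections import deque
from urllib.parse import urlparse


def in_scope(url, seed_hosts):
    """Toy scoping rule: keep a URL only if its host matches a seed's host."""
    return urlparse(url).netloc in seed_hosts


def crawl(seeds, link_graph):
    """Visit seeds breadth-first, following only links that are in scope."""
    seed_hosts = {urlparse(seed).netloc for seed in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    archived = []
    while queue:
        url = queue.popleft()
        archived.append(url)  # a real crawler would store the response here
        for link in link_graph.get(url, []):
            if link not in seen and in_scope(link, seed_hosts):
                seen.add(link)
                queue.append(link)
    return archived


# Hypothetical link structure standing in for real pages.
LINKS = {
    "http://example.org/": [
        "http://example.org/about",
        "http://other.example.com/",
    ],
    "http://example.org/about": [
        "http://example.org/",
        "http://example.org/staff",
    ],
}

captured = crawl(["http://example.org/"], LINKS)
```

Here the off-site link to `other.example.com` is discovered but never archived, which is the effect a scoping rule is meant to have.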

Quality Review
Quality Review is different for everyone. Why?
– The tool(s) being used for harvesting and access
– Your institution’s goals, needs, and preferences
– How much time you have
Review Reports
– Was any content blocked, or were any sites unreachable?
– Did you get more content than expected? Less?
Review Archived Web Pages
– Some issues can only be found with the human eye (for now!)
– Was look-and-feel properly captured?
Make Desired Changes
– Scoping, Seeds, Crawl Settings, etc.
Crawl Again
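The report-review questions above lend themselves to a simple tally. The sketch below assumes a made-up report format, a list of (URL, outcome) pairs with invented outcome labels; real crawl reports from Archive-It, Web Curator Tool, or other services are laid out differently.

```python
from collections import Counter

# Hypothetical report rows: (url, outcome). The outcome labels are
# invented for this example.
REPORT = [
    ("http://example.org/", "archived"),
    ("http://example.org/private/", "blocked_by_robots"),
    ("http://example.org/video.flv", "unreachable"),
    ("http://example.org/about", "archived"),
]


def summarize(rows, expected_count):
    """Count outcomes and list the URLs a reviewer should look at."""
    counts = Counter(outcome for _, outcome in rows)
    needs_review = [url for url, outcome in rows if outcome != "archived"]
    return {
        "counts": dict(counts),
        "needs_review": needs_review,
        # Negative means less content than expected, positive means more.
        "archived_vs_expected": counts["archived"] - expected_count,
    }


summary = summarize(REPORT, expected_count=3)
```

A tally like this only flags candidates; checking look-and-feel still requires opening the archived pages.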

Common Problems: “The Web is a Mess”
Some web technologies can be tricky (though not impossible!) to capture or to view in the archived version:
– Database driven sites
– JavaScript (only sometimes)
– Flash (only sometimes)
– Certain video formats
Websites change: what archived perfectly yesterday might not after today’s redesign.

Access and Description
Access Options:
– Subscription Service Access Page (e.g. Archive-It website)
– Website of Your Organization or Project (e.g. Human Rights Web Portal, LOC’s Web Archives site)
– OPAC (e.g. Columbia’s CLIO)
– OCLC’s WorldCat
Examples of Description:
– Columbia University
  – Dublin Core
  – MARC
  – Internet Resource Cataloging Request (IRCR)
– Library of Congress
  – Creates MODS records for each “site”
  – Collection level records in MARC (for the OPAC)
– Archive-It
  – Dublin Core
  – Coming soon: Automated transformation to MARC, MODS, and more.
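For readers unfamiliar with Dublin Core, the following sketch builds a small record for one archived site using Python's standard `xml.etree.ElementTree`. The namespace URI is the standard Dublin Core element set; the `record` wrapper element, the field values, and the helper name `dublin_core_record` are assumptions for illustration and do not reflect the export format of any service named above.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Made-up descriptive metadata for one seed.
xml_text = dublin_core_record({
    "title": "Example Organization website",
    "identifier": "http://example.org/",
    "subject": "Human rights",
    "language": "en",
})
```

A record in this shape is the kind of input that a later transformation to MARC or MODS would start from.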

Archive-It Partner Page

Library of Congress Web Archives Page

Library of Virginia

CLIO Record (public view)

WorldCat: link back to the Archive-It collection

Staffing
Staff needed include:
– Project Management
– Selectors/Curators
– Technical staff for Seed URL preparation (scoping), Quality Review, analysis of reports, etc.
– Catalogers
Training for Staff:
– Use of Tools
– Selection, and how what you can and cannot archive affects it
– Permissions
– Quality Review
Helpful skills: comfortable with the web (not all are, in our experience!), flexibility, good sense of humor

Taking the First Steps…
– Is there web content within your collection scope?
  – Your organization’s website(s)
  – Print material that has migrated to web publication
  – Subject related websites
  – Websites related to manuscript or archival collections
  – State or local government websites
– Research and talk to similar organizations
– Talk to subscription services about trial accounts
– Try out some of the lower barrier tools (e.g. HTTrack)
– Get involved with collaborative web archiving efforts
– Just do it! Jump in!

NDSA Web Archiving Survey
The National Digital Stewardship Alliance (NDSA) Content Working Group is sponsoring this survey of organizations in the United States that are actively involved in or planning to archive content from the web. The survey will close October 31.

Questions? Comments? Suggestions? Joanne Archer Tessa Fallon Abbie Grotke Kate Odell