The Library of Congress Cooperative Web Archiving Project Abbie Grotke, Library of Congress Grant Harris, Library of Congress Jennifer Long, Georgetown.

Presentation transcript:

The Library of Congress Cooperative Web Archiving Project
Abbie Grotke, Library of Congress
Grant Harris, Library of Congress
Jennifer Long, Georgetown University
November 4, 2009

Agenda
– LC’s Web archiving program
– Overview of the Cooperative Project
– Featured Partner: Georgetown University
– Lessons Learned

Library of Congress Web Archives: loc.gov/lcwa

LC Collections: over 130 TB
– US National Elections: 2000, 2002, 2004, 2006, 2008
– Iraq War 2003
– September 11 & September 11 Remembrance 2002
– Olympics 2002
– Congress: 106th, 107th, 108th, 109th, 110th
– Supreme Court Nominations
– Legal Blawgs
– Papal Transition
– Overseas Operations: Indian and Indonesian Elections
– Case Studies: health care, terrorism, visual image content, organizational Web sites, Crisis in Darfur, “single site”

Organizational Structure
– INFORMATION TECHNOLOGY OFFICE and TECHNICAL ARCHITECTURE TEAM: Also in OSI. Supports Wayback and Web Curator Tool development, repository development, and data transfers. Contractors are also used in this area.
– BIBLIOGRAPHIC ACCESS: MODS records are created in Library Services; staff of the Network Development & MARC Standards Office and of Acquisitions & Bibliographic Access do the cataloging.
– WEB ARCHIVING TEAM: In the Office of Strategic Initiatives (OSI). We are project managers and technical staff focused on capture, tools, and permissions.
– CURATORS/RECOMMENDING OFFICERS: Staff in Library Services, the Congressional Research Service, and the Law Library pick the collections and the URLs to archive, and research whom to contact for permission.

Collaborations and Partnerships
– Early collections: Election 00 and 02, September 11
– End of Term Project
– Hurricane Katrina Archive
– IIPC: upcoming Olympics Collection
– NDIIPP Partners
– K-12 Web Archiving
– Cooperative Archive-It projects

Problem
– Web content that will be important for future research is disappearing before it can be collected
– Identification of sites, and review of captured sites, is labor-intensive; LC staff are stretched thin
– Outside institutions may not have resources/budgets for collecting web sites

Cooperative Archive-It Project Concept
– Enlist Library Services subject experts to identify international and national high-value collecting areas, with a focus on foreign countries experiencing volatile political situations
– Enlist Library Services subject experts to identify scholarly centers, or partner institutions, with recognized expertise in the collecting areas, to assist in the collection and preservation of important at-risk materials
– Prioritize collecting areas/centers of expertise (7 priority areas selected)

Goals
– To enable institutions outside the Library to gain experience creating Web site collections
– To extend the network of NDIIPP partners working to identify and collect high-value, at-risk Web materials
– To develop subject area collections that could become part of the Library’s collections in the future, and
– To broaden the understanding of issues related to the development of curated collections of Web content.

Library of Congress agreed to:
– Establish and fund an Archive-It account for the partner for up to one year (with possible extension);
– Provide support as needed;
– Provide subject matter expertise as requested by the partner;
– Invite partner institutions to at least one conference at the Library (if funding is available);
– Maintain a second copy of the harvested content.

Each Center Was Asked To:
– Identify high-risk, high-value web sites for their area, and use Archive-It to harvest the sites;
– Document their selection criteria and provide them to the Library;
– Document issues, lessons learned, etc. related to their web collecting;
– Participate in a conference with Library experts and other participants (if scheduled).

Partner | Collection | Dates | Documents | Size
Electronic Literature Organization | Literary Sites | July 12, 2008 – (ongoing) | 9,214,920 | … GB
George Washington University, Institute for European, Russian, and Eurasian Studies | Russian Parliamentary Elections, Dec. 07, and the Russian Presidential Election 08 | August 13, 2007 – August 12, … | …,175,664 | … GB
Georgetown University | Belarus, Moldova, Ukraine | September 17, … – (ongoing) | 19,880,435 | 580 GB
University of North Carolina, Chapel Hill | Islam in Asia | September 27, 2007 – February 1, … | …,856,205 | … GB
Stanford University Libraries, Islamic Studies | Iranian Blogs | February 29, … – (ongoing) | 27,997,040 | 2,… GB
George Washington University, Center for Global Health | Avian bird flu in Asian countries | June 3, 2008 – January 6, … | …,699,986 | … GB

Featured Partner: Georgetown University
– Belarus, Moldova, Ukraine Collection
– Proposed by LC Curator: Grant Harris
– Aim: the web capture of fragile websites from Belarus, Moldova, and Ukraine, to include selected government websites, opposition parties, ethnic and religious groups, elections, and security issues.

Lessons Learned
– Finding good partners was KEY: partners should be committed and really “get” the concept of web archiving and archiving primary source materials
– Crawling ALL of Twitter: not so good.
– Confusion over LC’s own web archiving program vs. this project

Lessons Learned
– Collaborative collection building is a good thing
  – New partnerships formed
  – New ways for our curators to get engaged with web archiving
  – LC might not have been able to archive some of the collected content on our own (permissions, staff time, etc.)

Next Steps
– Three partners collecting (at least) for another year: ELO, Georgetown, and Stanford
– Focus on description and access: George Washington University/Russian Elections
– Future: Data transfer to LC

For more information
– LC Web Archiving:
– LCWA:
– National Digital Information Infrastructure and Preservation Program:
– Georgetown’s Archive-It collections:

Questions?
Abbie Grotke
Grant Harris
Jennifer Long