Panel: What Changes With Digital? Web Archiving ARL Forum 2009 Tracy Seneca – California Digital Library.

Ground to cover:
Brief background: Web Archiving Service
What changes about collecting
2 case studies in collaboration
– Across institutions
– Between faculty, librarians
– Across disparate archiving systems
– Between libraries, content owners

Web Archiving Service
Developed by the University of California Curation Center of the California Digital Library
– Formerly the Digital Preservation Group
Outcome of the Web-at-Risk grant
– 1st round of NDIIPP grant work
UC campus libraries, NYU, Stanford, University of North Texas

The Web Archiving Service

What Changes About Collecting?
1. The target of collection becomes debatable:
– An archive?
– A site?
– A document?

What Changes About Collecting?
2. There's a lot we don't know about what we're collecting
– How big is it? How much storage will it use?
– What's in it?
– What are the new publications on the site?
– Is the site linking to valuable, relevant information?

Mining Sites for Documents

What Changes About Collecting?
3. It's not always clear what to collect
– A national library may have a clear mandate to capture the nation's web domain
– The Institute of Transportation Studies may have an immediately obvious scope of content to collect
– What does a large research library collect?

What Changes About Collecting?
4. We don't know how scholars will use this information
Object of study could be:
– Content of the documents
– Site change
– Acts of citizen journalism
– Blog spam, viruses

There is an ongoing need for case studies that can illustrate possible approaches to early interventions with digital records creators, institutional collaborations, and partnerships with information technology specialists.

Case 1: 2003 Recall

2003 California Recall Archive
200+ sites selected by UC Librarians, Stanford
Sites crawled by Stanford Computer Science Dept. as part of WebBase project
Content captured in an entirely different format from the WARC archival format used by WAS & Archive-It
Content migrated to WARC format, transferred to CDL in 2008
– Public access via WAS in July 2009

Case 1
Collaborative content selection across campuses, institutions
Data stewardship across institutions
Migration of data across formats, archive data models
Collaboration between Social Science faculty, Computer Science grad students
A dark archive goes light!

Case 2: California Government

Collaborative Collection: State of California
Government Information librarians across UC campuses manage the archive.
300 sites derived from California State Agency Directory
Source for shared cataloging of key California State Documents
Twice yearly captures of all agency sites; more frequent captures of approximately 30 priority sites
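The capture schedule on this slide can be turned into a rough estimate of yearly crawl volume. The sketch below is only illustrative: the 300 sites, the roughly 30 priority sites, and the twice-yearly baseline come from the slide, but the slide does not say how often priority sites are captured, so `priority_captures_per_year` is a placeholder assumption and the function name is hypothetical.

```python
def yearly_captures(total_sites, priority_sites, priority_captures_per_year,
                    baseline_captures_per_year=2):
    """Estimate how many site captures a collection produces in one year.

    Regular sites are captured at the baseline rate (twice yearly on the
    slide). Priority sites are captured at priority_captures_per_year,
    which is an ASSUMED value: the slide only says "more frequent".
    """
    if priority_sites > total_sites:
        raise ValueError("priority_sites cannot exceed total_sites")
    if priority_captures_per_year < baseline_captures_per_year:
        raise ValueError("priority sites should not be captured less often")
    regular_sites = total_sites - priority_sites
    return (regular_sites * baseline_captures_per_year
            + priority_sites * priority_captures_per_year)


if __name__ == "__main__":
    # Try two assumed priority frequencies: quarterly and monthly.
    for assumed_frequency in (4, 12):
        total = yearly_captures(300, 30, assumed_frequency)
        print(f"priority sites captured {assumed_frequency}x/year: "
              f"{total} captures/year")
```

Multiplying the result by an average capture size, once one is measured, would give a first answer to the earlier question of how much storage a collection will use.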

Collaborative Collection: Local California
Seven archives of local California agencies maintained by separate UC campuses
Testing cross-archival search tools to combine all state, local search results
518 sites preserved in local archives
Challenge: varying resources, priorities at UC campuses, some geographic areas missed
Cornell Web Lab study identifies .ca.gov as the third largest U.S. government subdomain.

Need for Collaboration with Content Owners

Robots.txt Patterns in California State Agency Sites
Restricted:
– California State Library
– California State Controller
– Office of State Publishing
– Secretary of State
Not Restricted:
– Office of Information Security and Privacy Protection
– Office of Systems Integration
– Legislative Analyst's Office

Consistent design
Strong patterns to restrictions
28 sites read exactly:
User-agent: *
Disallow: /images
Disallow: /classes
Disallow: /cgi-bin
Disallow: /htdig
Disallow: /js
Disallow: /styles
Disallow: /ssi
Disallow: /css
Disallow: /javascript
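To see what this shared robots.txt means for a crawler that obeys it, the rules can be run through Python's standard `urllib.robotparser`. This is a minimal sketch, not the Web Archiving Service's own tooling: the host `agency.example.ca.gov`, the sample paths, and the crawler name `example-archiver` are made up for illustration. Only the rule text comes from the slide.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt text quoted on the slide (shared by 28 agency sites).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images
Disallow: /classes
Disallow: /cgi-bin
Disallow: /htdig
Disallow: /js
Disallow: /styles
Disallow: /ssi
Disallow: /css
Disallow: /javascript
"""

# Hypothetical URLs on a hypothetical agency host, for illustration only.
SAMPLE_URLS = [
    "http://agency.example.ca.gov/index.html",
    "http://agency.example.ca.gov/images/logo.gif",
    "http://agency.example.ca.gov/css/main.css",
    "http://agency.example.ca.gov/reports/annual.pdf",
]


def allowed_urls(robots_txt, user_agent, urls):
    """Return {url: bool}: may user_agent fetch each URL under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    results = allowed_urls(ROBOTS_TXT, "example-archiver", SAMPLE_URLS)
    for url, ok in results.items():
        print("ALLOW" if ok else "BLOCK", url)
```

Under these rules the pages and documents remain fetchable, but images and stylesheets do not. A capture made by a crawler that honors the file would therefore keep the text and lose much of the site's look, which is one reason a conversation with webmasters matters.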

Is Robots.txt Really a Copyright Management Tool? The conversation used to be between the library and the publisher. Now, it is between the library and a webmaster. - Gildas Illien, Bibliothèque nationale de France

A Potentially Fruitful Conversation?
The National Archives comprehensively archive UK Central Government sites
"Continuity and Preservation: The National Archives approach to maintaining permanent access to the web presence of UK Central Government" - Amanda Spencer and Alison Heatherington

Case 2
Collaboration across campuses in selection, resource allocation
Shared collection of material relevant to all campuses
Communication underway with state agencies, webmasters
Potential to provide service directly to agency site
Potential to begin linking archives together

Parting thoughts
MANY more examples of collaborative work!
– End of Term Harvest
– International Internet Preservation Consortium: international Olympics archive
– Zepheira: data visualization portal to NDIIPP content
"…longer-term preservation costs for these kinds of materials are not well understood. In the digital world, it is all too easy to acquire materials that a library cannot afford to keep in perpetuity."