From here to perpetuity: challenges (and a few confessions) in preserving web-based AV content ASRA Conference 2011 Paul Koerbin Manager Web Archiving.

Presentation transcript:

From here to perpetuity: challenges (and a few confessions) in preserving web-based AV content ASRA Conference 2011 Paul Koerbin Manager Web Archiving National Library of Australia

PANDORA web archive
Began (collecting) in 1996
Developed from proof-of-concept project
Complex born digital online material
No control over native formats
Best effort QA (hand crafting)
Permissions based (no legal deposit)
Accessible to the public
Small scale (6 TB, 130 million files)

Web archiving
Web sites (and all they contain)
  – documents, images, media, style elements, client side scripts
Includes sites with embedded (multi)media
  – lots of formats (mpeg, flv, QT, wmv, rm, Shockwave)
  – audio/video ~2% archive data?
Content is harvested with crawl robots
  – deposit is harder to deal with
Dynamic content becomes static HTML
Media files just harvested (hopefully)
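As a rough illustration of the harvesting point above, the sketch below scans one page's HTML for links and embeds that point at media files, which is the kind of discovery step a crawl robot needs before it can fetch AV content. It is a minimal sketch and not the PANDORA harvester: the extension list is an assumed mapping of the formats named on the slide, and `MediaLinkFinder` and `find_media_urls` are hypothetical names introduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed mapping of the formats named on the slide to common file extensions.
MEDIA_EXTENSIONS = {".mpeg", ".mpg", ".flv", ".mov", ".qt", ".wmv", ".rm", ".swf"}

# Attributes that commonly carry a URL on link and embed elements.
URL_ATTRIBUTES = {"href", "src", "data", "value"}


class MediaLinkFinder(HTMLParser):
    """Collect absolute URLs in a page that look like media files (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRIBUTES or not value:
                continue
            # Resolve relative references against the page URL.
            url = urljoin(self.base_url, value)
            extension = posixpath.splitext(urlparse(url).path)[1].lower()
            if extension in MEDIA_EXTENSIONS and url not in self.media_urls:
                self.media_urls.append(url)


def find_media_urls(html, base_url):
    """Return media-looking URLs found in the HTML, in document order."""
    finder = MediaLinkFinder(base_url)
    finder.feed(html)
    return finder.media_urls
```

A static scan like this only finds media that is referenced directly in the markup. Video served through scripts or streaming protocols will not appear, which is one reason the slide adds "hopefully".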

Web archiving
A browser view – a snapshot
Creating the heritage artefacts of the web
AV collected in the context of the website
  – our intent is to retain that context
  – others suggest decoupling collecting and generic access
Collecting is not the full story of archiving
Preservation intent and actions
Long term access

WABAC (‘PANDORA’) Machine

Web archive AV – examples
Yothu Yindi (1999)
Web of poets (2000)
Queensland Election (2001)
Federal Election (2007)
  – YouTube
  – Independent videos
Linsey Pollak (2008)

Observations
Trying to capture and retain the AV media in the context of the website/page – but not always possible
Balancing the need for timely collecting and legacy systems with implementing and working with standards
Faced with the challenge of managing the objectives (and tensions) of collecting, preserving and access
Keeping up skills and experience – understanding formats and web publishing technologies
Awareness and management of the problems we are filling our archive with

“Web archivists have a difficult time gathering web video that are, more often than not served with non-standard tools and protocols... it is difficult to design a general solution for dealing with all the Web sites hosting video content... [so]... the harvesting technique should be adapted to each particular case. The crawl engineering effort needed to adapt the tools is generally dependent on the complexity of the Web site.” Pop, Vasile, Masanes (2010)