Web archiving at the NLA: ‘Archiving the music web’. Music Council of Australia Annual Assembly, 28 September 2009. Paul Koerbin, Manager Digital Archiving.

Presentation transcript:

Web archiving at the NLA
‘Archiving the music web’
Music Council of Australia Annual Assembly, 28 September 2009
Paul Koerbin, Manager Digital Archiving, National Library of Australia

1. Background – the what, why and how
2. What makes a valuable resource for archiving?
3. What can you do to help?

What is web archiving about and why do it?
Archiving = long-term preservation and access
Building collections
Building a ‘documentary’ historical record
Creating artefacts from the web experience
Discovering what is produced online
An act of consciousness

What’s involved in web archiving? At the NLA it’s:
Identifying, selecting, scoping
Seeking permission to collect and make accessible
Creating and recording metadata
  – administrative, descriptive, preservation
Crawling/harvesting (including scheduling)
Processing for quality assurance (best effort)
Storing and maintaining the data
Planning and implementing preservation strategies
Preparing and rendering for public display
Providing access and discovery mechanisms

What is the NLA doing?
PANDORA Archive 1996→
  – PANDORA participants: NLA, state libraries (not Tas), NFSA, AWM, AIATSIS (and soon the NGA)
  – Highly selective, small scale, ‘quality’ collection, open access
  – PANDAS workflow management system, 2001→
Australian (.au) domain harvests
  – Annual since 2005
  – Internet Archive
  – No access (yet)

Comparative statistics of NLA web collections
PANDORA (selective)
  Files: 73 million
  Size: 3.26 TB
Domain Harvest (four harvests; column headings not preserved)
  Unique files:  185 million | 596 million | 516 million | 1 billion
  Hosts crawled: 811,523 | 1,046,038 | 1,247,614 | 3,038,658
  Size:          6.69 TB | [value missing] | [value missing] | 34.55 TB
.au Domain Harvests (total)
  Files: 2.3 billion
  Size: 78.75 TB

Music in the PANDORA Archive
500+ titles available from the PANDORA public listing of music
  – NFSA 33%
  – NLA 30%
  – Others 37%
Musicians, bands, orchestras, composers, organisations, festivals, blogs, instrument makers, magazines …
Plus 280 considered but not available
  – 35% (no permission, rejected, yet to be selected)

What makes a valuable resource for archiving?
Content
  – substantial, original
Provenance
‘Long-term research value’
Cultural or social significance and interest
  – including events
Curatorial/expert suggestion (e.g. Music Australia)
Different collecting approaches based on ‘value’
Priorities, but never say never

How can you help? 10 tips:
1. Think about the issue of long-term access
  – what is your intention?
2. Communicate interest and intentions
  – with collecting institutions; let us know about your site
  – respond to requests for permission
3. Organise and structure sites simply
  – it’s all about links
4. Comply with standards
  – limit use of proprietary technology if possible
5. Make it robot friendly
  – indexing, discovery, capture

How can you help? 10 tips (continued):
6. Keep contributors informed and involved
  – make sure contributors understand and agree to long-term preservation and access from the beginning
7. Clear copyright, rights and contact information
  – it helps to know what and who (oh, and trust us too)
8. Maintain content online as much as possible
  – increases the chance of it being collected
9. Learn to love and live with your past
  – archives are not the same as the ‘live’ web
  – archived versions cannot be altered
10. Do your own backup, of course
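
Tip 5 ("Make it robot friendly") can be checked in practice. The following is a minimal sketch, not the NLA's own tooling: it uses Python's standard `urllib.robotparser` to test whether a site's robots.txt rules would let a crawler reach a page. The robots.txt content, the user-agent name `example-archive-bot` and the example.org URLs are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a musician's site: everything is open to
# crawlers except a private area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-archive-bot" is a made-up crawler name, not a real harvester.
    for page in ("https://example.org/music/index.html",
                 "https://example.org/private/drafts.html"):
        allowed = crawler_may_fetch(ROBOTS_TXT, "example-archive-bot", page)
        print(page, "allowed" if allowed else "blocked")
```

A site owner could run a check like this against their own robots.txt to confirm that the pages they want indexed, discovered and captured are not accidentally blocked.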

PANDORA Australia’s Web Archive