The Australian Government Web Archive. ALIA Conference 2014, 18 September 2014, Melbourne. Alison Dellit, Director, Australian Collection Management.

Presentation transcript:

NLA web archive collections
PANDORA Archive collection (open access)
  – Selective web archiving since 1996
Australian domain harvest collection (closed)
  – Large scale, outsourced (IA), annual collection, since 2005
Australian Government Web Archive collection (open access)
  – Bulk seed list harvesting, outsourced (IA) and in-house run, annual (or more frequent)
  – 2011, 2012, 2013 (x2) and 2014 (x2)

The government publication problem

So where did AGWA come from?
Administrative conditions
  – Whole-of-Government arrangements: Gershon Review (Oct. 2008)
  – May 2010: Secretaries’ ICT Governance Board approval
  – Non-corporate PGPA Agencies
  – Commonwealth corporate entities
Technical and development considerations
  – NLA development of infrastructure and skills
  – Large scale, bulk harvesting
  – Access to large scale, bulk harvested collections

[Figure: spectrum of web archiving approaches, from selective to whole-web harvesting]
Selective (‘targets’, ‘titles’)
  – Small scale; reactive; timely; scheduled; high curation
Themed, curated seed lists (e.g. gov.au)
  – Moderate scale; scheduled; timely; high curation
Second-level domain (e.g. org.au)
  – Moderate to large scale; scheduled (moderate control); moderate curation
Top-level (TL) domain (i.e. .au)
  – Large scale; scheduled (low control); low curation
Whole Web (Internet Archive)
  – Large scale; ongoing; unscheduled; no curation control
Collection labels on the figure: PANDORA, AusCrawl, gov.au

NLA Web Archiving Statistics
Collection                            Type                  Coverage                          Files          Data
PANDORA Web Archive                   ‘Selective’           1996 – Sept (102,000 instances)   269 million    13 TB
Australian Domain (.au) Web Archive   ‘Country TL domain’   (9 crawls)                        6.33 billion   236 TB
Australian Government Web Archive     ‘Seed-list’           (6 crawls)                        76.9 million   7 TB
All Collections                                                                               6.67 billion   256 TB
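
As a rough cross-check of the statistics slide above, the per-collection figures can be summed and compared with the "All Collections" column. This is only a sketch: the dictionary keys and the `totals` helper are illustrative names, and the inputs are the rounded values shown on the slide, so the file total comes out near 6.68 billion rather than the reported 6.67 billion. The data total matches 256 TB.

```python
# Figures as shown on the statistics slide (files in millions, data in TB).
# The key names and this layout are illustrative, not from the presentation.
collections = {
    "PANDORA Web Archive": {"files_millions": 269.0, "data_tb": 13},
    "Australian Domain (.au) Web Archive": {"files_millions": 6330.0, "data_tb": 236},
    "Australian Government Web Archive": {"files_millions": 76.9, "data_tb": 7},
}


def totals(stats):
    """Sum file counts (millions) and data volumes (TB) across collections."""
    files = sum(c["files_millions"] for c in stats.values())
    data = sum(c["data_tb"] for c in stats.values())
    return files, data


total_files_millions, total_data_tb = totals(collections)

# The slide's inputs are rounded, so a small gap from the reported
# 6.67 billion files is expected.
print(f"Files: {total_files_millions / 1000:.2f} billion")
print(f"Data:  {total_data_tb} TB")
```

The domain harvest dominates both totals, which fits the slide's description of it as the large-scale collection.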

AGWA content
         Total          Average harvest
Files    34.5 million   ~8 million
Data     3 TB           750 GB – 1 TB

AGWA futures
Coming soon:
  – harvest content
  – More Commonwealth agencies
  – More integration to a catalogue near you
Next few years:
  – Integration into Trove
  – Metadata extraction
  – Visualisation of data

Feedback to: