Webarchive Austria NetarchiveSuite Meeting Madrid 2019

Presentation transcript:

Webarchive Austria
NetarchiveSuite Meeting Madrid 2019
Michaela Mayr, Andreas Predikaka
Austrian National Library
webarchiv@onb.ac.at
www.onb.ac.at

Harvesting 2018
- Ongoing collections (selective crawls):
  - Media (since 2011)
  - Politics (since 2013)
  - Ariadne: women/gender (since 2016)
- Selective crawls:
  - 4 regional elections
  - Austrian EU presidency
  - 100 years of the Austrian republic
- Domain crawl (1 stage)
- Budget 6 TB total

Harvesting 2019
- Ongoing collections (selective crawls):
  - Media (since 2011)
  - Politics (since 2013)
  - Ariadne: women/gender (since 2016)
- Selective crawls:
  - 1 regional election
  - EU elections
- Domain crawl (1 stage)
- Budget 6 TB total

Webarchive Data
Statistics
- 2 million domains
- 125.5 TB raw / 57.1 TB compressed
- 3.4 billion files
Storage
- Moved from the Federal Austrian Computing Center to ONB internal storage
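The figures on this slide imply a compression ratio and rough averages that are easy to work out. The sketch below is only arithmetic on the numbers quoted above. It assumes decimal terabytes (10^12 bytes), which the slide does not state, and the function name is made up for illustration.

```python
# Figures quoted on the slide.
RAW_TB = 125.5          # raw archive size
COMPRESSED_TB = 57.1    # compressed archive size
FILES = 3.4e9           # 3.4 billion files
DOMAINS = 2e6           # 2 million domains


def storage_summary(raw_tb, compressed_tb, files, domains):
    """Derive simple ratios from the archive statistics.

    Assumption: 1 TB = 10**12 bytes (decimal); the slide does not say
    whether decimal or binary units are meant.
    """
    return {
        "compression_ratio": raw_tb / compressed_tb,
        "space_saved": 1 - compressed_tb / raw_tb,
        "avg_raw_bytes_per_file": raw_tb * 10**12 / files,
        "files_per_domain": files / domains,
    }


summary = storage_summary(RAW_TB, COMPRESSED_TB, FILES, DOMAINS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the archive compresses to a bit under half its raw size, and the average domain holds on the order of a couple of thousand files.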

Plans for 2019 (1)
More selective crawls
- Tool for nomination and NAS integration for internal and external users
ARC → WARC
- No migration of ARC files
- Adaptation of internal file processing for ARC and WARC
- Integration of externally created WARCs (e.g. Webrecorder)
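Keeping existing ARC files while adding WARC means internal processing has to tell the two container formats apart. As a minimal sketch of one way to do that (not ONB's actual implementation), the function below peeks at the first bytes of a file: WARC records start with a `WARC/` version line, and the first record of an ARC file starts with `filedesc://`. The function name `detect_container_format` is hypothetical.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip member


def detect_container_format(stream):
    """Return 'warc', 'arc', or 'unknown' for a seekable binary stream.

    Illustrative helper only: it inspects the first record header and
    does not validate the rest of the file.
    """
    head = stream.read(2)
    stream.seek(0)
    if head == GZIP_MAGIC:
        # Both ARC and WARC are commonly stored gzip-compressed.
        reader = gzip.GzipFile(fileobj=stream, mode="rb")
    else:
        reader = stream
    first = reader.read(16)
    if first.startswith(b"WARC/"):
        return "warc"
    if first.startswith(b"filedesc://"):
        return "arc"
    return "unknown"


# Small in-memory examples standing in for real archive files.
warc_gz = io.BytesIO(gzip.compress(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n"))
arc_plain = io.BytesIO(b"filedesc://example.arc 0.0.0.0 20190101000000 text/plain 76\n")
print(detect_container_format(warc_gz))
print(detect_container_format(arc_plain))
```

A dispatcher built on such a check could route each file to the matching parser, which would also cover externally created WARCs without touching the ARC holdings.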

Plans for 2019 (2)
Infrastructure
- New crawler machines (hopefully problem solution)
- ElasticSearch cluster (old hardware, more nodes), fulltext < 5%
- Indexing not with Hadoop cluster anymore, single machine processing chunks
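The slide does not say how the chunks for single-machine indexing are formed. As a rough illustration of the general idea only, the sketch below splits a list of archive files into fixed-size batches and hands each batch to an indexing callback. The names `chunked`, `index_in_chunks`, and `index_chunk` are hypothetical, and the callback stands in for whatever sends documents to the search cluster.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def index_in_chunks(paths, chunk_size, index_chunk):
    """Process `paths` batch by batch on one machine.

    `index_chunk` is a placeholder callback (an assumption, not a real
    API) that would extract text and send it to the index.
    Returns the number of paths processed.
    """
    processed = 0
    for chunk in chunked(paths, chunk_size):
        index_chunk(chunk)
        processed += len(chunk)
    return processed


# Example run with a callback that only records what it was given.
seen = []
total = index_in_chunks(["a.warc.gz", "b.arc.gz", "c.warc.gz"], 2, seen.append)
print(total, seen)
```

Working in bounded batches keeps memory use predictable and makes it straightforward to resume after a failure by recording the last completed chunk.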

ONB Labs
https://labs.onb.ac.at/de/
Webarchive contributes to ONB Labs:
- Metadata
- Ngram viewer
- Metadata licensed under Creative Commons Zero (CC0)
- API for fulltext search

Webarchive anniversary
- March 1st 2009: Austrian Media Act allows webarchiving
- Half-day conference to celebrate 10 years Webarchive Austria, 29.03.2019
- Stakeholders, libraries, researchers, interested public…

Thank you!