Published by Gerhard Beck. Modified over 5 years ago.
1
Webarchive Austria
NetarchiveSuite Meeting Madrid 2019
Michaela Mayr, Andreas Predikaka
Austrian National Library
2
Harvesting 2018
Ongoing Collections:
  Media (since 2011)
  Politics (since 2013)
  Ariadne: women/gender (since 2016)
Selective Crawls:
  4 regional elections
  Austrian EU presidency
  100 years of the Austrian Republic
Domain Crawl (1 stage)
  Budget: 6 TB total
3
Harvesting 2019
Ongoing Collections:
  Media (since 2011)
  Politics (since 2013)
  Ariadne: women/gender (since 2016)
Selective Crawls:
  1 regional election
  EU elections
Domain Crawl (1 stage)
  Budget: 6 TB total
4
Webarchive Data
Statistics:
  2 million domains
  125.5 TB raw / 57.1 TB compressed
  3.4 billion files
Storage:
  Moved from the Federal Austrian Computing Center to ONB internal storage
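The storage figures on this slide imply a compression ratio and an average file size that the slide does not state. The short Python sketch below derives them from the three reported numbers. It assumes decimal terabytes (1 TB = 10^12 bytes), which the slide does not specify, and the function name `summarize_storage` is chosen here for illustration only.

```python
# Derived figures from the reported archive statistics.
# Assumption: 1 TB = 10**12 bytes (the slide does not say decimal or binary).
TB = 10**12


def summarize_storage(raw_tb: float, compressed_tb: float, file_count: float) -> dict:
    """Return compression ratio, space saved, and mean raw bytes per file."""
    ratio = raw_tb / compressed_tb                 # raw size divided by compressed size
    saved = 1.0 - compressed_tb / raw_tb           # fraction of space saved by compression
    mean_bytes = raw_tb * TB / file_count          # average uncompressed size per file
    return {"ratio": ratio, "saved": saved, "mean_bytes": mean_bytes}


# Figures reported on the slide: 125.5 TB raw, 57.1 TB compressed, 3.4 billion files.
stats = summarize_storage(125.5, 57.1, 3.4e9)
print(f"compression ratio: {stats['ratio']:.2f}:1")
print(f"space saved:       {stats['saved']:.1%}")
print(f"mean file size:    {stats['mean_bytes'] / 1000:.1f} kB (raw)")
```

Under these assumptions the archive compresses to a little under half its raw size, and the average harvested file is in the tens of kilobytes.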
5
Plans for 2019 (1)
More selective crawls:
  Tool for nomination and NAS integration for internal and external users
ARC / WARC:
  No migration of ARC files
  Adaptation of internal file processing for ARC and WARC
  Integration of externally created WARCs (e.g. Webrecorder)
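Keeping existing ARC files while also handling WARC means that processing code has to tell the two container formats apart. The Python sketch below is one possible way to do that, not the library's actual pipeline: it relies on WARC records starting with a `WARC/` version line and ARC files starting with a `filedesc://` version block, optionally wrapped in gzip. The function name `detect_container_format` is hypothetical.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def detect_container_format(head: bytes) -> str:
    """Classify the leading bytes of a harvest file as 'warc', 'arc' or 'unknown'.

    `head` is the first block of the file (a few hundred bytes is enough).
    """
    if head.startswith(GZIP_MAGIC):
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header.
        # A decompress object tolerates truncated input, so a partial read works.
        head = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(head)
    if head.startswith(b"WARC/"):
        return "warc"
    if head.startswith(b"filedesc://"):
        return "arc"
    return "unknown"


# Small in-memory samples standing in for real files (contents are illustrative).
warc_sample = gzip.compress(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n")
arc_sample = gzip.compress(b"filedesc://example.arc 0.0.0.0 20190101000000 text/plain 76\n")

print(detect_container_format(warc_sample))
print(detect_container_format(arc_sample))
print(detect_container_format(b"not an archive"))
```

A dispatcher like this lets one processing routine accept old ARC files, newly harvested WARCs, and externally created WARCs without a migration step.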
6
Plans for 2019 (2)
Infrastructure:
  New crawler machines (hopefully a solution to the current problems)
  ElasticSearch cluster (old hardware, more nodes), full text < 5%
  Indexing no longer runs on the Hadoop cluster; a single machine processes chunks
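The slide does not say how the chunks for single-machine indexing are formed. As a minimal sketch, assuming a chunk is simply a fixed-size batch of archive file names, the work list could be split as follows in Python. The helper name `chunked` and the file names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:          # input exhausted
            return
        yield batch


# Hypothetical file names; a real run would list the archive storage instead.
files = [f"harvest-{i:04d}.warc.gz" for i in range(5)]
for number, batch in enumerate(chunked(files, 2), start=1):
    # Each batch would be handed to the indexer before the next one is read.
    print(number, batch)
```

Because the generator pulls only one batch at a time, memory use stays bounded by the chunk size regardless of how many files are queued.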
7
ONB Labs
https://labs.onb.ac.at/de/
Webarchive contributes to ONB Labs:
  metadata ngram viewer
  Metadata licensed under Creative Commons Zero (CC0)
  API for full-text search
9
Webarchive anniversary
  March 1st 2009 – Austrian Media Act allows web archiving
  Half-day conference to celebrate 10 years of Webarchive Austria
  Stakeholders, libraries, researchers, interested public…
10
Thank you!