Webarchive Austria
NetarchiveSuite Meeting Madrid 2019
Michaela Mayr, Andreas Predikaka
Austrian National Library
webarchiv@onb.ac.at
www.onb.ac.at
Harvesting 2018
  Ongoing Collections:
    Selective Crawls:
      Media (since 2011)
      Politics (since 2013)
      Ariadne: women/gender (since 2016)
  Selective Crawls:
    4 regional elections
    Austrian EU presidency
    100 years of the Austrian Republic
  Domain Crawl (1 Stage)
  Budget: 6 TB total
Harvesting 2019
  Ongoing Collections:
    Selective Crawls:
      Media (since 2011)
      Politics (since 2013)
      Ariadne: women/gender (since 2016)
  Selective Crawls:
    1 regional election
    EU elections
  Domain Crawl (1 Stage)
  Budget: 6 TB total
Webarchive Data
  Statistics
    2 million domains
    125.5 TB raw / 57.1 TB compressed
    3.4 billion files
  Storage
    Moved from the Federal Austrian Computing Center to ONB internal storage
Plans for 2019 (1)
  More selective crawls
    Tool for nomination and NAS integration for internal and external users
  ARC → WARC
    No migration of ARC files
    Adaptation of internal file processing for ARC and WARC
    Integration of externally created WARCs (e.g. Webrecorder)
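Processing ARC and WARC files side by side means each file first has to be identified. The sketch below is one possible way to do that, not the ONB implementation: it relies only on the facts that WARC records start with a "WARC/" version line and that ARC files start with a "filedesc://" version block. The function name detect_archive_format is hypothetical, and the stream is assumed to be seekable.

```python
import gzip
import io


def detect_archive_format(stream):
    """Return 'warc', 'arc', or 'unknown' based on the first bytes of a
    seekable binary stream (hypothetical helper, for illustration only)."""
    head = stream.read(16)
    if head[:2] == b"\x1f\x8b":
        # gzip magic number: rewind and decompress just enough
        # to see the start of the first record.
        stream.seek(0)
        head = gzip.GzipFile(fileobj=stream).read(16)
    if head.startswith(b"WARC/"):
        return "warc"
    if head.startswith(b"filedesc://"):
        return "arc"
    return "unknown"


if __name__ == "__main__":
    sample = io.BytesIO(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n")
    print(detect_archive_format(sample))
```

A dispatcher built on this could route each file, including externally created WARCs, to the matching reader without relying on file extensions.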
Plans for 2019 (2)
  Infrastructure
    New crawler machines (hopefully solving the current problems)
    ElasticSearch cluster (old hardware, more nodes), full text < 5%
    Indexing no longer on a Hadoop cluster; a single machine processes chunks
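The idea of a single machine processing chunks can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ONB pipeline: chunked, index_in_chunks, and the index_chunk callback are hypothetical names, and the callback stands in for whatever step sends a batch of files to the index.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_chunks(files, size, index_chunk):
    """Pass files to `index_chunk` one chunk at a time and return the
    number of files handled. `index_chunk` is a hypothetical callback."""
    done = 0
    for chunk in chunked(files, size):
        index_chunk(chunk)
        done += len(chunk)
    return done


if __name__ == "__main__":
    batches = []
    total = index_in_chunks(["a.warc", "b.warc", "c.arc"], 2, batches.append)
    print(total, batches)
```

Because only one chunk is held in memory at a time, the chunk size bounds memory use on the single machine regardless of how many files are queued.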
ONB Labs
  https://labs.onb.ac.at/de/
  Webarchive contributes to ONB Labs:
    Metadata
    Ngram viewer
  Metadata licensed under Creative Commons Zero (CC0)
  API for full-text search
Webarchive anniversary
  March 1st 2009: Austrian Media Act allows web archiving
  Half-day conference to celebrate 10 years of Webarchive Austria, 29 March 2019
  Stakeholders, libraries, researchers, interested public…
Thank you!