Published by Julian Barker. Modified over 9 years ago.
1
NetarchiveSuite Meeting, Tallinn, 29./30.01.2015
Web@rchive Austria Updates and Plans for 2015
Michaela Mayr, Andreas Predikaka
Austrian National Library
webarchiv@onb.ac.at
www.onb.ac.at
2
Harvesting 2014
Ongoing Collections:
–Media (since 2011)
–Politics (since 2013) incl. 1 regional election
Olympic Winter Games Sochi
–3 seeds daily, 96 seeds weekly
EU elections
–132 seeds daily, 33 seeds weekly
World War I
–151 seeds
Budget = 2 TB
3
Harvesting 2015
Ongoing Collections:
–Media (since 2011)
–Politics (since 2013) incl. 4 regional elections
4th Broad Crawl
–New TLDs .wien, .tirol
–ARC format, NAS 4.4, PostgreSQL
Eurovision Song Contest
Content behind paywalls?
Budget = 10 TB
4
Statistics
Approximately 1.4 m. domains
60 TB raw / 30 TB compressed
2 bn. files
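A rough reading of these figures, as a minimal sketch: the compression ratio, the average raw size per file, and the average number of files per domain. The inputs are the rounded numbers from the slide; decimal units (1 TB = 10^12 bytes) are an assumption, since the slide does not say which convention is used.

```python
# Approximate figures as stated on the slide.
RAW_TB = 60
COMPRESSED_TB = 30
FILES = 2_000_000_000
DOMAINS = 1_400_000

# Assumption: decimal units (1 TB = 10**12 bytes); the slide does not specify.
BYTES_PER_TB = 10**12

compression_ratio = RAW_TB / COMPRESSED_TB
avg_raw_bytes_per_file = RAW_TB * BYTES_PER_TB / FILES
avg_files_per_domain = FILES / DOMAINS

print(f"compression ratio: {compression_ratio:.1f}:1")
print(f"average raw size per file: {avg_raw_bytes_per_file / 1000:.0f} kB")
print(f"average files per domain: {avg_files_per_domain:.0f}")
```

Because the inputs are rounded, the results are order-of-magnitude indications only.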
5
Access
Prototype for online search interface (no access to data)
–Improved search possibilities (partial full-text search of selected seeds)
–User tracking (in-house, online) and data handling with ELK stack (Elasticsearch, Logstash, Kibana)
External access for 4 libraries
6
NAS & other tech stuff
E-Mail-Notification Tool (for selective crawls)
NAS Release tests
File Format Identification (DROID, as part of ONB risk management)
7
NAS & other tech stuff
HADOOP
–Responsibilities changed
–Problem solving in progress
To do until broad crawl (03/15):
–Database Migration MySQL to PostgreSQL
–Switch to NAS 4.4
Switch to OpenWayback