1
BnF experiences in using NAS 5 and Heritrix 3
BnF - DLWEB - Umbra & Heritrix 3. Géraldine Camile, NAS Workshop, Vienna, 27 April 2017
2
Curator comfort: one tool to monitor the harvest
3
The harvest templates migration
The starting point: an Excel table listing the specifics of each Heritrix 1 (H1) harvest template
4
The harvest templates migration
We adapted the Danish default harvest template (for example, the budget is calculated in URLs) and derived from it all the generic harvest templates (domain, host…), except for the paid press.
Our engineer merged all 9 harvest templates into only 2.
5
The first results of H3 harvest
A better crawl log:
Fewer 4XX errors.
Fewer 2XX responses, but H1 generated many erroneous URLs that returned 200.
H3 adheres more closely to the limit on the number of redirections.
Consequence 1: the crawled data is smaller in volume. To prevent the budget increase due to the absence of deduplication, we decreased all our budgets by 10%.
Consequence 2: the crawl duration is shorter.
Consequence 3: we crawl fewer images.
Other remark: we crawl more https URLs.
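The comparison above (4XX, 2XX, https share) can be reproduced from a crawl log with a few lines of Python. This is only a minimal sketch, not the tooling used at BnF: it assumes the usual Heritrix crawl.log layout in which the fetch status code is the second whitespace-separated field and the URI is the fourth, and the sample lines are invented for illustration.

```python
from collections import Counter


def status_class(code: str) -> str:
    """Map a fetch status code to a class such as '2XX' or '4XX'."""
    if code.isdigit() and len(code) == 3:
        return code[0] + "XX"
    # Negative Heritrix-internal codes and anything unexpected end up here.
    return "other"


def summarize(lines):
    """Count status classes and https URIs in crawl-log lines.

    Assumption: field 2 is the status code and field 4 is the URI.
    """
    classes = Counter()
    https = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        total += 1
        classes[status_class(fields[1])] += 1
        if fields[3].startswith("https://"):
            https += 1
    return classes, https, total


# Hypothetical sample lines, shortened to the first few fields.
sample = [
    "2017-04-27T10:00:00.000Z 200 5120 https://example.org/ - - text/html",
    "2017-04-27T10:00:01.000Z 404 0 http://example.org/missing L https://example.org/ text/html",
    "2017-04-27T10:00:02.000Z -9998 - http://example.org/blocked L https://example.org/ unknown",
]

classes, https, total = summarize(sample)
print(dict(classes), "https:", https, "total:", total)
```

Running the same function over an H1 log and an H3 log of the same harvest would give the per-class counts needed to check each point above.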