BnF experiences in using NAS 5 And Heritrix 3


1 BnF experiences in using NAS 5 And Heritrix 3
BnF - DLWEB - Umbra & Heritrix 3
Géraldine Camile
NAS Workshop, Vienna, 27 April 2017

2 Curator comfort: one tool to monitor the harvest

3 The harvest templates migration
Starting point: an Excel table listing the specifics of each H1 harvest template.

4 The harvest templates migration
We adapted the Danish default harvest template (for example, the calculation of the budget in URLs).
We then derived all the generic harvest templates from it (domain, host…).
Except for the paid press: our engineer merged all 9 harvest templates into only 2.

5 The first results of H3 harvest
A better crawl log:
Fewer 4XX errors.
Fewer 2XX responses, but H1 generated a lot of wrong URLs that returned 200.
H3 respects the number of redirections more strictly.
Consequence 1: the crawled data are lighter. To prevent the budget increase caused by the absence of deduplication, we decreased all our budgets by 10%.
Consequence 2: the crawl takes less time.
Consequence 3: we crawl fewer images.
Other remark: we crawl more HTTPS URLs.
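As an illustration of how the 4XX and 2XX counts of two crawls could be compared, here is a minimal Python sketch that tallies status classes in crawl log lines. It is not the tooling used at BnF. It assumes the usual Heritrix crawl.log layout, in which fields are separated by whitespace and the second field is the fetch status code. The function names and the sample lines are hypothetical.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class, e.g. 404 -> '4XX'.

    Heritrix also logs non-HTTP (for example negative) codes;
    those are grouped under 'other'.
    """
    if 100 <= code <= 599:
        return f"{code // 100}XX"
    return "other"


def tally_status_classes(lines):
    """Count crawl log lines per status class.

    Assumption: the second whitespace-separated field of each line
    is the fetch status code. Lines that do not fit are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            code = int(fields[1])
        except ValueError:
            continue
        counts[status_class(code)] += 1
    return counts


# Hypothetical sample lines, shortened to the first few fields.
sample = [
    "2017-04-27T10:00:00.000Z 200 1234 http://example.org/ - - text/html",
    "2017-04-27T10:00:01.000Z 404 0 http://example.org/missing L http://example.org/ text/html",
    "2017-04-27T10:00:02.000Z 301 0 http://example.org/old L http://example.org/ text/html",
    "2017-04-27T10:00:03.000Z -9998 - http://example.org/private L http://example.org/ unknown",
]

counts = tally_status_classes(sample)
print(dict(counts))
```

Running the same tally on an H1 crawl log and an H3 crawl log of the same harvest would give the per-class counts to compare.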

