Heritrix 3: librarian features BnF proposal March 2015.


1 Heritrix 3: librarian features BnF proposal March 2015

2 Context
Follow-up to our NetarchiveSuite workshop in Tallinn:
– https://sbforge.org/display/NAS/2015+Workshop+Conclusion
Identified work packages:
– tests
– template migration
– implementation of important but missing curator features for common operations in Heritrix 3
BnF will further describe the use cases, share them with the community for feedback, and implement the following features as a minimal Heritrix UI add-on.

3 From H1…

4 … to H3

5

6

7 Common curator operations
– Search crawl.log
– Add filter on current job (job configuration)
– Change domains/hosts budget (job configuration)
– View or delete frontier URIs

8

9

10 Search crawl.log (NASC61)
Add a page with the same layout but with 2 additional form fields:
– Regular expression:
– Show matches: 1000 (default number of matching URIs)
– Action => Display URIs (reverse order by default)
Possibility to refresh the display (F5)
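The search described on this slide could be sketched as follows. This is only an illustration under stated assumptions, not the proposed implementation: the function name `search_crawl_log` and its parameters are hypothetical, the log is treated as plain lines of text, and a line counts as a match when the regular expression is found anywhere in it.

```python
import re
from typing import Iterable, List, Tuple


def search_crawl_log(
    lines: Iterable[str],
    pattern: str,
    max_matches: int = 1000,   # "Show matches" default from the slide
    reverse: bool = True,      # reverse order by default, as on the slide
) -> Tuple[List[str], int]:
    """Return (lines to display, total number of matching lines).

    Hypothetical helper: matching anywhere in the line is an assumption.
    """
    regex = re.compile(pattern)
    matching = [line.rstrip("\n") for line in lines if regex.search(line)]
    if reverse:
        matching.reverse()     # most recent log lines first
    return matching[:max_matches], len(matching)


if __name__ == "__main__":
    sample = [
        "200 http://bnf.fr/a",
        "404 http://ina.fr/b",
        "200 http://bnf.fr/c",
    ]
    shown, total = search_crawl_log(sample, r"bnf\.fr", max_matches=1)
    print(f"displaying 1-{len(shown)} out of {total}")
```

Returning the total alongside the displayed page is what allows a status line such as "displaying 1-1000 out of 12345".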

11 Draft UI for « Search crawl log »
Mockup elements: Status + job ID; Home; Forward / Reversed; Matching lines: 1000; Display URIs; Lines: displaying 1-1000 out of 12345

12 Common curator operations
– Search crawl.log
– Add filter on current job (job configuration)
– Change domains/hosts budget (job configuration)
– View or delete frontier URIs

13

14 Add filter on current job (DecideRule) (NASC60)
Not necessary to view active filters that were included from job start (NASC59)
Add a page containing a rejectTemporarily area working with the following parameters:
– Decision: REJECT
– List-logic: OR
– Regexp-list: empty at job start, free textarea which can be manually edited and sorted (440 px wide, 20 lines)
– Action => Save: save current filters and activate them for current job
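The REJECT / OR logic above can be illustrated with a short sketch. The helper names `parse_regex_list` and `is_rejected` are hypothetical, and the sketch assumes that a pattern must match the whole URI; whether the real rule anchors patterns this way should be checked against Heritrix itself.

```python
import re
from typing import List, Pattern


def parse_regex_list(textarea: str) -> List[Pattern[str]]:
    """Turn the free textarea (one regex per line) into compiled patterns.

    Blank lines are ignored; the list is empty at job start.
    """
    return [
        re.compile(line.strip())
        for line in textarea.splitlines()
        if line.strip()
    ]


def is_rejected(uri: str, patterns: List[Pattern[str]]) -> bool:
    """List-logic OR: reject as soon as any pattern matches the URI.

    Assumption: full-string match, not a substring search.
    """
    return any(p.fullmatch(uri) for p in patterns)


if __name__ == "__main__":
    filters = parse_regex_list(r".*\.exe" + "\n" + r".*calendar.*")
    print(is_rejected("http://example.org/setup.exe", filters))
    print(is_rejected("http://example.org/index.html", filters))
```

With an empty list nothing is rejected, which corresponds to the state at job start.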

15 Draft UI for « Add filter on current job »
Mockup elements: Status + job ID; Home; text "All URIs matching any of the following regular expressions will be rejected from the current job."; Regular expressions: (textarea); Save

16

17 Common curator operations
– Search crawl.log
– Add filter on current job (job configuration)
– Change domains/hosts budget (job configuration)
– View or delete frontier URIs

18 Change domains/hosts budget
Works with the queue-total-budget and quota-enforcer systems
Add a page containing:
– a list of domains/hosts (in alphabetical order by domain)
– their associated budget value (which can be edited)
– only those whose budget is not set by default
– and a form field to add a new domain/host
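One way to picture the data behind this page is a table of per-domain overrides on top of a job-wide default. The sketch below is an assumption-laden illustration: the names `DEFAULT_BUDGET`, `set_budget`, `effective_budget` and `listed_overrides` are hypothetical, and the default of 100 000 URIs is taken from the draft UI rather than from any Heritrix setting.

```python
from typing import Dict, List, Tuple

DEFAULT_BUDGET = 100_000  # job-wide value shown in the draft UI (assumed here)


def set_budget(overrides: Dict[str, int], host: str, budget: int) -> None:
    """Add or edit the budget of a domain/host.

    A value equal to the default removes the override, so the page only
    lists domains whose budget is not the default one.
    """
    if budget == DEFAULT_BUDGET:
        overrides.pop(host, None)
    else:
        overrides[host] = budget


def effective_budget(overrides: Dict[str, int], host: str) -> int:
    """Budget that applies to a host: its override, else the default."""
    return overrides.get(host, DEFAULT_BUDGET)


def listed_overrides(overrides: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rows for the page, in alphabetical order by domain."""
    return sorted(overrides.items())


if __name__ == "__main__":
    budgets = {"ina.fr": 139_000, "bnf.fr": 140_000, "cnc.fr": 139_500}
    set_budget(budgets, "toto.fr", 130_000)  # the "New domain/host" field
    for host, budget in listed_overrides(budgets):
        print(host, budget)
```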

19 Draft UI for « Change domains/hosts budget »
Status + job ID; Home
Budget defined in job configuration: queue-total-budget of 100 000 URIs.
Budgets of the following domains/hosts have been changed in the current job:
– bnf.fr 140 000
– ina.fr 139 000
– cnc.fr 139 500
New domain/host: toto.fr – 130 000
Save

20 Common curator operations
– Search crawl.log
– Add filter on current job (job configuration)
– Change domains/hosts budget (job configuration)
– View or delete frontier URIs

21

22 View or delete frontier URIs (NASC56 + NASC57 + NASC58)
Add a page containing 2 form fields:
– Regular expression:
– Show matches: 1000 (default number of matching URIs)
– Action A => Display URIs: display the matching URIs and their number, with the possibility to view the next block of matching URIs
– Action B => Delete URIs: delete the matching URIs and indicate how many were deleted
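The two actions can be sketched on an in-memory list standing in for the frontier. This is not how Heritrix stores or exposes its frontier; the functions `view_frontier` and `delete_from_frontier` are hypothetical, and a URI is assumed to match when the regular expression is found anywhere in it.

```python
import re
from typing import List, Sequence, Tuple


def view_frontier(
    uris: Sequence[str],
    pattern: str,
    offset: int = 0,
    limit: int = 1000,  # "Show matches" default from the slide
) -> Tuple[List[str], int]:
    """Action A: return one block of matching URIs and the total match count.

    Calling again with offset + limit gives the next block.
    """
    regex = re.compile(pattern)
    matches = [u for u in uris if regex.search(u)]
    return matches[offset:offset + limit], len(matches)


def delete_from_frontier(uris: Sequence[str], pattern: str) -> Tuple[List[str], int]:
    """Action B: return the remaining URIs and the number of deleted ones."""
    regex = re.compile(pattern)
    remaining = [u for u in uris if not regex.search(u)]
    return remaining, len(uris) - len(remaining)


if __name__ == "__main__":
    frontier = [
        "http://bnf.fr/1",
        "http://ina.fr/2",
        "http://bnf.fr/3",
    ]
    block, total = view_frontier(frontier, r"bnf\.fr", limit=1)
    print(f"URIs: displaying 1-{len(block)} out of {total}")
    frontier, deleted = delete_from_frontier(frontier, r"bnf\.fr")
    print(f"{deleted} URIs deleted, {len(frontier)} remaining")
```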

23 Draft UI for « View or delete frontier URIs »
Mockup elements: Status + job ID; Home; Matching lines: 1000; URIs: displaying 1-1000 out of 12345; note "Pause the job first to view frontier"

24 search Job configuration add filter – change budget

25 Comparison with BAnQ

26

27

