Status and plans for the H3 release
NetarchiveSuite 5.0
History
- NetarchiveSuite released May 2014
- Initial effort focused on refactoring the codebase to a modern development setup
- Nicholas and Søren started on H3 in September 2014
- 5.0 is aiming for a minimal Heritrix3 release:
  - Ability to perform harvests on the same level as H1
  - Minimal Heritrix GUI
  - Minimal leverage of new H3 capabilities
  - Same WARC, deduplication, etc.
NetarchiveSuite 5.0 release plan
- Start of March: Alpha release with limited ability to configure and run a harvest.
- Start of April: Beta release able to perform harvests with correctly running jobs, WARC files and archiving.
- Start of June: 5.0 release after testing at the institutions.
- Initial template migration finished.
Current H3 roadmap
- Isolate Heritrix 1 code in separate modules
- Create similar H3 modules
- Generalize harvesting template to support H3
- Generalize NAS crawler API to support H3
- H1/H3 config option, perhaps using channels
What comes after NetarchiveSuite 5
- Leverage the new features of Heritrix3
- Support for Umbra
- Replace the current Archive module functionality:
  - Bitpreservation: Bitrepository platform
  - Processing: Hadoop