Published by Gwendolyn Wright. Modified over 9 years ago.
1
Harvesting e-publications in DK – a short status, January 2015
By Tue Hejlskov Larsen, netarchive.dk
2
E-books/E-Sound/SMS-books – E-publications
Today we don't know exactly how big the e-publication area is. E-publications (in PDF, MP3 or EPUB format), with or without ISBN/ISSN numbers, are published today:
- in parallel through different channels/publishers, many of them through the biggest Danish e-publisher, Publizon.dk
- directly on the internet, via the author's own home page or through one of the many very small e-publishers with 10 or 2-400 e-books, like http://gopubli.sh/
- directly to webshop channels, e.g. saxo.com in DK, or through international sales channels like amazon.com or other foreign web domains.
3
Currently active pilot projects with publishers
- Museum Tusculanum (about 700 titles)
- Publizon.dk (I estimate about 75% of the "normal" commercial e-books/e-sound-books, i.e. about 20,000 e-books and 6,000 e-sound-books)
- Smspress.dk (about 100 titles)
4
Next steps
- OAI-PMH harvesting with NAS of all research libraries and some public institutions, using the NAS Heritrix OAI extractor module. (Aalborg University, aau.dk, has been successfully OAI-harvested with some added filters; about 12,000 PDFs were collected, almost the same number the Danish National Research Database has information about.)
- One or two commercial webshops
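The NAS Heritrix OAI extractor module itself is not shown in these slides. As a rough, hypothetical sketch of what an OAI-PMH harvest involves, the Python below pages through `ListRecords` responses in the standard `oai_dc` format, follows `resumptionToken` values, and keeps `dc:identifier` values ending in `.pdf`. The function names, the `.pdf` filter and the injected `fetch` callable are illustrative assumptions, not the NAS implementation or the filters actually used for aau.dk.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL; a resumptionToken is sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (pdf_urls, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    pdf_urls = []
    path = ".//oai:record/oai:metadata/oai_dc:dc/dc:identifier"
    for ident in root.iterfind(path, NS):
        url = (ident.text or "").strip()
        # Hypothetical filter: keep only identifiers that point at PDF files.
        if url.lower().endswith(".pdf"):
            pdf_urls.append(url)
    token_el = root.find(".//oai:resumptionToken", NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty or missing resumptionToken marks the last page of the list.
    return pdf_urls, token or None


def harvest(base_url, fetch):
    """Follow resumption tokens to the end; `fetch` maps a URL to XML text."""
    urls, token = parse_page(fetch(list_records_url(base_url)))
    while token:
        more, token = parse_page(fetch(list_records_url(base_url, token)))
        urls.extend(more)
    return urls
```

Passing `fetch` in (for example a small wrapper around `urllib.request.urlopen`) keeps the paging logic testable without network access.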
5
Technical solutions 1
- Focused NAS harvesting of universities, regions, hospitals, municipalities and other public institutions, e.g. Statstidende.dk. For example, by harvesting aau.dk we found about 166,000 PDF files, the so-called "grey/dark e-publication area": teaching materials, brochures and instructions mixed up with published journal articles, e-books and a lot of duplicates. Metadata and files are in the same harvest and stored in the netarchive.
- SMS-books: metadata and SMS-books are retrieved through the smspress.dk API in ONIX format, with some new add-on extensions for SMS-books. netarchive.dk paid for the software development at Smspress. Metadata and SMS-books are stored outside the netarchive.
- Museum Tusculanum: OAI-PMH harvesting using the NAS OAI extractor module (metadata and PDF/EPUB files are in the same harvest and stored in the netarchive). netarchive.dk paid for the software development at Museum Tusculanum.
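The slides do not show the Smspress ONIX feed or its SMS-book extensions. Assuming the feed follows ONIX for Books 3.0 with reference tag names, a minimal sketch of pulling basic metadata (record reference, ISBN-13, title) out of it with Python's standard library could look like this. The Smspress extensions are ignored, the message header is skipped, and the function name and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of ONIX for Books 3.0 with reference (long) tag names.
ONIX_NS = {"o": "http://ns.editeur.org/onix/3.0/reference"}


def read_onix_products(xml_text):
    """Return one dict (ref, isbn, title) per <Product> in an ONIX 3.0 message."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.iterfind("o:Product", ONIX_NS):
        isbn = None
        for pid in product.iterfind("o:ProductIdentifier", ONIX_NS):
            # ProductIDType code 15 means ISBN-13 (ONIX code list 5).
            if pid.findtext("o:ProductIDType", namespaces=ONIX_NS) == "15":
                isbn = pid.findtext("o:IDValue", namespaces=ONIX_NS)
        title = product.findtext(
            "o:DescriptiveDetail/o:TitleDetail/o:TitleElement/o:TitleText",
            namespaces=ONIX_NS,
        )
        products.append({
            "ref": product.findtext("o:RecordReference", namespaces=ONIX_NS),
            "isbn": isbn,
            "title": title,
        })
    return products
```

Records parsed this way can be stored next to the SMS-book files outside the netarchive, which matches the storage split described on the slide.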
6
Technical solutions 2
Publizon:
a) Metadata about e-books and e-sound-books is extracted from the Publizon API and stored outside the netarchive.
b) E-book files are harvested from ftp://ftp.pubhub.dk using a NAS FTP order.xml and stored in the netarchive. FTP settings in the order.xml: ftp://ftp.pubhub.dk true XXXXXXX XXXXXXX true false 0 0 1200
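For step b) the real work is done by NAS/Heritrix with the order.xml above. Purely to illustrate what a recursive FTP harvest has to do, here is a hypothetical Python sketch using the standard `ftplib` module: it walks the directory tree with the MLSD command and collects files with e-book extensions. It assumes the server supports MLSD; the extension list, function names and credentials are placeholders, not the NAS configuration.

```python
import ftplib
import posixpath

# Hypothetical set of e-book file extensions to collect.
EBOOK_EXTENSIONS = (".epub", ".pdf")


def find_ebook_files(ftp, path="/"):
    """Recursively list `path` with MLSD and return the paths of e-book files."""
    found = []
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type")
        full = posixpath.join(path, name)
        # "cdir"/"pdir" entries (current/parent dir) are skipped to avoid loops.
        if kind == "dir":
            found.extend(find_ebook_files(ftp, full))
        elif kind == "file" and name.lower().endswith(EBOOK_EXTENSIONS):
            found.append(full)
    return found


def main(host, user, password):
    # Credentials are placeholders; the real ones are redacted on the slide.
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        for path in find_ebook_files(ftp):
            print(path)
```

Because `find_ebook_files` only needs an object with an `mlsd` method, the traversal can be checked against a fake server before pointing it at a real one.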
7
Technical solutions 3
Publizon (continued):
c) E-sound files are harvested from ftp://ftp.pubhub.dk using wget and stored outside the netarchive. Here is the wget command (patterns quoted so the shell does not expand them):
wget -m -X '/*/Splitted/' -A '*.mp3' ftp://XXXXXXXX:XXXXXX@ftp.pubhub.dk
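In the wget command, `-m` (mirror) turns on recursive retrieval with timestamping and unlimited depth, `-X` excludes the listed directories (here any `Splitted` directory one level below the root), and `-A` accepts only files matching `*.mp3`. The Python sketch below approximates that filter for a single remote path, so the effect of the patterns can be checked offline. It assumes that `*` in the `-X` pattern matches within one path component only and that an excluded directory also excludes everything below it; it is not wget's own code, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

EXCLUDE_DIRS = ["/*/Splitted/"]  # from -X /*/Splitted/
ACCEPT = ["*.mp3"]               # from -A *.mp3


def dir_matches(directory, pattern):
    """Compare component by component, so '*' never crosses a '/'."""
    d_parts = PurePosixPath(directory).parts
    p_parts = PurePosixPath(pattern).parts
    return len(d_parts) == len(p_parts) and all(
        fnmatchcase(d, p) for d, p in zip(d_parts, p_parts)
    )


def wanted(remote_path):
    """Approximate whether the wget command above would download this file."""
    path = PurePosixPath(remote_path)
    # Skip the file if it lies in an excluded directory or anywhere below one.
    for ancestor in path.parents:
        if any(dir_matches(str(ancestor), pat) for pat in EXCLUDE_DIRS):
            return False
    # Case-sensitive match of the file name against the accept patterns.
    return any(fnmatchcase(path.name, pat) for pat in ACCEPT)
```

For example, `wanted("/book1/full.mp3")` is true, while files under `/book1/Splitted/` and non-MP3 files such as cover images are rejected.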
8
And not to forget: the growing number of standalone deliveries
We receive a growing number of emails containing links to e-publications, or the files themselves as attachments, together with some descriptive information. The links are mostly harvested and stored in the netarchive. The attached publications and their metadata are stored outside the netarchive (about 300-400 folders).
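A hypothetical sketch of how this triage could be automated with Python's standard `email` package: links found in the message body are returned as candidates for harvesting into the netarchive, and attachments are written to a per-delivery folder outside it. The function name, the simple URL pattern and the folder layout are assumptions, not a description of the current workflow.

```python
import re
from email import policy
from email.parser import BytesParser
from pathlib import Path

# Simple URL pattern; good enough for a sketch, not a full RFC 3986 parser.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def triage_delivery(raw_bytes, out_dir):
    """Split one delivery email into links (to harvest) and saved attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    links = sorted(set(URL_RE.findall(text)))

    saved = []
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    for part in msg.iter_attachments():
        name = part.get_filename()
        if not name:
            continue
        # Keep only the base name so a crafted filename cannot escape the folder.
        target = folder / Path(name).name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target.name)
    return links, saved
```

The returned links could then be added to a NAS harvest definition, while the folder holds the publication files and the email itself as metadata.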