Harvesting e-publications in DK – a short status January 2015 By Tue Hejlskov Larsen, netarchive.dk.



E-books/E-Sound/SMS-books – E-publications
Today we do not know exactly how big the e-publication area is. E-publications (in PDF, MP3 or EPUB format), with or without ISBN/ISSN numbers, are currently published:
• in parallel through different channels and publishers
• many of them through the biggest Danish e-publication publisher, Publizon.dk
• directly on the internet, via the author's own home page or through one of the many very small e-publishers with 10 or [...] e-books
• directly through webshop channels, e.g. saxo.com in Denmark, or through international sales channels such as amazon.com or other web domains located abroad

Currently active pilot projects with publishers
• Museum Tusculanum (about 700 titles)
• Publizon.dk (my guess is about 75% of the "normal" commercial e-books and e-sound-books; in numbers, about [...] e-books and e-sound-books)
• Smspress.dk (about 100)

Next step
• OAI-PMH harvesting with NAS (NetarchiveSuite) of all research libraries and some public institutions, using the NAS Heritrix OAI extractor module. Aalborg University (aau.dk) has been successfully OAI-harvested with some added filters; about [...] PDFs were collected, almost the same number as the Danish National Research Database has information about.
• One or two commercial webshops
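To make the OAI-PMH step more concrete, here is a minimal Python sketch of one harvesting round: building a `ListRecords` request and pulling PDF links and the resumption token out of a response page. It is not the NAS Heritrix OAI extractor module, whose internals the slides do not describe. The base URL, the sample response and the helper names `list_records_url` and `parse_page` are hypothetical, and the sketch assumes the repository exposes PDF links as `dc:identifier` values in `oai_dc` records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 protocol and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (pdf_links, resumption_token) for one response page."""
    root = ET.fromstring(xml_text)
    # Assumption: PDF links appear as dc:identifier values ending in .pdf.
    pdf_links = [
        el.text.strip()
        for el in root.iter(DC + "identifier")
        if el.text and el.text.strip().lower().endswith(".pdf")
    ]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = None
    if token_el is not None and token_el.text:
        token = token_el.text.strip()
    return pdf_links, token


# Invented sample response, used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example report</dc:title>
          <dc:identifier>https://repository.example.org/files/1.pdf</dc:identifier>
          <dc:identifier>https://repository.example.org/record/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

links, token = parse_page(SAMPLE)
next_url = list_records_url("https://repository.example.org/oai", token)
```

A real harvest would fetch `next_url`, parse the next page, and repeat until the response carries no resumption token.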

Technical solutions 1
• Focused NAS harvesting of universities, regions, hospitals, city governments and other public institutions, for example Statstidende.dk. By harvesting aau.dk, for instance, we found about [...] PDF files: the so-called "gray/dark e-publication area" of teaching materials, brochures and instructions, mixed with published journal articles, e-books and a lot of duplicates. Metadata and files are in the same harvest and are stored in the netarchive.
• SMS-books: metadata and SMS-books are fetched through the smspress.dk API in ONIX format, with some new add-on extensions for SMS-books. Netarchive.dk paid for the software development at Smspress. Metadata and SMS-books are stored outside the netarchive.
• Museum Tusculanum: OAI-PMH harvesting using the NAS OAI extractor module (metadata and PDF/EPUB files are in the same harvest and stored in the netarchive). Netarchive.dk paid for the software development at Museum Tusculanum.
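The slides do not say how the duplicates in a focused harvest are detected or handled. As a minimal sketch, assuming that two files count as duplicates when their bytes are identical, harvested files could be grouped by checksum. The function name `group_duplicates` and the example URLs are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files):
    """Group URLs whose content is byte-for-byte identical.

    `files` maps a URL to the harvested bytes. Only groups with more
    than one URL are returned.
    """
    by_digest = defaultdict(list)
    for url, content in files.items():
        # SHA-1 is used here only as a content fingerprint.
        by_digest[hashlib.sha1(content).hexdigest()].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Invented example: the same PDF published under two addresses.
harvested = {
    "https://example.org/a.pdf": b"same",
    "https://example.org/copy/a.pdf": b"same",
    "https://example.org/b.pdf": b"other",
}
duplicate_groups = group_duplicates(harvested)
```

Checksum grouping only finds exact copies; the same publication re-exported with different metadata would need a comparison on title or identifier instead.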

Technical solutions 2
• Publizon:
a) Metadata about e-books and e-sound-books is extracted from the Publizon API and stored outside the netarchive.
b) E-book files are harvested from ftp://ftp.pubhub.dk using a NAS FTP order XML and stored in the netarchive. The order XML values are: ftp://ftp.pubhub.dk true XXXXXXX XXXXXXX true false

Technical solutions 3
• Publizon (continued):
c) E-sound files are harvested from ftp://ftp.pubhub.dk using wget and stored outside the netarchive. Here is the wget command:
wget -m -X '/*/Splitted/' -A '*.mp3' ftp://ftp.pubhub.dk
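The selection made by the wget options above (`-A` accepts only files matching `*.mp3`, `-X` skips the `Splitted` directory under each top-level directory) can be sketched as a small Python filter. This is an approximation for illustration, not a reimplementation of wget's matching rules. The function name `wanted` and the example paths are hypothetical; the real directory layout on the FTP server is not described in the slides.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def wanted(path):
    """Return True for .mp3 files that are not under /<any>/Splitted/."""
    p = PurePosixPath(path)
    # For "/publisherA/Splitted/x.mp3" the parts are
    # ('/', 'publisherA', 'Splitted', 'x.mp3').
    parts = p.parts
    in_splitted = len(parts) > 3 and parts[2] == "Splitted"
    # Case-sensitive match on the file name, as with a plain -A pattern.
    return fnmatchcase(p.name, "*.mp3") and not in_splitted


# Invented example paths.
paths = [
    "/publisherA/book1/track01.mp3",
    "/publisherA/Splitted/track01_part1.mp3",
    "/publisherA/book1/cover.jpg",
]
selected = [p for p in paths if wanted(p)]
```

Such a filter can be handy for checking, against a directory listing, which files a mirror run is expected to fetch.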

And not to forget - the growing number of standalone deliveries
• We receive a growing number of e-mails with links to e-publications, or with attached files, together with some information. The links are mostly harvested and stored in the netarchive. The attached publications and metadata are stored outside the netarchive (about [...] folders).