KB Lab: Exploring the National Library of the Netherlands’ digital treasure trove Lotte Wilms | @lottewilms.



1. KB, National Library of the Netherlands

KB in a nutshell. Founded 1798. 7 million items = 115 km of library materials. In 2015 the collection grew by: 49,000 books; 42,000 issues of periodicals; 6.5 million digital items; 10,800 current periodicals; 2,700 e-books; 500 licensed databases and e-journals; 3,800 websites.

The KB, where I work, was founded in 1798. Currently, we hold about 7 million items, which is about 115 kilometres of books, newspapers, journals and manuscripts. Our collection grows each year: last year we added almost 50,000 books and 42,000 issues of periodicals.

Building. Net floor space of the building: 80,000 m2. Library: 37,000 m2, including 28,000 m2 storage. Other institutions: 15,000 m2. Capacity: 500 study seats (including 125 with a work station). WiFi available.

Mission. The KB brings people and information together; the KB makes the library collection of the Netherlands visible, preservable and usable; the KB holds a central position in the library network; the KB helps people to become more skilled, smart and creative.

A library such as ours has the task of collecting and preserving the written heritage of the Netherlands. We ensure that our storage rooms, which take up 75% of our building, meet the highest preservation standards and that our material is in the best possible place for storage. The air is not too humid, the temperature is just right, and the precious or fragile material is kept in acid-free boxes. Our curators know what is in each collection, research these works, and are responsible for acquiring new material that fits the profile of their collection. Everything is catalogued down to the smallest detail so that the audience has the best possible access to the material. We know what we have, where it is and how we can keep it safe for the future. This is our core business: it is what we do and what we’re good at.

2. Digitised historical newspapers

Delpher newspaper corpus. Digitised Dutch newspapers, 1618-1995. Images + metadata + text. Now: 11 million pages (in 1,351,123 issues). Prognosis 2020: 20 million pages. Full text searchable on: www.delpher.nl

The data (type/format, level, comments).
PDF (level: issue): searchable text + scan.
JPEG-2000 (level: page): Access: JPEG 2000 lossy compression, colour (or greyscale in case of original from microfilm). Master: JPEG 2000 part 1, lossless compression, greyscale or colour.
Dublin Core (level: iss./p./art.): descriptive metadata.
OCR (level: article): XML.
ALTO.
mpeg21-didl: structural metadata.
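
As an illustration of working with the ALTO layer listed above, the following is a minimal Python sketch that pulls the recognised text out of an ALTO file line by line. ALTO stores each word in a `String` element with a `CONTENT` attribute, grouped under `TextLine` elements. The sample XML, the namespace version and the function names are assumptions made for this example and are not taken from the Delpher files; the code matches on local tag names so that it does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, turning '{uri}String' into 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_xml: str) -> list[str]:
    """Return the text of every TextLine in an ALTO document."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is a String element; its text lives in the CONTENT attribute.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical, hand-written sample; not an actual Delpher file.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Nieuwe" WC="0.91"/><SP/><String CONTENT="Tijdinghen" WC="0.64"/>
    </TextLine>
    <TextLine>
      <String CONTENT="uit" WC="0.98"/><SP/><String CONTENT="Amsterdam" WC="0.95"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

The `WC` attribute in the sample is ALTO's per-word confidence value; a filter on it could be added to skip low-confidence words, depending on what the files actually contain.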

1618-1876 = public domain. 1877-1995 = (potentially) in copyright. However: all content is online at delpher.nl and available for research.

3. Access

Delpher: more than 60 million digitised text pages. Newspapers, books, magazines, radio bulletins, 1618-1995. Full text search. Updated regularly. www.delpher.nl

Delpher data: 22 downloadable zip files. All newspapers of 1618-1876. Metadata + OCR + ALTO. CC-BY. Unzipped: 622 GB. www.delpher.nl/data

Data services: access to data via APIs. Open data: set descriptions; manual to access via API. Data in copyright: contact dataservices@kb.nl; access for research. www.kb.nl/dataservices

4. OCR

OCR correction. Crowdsourced correction: 17th-century newspapers (partner: Meertens Instituut); WWII newspapers (partner: NIOD).

Automated OCR correction. Research project. 2000 pages. Evaluation of OCR. Re-OCR of existing material. Automated correction using machine learning (LSTM). Partner: Netherlands eScience Center.
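
The slide does not say how OCR quality was evaluated. One common measure is the character error rate (CER): the edit distance between the OCR output and a manually corrected ground truth, divided by the length of the ground truth. The Python sketch below shows that calculation; the choice of metric and the function names are assumptions for illustration, not the project's documented method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance between OCR and ground truth, per ground-truth character."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the corpus.
    print(character_error_rate("tbe newspaper", "the newspaper"))
```

Computing the same figure before and after a correction step gives a simple way to check whether the correction actually improved the text.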

5. Use of Delpher newspapers: external projects

Polimedia. Links Dutch parliamentary papers to newspapers and ANP in one search interface. CLARIN-NL project. TU Delft, VU University, Sound & Vision, Erasmus University. http://www.polimedia.nl/

Translantis: Texcavator. “Digital Humanities Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990.” Utrecht University: 2013-2018. Funded by the Dutch government. Uses the analysis tool Texcavator with Delpher newspapers. http://translantis.wp.hum.uu.nl/

6. Use of Delpher newspapers: internal projects

KB Lab = online and offline sandbox. 2 advisors; 1 curator Digital Collections; 2 research software engineers. = experimental data sets = tools and software = workshops. www.lab.kb.nl

Cooperation via KB-initiated programs. 1. Fellowship Program = 4 months, 1 fte | invited | successful academic. 2. Researcher-in-Residence Program = 6 months, 0.5 fte | open call | early career. Technical support from KB Lab. Tools and/or data always available for others.

Tool: Frame generator. In collaboration with fellow Prof. Dr. Joris van Eijnatten. Automatic extraction of keywords from a text. Identification of co-occurrence patterns. “How is the concept Europe described in newspapers?”
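
To make "co-occurrence patterns" concrete, here is a minimal Python sketch that counts which words appear within a few tokens of a target word such as "Europe". It is not the Frame generator's code: the tokenisation, the window size, the stopword handling and all names are assumptions chosen to keep the example short.

```python
import re
from collections import Counter


def cooccurrences(text: str, target: str, window: int = 3,
                  stopwords: frozenset = frozenset()) -> Counter:
    """Count words occurring within `window` tokens of each `target` hit."""
    target = target.lower()
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        # Tokens before and after the hit, excluding the hit itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(
            word for word in neighbours
            if word not in stopwords and word != target
        )
    return counts


if __name__ == "__main__":
    # Hypothetical sentence, not taken from the newspaper corpus.
    text = "Europe needs peace. A united Europe brings peace and trade."
    print(cooccurrences(text, "Europe", window=2,
                        stopwords=frozenset({"a", "and"})).most_common())
```

Run over many articles and compared across decades, counts like these show how the vocabulary around a concept shifts over time.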

Data: KBK-1M. 1.6 million images extracted from digitised newspapers, 1922-1994. Combination of image & byline. Available for research purposes. Computer vision.

Computer vision applications. Researchers-in-residence: Thomas Smits (RU): shift from illustration to photograph in newspapers & recognition of subjects. Melvin Wevers (UU): nearest neighbours of advertisements.
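
The slide does not describe how nearest neighbours were computed. A common approach is to represent each image as a numeric feature vector and rank the other images by cosine similarity to a query image. The Python sketch below assumes such vectors already exist; the identifiers, the toy three-dimensional vectors and the function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbours(query_id, vectors, k=2):
    """Return the k items most similar to `query_id`, best match first."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in vectors.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical feature vectors; real ones would come from an image model.
FEATURES = {
    "ad_001": [0.9, 0.1, 0.0],
    "ad_002": [0.8, 0.2, 0.1],
    "ad_003": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(nearest_neighbours("ad_001", FEATURES, k=2))
```

This brute-force comparison is fine for small samples; a collection of more than a million images would normally need an approximate nearest-neighbour index instead.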

Open: GitHub. Online code repository. Open source. Free for re-use.

Future plans. Be more transparent about collections. Research own collections & publish about findings. Make access to open sets easier. Find new user groups such as social sciences.

Any questions?