Datasets of the KB Steven Claeyssens – 19 September 2013
Datasets
1. General characteristics
2. Examples
3. Technical characteristics
4. Legal aspects
Datasets: general characteristics
Collections of digital data (metadata, data)
Born digital or the result of mass digitisation (reborn digital)
Dutch cultural heritage material, primarily text
New units of publication (distant reading)
Datasets: examples
1. Early Dutch Books Online (EDBO)
11,240 volumes; 9,710 titles
Books published in the Netherlands, million pages
Datasets: examples
2. Historical Newspapers
1,457 titles
Published in the Netherlands and former colonies, ca. 9 million pages
ca. 99 million articles
Datasets: examples
3. Periodicals
80 titles
Published in the Netherlands, million pages
Datasets: technical characteristics
Machine readable access
Documents in PDF and/or JPEG
(Semi-)structured in XML
Metadata in Dublin Core
OCR with word coordinates
Access via SRU and OAI-PMH
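To make machine-readable access concrete, here is a minimal Python sketch of how a researcher might build OAI-PMH and SRU request URLs and read Dublin Core fields from a returned record. The endpoint addresses, the set name, and the sample record are placeholders invented for illustration; the slides do not give the actual KB service addresses or set names. The request parameters (verb, metadataPrefix, set; operation, version, query, maximumRecords) are the standard ones from the OAI-PMH and SRU protocols.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder endpoints (assumption): not the real KB service addresses.
OAI_BASE = "https://example.org/oai"
SRU_BASE = "https://example.org/sru"

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def oai_list_records_url(base, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for harvesting one set."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    return base + "?" + urlencode(params)


def sru_search_url(base, query, maximum_records=10):
    """Build an SRU searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)


def dc_fields(xml_text):
    """Collect every Dublin Core element in a record as name -> list of values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Invented sample record (assumption), shaped like a Dublin Core description.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:date>1800</dc:date>
  <dc:language>nl</dc:language>
</record>"""

if __name__ == "__main__":
    print(oai_list_records_url(OAI_BASE, "books"))
    print(sru_search_url(SRU_BASE, "krant"))
    print(dc_fields(SAMPLE_RECORD))
```

OAI-PMH suits bulk harvesting of a whole set, while SRU suits targeted searches; in both cases the response is XML, so the same Dublin Core parsing step applies to either.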
Datasets: legal aspects
In theory: as ‘open’ as possible
Public Domain = Public Domain
Metadata by KB = CC0
In practice: most datasets are hybrid
Solution: negotiating rights for KB site and for researchers
Thank you for your attention. Questions?