National Library of Finland Digitisation Policy, Production Processes and Access (...and beyond Access) Tiina Ison, Senior Analyst Present-Day Library:

Slides:



Advertisements
Similar presentations
Resource description and access for the digital world Gordon Dunsire Centre for Digital Library Research University of Strathclyde Scotland.
Advertisements

1 Metadata Tools for JISC Digitisation Projects of still images and text Ed Fay BOPCRIS, Hartley Library University of Southampton.
Interoperability Aspects in Europeana Antoine Isaac Workshop on Research Metadata in Context 7./8. September 2010, Nijmegen.
Services Digitisation & Content Management. 600 People – India.
The Library behind the scene How does it work ? The Library behind the scenes 1 JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
IFLA Namespaces Gordon Dunsire Chair, IFLA Namespaces Technical Group Session 204 — IFLA library standards and the IFLA Committee on Standards – how can.
Vocabulary Mapping Framework & Libraries Alan Danskin Metadata & Bibliographic Standards Coordinator.
1 Managing Legal Deposit for Online Publications in Germany Cornelia Diebel.
Joachim Bauer Senior System Engineer, CCS
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
WMS: Democratizing Data
National Library of Finland Sustainable Access Library digitisation production workflows and processes adding value to print collections Kuopio
Integrating library and cultural heritage data models: the BIBFRAME - EDM case Sofia Zapounidou 1, Michalis Sfakakis 1, Christos Papatheodorou 1,2 1.Department.
Metadata : Setting the Scene or a Basic Introduction Wendy Duff University of Toronto, Faculty of Information Studies.
National libraries and identity in the Semantic Web Gordon Dunsire BNE, Madrid, 14 Dec 2011.
Digital Pathways: Digital humanities and the interface between scholarly community and libraries Lorna Hughes University of Wales Chair in Digital Collections,
ACCESS TO QUALITY RESOURCES ON RUSSIA Tanja Pursiainen, University of Helsinki, Aleksanteri institute. EVA 2004 Moscow, 29 November 2004.
The National Digital Newspaper Program (NDNP) An NEH/LC Collaborative Program Enhancing access to historical newspapers Release: September 2006.
Metadata and identifiers for e- journals Copenhagen Juha Hakala Helsinki University Library
Is Cataloging Dead: Advocacy for Bibliographic Control Randy Roeder and Rebecca Routh ILA/ACRL Spring Conference Davenport, Iowa March 3, 2008.
Luc Audrain Hachette Livre Head of digitalization
RDA Training FRBR: a brief introduction British Library 2015 (2015 April RDA update)
Digitisation of Cultural Heritage at the National Library of Latvia: Past and Future Uldis Zariņš Head of Strategic Development National Library of Latvia.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Metadata Standards and Applications 1. Introduction to Digital Libraries and Metadata.
Interoperable Digitised Content “Discover, search, extract, link, associate, and view digitised content” Les Carr.
© January/2008 CCS Content Conversion Specialists GmbH Weidestr. 134, Hamburg, Germany consulting technology digitization services.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Metadata, the CARARE Aggregation service and 3D ICONS Kate Fernie, MDR Partners, UK.
METADATA QUALITY IN EUROPEANA , Den Haag.
The DigiTool to FDA Program Lydia Motyka Florida Center for Library Automation.
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
Digital Archiving in the Hungarian Széchényi Library The story and the plans of the Hungarian Electronic Library Rome, 21. Oct István Moldován OSZK,
Aligning library-domain metadata with the Europeana Data Model Sally CHAMBERS Valentine CHARLES ELAG 2011, Prague.
1 Reference Linking in Project Euclid …with some thoughts on the preservation of digital collections. A presentation at the Workshop on Linking and searching.
[D2.5] Object model and metadata: Open issues Workgroups Kick-off meeting – 2 & 3 April 2009 Julie Verleyen.
Resource Description and Access Deirdre Kiorgaard Australian Committee on Cataloguing Representative to the Joint Steering Committee for the Development.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Resource Description and Access Deirdre Kiorgaard ACOC Seminar, September 2007.
Antoine Isaac 1 st PRELIDA Workshop Pisa, June 26, 2013.
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
Evidence from Metadata INST 734 Doug Oard Module 8.
RDA DAY 1 – part 2 web version 1. 2 When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
1 SGML-MARC Incorporating Library Cataloging into the TEI Environment Stephen Paul Davis Columbia University Libraries.
Resource Description and Access (RDA) information session Deirdre Kiorgaard Australian Committee on Cataloguing Representative to the Joint Steering Committee.
Functional Requirements for Bibliographic Records The Changing Face of Cataloging William E. Moen Texas Center for Digital Knowledge School of Library.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
National Library of Finland Metadata in the Digitisation Process Cultural unity and diversity of the Baltic Sea Region – common history, different languages,
National Library of Finland Strategic, Systematic and Holistic Approach in Digitisation Cultural unity and diversity of the Baltic Sea Region – common.
Tiziana // Alessandra Lenzi - MG Breaking down the walls Project Museo Galileo and the Linked Open Data A joint project between.
Discovering libraries’ gold through collection-level descriptions ELAG 2014, Bath Valentine Charles Data specialist.
Current initiatives in developing library linked data Gordon Dunsire Presented at the Cataloguing and Indexing Group Scotland seminar “Linked data and.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Identifiers for a Digital World June 29, 2010 Patricia Payton Senior Director of Publisher Relations & Content Development
Presenting Documents How to Build a Digital Library Ian H. Witten and David Bainbridge.
Digitizing Historical Newspapers South Carolina Digital Newspaper Program's participation with the Library of Congress' Chronicling America: Historic American.
RSC Strategy Gordon Dunsire, Chair, RDA Steering Committee
From FRBR to FRBROO through CIDOC CRM…
Professional development training on cataloging at the University Wisconsin-Madison Memorial Library, USA 14th October -24th October, 2016 Aigerim Shurshenova.
Markup of Educational Content
Chair, IFLA Namespaces Technical Group; Chair-Elect, JSC/RDA
RDA, linked data, and update on development
Metadata - Catalogues and Digitised works
Metadata to fit your needs... How much is too much?
Name authority control in an evolving landscape
Metadata in Digital Preservation: Setting the Scene
RDA in a non-MARC environment
Metadata modelling and metadata creation tool
Presentation transcript:

National Library of Finland Digitisation Policy, Production Processes and Access (...and beyond Access) Tiina Ison, Senior Analyst Present-Day Library: Space, Design, Resources Helsinki Russian Library Heads – NLF visit Giuseppe Acerbi Travels through Sweden, Finland and Lapland to the North Cape in the years 1798 and

1.Digitisation Policy 2.Contextualization – the workflows…. 3.Digitisation Production - containers …. 4.Digitisation Production - content…. 5.Access to Digital Collections… 6.Beyond Access… 7.Experimentation with Beyond Access… Table of Contents 2

New information resources created through library digitisation: 1.Creation of digital manifestation of source material (i.e.book) 2.Enabling content granulation via structural mark up (i.e. chapter) 3.Creation of digitized corpus of text for automated text extraction (i.e. concepts/named entities in text->ontologies) 4.Creation of new metadata and enrichement of metadata through digitisation processes 5.Enabling crowd sourcing and tagging of content in digitisation production (i.e. distributed workflows for content /sttructural mark-up/annotation of themas/semi-automated processes) 6.Ensuring sustainable digitisation via life cycle management and digital preservation of archival copies (i.e. specs for each material type/treatement of aggregates(ER verus OO)/ how to update archival files ) 1.Digitisation Policy National Library, Finland 3

2. Contextualization – the workflows…. Work Granularity (Fragmentation) Authenticity Provenance LOD - URI Acerbi; Physical Manifestation 1 Acerbi Digital Manifestation 2 Acerbi Digital Aggregates of Manifestation 2 Persistent Identifiers and links Structural Fragmentation Chapter Paragraph, Word, Character Persistent Identifiers and links Conceptual Models Ontologies,Themas Relations Named Entities Persistent Identifiers and links (Semantic) Content(METS) Containers How to integrate new (and distributed) workflows in digitisation for contextualising and extracting meaning ? (Catalogue) Container Long Library Workflow Tradition 4

3. Digitisation Production - MARC Container Creator: Acerbi, Giuseppe, Title: Travels through Sweden, Finland, and Lapland, to the North Cape, in the years 1798 and 1799 / by Joseph Acerbi. In two volumes. Illustrated with seventeen elegant engravings. Vol. I[-II]. Publisher: London : Printed for Joseph Mawman..., 1802 (by T. Gillet, Salisbury Square) Format: 2 vol. (xxiv, 396 p., [8] leaves of plates ; viii, 380 p., [9] leaves of plates, some col.) : ill., map ; 4:o. Note Vol. I: Portrait of Joseph Acerbi. Painted by P. Violet, engr. by P. W. Tomkins. Vol. II: Music p Handwritten notes by James Henry Monk ( ) in volume I, identified by Maurice de Coppet. See also p. 92. Provenance: Bequeathed by Maurice de Coppet ( ), French ambassador to Finland in Bookplate "Gallia in Finlandia", designed by the Finnish artist Jukka Pellinen, added later by the National Library of Finland. Aineisto: kirja Language: eng Subject: travel accounts Sweden Finland Lapland Abstract: Giuseppe Acerbi, Italian naturalist, explorer and composer, travelled to the far North of Scandinavia in the years 1798 and His travel account was first published in English and it was soon translated into several languages. Library Catalogue ! Minimal Record for Digitisation Workflow Unique ID of Source Provenance of Source (un)Controlled (linked) Vocabs Value Drivers; Authenticity of Source and Links 5 YSO – Upper Finnish Ontology (English). Eero Hyönen Semantic Web. FinOnto tool

3. Digitisation Production - METS Containers METS EXPORT SIP Packages include: JPEG2000 OCR TXT as ALTO XML PDF JPEG(150) METSXML MARCXML DIGITAL WORK (NEW RESOURCES) Acerbi Manifestation 2 and Acerbi aggregates (chapter/illustrations/poems) of Acerbi digital manifestation 2ONS Standards & OAI-PMH compliant archival quality Two Bibliographic Records SRUCTURAL GRANULATION AGGREGATES: Articles Illustrations Poems DESCRIPTIVE METADATA MARC21/MODS ADMINISTRATIVE METADATA MIX/PREMIS Newspapers Journals Books Parchments Manuscirpts Ephemera Maps Audio ORIGINAL WORK ; Acerbi; Manifestation 1 OMPREHENSIVE DIGITIAL COLLECTIONS S Value Drivers; Authenticity of Work, Manifestation, Aggregates and Links; Interoperability, Digital Preservation 6 STRUCTURAL METADATA METSALTO Cataloguing Scanning Post Processing Structural Mark up OCR Correction Content Annotation Conversion into TEI XML Automatic Entity Extraction CRITICAL MASS NEW WORKFLOWS Printed Digital

7 4. Digitisation Production – OCR correction…. N National Library of Finland and Microtask, 2011 OCR Correction of Historical Digitized Newspapers via crowd sourcing and gaming ‘Mole Hunt’, players are shown two different words from which they must determine any similarities. ‘Mole Bridge’ makes players correctly spell the words appearing on the screen. More than people have completed over 2 million individual tasks inside the games. This measures up to about minutes or about 1700 hours of work (or 226 working days for that matter / 7,5 hours per day). Most of the people helping out in are between 25 and 44 years of age. March, 2011  ALTO files generated via digitisation OCR processes  OCR recognition of old characters, Fractur poor.  Poor quality obstacle for automated extraction of i.e. named entities.  OCR correction is performed via Microtask gaming, experimental project.  Text can be converted to TEI/XML Next mark up of articles/illustrations in historical digitised newspapers and digitsed journals through gaming ….

4. Digitisation Production – Content Granulation 8 1.Structural mark up and granulation of a work (aggregates as works) 2.Content granulation/fragmentation in a work (re-use, re-mixing) 3.Automatic extraction of named entities from works (ontologies development, enrichment of authority files) Structural markup and content mark up as part of digitisation production…. 1.Article/Chapter/Illustration (aggregates as works, container and metadata profile,,digital preservation) 2.Paragraph (annotatin of thema, event, time) 3.Sentence (annotatin of thema, event, time) 4.Word (named entities) 5.Character (OCR correction)

5. Access to Digital Collections…. 1.Physical: National Library of Finland 2.Digitisation: Centre for Digitisation and Conservation; full in-house digitisation production chain, 2milj p newspapers, 2 milj p of journals, books, parchments. 3.Access to Digitized Collections (Public Domain) Restricted Access to Digitized Collections (Copyright Protected) 5.Access to Funded Digitisation by Projects akirja/ or akirja/ 6.National Digital Library – Europeana RDA – 2011 NLF decision to adopt RDA by Open linked data on agenda… Cataloguing Policy due Spring/Summer

6. Beyond Access – Semantic Contextualization 10 For creating a web of linked resources through digitisation production:  Containers, Content and Context as part of digitisation production processes as part of distributed digitisation production…  Focus away from the Catalogue (even though it is magnifique) and make do with a minumum record with persistent ID and link to manifestation 1.  Containers still need to be specified and maintained for digital manifestation 2 and aggregations of digital manifestation 2. If each work stands as separate entity with its associated metadata, it can be linked and annotated in different contexts (METS profiles for works, and how to update containers…)  Use of linked and controlled vocabularies and ontologies in digitisation production aid semantic contextualization (YSO, FinOnto, OCM)  Automatic and semi-automatic extraction of data, named entities, thema, subjects, concpets relations and meaning as part of digitisation production… FRBR, FRAD, FRSAD/CIDOC CRM..  Integrate distributed work processes and crowd sourcing, in addition to repetitive work such as OCR correction, to structural and content markup….

6. Experimentation with Beyond Access – Annotation/Extraction of Meaning…. 11 Acerbi as prototype: Using FRBR,; Funcitional Requirements for Bibliographic Record (FRAD and FRSAD) for annotation of Acerbi text for extracting meaning from content– does it work for ? 1. Markup of named entities (FRAD) (VIAF) 2.Markup of time 3.Markup of place (GeoOnto) 4.Markup of thema (OCM) 5.Markup of subject (FRSAD) 6.Markup up of relations between entities 7.Markup of other….Acerbi defined chapter structure, Acerbi’s knowledge domain, Acebi’s emotions…. How about FRBRoo and CIDOC CRM perspectives ? How about EDM- FRBRoo perspectives ? Experimentation; Tiina Ison, Eeva Murtomaa, Mika Nyman - enabling access to content utilizing conceptual models ….

12 Thank You