National Library of Finland
Sustainable Access
Library digitisation production workflows and processes adding value to print collections
Kuopio
Tiina Ison, Senior Analyst
Outline
1. Paradigm Shifts
2. Sustainable Strategies for Print Digitisation (PPP)?
3. User Community Value Drivers
4. Library Value Drivers - Market Positioning
5. National Digital Library (KDK/PAS)
6. (Mass) Digitisation Project ( )
7. In-House Digitisation Production and Workflows
8. Interoperable Metadata and METS Profiles
9. Adding Value in Production to Print Collections
10. Sustainability....
1. Paradigm Shifts....
Sustainable access to printed collections
One book, one article: is the publishing/licensing concept sustainable....?
Open access, including use and reuse of content, repurposing
Print versus data-intensive science paradigm
Granularity and aggregation of content, repurposing, repackaging...
Text and data mining of the corpus of collections across disciplines, interlinking datasets with computational tools
Text mining and automated concept extraction methods used to add semantic metadata to text
Eeva Ahonen and Eero Hyvönen: Publishing Historical Texts on the Semantic Web - A Case Study. Proceedings of the Third IEEE International Conference on Semantic Computing (ICSC2009) (forthcoming), Berkeley, CA, USA, September 2009.
Community-generated contribution: crowdsourcing
Volunteer contribution economy, two-way engagement (JISC study being commissioned)
Galaxy Zoo
Australian Historical Newspapers
Example of open access to the corpus of a print collection
Open access for research, education and citizen use and contribution; non-commercial use in a free knowledge economy (sounds like public domain and CC licence; citizen contribution? non-commercial use?)
National Library of Australia, Historical Newspapers
2. Emerging Print Digitisation Strategies (PPP)
European and national digitisation strategies aim for critical mass (mass) digitisation of print
EU Commission warning member states of SILOs... Mr. Javier Hernandez-Ros on behalf of Viviane Reding
1. Books (Danish Library, Director General Erland Kolding Nielsen categorizing the market at the LIBER digitisation seminar 2009, with some additions …)
   Early printed books up to 18th century: ProQuest market penetration
   Printed books: Google market penetration
   Books under copyright: Nordic Extended Licensing/Rights Registry Google/Arrow Project EU
   Black market and piracy economy (global information divide, social generation divide)
2. Scientific Journals
   A publisher's market? Not covered by the Google settlement
3. Newspapers
   Likelihood of ongoing government funding? High-value corpus?
4. Ephemera
Did I remember to mention no ongoing funding for sustainable digitisation strategies?
Example of toll-gated/silo access to the corpus of a print collection
Toll-gated access to digitised content, limited by membership and limited to country boundaries. Toll-gated citizen access. Limited reuse and repurposing.
British Library, Newspaper Collection (sounds like closed access)
3. Community Value Drivers
Sustainability builds on user and community needs
open access
digital format
minimum restrictions
use and reuse at will
data-intensive research
free content for semantic web and data linking
free community contribution
fun and play (creativity)
Are user community value drivers informing decisions about building infrastructure, workflows and processes for sustainable access and long term preservation?
Non-users of today are the user market of tomorrow
4. Library Value Drivers – Market Positioning
Sustainable access invests in quality infrastructure and workflows for:
Trustworthy, authoritative sources
Citation with trust
Physical and digital provenance
Rights management
Links between physical items and digital surrogates and manifestations
Complex objects, granularity level and persistent IDs
Links in catalogue/union catalogues
Interoperable metadata and use of standards
Long term preservation
Analogue work practices are well established in libraries.....
Do libraries have the know-how to move from the analogue to the digital world? No mention has yet been made about scanning!
5. National Digital Library KDK/PAS
RAKE Structural Change in Higher Education, Finland
National Infrastructure Development Projects funded by the Education Ministry
1. National Digital Library Initiative 2008–
2. Long Term Preservation Initiative 2012, 2016?
3. (Mass) Digitisation Project, 2007–2009
METS profiles
Rights Management...
Ministry of Education
Did I remember to mention copyright, orphan works, data protection issues?
6. (Mass) Digitisation Project,
1. One Production Line - one production line between back-end digitisation production and the National Digital Library infrastructure.
2. Process modelling - library-wide logistics, processes and workflows are modelled and renewed where needed.
3. Interoperable Metadata - the quality of metadata used, captured and packaged throughout the digitisation production line is adequate for access and long term preservation needs.
4. Tools - ensure appropriate tools are put in place for tracking and managing workflows between the National Library at Helsinki and the Digitisation Centre at Mikkeli.
7. In-House Digitisation Production affects Workflows
Process Modelling
Sustainable in-house digitisation production affects library-wide workflows and requires workflow redesign in a distributed environment. Processes, tools, work practices and standards are required for controlling:
1. Physical printed item (preservation and logistics of transport)
2. Management of digital objects from production to access/preservation
3. Control of metadata
Metadata for Printed and Digital
The traditional concept of metadata for a printed object is extended with the concept of metadata for digital resources, for provision of sustainable access and preservation.
1. Bibliographic Metadata (MARC21)
2. Administrative Metadata
   1. Technical Metadata
   2. Rights Metadata
   3. Long Term Preservation Metadata
3. Structural Metadata
   1. Physical Structure
   2. Logical Structure
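The metadata categories on this slide can be pictured as one record per digitised item. The following is a minimal sketch, not the National Library's data model: the class names, the `barcode` field, the dictionary keys and the `missing_sections` helper are all assumptions made for illustration, and the bibliographic dictionary is only a simplified stand-in for a MARC21 record.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AdministrativeMetadata:
    technical: Dict[str, str] = field(default_factory=dict)     # e.g. scan resolution
    rights: Dict[str, str] = field(default_factory=dict)        # e.g. copyright status
    preservation: Dict[str, str] = field(default_factory=dict)  # e.g. checksums, events


@dataclass
class StructuralMetadata:
    physical: List[str] = field(default_factory=list)  # pages in scanned order
    logical: List[str] = field(default_factory=list)   # chapters, articles, sections


@dataclass
class DigitisedItem:
    barcode: str                   # hypothetical link to the physical printed item
    bibliographic: Dict[str, str]  # simplified stand-in for a MARC21 record
    administrative: AdministrativeMetadata = field(default_factory=AdministrativeMetadata)
    structural: StructuralMetadata = field(default_factory=StructuralMetadata)

    def missing_sections(self) -> List[str]:
        """Return the names of the metadata groups that are still empty."""
        groups = {
            "bibliographic": self.bibliographic,
            "technical": self.administrative.technical,
            "rights": self.administrative.rights,
            "preservation": self.administrative.preservation,
            "physical structure": self.structural.physical,
            "logical structure": self.structural.logical,
        }
        return [name for name, value in groups.items() if not value]


# Illustrative values only; none of them come from a real catalogue record.
item = DigitisedItem(barcode="EXAMPLE-0001", bibliographic={"title": "Example title"})
item.administrative.technical["resolution_dpi"] = "300"
item.structural.physical.extend(["page-0001", "page-0002"])
print(item.missing_sections())
```

A check like `missing_sections` is one way a production line could flag items whose metadata is not yet adequate for access or preservation before they are packaged.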
8. Interoperable Metadata Standards and METS Profiles
METS profiles for monographs, newspapers, journals, audio…
Export files: JPEG2000 (lossless), PDF, OCR text as ALTO XML, JPEG (150 dpi), METS XML and MARCXML
The METS container or wrapper provides an OAI-PMH compliant SIP package for delivery and exchange of digital objects across systems. It wraps descriptive, administrative and structural metadata + PREMIS.
MODS and MARCXML for descriptive and bibliographical metadata
MIX for image technical metadata
PREMIS for preservation metadata
PREMIS for rights management metadata
Metadata standards and METS incorporated into National Digital Library (KDK) recommendations by Technical Working Group for DL metadata portfolio (standardi salkku)
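To make the wrapper idea concrete, here is a minimal sketch of a METS skeleton built with Python's standard library. It is not the KDK or National Library profile: the section IDs, the `master` file group, the `image/jp2` MIME type, the file names and the helper names `m`, `add_md_section` and `build_mets` are assumptions chosen for illustration. The element and attribute names themselves (`dmdSec`, `amdSec`, `techMD`, `rightsMD`, `digiprovMD`, `fileSec`, `structMap`, `MDTYPE`) come from the METS schema, where `NISOIMG` is the metadata type used for MIX.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def add_md_section(parent, tag, section_id, mdtype):
    """Add a metadata section holding an empty mdWrap/xmlData placeholder."""
    section = ET.SubElement(parent, m(tag), {"ID": section_id})
    wrap = ET.SubElement(section, m("mdWrap"), {"MDTYPE": mdtype})
    ET.SubElement(wrap, m("xmlData"))  # the MODS, MIX or PREMIS record would go here
    return section


def build_mets(object_id, page_files):
    """Build a METS skeleton: descriptive, administrative and structural parts."""
    root = ET.Element(m("mets"), {"OBJID": object_id})
    add_md_section(root, "dmdSec", "DMD1", "MODS")        # descriptive metadata
    amd = ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})
    add_md_section(amd, "techMD", "TECH1", "NISOIMG")     # MIX image metadata
    add_md_section(amd, "rightsMD", "RIGHTS1", "PREMIS")  # rights metadata
    add_md_section(amd, "digiprovMD", "PROV1", "PREMIS")  # preservation events

    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    struct = ET.SubElement(root, m("structMap"), {"TYPE": "PHYSICAL"})
    volume = ET.SubElement(struct, m("div"), {"TYPE": "volume", "DMDID": "DMD1"})

    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            group, m("file"),
            {"ID": file_id, "MIMETYPE": "image/jp2", "ADMID": "TECH1"},
        )
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        page = ET.SubElement(volume, m("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return root


# Hypothetical object identifier and file names.
mets = build_mets("example-object-1", ["page-0001.jp2", "page-0002.jp2"])
print(ET.tostring(mets, encoding="unicode"))
```

The sketch only shows how the three metadata families and the files sit inside one wrapper; a real SIP would also carry the actual MODS, MIX and PREMIS records and would be validated against the relevant METS profile.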
9. Adding Value in Production to Print Collections...
Unique ID for physical items in collections (barcodes)
Minimal bibliographic record for non-catalogued items
Status in catalogue
Two bibliographic records will be created in the Fennica catalogue
Unique and persistent IDs for digital objects, pages, segments
URN:NBN resolver
Metadata re-use
Catalogue enrichment
Complex objects, granularity level and structural mark-up
Technical metadata and provenance
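The step from one object identifier to page-level identifiers and resolver links can be sketched as follows. Everything specific here is hypothetical: the example URN, the `-p0001` page suffix scheme, the resolver address and the function names are invented for illustration and do not describe the National Library's actual assignment rules. The pattern is a loose sanity check, not a full validation of URN:NBN syntax.

```python
import re
from typing import List

# Loose check only: "urn:nbn:", a two-letter country code, a separator, a non-empty rest.
URN_NBN = re.compile(r"^urn:nbn:[a-z]{2}[:-]\S+$", re.IGNORECASE)


def resolver_link(urn: str, resolver_base: str) -> str:
    """Build a link that hands the URN to a resolver service."""
    if not URN_NBN.match(urn):
        raise ValueError(f"not a URN:NBN: {urn!r}")
    return resolver_base.rstrip("/") + "/" + urn


def page_ids(object_urn: str, page_count: int) -> List[str]:
    """Hypothetical scheme: the object URN plus a zero-padded page suffix."""
    return [f"{object_urn}-p{n:04d}" for n in range(1, page_count + 1)]


# Made-up identifier and resolver address, for illustration only.
object_urn = "urn:nbn:fi-example0001"
pages = page_ids(object_urn, 3)
links = [resolver_link(p, "https://resolver.example.org/") for p in pages]
print(links[0])
```

Whatever scheme is chosen, the point of the slide holds: the identifier stays stable while the resolver maps it to wherever the object, page or segment currently lives.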
10. Sustainability...
OPiNiONS?
Should Google or publishers do it?