Current Challenges in Digitization


Thank you for giving me the opportunity to present, and welcome.
Who we are, what we do, background
Ivo IOSSIGER, President Director General

Digitization Brings Documents Online
Paper content -> digital images -> digital files -> digital content
- Books, magazines and newspapers are precious collections of knowledge and reference.
- Digitization allows users to search and access valuable content through the most recent online technologies.
- The new trend: "What is not online does not exist!"

The Problem
- Content and knowledge are imprisoned between the pages of books.
- Millions of documents are written in various languages.
- How can we digitize while preserving the book or the unique copy?
Why we exist: the level of pain.

Challenges of Books
- Handle a large variety of page formats, paper types, binding types, and soft or rigid covers.
- Turn all pages of a book one by one and present them to a camera.
- Accurately reproduce the appearance of every page: page layout, calligraphy, images.

Challenges of Pages
Enhancing images of pages:
- remove typical artifacts (borders, split pages)
- remove print showing through from the opposite page (unbleed)
- correct page rotation (deskew)
- correct the page curvature of bound documents
- enhance contrast and clear the page background
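The last item, enhancing contrast and clearing the page background, can be illustrated with a minimal sketch. It assumes a grayscale page held as nested lists of 0-255 values. The function names and the threshold of 200 are hypothetical choices for illustration, not the method of any product named in this presentation.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = hi - lo
    if span == 0:  # flat image: nothing to stretch
        return [row[:] for row in img]
    # Integer arithmetic with rounding to the nearest value.
    return [[((p - lo) * 255 + span // 2) // span for p in row] for row in img]


def clear_background(img: Image, threshold: int = 200) -> Image:
    """Set every pixel at or above the (assumed) threshold to pure white."""
    return [[255 if p >= threshold else p for p in row] for row in img]


# A tiny greyish "page": light paper values around 180, dark ink near 50-60.
page = [
    [180, 175, 60],
    [178, 50, 182],
]
cleaned = clear_background(stretch_contrast(page))
print(cleaned)
```

Stretching first and thresholding second means the threshold applies to a normalized range, so the same value can be reused across pages of different paper tone.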

Challenges of Text
- Recognize text and make it accessible for full-text search and for copy and paste.
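Once text has been recognized, full-text search is commonly built on an inverted index. The sketch below is a generic illustration under that assumption. The page texts, `build_index` and `search` are hypothetical and do not represent the API of any OCR or search product.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(pages: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased word to the set of page numbers containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for page_no, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the sorted page numbers that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output, keyed by page number.
ocr_pages = {
    1: "Content and knowledge are imprisoned between pages of books.",
    2: "Millions of documents are written in various languages.",
    3: "Books, magazines and newspapers are precious collections of knowledge.",
}
index = build_index(ocr_pages)
print(search(index, "knowledge books"))
```

An OCR misread on a page simply produces a different index key, which is why recognition quality directly limits what a search can find.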

Challenges of Costs
How to reduce costs per page?
- Equipment and production tools are more powerful and reliable, but they incur fixed costs.
- Human labor is expensive and slow, and it incurs variable costs.
- Logistics and workflow are sophisticated and diverse.
Why we exist: the level of pain.
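The split between fixed and variable costs can be made concrete with a small calculation. This is a minimal sketch; every amount in it is a made-up figure chosen only to show the arithmetic, not data from the presentation.

```python
def cost_per_page(fixed_cost: float, variable_cost_per_page: float, pages: int) -> float:
    """Fixed costs are spread over the volume; variable costs apply to every page."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return fixed_cost / pages + variable_cost_per_page


def break_even_pages(fixed_a: float, var_a: float, fixed_b: float, var_b: float) -> float:
    """Volume at which option A (higher fixed, lower variable cost) matches option B."""
    return (fixed_a - fixed_b) / (var_b - var_a)


# Hypothetical figures: an automated line versus manual page turning.
AUTOMATED = {"fixed": 100_000.0, "variable": 0.05}
MANUAL = {"fixed": 5_000.0, "variable": 0.30}

volume = 1_000_000
automated_cost = cost_per_page(AUTOMATED["fixed"], AUTOMATED["variable"], volume)
manual_cost = cost_per_page(MANUAL["fixed"], MANUAL["variable"], volume)
threshold = break_even_pages(
    AUTOMATED["fixed"], AUTOMATED["variable"], MANUAL["fixed"], MANUAL["variable"]
)
print(f"automated: {automated_cost:.3f}, manual: {manual_cost:.3f}, "
      f"break-even at about {threshold:,.0f} pages")
```

Under these assumed numbers the fixed investment only pays off above the break-even volume, which is why the cost question is tied to large-scale digitization.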

The Solution Everybody is Looking for
How to digitize a large number of books as quickly and as cheaply as possible while preserving superior quality?
- automatic solution without an operator
- superior productivity beyond humans
- ensure digitization of all pages
- produce high quality images
Unattended, faster, reliable.
Increase VOLUME, Keep QUALITY, Decrease PRICE

The Solution of the Past
An operator turns pages all day long ...
- an endless, forbidding and tedious task
- performance limited by concentration and tiredness
- irregular quality due to individuals
- contradiction between performance and motivation

The Solution of the Future

Technology and Production Challenges
From paper content to digital content:
Image Scanning -> Image Treatment -> OCR -> Indexing -> Structuring -> Providing Online
Tools shown: 4DigitalBooks Digitizing Line, Page Improver, ABBYY Recognition Server, Digital Library, Text Search

Challenges of File Formats
Timeline (offsets are years before "you are here"):

Year   Offset   Content        Format
1992   -17      Image          JPEG
1992   -17      Image          TIFF
1993   -16      Image & Text   PDF (Acrobat 1)
1998   -11      Text           XML
2000   -9       Image          JPEG 2000
2001   -8       Image & Text   PDF hidden text (Acrobat 5)
2005   -4       Image & Text   PDF/A
2008   -1       Image & Text   PDF is an open standard (Acrobat 9)

Which formats will survive? The most popular and widespread!

Challenges of Storage
Bitmap image, A4 at 300 dpi: grayscale (GS) 8.5 MB, RGB 25.5 MB, black and white (BW) 1.0 MB.
Sizes below are in % of the uncompressed grayscale file.

Format          Ext.   Ratio     GS %     RGB %    BW %
Lossless quality (recommended for multiple edits)
TIFF            .TIF   1:1       100      300      12
TIFF LZW        .TIF   2:1       ~50      ~150     ~6
TIFF CCITT G4   .TIF   100:1                       ~1
Lossy quality (not recommended for multiple edits)
JPEG            .JPG   5-10:1    ~20-10   ~25-12
JPEG 2000       .JP2   6-12:1    ~16-8    ~20-10

Archive or reference files are NOT intended for multiple edits. Therefore all these formats are good for long term preservation.
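The uncompressed sizes quoted for an A4 page at 300 dpi can be checked from first principles. The sketch below assumes ISO A4 (210 x 297 mm), 1, 8 and 24 bits per pixel for BW, GS and RGB, and rows padded to whole bytes. It gives roughly 1.1, 8.7 and 26.1 million bytes, close to the slide's figures, which are presumably rounded.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Convert a physical page size to pixel dimensions at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw bitmap size, with each row padded to a whole number of bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed bit depths: bilevel, 8-bit grayscale, 24-bit colour.
BIT_DEPTHS = {"BW": 1, "GS": 8, "RGB": 24}

width_px, height_px = pixel_dimensions(210, 297, 300)
sizes = {name: uncompressed_bytes(width_px, height_px, bpp) for name, bpp in BIT_DEPTHS.items()}

for name, size in sizes.items():
    percent_of_gs = 100 * size / sizes["GS"]
    print(f"{name}: {size / 1e6:.1f} MB ({percent_of_gs:.1f}% of grayscale)")
```

Dividing any of these sizes by a compression ratio from the table gives the expected file size for that format, for example about 4.3 million bytes for a grayscale page at 2:1.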

Challenges Joining Digital and Paper Born Content
Paper born (scanned, then OCR):
- PDF Image (large size)
- PDF Text (small size)
- PDF Image over Text (PDF hidden text): the image is accurate to the original, for read & print; the text is subject to OCR mistakes, for search and copy & paste
Digital born:
- Word Text (small size)
- PDF Text (small size)
- read & print, search, copy & paste

Challenges Selecting Source Material - Microfilm or Digital
- Book scanner: no quality loss; image enhancement and restoration possible
- Digital media: lifetime 20 years, NEEDS MIGRATION
- Microfilm printer (computer output microfilm, COM): microfilm for STORAGE, analog media, lifetime 500 years
- Microfilm camera: microfilm for REFERENCE, analog media, lifetime 500 years
- Microfilm scanner: 5-10% quality loss

Challenges to Bring Files Online