September 1st 2010, IGeLU Ghent 2010
The on-the-fly conversion circus
Matthias Gross (Bavarian State Library)


Introduction

Often the original version of an object is not what the user wants, or not what we want the user to get.

Two basic strategies:
- Store additional versions
- Offer them virtually: create them on the fly

What's common …

Common aspect: something has to be converted.
a) Content, e.g. tiff / PDF -> jpg (for viewing), tiff / jpg -> PDF (for download)
b) Structural metadata (METS)
c) Bibliographic metadata, e.g. MARC21 -> DC

Common side effect of conversion: loss of information, sometimes of functionality. In most cases this is acceptable (high-resolution scans), in some it is even wanted:
- for legal reasons
- to reduce file size and speed up transfer
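
The loss of information in a metadata conversion such as MARC21 -> DC can be illustrated with a small sketch. The crosswalk table below is a heavily simplified assumption for illustration, not the mapping used in the talk, and the sample record is made up.

```python
# Hypothetical, heavily simplified crosswalk: (MARC tag, subfield) -> DC element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("245", "b"): "title",      # the subtitle is folded into dc:title
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def marc_to_dc(fields):
    """Convert (tag, subfield, value) triples to a dict of DC element -> values.

    Also returns the input triples that have no mapping, i.e. the
    information that is lost in the conversion.
    """
    dc, dropped = {}, []
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            dropped.append((tag, code, value))
        else:
            dc.setdefault(element, []).append(value)
    return dc, dropped


# Made-up sample record.
sample = [
    ("100", "a", "Example, Author"),
    ("245", "a", "A journey"),
    ("245", "b", "notes from the road"),
    ("300", "a", "2 v."),       # physical description: not mapped here
]
dc_record, lost = marc_to_dc(sample)
```

Title and subtitle end up in the same DC element and the physical description is dropped, which is the kind of loss the slide refers to.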

… and what's different

Benefits of on-the-fly conversion:
- Reduced storage costs
- Simple data structure
- Lower migration costs when the specifications of the desired target formats change

Price:
- Server load
- The conversion runtime is part of the end user's waiting time, e.g. tiff -> jpg: the waiting time is usually too long to be acceptable

Some formats are built to be delivered primarily via on-the-fly conversion: j2k (-> jpg)

Normal presentation

First example: serving the DFG viewer

In this example, we have single-page PDFs in a METS structure. Let us look at the content first.
- To serve the so-called DFG viewer (which projects funded by the Deutsche Forschungsgemeinschaft, DFG, are obliged to support), three different JPEG versions of each page, each with a given resolution, are needed.
- This would not be easy to implement within DigiTool: how would you encode the resolution of a VIEW manifestation so that it can be addressed from outside the system? (Basically, all VIEW manifestations are equal, with an optional VIEW MAIN.)
- The conversion is implemented via a special viewer which calls Ghostscript/ImageMagick.
- With simple caching, the waiting time is shortened when the user returns to a previously accessed page.
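
A minimal sketch of such a convert-on-demand step with a simple file cache might look as follows. It is not the viewer described in the talk: the function names, the cache layout and the use of the PID in the file name are assumptions, and only Ghostscript (without ImageMagick) is called.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, jpg_path, dpi):
    """Build a Ghostscript call that renders page 1 of a PDF as a JPEG."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-dQUIET",
        "-sDEVICE=jpeg", f"-r{dpi}",
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={jpg_path}", str(pdf_path),
    ]


def render_with_ghostscript(pdf_path, jpg_path, dpi):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ghostscript_command(pdf_path, jpg_path, dpi), check=True)


def cached_jpeg(pid, dpi, pdf_path, cache_dir, render=render_with_ghostscript):
    """Return the JPEG path for (pid, dpi); convert only on a cache miss.

    Assumption: pid is a plain identifier that is safe to use in a file name.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{pid}_{dpi}.jpg"
    if not target.exists():
        render(pdf_path, target, dpi)
    return target
```

The first request for a page and resolution pays the conversion time; a later request for the same page is served from the cache directory. A real service would also need some policy for expiring cached files.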

Presentation in the DFG viewer

DFG viewer, continued

Now let us look at the structural metadata (METS).
- The DFG viewer expects a different type of METS: the fileSec has to be changed significantly.
- The METS (digital entity XML) can be quite large (>3 MB for 1700 pages).
- Conversion takes up to 7 seconds, so on-the-fly conversion is not reasonable.
- We therefore put the converted METS in the file system and access it via the PID. (How would you store additional METS files within the DigiTool data model?)
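
To give an idea of what the changed fileSec could look like, here is a sketch that builds one file group per JPEG resolution. The USE values MIN, DEFAULT and MAX, the ID scheme and the URL pattern are assumptions for illustration; the DFG viewer's METS profile is the authority on what is actually required.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_file_sec(page_pids, url_for, uses=("MIN", "DEFAULT", "MAX")):
    """Build a mets:fileSec with one fileGrp per resolution.

    url_for(pid, use) is a caller-supplied (hypothetical) function that
    returns the address of the on-the-fly JPEG for one page and size.
    """
    file_sec = ET.Element(f"{{{METS}}}fileSec")
    for use in uses:
        group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": use})
        for pid in page_pids:
            file_el = ET.SubElement(
                group,
                f"{{{METS}}}file",
                {"ID": f"FILE_{pid}_{use}", "MIMETYPE": "image/jpeg"},
            )
            ET.SubElement(
                file_el,
                f"{{{METS}}}FLocat",
                {"LOCTYPE": "URL", f"{{{XLINK}}}href": url_for(pid, use)},
            )
    return file_sec


def example_url(pid, use):
    # Placeholder URL pattern, not a real DigiTool address.
    return f"https://example.org/view?pid={pid}&size={use}"


file_sec = build_file_sec(["101", "102"], example_url)
xml_text = ET.tostring(file_sec, encoding="unicode")
```

Because the result is written once and stored under the PID, the cost of building it for a 1700-page object is paid only at conversion time, not per request.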

Next example: PDF download

Let us look at the content again. We want to offer a "PDF download" option.
- An additional manifestation (a single PDF file for an IE) would need a significant amount of extra storage.
- The same is true for caching these PDF files; besides, each read/write operation on such big files is expensive.
- Best performance is achieved by using the streams as they are in the repository, embedding them in a PDF and returning that as the HTTP response.
- Further optimization: use a fast Perl library and access the Oracle DB directly.
- For a 350-page book, 1 minute can be reached. Usually the internet connection of the end user is the bottleneck.
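
The idea of embedding stored streams unchanged can be sketched for the case where the page streams are JPEG files: a PDF can carry JPEG data as it is via the DCTDecode filter, so no image is decoded or re-encoded. This is only an illustration in Python; the talk does not name the Perl library used, and the function name, the RGB colour space and the one-pixel-per-point page size are assumptions.

```python
def jpegs_to_pdf(pages):
    """Wrap JPEG streams in a minimal PDF without re-encoding them.

    pages: list of (jpeg_bytes, width_px, height_px) tuples.
    Assumes RGB JPEGs and uses one pixel per PDF point as the page size.
    """
    bodies = {1: b"<< /Type /Catalog /Pages 2 0 R >>"}
    kids = []
    next_id = 3  # objects 1 and 2 are the catalog and the page tree
    for jpeg, width, height in pages:
        img_id, content_id, page_id = next_id, next_id + 1, next_id + 2
        next_id += 3
        # The JPEG bytes are copied verbatim into an image XObject.
        bodies[img_id] = (
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceRGB /BitsPerComponent 8 "
            b"/Filter /DCTDecode /Length %d >>\nstream\n"
            % (width, height, len(jpeg))
        ) + jpeg + b"\nendstream"
        # Content stream: scale the unit image to the full page and draw it.
        content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)
        bodies[content_id] = (
            b"<< /Length %d >>\nstream\n" % len(content)
        ) + content + b"\nendstream"
        bodies[page_id] = (
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (width, height, img_id, content_id)
        )
        kids.append(b"%d 0 R" % page_id)
    bodies[2] = (
        b"<< /Type /Pages /Kids [" + b" ".join(kids)
        + b"] /Count %d >>" % len(pages)
    )

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for obj_id in range(1, next_id):
        offsets.append(len(out))  # byte offset needed for the xref table
        out += b"%d 0 obj\n" % obj_id + bodies[obj_id] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % next_id
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (
        b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
        % (next_id, xref_pos)
    )
    return bytes(out)
```

This version builds the whole file in memory for brevity. Since every object is written in order, the same approach could emit the bytes page by page into the HTTP response, which fits the observation that the end user's connection is usually the bottleneck.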

PDF download dialogue

PDF download result

Next example: EuropeanaTravel

EuropeanaTravel

The overall objective of this project is to digitise content on the theme of travel and tourism to be made accessible via Europeana, the European digital library, museum and archive. Launched in November 2008, Europeana provides integrated access to digital treasures from museums, archives, audio-visual archives and libraries of Europe.

EuropeanaTravel officially started on 1st May 2009 and will last for two years.

Contribution of the University Library of Regensburg via DigiTool:
- 400 books
- 200 maps
- 600 illustrations from books (+ detailed metadata)

OAI

Problem with transporting the metadata to Europeana via OAI: Europeana expects extended DC with special europeana: elements.

Yet another replica set? How often should we replicate information with only small differences in format?

Replica sets demand
1.) database space
2.) significant Java resources during replication

Solution: build a set that contains all the information needed for the different demands (granular bibliographic data, delivery URL, thumbnail URL, file type, object type, …)

And then …

… filter the OAI-XML through a stylesheet on the fly!
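
The filtering step can be sketched as a function that takes one rich record and emits DC plus europeana: elements. In the talk this is done with a stylesheet; the sketch below swaps in plain Python with ElementTree so that it runs without an XSLT processor. The element names of the rich source record (title, creator, thumbnail_url, delivery_url, object_type) are hypothetical, and the choice of europeana: target elements is an assumption.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ESE = "http://www.europeana.eu/schemas/ese/"
ET.register_namespace("dc", DC)
ET.register_namespace("europeana", ESE)

# Hypothetical source element -> target element; this plays the stylesheet's role.
MAPPING = [
    ("title", f"{{{DC}}}title"),
    ("creator", f"{{{DC}}}creator"),
    ("thumbnail_url", f"{{{ESE}}}object"),
    ("delivery_url", f"{{{ESE}}}isShownAt"),
    ("object_type", f"{{{ESE}}}type"),
]


def to_europeana(rich_record_xml):
    """Filter one rich record down to the elements a harvester expects."""
    source = ET.fromstring(rich_record_xml)
    target = ET.Element("record")
    for source_tag, target_tag in MAPPING:
        for node in source.findall(source_tag):
            ET.SubElement(target, target_tag).text = node.text
    return target


# Made-up sample record.
rich = """<record>
  <title>A journey</title>
  <creator>Example, Author</creator>
  <delivery_url>https://example.org/delivery?pid=1</delivery_url>
  <thumbnail_url>https://example.org/thumb?pid=1</thumbnail_url>
  <object_type>TEXT</object_type>
  <internal_note>not exported</internal_note>
</record>"""
europeana_record = to_europeana(rich)
```

Only one stored record is needed; anything a given harvester does not expect, such as the internal note above, is simply left out at delivery time.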

OAI for Europeana

Travel …
- Facet for EuropeanaTravel
- Search for "Reise" (which means "travel" in German)

… and get …
- Link to DigiTool delivery

… back home

Further application of this technique

Just 1 OAI record per object:
- stylesheet 1: OAI for harvester 1
- stylesheet 2: OAI for harvester 2
- stylesheet 3: report (list of objects) as HTML
- stylesheet 4: report (list of objects) as Excel
- stylesheet 5: HTML page for starting services
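
The "one record, many outputs" pattern can be sketched as a small registry of renderers. Each renderer stands in for one stylesheet; the record fields and function names are hypothetical, and CSV is used here in place of the Excel report.

```python
import csv
import html
import io


def as_html(records):
    """Report as an HTML list with links to the delivery URL."""
    items = "".join(
        f'<li><a href="{html.escape(r["delivery_url"])}">'
        f'{html.escape(r["title"])}</a></li>'
        for r in records
    )
    return f"<ul>{items}</ul>"


def as_csv(records):
    """Report as CSV, which a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["pid", "title", "delivery_url"])
    for r in records:
        writer.writerow([r["pid"], r["title"], r["delivery_url"]])
    return buffer.getvalue()


# One entry per output format, analogous to one stylesheet per consumer.
RENDERERS = {"html": as_html, "csv": as_csv}


def render(records, target):
    """Render the same records for the requested target format."""
    return RENDERERS[target](records)


# Made-up sample data.
records = [
    {"pid": "1", "title": "A & B", "delivery_url": "https://example.org/1"},
]
html_report = render(records, "html")
csv_report = render(records, "csv")
```

Adding a new consumer then means adding one renderer, while the stored records stay untouched.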

Example: thumbnail selection service (post-ingest)

Current / planned activities (1)

Secure PDF display option which prohibits download and printout of copyrighted material (or at least makes them very tedious, as you can't prevent screenshots).
- Assume multi-page PDF as the storage format.
- A normal PDF viewer (client-side plugin) has too many options which can't be disabled reliably.

We are looking at:
- multi-page jpg (conversion via Ghostscript)
- Flash approach (browser plugin needed)
- applet approach

Observation: the images somehow do not look as nice without using the Adobe stuff.

Current / planned activities (2)

We will evaluate conversion to EPUB as an alternative download format, possibly on the fly. Who already has experience with this?

THE END

Thank you very much for your patience and attention!

Special credits to Petra Schröder for most of the work, to Joe Getty for thumbnail inspiration and to Wikimedia Commons for some pictures!