MetaDB Development at Lafayette College
Eric Luhrs, Digital Initiatives Librarian, Special Collections & College Archives
Haruki Yamaguchi, Class of 2011, Department of Computer Science
CS320: December 1, 2010
About this talk
Talk Outline
1. WHY: digital collection management
– The shift from analog to digital
– Preserving our digital heritage
– Fast-moving field
2. WHAT: usage overview
– Digitization workflow
– Why this is important
– Brief version history
3. HOW: design overview
– Development environment
– Application design
– Database interaction
4. END: wrap-up
– Demo (time permitting)
Ever-Expanding Digital Collections
– The Lafayette: 140 years online, ~43,000 searchable pages
– Lafayette Digital Repository: Open Access to faculty & College publications
– East Asia Image Collection: ~3,000 images from books, photographs, negatives, slides
Preserving Our Digital Heritage
Analog versus digital
– We can easily find a 1,000-year-old book
– Where is the page we scanned last week?
– How will we manage digital material in the future?
– Are we headed toward a digital dark age?
– Organizing, storing, and retrieving information is a large field that is trying to keep up with fast-changing technology
MetaDB: Return on Investment
1. Standardization is the first step toward preservation
– Automation prevents human error
– Allows us to isolate specific content types
– Ubiquitous formats aid standards, which aid migration
2. Strengthens digital collection building efforts
– Allows me to work faster and smarter
– Subject exports create stronger collections
Workflow Management
– File Input: High Resolution Master Images
– Metadata Input: Descriptive, Administrative, Technical
– Collection Output: CSV & TSV data, Derivative Images
– Digital Asset Management System (CONTENTdm, DSpace, Drupal)
MetaDB Allows Us to Automate & Distribute Collection of Metadata
– Descriptive MD (subject specialist): Title, Description, Subjects, […]
– Administrative MD (librarian): Collection, Publisher, Access Rights, […]
– Technical MD (automated): File Format, File Size, Checksum, […]
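The automated technical metadata named on this slide (file format, file size, checksum) could be gathered roughly as follows. This is a minimal sketch, not MetaDB's actual code: the class name `TechnicalMetadataSketch`, the use of SHA-256 as the checksum, and deriving the format from the file extension are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the MetaDB source.
final class TechnicalMetadataSketch {

    // Collects the three technical fields named on the slide for one file.
    // Reads the whole file into memory, which is fine for a sketch but would
    // need streaming for very large master images.
    static Map<String, String> extract(Path file) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            Map<String, String> md = new LinkedHashMap<>();
            md.put("File Format", formatOf(file.getFileName().toString()));
            md.put("File Size", Long.toString(bytes.length));
            md.put("Checksum", sha256Hex(bytes));
            return md;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: the format is guessed from the extension only.
    static String formatOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "unknown" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Assumption: SHA-256 stands in for whatever checksum MetaDB records.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("master", ".tif");
        Files.write(sample, "abc".getBytes(StandardCharsets.UTF_8));
        extract(sample).forEach((k, v) -> System.out.println(k + ": " + v));
        Files.delete(sample);
    }
}
```

Because nothing here requires human input, a step like this can run at file ingest, leaving the subject specialist and the librarian to supply only the descriptive and administrative fields.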
MetaDB Allows Us to Automate & Distribute Collection of Metadata
CSV, TSV data
– Dublin Core metadata standard
– Common file format outputs
Derivative Images
– Created from multiple image formats
– Custom image sizes
– Pan/zoom interface
– Banding/branding
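A TSV export of the kind described on this slide might look like the sketch below. It assumes each item is a map from column name to value and that the columns are named after Dublin Core elements such as `title` and `subject`. The class `TsvExportSketch` and the rule of replacing tabs and line breaks with spaces are illustrative choices, not MetaDB's documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical exporter for illustration; not taken from the MetaDB source.
final class TsvExportSketch {

    // Tabs and line breaks would corrupt a TSV row, so they are collapsed
    // to single spaces. Missing values become empty cells.
    static String clean(String value) {
        return value == null ? "" : value.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // Writes a header row followed by one row per record, in column order.
    static String toTsv(List<String> columns, List<Map<String, String>> records) {
        StringBuilder out = new StringBuilder(String.join("\t", columns)).append('\n');
        for (Map<String, String> record : records) {
            out.append(columns.stream()
                    .map(column -> clean(record.get(column)))
                    .collect(Collectors.joining("\t")))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Column names follow Dublin Core element names; the record is made up.
        List<String> columns = List.of("title", "description", "subject", "publisher", "rights", "format");
        List<Map<String, String>> records = List.of(
                Map.of("title", "Harbor view", "subject", "Harbors", "format", "image/jpeg"));
        System.out.print(toTsv(columns, records));
    }
}
```

Keeping the column list separate from the records makes it straightforward to emit the same data with a different field order or a different delimiter for CSV output.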
Completed MetaDB Collection
TSV and JPG files, loaded into CONTENTdm
Why this Work is Important
– Better Digital Collections
– Increased Visibility
– New Acquisition
– Greater Knowledge
– Improved Workflow
MetaDB Version History
– Version 0: Microsoft Access database shared over local Novell network
– Version 1: MySQL database with simple web-based HTML interface
– Version 2: MySQL database with PHP / YUI JavaScript interface
– Version 3: Postgres database with Java / jQuery JavaScript interface
Major Features in Latest Version (Version 3.1)
– Table view editing
– Controlled vocabularies
– Drag/drop field ordering
– Technical metadata extraction
– Automatic derivative creation
– Image banding/branding
– Web-based user management
– Vastly improved user interface
Development Environment
– MetaDB0 (Production)
– MetaDB1 (Development)
– svn.lafayette.edu
– Flows between them: New releases, Feedback, Bugs, Test, Patches/Features
Application Design
– Layers: Front-End, Back-End
– Components: MetaDB Service API, AJAX, Servlets, Database, Images, ImageMagick
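The slide shows ImageMagick on the image side of the design. One way a Java back end could ask ImageMagick for a resized derivative is to build a `convert ... -resize WxH ...` command and run it as an external process, as sketched below. The class `DerivativeSketch`, its method names, and the choice of shelling out (rather than using a binding library) are assumptions; the slide does not say how MetaDB calls ImageMagick.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not taken from the MetaDB source.
final class DerivativeSketch {

    // Builds the argument list for ImageMagick's `convert` tool. The
    // "-resize WxH" option scales the image to fit inside the box while
    // keeping its aspect ratio. Newer ImageMagick releases name the
    // command `magick` instead of `convert`.
    static List<String> resizeCommand(String master, String derivative, int maxWidth, int maxHeight) {
        if (maxWidth <= 0 || maxHeight <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        return List.of("convert", master, "-resize", maxWidth + "x" + maxHeight, derivative);
    }

    // Runs the command and returns its exit code. Requires ImageMagick to be
    // installed and on the PATH, so it is not called from main below.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // File names are placeholders.
        System.out.println(String.join(" ", resizeCommand("master.tif", "derivative.jpg", 800, 800)));
    }
}
```

Passing the arguments as a list, rather than as one concatenated shell string, avoids quoting problems when file names contain spaces.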
Update Database
Steps shown: Data, Whitelist, Cross-checking, Authentication, Log, Feedback
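The update path on this slide can be sketched as a series of checks that each either reject the request or pass it along. The code below is an illustration under stated assumptions: the whitelist is a fixed set of editable field names, "cross-checking" is read as confirming that the project and item exist, and an in-memory map stands in for the database. `UpdateSketch` and all of its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical update handler for illustration; not taken from the MetaDB source.
final class UpdateSketch {
    private static final Logger LOG = Logger.getLogger("metadb.sketch");

    // Whitelist: only these field names may be written (assumed list).
    static final Set<String> EDITABLE_FIELDS = Set.of("title", "description", "subject");

    private final Set<String> activeSessions;          // stands in for authentication
    private final Map<String, Integer> itemCounts;     // project name -> number of items
    private final Map<String, String> store = new HashMap<>(); // stands in for the database

    UpdateSketch(Set<String> activeSessions, Map<String, Integer> itemCounts) {
        this.activeSessions = activeSessions;
        this.itemCounts = itemCounts;
    }

    // Checks run in the order the slide lists them; that order is an
    // assumption about the diagram, not a recommendation.
    String update(String sessionId, String project, int item, String field, String value) {
        if (!EDITABLE_FIELDS.contains(field)) {
            return feedback("rejected: field not whitelisted");
        }
        Integer count = itemCounts.get(project);
        if (count == null || item < 1 || item > count) {
            return feedback("rejected: unknown project or item");
        }
        if (!activeSessions.contains(sessionId)) {
            return feedback("rejected: not authenticated");
        }
        store.put(key(project, item, field), value);
        return feedback("ok");
    }

    String stored(String project, int item, String field) {
        return store.get(key(project, item, field));
    }

    private static String key(String project, int item, String field) {
        return project + "/" + item + "/" + field;
    }

    // Log the outcome, then return it to the caller as user feedback.
    private static String feedback(String message) {
        LOG.info(message);
        return message;
    }

    public static void main(String[] args) {
        UpdateSketch service = new UpdateSketch(Set.of("session-1"), Map.of("demo", 2));
        System.out.println(service.update("session-1", "demo", 1, "title", "Harbor view"));
        System.out.println(service.update("session-1", "demo", 1, "password", "x"));
    }
}
```

With a real database, the whitelisted field name would select the column and the value would be bound as a `PreparedStatement` parameter, so that neither can inject SQL.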
Retrieve from Database
Steps shown: Data, Authenticate, Request, Gather data, Wrap in objects, Unpack into JSON, Log
Example request: Project: cpw-nofuko; Item: 1; Type: Descriptive Metadata; Session ID: AHJ7HA…
Concurrency Check
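The "wrap in objects, unpack into JSON" steps and the concurrency check might be sketched as follows. This is an illustration only: the `Item` class, the hand-written JSON serializer, and the reading of "concurrency check" as a per-item edit lock held by one session are all assumptions, since the slide does not specify how MetaDB implements any of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical retrieval helper for illustration; not taken from the MetaDB source.
final class RetrieveSketch {

    // "Wrap in objects": one item's metadata gathered from the database.
    static final class Item {
        final String project;
        final int number;
        final String type;
        final Map<String, String> fields;

        Item(String project, int number, String type, Map<String, String> fields) {
            this.project = project;
            this.number = number;
            this.type = type;
            this.fields = fields;
        }
    }

    // Quotes a string for JSON, escaping quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // "Unpack into JSON": serialize the item for the AJAX front end.
    static String toJson(Item item) {
        String fields = item.fields.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(","));
        return "{\"project\":" + quote(item.project)
                + ",\"item\":" + item.number
                + ",\"type\":" + quote(item.type)
                + ",\"fields\":{" + fields + "}}";
    }

    // Assumed concurrency check: the first session to open an item holds it,
    // and other sessions are refused until it is released.
    private final Map<String, String> editLocks = new ConcurrentHashMap<>();

    boolean tryAcquire(String itemKey, String sessionId) {
        String holder = editLocks.putIfAbsent(itemKey, sessionId);
        return holder == null || holder.equals(sessionId);
    }

    void release(String itemKey, String sessionId) {
        editLocks.remove(itemKey, sessionId);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Harbor view"); // made-up value
        System.out.println(toJson(new Item("demo", 1, "Descriptive Metadata", fields)));
    }
}
```

A production system would more likely use an established JSON library than a hand-written serializer; the manual version is used here only to keep the sketch free of external dependencies.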
Populate User Interface
Elements shown: Data, Widgets, Templates
Facts & Figures
– Development: January to present
– Size: 110+ Java classes, ~30,000 lines of code
– Database: 120,000+ rows of data
– Images: ~200 GB disk space
– Subversion: Revision 3864
What does this mean for you?
Eric Luhrs, Haruki Yamaguchi