University of Washington Digital Library Project Geri Bunker The Fifth Dublin Core Metadata Workshop October 7, 1997
Perspective Large, public research university: –Multiple branches –Growing demand for access, distance education –Focus on collaboration, with government, business, academia University Libraries –20 branches; 4 major units –Central technical and system services; rest distributed
UW Priorities: User-centered focus in design of services and products Web access for resources and services –strategic, “not only or last” Digital Library –commercially acquired full text –locally digitized multimedia Process Improvement –for redeployment of assets and resources
Convergence of developments University re-focus on outreach and access –especially through use of technology (Web, distance education) Appearance of great multimedia archiving software (CONTENT), and success with an Intel grant for hardware Faculty and library collections abound; students need digital access to ever more course information International consensus-building around metadata: the Dublin Core
Avoiding “the 21st Century nightmare” The Digital Orphan metaphor created stark terror and focused us on need for inexpensive description Opportunity to test Dublin Core for images with new software tool
CONTENT: a practical, scalable, high-performance multimedia archive Meets the need for small, fast-startup image systems …allowing them to grow to millions of images with fast, accurate retrieval Standards-based, extensible and flexible Facilitates collaboration among content providers, librarians, curators, archivists
Client-server; “federated heterogeneous system” Server on Windows NT, AIX and HP-UX Clients on Windows 95 (Search, Acquire and Administer) and Java (Search) Set of API functions (HTTP/CGI-based) Acquisition and DB Admin modules allow distribution of tasks
Workbox Keep track of images and videos Store only the link Items can be categorized HTML document can be built based on the workbox to share results with colleagues
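The workbox idea above (store only links, categorize them, build an HTML page to share) can be sketched in a few lines. This is an illustration only, not the CONTENT client's code: the function name `workbox_to_html` and the item keys `url`, `label` and `category` are assumptions made for the example.

```python
from html import escape


def workbox_to_html(title, items):
    """Render a workbox (a list of categorized links) as a small HTML page.

    Each item is a dict with the hypothetical keys 'url', 'label' and
    'category'. Only links are kept, never the media files themselves.
    """
    # Group the stored links by their category.
    by_category = {}
    for item in items:
        by_category.setdefault(item["category"], []).append(item)

    parts = [
        "<html><head><title>%s</title></head><body>" % escape(title),
        "<h1>%s</h1>" % escape(title),
    ]
    for category in sorted(by_category):
        parts.append("<h2>%s</h2>" % escape(category))
        parts.append("<ul>")
        for item in by_category[category]:
            parts.append(
                '<li><a href="%s">%s</a></li>'
                % (escape(item["url"], quote=True), escape(item["label"]))
            )
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling `workbox_to_html("Week 3 slides", items)` returns a string that can be saved as an `.html` file and mailed or posted for colleagues.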
Dictionary and Thesaurus Dictionary –Contains all valid search words –Dictionary for every field Hierarchical Thesaurus –Organize related words –Group in hierarchy –Simple text file –Indentation (tabbing) to indicate hierarchy –Every field can have an optional thesaurus
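A tab-indented text file of this kind is easy to read into a hierarchy. The sketch below assumes one term per line with one leading tab per level; the exact file layout used by CONTENT is not given on the slide, and the names `parse_thesaurus` and `expand` are hypothetical.

```python
def parse_thesaurus(text):
    """Parse a tab-indented thesaurus into {term: [narrower terms]}.

    Assumed layout: one term per line, one leading tab per hierarchy level.
    """
    children = {}
    stack = []  # stack[d] holds the most recent term seen at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        depth = len(raw) - len(raw.lstrip("\t"))
        term = raw.strip()
        children.setdefault(term, [])
        del stack[depth:]  # drop terms at this depth or deeper
        if stack:
            children[stack[-1]].append(term)  # attach to the parent term
        stack.append(term)
    return children


def expand(term, children):
    """Return the term followed by all of its narrower terms, depth-first."""
    result = [term]
    for child in children.get(term, []):
        result.extend(expand(child, children))
    return result
```

With a file containing `Buildings`, then `Houses` and `Churches` indented one tab beneath it, `expand("Buildings", ...)` yields the broader term plus its narrower terms, which is one way a search on a broad word could be widened.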
Current projects All retrospective conversion to DC –Collection of historical photographs –Collection of teaching slides All happening in the context of Web integration for all resources All want their own labels displaying –will be mapped to DC in the server
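Mapping collection-specific display labels to Dublin Core in the server amounts to a lookup table per collection. The following is a minimal sketch under stated assumptions: the labels in `LABEL_TO_DC` are invented for illustration, and `to_dublin_core` is a hypothetical helper, not part of CONTENT. The fifteen element names are the standard Dublin Core element set.

```python
# The fifteen Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical display labels for one photograph collection.
LABEL_TO_DC = {
    "Photographer": "creator",
    "Caption": "title",
    "Negative Number": "identifier",
    "Date Taken": "date",
}


def to_dublin_core(record, label_map):
    """Translate a record keyed by local labels into one keyed by DC elements.

    DC elements are repeatable, so every value is stored in a list.
    """
    dc = {}
    for label, value in record.items():
        if label not in label_map:
            raise KeyError("no Dublin Core mapping for label %r" % label)
        element = label_map[label]
        if element not in DC_ELEMENTS:
            raise ValueError("%r is not a Dublin Core element" % element)
        dc.setdefault(element, []).append(value)
    return dc
```

Each collection keeps the labels its curators want to see, while the stored record stays interoperable because it is keyed by DC element names.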
What problems have we encountered in our implementation? –Need understanding of key elements –Need some qualification method for more precise narrowing of the hit set –Need to distinguish DC names from administrative metadata
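To make the qualification need concrete, here is one possible sketch of how a qualifier could narrow a hit set. The dotted `element.qualifier` field names (for example `date.created`) are an assumption made for this illustration, not a syntax defined on these slides, and the sample records are invented.

```python
def matches(record, field, term):
    """Return True if `term` occurs in the given field of the record.

    `record` maps 'element' or 'element.qualifier' to a list of strings.
    An unqualified `field` such as 'date' matches every qualifier of that
    element; a qualified one such as 'date.created' matches only that one.
    """
    element, _, qualifier = field.partition(".")
    for key, values in record.items():
        key_element, _, key_qualifier = key.partition(".")
        if key_element != element:
            continue
        if qualifier and key_qualifier != qualifier:
            continue
        if any(term.lower() in value.lower() for value in values):
            return True
    return False


def search(records, field, term):
    """Return the records whose `field` contains `term`."""
    return [record for record in records if matches(record, field, term)]


# Invented sample data: two images with differently qualified dates.
RECORDS = [
    {"title": ["Image A"], "date.created": ["1927"], "date.digitized": ["1997"]},
    {"title": ["Image B"], "date.created": ["1997"]},
]
```

Searching `date` for `1997` returns both sample records, while searching `date.created` returns only the second; the unqualified search still works for clients that know nothing about qualifiers.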
Elements causing the most confusion Source (visual resources have many levels) Date (of what?) Coverage (only useful if heavily qualified) Relation (is this the key to “containers”?)
…even some elements thought to be “safe”... Some elements perhaps thought to be clear are not; it depends upon the reason for digitizing E.g., in a set of instructional slides of architectural images, the name or address of a building shown in a photograph may be considered the “subject” of the photo. Here we map to “title”; possibly behind the scenes...
Administration Default metadata as specified by Dublin Core, but configurable by CONTENT administrator’s tool All CONTENT server administration can be performed via a Web interface
Database Configuration File
Text Description
Features: implementation can influence success of metadata Chaos of the Web confounds instruction and research -- faculty want a system built for them Template for Dublin Core description –locally configurable to the field/tag level –with compliant mapping behind the scenes to DC labels Vocabulary controls optional Workbox
CONTENT tool plays to strengths of participants Acquisition module: start by scanning and build a skeletal record from template DB Admin: oversight and maintenance can be done remotely, asynchronously Metadata can be continuously enhanced through workbox and uploaded remotely.
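The "skeletal record from template, enhanced later" workflow can be sketched as follows. The template fields and defaults, and the helper names `skeletal_record` and `enhance`, are assumptions for illustration; the real template is configured through the CONTENT administrator's tool.

```python
import copy

# Hypothetical template: DC elements with collection-wide default values.
TEMPLATE = {
    "title": "",
    "creator": "",
    "subject": [],
    "date": "",
    "type": "image",
    "identifier": "",
    "rights": "",
}


def skeletal_record(template, identifier, **known):
    """Create a minimal record at scan time from the template defaults."""
    record = copy.deepcopy(template)  # never mutate the shared template
    record["identifier"] = identifier
    for field, value in known.items():
        if field not in record:
            raise KeyError("field %r is not in the template" % field)
        record[field] = value
    return record


def enhance(record, **updates):
    """Return a copy of the record with later cataloging additions applied."""
    improved = copy.deepcopy(record)
    for field, value in updates.items():
        if field not in improved:
            raise KeyError("field %r is not in the template" % field)
        improved[field] = value
    return improved
```

A student scanning slides might call `skeletal_record(TEMPLATE, "slide-0001", title="Untitled")`, and a subject specialist can later call `enhance(...)` to add subjects and dates without rescanning.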
Promise of the Dublin Core What holds the most promise for UW Digital Library Project? –Core element set, an intersection of minimal but critical discovery elements –Turn websites into repositories with reasonably accurate indexing. DC template focuses paraprofessional, student, faculty cataloging
Future work includes Code to the standards; Collaborate for synergy Train, document; Test for usability Enhance –Adding thesauri and locally developed vocabularies –Planning additional reformatting for open Web access
Contact Development lab: –Greg Zick –Lawrence Yapp –Craig Yamashita U.W. Library Project: –Geri Bunker, UW Digital Library Coordinator/ Interim Associate Director of Libraries for Technical Services University of Washington