UCLA Digital Library
UC Digital Library Forum, August 5, 2002
Presenter: Curtis Fornadley, Senior Programmer/Analyst, UCLA Library

- Hardware and Software Architecture
- Project Architecture
- What is the Open Archives Initiative?
- The OAI Sheet Music Harvester

UCLA Digital Library Hardware Architecture

UCLA Digital Library Software Architecture
- Java 2 Enterprise Edition (J2EE) (v )
- Oracle 8i (9i Fall 2002)
- Oracle InterMedia Tool Kit
- JRun Application Server (v3.1)
- XML, XSLT
- MS Access: for metadata collection
- Microsoft NT4 and Win 2000

Digital Library Projects

Project Type    Production    Development
Text
Image           5             3
Audio           0             1 (planned WQ 03)
Video           0             None planned

All projects share similar design patterns: web-based applications to search and present digital content and metadata.
Combining Text (XML) and Format (XSLT) to Create HTML

Archive of Popular American Music (APAM)
- APAM contains ~450,000 pieces of sheet music
- Metadata collected in UCLA Core; no pre-existing metadata
- Content is digitized in house (about 850 sheets so far)
- Sheet music hosted as a PDF file
- All covers and PDFs are hosted from the Oracle DB as BFILEs
- Dynamic sizing of cover images through Oracle InterMedia Tools
- In production, last updated March
- The basis for the OAI Sheet Music Harvester Project

Open Archives Initiative Protocol for Metadata Harvesting (OAI Version 2.0)
“The OAI protocol facilitates metadata harvesting”

OAI Requests and Responses
- OAI requests and responses use HTTP, “just like the web”
- OAI requests use either the HTTP GET or POST method
- OAI responses are formatted as HTTP responses
- Every OAI response is valid XML
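
To make the request side concrete, here is a minimal sketch of how an OAI request can be assembled as an HTTP GET URL. The base URL and the set name are hypothetical placeholders, not a real repository, and `build_oai_request` is a helper name made up for this illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://repository.example.edu/oai"


def build_oai_request(base_url, verb, arguments=None):
    """Return a GET URL: the base URL plus URL-encoded key=value pairs."""
    params = {"verb": verb}
    params.update(arguments or {})
    return base_url + "?" + urlencode(params)


# "sheetmusic" is an assumed set name, not one taken from a real repository.
url = build_oai_request(
    BASE_URL, "ListRecords", {"metadataPrefix": "oai_dc", "set": "sheetmusic"}
)
print(url)
```

The same encoded key=value string can instead be sent as the body of an HTTP POST. Either way, the response that comes back is an XML document.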

Important OAI “Verbs”
The meat of the OAI is six “verbs” issued in a request to harvest metadata.
1) GetRecord - to retrieve an individual record
2) Identify - to retrieve information about a repository
3) ListIdentifiers - to retrieve the identifiers of records that can be harvested from a repository
4) ListMetadataFormats - to retrieve the metadata formats available from a repository. Dublin Core (DC) is the minimum requirement.
5) ListRecords - to harvest records from a repository
6) ListSets - to retrieve the set structure in a repository
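
Each verb takes its own arguments, so a harvester can sanity-check a request before sending it. The sketch below tabulates the arguments as I understand them from OAI-PMH 2.0. It is simplified: the `resumptionToken` argument used to continue long list responses is left out, and `check_request` is a hypothetical helper name.

```python
# Arguments per verb (simplified; resumptionToken handling is omitted).
VERB_ARGUMENTS = {
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
    "Identify": {"required": set(), "optional": set()},
    "ListIdentifiers": {"required": {"metadataPrefix"},
                        "optional": {"from", "until", "set"}},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListRecords": {"required": {"metadataPrefix"},
                    "optional": {"from", "until", "set"}},
    "ListSets": {"required": set(), "optional": set()},
}


def check_request(verb, arguments):
    """Return a list of problems; an empty list means the request looks well formed."""
    if verb not in VERB_ARGUMENTS:
        return ["unknown verb: " + verb]
    spec = VERB_ARGUMENTS[verb]
    given = set(arguments)
    problems = []
    # Report required arguments that were not supplied.
    for name in sorted(spec["required"] - given):
        problems.append("missing argument: " + name)
    # Report arguments the verb does not accept.
    for name in sorted(given - spec["required"] - spec["optional"]):
        problems.append("unexpected argument: " + name)
    return problems


print(check_request("GetRecord", {"identifier": "oai:example:1"}))
```

A repository performs its own checks and reports errors inside the XML response, so a client-side check like this only saves a round trip.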

Important OAI “Nouns”
- Repository - a server to which OAI protocol requests can be submitted. The repository outputs metadata in the form of a record.
- Record - an XML-encoded byte stream that is returned by a repository in response to an OAI request for metadata from an item in that repository. At a minimum, repositories must be able to return records with metadata expressed in unqualified Dublin Core.
- Set - a construct for grouping items in a repository for the purpose of selective harvesting of records.
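
As an illustration of what a record looks like to a harvester, the sketch below parses one record carrying unqualified Dublin Core. The sample record is invented for this example (identifier, title, and set name are all made up), and `parse_record` is a hypothetical helper. Only the three namespace URIs come from the OAI 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI 2.0 records carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record, for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.edu:sheetmusic-0001</identifier>
    <datestamp>2002-08-05</datestamp>
    <setSpec>sheetmusic</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Waltz</dc:title>
      <dc:creator>Composer, Example</dc:creator>
      <dc:type>sheet music</dc:type>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Extract the identifier, set membership, and Dublin Core fields of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    fields = {}
    for element in dc:
        # Tags look like "{namespace-uri}title"; keep only the local name.
        name = element.tag.split("}", 1)[1]
        # Dublin Core elements are repeatable, so collect values in lists.
        fields.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "fields": fields,
    }


parsed = parse_record(SAMPLE_RECORD)
print(parsed["identifier"], parsed["sets"], parsed["fields"]["title"])
```

The `setSpec` values in the header are what make selective harvesting by set possible.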

Sheet Music OAI Data Providers
- UCLA: currently online (Java)
- Library of Congress: currently online
- Johns Hopkins University: any day now
- Indiana University: September 2002
- Duke: within the next 12 months
- Brown: within the next 12 months

Each participating institution is responsible for creating its own OAI-compliant sheet music repository.

Major hurdles to becoming a Data Provider:
- Programming
- Data mapping

High-Level Design of OAI Sheet Music Service Provider

OAI Sheet Music Project Development Goals and Challenges
- Leveraging UIUC Harvester code
- Challenge of reverse engineering and extending code
- Being flexible: combine relational and XML text indexing
- Performance vs. functionality: an ongoing challenge
- Testing of 0.1 Service Provider: August 2002
- Debut of the pilot: late Fall 2002

Hypothetical User Interface for Sheet Music Service Provider
The biggest challenge is to create a Service Provider that extends the usable services offered to users.
Conceptualize -> Design -> Implement

Summary
John Ober’s charge: “Discuss architecture and standards used in projects and the technical challenges yet to be faced.”

Challenges:
- Metadata collection: automated vs. manual
- Meeting infrastructure storage needs: online, nearline, backup

Personal challenges and thoughts:
- Many challenges are not technical
- Developing a personal filter on information
- Risk assessment: when is the right time to adopt a new technology?
- Surface knowledge vs. deep understanding
- Islands of knowledge
- No stable resource or body of knowledge to turn to for advice or help