The OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
MetaScholar Initiative All-Project Meeting, Atlanta, GA, 6/18/2002
Edward A. Fox, CS, DLRL, Virginia Tech, Blacksburg, VA, USA
Acknowledgements
Sponsors: Mellon Foundation, SOLINET, NSF, DLF, CNI, UK’s JISC, Virginia’s CIT, …
OAI Team: Steering Committee, Technical Committee, developers, data providers, service providers
Emory team and partners around the Southeast
VT colleagues: Hussein Suleman, Rohit Kelapure, Ming Luo, Ryan Richardson, Marcos Goncalves, Priya Shivakumar, Baoping Zhang, students working on term projects, …
Contents Early history Key concepts Examples ODL, XOAI OAI Tools Technical Plan Conclusion
Open Archives Initiative (OAI)
Open Archives Initiative (OAI)
High-energy physics (Ginsparg, 1991)
CSTR + WATERS = NCSTRL (Lagoze, 1994)
xxx + NCSTRL = CoRR collaboration (1998)
Universal Preprint Service protoproto, Oct. 1999, Santa Fe, led by LANL, CNI, DLF, Mellon -> OAI
Santa Fe Convention (see Feb. 2000 D-Lib Magazine article):
  Archives -> Open Archives
  Support unique archive identifiers
  Implement metadata set(s) (DC, using XML)
  Implement OA harvesting protocol
  Register the archive
  Build tools, layer other services: linking, searching, …
OAI Philosophy
Self-archiving = submission mechanism
Long-term storage system = archive
Open interface = harvesting mechanism
Data provider + service provider
Start with “gray literature”: e-prints/pre-prints, reports, dissertations, …
Began as “archives of the world unite!” OAI
Open Archives (protoproto)
arXiv & Los Alamos National Lab
CogPrints & U. Southampton
NACA & NASA (reports)
NCSTRL & Cornell U.
NDLTD & Virginia Tech
RePEc & U. Surrey
Total of around 200K records
Original Open Archives Members
American Physical Society, California Digital Library, Caltech, Coalition for Networked Information, Cornell University, Harvard University, Library of Congress, Los Alamos National Lab, Mellon Foundation, NASA Langley Research Center, Old Dominion University, Stanford University, U. of Ghent, U. of Surrey, U. of Southampton, Vanderbilt University, Virginia Tech, Washington University
Key concepts
OAI is now a technical umbrella for practical interoperability that can be exploited by different communities.
[Figure labels: Reference, Libraries, Publishers, E-Print Archives, Museums]
The World According to OAI
[Figure: Data Providers -> metadata harvesting -> Service Providers; services shown: Discovery, Current Awareness, Preservation]
Aggregation through OAI Harvesting: Black Box Perspective
[Figure: open archives OA 1 through OA 7]
Aggregation through OAI Harvesting: By Organization
[Figure labels: Theology, Emory, GAU, GAU FLU, TK, AmSo, Library]
Aggregation through OAI Harvesting: By Topic
[Figure labels: Confederate Constitution, Civil War, History, Oral, Sports, Culture, AmSo, Diaries]
Approaches to Aggregation
Build by discipline
Build by institution
Types of Access Possible
[Figure: build by discipline or by institution; access by year, category, personage, author, genre, query, …]
OAI Repository
[Figure labels: Required: Protocol; DO; MDO]
Metadata vs. Data
Data refers to digital objects or digital representations of objects.
Metadata is information about the objects (e.g., title, author).
OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects.
Metadata: Complex to Simple
[Figure: a range from MARC (>$50) to Dublin Core (DC)]
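To make the metadata/data distinction concrete, here is a minimal sketch that reads a simple Dublin Core record in the oai_dc XML format and pulls out the title, the creators, and any links back to the digital object itself. It assumes the OAI-PMH 2.0 oai_dc and DC 1.1 namespace URIs; the record contents (title, creator, URL) and the helper name summarize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and the DC 1.1 elements (assumed OAI-PMH 2.0 usage).
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical metadata record; every value here is illustrative only.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2002-05-01</dc:date>
  <dc:identifier>http://example.org/etd/1234</dc:identifier>
</oai_dc:dc>"""


def summarize(dc_xml):
    """Return the title, the creators, and the links to the digital object."""
    root = ET.fromstring(dc_xml)
    title = root.findtext("dc:title", default="", namespaces=NS)
    creators = [e.text for e in root.findall("dc:creator", NS)]
    # dc:identifier frequently holds a URL pointing at the object (the "data").
    links = [e.text for e in root.findall("dc:identifier", NS)
             if e.text and e.text.startswith("http")]
    return {"title": title, "creators": creators, "links": links}


print(summarize(SAMPLE))
```

The record is a few lines of metadata; the thesis itself stays at the linked URL, which is exactly the split the slide describes.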
[Figure: repositories and harvesters connected by the OAI protocol, which supports harvesting of data items]
Identifiers
oai-identifier = oai:archive-identifier:record-identifier
  Registered URI scheme
  Archive identifier: registered within OAI
  Unique ID within archive (syntax is archive-specific)
Example: oai:ncstrl:ncstrl.cornellcs/TR
An identifier is a locally unique key for extracting a record from a repository.
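The identifier syntax can be split mechanically. This is a minimal sketch with a hypothetical helper, parse_oai_identifier; because the record-identifier syntax is archive-specific (and may itself contain colons), it splits only on the first two colons.

```python
def parse_oai_identifier(identifier):
    """Split 'oai:archive-identifier:record-identifier' into its two parts.

    Only the first two colons are treated as separators, so any colons
    inside the archive-specific record identifier are preserved.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    _, archive_id, record_id = parts
    return archive_id, record_id


print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR"))
# ('ncstrl', 'ncstrl.cornellcs/TR')
```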
Selective harvesting: datestamps
[Figure: a harvester requests only those records in a repository whose datestamps fall within a date range]
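Datestamp-based selection can be sketched as below, assuming day-granularity datestamps (YYYY-MM-DD) and treating both ends of the range as inclusive, as OAI-PMH 2.0 does for its from and until arguments. The records and the helper in_date_range are hypothetical; a real repository would apply this test in its own database query rather than over a Python list.

```python
from datetime import date


def in_date_range(datestamp, from_=None, until=None):
    """True if a YYYY-MM-DD datestamp lies in [from_, until]; either bound may be omitted."""
    d = date.fromisoformat(datestamp)
    if from_ is not None and d < date.fromisoformat(from_):
        return False
    if until is not None and d > date.fromisoformat(until):
        return False
    return True


# Hypothetical (identifier, datestamp) pairs held by a repository.
records = [
    ("oai:example:1", "2002-01-15"),
    ("oai:example:2", "2002-03-02"),
    ("oai:example:3", "2002-06-10"),
]

# A harvester that last visited on 2002-03-01 asks only for records changed since then.
selected = [rid for rid, ds in records if in_date_range(ds, from_="2002-03-01")]
print(selected)  # ['oai:example:2', 'oai:example:3']
```

This is what makes incremental harvesting cheap: each visit transfers only what changed since the previous one.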
Selective harvesting: sets
[Figure: a harvester requests only those records in a repository that belong to set S1; records in other sets, e.g., S2, are not harvested]
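Set-based selection works the same way. The sketch below assumes hierarchical, colon-separated setSpecs (as in OAI-PMH 2.0), where a record in a child set such as history:civilwar also counts as a member of its parent set history. The set names and the helper in_set are hypothetical.

```python
def in_set(record_sets, requested):
    """True if any of the record's setSpecs equals `requested` or lies below it.

    setSpecs are colon-separated paths, so asking for "history" matches
    "history:civilwar" but not an unrelated set such as "historyoral".
    """
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


# Hypothetical records and their set memberships.
records = {
    "oai:example:1": ["history:civilwar"],
    "oai:example:2": ["sports"],
    "oai:example:3": ["history", "diaries"],
}

print(sorted(rid for rid, sets in records.items() if in_set(sets, "history")))
# ['oai:example:1', 'oai:example:3']
```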
Summary: Protocol for Metadata Harvesting
Service requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords
Metadata multiplicity
Date (and time) ranges
Resumption tokens
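The service requests and resumption tokens fit together in a harvesting loop: issue a list request, collect the headers, and keep resubmitting the returned resumptionToken until the repository sends an empty one. This is a minimal sketch assuming the OAI-PMH 2.0 response namespace; the base URL, the fetch callable, and the canned responses are hypothetical stand-ins for real HTTP traffic, so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def request_url(base_url, verb, **args):
    """Build an OAI-PMH GET request, e.g. ?verb=ListIdentifiers&metadataPrefix=oai_dc."""
    return base_url + "?" + urlencode({"verb": verb, **args})


def list_identifiers(base_url, fetch, **args):
    """Yield record identifiers from ListIdentifiers, following resumption tokens.

    `fetch` is any callable mapping a request URL to the response XML text
    (hypothetical here; in practice a thin wrapper around an HTTP client).
    """
    url = request_url(base_url, "ListIdentifiers", **args)
    while True:
        root = ET.fromstring(fetch(url))
        for header in root.iterfind(".//oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # No token, or an empty one, means this was the last chunk.
        if token is None or not (token.text or "").strip():
            return
        # The resumption token is an exclusive argument: send only verb + token.
        url = request_url(base_url, "ListIdentifiers",
                          resumptionToken=token.text.strip())


def _page(ids, token):
    """Build a canned ListIdentifiers response (hypothetical test data)."""
    headers = "".join(f"<header><identifier>{i}</identifier></header>" for i in ids)
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
            f"{headers}<resumptionToken>{token}</resumptionToken>"
            "</ListIdentifiers></OAI-PMH>")


BASE = "http://example.org/oai"  # hypothetical base URL
RESPONSES = {
    BASE + "?verb=ListIdentifiers&metadataPrefix=oai_dc": _page(["oai:ex:1", "oai:ex:2"], "t1"),
    BASE + "?verb=ListIdentifiers&resumptionToken=t1": _page(["oai:ex:3"], ""),
}

ids = list(list_identifiers(BASE, RESPONSES.__getitem__, metadataPrefix="oai_dc"))
print(ids)  # ['oai:ex:1', 'oai:ex:2', 'oai:ex:3']
```

Passing fetch in as a parameter keeps the protocol logic separate from networking, so the same loop can be pointed at a live repository or, as here, at canned responses.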
Harvesting vs. Federation
Competing approaches to interoperability
Federation: services run remotely on remote data (e.g., federated searching)
Harvesting: data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues)
Federation requires more effort at each remote source but less at the local system; harvesting is the reverse
OAI (currently) focuses on harvesting
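The trade-off can be illustrated with two in-memory "remote" sources. This is a sketch under stated assumptions, not OAI code: the source contents and function names are hypothetical, and the remote calls a real system would make are simulated by dictionary lookups.

```python
# Hypothetical remote sources, each mapping record identifiers to titles.
SOURCES = {
    "archive_a": {"oai:a:1": "Civil War diaries", "oai:a:2": "Oral history"},
    "archive_b": {"oai:b:7": "Confederate Constitution"},
}


def matches(title, term):
    return term.lower() in title.lower()


def federated_search(term):
    """Federation: at query time, ask every remote source to run the search."""
    hits = []
    for records in SOURCES.values():
        # Stands in for a remote call; each source must support searching.
        hits.extend(rid for rid, title in records.items() if matches(title, term))
    return sorted(hits)


def harvest():
    """Harvesting: copy all metadata to the local site ahead of time."""
    union = {}
    for records in SOURCES.values():
        # Stands in for an OAI harvest; sources only need to export records.
        union.update(records)
    return union


def local_search(union, term):
    """Search the locally held union catalogue; no remote calls at query time."""
    return sorted(rid for rid, title in union.items() if matches(title, term))


union = harvest()
print(federated_search("war"), local_search(union, "war"))  # ['oai:a:1'] ['oai:a:1']
```

Both paths return the same answer; the difference is where the search work lives, which is the effort trade-off the slide describes.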
Examples
Example 1: Union Collection of ETDs (Electronic Theses and Dissertations) for the Networked Digital Library of Theses and Dissertations (NDLTD)
Example 1: Details
Example 2: NSDL Information Architecture (essentially as developed by the Technical Infrastructure Workgroup)
[Figure components: NSDL Collections; referenced items & collections; special databases; Core NSDL “Bus”; core collection-building services (harvesting, protocols); core services (metadata gathering, information retrieval); CI services (annotation, discussion, personalization, authentication, browsing); other NSDL services; portals & clients. Layers: Collection Building, Usage Enhancement, User Interfaces]
Example 2: CITIDEL -> NSDL
Computing and Information Technology Interactive Digital Education Library
A collection project in the National STEM (science, technology, engineering, and mathematics) Education Digital Library (NSDL)
Example 2: CITIDEL Distributed repository structure
Example 2: NSDL Collections (themes relevant to our projects)
Discovery of content
Classification and cataloguing
Acquisition and/or linking; referencing
Discipline-based themes define a natural body of content, but other possibilities are also encouraged
Software tool suites for analysis, modeling, simulation, or visualization
Reviewed commentary on pedagogy
ODL, XOAI
Open Digital Libraries: XOAI-PMH
Dissertation work of Hussein Suleman (member of the OAI Technical Committee)
Extends the OAI protocol
Supports rapid development of DLs using networks of components
Demonstrated with NDLTD, CSTC
Described in a December D-Lib Magazine article and in a further article scheduled for publication
Open Digital Libraries Components
Running now:
  XML-File (data provider from file system)
  Union, Search, Browse, Recent, Filter
  E-journal support system
Class projects:
  High-performance multilingual search
  Recommender
  User rating
Others discussed:
  Classification/categorization and browsing
Component System Approach
(Open) DL = network of extended OAs
[Figure components: Local Archive, Data Input, Remote Archive, Metadata Repository, Browse, Search, Recommend, Resource Discovery, User Interface; legend: OAI/ODL archive, OAI/ODL protocol]
Example Architecture (NDLTD)
[Figure: source archives Humboldt, Duisburg, MIT, Virginia Tech, PhysNet, CalTech, Dresden; components MIT Filter, MIT Browse, Union Catalog, Search, Recent; User Interface; legend: OAI/ODL archive, OAI/ODL protocol]
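The component idea behind this architecture, where each service is itself an archive that the next component can harvest, can be sketched as a chain of objects that share one list_records method. The classes Archive, UnionCatalog, and RecordFilter, and the record fields, are hypothetical simplifications, not the ODL components' actual interfaces; real components would harvest over the OAI/ODL protocol and refresh incrementally rather than copy everything at construction time.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]  # hypothetical fields: "id", "title", "datestamp"


class Archive:
    """Anything that can list records; stands in for an OAI/ODL data provider."""

    def __init__(self, records: Iterable[Record]):
        self._records = list(records)

    def list_records(self) -> List[Record]:
        return list(self._records)


class UnionCatalog(Archive):
    """Merges several archives, keeping the most recent copy of each record id."""

    def __init__(self, sources: Iterable[Archive]):
        merged: Dict[str, Record] = {}
        for source in sources:
            for rec in source.list_records():
                old = merged.get(rec["id"])
                # YYYY-MM-DD datestamps compare correctly as strings.
                if old is None or rec["datestamp"] > old["datestamp"]:
                    merged[rec["id"]] = rec
        super().__init__(merged.values())


class RecordFilter(Archive):
    """Re-exposes only those records of a source that satisfy a predicate."""

    def __init__(self, source: Archive, keep: Callable[[Record], bool]):
        super().__init__(r for r in source.list_records() if keep(r))


vt = Archive([{"id": "oai:vt:1", "title": "Thesis A", "datestamp": "2002-01-01"}])
mit = Archive([{"id": "oai:mit:9", "title": "Thesis B", "datestamp": "2002-02-01"}])
union_catalog = UnionCatalog([vt, mit])
mit_filter = RecordFilter(union_catalog, lambda r: r["id"].startswith("oai:mit:"))
print([r["id"] for r in union_catalog.list_records()])  # ['oai:vt:1', 'oai:mit:9']
print([r["id"] for r in mit_filter.list_records()])  # ['oai:mit:9']
```

Because every component exposes the same interface as a plain archive, components can be stacked in any order, which is the property that lets a DL be assembled as a network of extended OAs.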
OAI Tools
OAI Tools
Related resources, e.g., XML, Unicode
Submission/author support
XML Schema Validator
Servers and utilities, e.g., ARC, Kepler, EPrints
Repository Explorer:
  Interactive browsing
  Testing of parameters
  Multiple views of data
  Multilingual support
  Automatic test suite
Author’s tools
XSV Schema Validator
VT Tool: Repository Explorer
The Repository Explorer, by Hussein Suleman, is a tool for browsing and testing open archives.
You issue commands and see the results.
You can also run a sequence of automatic tests.
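As an illustration of the kind of automatic check such a tool can run (not the Repository Explorer's actual code), the sketch below tests whether an Identify response contains the child elements that OAI-PMH 2.0 requires. The helper name check_identify and the sample response, which is deliberately incomplete, are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Child elements an OAI-PMH 2.0 Identify response must contain.
REQUIRED = ["repositoryName", "baseURL", "protocolVersion", "earliestDatestamp",
            "deletedRecord", "granularity", "adminEmail"]


def check_identify(xml_text):
    """Return a list of problems found in an Identify response (empty if none)."""
    root = ET.fromstring(xml_text)
    identify = root.find(OAI + "Identify")
    if identify is None:
        return ["no <Identify> element"]
    problems = [f"missing <{name}>" for name in REQUIRED
                if identify.find(OAI + name) is None]
    version = identify.findtext(OAI + "protocolVersion")
    if version is not None and version.strip() != "2.0":
        problems.append(f"unexpected protocolVersion {version.strip()!r}")
    return problems


# Hypothetical, deliberately incomplete response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
  </Identify>
</OAI-PMH>"""

print(check_identify(SAMPLE))
# ['missing <earliestDatestamp>', 'missing <deletedRecord>', 'missing <granularity>']
```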
VT Tool: RE 1.3
VT Tool: Request, Response
Technical Plan
What will the central service look like? (1 of 2)
Harvesting from local sites
Rich content, drawn from all participating sites
Data management
Logging and reporting
Repository/preservation/mirroring
Adding/updating/deleting
User interface and support for digital librarians and data providers
What will the central service look like? (2 of 2)
Adding value
De-duping
Categorization/classification -> browsing
Normalization/standardization -> authority control
Tools for communication/collaboration/annotation -> security/privacy
User interface for both general users and scholars
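The slide does not say how de-duping would be done. One simple approach, assumed here purely for illustration, is to build a match key from a normalized title plus the first creator's surname and keep the first record seen for each key. The records, the key choice, and the helper names dedupe_key and dedupe are hypothetical; production systems typically need fuzzier matching.

```python
import re
import unicodedata


def _norm(text):
    """Strip accents, lowercase, and collapse punctuation/whitespace to single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedupe_key(record):
    """Hypothetical match key: normalized title plus the first creator's surname."""
    surname = record.get("creator", "").split(",")[0]
    return (_norm(record.get("title", "")), _norm(surname))


def dedupe(records):
    """Keep the first record seen for each key; return (kept, duplicates)."""
    seen, kept, dups = set(), [], []
    for rec in records:
        key = dedupe_key(rec)
        if key in seen:
            dups.append(rec)
        else:
            seen.add(key)
            kept.append(rec)
    return kept, dups


records = [
    {"id": "oai:emory:1", "title": "The Civil War Diary.", "creator": "Smith, John"},
    {"id": "oai:uga:7", "title": "the civil war diary", "creator": "Smith, J."},
    {"id": "oai:uga:8", "title": "Oral History Interviews", "creator": "Jones, Ann"},
]
kept, dups = dedupe(records)
print([r["id"] for r in kept])  # ['oai:emory:1', 'oai:uga:8']
```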
What are the needs at local sites?
Increasing OAI expertise
Connecting OAI with local systems
Supporting standards, normalization
Supporting continual updating
Passing enhancements upstream
How can VT help? (1 of 2)
Usability studies for the central site
Help develop consensus
Help plan system architecture & services
Education/training
Provide and support tools/systems
Help sites engage and become OAI-compliant
How can VT help? (2 of 2)
Standards: MARC-XML
ODL suite:
  Download and configure
  Use in packaged forms, or re-architected
Support:
  Connecting your system into OAI
  Help with OAI tools
MARC XML DTD
XML transport format for US-MARC records
Standardized metadata exchange format for traditional library services joining OAI
Conclusion
Rethink your efforts in terms of providers of data and services
Reduced work for data providers:
  Tools available
  Don’t need to offer services
Reduced work for service providers:
  Others provide the data
  Can use tools and systems for OAI, XOAI
Results:
  More data becoming available
  To more people
  Supported by improved services
MetaScholar can be a win-win-win project!
Links
Open Archives Initiative
OAI Metadata Harvesting Protocol
Virginia Tech DLRL OAI Projects
Repository Explorer
NDLTD
More Links
ARC Cross-Archive Search Service
XML Schema Validator
Dublin Core Metadata Initiative
E-Prints
DL-in-a-box
XML Tools at W3C