U.S. Government Use of the OAI-PMH
Michael L. Nelson
Old Dominion University, Norfolk, Virginia, USA
ISTEC / NSF Ibero-American Digital Library Joint Project Development Symposium
Campinas, Brazil - March 20, 2003
Acknowledgements
ODU: K. Maly, M. Zubair, J. Bollen, X. Liu
LANL: R. Luce, X. Liu
NASA: G. Roncaglia, J. Rocker
MAGiC (UK): Paul Needham
Outline
Review of data provider / service provider model
–including “aggregators”
Role of registration for repositories
NASA projects
OSTI demo project
Technical Report Interchange (TRI)
–NASA, DOE, DOD
Disclaimer: Scientific and Technical Information (STI)
This talk will cover US Government focused / sponsored STI only
This talk will not cover American Memory
–a cultural history project from the Library of Congress (LoC)
–the LoC played a significant role in the definition and early adoption of the OAI-PMH
Acronym Review
NASA
–CASI (Center for AeroSpace Information)
–LaRC = Langley Research Center
Department of Energy
–OSTI (Office of Scientific and Technical Information)
–LANL = Los Alamos National Laboratory
–Sandia = Sandia National Laboratory
Department of Defense
–DTIC (Defense Technical Information Center)
–AFRL = Air Force Research Laboratory
Data Providers / Service Providers data providers (repositories) service providers (harvesters)
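Sketch (not from the original slides): on the harvester side, the model comes down to issuing HTTP GET requests that carry a verb parameter and reading back XML. The Python below is a minimal illustration, assuming a hypothetical baseURL (http://repo.example.org/oai) and a hand-written sample response. The verb and argument names (ListRecords, metadataPrefix) and the XML namespace are those of OAI-PMH 2.0; the function names are made up for this example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, params=None):
    """Return the GET URL a harvester would send to a data provider."""
    query = {"verb": verb}
    query.update(params or {})
    return base_url + "?" + urlencode(query)


def parse_headers(xml_text):
    """Return (identifier, datestamp) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        pairs.append((identifier, datestamp))
    return pairs


# Hand-written sample response (hypothetical repository and record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
        <datestamp>2003-03-01</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = build_request("http://repo.example.org/oai", "ListRecords",
                    {"metadataPrefix": "oai_dc"})
headers = parse_headers(SAMPLE_RESPONSE)
```

A real harvester would fetch `url` over HTTP and also follow resumptionToken elements for large result sets; both are left out here to keep the sketch free of network access.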
Aggregators
Diagram labels: data providers (repositories), aggregator, service providers (harvesters)
aggregators allow for:
–scalability for OAI-PMH
–load balancing
–community building
–discovery
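Sketch (not from the original slides): an aggregator acts as a harvester toward the repositories upstream of it and as a repository toward the harvesters downstream. The toy Python classes below illustrate that dual role with in-memory records. The class and method names (Repository, Aggregator, list_records) are hypothetical, and list_records merely stands in for an OAI-PMH ListRecords exchange.

```python
class Repository:
    """A toy data provider holding {identifier: record} in memory."""

    def __init__(self, name, records=None):
        self.name = name
        self.records = dict(records or {})

    def list_records(self):
        # Stands in for answering an OAI-PMH ListRecords request.
        return list(self.records.items())


class Aggregator(Repository):
    """Harvests upstream repositories and re-exposes the union of records."""

    def __init__(self, name, upstream):
        super().__init__(name)
        self.upstream = list(upstream)

    def harvest(self):
        # Copy every upstream record into the aggregator's own store.
        for repo in self.upstream:
            for identifier, record in repo.list_records():
                self.records[identifier] = record
        return len(self.records)


# Hypothetical repositories and records.
repo_a = Repository("A", {"oai:a.example.org:1": {"title": "Report A1"}})
repo_b = Repository("B", {"oai:b.example.org:1": {"title": "Report B1"}})
aggregator = Aggregator("agg", [repo_a, repo_b])
downstream = Aggregator("downstream", [aggregator])

aggregator.harvest()
downstream.harvest()
```

Because an Aggregator is itself a Repository, `downstream` can harvest from `aggregator` without contacting `repo_a` or `repo_b` directly, which is where the load-balancing benefit comes from.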
Aggregators
Frequently interchangeable terms:
–aggregators: likely to be community / institutionally focused
–caches: store a copy, less likely to be community-oriented
–proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
To learn more about aggregators, caches & proxies: – –
Example Aggregators
Arc -
–first described “hierarchical harvesting” in D-Lib Magazine, 7(4)
Celestial -
–among other services, it provides a history of harvests (successful vs. errors)
OAI-PMH 2.0 Registration
Data Providers:
Service Providers:
75 repositories registered
??? unregistered repositories
unregistered because:
–testing / development
–not for public harvesting
–public, but “low-profile”
–never got around to it…
??? SciELO (> 20k records?)
DP:SP ~= 5:1
Registration is Nice… But Not Required
OAI-PMH is (becoming) the “http” for digital libraries
–there is no central registry of http servers
–remember the NCSA “What’s New” page? (ca. 1994)
There will never be “registration support” in OAI-PMH
–registries are a type of service provider, built on top of OAI-PMH
–registration will be an integral part of community building
–friends…
A lightweight, optional, DP-centric (data-provider-centric) method to communicate the existence of “others”
… harvester Identify NASA example
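Sketch (not from the original slides): assuming these slides refer to the optional friends container from the OAI-PMH implementation guidelines, a repository lists the baseURLs of other repositories inside a description element of its Identify response. The Python below pulls those baseURLs out of a hand-written sample and follows them breadth-first. The example.org URLs, the function names, and the fetch_identify callable are hypothetical, and the sample is not the NASA response shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "fr": "http://www.openarchives.org/OAI/2.0/friends/",
}

# Hand-written Identify response with a friends container (hypothetical URLs).
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repo.example.org/oai</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://friend-one.example.org/oai</baseURL>
        <baseURL>http://friend-two.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""


def friend_base_urls(identify_xml):
    """Return the baseURLs listed in the friends container, if any."""
    root = ET.fromstring(identify_xml)
    path = "./oai:Identify/oai:description/fr:friends/fr:baseURL"
    return [el.text.strip() for el in root.iterfind(path, NS) if el.text]


def discover(start_url, fetch_identify, limit=50):
    """Follow friends links breadth-first from start_url.

    fetch_identify(url) must return the Identify response XML for url;
    it is injected so the sketch does not need network access.
    """
    seen, queue = [], [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.append(url)
        queue.extend(friend_base_urls(fetch_identify(url)))
    return seen


friends = friend_base_urls(SAMPLE_IDENTIFY)
```

No central registry is involved: a harvester that knows one repository can find the others that repository chooses to name.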
Langley Technical Report Server
publicly available
–began as an anonymous ftp server in 1992; http access in 1993
–model for other technical report servers at other NASA centers
details in NASA TM
mostly LaTeX, MS Word, other systems
–some scanned reports
NACA Technical Report Server
publicly available
–began in 1996
–details in NASA TM
scanned reports from
–NACA = predecessor to NASA
contents mirrored with the MAGiC project
–a UK-based grey-literature preservation project
–OAI-PMH used to mirror contents
NACA Report 1345 as seen through its native DL
NACA Report 1345 as seen through MAGiC
NACA Report 1345 as seen through Scirus (Elsevier)
NACA Report 1345 as seen through my.OAI (FS Consulting)
NTRS OAI Architecture
user: search for “cfd applications”
NTRS: local copy of metadata
–metadata harvested offline, through OAI interface
–all searching, browsing, etc. performed on the metadata here
nodes: LTRS, ATRS, GTRS, CASITRS, ...
–each node independently maintained
–individual nodes can still support direct user interaction
–content (reports) remain archived at the local sites
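Sketch (not from the original slides): the architecture separates searching, done on a local copy of harvested metadata, from the reports themselves, which stay at the originating node. The Python below is a minimal illustration of that split. The record titles, URLs, and the search function are hypothetical; only the node names come from the slide.

```python
# Metadata harvested offline from each node (hypothetical records).
HARVESTED = [
    {"node": "LTRS", "title": "CFD applications in wing design",
     "url": "http://ltrs.example.org/reports/1"},
    {"node": "ATRS", "title": "Rotorcraft noise measurements",
     "url": "http://atrs.example.org/reports/7"},
    {"node": "GTRS", "title": "Applications of CFD to turbine cooling",
     "url": "http://gtrs.example.org/reports/3"},
]


def search(index, query):
    """Return records whose title contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [record for record in index
            if all(term in record["title"].lower() for term in terms)]


hits = search(HARVESTED, "cfd applications")
# Each hit carries a URL back to the node that still holds the report.
links = [(hit["node"], hit["url"]) for hit in hits]
```

The query never touches the nodes: only the returned links do, so a node being slow or offline affects retrieval of its reports but not the search itself.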
NASA Technical Report Server
(nearly) publicly available
replacement for the current distributed searching version of NTRS
–MySQL
–Va Tech harvester
–modified “bucket”
–details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003)
a service provider & aggregator
–same OAI baseURL as used for interactive searching
NASA Technical Report Server
advanced, fielded search
explicit query routing
–10 NASA repositories
–4 non-NASA repositories turned “off” by default
non-NASA repositories > 0.5M records
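Sketch (not from the original slides): explicit query routing can be read as letting the user pick which harvested repositories a query runs against, with NASA repositories selected by default and non-NASA ones off until chosen. The Python below illustrates that default. The EXTERNAL-A and EXTERNAL-B names are placeholders for non-NASA repositories the slides do not name, only a few repositories are listed, and the route function is hypothetical.

```python
# name -> searched by default? (NASA nodes on, non-NASA placeholders off)
REPOSITORIES = {
    "LTRS": True,
    "ATRS": True,
    "GTRS": True,
    "CASITRS": True,
    "EXTERNAL-A": False,  # placeholder for a non-NASA repository
    "EXTERNAL-B": False,  # placeholder for a non-NASA repository
}


def route(selected=None):
    """Return the repositories a query should be sent to.

    With no explicit selection, only default-on repositories are used.
    """
    if selected is None:
        return [name for name, on in REPOSITORIES.items() if on]
    unknown = sorted(set(selected) - set(REPOSITORIES))
    if unknown:
        raise ValueError("unknown repositories: " + ", ".join(unknown))
    return [name for name in REPOSITORIES if name in selected]


default_targets = route()
custom_targets = route(["EXTERNAL-A", "LTRS"])
```

Keeping the large non-NASA collections opt-in means a default search stays focused on NASA reports, while a user who wants broader coverage can switch the others on.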
NASA DLs in the Larger STI Realm
Diagram nodes: NTRS (LTRS, ATRS, CASITRS, …), DOE, DOD, Universities, Publishers, International, ...
NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata.
NTRS could also harvest metadata from other DLs, and provide access to non-NASA content.
We hope to influence the direction of the science.gov effort to use OAI-PMH
this could be a fully connected graph
OSTI Energy Citations Database
OAI-PMH support just recently added (Feb 2003)
–not yet officially announced
–20k records, 8k full-text
other OSTI collections planned
Technical Report Interchange
Goal: share technical reports between 4 US government labs without creating new digital libraries for users to learn!
–NASA Langley Research Center
–Air Force Research Laboratory
–Los Alamos National Laboratory (DOE)
–Sandia National Laboratory (DOE)
Solution: use cooperating OAI-PMH caches at each site to
–export local contents
–ingest remote contents
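Sketch (not from the original slides): the export / ingest pairing can be illustrated with two in-memory caches that exchange records by datestamp. This is a toy model and not the TRI implementation. The TriCache class, its methods, and the record identifiers are hypothetical, and one design choice is assumed here: a cache exports only records that originated locally, so ingested records are not passed on again.

```python
class TriCache:
    """Toy per-site cache: exports local records, ingests remote ones."""

    def __init__(self, site):
        self.site = site
        # identifier -> (datestamp, origin site, metadata)
        self.records = {}

    def add_local(self, identifier, datestamp, metadata):
        self.records[identifier] = (datestamp, self.site, metadata)

    def export(self, since=None):
        """Return locally originated records, optionally only those
        with an ISO datestamp on or after `since`."""
        out = []
        for identifier, (datestamp, origin, metadata) in sorted(self.records.items()):
            if origin == self.site and (since is None or datestamp >= since):
                out.append((identifier, datestamp, metadata))
        return out

    def ingest(self, remote, since=None):
        """Copy the remote cache's exported records; return how many."""
        count = 0
        for identifier, datestamp, metadata in remote.export(since):
            self.records[identifier] = (datestamp, remote.site, metadata)
            count += 1
        return count


# Hypothetical records at one site, ingested by another.
larc = TriCache("LaRC")
lanl = TriCache("LANL")
larc.add_local("oai:larc.example.org:1", "2003-01-15", {"title": "Report A"})
larc.add_local("oai:larc.example.org:2", "2003-03-01", {"title": "Report B"})

first_pass = lanl.ingest(larc)                      # full ingest
second_pass = lanl.ingest(larc, since="2003-02-01")  # incremental ingest
```

Users at each site keep searching their own familiar system; the remote reports simply show up in it after an ingest.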
TRI Production System - Status
Systems: LaRC TRI System, LANL TRI System, Sandia TRI System, AFRL TRI System, ODU TRI System (Listener)
Arrows: records coming in from other TRI systems; records going out to other TRI systems
Legend: Proposed, In Production
Slide from M. Zubair, ODU
Mappings in TRI Details in Liu, et al. ECDL 2002; the above table also taken from the same paper
A Single TRI Module Slide from M. Zubair, ODU
The Future: Community Building
Ultimately, protocols and metadata formats are not what make the difference
Rather, it is the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)
The best current example: The Open Language Archives Community –
OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
STI Communities
Government produced/sponsored STI
Academia
–self-archiving vs. institutional archives
Commercial publishers
–e.g. BioMed Central