ECDL Workshop: “Extending Interoperability of Digital Libraries: Building on the Open Archives Initiative”
Lisbon, September 21, 2000
Edward A. Fox
CS DLRL Internet TIC
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
• Sponsors: CNI, DLF, Dept. of Energy, DFG, NASA, NSF, …
• VT Faculty/Staff: Anthony Atkins, …
• VT Students: Fernando Das Neves, George Fillipini, Robert France, Marcos Goncalves, Hussein Suleman, …
Open Archives Initiative (OAI)
Program
• 9-10 Session 1 – Introduction
• 10:30-11 Break
• 11-12:30 Session 2 – Technical Details
• 12:30-2 Lunch
• 2-3:30 Session 3 – Discussion
• 3:30-3:50 Break
• 3:50-4:20 Session 4 – Presentations
• 4:20-5 Session 5 – Moving Forward
Program
• 9-10 Session 1 – Introduction
  – Introductory Remarks (Fox, Lagoze) – 15 min
  – Introductions from Participants (Fox, Lagoze) – 30 min
  – Historical Overview (Fox) – 45 min
Introductory Remarks - Fox
• Welcome!
• Thanks to conference organizers
• Program/Logistics
• Latest in series of meetings that have shaped OAI during its first year
Introductory Remarks - Lagoze
Introductions from Participants - 1
• “Straw Polls”
• Training: CS / LIS / Sciences / Humanities / ?
• Work Now: CS / LIS / Sciences / Humanities / ?
• Location: University / Industry / Gov. / Assn. / ?
• OAI Connection: Run an “archive” or DL or collection / Manage data / Develop software / Standards / ?
Introductions from Participants - 2
• OAI Meeting Involvement: Santa Fe mtg / San Antonio mtg / Technical Committee / Cornell mtg / Steering Committee
• OAI Trials: Opened an archive / Developed software for OAI / ?
• OAI Project: Wrote proposal / Plan to write a proposal / Have internally funded project / Have externally funded project
Introductions from Participants
• Short Statements (20 seconds per person)
  – Name (pronounced slowly, clearly)
  – Country
  – Affiliation (institution/organization)
Historical Overview - Fox
• Meetings
  – Santa Fe – “archives of the world unite”
• Philosophy
• Repositories / Building on Black Boxes
• Approaches to building repositories
• VT view
• Some proposals for funding
• Development efforts
Open Archives Initiative (OAI)
• High-energy physics (Ginsparg, 1991)
• CSTR + WATERS = NCSTRL (Lagoze, 1994)
• xxx + NCSTRL = CoRR collaboration (1998)
• Universal Preprint Service protoproto, Oct. 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAI
• Santa Fe Convention (see Feb. D-Lib Magazine article)
• Follow-on mtgs: San Antonio, (ECDL)
• Archives -> Open Archives
  – Support unique archive identifiers
  – Implement Open Archives metadata set (DC, using XML)
  – Implement OA harvesting protocol (derived from Dienst protocol)
  – Register the archive
• Build tools, layer other services: linking, searching, …
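As an illustration of the "unique archive identifiers" and "metadata set (DC, using XML)" requirements listed above, here is a minimal sketch in Python. It is not the Open Archives record format itself: the `record` wrapper element, the `archive:local` identifier layout, and the sample values are assumptions made for this example. Only the Dublin Core element namespace is a real, published value.

```python
import xml.etree.ElementTree as ET

# Published namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_record(archive_id, local_id, fields):
    """Return one metadata record as an XML string.

    Assumptions (hypothetical, for illustration only): the wrapper
    element is called 'record', and a globally unique identifier is
    formed by joining the archive's id and the local id with ':'.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    ident = ET.SubElement(record, f"{{{DC_NS}}}identifier")
    ident.text = f"{archive_id}:{local_id}"
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical archive name and record content.
xml_text = build_record(
    "exampleArchive",
    "0001",
    {"title": "A Sample Thesis", "creator": "Doe, J."},
)
print(xml_text)
```

Prefixing every local identifier with an archive-level name is one simple way to keep identifiers unique once records from many archives are pooled by a harvester.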
Open Archives (protoproto)
• ArXiv & Los Alamos National Lab
• CogPrints & U. Southampton
• NACA & NASA (reports)
• NCSTRL & Cornell U.
• NDLTD & Virginia Tech
• RePEc & U. Surrey
• Total of around 200K records
Original Open Archives Members
• American Physical Society
• California Digital Library
• Caltech
• Coalition for Networked Info.
• Cornell University
• Harvard University
• Library of Congress
• Los Alamos Nat’l Lab
• Mellon Foundation
• NASA Langley Research Cntr
• Old Dominion University
• Stanford University
• U. of Ghent
• U. of Surrey
• U. of Southampton
• Vanderbilt University
• Virginia Tech
• Washington University
Open Archives Future – 1st View
• EconWPA (U. Washington)
• e-biomed -> PubMed Central (NIH)
• PubScience (DOE)
• Clinical Medicine Netprints (+ other HighWire Press holdings)
• University ePub (California Digital Library)
• All public e-prints (MIT)
• Scholar’s Forum (Caltech)
• Int’l: CERN, Germany, India, Mexico, …
• Goal: millions of books/articles/reports / yr
OAI Philosophy
• Self-archiving = submission mechanism
• Long-term storage system = archive
• Open interface = harvesting mechanism
• Data provider + service provider
• Start with “gray literature”
  – e-prints/pre-prints, reports, dissertations, …
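The split between data providers and service providers can be sketched in a few lines of Python. This is a toy in-memory model, not the OAI harvesting protocol: the class names, the `harvest(since=...)` method, and the use of ISO date strings for incremental harvesting are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataProvider:
    """An archive: accepts submissions and exposes them for harvesting."""
    name: str
    records: dict = field(default_factory=dict)  # id -> (datestamp, metadata)

    def submit(self, local_id, datestamp, metadata):
        # Self-archiving: the submission mechanism.
        self.records[f"{self.name}:{local_id}"] = (datestamp, metadata)

    def harvest(self, since=""):
        # Open interface: hand out metadata added on or after 'since'.
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        return [
            (rid, stamp, md)
            for rid, (stamp, md) in sorted(self.records.items())
            if stamp >= since
        ]


@dataclass
class ServiceProvider:
    """A service built on harvested metadata (here: a title search)."""
    index: dict = field(default_factory=dict)  # id -> metadata

    def harvest_from(self, provider, since=""):
        for rid, _stamp, md in provider.harvest(since):
            self.index[rid] = md

    def search(self, word):
        word = word.lower()
        return sorted(
            rid for rid, md in self.index.items()
            if word in md.get("title", "").lower()
        )
```

A usage example with two hypothetical archives: `etd = DataProvider("etd")`, `etd.submit("1", "2000-09-01", {"title": "Digital Libraries"})`, then `sp = ServiceProvider()`, `sp.harvest_from(etd)`, and `sp.search("libraries")` returns the matching identifiers. The point of the split is that the archive never needs to know which services are built on top of it.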
Tiered Model of Interoperability
• Mediator services
• Metadata harvesting
• Document models
Repository of Digital Objects
[Figure labels: Repository Access Protocol; handle; digital object; terms and conditions]
OAI – Repository Perspective
[Figure labels: Required: Protocol; DO; MDO]
OAI – Black Box Perspective
[Figure labels: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7]
Black Box OAI-ETD Perspective
[Figure labels: ISTEC (Ibero America); PhysDis; NSYSU (Taiwan); ADT (Australia); BN.PT (Portugal); (Francophone); VT; Dissert.Online (Germany); MIT; OhioLINK; CBUC (Catalunya); NDC (Greece); CIC; U. Bergen (Norway); …]
Approaches to Open Archives
• Build By Discipline
• Build By Institution
Approaches to Open Archives
• Build By Discipline
• Build By Institution
[Figure labels: Author; Category; Interdisciplinary; Year; Language; Query; …]
Mechanisms
• Sharing
  – Join initiative, run software
  – Make metadata and archive available
• Aggregating
  – By discipline
  – By institution
  – By genre
• Automating
  – Workflow
  – Harvesting and providing services
  – Federated searching
  – Dynamic linking (e.g., with SFX)
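The "Aggregating" mechanism above amounts to regrouping harvested records along different axes. A minimal Python sketch, assuming each harvested record is a dictionary carrying `discipline`, `institution`, and `genre` keys (these field names and the sample records are hypothetical, chosen only to mirror the three bullets):

```python
from collections import defaultdict


def aggregate(records, key):
    """Group record ids by the value of one metadata field.

    Records lacking the field are collected under 'unknown'.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(key, "unknown")].append(rec["id"])
    return dict(groups)


# Hypothetical harvested records.
harvested = [
    {"id": "etd:1", "discipline": "computer science",
     "institution": "Virginia Tech", "genre": "dissertation"},
    {"id": "phys:7", "discipline": "physics",
     "institution": "U. Oldenburg", "genre": "e-print"},
    {"id": "etd:2", "discipline": "physics",
     "institution": "Virginia Tech", "genre": "dissertation"},
]

by_discipline = aggregate(harvested, "discipline")
by_institution = aggregate(harvested, "institution")
by_genre = aggregate(harvested, "genre")
```

The same pool of records yields a discipline-based collection, an institutional collection, or a genre-based one, which is why an archive can be built one way and still be presented another way by a service provider.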
VT View of the Open Archives Initiative (OAI)
• Enable sharing of publication metadata and full-text by digital libraries
• Standardize low-level mechanisms to share contents of libraries
• Build higher-level user-centric and administrative services in meta-libraries
• Install organizational mechanisms to support the technical processes
Virginia Tech Projects
• MARC XML-DTD
• Computer Science Teaching Centre (CSTC)
• W3C Web Characterization Repository
• OAI Repository Explorer
• Networked Digital Library of Theses and Dissertations (NDLTD)
• OAI-Campus (esp. multimedia)
MARC XML-DTD
• XML transport format for US-MARC records
• Standardized metadata exchange format for traditional library services joining OAI
CS Teaching Center (CSTC)
• Collection of reviewed online resources used to aid in teaching of Computer Science
• Supports author submission and peer-review process for new ACM Journal of Educational Resources In Computing (JERIC)
• Connected with NSDL (NSF 00-44)
W3C Web Characterization Repository
• Online database of metadata related to publications, tools and data sets dealing with Web characterization
• Project of the Web Characterization Activity working group of the World-Wide-Web Consortium
OAI Repository Explorer
• Serves as a compliance test
• Allows browsing of open archives using only the OAI protocol
• Sends requests on behalf of the user, parses and checks responses, and displays a browsable interface
• Will detect most discrepancies in protocol
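The "parses and checks responses" step can be sketched as follows. This is not the Repository Explorer's code and does not reproduce the OAI response format: the `record` element name and the list of required fields are assumptions made for this example. It only shows the general shape of a compliance check, which is to parse first and then report every discrepancy found.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; a real checker would take these
# from the protocol specification.
REQUIRED = ("identifier", "title")


def check_response(xml_text):
    """Return a list of problems found in an archive's XML response.

    An empty list means the response passed these (illustrative) checks.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    records = root.findall("record")
    if not records:
        problems.append("no <record> elements found")
    for number, rec in enumerate(records, start=1):
        for name in REQUIRED:
            element = rec.find(name)
            if element is None or not (element.text or "").strip():
                problems.append(f"record {number}: missing <{name}>")
    return problems
```

Returning a list of problems, rather than stopping at the first one, suits a compliance tool: an archive maintainer sees every discrepancy in a single pass.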
OAI-Campus
• Undergrad term project for Honors course on digital libraries
• Aim is to have many OAs on campus
• Emphasis will be on multimedia collections
• Survey developed for campus: /Surveys/OAiVT
Funding Success
• NSF-DFG / VT-Oldenburg:
• OAI research for next 3 years
• 2 countries
• 2 domains
  – Physics
  – Electronic theses and dissertations (ETDs)
• Evolution of existing efforts to use OAI
• Refinement of services as ontologies develop
Funding Failures
• NSF ITR – Large
• US Dept. of Education (FIPSE) – 5 sites
  – Training
  – Graduate students
Other Development Efforts
• Cornell Software
• Los Alamos Software
• Southampton Software
• ODU Software
• Other Software
• Registered Archives
Program
• 11-12:30 Session 2 – Technical Details
  – Expanding the Scope and New Technical Agreements (Lagoze) – 60 min
  – Framing the Discussion for the Afternoon (Fox) – 30 min
Expanding the Scope and New Technical Agreements - Lagoze
Framing the Discussion for the Afternoon – Fox - 1
• Divide into groups soon for lunch
• Sit and discuss in groups during lunch
• Groups report back in afternoon
  – Present comments orally
  – Lead discussion of those comments
• Groups submit report later through
Framing the Discussion for the Afternoon – Fox - 2
• Possible Groups:
• Political agendas and their unfolding
  – “Gray literature”, Courseware/NSDL, …
• Guiding principles for technical agenda
  – What is an archive?
  – What is best terminology?
• Implementation plans for OAI core
Framing the Discussion for the Afternoon – Fox - 3
• Possible Groups (cont’d):
• Requirements for OAI-related services / design of component-based DL
• Implementation plans for OAI-related services
• Linking OAI with other initiatives: science data, …
Program
• 2-3:30 Session 3 – Discussion
  – Funding Agencies/Sponsors – 30 min
  – General Discussion (Fox, Lagoze) – 60 min
Funding Agencies / Sponsors
General Discussion
• Reactions to OAI agreements
• Applications of OAI to communities represented by attendees
Program
• 3:50-4:20 Session 4 – Presentations
  – Constantino Thanos (IEI-CNR, Italy)
  – Robert Tansley (U. Southampton, UK)
  – Eberhard Hilf (U. Oldenburg, Germany)
Program
• 4:20-5 Session 5 – Moving Forward (Fox, Lagoze) – 40 min
  – Plans for implementation
  – Future research agendas
  – Community building: listservs, …
VT - 1
• General purpose tools:
• Hussein’s Perl implementation
• Marcos’ Java implementation
• OAI Repository Explorer – Version 2
VT - 2
• NSDL / XXDL ?
• Bill Graves, Collegis/Eduprise
• IMS
• UNC Wilmington
• Chemistry, CS, Math, …
• CSTC already involved in OAI
VT - 3
• MARIAN:
• Evolved from CODER (~1987)
• C/C++ version: SIGIR’93
• Research and production DL system
• Harvest/Gateway: Dienst, “Harvest”, OAI, Z39.50
• OAI to Greenstone, Phronesis
[Figure labels: OAI Protocol; DO; MDO; Sets by subject; Sets by origin; MDO; MARIAN; Dienst; VTLS; Harvest; Z39.50; OAI - 1; OAI - 2; …]
Program
• 4:20-5 Session 5 – Moving Forward (Fox, Lagoze) – 40 min
  – Plans for implementation
  – Future research agendas
  – Community building
• Closing
  – Thank you!
  – Future meetings: JCDL’2001, ECDL’2001
  – Online discussions