1
Implementation of Digital Libraries Michael L. Nelson Old Dominion University mln@cs.odu.edu http://www.cs.odu.edu/~mln/ Congreso Internacional de Información en Salud Lima, Peru May 28, 2004
2
Acknowledgements ODU: K. Maly, M. Zubair, J. Bollen LANL: R. Luce, X. Liu NASA: G. Roncaglia, J. Rocker, C. Mackey Cornell: C. Lagoze, S. Warner MAGiC (UK): Paul Needham and, of course, Herbert Van de Sompel (LANL) –the OpenURL slides are nicked from his presentations
3
Outline
A bit of history
Core technologies & issues
  –OAI-PMH
    deep web
  –OpenURL
  –Handles / DOIs
  –Object Models
Example implementations
Download and go…
covered only briefly
4
OAI-PMH
5
Background
I met Herbert Van de Sompel in April 1999...
  –we spoke of a demonstration project he had in mind, for which he had received sponsorship from Paul Ginsparg and Rick Luce
  –we wanted to demonstrate a multi-disciplinary DL that leveraged the large number of high-quality, yet often isolated, tech report servers, e-print servers, etc.
Most digital libraries (DLs) had grown up along single disciplines or institutions
  –little to no interoperability; isolated DL “gardens”
Universal Preprint Service (UPS)
  –demonstrated at Santa Fe, NM, October 21-22, 1999
  –http://web.archive.org/web/*/http://ups.cs.odu.edu/
D-Lib Magazine, 6(2) 2000 (2 articles)
  –http://www.dlib.org/dlib/february00/02contents.html
UPS was soon renamed the Open Archives Initiative (OAI)
  –http://www.openarchives.org/
6
Result… OAI
The OAI was the result of the demonstration and discussion during the Santa Fe meeting
  –OAI = a bunch of people, a religion, a cult, etc.
  –OAI Protocol for Metadata Harvesting (OAI-PMH) = the protocol created and maintained by the OAI
Initial focus was on federating collections of scholarly e-print materials…
…however, interest grew and the scope and application of OAI-PMH expanded to become a generic bulk metadata transport protocol
Note:
  –OAI-PMH is only about metadata -- not full text!
    but what is metadata vs. full-text?
  –OAI is neutral with respect to the nature of the metadata or the resources the metadata describes
    read: commercial publishers have an interest in OAI-PMH too...
7
OAI-PMH Mechanics
Request is encoded in HTTP
Response is encoded in XML
XML Schemas for the responses are defined in the OAI-PMH document
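To make "request is encoded in HTTP" concrete, here is a minimal sketch of how a harvester might assemble a request URL. The helper name `build_request` is hypothetical and not part of any OAI library; the verb names are the six listed in this deck, and the base URL in the usage note is one of the NASA repositories mentioned later.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs named in this presentation.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def build_request(base_url, verb, args=None):
    """Return an HTTP GET URL for an OAI-PMH request (hypothetical helper).

    `args` is a plain dict because one OAI-PMH argument is called `from`,
    which is a reserved word in Python.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    # Skip arguments that were left unset.
    params.update({k: v for k, v in (args or {}).items() if v is not None})
    return base_url + "?" + urlencode(params)
```

For example, `build_request("http://naca.larc.nasa.gov/oai2.0/", "ListRecords", {"metadataPrefix": "oai_dc"})` yields a URL whose query string is `verb=ListRecords&metadataPrefix=oai_dc`; the XML response would then be parsed separately.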
8
Overview of OAI-PMH Verbs

Verb                 Function
Identify             description of archive
ListMetadataFormats  metadata formats supported by archive
ListSets             sets defined by archive
ListIdentifiers      OAI unique ids contained in archive
ListRecords          listing of N records
GetRecord            listing of a single record

Verb groups on the slide: archival verbs; metadata harvesting verbs
Most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
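The resumption token is the piece of the verb set that is easiest to misread, so a small sketch of the flow-control loop may help. It assumes a caller-supplied `fetch(params)` function (hypothetical) that performs the HTTP GET and returns the response XML as text; injecting it keeps the loop independent of any particular repository.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every identifier from ListIdentifiers, following resumption tokens.

    `fetch` is an assumed callable: it takes a dict of request arguments
    and returns the XML response body as a string.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            ident = header.find(OAI_NS + "identifier")
            if ident is not None and ident.text:
                yield ident.text.strip()
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}
```

The same loop shape applies to ListRecords and ListSets; only the element being collected changes. Error responses and retry handling are deliberately left out of this sketch.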
9
OAI-PMH Data Model
resource: the thing being described (in the slide's example, "David")
item: all available metadata about the resource
records: Dublin Core metadata, MARC metadata, SPECTRUM metadata
item = identifier
record = identifier + metadata format + datestamp
set-membership is an item-level property
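The item / record distinction can be sketched as two small data structures. This is an illustration of the model as stated on the slide, not code from any repository package; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp (plus the payload)."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str

@dataclass
class Item:
    """item = identifier; it holds all available metadata about one resource."""
    identifier: str
    sets: set = field(default_factory=set)        # set membership lives on the item
    _formats: dict = field(default_factory=dict)  # prefix -> (datestamp, metadata)

    def add_metadata(self, metadata_prefix, datestamp, metadata):
        self._formats[metadata_prefix] = (datestamp, metadata)

    def record(self, metadata_prefix):
        """Project the item into a single record for one metadata format."""
        datestamp, metadata = self._formats[metadata_prefix]
        return Record(self.identifier, metadata_prefix, datestamp, metadata)

    def metadata_formats(self):
        return sorted(self._formats)
```

One item can therefore yield several records (for instance Dublin Core and MARC) that share an identifier but differ in format and possibly datestamp.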
10
Data Providers / Service Providers data providers (repositories) service providers (harvesters)
11
Aggregators
Diagram: data providers (repositories), aggregator, service providers (harvesters)
Aggregators allow for:
  –scalability for OAI-PMH
  –load balancing
  –community building
  –discovery
12
Aggregators
Frequently interchangeable terms:
  –aggregators: likely to be community / institutionally focused
  –caches: store a copy, less likely to be community-oriented
  –proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
    Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
To learn more about aggregators, caches & proxies:
  –http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
  –http://www.cs.odu.edu/~mln/jcdl03/
13
Example Aggregators
Arc - http://arc.cs.odu.edu/
  –first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001
    http://www.dlib.org/dlib/april01/liu/04liu.html
Celestial - http://celestial.eprints.org/
  –among other services, it provides a history of harvests (successful vs. errors)
    http://celestial.eprints.org/cgi-bin/status
14
OAI-PMH 2.0 Registration
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
150+ repositories registered
??? unregistered repositories, unregistered because:
  –testing / development
  –not for public harvesting
  –public, but “low-profile”
  –never got around to it…
??? DP:SP ~= 5:1
15
Registration is Nice… …But Not Required
OAI-PMH is (becoming) the “http” for digital libraries
  –there is no central registry of http servers
    remember the NCSA “What’s New” page? (ca. 1994)
There will never be “registration support” in OAI-PMH
  –registries are a type of service provider, built on top of OAI-PMH
  –registration will be an integral part of community building
  –friends…
16
NASA example
Diagram: a harvester issues Identify requests to
  –http://techreports.larc.nasa.gov/ltrs/oai2.0/
  –http://naca.larc.nasa.gov/oai2.0/
  –http://ntrs.nasa.gov/oai2.0/
  –http://ston.jsc.nasa.gov/collections/TRS/oai/
  –http://horus.riacs.edu/perl/oai/
  –…
17
NACA Technical Report Server
publicly available
  –began in 1996
  –details in NASA TM-1999-209127
scanned reports from 1917-1958
  –NACA = predecessor to NASA
contents mirrored with the MAGiC project
  –a UK-based grey-literature preservation project
  –OAI-PMH used to mirror contents
http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
18
NACA Report 1345 as seen through its native DL http://naca.larc.nasa.gov/
19
NACA Report 1345 as seen through MAGiC http://www.magic.ac.uk/
20
NACA Report 1345 as seen through Scirus (Elsevier) http://www.scirus.com/
21
NACA Report 1345 as seen through my.OAI (FS Consulting) http://www.myoai.com/
22
NASA Technical Report Server
replacement for the previous distributed searching version of NTRS
  –MySQL
  –Va Tech harvester
  –modified “bucket”
  –details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003)
a service provider & aggregator
  –same OAI baseURL as used for interactive searching
http://ntrs.nasa.gov/
23
NASA Technical Report Server advanced, fielded search explicit query routing –12 NASA repositories –4 non-NASA repositories turned “off” by default >600k abstracts; >300k full-text
24
Service Providers
It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol
  –easy to be a DP -> many DPs -> SPs eventually emerge
  –hard to be a DP -> SPs starve
  –currently about 5x as many DPs as SPs
SPs are beginning to offer increasingly sophisticated services
  –the competitive market originally envisioned for SPs is emerging
25
Community Building
Colegio America
Colegio Universitario Andino
Pontificia Universidad Catolica del Peru
Universidad Nacional Jorge Basadre Grohmann
Universidad Nacional Mayor de San Marcos
Universidad Nacional de Trujillo
Universidad Peruana de Ciencias Aplicadas
Universidad de Lima
Universidad del Pacifico
Universidad Nacional Federico Villarreal
www.ndltd.org
26
OAI-PMH & The Deep Web
27
Exposing Repository Contents
DP9: Webcrawler access to OAI-PMH repositories
  –http://dlib.cs.odu.edu/dp9/
  –JCDL 02: http://www.cs.odu.edu/~liu_x/dp9/dp9.pdf
An Apache module for OAI-PMH
  –http://www.modoai.org/
Extensible Repository Resource Locators (ERRoLs) for OAI Identifiers
  –http://www.oclc.org/research/projects/oairesolver/default.htm
28
Race for This New Market…
Yahoo! & University of Michigan
  –http://www.umich.edu/news/index.html?Releases/2004/Mar04/r031004
Google & CrossRef
  –http://www.nature.com/nature/focus/accessdebate/17.html
29
OpenURL slides from Herbert Van de Sompel, LANL
30
Origins & Motivation
The Context: Library Automation Environment as of 1998
  –distributed information environment
  –local & remote A&I databases
  –rapidly growing e-journal collection
  –need to interlink the available information
The Problem:
  –links are delivered by info providers
  –links are not sensitive to user’s context
    appropriate copy problem
  –links are dependent on business agreements between information vendors
  –links don’t cover the complete collection
31
Origins & Motivation
The Context: Library Automation Environment as of 1998
  –distributed information environment
  –local & remote A&I databases
  –rapidly growing e-journal collection
  –need to interlink the available information
The REAL Problem:
  –libraries have no say in linking
  –libraries are losing a core part of the “organizing information” task
  –expensive collection is not used optimally
  –users are not well served
32
The Solution: OpenURL
In information services:
  –DO NOT provide a link which is an actual service related to a referenced item (e.g. a link from a record in an A&I database to the corresponding full-text)
  –BUT rather provide a link that transports metadata about the referenced item to others that are better placed to provide service links
Linking server operated by library
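The idea of a link that "transports metadata" rather than pointing at a fixed target can be sketched as follows. This is an illustration only: the resolver base URLs and the citation values are made up, and the key names (`genre`, `issn`, `volume`, `spage`, ...) are in the style of the early OpenURL drafts and should be checked against the specification before use.

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, citation):
    """Attach citation metadata to a library's resolver base URL.

    `resolver_base` is whatever linking server the user's library runs
    (hypothetical here); `citation` is a dict of metadata about the
    referenced item. Empty values are dropped.
    """
    query = urlencode({k: v for k, v in citation.items() if v})
    return f"{resolver_base}?{query}"

# Placeholder citation; none of these values describe a real article.
citation = {
    "genre": "article",
    "issn": "1234-5678",
    "volume": "7",
    "issue": "4",
    "spage": "1",
    "date": "2001",
}

# The same metadata sent to two different (made-up) library resolvers.
link_a = build_openurl("http://resolver.library-a.example/openurl", citation)
link_b = build_openurl("http://resolver.library-b.example/menu", citation)
```

The information service emits the same metadata for every user; only the resolver base changes, which is where context-sensitivity enters. What each resolver does with the metadata (full text, catalog lookup, document delivery) is the library's decision.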
33
non-OpenURL linking
Diagram: the link source resolves metadata about a reference into a single link, which leads to one link destination (the referenced work)
34
OpenURL linking
Diagram: the link source provides an OpenURL for a reference; the OpenURL transports metadata & identifiers to the OpenURL linking server; the linking server performs user-specific, context-sensitive resolution of metadata & identifiers into services, leading to several link destinations
35
Diagram: metadata plane with resource1, resource2 and resource3 connected by default links
default links:
  –restricted in nature
  –action-radius restricted by business agreements
  –not context-sensitive
36
Diagram: OpenURL connects the metadata plane (resource1, resource2, resource3; default links) to an extended services plane (service component1, service component2; appropriate links)
37
NISO OpenURL Standardization Charge
Use existing “OpenURL Framework” as starting point
  –notion of context-sensitive services
  –notion of transporting “contextual” metadata packages to obtain context-sensitive services
Define syntax and transport-method for “contextual” metadata packages
Ensure extensibility:
  –must support future applications
  –must support other information communities
=> Generalize and Standardize
38
NISO OpenURL Standardization Charge
Therefore, to be addressed were:
  –OpenURL Framework beyond scholarly resources
  –“contextual” metadata packages
  –Syntax for “contextual” metadata packages
  –Transport of “contextual” metadata packages
39
OpenURL Status (Nearly) a NISO standard –check for details: http://library.caltech.edu/openurl/
40
Naming: Handles & DOIs
41
Naming
Fundamental to other technologies (OAI-PMH, OpenURL, etc.)
Options
  –URNs
  –Persistent URLs (PURLs) http://purl.org/
  –Handles http://www.handle.net/
  –Digital Object Identifiers http://www.doi.org/
  –ARK http://www.cdlib.org/inside/diglib/ark/
42
“Inverted Archives” Unit of discourse is no longer an archive or service, but a DOI which has services linked from it –cf.: UPS demonstration prototype “Smart Objects, Dumb Archives” (SODA) model
43
Example http://dx.doi.org/10.1145/374308.374342
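As a small illustration of how the example above is put together, the sketch below splits a DOI into its prefix and suffix and joins it to the resolver address shown on the slide. The function names are hypothetical, and the validation is intentionally loose (it only checks for a `10.` prefix and a non-empty suffix).

```python
from urllib.parse import quote

# Resolver address as it appears on the slide.
DOI_PROXY = "http://dx.doi.org/"

def split_doi(doi):
    """Split a DOI at the first slash into (prefix, suffix)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

def doi_to_url(doi, proxy=DOI_PROXY):
    """Build a resolvable URL; characters unsafe in URLs are percent-encoded."""
    prefix, suffix = split_doi(doi)
    return proxy + prefix + "/" + quote(suffix, safe="/")
```

With the DOI from the slide, `doi_to_url("10.1145/374308.374342")` reproduces the URL shown there. The name itself stays stable while the proxy redirects to wherever the object currently lives.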
44
Object Models
45
Popular Object Models
METS
  –used in DSpace, Fedora
  –http://www.loc.gov/standards/mets/
MPEG-21 DIDL
  –http://xml.coverpages.org/mpeg21-didl.html
  –used in LANL DLs
    http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
    http://www.dlib.org/dlib/february04/bekaert/02bekaert.html
    http://lib-www.lanl.gov/~herbertv/papers/jcdl2004-submitted-draft.pdf
46
Object Models & OAI-PMH
Diagram: resource, item (oai:foo.edu:1234), records (METS)
Move from simple metadata files “pointing” to resources…
…to records as “modeled representations” of resources
47
Download and Go!
48
Where Do You Want to Build? user... data provider data provider data provider data provider service provider local context- sensitive services EPrints.org data provider CDSware
49
Fedora
joint project between Cornell & UVa
  –funded by the Mellon Foundation
a repository management system
  –focuses on complex digital objects and their behaviors
more info:
  –http://www.fedora.info/
  –D-Lib Magazine, 9(4) http://www.dlib.org/dlib/april03/staples/04staples.html
50
DSpace
MIT + HP Labs
constructed to capture all the output of MIT’s faculty
now generalized to the DSpace Federation
  –8 top universities in the US & Canada
more info:
  –http://www.dspace.org/
  –http://sourceforge.net/projects/dspace/
  –D-Lib Magazine 9(1) http://www.dlib.org/dlib/january03/smith/01smith.html
51
EPrints.org
developed at Southampton University
  –part of a larger suite of institutional/author self-archiving tools and services
    e.g.: citebase; paracite
widely adopted -- 100+ sites
  –http://software.eprints.org/#ep2
more info:
  –http://www.eprints.org/
  –http://www.arl.org/sparc/core/index.asp?page=g20#6
52
CDSware
developed at CERN
data provider & service provider
large-scale use @ CERN (> 600k records)
  –in use at a few non-CERN sites
free & paid support models
more info:
  –http://cdsware.cern.ch/
53
Kepler
P2P publishing for academia
  –community servers for coordination, management
  –archivelets for individual laptops, PCs
more info:
  –http://kepler.cs.odu.edu/
  –D-Lib Magazine 7(4) http://www.dlib.org/dlib/april01/maly/04maly.html
54
OpenResolver
developed by UKOLN
  –open source OpenURL 0.1 format resolver
  –NISO 1.0 format???
more info:
  –Ariadne, 28: http://www.ariadne.ac.uk/issue28/resolver/
  –ftp://ftp.ukoln.ac.uk/metadata/tools/openresolver/
  –http://www.ukoln.ac.uk/distributed-systems/openurl/
55
Conclusions
56
Why The OAI-PMH is NOT Important
Users don’t care
OAI-PMH is middleware
  –if done right, the uninterested user should never have to know
    OAI Inside
Using OAI-PMH does not ensure a good SP
OAI-PMH is (or is becoming) HTTP for DLs
  –few people get excited about http now
http & OAI-PMH are core technologies whose presence is now assumed
57
Digital Library Technologies http XML OAI-PMH OpenURL ?
58
Other Uses For the OAI-PMH
Assumptions:
  –Traditional DLs / SPs will continue on their present path of increasing sophistication
    citation indexing, search results viz, personalization, recommendations, subject-based filtering, etc.
  –growth rates remain the same (5x DPs as SPs)
Premise: OAI-PMH is applicable to any scenario that needs to update / synchronize distributed state
  –Future opportunities are possible by creatively interpreting the OAI-PMH data model
See:
  –Van de Sompel, Young & Hickey, D-Lib Magazine, July 2003, http://www.dlib.org/dlib/july03/young/07young.html
  –Nelson, 2nd OAI Workshop, http://agenda.cern.ch/askArchive.php?base=agenda&categ=a02333&id=a02333s5t8/transparencies
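The "synchronize distributed state" premise rests on datestamp-based selective harvesting: ask only for what changed since the last visit. A minimal sketch follows, assuming a caller-supplied `list_records(from_date)` function (hypothetical) that returns `(identifier, datestamp, metadata)` tuples with datestamps on or after `from_date`, or everything when `from_date` is `None`.

```python
def sync(local_store, list_records, last_harvest=None):
    """Bring `local_store` up to date and return the new high-water mark.

    `local_store` maps identifier -> (datestamp, metadata).
    `list_records` is an assumed callable wrapping a ListRecords request
    with a `from` argument. Datestamps are ISO 8601 strings in UTC at a
    single granularity, so plain string comparison orders them correctly.
    """
    newest = last_harvest
    for identifier, datestamp, metadata in list_records(last_harvest):
        # Overwrite whatever copy was held before; the remote side wins.
        local_store[identifier] = (datestamp, metadata)
        if newest is None or datestamp > newest:
            newest = datestamp
    return newest
```

Calling `sync` repeatedly with the returned mark transfers only changed records after the first full harvest. Deleted records and error handling are outside the scope of this sketch.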
59
OpenURL Framework evolution
Draft (Van de Sompel, Beit-Arie, Hochstenbach - 05/2001): a spec based on HTTP GET to transport metadata about a scholarly referent & the context in which the referent is referenced
NISO Draft Standard (04/2003): a framework standard that enables different communities to:
  –describe a referent
  –describe the context in which the referent is referenced
  –transport these descriptions
60
The Future: Community Building
Ultimately, protocols and metadata formats are not what make a difference
Rather, it is the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)
The best current example: The Open Language Archives Community
  –http://www.language-archives.org/
OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
61
Further Reading
Gerry McKiernan, Library Hi-Tech News
  –http://www.public.iastate.edu/~gerrymck/OAI-SP-I.pdf
  –http://www.public.iastate.edu/~gerrymck/OAI-SP-II.pdf
  –http://www.public.iastate.edu/~gerrymck/OAI-SP-III.pdf
Open Archives Forum OAI-PMH Tutorial
  –http://www.oaforum.org/tutorial/
“A Survey of Digital Library Aggregation Services”
  –http://www.diglib.org/pubs/brogan/
Open Access News
  –http://www.earlham.edu/~peters/fos/fosblog.html
Guide To Institutional Repository Software
  –http://www.soros.org/openaccess/software/
62
Great Stuff I Did Not Cover…
OAI-PMH
  –Static Repositories
    http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
  –OAI-Rights
    http://www.openarchives.org/documents/OAIRightsWhitePaper.html
    http://www.openarchives.org/news/oairightspress030929.html
Digital Preservation
  –http://www.digitalpreservation.gov/