Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002 – Singapore – Dec. 2002 Edward A. Fox (with Hussein Suleman, Ming Luo)

Building Digital Libraries Made Easy: Toward Open Digital Libraries
ICADL 2002 – Singapore – Dec. 2002
Edward A. Fox (with Hussein Suleman, Ming Luo)
Virginia Tech, Blacksburg, VA, USA
CS, DLRL, Internet TIC, NDLTD, CITIDEL, NSDL, …

Acknowledgements (Selected)
Sponsors: ACM, Adobe, DLF, IBM, Mellon Foundation, Microsoft, NSF (CDA, DUE, and IIS grants), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, …
Faculty/Staff (now): Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee Giles, Martin Halbert, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Muhammad Zubair, …
Students: Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan Richardson, Priya Shivakumar, Wensi Xi, Liang Xu, Baoping Zhang, …

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

Overview We address the problem of how to develop DLs; build on experience in building many DLs; strive for simplicity as per OCKHAM initiative; build upon the Open Archives Initiative; demonstrate our approach in diverse situations; and invite all to use DL-in-a-box and help build Open Digital Libraries.

Problem: Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a commercial DL system.
2. Unless the development effort is local, there won’t be any control.
3. DLs are extensions of DBMSs, so they are simple applications to develop.
4. Since DLs operate on the Web, one must adopt the newest W3C proposal.

Problem – cont’d
5. Since technology moves so quickly, it is essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture – at last!

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

Experience: Case Study Projects
AmericanSouth.org, NDLTD, CSTC, JERIC, CITIDEL, NSDL, Digital Library in a Box

AmericanSouth.org
Domain: culture and history of the southern region of America (USA)
Genre: diverse distributed collections at a dozen universities
Submission & Collection: local sites → Emory University (for SOLINET)

Networked Digital Library of Theses and Dissertations (NDLTD)
Domain: graduate education and research
Genre: electronic theses and dissertations (ETDs)
Submission & Collection: local sites

Computer Science Teaching Center (CSTC)
Domain: teaching computer science
Genre: courseware
Submission & Collection:

CS Teaching Center (CSTC): Lessons Learned
Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
Learners benefit from having well-crafted modules that have been reviewed and tested.
Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials, and reference resources can be built.

Browsing (2)

ACM Journal of Educational Resources in Computing (JERIC)
Domain: teaching computer science
Genre: courseware, scholarly articles
Submission & Collection: CSTC, ACM Digital Library

JERIC: Journal of Educational Resources in Computing
Accessible from … and … and …
ACM and SIGCSE support
Refereed and interactive
Part of ACM Digital Library

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
Domain: computing / information technology
Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, technical reports, …
Submission & Collection: sub/partner collections

CITIDEL Team
An NSDL Collection Track project
Led by Virginia Tech, with co-PIs: Fox (director, DL systems), Lee (history), Perez (user interface, Spanish support)
Partners: College of New Jersey (Knox), Hofstra (Impagliazzo), Villanova (Cassel), Penn State (Giles)

Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes (chart: number of collections identified, by size of collection in items)

Multi-dimensional Categorization

CITIDEL Collection Sources (diagram). Labels: metadata; fulltext; JERIC; CSTC; ResearchIndex (NEC’s data, processed w. R.I.); ACM DL; SIGCSE proceedings; IEEE-CS; NCSTRL; Borner’s info viz software repository; experts’ finding aids; …

CITIDEL Collection Building (concept map). Activities: Submitting, Nominating, Crawling, Creating, Composing, Classifying, Searching, Browsing. Tools named: VIADUCT, GetSmart, Crawlifier.

Overview of CITIDEL architecture

Distributed repository structure

Digital library architecture for local and interoperable CITIDEL services

National Science Digital Library (NSDL)
Domain: undergraduate and K-12 education, etc.
Genre: educational resources
Submission & Collection: sites of 90 projects

NSDL Information Architecture (diagram), developed by the Technical Infrastructure Workgroup
Groupings: User Interfaces; Usage Enhancement; Collection Building
Labels: portals & clients; CI services (annotation, discussion, personalization, authentication, browsing); core services (information retrieval, metadata gathering); core collection-building services (harvesting, protocols); other NSDL services; NSDL collections; special databases; referenced items & collections; core NSDL “bus”

Digital Library in a Box
Domain: helping DL projects
Genre: any domain, but especially those involved in NSDL (since funding is in part through NSDL, with U. FL and NCSA)
Software and Documentation:

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

Open Archives Initiative OAI

The World According to OAI (diagram): service providers (discovery, current awareness, preservation) and data providers, linked by metadata harvesting

Technical umbrella for practical interoperability … that can be exploited by different communities: reference libraries, publishers, e-print archives, museums

Tiered Model of Interoperability: mediator services; metadata harvesting; document models

OAI – Black Box Perspective (diagram)
Services: Browse, Summarize, Search, Visualize
Docs: DO
Metadata: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7

Aggregation through OAI Harvesting (diagram). Sources: ArchiveLite sites; NCSTRL; Eprints; IEEE-CS, ACM, …; own collections (History, ResearchIndex, CSTC, …). Other labels: CITIDEL, Active.

Protocol for Metadata Harvesting
Service requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords
Metadata multiplicity
Date/time ranges
Sets (with semantics depending on local data providers)
Resumption tokens
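As a rough illustration of the service requests listed above, the sketch below builds OAI-PMH request URLs in Python. The base URL and the set name "theses" are hypothetical; the verb names and the metadataPrefix, from, until, and set parameters are those of OAI-PMH itself.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://archive.example.org/oai"

# The six OAI-PMH service requests (verbs) named on the slide.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords"}


def build_request(verb, **params):
    """Return a GET URL for one OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # "from" is a Python keyword, so callers pass from_ instead.
    if "from_" in params:
        params["from"] = params.pop("from_")
    query = [("verb", verb)] + sorted(params.items())
    return BASE_URL + "?" + urlencode(query)


# Selective harvest: one metadata format, a date range, and a set.
# The set name "theses" is an assumption; set semantics are local to each data provider.
url = build_request("ListRecords", metadataPrefix="oai_dc",
                    from_="2002-01-01", until="2002-12-31", set="theses")
print(url)
```

A harvester would issue such a URL over HTTP and parse the XML response; that step is left out here to keep the sketch self-contained.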

NDLTD OAI Example

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

Open Digital Library (ODL) Hypothesis (Hussein Suleman)
Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
Maybe … if Digital Libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.

Example Architecture (NDLTD) (diagram). Archives: Virginia Tech, MIT, Humboldt, Duisburg, PhysNet, CalTech, Dresden. Components: MIT Filter, Union Catalog, Search, Browse, Recent, User Interface. Legend: OAI/ODL archive; OAI/ODL protocol.

ODL Demonstration - FrontPage

ODL Demonstration - Search

ODL Demonstration - Browse

Hussein Suleman’s Thesis Summary
Open Digital Libraries (DLs); Open Archives Initiative (OAI); Protocol for Metadata Harvesting (PMH)
Extending OAI-PMH provides the glue for building componentized DLs.
Lightweight protocols connect the components to support modular systems with good efficiency.

Research in a Nutshell We build extensible modular systems with customizable services. This supports interoperability and allows distributed development. This is in use in AmericanSouth.org, … Components include search, browse, annotate, editorial support, union, filter, whats-new, submit, rate, recommend, …

(Diagram: users and digital objects such as programs, documents, images, and videos, with “?” between them)

(Diagram: the same users and digital objects, connected through a componentized digital library whose components are each marked “?”)

(Diagram: the same users and digital objects, connected through an open digital library; labels: OA, PMH, XPMH)

ODL Component Requirements
Search: retrieve a list of items; index new items
Annotate: add annotation to item; retrieve a list of annotations for an item
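The requirements above can be read as small component interfaces. The following is a toy in-memory sketch in Python, not the ODL implementation; the class and method names are hypothetical, chosen only to mirror the four operations listed.

```python
from collections import defaultdict


class SearchComponent:
    """Toy search component: index new items, retrieve a list of items."""

    def __init__(self):
        self._index = defaultdict(set)  # word -> identifiers containing it

    def index(self, identifier, text):
        for word in text.lower().split():
            self._index[word].add(identifier)

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        # An item matches only if it contains every query word.
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


class AnnotateComponent:
    """Toy annotation component: add an annotation, list annotations for an item."""

    def __init__(self):
        self._notes = defaultdict(list)  # identifier -> annotations

    def add(self, identifier, note):
        self._notes[identifier].append(note)

    def list_for(self, identifier):
        return list(self._notes.get(identifier, []))


search = SearchComponent()
search.index("etd-1", "Open digital libraries")
search.index("etd-2", "Digital preservation")
print(search.search("digital"))

notes = AnnotateComponent()
notes.add("etd-1", "Useful overview")
print(notes.list_for("etd-1"))
```

In an ODL setting each component would sit behind a protocol endpoint rather than being called directly; the direct calls here only show the operations each component must support.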

Open Digital Library Components
Running now: XML-File (data provider from file system); Union, search, browse, recent, filter; E-journal/review, Submit, Edit, Annotation
Class projects: high performance multilingual search; Recommender, Rating; Mirroring (see JCDL’02)
Working with NCSA: from DB, unstructured text
Others discussed: classification/categorization; DL-Viz interconnection (VIDI – Jun Wang ETD)

Open Digital Library: Extended (diagram)
Data sources: XML File Coll. & Data Providers 1, 2, and 3; Submit Archive; OAIB (NCSA: from RDBMS); OAI-PMH Data Provider
Harvest from data providers: DBUnion (archive merger component); Filter
Service components: DBBrowse browse engine (as metadata browse service provider); IRDB-1 search engine (as metadata search service provider); What’s New engine (as What’s New service provider); Recommend & Rate engine (as recommend & rate service provider); Annotation engine; IRDB-2 search engine (as annotation search service provider)

Example Open Digital Library: Digital Library for the Networked Digital Library of Theses and Dissertations (diagram)
ETD collections; components: Union, Filter, Search, Browse, Recent; protocols: PMH, ODLUnion, ODLSearch, ODLBrowse, ODLRecent; user interface for students and researchers

Digital Library for the Computer Science Teaching Center

CSTC User Interface

Open Digital Library Component (diagram): an extended Open Archive

Layer 1: OAI-PMH (Protocol for Metadata Harvesting)
Transfer stream of metadata from one archive or component to another
Service requests: Identify, ListSets, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords

Layer 2: Extended OAI-PMH
OAI-PMH + extensions for general-purpose inter-component communication
Added generic containers in every response for additional information
Added “PutRecord” to submit a record
Increased granularity to support times as well as dates (same as OAI-PMH v2.0)
Ignored DC requirement
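To make the granularity point concrete: OAI-PMH v2.0 allows datestamps either as a date (YYYY-MM-DD) or as a UTC date and time (YYYY-MM-DDThh:mm:ssZ). The helper below is a minimal sketch that formats a Python datetime both ways; the function name is hypothetical.

```python
from datetime import datetime, timezone


def oai_datestamp(moment, granularity="day"):
    """Format a timezone-aware datetime at day or seconds granularity, in UTC."""
    moment = moment.astimezone(timezone.utc)
    if granularity == "day":
        return moment.strftime("%Y-%m-%d")
    if granularity == "seconds":
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError("granularity must be 'day' or 'seconds'")


when = datetime(2002, 12, 11, 9, 30, 0, tzinfo=timezone.utc)
print(oai_datestamp(when))             # 2002-12-11
print(oai_datestamp(when, "seconds"))  # 2002-12-11T09:30:00Z
```

With seconds granularity, a component that harvests several times a day can ask only for records changed since its last request instead of re-reading the whole day.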

Layer 3: ODL Protocols
Specialized protocol semantics for different components, e.g.:
Search component uses ODLSearch protocol: ListRecords and ListIdentifiers embed query terms in “set” parameter
Annotation component uses ODLAnnotate protocol: ListRecords and ListIdentifiers specify the item for which annotations are requested in the “set” parameter; PutRecord adds an annotation to an item
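The idea of carrying query terms in the “set” parameter can be sketched as follows. This is an illustration only: the endpoint URL is hypothetical, and joining terms with spaces is an assumption, since the slides do not give the exact encoding that ODLSearch uses.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

SEARCH_URL = "http://dl.example.org/search"  # hypothetical component endpoint


def search_request(terms, metadata_prefix="oai_dc"):
    """Build a ListIdentifiers request whose "set" parameter carries the query."""
    # Assumption: terms are joined with spaces; the real ODLSearch encoding may differ.
    return SEARCH_URL + "?" + urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "set": " ".join(terms),
    })


def query_terms(url):
    """Component side: recover the query terms from the "set" parameter."""
    return parse_qs(urlsplit(url).query)["set"][0].split()


url = search_request(["digital", "libraries"])
print(url)
print(query_terms(url))
```

Because the request is still a well-formed harvesting request, a search component can reuse the same request parsing and response packaging as any other archive in the network.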

Performance Optimizations
Caching of responses
Persistent CGI mechanisms (FastCGI, SpeedyCGI)
Request multiple records in a single operation (proposed)
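Response caching, the first optimization listed, might look roughly like the sketch below: responses are stored under their normalized request parameters and reused until the component's data changes. The class and handler names are hypothetical, and this is not the caching code used in the ODL components.

```python
class ResponseCache:
    """Cache component responses keyed by their request parameters."""

    def __init__(self, handler):
        self._handler = handler  # function that computes a response
        self._store = {}
        self.hits = 0

    def request(self, **params):
        # Sort parameters so argument order does not change the key.
        key = tuple(sorted(params.items()))
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._handler(**params)
        return self._store[key]

    def invalidate(self):
        """Drop cached responses, e.g. after the component indexes new records."""
        self._store.clear()


calls = []


def slow_handler(**params):
    """Stand-in for an expensive request handler; records each real call."""
    calls.append(params)
    return f"<response for {params['verb']}>"


cache = ResponseCache(slow_handler)
cache.request(verb="ListSets")
cache.request(verb="ListSets")
print(len(calls), cache.hits)
```

The second identical request is answered from the cache, so the handler runs once. The cost is extra storage, and cached entries must be invalidated when the underlying records change.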

What have we accomplished?
Complete protocol-level separation among components within the DL
Seamless integration with little “glue”
Simple extensions of OAI-PMH
Modular and portable components
Efficient in speed, but not as efficient in storage

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

Digital Library In A Box
Part of NSF’s National Science Digital Library
Offers “shrink-wrap” Open Digital Library components as open source software
Users install ready-made digital library solutions, or build their own from snap-together components.

OCKHAM
Simplicity (à la OCCAM’s razor)
Support by Mellon and DLF
Next meeting in Atlanta, Jan. 8, 2003
Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement

5S Layers: Societies, Scenarios, Spaces, Structures, Streams

Outline: Overview, Problem; Experience: Case Study Projects; Open Archives Initiative; Hussein Suleman Dissertation; DL in a Box, OCKHAM; Summary and Conclusion

It is possible to build DLs easily.
The ODL approach to this has been developed and validated in a number of settings.
Everyone is invited to: use ODL components; refine or add ODL components and protocols; join ODL and OCKHAM.
For more information see:

(Somewhat) Open Issues
Is this scalable? Portable? Extensible?
Can we define all popular DL services using such a methodology? (completeness problem)
Can we define DLs as configurations of ODL components? (composition problem)
Is OAI-PMH a good baseline protocol? Can we design a better baseline protocol upon which to base harvesting and repository access?
To what degree is an ODL network equivalent to a monolithic system? (comparison problem)

Ultimate Goal
Package different configurations into instant DL systems or subsystems
DL building = component configuration
All DLs speak the same language(s)
Basic services are trivial to provide, so more effort is spent on advanced capabilities of DLs

Selected Links CITIDEL – NCSTRL – NDLTD – NSDL – Open Archives Initiative

More Links: Hussein Suleman’s Dissertation – Repository Explorer – DL Courseware – Virginia Tech Digital Library Research Laboratory (DLRL) – Listservs