1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, Digital Library Research Laboratory, Dept. of CS Virginia Tech,


1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, Digital Library Research Laboratory, Dept. of CS Virginia Tech, Blacksburg, VA USA

Acknowledgements (Selected) Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS, ITR, DUE), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

3 Acknowledgements: Faculty, Staff Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

4 Acknowledgements: Students Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

5 For More Information
Magazine:
Books: (1994)
– MIT Press: Arms, plus by Borgman, Licklider (1965)
– Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
Conferences
– ECDL:
– ICADL:
– JCDL:
Associations
– ASIS&T DL SIG
– IEEE TCDL: (student awards, consortium)
NSF:
Labs: VT:

7 Outline
1. 5S Framework for DL
1.1. Motivation: the problem
1.2. Theory
1.3. Tools/Applications
1.4. Quality
1.5. Conclusions, Future Work
2. DL Integration
3. DL Overview
4. OAI, OCKHAM, CSTC, NSDL, NDLTD
5. Open Source, Repositories, DigArch, ODL

8 Outline
1. 5S Framework for DL
1.1. Motivation: the problem
– Hypotheses and research questions
1.2. Theory
– 5S: introduction, formal definitions
– The formal ontology
1.3. Tools/Applications
– Language
– Visualization
– Generation
– Logging
1.4. Quality
1.5. Conclusions, Future Work

Motivation
Digital Libraries (DLs): what are they?
– No definitional consensus
– Conflicting views
– Makes interoperability a hard problem
DLs are not benefiting from formal theories as are other CS fields: DB, IR, PL, etc.
DL construction: difficult, ad hoc, lack of support for tailoring/customization
Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
– Lack of specific DL models, formalisms, languages

10 Hypotheses
A formal theory for DLs can be built based on 5S.
The formalization can serve as a basis for modeling and building high-quality DLs.

11 Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?

Informal 5S Definitions
DLs are complex systems that
– help satisfy info needs of users (societies)
– provide info services (scenarios)
– organize info in usable ways (structures)
– present info in usable ways (spaces)
– communicate info with users (streams)

13 5Ss
Ss | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

14 Digital Objects (DOs)
Born digital
Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered
Surrogate for “real” object
– Not covered explicitly in metamodel for a minimal DL
– Crucial in metamodel for archaeology DL

15 Metadata Objects (MDOs)
MARC
Dublin Core
RDF
IMS
OAI (Open Archives Initiative)
Crosswalks, mappings
Ontologies
Topic maps, concept maps

16 Other Key Definitions
– coll, catalog, repository, service, archive, (minimal) DL
– See Gonçalves et al. in April 2004 ACM Transactions on Information Systems (TOIS)

17 5S and DL formal definitions and compositions (April 2004 TOIS)

18 Glossary: Concepts in the Minimal DL and Representing Symbols

19 5S Static / Passive Dynamic / Active

20 Digital Library Formal Ontology

21 Ontology: Applications
Expand definition of minimal DL by characterizing
– typical DL services
– in the context of “employs” and “produces” relationships
Use characterization to:
– Reason about how DL services can be built from other DL components
– As well as be composed with other services through extension or reuse

22 Ontology: Applications

23 Ontology: Taxonomy of Services
Information Satisfaction Services
– Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
Infrastructure Services
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
– Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
– Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
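The taxonomy on this slide can be held in a small nested structure, which makes it easy to ask where a given service sits. The following is a minimal sketch, not part of the original tutorial; the nesting is one reading of the flattened slide, and `SERVICE_TAXONOMY` and `find_category` are hypothetical names.

```python
# Service taxonomy transcribed from the slide. The nesting (which labels are
# parents of which lists) is an assumption based on the slide layout.
SERVICE_TAXONOMY = {
    "Information Satisfaction Services": [
        "Browsing", "Collaborating", "Customizing", "Filtering",
        "Providing access", "Recommending", "Requesting", "Searching",
        "Visualizing",
    ],
    "Infrastructure Services": {
        "Add Value": [
            "Annotating", "Classifying", "Clustering", "Evaluating",
            "Extracting", "Indexing", "Measuring", "Publicizing", "Rating",
            "Reviewing (peer)", "Surveying", "Translating (language)",
        ],
        "Repository-Building": {
            "Preservational": [
                "Conserving", "Converting", "Copying/Replicating",
                "Emulating", "Renewing", "Translating (format)",
            ],
            "Creational": [
                "Acquiring", "Cataloging", "Crawling (focused)", "Describing",
                "Digitizing", "Federating", "Harvesting", "Purchasing",
                "Submitting",
            ],
        },
    },
}


def find_category(service, tree=SERVICE_TAXONOMY, path=()):
    """Return the tuple of category names leading to `service`, or None."""
    for name, child in tree.items():
        if isinstance(child, dict):
            found = find_category(service, child, path + (name,))
            if found is not None:
                return found
        elif service in child:
            return path + (name,)
    return None
```

For example, `find_category("Harvesting")` walks down through the infrastructure branch to the creational repository-building services.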

24 Composition of key fundamental / infrastructure services

25 Composition of additional services

26 Approach

Tools/Applications

28 5SL: a DL design language
Domain specific languages
– Address a particular class of problems by offering specific abstractions and notations for the domain at hand
– Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.
XML-based realization of 5S
– Interoperability
– Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)

29 5SL – The Minimal DL Metamodel

30 5SL examples
Example of Document declaration in the Structures Model:
<stream value='ETDText'> <stream value='ETDAudio'> ... %XMLSchema%
Example of Actors declaration in the Societies Model:
<Attribute name='name' type='String'/> <Attribute name='ID' type='Integer'/>
Converting, Reviewing, Cataloguing, …
Example of Service declaration in the Scenario Model (simple scenario for an NDLTD site searching service):
Patron -> InterfaceManager: collection, query
InterfaceManager -> SearchManager: collection, query
SearchManager -> InterfaceManager: WtdSet
…
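To make the idea of an XML-based 5SL declaration concrete, here is a minimal sketch that assembles a small document with Python's standard `xml.etree.ElementTree`. Only the `stream` and `Attribute` elements and their attribute values come from the slide; the root and container element names (`DL5SL`, `StreamsModel`, `SocietiesModel`, `Actor`) are hypothetical and are not the actual 5SL schema.

```python
import xml.etree.ElementTree as ET


def build_5sl_sketch():
    """Build a tiny 5SL-like XML tree (container names are hypothetical)."""
    root = ET.Element("DL5SL")  # hypothetical root element

    # Streams: the two stream values shown on the slide.
    streams = ET.SubElement(root, "StreamsModel")  # hypothetical container
    for value in ("ETDText", "ETDAudio"):
        ET.SubElement(streams, "stream", value=value)

    # Societies: one actor with the two attributes shown on the slide.
    societies = ET.SubElement(root, "SocietiesModel")  # hypothetical container
    actor = ET.SubElement(societies, "Actor", name="Patron")  # hypothetical
    ET.SubElement(actor, "Attribute", name="name", type="String")
    ET.SubElement(actor, "Attribute", name="ID", type="Integer")
    return root


xml_text = ET.tostring(build_5sl_sketch(), encoding="unicode")
```

A generator in the style of 5SLGen would read such a file and select components accordingly; that step is not sketched here.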

31 5SGraph: A DL Modeling Tool
Helps users model their own instances of a digital library (DL) in the 5S language (5SL).
A simple modeling process which enables rapid generation of digital libraries
Features
– 5SGraph loads and displays a metamodel in a structured toolbox.
– The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
– 5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.

32 Overview of 5SGraph Workspace (instance model) Structured toolbox (metamodel)

33 5SGraph: Other Key Features Flexible and extensible architecture Reuse of models –Load, save, and change common (sub-) models Synchronization of views Enforcing of semantic constraints

34 5SGraph Evaluation: Usability Study

35 5SGen
Version 1: MARIAN as the target system
– Focused on rich structures: semantic networks
– Behavior attached to nodes/links
Version 2: shifted for later work to componentized (ODL) approach
– Focused on scenarios/societies
– Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
– Only textual streams supported

36 5SLGen – Version 2: ODL, Services, Scenarios

37 5SLGen Proof of Concept: prototyping –CITIDEL –Viaduct –NDLTD Union Catalog –BDBComp

38 XML-based DL Log Standard
Log analysis
– is a source of information on:
  How patrons really use DL services
  How systems behave while supporting user information seeking activities
Used to:
– Evaluate and enhance services
– Guide allocation of resources
Common practice in the web setting
– Supported by web servers, proxy caches
DL logging can be more detailed

39 DL Logging Features
Captures high level user and system behaviors
Organized according to the 5S framework
– Hierarchical organization (XML-based)
– Centered on the notion of events
Records only events related to initial user inputs and final system outputs
Helps to understand user interactions and the perceived value of responses

40 The XML Log Format
[Figure: hierarchy of log elements. Element names shown: Log, SessionId, MachineInfo, Statement, Transaction, Timestamp, SessionInfo, RegisterInfo, Event, Action, Search, Browse, StoreSysInfo, Update, SearchBy, QueryString, Catalog, Collection, PresentationInfo, StatusInfo, Timeout]
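As an illustration of event-centered XML logging, the sketch below records a single search event using element names that appear on the slide. The parent/child nesting chosen here is an assumption (the figure's structure did not survive extraction), and `log_search_event` is a hypothetical helper, not part of the DL log standard's tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def log_search_event(session_id, query, collection):
    """Return a <Log> element holding one search event.

    Element names come from the slide; their nesting is assumed.
    """
    log = ET.Element("Log")
    transaction = ET.SubElement(log, "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "Timestamp").text = (
        datetime.now(timezone.utc).isoformat()
    )
    statement = ET.SubElement(transaction, "Statement")
    event = ET.SubElement(statement, "Event")
    action = ET.SubElement(event, "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "Collection").text = collection
    search_by = ET.SubElement(search, "SearchBy")
    ET.SubElement(search_by, "QueryString").text = query
    return log


entry = log_search_event("session-001", "digital library", "ETD")
print(ET.tostring(entry, encoding="unicode"))
```

Recording only the initial user input (the query) and, in a fuller version, the final system output keeps the log compact, in line with the logging features listed earlier.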

Describing Quality in Digital Libraries
What’s a “good” digital library?
– Central concept: Quality!
– Hypotheses of this work: formal theory can help to define “what’s a good digital library” by:
  New formalizations of quality indicators for DLs within our 5S framework
  Contextualizing these measures within the Information Life Cycle

42 Quality Dimensions

43 Digital Objects: Accessibility
A digital object is accessible by a DL actor or patron if
1. it exists in the DL collections,
2. it is retrievable from the repository, and
3. it is not restricted from access
– by metadata on rights
– for the actor or the actor’s society
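The three accessibility conditions translate directly into a boolean check. This is a minimal sketch under simplifying assumptions: collections and the repository are modeled as sets of handles, and rights metadata as sets of restricted actors and societies. All names (`DigitalObject`, `is_accessible`, the field names) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str
    # Hypothetical rights metadata: who is barred from access.
    restricted_actors: set = field(default_factory=set)
    restricted_societies: set = field(default_factory=set)


def is_accessible(do, actor, actor_societies, collections, repository):
    """Apply the three conditions from the slide."""
    # 1. The object exists in at least one DL collection.
    in_collection = any(do.handle in coll for coll in collections)
    # 2. The object is retrievable from the repository.
    retrievable = do.handle in repository
    # 3. Neither the actor nor any of the actor's societies is restricted.
    restricted = (
        actor in do.restricted_actors
        or bool(do.restricted_societies & set(actor_societies))
    )
    return in_collection and retrievable and not restricted
```

A collection-level accessibility score could then be taken as the fraction of objects for which this check holds for a given actor, though that aggregation is not defined on the slide.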

44 Digital Objects: Pertinence
$\mathrm{Inf}(do_i)$ = information carried by a digital object or any of its descriptions
$\mathrm{IN}(ac_j)$ = information need of an actor
$\mathrm{Context}_{jk}$ = an amalgam of societal factors which can impact the judgment of pertinence by $ac_j$ at time $k$.
– Factors include time, place, the actor's history of interaction, task in hand, and factors implicit in the interaction and ambient environment.

45 Digital Objects: Pertinence
The pertinence of a digital object to a user $ac_j$ is an indicator function $\mathrm{Pertinence}(do_i, ac_j): \mathrm{Inf}(do_i) \times \mathrm{IN}(ac_j) \times \mathrm{Context}_{jk} \to \{0, 1\}$ defined as:
– 1, if $\mathrm{Inf}(do_i)$ is judged by $ac_j$ to be informative with regard to $\mathrm{IN}(ac_j)$ in context $\mathrm{Context}_{jk}$;
– 0, otherwise

46 Digital Objects: Relevance
$\mathrm{Relevance}(do_i, q)$ =
– 1, if $do_i$ is judged by an external judge to be relevant to $q$
– 0, otherwise
Relevance estimate
– $\mathrm{Rel}(do_i, q) = \dfrac{\vec{do_i} \cdot \vec{q}}{|\vec{do_i}| \times |\vec{q}|}$
Objective, public, social notion
– Established by a general consensus in the field, not a subjective, private judgment by an actor with an information need
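The relevance estimate has the form of a cosine between a document vector and a query vector. The sketch below assumes sparse term-weight dictionaries as the vector representation (the slide does not say how the vectors are built); `relevance_estimate` is a hypothetical name.

```python
import math


def relevance_estimate(doc_vec, query_vec):
    """Cosine of the angle between two sparse term-weight vectors.

    Both arguments map terms to numeric weights (an assumed representation).
    Returns 0.0 when either vector has zero length.
    """
    shared_terms = set(doc_vec) & set(query_vec)
    dot = sum(doc_vec[t] * query_vec[t] for t in shared_terms)
    doc_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0
    return dot / (doc_norm * query_norm)
```

Vectors pointing in the same direction score 1, and vectors with no shared terms score 0, regardless of document length.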

47 Metadata Specifications and Metadata Format: Completeness
Refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
$\mathrm{Completeness}(ms_x) = 1 - \dfrac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
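The completeness measure can be computed per record once the schema's attribute list is known. A minimal sketch, assuming a metadata specification is a dictionary from attribute name to value and that empty strings, empty lists, and `None` count as missing (that treatment of empty values is an assumption):

```python
def completeness(metadata_spec, schema_attributes):
    """1 - (missing attributes / total attributes in the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    missing = [
        name for name in schema_attributes
        if metadata_spec.get(name) in (None, "", [])
    ]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical four-attribute schema and a record lacking a date.
schema = ["title", "creator", "date", "subject"]
record = {"title": "5S Framework", "creator": "Fox", "subject": "DL"}
print(completeness(record, schema))  # 0.75
```

Averaging this value over all records in a catalog gives a catalog-level figure.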

48 Metadata Specifications and Metadata Format: Completeness OCLC NDLTD Union catalog

49 Metadata Specifications and Metadata Format: Conformance
An attribute $att_{xy}$ of a metadata specification $ms_x$ is cardinally conformant to a metadata format/standard if:
– it appears at least once, if $att_{xy}$ is marked as mandatory;
– its value is from the domain defined for $att_{xy}$;
– it does not appear more than once, if it is not marked as repeatable.
$\mathrm{Conformance}(ms_x) = \dfrac{\sum_{att_{xy} \in ms_x} \text{degree of conformance of } att_{xy}}{\text{total attributes}}$
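The conformance rules can be sketched as a per-attribute check followed by an average. The sketch below assumes the degree of conformance of an attribute is binary (1 if all three rules hold, 0 otherwise), that a record maps each attribute name to a list of values, and that the schema is a dictionary of rule sets; all of these representations and names are hypothetical.

```python
def attribute_conforms(values, mandatory, repeatable, domain=None):
    """Check the three cardinal-conformance rules for one attribute."""
    if mandatory and len(values) < 1:
        return False  # mandatory attribute is absent
    if not repeatable and len(values) > 1:
        return False  # non-repeatable attribute appears more than once
    if domain is not None and any(v not in domain for v in values):
        return False  # a value falls outside the defined domain
    return True


def conformance(metadata_spec, schema):
    """Fraction of schema attributes that conform (binary degree assumed)."""
    conformant = sum(
        attribute_conforms(metadata_spec.get(name, []), **rules)
        for name, rules in schema.items()
    )
    return conformant / len(schema)


# Hypothetical two-attribute schema.
schema = {
    "title": {"mandatory": True, "repeatable": False},
    "language": {"mandatory": False, "repeatable": True, "domain": {"en", "pt"}},
}
record = {"title": ["5S"], "language": ["en", "xx"]}
print(conformance(record, schema))  # 0.5
```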

50 Metadata Specifications and Metadata Format: Conformance Based on ETD-MS

51 Services: Efficiency / Effectiveness
Effectiveness
– Very common measures: Precision, Recall, F1, 10-precision, R-Precision
– Other services may have different measures: e.g., Recommending, etc.
Efficiency
– Let $t(e)$ be the time of an event $e$
– Let $e_{ix}$ and $e_{fx}$ be the initial and the final event of service $se_x$.
– For service $se_x$, efficiency is defined as: $\mathrm{Efficiency}(se_x) = t(e_{fx}) - t(e_{ix})$
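Both kinds of measure are simple to compute from logged data. A minimal sketch using the standard set-based definitions of precision, recall, and F1, plus the time-difference definition of efficiency from the slide; the function names and the representation of results as sets of identifiers are assumptions.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based effectiveness measures for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def efficiency(t_initial, t_final):
    """Elapsed time between the initial and final event of a service."""
    return t_final - t_initial


p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r, f1)
print(efficiency(2.0, 3.5))  # 1.5
```

Note that, as defined, a smaller efficiency value means a faster service.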

52 Services: Extensibility & Reusability A service Y reuses a service X if the behavior of Y incorporates the behavior of X. A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.

53 Services: Extensibility & Reusability (2) Macro-Reusability(Serv) = no. of reused services/ total number of services Micro-Reusability(Serv) = number of lines of code of managers that implement (run) reused services/ total lines of code

54 Services: Extensibility and Reusability Macro-Reusability = 4/16 = 0.25 Micro-Reusability = 3630 / = 0.304
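The two reusability ratios are plain quotients. The sketch below reproduces the macro figure from the slide (4 reused services out of 16); the line counts passed to `micro_reusability` in the example are hypothetical values chosen for illustration, not the slide's data.

```python
def macro_reusability(reused_services, total_services):
    """Share of services that are reused by other services."""
    return reused_services / total_services


def micro_reusability(reused_loc, total_loc):
    """Share of manager code (lines) that implements reused services."""
    return reused_loc / total_loc


print(macro_reusability(4, 16))       # 0.25, as on the slide
print(micro_reusability(300, 1000))   # 0.3, hypothetical line counts
```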

55 Quality and the Information Life Cycle

56 Quality Model: Evaluation
Focus groups
– 3 librarians
– Major points
  Focus on DLs, not traditional libraries
  Some indicators may have more theoretical than practical use in some contexts
  Liked minimalist approach
  Interesting and potentially useful mainly for education and evaluation

Conclusions
We have answered the almost 40-year-old challenge of Licklider to build a unified CS / LIS theory by
– Proposing and formalizing the first comprehensive formal framework for digital libraries
We showed how to move from theory to practice by
– Applying the framework to the problems of
– Materializing these applications into languages, tools, formats, etc.
– Explaining and evaluating these applications (usability studies, focus groups, prototyping, etc.)

58 Future Work
Theory
– Apply to formally describe other systems
– Complete formal definitions of all services with further events
– Load axioms in knowledge base to automatically assess quality of models (correctness, etc.)
Applications/Tools
– Language
  Make different versions uniform
  Extend with METS, less complex scenario, society models
  New metamodels
  – Domain/application oriented (e.g., archaeology, education)
  – For traditional libraries

59 Future Work (cont’d)
Applications/Tools
– Visualization
  Integration with other tools
  – through Wizard
  New visualizations
  Applying as educational tool
– Generation
  Use of Web services
  Incorporation of native XML repositories
  Improvement of scenario algorithms
– Logging
  Promote use
  Consider privacy issues
  New actions
  Deal with scalability issues

60 Future Work (cont’d)
Quality
– Development of more usage-oriented measures
  Current measures are mostly system-oriented
  Focus on log format and evaluation
– Development of Quality ToolKit (5SQual) for DL managers with the following features:
  Mapping tool to map local log format to standard XML log format
  Components to implement all measures
  Visualization of data and measures
  Broken into several logical pieces to be used in the different phases of the information life cycle
Others, e.g., personalization
– Create theories, tools, languages, methods for personalization based on 5S

61 2. DL Integration
What is “DL Integration”?
– Hide distribution
– Hide heterogeneity
– Enable autonomy of individual components
Why integration?
– Island DLs
– Inability to seamlessly and transparently access knowledge across DLs
Utilize various autonomous DLs in concert

62 Integration: Rationale We can read any paper book (ignoring limitations of language, vision, …). Scholarship requires access, analysis, and synthesis spanning disciplines and sources. New theories, systems, and services build upon our past accomplishments. Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.

63 Integration: Urgency, Longevity If we collect, capture, acquire, or produce information, will it be usable in 100 years? NSF Digital Archiving Program Library of Congress National Digital Information Infrastructure and Preservation Program

64 Integration: Standards Standards don’t exist in many areas. Standards that do exist create a jumble: –Conversion between (without loss?) –Bridging gaps (Z > OAI) –Managing legacy content and systems Standards in DLs have focused on: –Metadata (e.g., Dublin Core) –Architecture (e.g., handles, repositories)

65 Integration: Challenges “Semantic Web” is vision, not reality. How can we integrate without a theory? How can we interoperate without a common framework? How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?

66 Hypothesis and Research Questions
The 5S framework provides effective solutions to DL integration.
– Formally define the DL integration problem?
– Guide integration of domain focused DLs?
  How to formally model such domain specific DLs?
  How to integrate formally defined DL models into a union DL model?
  How to use the union DL model to help design and implement high quality integrated DLs?
– Assess the integration?

67 Related Work
[Concept map: DL interoperability approaches are intermediary-based or mapping-based, and the two are interrelated. Intermediary-based approaches consist of mediator, wrapper, and agent, and use two architectures: federation and union archiving. Mapping-based approaches use schema mapping and consist of hybrid mappers and composite mappers; SemInt and LSD are given as examples.]

68 [Concept map: DL interoperability approaches are intermediary-based or mapping-based, and the two are interrelated. Intermediary-based approaches consist of mediator, wrapper, and agent, and use two architectures: federation and union archiving. Mapping-based approaches use schema mapping and consist of hybrid mappers and composite mappers. Added labels: GA (“trained by”) and DL integration formalization (“based on”).]

69 Formal Definition of DL Integration
$DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
– $R_i$ is a network accessible repository
– $DM_i$ is a set of metadata catalogs for all collections
– $Serv_i$ is a set of services
– $Soc_i$ is a society
UnionRep, UnionCat, UnionServices, UnionSociety

70 Formal Definition of DL Integration (Cont.) DL integration problem definition: Given n individual libraries, integrate the n DLs to create a UnionDL.
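The 4-tuple view of a DL suggests a very small data model. The sketch below treats each component as a set and forms the union DL by plain set union. That is a deliberate simplification and an assumption: the formal definitions of UnionRep, UnionCat, UnionServices, and UnionSociety are not reproduced in this transcript, and real integration involves schema mapping and harvesting rather than naive merging. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DL:
    repository: frozenset  # R_i: identifiers of stored digital objects
    catalogs: frozenset    # DM_i: metadata catalog entries
    services: frozenset    # Serv_i: service names
    society: frozenset     # Soc_i: actor or community names


def union_dl(libraries):
    """Combine n DLs component-wise (naive set union, an assumption)."""
    return DL(
        repository=frozenset().union(*(dl.repository for dl in libraries)),
        catalogs=frozenset().union(*(dl.catalogs for dl in libraries)),
        services=frozenset().union(*(dl.services for dl in libraries)),
        society=frozenset().union(*(dl.society for dl in libraries)),
    )


dl1 = DL(frozenset({"do1"}), frozenset({"cat1"}),
         frozenset({"searching"}), frozenset({"archaeologists"}))
dl2 = DL(frozenset({"do2"}), frozenset({"cat2"}),
         frozenset({"browsing"}), frozenset({"general public"}))
union = union_dl([dl1, dl2])
print(sorted(union.services))  # ['browsing', 'searching']
```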

71 Architecture of a Union DL
[Figure: DL1 (Repository1, Catalog1, Searching service, archaeologists society) and DL2 (Repository2, Catalog2, Browsing service, general public society) are combined into a Union DL with a Union Repository, a Union Catalog, a Union Society (archaeologists, general public), and a Union Service (Harvesting, Mapping, Searching, Browsing, Clustering, Visualization).]

72 Example of Union Service: CitiViz

73 Integration of Domain Focused DLs Union archaeological metadata catalog generation Modeling archaeological DLs (ArchDLs) in the 5S framework ArchDL integration case study: ETANA-DL

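74 Union Catalog Integration
[Figure: Virtual Nimrin (VN) and Halif DigMaster (HD) each have a local metadata format and catalog (VN Metadata Format, VN Catalog; HD Metadata Format, HD Catalog). For each, a mapping tool and a wrapper relate the local format to the Global Metadata Format, feeding the Union Catalog of the Union ArchDL.]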

75 Modeling ArchDLs in the 5S Framework Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services Minimal DL Minimal ArchDL

76 A Minimal DL in the 5S Framework
[Figure: the 5Ss (Streams, Structures, Spaces, Scenarios, Societies) underlie the concepts of a minimal DL. Labels shown: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, hypertext, Digital Object, Collection, Metadata Catalog, Repository, services (indexing, browsing, searching), Minimal DL.]

77 A Minimal ArchDL in the 5S Framework
[Figure: the 5Ss (Streams, Structures, Spaces, Scenarios, Societies) underlie the concepts of a minimal ArchDL. Labels shown: Structured Stream, Descriptive Metadata specification, hypertext, services (indexing, browsing, searching), SpaTemOrg, StraDia, Arch Descriptive Metadata specification, ArchObj, ArchDO, ArchColl, ArchDColl, Arch Metadata catalog, ArchDR, Minimal ArchDL.]

78 Integration of Domain Focused DLs Modeling archaeological DLs (ArchDLs) in the 5S framework Union archaeological metadata catalog generation ArchDL integration case study: ETANA-DL

79 ETANA-DL
Archaeological DL
Integrated DL
– Heterogeneous data handling
Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata Harvesting
Design considerations
– Componentized
– Extensible
– Portable
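Since the system builds on OAI-PMH, it may help to see what a harvesting request looks like. OAI-PMH requests are ordinary HTTP requests whose query string carries a `verb` and its arguments; the sketch below builds a `ListRecords` URL with the standard `oai_dc` metadata prefix. The base URL is a placeholder, and `list_records_url` is a hypothetical helper, not ETANA-DL code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    `base_url` is the data provider's OAI endpoint (placeholder in the
    example below). `set` and `from` are optional selective-harvesting
    arguments defined by the protocol.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("http://example.org/oai"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the list is exhausted.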

80 5SSuite
[Figure: 5SSuite workflow. Phases: Requirements (1), Analysis (2), Design (3), Implementation (4). Tools: 5SGraph, 5SGen (5SLGen), Mapping Tool. Labels shown: 5S Meta Model, DL Expert, DL Designer, 5SL DL Model, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, Practitioner, Researcher, Teacher.]

81 [Figure: applying the 5SSuite tools to ETANA-DL. 5SGraph: 5S Archaeology MetaModel, ArchDL Expert, ArchDL Designer, Structure Sub-model, Scenario Sub-model (ETANA-DL union services descriptions: Harvesting, Mapping, Searching, Browsing, …). Mapping Tool: VN Metadata Format, HD Metadata Format, ETANA-DL Metadata Format, Wrapper4VN, Wrapper4HD, VN Catalog, HD Catalog, Union Catalog. 5SGen: Component Pool (Browsing, …), XOAI, Index, Inverted Files, Services DB, Browse DB, Search Service, Browse Service, Other ETANA-DL Services, Web Interface.]

82 ETANA-DL Architecture
[Figure: layers labeled Users, Services, and Data. Labels shown: ETANA-DL Union Services, Users, DigBase, DigKit.]

83 ETANA-DL Architecture
[Figure: DigBase and DigKit. Site databases (Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, new sites) connect through database wrappers to the ETANA-DL union catalog. The user interface offers Search, Browse, Recommend, Note, Personalize, Review, Visualizations, and archaeology-specific services. Work in progress …]

84 Assessment of Integrated DL Union catalog quality measurement Union service quality measurement Initial example

85 Union Catalog Quality Measurement
Complete
– All the catalogs to be integrated are complete.
Consistent
– All the catalogs to be integrated are consistent.
– Each descriptive metadata specification in the union catalog describes only one digital object.

86 Union Catalog Quality Measurement (Cont.) Mapping-Completeness n is the total number of local schemas

87 Union Services Quality Measurement Internal quality measurement –Composability: reusability and extensibility External quality measurement –Searching: coverage q = –Browsing: knowledge-gain browse =

88 Browsing: knowledge-gain_browse
[Figure: browsing structures of ArchDL1, ArchDL2, and UnionArchDL. Labels shown: Sites, Site1, Site2, Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact, Artifacts, Bone, *BoneName.]
Path(ArchDL1) = 6
Path(ArchDL2) = 2
Path(UnionArchDL) = (6+6+2) + 4*6*2 = 62
Knowledge-gain_browse =

3. DL Overview
Why of Global Interest?
National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
Knowledge and information are essential to economic and technological growth, education
DL: a domain for international collaboration
– wherein all can contribute and benefit
– which leverages investment in networking
– which provides useful content on Internet & WWW
– which will tie nations and peoples together more strongly and through deeper understanding

90 Libraries of the Future
J.C.R. Licklider, 1965, MIT Press
World, Nation, State, City, Community

91 Synchronous Scholarly Communication Same time, Same or different place

92 Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place

93 Digital Libraries Shorten the Chain from
Editor, Publisher, A&I, Consolidator, Library, Reviewer

94 DLs Shorten the Chain to
Author, Reader, Digital Library, Editor, Reviewer, Teacher, Learner, Librarian

Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axes labeled Computing (flops), Digital content, and Communications (bandwidth, connectivity), each running from less to more. Digital Libraries technology trajectory: intellectual access to globally distributed information.]
Note: we should consider 4 dimensions: computing, communications, content, and community (people)

97 AmericanSouth.Org – Roles, Content
SOLINET | Libraries (Data Providers) | Scholars
Intellectual Organization Controlled vocabulary Metadata extension development Collection Decisions Selection Criteria Controlled vocabulary
Central Server Maintenance | Local Server Maintenance | Provision of Context
Metadata Repository | Metadata Creation/Maintenance | Organizational Structure and Annotation Tools
Central Interface Design/Maintenance | Local Interface Design/Maintenance | Selection of Other Annotation Tools
Central Indices Creation/Maintenance | Local Indices | Selection of Thesauri
Coordination of Metadata Gateway Development | Gateway Implementation | Concept Mapping
Digital Objects

98 Columns: Content Area Description | Audio | Digital | Finding Aid | MSS | Other | Photo | Video | MF | Print | Total
Rows: African-American cultural life; Agricultural crisis of late 19th century; Codification of segregation laws; Configuration of white supremacy; Cultural values and activities; Disenfranchising movements; Educational movements; Emergence of Holiness & Pentecostal Groups; Emergence of new musical forms; Emergence of organized groups expressing farmers' concerns; …
Final row: Total Each Format

99 Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Reagan Moore, Ed Fox, June 2002, for NSF

104 As data, information, and knowledge play increasingly central roles … digital library research should focus on: Increasing the scope and scale of information resources and services; Employing context at the individual, community, and societal levels to improve performance; Developing algorithms and strategies for transforming data into actionable information; Demonstrating the integration of information spaces into everyday life; and Improving availability, accessibility, and, thereby, productivity.

105 An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions: Acquisition of new information resources; Effective access mechanisms that span media type, mode, and language; Facilities to leverage the utilization of humankind’s knowledge resources; Assured stewardship over humanity’s scholarly and cultural legacy; and Efficient and accountable management of systems, services, and resources.

OAI, OCKHAM, CSTC, NSDL, NDLTD: Open Archives Initiative Advocacy for interoperability Standard for transferring metadata among digital libraries –Protocol for Metadata Harvesting (PMH) Simplicity Generality Extensibility Support for PMH => Open Archive (OA)
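To make the simplicity of the Protocol for Metadata Harvesting concrete, the sketch below builds request URLs of the kind a harvester sends to an Open Archive. The six verbs and the `metadataPrefix`, `set`, and `from` arguments come from OAI-PMH 2.0; the base URL, the set name `etd`, and the helper `build_oai_request` are assumptions made only for illustration. No network call is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def build_oai_request(base_url, verb, **params):
    """Return the URL of an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # 'from' is a Python keyword, so callers pass from_ instead.
    if "from_" in params:
        params["from"] = params.pop("from_")
    query = {"verb": verb}
    query.update(sorted(params.items()))
    return f"{base_url}?{urlencode(query)}"

# Hypothetical repository address and set name, used only for illustration.
BASE_URL = "http://example.org/oai"
url = build_oai_request(BASE_URL, "ListRecords",
                        metadataPrefix="oai_dc", set="etd")
print(url)
```

A service provider would issue such a request over HTTP and parse the XML response; large result lists are continued with the `resumptionToken` argument.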

107 OAI = Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums

108 OAI – Repository Perspective Required: Protocol DO MDO

109 OAI – Black Box Perspective OA 1, OA 2, OA 4, OA 3, OA 5, OA 6, OA 7

110 Tiered Model of Interoperability Mediator services Metadata harvesting Document models

111 The World According to OAI Service Providers: Discovery, Current Awareness, Preservation Metadata harvesting Data Providers

112 LOCKSS Lots of Copies Keep Stuff Safe Stanford (Vicky Reich) Initial focus on lower levels Initial content: journals Emory (Martin Halbert) –Help deploy and adapt –Help apply in other contexts Another registry Set of publisher manifests (information providers) Set of storage systems (archival storage) –NDIIPP: AmericanSouth, MetaArchive

113 OCKHAM Library Network

114 OCKHAM Simplicity (à la Occam’s razor) Support by Mellon and DLF Four main ideas: 1. Components 2. Lightweight protocols 3. Open reference models (e.g., 5S, OAIS) 4. Community perspective and involvement Funded by NSF in NSDL, with P2P

115 Lightweight Protocols “Lightweight”, or relatively small and simple, protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive. The success of protocols considered lightweight is illuminating. Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

116 Reference Models Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement

117 OCKHAM Proposed Services Alerting Browsing Cataloging Conversion OAI – Z39.50 Pathfinding Registry (plus others such as from adapted ODL)

CS -> CSTC -> CRIM NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - College of NJ, U. Ill. Springfield, Virginia Tech Focus initially on labs, visualization, multimedia Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: (with curricular guidelines also under development)

CS Teaching Center (CSTC) Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units. Learners benefit from having well-crafted modules that have been reviewed and tested. Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built. ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from

121 Browsing (1)

122 Browsing (2)

126 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … Submission & Collection: sub/partner collections 

Led by Virginia Tech, with co-PIs: –Fox (director, DL systems) –Lee (history) –Perez (user interface, Spanish support) Partners –College of New Jersey (Knox) –Hofstra (Impagliazzo) –Villanova (Cassel) –Penn State (Giles)

128 Multi-dimensional Categorization

129 Overview of CITIDEL architecture

130 Distributed repository structure

131 Digital library architecture for local and interoperable CITIDEL services

132 CITIDEL: Computing & Information Technology Interactive Digital Education Library

138 CITIDEL Technology Features Component architecture (Open Digital Library) Re-use and compose re-deployable digital library components. Built Using Open Standards & Technologies OAI: Used to collect DL Resources and DL Interoperability XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …) Perl: Component Integration ESSEX: Search Engine Functionality Very fast, utilizing in-memory processing Includes snap-shots for persistence Multi-scheming Integrates multiple classifications / views through maps, closure

140 Cluster Search Results from CITIDEL

141 Cluster NDLTD-Computing

142 CITIDEL + PIPE Adds interaction personalization to CITIDEL Automatically handles multi-modal conversion to cell phone, PDA, etc. Can be adapted to any digital data set; requires only an XML file of the content with its hierarchy maintained.

CITIDEL -> NSDL A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL National Science Digital Library (Next slides courtesy Lee Zia, NSF)

144 Connects: Users: students, educators, life-long learners Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories;... Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate;...

145 Supports: Users Content Tools (profiles) (metadata) (protocols) Learning communities Customizable collections Application services

146 Enables: Environments for Communication Collaboration Creation Validation Evaluation Recognition... Discovery Stability Reliability Reusability Interoperability Customizability... of Resources AND

147 NSDL Program Tracks Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form Targeted (Applied) Research: have immediate impact on one or more of the other three tracks Pathways: large efforts across broad ranges of areas or approaches or users

148 Collections Discovery of content Classification and cataloguing Acquisition and/or linking; referencing Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged Access to massive real-time or archived datasets Software tool suites for analysis, modeling, simulation, or visualization Reviewed commentary on learning materials and pedagogy

149 Services Help services, frequently asked questions, etc. Synchronous/asynchronous collaborative learning environments using shared resources Mechanisms for building personal annotated digital information spaces Reliability testing for applets or other digital learning objects Audio, image, and video search capability Metadata system translation Community feedback mechanisms

153 NSDL Information Architecture Essentially as developed by the Technical Infrastructure Workgroup [Figure labels: User Interfaces (Portals & Clients); Usage Enhancement (CI Services: annotation, discussion, personalization, authentication, browsing; Other NSDL Services); Collection Building (Core Services: information retrieval, metadata gathering; Core Collection-Building Services: harvesting, protocols); NSDL Collections; Special Databases; referenced items & collections; Core NSDL “Bus”]

A Digital Library Case Study Domain: graduate education, research Genre: ETDs = electronic theses & dissertations Submission: Collection: Project: Networked Digital Library of Theses & Dissertations (NDLTD)

NDLTD: How can a university get involved? Select planning/implementation team –Graduate School –Library –Computing / Information Technology –Institutional Research / Educ. Tech. Join online, give us contact names – Adapt Virginia Tech or other proven approach –Build interest and consensus –Start trial / allow optional submission

Student Gets Committee Signatures and Submits ETD Signed Grad School

Library Catalogs ETD, Access is Opened to the New Research WWW NDLTD

159 ETD Union Collection (OAI)

160 Union catalog: OCLC OCLC will expand OAI data provider on TDs. Is getting data from WorldCat (so, from many sites!). Will harvest from all others who contact them. Need DC and either ETD-MS or MARC. Has a set for ETDs.

164 OCLC SRU Interface

165 Union catalog: VTLS, VT VTLS will enhance search/browse service for ETDs –Will harvest from OCLC’s set of ETD records –Will receive through other mechanisms –Will work with MARC-21 and ETD-MS VT will continue to offer experimental services

167 ETD Union Search Mirror Site in China (CALIS) ( – popular site!)

169 VTLS Union Catalog Content Languages The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English German Greek Korean Portuguese Spanish Examples follow

170 Language = German; hits = 137

171 Full record display

174 Complex to Simple: MARC ($50) to Dublin Core (DC) + thesis

176 Why ETD? Short Answer For Students: –Gain knowledge and skills for the Information Age –Richer communication (digital information, multimedia, …) For Universities: –Easy way to enter the digital library field and benefit thereby For the World: –Global digital library – large, useful, many services General: –Save time and money –Increased visibility for all associated with research results

Open Source, Repositories, DigArch, ODL Open Source DL Examples Eprints ( Fedora Greenstone ( Many systems in NSF DLI projects VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD

185 What is a Digital Object Repository? –Also called: digital rep., digital asset rep., institutional repository –Stores and maintains digital objects (assets) –Provides external interface for Digital Objects –Creation, Modification, Access –Enforces access policies –Provides for content type disseminations Adapted from Slide by V. Chachra, VTLS

186 Goals of Institutional Repositories (by Stevan Harnad, U. Southampton) –Self Archiving of Institutional Research –Theses and Dissertations (VTLS NDLTD Project) –Article preprints and post prints –Internal documents and maps –Management of digital collections –Preservation of materials – decentralized approach –Housing of teaching materials –Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects Adapted from Slide by V. Chachra, VTLS

187 Fedora™ Digital Object Architecture –Persistent ID (PID): globally unique persistent id –Disseminators (public view): access methods for obtaining “disseminations” of digital object content –System Metadata (internal view): metadata necessary to manage the object –Datastreams (protected view): content that makes up the “basis” of the object: EAD, TEI, DC, MARC, VRA Core, MIX, etc.; images, e-books, e-journals, music, video, etc. The Mellon Fedora Project Adapted from Slide by V. Chachra, VTLS
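The four parts of the digital object named on this slide can be pictured as a small data structure. The sketch below is an illustrative model only and is not Fedora's actual API or storage format: the class names, field names, and the sample PID `demo:etd-1` are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Datastream:
    """Protected view: one unit of content or metadata held by the object."""
    ds_id: str
    mime_type: str
    location: str  # where the bytes live (hypothetical field)

@dataclass
class DigitalObject:
    pid: str                                               # persistent id
    system_metadata: dict = field(default_factory=dict)    # internal view
    datastreams: dict = field(default_factory=dict)        # protected view
    disseminators: dict = field(default_factory=dict)      # public view

    def add_datastream(self, ds):
        self.datastreams[ds.ds_id] = ds

    def add_disseminator(self, name, method):
        """Register an access method that produces a dissemination."""
        self.disseminators[name] = method

    def disseminate(self, name):
        """Run a registered access method against this object."""
        return self.disseminators[name](self)

# Hypothetical ETD object with a Dublin Core record and a PDF.
etd = DigitalObject(pid="demo:etd-1")
etd.add_datastream(Datastream("DC", "text/xml", "dc.xml"))
etd.add_datastream(Datastream("PDF", "application/pdf", "thesis.pdf"))
etd.add_disseminator(
    "list_formats",
    lambda obj: sorted(d.mime_type for d in obj.datastreams.values()),
)
print(etd.disseminate("list_formats"))
```

Clients in this model see only what the disseminators return, which mirrors the slide's split between a public view and the protected datastreams.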

188 Fedora™ Repository Web Service Exposure Layer Adapted from Slide by V. Chachra, VTLS

196 Digitization and Preservation Community and Activity (selected) Archivists worldwide International collaboration –Million book project in US, China, India (Reddy, Chen, Balakrishnan) US Library of Congress –Matching funds –American Memory –Infrastructure: NDIIPP Dutch National Library + IBM Associations: ARL, DLF People –Harnad: Self-archiving movement –Lorie: Universal virtual computer –Gladney: technology, philosophy ( –Besser, Trant, …

197 DigArch Complexities: Document Models, Representations, and Accesses Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only Multilingual: content, summary, metadata Structured: MARC; SGML, HTML, XML Distributed collection: Kleisli, CIMI, Z39.50 Federated search: collecting, picking site(s), parallel search / fall-back, fusing results Access: IPR, payment, security, scenarios
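The federated search steps listed above (picking sites, parallel search with fall-back, fusing results) can be sketched as follows. This is a minimal illustration under stated assumptions: each site is a plain callable returning (document id, score) pairs, scores are assumed to be comparable across sites, and fusion simply keeps the best score per document. The site functions and ids are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def federated_search(query, sites, fallback=None):
    """Search all sites in parallel and fuse the hits.

    sites:    dict of site name -> callable(query) -> [(doc_id, score), ...]
    fallback: optional dict of site name -> alternative callable
    """
    fallback = fallback or {}

    def safe_call(item):
        name, search = item
        try:
            return search(query)
        except Exception:
            # Fall back to an alternative source for this site, if any.
            alt = fallback.get(name)
            return alt(query) if alt else []

    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        per_site = list(pool.map(safe_call, sites.items()))

    # Fuse: keep the highest score seen for each document id.
    fused = {}
    for hits in per_site:
        for doc_id, score in hits:
            fused[doc_id] = max(score, fused.get(doc_id, 0.0))
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sites: site_b is down, so its mirror answers instead.
def site_a(query):
    return [("d1", 0.9), ("d2", 0.4)]

def site_b(query):
    raise ConnectionError("site unavailable")

def mirror_b(query):
    return [("d2", 0.7), ("d3", 0.5)]

results = federated_search("etd", {"a": site_a, "b": site_b},
                           fallback={"b": mirror_b})
print(results)
```

Real systems would normalize scores before fusing, since ranking functions differ from site to site.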

198 DigArch Complexities: Multimedia Multiple media types, representations –Self-describing (structures), provenance Text, audio, image, video, graphics, animation Capture, digitization, standards, interchange Compression, content-based retrieval Playback (Real time), QoS, rendering –Popularity (e.g., PowerPoint) vs. longevity (SMIL?) JPEG, MPEG (and versions)

ODL: Open Digital Library [Figure: users on one side, digital objects (programs, documents, images, videos) on the other, with a “?” between them]

200 Monolithic and/or Custom-built web-based application [Figure: a single digital library sits between users and the digital objects (programs, documents, images, videos)]

componentized digital library [Figure: the digital library split into components, with “?” marking the undefined connections among them and to the digital objects]

open digital library [Figure: components and digital objects (programs, documents, images, videos) connected as Open Archives (OA) through PMH and XPMH]

203 Open Digital Library Protocol Extended OAI-PMH Protocol for Metadata Harvesting

204 Open Digital Library Component Extended OPEN ARCHIVE OPEN ARCHIVE

205 Open Digital Library Deployments NDLTD ( Computer Science Teaching Center ( Computing and Information Technology Interactive Digital Educational Library ( Open Archives Distributed (NSF, DFG) – enhancements to PhysNet OCKHAM Open to others through DL-in-a-box

206 Open Digital Library Network of Extended Open Archives where each node acts as either a provider of data, services or both. Component = Node Protocol = Arc
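The "Component = Node, Protocol = Arc" view can be modeled as a small directed graph. The sketch below is an assumed illustration, not the ODL software itself: the class `OdlNetwork`, its methods, and the sample wiring of two ETD archives, a Union component, and a Search component are hypothetical.

```python
from collections import defaultdict

class OdlNetwork:
    """Nodes are components; an arc means 'consumer queries provider'."""

    ALLOWED_ROLES = {"data", "service"}

    def __init__(self):
        self.roles = {}                # node name -> set of roles
        self.arcs = defaultdict(set)   # consumer -> providers it queries

    def add_component(self, name, roles):
        roles = set(roles)
        if not roles or not roles <= self.ALLOWED_ROLES:
            raise ValueError(f"roles must be drawn from {self.ALLOWED_ROLES}")
        self.roles[name] = roles

    def connect(self, consumer, provider):
        """Add a protocol arc from consumer to provider."""
        for node in (consumer, provider):
            if node not in self.roles:
                raise KeyError(node)
        self.arcs[consumer].add(provider)

    def sources_for(self, name):
        """Return every data-providing node reachable from name."""
        seen, stack, found = set(), [name], set()
        while stack:
            node = stack.pop()
            for provider in self.arcs.get(node, ()):
                if provider in seen:
                    continue
                seen.add(provider)
                if "data" in self.roles[provider]:
                    found.add(provider)
                stack.append(provider)
        return found

# Hypothetical wiring: two archives feed a union, which feeds search.
net = OdlNetwork()
net.add_component("ETD-1", {"data"})
net.add_component("ETD-2", {"data"})
net.add_component("Union", {"data", "service"})  # harvests and re-exposes
net.add_component("Search", {"service"})
net.connect("Union", "ETD-1")
net.connect("Union", "ETD-2")
net.connect("Search", "Union")
print(sorted(net.sources_for("Search")))
```

Because every arc uses the same family of protocols, a component such as Union can be both a consumer of archives and a provider to other services.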

207 Open Digital Library Components Running now –XML-File (data provider from file system) –Search: simple or in-memory (Essex) or generalized –Union, browse, recent, filter –E-journal/review, Submit, Edit, Annotation –Recommender, Rating; Mirroring (see JCDL’02) –Working with NCSA: from DB, unstructured text Others in process –Classification/categorization –Registry (and other connections with web services)

Example Open Digital Library: ETD DL for the Networked Digital Library of Theses and Dissertations ( [Figure: ETD collections (ETD programs, documents, images, videos) exposed via PMH; components Union, Filter, Search, Browse, Recent connected through ODLUnion, ODLSearch, ODLBrowse, ODLRecent; USER INTERFACE for students and researchers]

209 OAI, ODL, DL-in-a-box Open Archives Initiative –since 1999, Open Digital Libraries –since 2001, from –with Hussein Suleman (now U. Cape Town) DL-in-a-box –NSDL support since 2001 –Aimed to help new collections / services projects –

224 Outline 1. 5S Framework for DL 1.1. Motivation: the problem 1.2. Theory 1.3. Tools/Applications 1.4. Quality 1.5. Conclusions, Future Work 2. DL Integration 3. DL Overview 4. OAI, OCKHAM, CSTC, NSDL, NDLTD 5. Open Source, Repositories, DigArch, ODL