Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, Digital Library Research Laboratory, Dept. of CS Virginia Tech,

Similar presentations


Presentation on theme: "1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, Digital Library Research Laboratory, Dept. of CS Virginia Tech,"— Presentation transcript:

1 1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, fox@vt.edu Digital Library Research Laboratory, Dept. of CS Virginia Tech, Blacksburg, VA 24061 USA http://fox.cs.vt.edu/talks/2004/ http://fox.cs.vt.edu/cv.htm

2 Acknowledgements (Selected) Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR- 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

3 3 Acknowledgements: Faculty, Staff Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

4 4 Acknowledgements: Students Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Fernando Das Neves, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

5 5 For More Information Magazine: www.dlib.org Books: http://fox.cs.vt.edu/DLSB.html (1994) –MIT Press: Arms, plus by Borgman, Licklider (1965) –Morgan Kaufmann: Witten... (several), Lesk (2 nd edition) Conferences –ECDL: www.ecdl2005.org –ICADL: http://icadl2004.sjtu.edu.cn –JCDL: www.jcdl2005.org Associations –ASIS&T DL SIG –IEEE TCDL: www.ieee-tcdl.org (student awards, consortium) NSF: www.dli2.nsf.gov Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/

6 6

7 7 Outline 1. 5S Framework for DL 1.1. Motivation: the problem 1.2. Theory 1.3. Tools/Applications 1.4. Quality 1.5. Conclusions, Future Work 2. DL Integration 3. DL Overview 4. OAI, OCKHAM, CSTC, NSDL, NDLTD 5. Open Source, Repositories, DigArch, ODL

8 8 Outline 1. 5S Framework for DL 1.1. Motivation: the problem –Hypotheses and research questions 1.2. Theory –5S: introduction, formal definitions –The formal ontology 1.3. Tools/Applications –Language –Visualization –Generation –Logging 1.4. Quality 1.5. Conclusions, Future Work

9 9 1.1. Motivation Digital Libraries (DLs): what are they?? –No definitional consensus –Conflicting views –Makes interoperability a hard problem DLs are not benefiting from formal theories as are other CS fields: DB, IR, PL, etc. DL construction: difficult, ad-hoc, lack of support for tailoring/customization Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. –Lack of specific DL models, formalisms, languages

10 10 Hypotheses A formal theory for DLs can be built based on 5S. The formalization can serve as a basis for modeling and building high- quality DLs.

11 11 Research Questions 1. Can we formally elaborate 5S? 2. How can we use 5S to formally describe digital libraries? 3. What are the fundamental relationships among the Ss and high-level DL concepts? 4. How can we allow digital librarians to easily express those relationships? 5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties? 6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?

12 12 1.2. Informal 5S Definitions DLs are complex systems that help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) present info in usable ways (spaces) communicate info with users (streams)

13 13 5Ss SsExamplesObjectives Streams Text; video; audio; image Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data Structures Collection; catalog; hypertext; document; metadata Specifies organizational aspects of the DL content Spaces Measure; measurable, topological, vector, probabilistic Defines logical and presentational views of several DL components Scenarios Searching, browsing, recommending Details the behavior of DL services Societies Service managers, learners, teachers, etc. Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

14 14 Digital Objects (DOs) Born digital Digitized version of “real” object –Is the DO version the same, better, or worse? –Decision for ETDs: structured + rendered Surrogate for “real” object –Not covered explicitly in metamodel for a minimal DL –Crucial in metamodel for archaeology DL

15 15 Metadata Objects (MDOs) MARC Dublin Core RDF IMS OAI (Open Archives Initiative) Crosswalks, mappings Ontologies Topics maps, concept maps

16 16 Other Key Definitions –coll, catalog, repository, service, archive, (minimal) DL –See Gon ç alves et al. in April 2004 ACM Transactions on Information Systems (TOIS)

17 17 5S and DL formal definitions and compositions (April 2004 TOIS)

18 18 Glossary: Concepts in the Minimal DL and Representing Symbols

19 19 5S Static / Passive Dynamic / Active

20 20 Digital Library Formal Ontology

21 21 Ontology: Applications Expand definition of minimal DL by characterizing –typical DL services –in the context of “employs” and “produces” relationships Use characterization to: –Reason about how DL services can be built from other DL components –As well as be composed with other services through extension or reuse

22 22 Ontology: Applications

23 23 Ontology: Taxonomy of Services Browsing Collaborating Customizing Filtering Providing access Recommending Requesting Searching Visualizing Annotating Classifying Clustering Evaluating Extracting Indexing Measuring Publicizing Rating Reviewing (peer) Surveying Translating (language) Conserving Converting Copying/Replicating Emulating Renewing Translating (format) Acquiring Cataloging Crawling (focused) Describing Digitizing Federating Harvesting Purchasing Submitting PreservationalCreational Add Value Repository-Building Information Satisfaction Services Infrastructure Services

24 24 Composition of key fundamental / infrastructure services

25 25 Composition of additional services

26 26 Approach

27 27 1.3. Tools/Applications

28 28 5SL: a DL design language Domain specific languages –Address a particular class of problems by offering specific abstractions and notations for the domain at hand –Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping. XML-based realization of 5S –Interoperability –Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)

29 29 5SL – The Minimal DL Metamodel

30 30 <stream value=`ETDText'> <stream value=`ETDAudio'>... %XMLSchema% Example of Document declaration in the Structures Model <Attribute name='name‘ type='String'/> <Attribute name='ID‘ type='Integer'/> Converting Reviewing Cataloguing ……… Example of Actors declaration in the Societies Model Simple scenario for an NDLTD site searching service Patron InterfaceManager collection query InterfaceManager SearchManager collection query SearchManager InterfaceManager WtdSet …. Example of Service declaration in the Scenario Model

31 31 Help users model their own instances of a digital library (DL) in the 5S language (5SL). A simple modeling process which enables rapid generation of digital libraries Features –5SGraph loads and displays a metamodel in a structured toolbox. –The structured editor of 5SGraph provides a top- down visual building environment for the DL designer. –5SGraph produces syntactically correct 5SL files according to the visual model built by the designer. 5SGraph: A DL Modeling Tool

32 32 Overview of 5SGraph Workspace (instance model) Structured toolbox (metamodel)

33 33 5SGraph: Other Key Features Flexible and extensible architecture Reuse of models –Load, save, and change common (sub-) models Synchronization of views Enforcing of semantic constraints

34 34 5SGraph Evaluation: Usability Study

35 35 5SGen Version 1 -- MARIAN as the target system –Focused on rich structures: semantic networks –Behavior attached to nodes/links Version 2 -- Shifted for later work to componentized (ODL) approach –Focused on scenarios/societies –Structures/Spaces encapsulated within components (e.g., relational tables, indexes) –Only textual streams supported

36 36 5SLGen – Version 2: ODL, Services, Scenarios

37 37 5SLGen Proof of Concept: prototyping –CITIDEL –Viaduct –NDLTD Union Catalog –BDBComp

38 38 XML-based DL Log Standard Log analysis –is a source of information on: How patrons really use DL services How systems behave while supporting user information seeking activities Used to: –Evaluate and enhance services –Guide allocation of resources Common practice in the web setting –Supported by web servers, proxy caches DL Logging can be more detailed

39 39 DL Logging Features Captures high level user and system behaviors Organized according to the 5S framework –Hierarchical organization (XML-based) –Centered on the notions of events Record only events related to initial user inputs and final system outputs Help to understand user interactions and the perceived value of responses

40 40 The XML Log Format Log SessionIdMachineInfo StatementTransactionTimestamp SessionInfoRegisterInfo StatementEventTimestamp Action SearchBrowse StoreSysInfoUpdate SearchBy QueryString CatalogCollection PresentationInfo StatusInfo Timeout

41 41 1.4. Describing Quality in Digital Libraries What’s a “good” digital Library? –Central Concept: Quality! –Hypotheses of this work: Formal theory can help to define “what’s a good digital library” by: New formalizations of quality indicators for DLs within our 5S framework Contextualizing these measures within the Information Life Cycle

42 42 Quality Dimensions

43 43 Digital Objects: Accessibility A digital object is accessible by an DL actor or patron, if 1.it exists in the DL collections 2.is retrievable from the repository 3.it is not restricted from access –by metadata on rights –For actor or actor’s society

44 44 Digital Objects: Pertinence Inf(do i ) = information carried by a digital object or any of its descriptions IN(ac j ) = information need of an actor Context jk = an amalgam of societal factors which can impact the judgment of pertinence by ac j at time k. –Factors include time, place, the actor's history of interaction, task in hand, and factors implicit in the interaction and ambient environment.

45 45 Digital Objects: Pertinence The pertinence of a digital object to a user ac j is an indicator function Pertinence(do i, ac j ): Inf(do i )  IN(ac j )  Context jk defined as: –1, if Inf(do i ) is judged by ac j to be informative with regards to IN(ac j ) in context Context jk ; –0, otherwise

46 46 Digital Objects: Relevance Relevance (do i,q) 1, if do i is judge by external-judge to be relevant to q 0, otherwise Relevance Estimate –Rel(do i,q) = do i   dj  / |do i  |  |q  | Objective, public, social notion –Established by a general consensus in the field, not subjective, private judgment by an actor with an information need

47 47 Metadata Specifications and Metadata Format: Completeness Refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not. Completeness(ms x ) = 1 - (no. of missing attributes in ms x / total attributes of the schema to which ms x conforms)

48 48 Metadata Specifications and Metadata Format: Completeness OCLC NDLTD Union catalog

49 49 Metadata Specifications and Metadata Format: Conformance An attribute att xy of a metadata specification ms x is cardinally conformant to a metadata format/standard if: –it appears at least once, if att xy is marked as mandatory; –its value is from the domain defined for att xy ; –it does not appear more than once, if it is not marked as repeatable. Conformance(ms x ) = (  (  attribute att xy of ms x ) degree of conformance of att xy )/ total attributes).

50 50 Metadata Specifications and Metadata Format: Conformance Based on ETD-MS

51 51 Services: Efficiency / Effectiveness Effectiveness –Very common measures: Precision, Recall, F1, 10- precision, R-Precision –Other services may have different measures: e.g., Recommending, etc. Efficiency –let t(e) be the time of an event e – let e ix and e fx be the initial and the final event of service se x. –For service se x, efficiency is defined as: Efficiency(se x ) = t(e fx ) - t(e ix )

52 52 Services: Extensibility & Reusability A service Y reuses a service X if the behavior of Y incorporates the behavior of X. A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.

53 53 Services: Extensibility & Reusability (2) Macro-Reusability(Serv) = no. of reused services/ total number of services Micro-Reusability(Serv) = number of lines of code of managers that implement (run) reused services/ total lines of code

54 54 Services: Extensibility and Reusability Macro-Reusability = 4/16 = 0.25 Micro-Reusability = 3630 / 11910 = 0.304

55 55 Quality and the Information Life Cycle

56 56 Quality Model: Evaluation Focus groups –3 librarians –Major points Focus on DLs not traditional libraries Some indicators may have more theoretical than practical use in some contexts Liked minimalist approach Interesting and potentially useful mainly for education and evaluation

57 57 1.5. Conclusions We have answered the almost 40-year-old challenge of Licklider to build a unified CS / LIS theory by –Proposing and formalizing the first comprehensive formal framework for digital libraries Showed how to move from theory to practice by –Applying the framework to the problems of –Materializing these application into languages, tools, formats, etc. –Explaining and evaluating these applications (usability studies, focus groups, prototyping, etc.)

58 58 Future Work Theory –Apply to formally describe other systems –Complete formal definitions of all services with further events –Load axioms in knowledge base to automatically assess quality of models (correctness, etc.) Applications/Tools –Language Make different versions uniform Extend with METS, less complex scenario, society models New metamodels –Domain/application oriented (e.g., archaeology, education) –For traditional libraries

59 59 Future Work (cont’d) Applications/Tools –Visualization Integration with other tools –through Wizard New visualizations Applying as educational tool –Generation Use of Web services Incorporation of Native XML repositories Improvement of Scenario Algorithms – Logging Promote use Consider privacy issues New actions Deal with scalability issues

60 60 Future Work (cont’d) Quality –Development of more usage-oriented measures Current measures are mostly system-oriented Focus on log format and evaluation –Development of Quality ToolKit (5SQual) for DL managers with following features: Mapping tool to map local log format to standard XML Log format Components to implement all measures Visualization of data and measures Broken into several logical pieces to be used in the different phases of the information life cycle Others, e.g., personalization Create theories, tools, languages, methods for personalization based on 5S

61 61 2. DL Integration What is “DL Integration” –Hide distribution –Hide heterogeneity –Enable autonomy of individual component Why Integration –island-DLs –inability to seamlessly and transparently access knowledge across DLs Utilize various autonomous DLs in concert

62 62 Integration: Rationale We can read any paper book (ignoring limitations of language, vision, …). Scholarship requires access, analysis, and synthesis spanning disciplines and sources. New theories, systems, and services build upon our past accomplishments. Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.

63 63 Integration: Urgency, Longevity If we collect, capture, acquire, or produce information, will it be usable in 100 years? NSF Digital Archiving Program Library of Congress National Digital Information Infrastructure and Preservation Program

64 64 Integration: Standards Standards don’t exist in many areas. Standards that do exist create a jumble: –Conversion between (without loss?) –Bridging gaps (Z39.50 -> OAI) –Managing legacy content and systems Standards in DLs have focused on: –Metadata (e.g., Dublin Core) –Architecture (e.g., handles, repositories)

65 65 Integration: Challenges “Semantic Web” is vision, not reality. How can we integrate without a theory? How can we interoperate without a common framework? How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?

66 66 Hypothesis and Research Questions The 5S framework provides effective solutions to DL integration. –Formally define the DL integration problem? –Guide integration of domain focused DLs? How to formally model such domain specific DLs? How to integrate formally defined DL models into a union DL model? How to use the union DL model to help design and implement high quality integrated DLs? –Assess the integration?

67 67 Related Work DL interoperability approach Intermediary-basedmapping-based Consists of mediatorwrapperagent use two architectures federationUnion Archiving used in Consists of hybrid mappercomposite mapper use schema mapping use SemInt has an example LSD has an example Interrelated with

68 68 DL interoperability approach Intermediary-basedmapping-based Consists of mediatorwrapperagent use two architectures federationUnion Archiving used in Consists of hybrid mappercomposite mapper use schema mapping use Interrelated with GA trained by DL integration formalization based on

69 69 Formal Definition of DL Integration DL i =(R i, DM i, Serv i, Soc i ), 1 i n –R i is a network accessible repository –DM i is a set of metadata catalogs for all collections –Serv i is a set of services –Soc i is a society UnionRep UnionCat UnionServices UnionSociety

70 70 Formal Definition of DL Integration (Cont.) DL integration problem definition: Given n individual libraries, integrate the n DLs to create a UnionDL.

71 71 Repository1 DL1 Repository2 Union Catalog Union Repository Catalog1Catalog2 Searching Union DLDL2 archaeologists Society General Public Society Archaeologists General Public Union Society Service Browsing Service Union Service Harvesting, Mapping, Searching, Browsing, Clustering, Visualization Architecture of a Union DL

72 72 Example of Union Service: CitiViz

73 73 Integration of Domain Focused DLs Union archaeological metadata catalog generation Modeling archaeological DLs (ArchDLs) in the 5S framework ArchDL integration case study: ETANA-DL

74 74 Union Catalog Integration VN Metadata Format Global Metadata Format VN Catalog HD Catalog Union Catalog Mapping Tool Wrapper Mapping Tool Wrapper HD Metadata Format Virtual Nimrin (VN) Halif DigMaster (HD) Union ArchDL

75 75 Modeling ArchDLs in the 5S Framework Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services Minimal DL Minimal ArchDL

76 76 Digital Object Repository Collection Minimal DL Metadata Catalog Descriptive Metadata Specification A Minimal DL in the 5S Framework Structural Metadata Specification StreamsStructuresSpacesScenariosSocieties indexing browsing searching services hypertext Structured Stream

77 77 StreamsStructuresSpacesScenariosSocieties indexing browsing searching services hypertext Structured Stream Descriptive Metadata specification SpaTemOrg StraDia Arch Descriptive Metadata specification ArchDO ArchObj ArchColl Arch Metadata catalog ArchDColl ArchDR Minimal ArchDL A Minimal ArchDL in the 5S Framework

78 78 Integration of Domain Focused DLs Modeling archaeological DLs (ArchDLs) in the 5S framework Union archaeological metadata catalog generation ArchDL integration case study: ETANA-DL

79 79 ETANA-DL Archaeological DL Integrated DL –Heterogeneous data handling Applies and extends the OAI-PMH –Open Archives Initiative Protocol for Metadata Handling Design considerations –Componentized –Extensible –Portable

80 80 5S Meta Model 5SGraph DL Expert DL Designer 5SL DL Model 5SLGen Practitioner Researcher Tailored DL Services Teacher c omponent pool ODLSearch, ODLBrowse, ODLRate, ODLReview, ……. Requirements (1) Analysis (2) Implementation (4) Design (3) 5SGraph5SGen Mapping Tool 5SSuite

81 81 5SGraph 5S Archaeology MetaModel ArchDL Expert ArchDL Designer Structure Sub-model ETANA-DL Union Services Descriptions Harvesting Mapping Searching Browsing … Scenario Sub-model VN Metadata Format ETANA-DL Metadata Format HD Metadata Format Mapping Tool Wrapper4VNWrapper4HD Inverted Files Services DB Index Browse Service Search Service Browse DB Other ETANA-DL Services Web Interface XOAI VN Catalog HD Catalog Union Catalog 5SGen Component Pool Browsing …

82 82 ETANA-DL Architecture UsersServicesData ETANA-DL Union ServicesUsers DigBaseDigKit

83 83 ETANA-DL Architecture DigBase and DigKit Lahav Nimrin Umayri Hisban Megiddo Jalul New Sites DATABASEWRAPPERSDATABASEWRAPPERS ETANA-DL UNION CATALOG Search USERINTERFACEUSERINTERFACE Browse Recommend Note Personalize Review Visualizations Archaeology Specific Work in progress …

84 84 Assessment of Integrated DL Union catalog quality measurement Union service quality measurement Initial example

85 85 Union Catalog Quality Measurement Complete –All the catalogs to be integrated are complete. Consistent –All the catalogs to be integrated are consistent. –Each descriptive metadata specification in the union catalog describes only one digital object.

86 86 Union Catalog Quality Measurement (Cont.) Mapping-Completeness n is the total number of local schemas

87 87 Union Services Quality Measurement Internal quality measurement –Composability: reusability and extensibility External quality measurement –Searching: coverage q = –Browsing: knowledge-gain browse =

88 88 ArchDL1ArchDL2 UnionArchDL Site1 *Sub-partition *Container*Artifact*Locus*Partition Bone*BoneName Sites Site2 *Sub-partition *Container*Artifact*Locus*Partition Artifacts Path(ArchDL1)=6 Path(ArchDL2)=2 Path(UnionArchDL) = (6+6+2) + 4*6*2=62 Browsing: knowledge-gain browse Site *Sub-partition *Container*Artifact*Locus*Partition Bone*BoneName Knowledge-gain browse =

89 3. DL Overview Why of Global Interest? National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly Knowledge and information are essential to economic and technological growth, education DL - a domain for international collaboration –wherein all can contribute and benefit –which leverages investment in networking –which provides useful content on Internet & WWW –which will tie nations and peoples together more strongly and through deeper understanding

90 90 Libraries of the Future JCR Licklider, 1965, MIT Press World Nation State City Community

91 91 Synchronous Scholarly Communication Same time, Same or different place

92 92 Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place

93 93 Digital Libraries Shorten the Chain from Editor Publisher A&I Consolidator Library Reviewer

94 94 DLs Shorten the Chain to Author Reader Digital Library Editor Reviewer Teacher Learner Librarian

95 Computing (flops) Digital content Communicat i ons (bandwidth, connectivity) Locating Digital Libraries in Computing and Communications Technology Space Digital Libraries technology trajectory: intellectual access to globally distributed information lessmore Note: we should consider 4 dimensions: computing, communications, content, and community (people)

96 96

97 97 AmericanSouth.Org – Roles, Content SOLINETLibraries (Data Providers)Scholars Intellectual Organization Controlled vocabulary Metadata extension development Collection Decisions Selection Criteria Controlled vocabulary Central Server MaintenanceLocal Server MaintenanceProvision of Context Metadata RepositoryMetadata Creation/MaintenanceOrganizational Structure and Annotation Tools Central Interface Design/MaintenanceLocal Interface Design/MaintenanceSelection of Other Annotation Tools Central Indices Creation/MaintenanceLocal IndicesSelection of Thesauri Coordination of Metadata Gateway Development Gateway ImplementationConcept Mapping Digital Objects

98 98 Content Area DescriptionAudioDigitalFinding Aid MSSOtherPhotoVideoMFPrintTotal African-American cultural life64694123101872 Agricultural crisis of late 19 th century113114819 Codification of segregation laws13211816 Configuration of white supremacy13331920 Cultural values and activities31517415152071 Disenfranchising movements122121615 Educational movements61118621352798 Emergence of Holiness & Pentecostal Groups111710 Emergence of new musical forms311128 Emergence of organized groups expressing farmers concerns 221813 … ………………………… Total Each Format411451161381331379301831

99 99 Application Domain Related InstitutionsExamples Technical ChallengesBenefit / Impact Publishing Publishers, Eprint archives OAI Quality control, opennessAggregation, organization Education Schools, colleges, universities NSDL, NCSTRL Knowledge management, reuseability Access to data Art, CultureMuseumAMICO, PRDLA Digitization, describing, catalogingGlobal understanding Science Government, Academia, Commerce NVO, PDG, SwissProt, UK eScience,European Union Commission Data models reproducibility, faster reuse, faster advance (e) Government Government Agencies (all levels) Census Intellectual property rights, privacy, multi-national Accountability, homeland security (e) Commerce, (e) Industry Legal institutionsCourt cases, patents Developing standardsStandardization, economic development History, Heritage FoundationsAmerican Memory Content, context, interpretation Long term view, perspective, documentation, recording, facilitating, interpretation, understanding Cross- cutting Library, Archive Web, personal collections Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure Reduced cost, increased access, pereservation, democratization, leveling, peace, competitiveness Reagan MooreEd FoxReagan MooreEd Fox June 2002for NSFJune 2002for NSF

100 100

101 101

102 102

103 103

104 104 As data, information, and knowledge play increasingly central roles … digital library research should focus on: Increasing the scope and scale of information resources and services; Employing context at the individual, community, and societal levels to improve performance; Developing algorithms and strategies for transforming data into actionable information; Demonstrating the integration of information spaces into everyday life; and Improving availability, accessibility, and, thereby, productivity.

105 105 An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions: Acquisition of new information resources; Effective access mechanisms that span media type, mode, and language; Facilities to leverage the utilization of humankind’s knowledge resources; Assured stewardship over humanity’s scholarly and cultural legacy; and Efficient and accountable management of systems, services, and resources.

106 106 4. OAI, OCKHAM, CSTC, NSDL, NDLTD: Open Archives Initiative Advocacy for interoperability Standard for transferring metadata among digital libraries –Protocol for Metadata Harvesting (PMH) Simplicity Generality Extensibility Support for PMH => Open Archive (OA)

107 107 OAI = Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums

108 108 OAI – Repository Perspective Required: Protocol DO MDO

109 109 OAI – Black Box Perspective OA 1OA 2OA 4OA 3OA 5OA 6OA 7

110 110 Tiered Model of Interoperability Mediator services Metadata harvesting Document models

111 111 Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI

112 112 LOCKSS Lots of copies keep stuff safe Stanford (Vicky Reich) Initial focus on lower levels Initial content: journals Emory (Martin Halbert) –Help deploy and adapt –Help apply in other contexts Another registry Set of publisher manifests (information providers) Set of storage systems (archival storage) –NDIIP: AmericanSouth, MetaArchive

113 113 OCKHAM Library Network

114 114 OCKHAM Simplicity (a la OCCAM’s razor) Support by Mellon and DLF Four main ideas: 1.Components 2.Lightweight protocols 3.Open reference models (e.g., 5S, OAIS) 4.Community perspective and involvement Funded by NSF in NSDL, with P2P

115 115 Lightweight Protocols “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive. Successes of protocols considered lightweight is illuminating. Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

116 116 Reference Models Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement

117 117 OCKHAM Proposed Services Alerting Browsing Cataloging Conversion OAI – Z39.50 Pathfinding Registry (plus others such as from adapted ODL)

118 CS -> CSTC -> CRIM NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/ College of NJ, U. Ill. Springfield, Virginia Tech Focus initially on labs, visualization, multimedia Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)

119 CS Teaching Center (CSTC) Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units. Learners benefit from having well-crafted modules that have been reviewed and tested. Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built. ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org

120 120

121 121 Browsing (1)

122 122 Browsing (2)

123 123

124 124

125 125

126 126 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … Submission & Collection: sub/partner collections  www.citidel.org

127 www.CITIDEL.org Led by Virginia Tech, with co-PIs: –Fox (director, DL systems) –Lee (history) –Perez (user interface, Spanish support) Partners –College of New Jersey (Knox) –Hofstra (Impagliazzo) –Villanova (Cassel) –Penn State (Giles)

128 128 Multi-dimensional Categorization

129 129 Overview of CITIDEL architecture

130 130 Distributed repository structure

131 131 Digital library architecture for local and interoperable CITIDEL services

132 132 CITIDEL: Computing & Information Technology Interactive Digital Education Library

133 133

134 134

135 135

136 136

137 137

138 138 CITIDEL Technology Features Component architecture (Open Digital Library) Re-use and compose re-deployable digital library components. Built Using Open Standards & Technologies OAI: Used to collect DL Resources and DL Interoperability XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …) Perl: Component Integration ESSEX: Search Engine Functionality Very fast, utilizing in-memory processing Includes snap-shots for persistence Multi-scheming Integrates multiple classifications / views through maps, closure

139 139

140 140 Cluster Search Results from CITIDEL

141 141 Cluster NDLTD-Computing

142 142 CITIDEL + PIPE Adds Interaction Personalization to CITIDEL Automatically handles multi-modal conversion to Cell phone, PDA, Etc. Can be adopted to any digital data set, only requires XML file of content with hierarchy maintained.

143 CITIDEL -> NSDL A collection project in the National STEM (science, technolgy, engineering, and mathematics) education Digital Library – NSDL National Science Digital Library www.nsdl.org (Next slides courtesy Lee Zia, NSF)

144 144 Connects: Users: students, educators, life-long learners Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories;... Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate;...

145 145 Supports: Users Content Tools (profiles) (metadata) (protocols) Learning communities Customizable collections Application services

146 146 Enables: Environments for Communication Collaboration Creation Validation Evaluation Recognition... Discovery Stability Reliability Reusability Interoperability Customizability... of Resources AND

147 147 NSDL ProgramTracks Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form Targeted (Applied) Research: have immediate impact on one or more of the other three tracks Pathways: large efforts across broad ranges of areas or approaches or users

148 148 Collections Discovery of content Classification and cataloguing Acquisition and/or linking; referencing Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged Access to massive real-time or archived datasets Software tool suites for analysis, modeling, simulation, or visualization Reviewed commentary on learning materials and pedagogy

149 149 Services Help services, frequently asked questions, etc. Synchronous/asynchronous collaborative learning environments using shared resources Mechanisms for building personal annotated digital information spaces Reliability testing for applets or other digital learning objects Audio, image, and video search capability Metadata system translation Community feedback mechanisms

150 150

151 151

152 152

153 153 NSDL Information Architecture Essentially as developed by the Technical Infrastructure Workgroup referenced items & collections referenced items & collections Special Databases NSDL Services NSDL Services Other NSDL Services CI Services annotation CI Services discussion CI Services personalization CI Services authentication CI Services browsing Core Services: information retrieval Core Collection- Building Services harvesting Core Collection- Building Services protocols Core Services: metadata gathering Portals & Clients Portals & Clients Portals & Clients Usage Enhancement Collection Building User Interfaces NSDL Collections NSDL Collections NSDL Collections Core NSDL “Bus”

154 A Digital Library Case Study Domain: graduate education, research Genre:ETDs=electronic theses & dissertations Submission: http://etd.vt.edu Collection: http://www.theses.org Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

155 NDLTD: How can a university get involved? Select planning/implementation team –Graduate School –Library –Computing / Information Technology –Institutional Research / Educ. Tech. Join online, give us contact names –www.ndltd.org/join Adapt Virginia Tech or other proven approach –Build interest and consensus –Start trial / allow optional submission

156 Student Gets Committee Signatures and Submits ETD Signed Grad School

157 Library Catalogs ETD, Access is Opened to the New Research WWW NDLTD

158 158

159 159 ETD Union Collection (OAI)

160 160 Union catalog: OCLC OCLC will expand OAI data provider on TDs. Is getting data from WorldCat (so, from many sites!). Will harvest from all others who contact them. Need DC and either ETD-MS or MARC. Has a set for ETDs.

161 161

162 162

163 163

164 164 OCLC SRU Interface

165 165 Union catalog: VTLS, VT VTLS will enhance search/browse service for ETDs –Will harvest from OCLC’s set of ETD records –Will receive through other mechanisms –Will work with MARC-21 and ETD-MS VT will continue to offer experimental services

166 166

167 167 ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

168 168

169 169 VTLS Union Catalog Content Languages The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English German Greek Korean Portuguese Spanish Examples follow

170 170 Language = German; hits = 137

171 171 Full record display

172 172

173 173

174 174 Complex to Simple MARC ($50)Dublin Core (DC) + thesis

175 175

176 176 Why ETD? Short Answer For Students: –Gain knowledge and skills for the Information Age –Richer communication (digital information, multimedia, …) For Universities: –Easy way to enter the digital library field and benefit thereby For the World: –Global digital library – large, useful, many services General: –Save time and money –Increased visibility for all associated with research results

177 177 5. Open Source, Repositories, DigArch, ODL Open Source DL Examples Eprints (www.eprints.org) Fedora Greenstone (www.greenstone.org) Many systems in NSF DLI projects VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD

178 178

179 179

180 180

181 181

182 182

183 183

184 184

185 185 What is a Digital Object Repository?  Also called: digital rep., digital asset rep., institutional repository  Stores and maintains digital objects (assets)  Provides external interface for Digital Objects  Creation, Modification, Access  Enforces access policies  Provides for content type disseminations Adapted from Slide by V. Chachra, VTLS

186 186 Goals of Institutional Repositories (by Steven Harnad, U. Southampton)  Self Archiving of Institutional Research  Thesis and Dissertations (VTLS NDLTD Project)  Article preprints and post prints  Internal documents and maps  Management of digital collections  Preservation of materials – decentralized approach  Housing of teaching materials  Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects Adapted from Slide by V. Chachra, VTLS

187 187 Fedora ™ Digital Object Architecture Persistent ID (PID) Disseminators SystemMetadata EAD, TEI, DC, MARC, VRA Core, MIX, etc. Datastreams Images, E-books, E-journals, Music, Video, etc. Globally unique persistent id Public view: access methods for obtaining “disseminations” of digital object content Internal view: metadata necessary to manage the object Protected view: content that makes up the “basis” of the object The Mellon Fedora Project Adapted from Slide by V. Chachra, VTLS

188 188 Fedora™ Repository Web Service Exposure Layer Adapted from Slide by V. Chachra, VTLS

189 189

190 190

191 191

192 192

193 193

194 194

195 195

196 196 Digitization and Preservation Community and Activity (selected) Archivists worldwide International collaboration –Million book project in US, China, India (Reddy, Chen, Balakrishnan) US Library of Congress –Matching funds –American Memory –Infrastructure: NDIIP Dutch National Library + IBM Associations: ARL, DLF People –Harnad: Self-archiving movement –Lorie: Universal virtual computer –Gladney: technology, philosophy (http://home.pacbell.net/hgladney/ddq_3_1.htm) –Besser, Trant, …

197 197 DigArch Complexities: Document Models, Representations, and Accesses Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only Multilingual: content, summary, metadata Structured: MARC; SGML, HTML, XML Distributed collection: Kleisli, CIMI, Z39.50 Federated search: collecting, picking site(s), parallel search / fall-back, fusing results Access: IPR, payment, security, scenarios

198 198 DigArch Complexities: Multimedia Multiple media types, representations –Self-describing (structures), provenance Text, audio, image, video, graphics, animation Capture, digitization, standards, interchange Compression, content-based retrieval Playback (Real time), QoS, rendering –Popularity (e.g., PowerPoint) vs. longevity (SMIL?) JPEG, MPEG (and versions)

199 199 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video usersdigital objects ? ODL: Open Digital Library

200 200 ? 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video ? digital library Monolithic and/or Custom-built web-based application

201 201 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video componentized digital library ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

202 202 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video open digital library OA PMH XPMH

203 203 Open Digital Library Protocol Extended OAI-PMH Protocol for Metadata Harvesting

204 204 Open Digital Library Component Extended OPEN ARCHIVE OPEN ARCHIVE

205 205 Open Digital Library Deployments NDLTD (www.ndltd.org) Computer Science Teaching Center (www.cstc.org) Computing and Information Technology Interactive Digital Educational Library (www.citidel.org) Open Archives Distributed (NSF, DFG) – enhancements to PhysNet OCKHAM Open to others through DL-in-a-box

206 206 Open Digital Library Network of Extended Open Archives where each node acts as either a provider of data, services or both. Component = Node Protocol = Arc

207 207 Open Digital Library Components Running now –XML-File (data provider from file system) –Search: simple or in-memory (Essex) or generalized –Union, browse, recent, filter –E-journal/review, Submit, Edit, Annotation –Recommender, Rating; Mirroring (see JCDL’02) –Working with NCSA: from DB, unstructured text Others in process –Classification/categorization –Registry (and other connections with web services)

208 208 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 ETD-1 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 ETD-2 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 ETD-3 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 ETD-4 ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) Search Filter Union Recent Browse PMH ODLRecent ODLBrowse ODLUnion ODLSearch ODLUnion PMH USER INTERFACE Students and researchers ETD collections Example Open Digital Library

209 209 OAI, ODL, DL-in-a-box Open Archives Initiative –since 1999, www.openarchives.org Open Digital Libraries –since 2001, from www.dlib.vt.edu –with Hussein Suleman (now U. Cape Town) DL-in-a-box –NSDL support since 2001 –Aimed to help new collections / services projects –http://dlbox.nudl.org

210 210

211 211

212 212

213 213

214 214

215 215

216 216

217 217

218 218

219 219

220 220

221 221

222 222

223 223

224 224 Outline 1. 5S Framework for DL 1.1. Motivation: the problem 1.2. Theory 1.3. Tools/Applications 1.4. Quality 1.5. Conclusions, Future Work 2. DL Integration 3. DL Overview 4. OAI, OCKHAM, CSTC, NSDL, NDLTD 5. Open Source, Repositories, DigArch, ODL


Download ppt "1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, Digital Library Research Laboratory, Dept. of CS Virginia Tech,"

Similar presentations


Ads by Google