Download presentation
Presentation is loading. Please wait.
Published byRose Watkins Modified over 9 years ago
1
1 ICADL 2004 Tutorial Digital Library: Overview and Framework Edward A. Fox, fox@vt.edu Digital Library Research Laboratory, Dept. of CS Virginia Tech, Blacksburg, VA 24061 USA http://fox.cs.vt.edu/talks/2004/ http://fox.cs.vt.edu/cv.htm
2
Acknowledgements (Selected) Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR- 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
3
3 Acknowledgements: Faculty, Staff Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
4
4 Acknowledgements: Students Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Fernando Das Neves, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …
5
5 For More Information Magazine: www.dlib.org Books: http://fox.cs.vt.edu/DLSB.html (1994) –MIT Press: Arms, plus by Borgman, Licklider (1965) –Morgan Kaufmann: Witten... (several), Lesk (2 nd edition) Conferences –ECDL: www.ecdl2005.org –ICADL: http://icadl2004.sjtu.edu.cn –JCDL: www.jcdl2005.org Associations –ASIS&T DL SIG –IEEE TCDL: www.ieee-tcdl.org (student awards, consortium) NSF: www.dli2.nsf.gov Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/
6
6
7
7 Outline 1. 5S Framework for DL 1.1. Motivation: the problem 1.2. Theory 1.3. Tools/Applications 1.4. Quality 1.5. Conclusions, Future Work 2. DL Integration 3. DL Overview 4. OAI, OCKHAM, CSTC, NSDL, NDLTD 5. Open Source, Repositories, DigArch, ODL
8
8 Outline 1. 5S Framework for DL 1.1. Motivation: the problem –Hypotheses and research questions 1.2. Theory –5S: introduction, formal definitions –The formal ontology 1.3. Tools/Applications –Language –Visualization –Generation –Logging 1.4. Quality 1.5. Conclusions, Future Work
9
9 1.1. Motivation Digital Libraries (DLs): what are they?? –No definitional consensus –Conflicting views –Makes interoperability a hard problem DLs are not benefiting from formal theories as are other CS fields: DB, IR, PL, etc. DL construction: difficult, ad-hoc, lack of support for tailoring/customization Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. –Lack of specific DL models, formalisms, languages
10
10 Hypotheses A formal theory for DLs can be built based on 5S. The formalization can serve as a basis for modeling and building high- quality DLs.
11
11 Research Questions 1. Can we formally elaborate 5S? 2. How can we use 5S to formally describe digital libraries? 3. What are the fundamental relationships among the Ss and high-level DL concepts? 4. How can we allow digital librarians to easily express those relationships? 5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties? 6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?
12
12 1.2. Informal 5S Definitions DLs are complex systems that help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) present info in usable ways (spaces) communicate info with users (streams)
13
13 5Ss SsExamplesObjectives Streams Text; video; audio; image Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data Structures Collection; catalog; hypertext; document; metadata Specifies organizational aspects of the DL content Spaces Measure; measurable, topological, vector, probabilistic Defines logical and presentational views of several DL components Scenarios Searching, browsing, recommending Details the behavior of DL services Societies Service managers, learners, teachers, etc. Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
14
14 Digital Objects (DOs) Born digital Digitized version of “real” object –Is the DO version the same, better, or worse? –Decision for ETDs: structured + rendered Surrogate for “real” object –Not covered explicitly in metamodel for a minimal DL –Crucial in metamodel for archaeology DL
15
15 Metadata Objects (MDOs) MARC Dublin Core RDF IMS OAI (Open Archives Initiative) Crosswalks, mappings Ontologies Topics maps, concept maps
16
16 Other Key Definitions –coll, catalog, repository, service, archive, (minimal) DL –See Gon ç alves et al. in April 2004 ACM Transactions on Information Systems (TOIS)
17
17 5S and DL formal definitions and compositions (April 2004 TOIS)
18
18 Glossary: Concepts in the Minimal DL and Representing Symbols
19
19 5S Static / Passive Dynamic / Active
20
20 Digital Library Formal Ontology
21
21 Ontology: Applications Expand definition of minimal DL by characterizing –typical DL services –in the context of “employs” and “produces” relationships Use characterization to: –Reason about how DL services can be built from other DL components –As well as be composed with other services through extension or reuse
22
22 Ontology: Applications
23
23 Ontology: Taxonomy of Services Browsing Collaborating Customizing Filtering Providing access Recommending Requesting Searching Visualizing Annotating Classifying Clustering Evaluating Extracting Indexing Measuring Publicizing Rating Reviewing (peer) Surveying Translating (language) Conserving Converting Copying/Replicating Emulating Renewing Translating (format) Acquiring Cataloging Crawling (focused) Describing Digitizing Federating Harvesting Purchasing Submitting PreservationalCreational Add Value Repository-Building Information Satisfaction Services Infrastructure Services
24
24 Composition of key fundamental / infrastructure services
25
25 Composition of additional services
26
26 Approach
27
27 1.3. Tools/Applications
28
28 5SL: a DL design language Domain specific languages –Address a particular class of problems by offering specific abstractions and notations for the domain at hand –Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping. XML-based realization of 5S –Interoperability –Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)
29
29 5SL – The Minimal DL Metamodel
30
30 <stream value=`ETDText'> <stream value=`ETDAudio'>... %XMLSchema% Example of Document declaration in the Structures Model <Attribute name='name‘ type='String'/> <Attribute name='ID‘ type='Integer'/> Converting Reviewing Cataloguing ……… Example of Actors declaration in the Societies Model Simple scenario for an NDLTD site searching service Patron InterfaceManager collection query InterfaceManager SearchManager collection query SearchManager InterfaceManager WtdSet …. Example of Service declaration in the Scenario Model
31
31 Help users model their own instances of a digital library (DL) in the 5S language (5SL). A simple modeling process which enables rapid generation of digital libraries Features –5SGraph loads and displays a metamodel in a structured toolbox. –The structured editor of 5SGraph provides a top- down visual building environment for the DL designer. –5SGraph produces syntactically correct 5SL files according to the visual model built by the designer. 5SGraph: A DL Modeling Tool
32
32 Overview of 5SGraph Workspace (instance model) Structured toolbox (metamodel)
33
33 5SGraph: Other Key Features Flexible and extensible architecture Reuse of models –Load, save, and change common (sub-) models Synchronization of views Enforcing of semantic constraints
34
34 5SGraph Evaluation: Usability Study
35
35 5SGen Version 1 -- MARIAN as the target system –Focused on rich structures: semantic networks –Behavior attached to nodes/links Version 2 -- Shifted for later work to componentized (ODL) approach –Focused on scenarios/societies –Structures/Spaces encapsulated within components (e.g., relational tables, indexes) –Only textual streams supported
36
36 5SLGen – Version 2: ODL, Services, Scenarios
37
37 5SLGen Proof of Concept: prototyping –CITIDEL –Viaduct –NDLTD Union Catalog –BDBComp
38
38 XML-based DL Log Standard Log analysis –is a source of information on: How patrons really use DL services How systems behave while supporting user information seeking activities Used to: –Evaluate and enhance services –Guide allocation of resources Common practice in the web setting –Supported by web servers, proxy caches DL Logging can be more detailed
39
39 DL Logging Features Captures high level user and system behaviors Organized according to the 5S framework –Hierarchical organization (XML-based) –Centered on the notions of events Record only events related to initial user inputs and final system outputs Help to understand user interactions and the perceived value of responses
40
40 The XML Log Format Log SessionIdMachineInfo StatementTransactionTimestamp SessionInfoRegisterInfo StatementEventTimestamp Action SearchBrowse StoreSysInfoUpdate SearchBy QueryString CatalogCollection PresentationInfo StatusInfo Timeout
41
41 1.4. Describing Quality in Digital Libraries What’s a “good” digital Library? –Central Concept: Quality! –Hypotheses of this work: Formal theory can help to define “what’s a good digital library” by: New formalizations of quality indicators for DLs within our 5S framework Contextualizing these measures within the Information Life Cycle
42
42 Quality Dimensions
43
43 Digital Objects: Accessibility A digital object is accessible by an DL actor or patron, if 1.it exists in the DL collections 2.is retrievable from the repository 3.it is not restricted from access –by metadata on rights –For actor or actor’s society
44
44 Digital Objects: Pertinence Inf(do i ) = information carried by a digital object or any of its descriptions IN(ac j ) = information need of an actor Context jk = an amalgam of societal factors which can impact the judgment of pertinence by ac j at time k. –Factors include time, place, the actor's history of interaction, task in hand, and factors implicit in the interaction and ambient environment.
45
45 Digital Objects: Pertinence The pertinence of a digital object to a user ac j is an indicator function Pertinence(do i, ac j ): Inf(do i ) IN(ac j ) Context jk defined as: –1, if Inf(do i ) is judged by ac j to be informative with regards to IN(ac j ) in context Context jk ; –0, otherwise
46
46 Digital Objects: Relevance Relevance (do i,q) 1, if do i is judge by external-judge to be relevant to q 0, otherwise Relevance Estimate –Rel(do i,q) = do i dj / |do i | |q | Objective, public, social notion –Established by a general consensus in the field, not subjective, private judgment by an actor with an information need
47
47 Metadata Specifications and Metadata Format: Completeness Refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not. Completeness(ms x ) = 1 - (no. of missing attributes in ms x / total attributes of the schema to which ms x conforms)
48
48 Metadata Specifications and Metadata Format: Completeness OCLC NDLTD Union catalog
49
49 Metadata Specifications and Metadata Format: Conformance An attribute att xy of a metadata specification ms x is cardinally conformant to a metadata format/standard if: –it appears at least once, if att xy is marked as mandatory; –its value is from the domain defined for att xy ; –it does not appear more than once, if it is not marked as repeatable. Conformance(ms x ) = ( ( attribute att xy of ms x ) degree of conformance of att xy )/ total attributes).
50
50 Metadata Specifications and Metadata Format: Conformance Based on ETD-MS
51
51 Services: Efficiency / Effectiveness Effectiveness –Very common measures: Precision, Recall, F1, 10- precision, R-Precision –Other services may have different measures: e.g., Recommending, etc. Efficiency –let t(e) be the time of an event e – let e ix and e fx be the initial and the final event of service se x. –For service se x, efficiency is defined as: Efficiency(se x ) = t(e fx ) - t(e ix )
52
52 Services: Extensibility & Reusability A service Y reuses a service X if the behavior of Y incorporates the behavior of X. A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
53
53 Services: Extensibility & Reusability (2) Macro-Reusability(Serv) = no. of reused services/ total number of services Micro-Reusability(Serv) = number of lines of code of managers that implement (run) reused services/ total lines of code
54
54 Services: Extensibility and Reusability Macro-Reusability = 4/16 = 0.25 Micro-Reusability = 3630 / 11910 = 0.304
55
55 Quality and the Information Life Cycle
56
56 Quality Model: Evaluation Focus groups –3 librarians –Major points Focus on DLs not traditional libraries Some indicators may have more theoretical than practical use in some contexts Liked minimalist approach Interesting and potentially useful mainly for education and evaluation
57
57 1.5. Conclusions We have answered the almost 40-year-old challenge of Licklider to build a unified CS / LIS theory by –Proposing and formalizing the first comprehensive formal framework for digital libraries Showed how to move from theory to practice by –Applying the framework to the problems of –Materializing these application into languages, tools, formats, etc. –Explaining and evaluating these applications (usability studies, focus groups, prototyping, etc.)
58
58 Future Work Theory –Apply to formally describe other systems –Complete formal definitions of all services with further events –Load axioms in knowledge base to automatically assess quality of models (correctness, etc.) Applications/Tools –Language Make different versions uniform Extend with METS, less complex scenario, society models New metamodels –Domain/application oriented (e.g., archaeology, education) –For traditional libraries
59
59 Future Work (cont’d) Applications/Tools –Visualization Integration with other tools –through Wizard New visualizations Applying as educational tool –Generation Use of Web services Incorporation of Native XML repositories Improvement of Scenario Algorithms – Logging Promote use Consider privacy issues New actions Deal with scalability issues
60
60 Future Work (cont’d) Quality –Development of more usage-oriented measures Current measures are mostly system-oriented Focus on log format and evaluation –Development of Quality ToolKit (5SQual) for DL managers with following features: Mapping tool to map local log format to standard XML Log format Components to implement all measures Visualization of data and measures Broken into several logical pieces to be used in the different phases of the information life cycle Others, e.g., personalization Create theories, tools, languages, methods for personalization based on 5S
61
61 2. DL Integration What is “DL Integration” –Hide distribution –Hide heterogeneity –Enable autonomy of individual component Why Integration –island-DLs –inability to seamlessly and transparently access knowledge across DLs Utilize various autonomous DLs in concert
62
62 Integration: Rationale We can read any paper book (ignoring limitations of language, vision, …). Scholarship requires access, analysis, and synthesis spanning disciplines and sources. New theories, systems, and services build upon our past accomplishments. Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.
63
63 Integration: Urgency, Longevity If we collect, capture, acquire, or produce information, will it be usable in 100 years? NSF Digital Archiving Program Library of Congress National Digital Information Infrastructure and Preservation Program
64
64 Integration: Standards Standards don’t exist in many areas. Standards that do exist create a jumble: –Conversion between (without loss?) –Bridging gaps (Z39.50 -> OAI) –Managing legacy content and systems Standards in DLs have focused on: –Metadata (e.g., Dublin Core) –Architecture (e.g., handles, repositories)
65
65 Integration: Challenges “Semantic Web” is vision, not reality. How can we integrate without a theory? How can we interoperate without a common framework? How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?
66
66 Hypothesis and Research Questions The 5S framework provides effective solutions to DL integration. –Formally define the DL integration problem? –Guide integration of domain focused DLs? How to formally model such domain specific DLs? How to integrate formally defined DL models into a union DL model? How to use the union DL model to help design and implement high quality integrated DLs? –Assess the integration?
67
67 Related Work DL interoperability approach Intermediary-basedmapping-based Consists of mediatorwrapperagent use two architectures federationUnion Archiving used in Consists of hybrid mappercomposite mapper use schema mapping use SemInt has an example LSD has an example Interrelated with
68
68 DL interoperability approach Intermediary-basedmapping-based Consists of mediatorwrapperagent use two architectures federationUnion Archiving used in Consists of hybrid mappercomposite mapper use schema mapping use Interrelated with GA trained by DL integration formalization based on
69
69 Formal Definition of DL Integration DL i =(R i, DM i, Serv i, Soc i ), 1 i n –R i is a network accessible repository –DM i is a set of metadata catalogs for all collections –Serv i is a set of services –Soc i is a society UnionRep UnionCat UnionServices UnionSociety
70
70 Formal Definition of DL Integration (Cont.) DL integration problem definition: Given n individual libraries, integrate the n DLs to create a UnionDL.
71
71 Repository1 DL1 Repository2 Union Catalog Union Repository Catalog1Catalog2 Searching Union DLDL2 archaeologists Society General Public Society Archaeologists General Public Union Society Service Browsing Service Union Service Harvesting, Mapping, Searching, Browsing, Clustering, Visualization Architecture of a Union DL
72
72 Example of Union Service: CitiViz
73
73 Integration of Domain Focused DLs Union archaeological metadata catalog generation Modeling archaeological DLs (ArchDLs) in the 5S framework ArchDL integration case study: ETANA-DL
74
74 Union Catalog Integration VN Metadata Format Global Metadata Format VN Catalog HD Catalog Union Catalog Mapping Tool Wrapper Mapping Tool Wrapper HD Metadata Format Virtual Nimrin (VN) Halif DigMaster (HD) Union ArchDL
75
75 Modeling ArchDLs in the 5S Framework Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services Minimal DL Minimal ArchDL
76
76 Digital Object Repository Collection Minimal DL Metadata Catalog Descriptive Metadata Specification A Minimal DL in the 5S Framework Structural Metadata Specification StreamsStructuresSpacesScenariosSocieties indexing browsing searching services hypertext Structured Stream
77
77 StreamsStructuresSpacesScenariosSocieties indexing browsing searching services hypertext Structured Stream Descriptive Metadata specification SpaTemOrg StraDia Arch Descriptive Metadata specification ArchDO ArchObj ArchColl Arch Metadata catalog ArchDColl ArchDR Minimal ArchDL A Minimal ArchDL in the 5S Framework
78
78 Integration of Domain Focused DLs Modeling archaeological DLs (ArchDLs) in the 5S framework Union archaeological metadata catalog generation ArchDL integration case study: ETANA-DL
79
79 ETANA-DL Archaeological DL Integrated DL –Heterogeneous data handling Applies and extends the OAI-PMH –Open Archives Initiative Protocol for Metadata Handling Design considerations –Componentized –Extensible –Portable
80
80 5S Meta Model 5SGraph DL Expert DL Designer 5SL DL Model 5SLGen Practitioner Researcher Tailored DL Services Teacher c omponent pool ODLSearch, ODLBrowse, ODLRate, ODLReview, ……. Requirements (1) Analysis (2) Implementation (4) Design (3) 5SGraph5SGen Mapping Tool 5SSuite
81
81 5SGraph 5S Archaeology MetaModel ArchDL Expert ArchDL Designer Structure Sub-model ETANA-DL Union Services Descriptions Harvesting Mapping Searching Browsing … Scenario Sub-model VN Metadata Format ETANA-DL Metadata Format HD Metadata Format Mapping Tool Wrapper4VNWrapper4HD Inverted Files Services DB Index Browse Service Search Service Browse DB Other ETANA-DL Services Web Interface XOAI VN Catalog HD Catalog Union Catalog 5SGen Component Pool Browsing …
82
82 ETANA-DL Architecture UsersServicesData ETANA-DL Union ServicesUsers DigBaseDigKit
83
83 ETANA-DL Architecture DigBase and DigKit Lahav Nimrin Umayri Hisban Megiddo Jalul New Sites DATABASEWRAPPERSDATABASEWRAPPERS ETANA-DL UNION CATALOG Search USERINTERFACEUSERINTERFACE Browse Recommend Note Personalize Review Visualizations Archaeology Specific Work in progress …
84
84 Assessment of Integrated DL Union catalog quality measurement Union service quality measurement Initial example
85
85 Union Catalog Quality Measurement Complete –All the catalogs to be integrated are complete. Consistent –All the catalogs to be integrated are consistent. –Each descriptive metadata specification in the union catalog describes only one digital object.
86
86 Union Catalog Quality Measurement (Cont.) Mapping-Completeness n is the total number of local schemas
87
87 Union Services Quality Measurement Internal quality measurement –Composability: reusability and extensibility External quality measurement –Searching: coverage q = –Browsing: knowledge-gain browse =
88
88 ArchDL1ArchDL2 UnionArchDL Site1 *Sub-partition *Container*Artifact*Locus*Partition Bone*BoneName Sites Site2 *Sub-partition *Container*Artifact*Locus*Partition Artifacts Path(ArchDL1)=6 Path(ArchDL2)=2 Path(UnionArchDL) = (6+6+2) + 4*6*2=62 Browsing: knowledge-gain browse Site *Sub-partition *Container*Artifact*Locus*Partition Bone*BoneName Knowledge-gain browse =
89
3. DL Overview Why of Global Interest? National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly Knowledge and information are essential to economic and technological growth, education DL - a domain for international collaboration –wherein all can contribute and benefit –which leverages investment in networking –which provides useful content on Internet & WWW –which will tie nations and peoples together more strongly and through deeper understanding
90
90 Libraries of the Future JCR Licklider, 1965, MIT Press World Nation State City Community
91
91 Synchronous Scholarly Communication Same time, Same or different place
92
92 Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place
93
93 Digital Libraries Shorten the Chain from Editor Publisher A&I Consolidator Library Reviewer
94
94 DLs Shorten the Chain to Author Reader Digital Library Editor Reviewer Teacher Learner Librarian
95
Computing (flops) Digital content Communicat i ons (bandwidth, connectivity) Locating Digital Libraries in Computing and Communications Technology Space Digital Libraries technology trajectory: intellectual access to globally distributed information lessmore Note: we should consider 4 dimensions: computing, communications, content, and community (people)
96
96
97
97 AmericanSouth.Org – Roles, Content SOLINETLibraries (Data Providers)Scholars Intellectual Organization Controlled vocabulary Metadata extension development Collection Decisions Selection Criteria Controlled vocabulary Central Server MaintenanceLocal Server MaintenanceProvision of Context Metadata RepositoryMetadata Creation/MaintenanceOrganizational Structure and Annotation Tools Central Interface Design/MaintenanceLocal Interface Design/MaintenanceSelection of Other Annotation Tools Central Indices Creation/MaintenanceLocal IndicesSelection of Thesauri Coordination of Metadata Gateway Development Gateway ImplementationConcept Mapping Digital Objects
98
98 Content Area DescriptionAudioDigitalFinding Aid MSSOtherPhotoVideoMFPrintTotal African-American cultural life64694123101872 Agricultural crisis of late 19 th century113114819 Codification of segregation laws13211816 Configuration of white supremacy13331920 Cultural values and activities31517415152071 Disenfranchising movements122121615 Educational movements61118621352798 Emergence of Holiness & Pentecostal Groups111710 Emergence of new musical forms311128 Emergence of organized groups expressing farmers concerns 221813 … ………………………… Total Each Format411451161381331379301831
99
99 Application Domain Related InstitutionsExamples Technical ChallengesBenefit / Impact Publishing Publishers, Eprint archives OAI Quality control, opennessAggregation, organization Education Schools, colleges, universities NSDL, NCSTRL Knowledge management, reuseability Access to data Art, CultureMuseumAMICO, PRDLA Digitization, describing, catalogingGlobal understanding Science Government, Academia, Commerce NVO, PDG, SwissProt, UK eScience,European Union Commission Data models reproducibility, faster reuse, faster advance (e) Government Government Agencies (all levels) Census Intellectual property rights, privacy, multi-national Accountability, homeland security (e) Commerce, (e) Industry Legal institutionsCourt cases, patents Developing standardsStandardization, economic development History, Heritage FoundationsAmerican Memory Content, context, interpretation Long term view, perspective, documentation, recording, facilitating, interpretation, understanding Cross- cutting Library, Archive Web, personal collections Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure Reduced cost, increased access, pereservation, democratization, leveling, peace, competitiveness Reagan MooreEd FoxReagan MooreEd Fox June 2002for NSFJune 2002for NSF
100
100
101
101
102
102
103
103
104
104 As data, information, and knowledge play increasingly central roles … digital library research should focus on: Increasing the scope and scale of information resources and services; Employing context at the individual, community, and societal levels to improve performance; Developing algorithms and strategies for transforming data into actionable information; Demonstrating the integration of information spaces into everyday life; and Improving availability, accessibility, and, thereby, productivity.
105
105 An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions: Acquisition of new information resources; Effective access mechanisms that span media type, mode, and language; Facilities to leverage the utilization of humankind’s knowledge resources; Assured stewardship over humanity’s scholarly and cultural legacy; and Efficient and accountable management of systems, services, and resources.
106
106 4. OAI, OCKHAM, CSTC, NSDL, NDLTD: Open Archives Initiative Advocacy for interoperability Standard for transferring metadata among digital libraries –Protocol for Metadata Harvesting (PMH) Simplicity Generality Extensibility Support for PMH => Open Archive (OA)
107
107 OAI = Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums
108
108 OAI – Repository Perspective Required: Protocol DO MDO
109
109 OAI – Black Box Perspective OA 1OA 2OA 4OA 3OA 5OA 6OA 7
110
110 Tiered Model of Interoperability Mediator services Metadata harvesting Document models
111
111 Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI
112
112 LOCKSS Lots of copies keep stuff safe Stanford (Vicky Reich) Initial focus on lower levels Initial content: journals Emory (Martin Halbert) –Help deploy and adapt –Help apply in other contexts Another registry Set of publisher manifests (information providers) Set of storage systems (archival storage) –NDIIP: AmericanSouth, MetaArchive
113
113 OCKHAM Library Network
114
114 OCKHAM Simplicity (a la OCCAM’s razor) Support by Mellon and DLF Four main ideas: 1.Components 2.Lightweight protocols 3.Open reference models (e.g., 5S, OAIS) 4.Community perspective and involvement Funded by NSF in NSDL, with P2P
115
115 Lightweight Protocols “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive. Successes of protocols considered lightweight is illuminating. Examples: TCP/IP, HTTP, LDAP, and the OAI PMH
116
116 Reference Models Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement
117
117 OCKHAM Proposed Services Alerting Browsing Cataloging Conversion OAI – Z39.50 Pathfinding Registry (plus others such as from adapted ODL)
118
CS -> CSTC -> CRIM NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/ College of NJ, U. Ill. Springfield, Virginia Tech Focus initially on labs, visualization, multimedia Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)
119
CS Teaching Center (CSTC) Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units. Learners benefit from having well-crafted modules that have been reviewed and tested. Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built. ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org
120
120
121
121 Browsing (1)
122
122 Browsing (2)
123
123
124
124
125
125
126
126 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … Submission & Collection: sub/partner collections www.citidel.org
127
www.CITIDEL.org Led by Virginia Tech, with co-PIs: –Fox (director, DL systems) –Lee (history) –Perez (user interface, Spanish support) Partners –College of New Jersey (Knox) –Hofstra (Impagliazzo) –Villanova (Cassel) –Penn State (Giles)
128
128 Multi-dimensional Categorization
129
129 Overview of CITIDEL architecture
130
130 Distributed repository structure
131
131 Digital library architecture for local and interoperable CITIDEL services
132
132 CITIDEL: Computing & Information Technology Interactive Digital Education Library
133
133
134
134
135
135
136
136
137
137
138
138 CITIDEL Technology Features Component architecture (Open Digital Library) Re-use and compose re-deployable digital library components. Built Using Open Standards & Technologies OAI: Used to collect DL Resources and DL Interoperability XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …) Perl: Component Integration ESSEX: Search Engine Functionality Very fast, utilizing in-memory processing Includes snap-shots for persistence Multi-scheming Integrates multiple classifications / views through maps, closure
139
139
140
140 Cluster Search Results from CITIDEL
141
141 Cluster NDLTD-Computing
142
142 CITIDEL + PIPE Adds Interaction Personalization to CITIDEL Automatically handles multi-modal conversion to Cell phone, PDA, Etc. Can be adopted to any digital data set, only requires XML file of content with hierarchy maintained.
143
CITIDEL -> NSDL A collection project in the National STEM (science, technolgy, engineering, and mathematics) education Digital Library – NSDL National Science Digital Library www.nsdl.org (Next slides courtesy Lee Zia, NSF)
144
144 Connects: Users: students, educators, life-long learners Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories;... Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate;...
145
145 Supports: Users Content Tools (profiles) (metadata) (protocols) Learning communities Customizable collections Application services
146
146 Enables: Environments for Communication Collaboration Creation Validation Evaluation Recognition... Discovery Stability Reliability Reusability Interoperability Customizability... of Resources AND
147
147 NSDL ProgramTracks Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form Targeted (Applied) Research: have immediate impact on one or more of the other three tracks Pathways: large efforts across broad ranges of areas or approaches or users
148
148 Collections Discovery of content Classification and cataloguing Acquisition and/or linking; referencing Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged Access to massive real-time or archived datasets Software tool suites for analysis, modeling, simulation, or visualization Reviewed commentary on learning materials and pedagogy
149
149 Services Help services, frequently asked questions, etc. Synchronous/asynchronous collaborative learning environments using shared resources Mechanisms for building personal annotated digital information spaces Reliability testing for applets or other digital learning objects Audio, image, and video search capability Metadata system translation Community feedback mechanisms
150
150
151
151
152
152
153
153 NSDL Information Architecture Essentially as developed by the Technical Infrastructure Workgroup referenced items & collections referenced items & collections Special Databases NSDL Services NSDL Services Other NSDL Services CI Services annotation CI Services discussion CI Services personalization CI Services authentication CI Services browsing Core Services: information retrieval Core Collection- Building Services harvesting Core Collection- Building Services protocols Core Services: metadata gathering Portals & Clients Portals & Clients Portals & Clients Usage Enhancement Collection Building User Interfaces NSDL Collections NSDL Collections NSDL Collections Core NSDL “Bus”
154
A Digital Library Case Study Domain: graduate education, research Genre:ETDs=electronic theses & dissertations Submission: http://etd.vt.edu Collection: http://www.theses.org Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
155
NDLTD: How can a university get involved? Select planning/implementation team –Graduate School –Library –Computing / Information Technology –Institutional Research / Educ. Tech. Join online, give us contact names –www.ndltd.org/join Adapt Virginia Tech or other proven approach –Build interest and consensus –Start trial / allow optional submission
156
Student Gets Committee Signatures and Submits ETD Signed Grad School
157
Library Catalogs ETD, Access is Opened to the New Research WWW NDLTD
158
158
159
159 ETD Union Collection (OAI)
160
160 Union catalog: OCLC OCLC will expand OAI data provider on TDs. Is getting data from WorldCat (so, from many sites!). Will harvest from all others who contact them. Need DC and either ETD-MS or MARC. Has a set for ETDs.
161
161
162
162
163
163
164
164 OCLC SRU Interface
165
165 Union catalog: VTLS, VT VTLS will enhance search/browse service for ETDs –Will harvest from OCLC’s set of ETD records –Will receive through other mechanisms –Will work with MARC-21 and ETD-MS VT will continue to offer experimental services
166
166
167
167 ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)
168
168
169
169 VTLS Union Catalog Content Languages The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English German Greek Korean Portuguese Spanish Examples follow
170
170 Language = German; hits = 137
171
171 Full record display
172
172
173
173
174
174 Complex to Simple MARC ($50)Dublin Core (DC) + thesis
175
175
176
176 Why ETD? Short Answer For Students: –Gain knowledge and skills for the Information Age –Richer communication (digital information, multimedia, …) For Universities: –Easy way to enter the digital library field and benefit thereby For the World: –Global digital library – large, useful, many services General: –Save time and money –Increased visibility for all associated with research results
177
177 5. Open Source, Repositories, DigArch, ODL Open Source DL Examples Eprints (www.eprints.org) Fedora Greenstone (www.greenstone.org) Many systems in NSF DLI projects VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD
178
178
179
179
180
180
181
181
182
182
183
183
184
184
185
185 What is a Digital Object Repository? Also called: digital rep., digital asset rep., institutional repository Stores and maintains digital objects (assets) Provides external interface for Digital Objects Creation, Modification, Access Enforces access policies Provides for content type disseminations Adapted from Slide by V. Chachra, VTLS
186
186 Goals of Institutional Repositories (by Steven Harnad, U. Southampton) Self Archiving of Institutional Research Thesis and Dissertations (VTLS NDLTD Project) Article preprints and post prints Internal documents and maps Management of digital collections Preservation of materials – decentralized approach Housing of teaching materials Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects Adapted from Slide by V. Chachra, VTLS
187
187 Fedora ™ Digital Object Architecture Persistent ID (PID) Disseminators SystemMetadata EAD, TEI, DC, MARC, VRA Core, MIX, etc. Datastreams Images, E-books, E-journals, Music, Video, etc. Globally unique persistent id Public view: access methods for obtaining “disseminations” of digital object content Internal view: metadata necessary to manage the object Protected view: content that makes up the “basis” of the object The Mellon Fedora Project Adapted from Slide by V. Chachra, VTLS
188
188 Fedora™ Repository Web Service Exposure Layer Adapted from Slide by V. Chachra, VTLS
189
189
190
190
191
191
192
192
193
193
194
194
195
195
196
196 Digitization and Preservation Community and Activity (selected) Archivists worldwide International collaboration –Million book project in US, China, India (Reddy, Chen, Balakrishnan) US Library of Congress –Matching funds –American Memory –Infrastructure: NDIIP Dutch National Library + IBM Associations: ARL, DLF People –Harnad: Self-archiving movement –Lorie: Universal virtual computer –Gladney: technology, philosophy (http://home.pacbell.net/hgladney/ddq_3_1.htm) –Besser, Trant, …
197
197 DigArch Complexities: Document Models, Representations, and Accesses Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only Multilingual: content, summary, metadata Structured: MARC; SGML, HTML, XML Distributed collection: Kleisli, CIMI, Z39.50 Federated search: collecting, picking site(s), parallel search / fall-back, fusing results Access: IPR, payment, security, scenarios
198
198 DigArch Complexities: Multimedia Multiple media types, representations –Self-describing (structures), provenance Text, audio, image, video, graphics, animation Capture, digitization, standards, interchange Compression, content-based retrieval Playback (Real time), QoS, rendering –Popularity (e.g., PowerPoint) vs. longevity (SMIL?) JPEG, MPEG (and versions)
199
199 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video usersdigital objects ? ODL: Open Digital Library
200
200 ? 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video ? digital library Monolithic and/or Custom-built web-based application
201
201 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video componentized digital library ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
202
202 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video open digital library OA PMH XPMH
203
203 Open Digital Library Protocol Extended OAI-PMH Protocol for Metadata Harvesting
204
204 Open Digital Library Component Extended OPEN ARCHIVE OPEN ARCHIVE
205
205 Open Digital Library Deployments NDLTD (www.ndltd.org) Computer Science Teaching Center (www.cstc.org) Computing and Information Technology Interactive Digital Educational Library (www.citidel.org) Open Archives Distributed (NSF, DFG) – enhancements to PhysNet OCKHAM Open to others through DL-in-a-box
206
206 Open Digital Library Network of Extended Open Archives where each node acts as either a provider of data, services or both. Component = Node Protocol = Arc
207
207 Open Digital Library Components Running now –XML-File (data provider from file system) –Search: simple or in-memory (Essex) or generalized –Union, browse, recent, filter –E-journal/review, Submit, Edit, Annotation –Recommender, Rating; Mirroring (see JCDL’02) –Working with NCSA: from DB, unstructured text Others in process –Classification/categorization –Registry (and other connections with web services)
208
208 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 Document 1010100101 0100101010 1001010101 0101010101 ETD-1 1010100101 0100101010 1001010101 0101010101 Program 1010100101 0100101010 1001010101 0101010101 ETD-2 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 Image 1010100101 0100101010 1001010101 0101010101 ETD-3 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 Video 1010100101 0100101010 1001010101 0101010101 ETD-4 ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) Search Filter Union Recent Browse PMH ODLRecent ODLBrowse ODLUnion ODLSearch ODLUnion PMH USER INTERFACE Students and researchers ETD collections Example Open Digital Library
209
209 OAI, ODL, DL-in-a-box Open Archives Initiative –since 1999, www.openarchives.org Open Digital Libraries –since 2001, from www.dlib.vt.edu –with Hussein Suleman (now U. Cape Town) DL-in-a-box –NSDL support since 2001 –Aimed to help new collections / services projects –http://dlbox.nudl.org
210
210
211
211
212
212
213
213
214
214
215
215
216
216
217
217
218
218
219
219
220
220
221
221
222
222
223
223
224
224 Outline 1. 5S Framework for DL 1.1. Motivation: the problem 1.2. Theory 1.3. Tools/Applications 1.4. Quality 1.5. Conclusions, Future Work 2. DL Integration 3. DL Overview 4. OAI, OCKHAM, CSTC, NSDL, NDLTD 5. Open Source, Repositories, DigArch, ODL
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.