1
1 C.W. Post Campus, Long Island U. (23 April 2008) “Digital libraries: From Theory to CS/LIS Curricula” Edward A. Fox fox@vt.edu http://fox.cs.vt.edu Dept. of Computer Science, Virginia Tech Blacksburg, VA 24061 USA
2
Acknowledgements (selected)
Colleagues: Lillian Cassel, Debra Dudley, Weiguo Fan, Marcos Gonçalves, Doug Gorton, Rohit Kelapure, Neill Kipp, Aaron Krowne, Ming Luo, Yi Ma, Uma Murthy, Manuel Perez, Ananth Raghavan, Rao Shen, Venkat Srinivasan, Hussein Suleman, Srinivas Vemuri, Layne Watson, Seungwon Yang, …
Sponsors: ACM, AOL, CAPES, DFG, Google, IBM, IMLS, INL, Microsoft, NSF (CCF-0722259; IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0535057, 0535060, 0736055; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059, 0532825), SUN, …
3
3 Acknowledgements - Mentors
J.C.R. Licklider – undergraduate advisor (1969-71)
–Author of “Libraries of the Future” (1965)
–Earlier, at ARPA, funded the start of the Internet
Michael Kessler – BS thesis advisor
–Project TIP (Technical Information Project)
–Defined bibliographic coupling
Gerard Salton – graduate advisor (1978-83)
–“Father of Information Retrieval”
4
4 Living In the KnowlEdge Society (LIKES)
North Carolina A&T, Santa Clara University, Villanova University, Virginia Tech
NSF CPATH: CCF-0722259, 0722276, 0722289, and 0752865
5
5 LIKES Vision - Disciplines: Knowledge Society, HCI, Visualization, Knowledge Management, Systems Analysis & Design, Programming, Database, Algorithms, Architecture, Net-Centricity, Intelligent Systems, Social & Ethical, Library / Information Science, Sociology, Simulation, Communications, Political Science, Architecture, Healthcare, Economics, Finance, Psychology, Marketing, Physics, Music, Engineering, History, Biology, Art, Chemistry, Geography, Math, Geology, English
6
6 LIKES Vision - Applications: Knowledge Society, HCI, Visualization, Knowledge Management, Systems Analysis & Design, Programming, Database, Algorithms, Architecture, Net-Centricity, Intelligent Systems, Social & Ethical, Library / Information Science, GIS, Simulation, Online Shopping, Multimedia, Semantic Web, CSCW, Digital Government, Healthcare Services
7
7 Four Workshops
Workshop 1 – Theme: Defining Problems and Applications of the Knowledge Society
–Santa Clara University, Dec. 2007
Workshop 2 – Theme: Testing LIKES Vision
–North Carolina Agricultural and Technical State University
–Completed April 18-19, 2008
Workshop 3 – Theme: LIKES Pedagogy
–Virginia Tech, Fall 2008
Workshop 4 – Theme: LIKES in Practice
–Villanova, Spring 2009
8
8 LIKES Vision Build a community, leading the way to change how computing concepts are taught in both computing-related disciplines and in the disciplines of the broader workforce & society. Reach a broader audience of potential students and produce a larger number of professionals with the computing competencies and skills for LIKES. Improve computing competencies and skills of people in all disciplines, to help them address the pervasive and growing needs for computing in society.
9
9 Transform CS Education
Find interesting problems to bring into computing courses, so that students learn in context. For example, in a database class, students can:
–See the value of hierarchical data structures to biology by representing the taxonomy of species.
–See the value of hierarchical data structures to political science and management by representing the organization chart of the executive branch of the U.S. government.
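To make the biology example concrete, here is a minimal Python sketch, assuming a relational course setting with SQLite: the taxonomy is stored as an adjacency list (each taxon row points to its parent), and a recursive common table expression walks from a species up to its kingdom. The table name `taxon`, its columns, and the sample rows are illustrative assumptions, not part of the original course material.

```python
import sqlite3

# Illustrative sample only: a few taxa stored as an adjacency list, where each
# row points to its parent (None/NULL marks the root of the hierarchy).
TAXA = [
    (1, "Animalia", "kingdom", None),
    (2, "Chordata", "phylum", 1),
    (3, "Mammalia", "class", 2),
    (4, "Carnivora", "order", 3),
    (5, "Felidae", "family", 4),
    (6, "Panthera", "genus", 5),
    (7, "Panthera leo", "species", 6),
    (8, "Canidae", "family", 4),
    (9, "Canis", "genus", 8),
    (10, "Canis lupus", "species", 9),
]


def build_db():
    """Create an in-memory database holding the sample taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT, "
        "taxon_rank TEXT, parent_id INTEGER REFERENCES taxon(id))"
    )
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", TAXA)
    return conn


def lineage(conn, name):
    """Walk from a taxon up to the root with a recursive CTE; return root first."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM taxon WHERE name = ?
            UNION ALL
            SELECT t.id, t.name, t.parent_id, up.depth + 1
            FROM taxon AS t JOIN up ON t.id = up.parent_id
        )
        SELECT name FROM up ORDER BY depth DESC
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db()
print(" > ".join(lineage(conn, "Canis lupus")))
# Animalia > Chordata > Mammalia > Carnivora > Canidae > Canis > Canis lupus
```

The same table shape would hold an organization chart, with each office pointing to the office it reports to; only the rows change.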
10
10 Potential Course Areas/Courses
Personal Knowledge Management
–Computer Science and Information Systems, e.g., multi-media, process design and evaluation, and Human-Computer / Human-Information interaction.
–Psychology, e.g., knowledge organization principles, human cognitive processes.
–Industrial Systems Engineering, e.g., ergonomic factors of knowledge environments.
–Ethics, e.g., ethical issues of information disclosure.
Communication and Collaboration
–Communications, e.g., communication using digital visualizations, using knowledge access in constructing digital messages.
–Information Systems and Computer Science, e.g., computer supported cooperative work and group support systems.
–Marketing, e.g., influence of knowledge presentation on online customer behavior.
Organization
–Information Systems, e.g., service innovation and development, system design and development.
–Management Science, e.g., decision support systems concepts, capabilities, techniques, and tools.
–Management, Marketing, Accounting, and Finance, e.g., business in the information age.
Society
–Sociology, e.g., impact of knowledge differentials across society and countries.
–Political Science, e.g., governmental collection and use of knowledge, impact of technology on elections and government.
11
11 Interdisciplinary Work Example: Virtual Jamestown
Project Director
–Prof. Crandall Shifflett, Dept. of History, VT
–In 1996 he conceived the idea of combining technology, history, and Jamestown 2007.
Project Staff Members
–Julie Richter: Ph.D. in early American history
–Matthew Parrott: computer science major, chief modeler, animator
Virtual Jamestown is a product of collaboration between Virginia Tech, the University of Virginia, and the Virginia Center for Digital History at the University of Virginia.
12
12 Information Life Cycle: Authoring, Modifying, Organizing, Indexing, Storing, Retrieving, Distributing, Networking, Retention / Mining, Accessing, Filtering, Using, Creating
13
13 DLs Shorten the Chain from Author to Reader (diagram labels: Author, Reader, Digital Library, Editor, Reviewer, Teacher, Learner, Librarian)
14
14 DL Definitions - 1 “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.” Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003
15
15 DL Definitions - 2 “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities” Waters, D.J., CLIR Issues, July/August 1998, www.clir.org/pubs/issues/issues04.html
16
16 DL Definitions - 3 Issues and Spectra –Collection vs. Institution –Content vs. System –Access vs. Preservation –“Free” vs. Quality –Managed vs. Comprehensive –Centralized vs. Distributed
17
17 DL Definitions - 4 NOT a “digitized library” NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library IS a new way to deal with knowledge –Authoring, Self-archiving, Collecting, –Organizing, Preserving, –Accessing, Propagating, Re-using
18
18
19
19 Informal 5S & DL Definitions
DLs are complex systems that
–help satisfy info needs of users (societies)
–provide info services (scenarios)
–organize info in usable ways (structures)
–present info in usable ways (spaces)
–communicate info with users (streams)
20
20 Hypotheses
–A formal theory for DLs can be built based on 5S.
–The formalization can serve as a basis for modeling and building high-quality DLs.
21
21 5Ss
S | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content, such as encoding and language for textual material, or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, who are responsible for running DL services; actors, who use those services; and the relationships among them
22
22
23
23
24
24 ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)
25
25 ETANA Societies: Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they represent?
4. Who has publication rights?
5. What interactions took place between those at the site studied and others? Who has proposed which theories about this?
26
26 ETANA Scenarios
1. Life at the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
–Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
–Data about each artifact is recorded together with information about its exact find spot.
–Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
–Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public
27
27 ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
–used to support retrieval operations, and to calculate distance (and similarity)
–used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
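A minimal sketch of how a metric space can constrain a search spatially and temporally, assuming each find is recorded with a find spot on a local site grid in metres and an estimated date range. The `Find` record, the sample data, and the function names are hypothetical. Plain Euclidean distance is used because a local grid is roughly planar; geographic latitude/longitude would call for a great-circle distance instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Find:
    """Hypothetical artifact record: find spot on a local site grid (metres)
    plus an estimated date range (negative years = BCE)."""
    find_id: str
    x: float
    y: float
    earliest: int
    latest: int


def near(finds, cx, cy, radius):
    """Spatial constraint: keep finds within `radius` metres of (cx, cy)."""
    return [f for f in finds if math.hypot(f.x - cx, f.y - cy) <= radius]


def overlapping(finds, start, end):
    """Temporal constraint: keep finds whose date range overlaps [start, end]."""
    return [f for f in finds if f.earliest <= end and f.latest >= start]


# Made-up sample data, for illustration only.
FINDS = [
    Find("F1", 2.0, 3.0, -1200, -1000),
    Find("F2", 10.0, 1.0, -1200, -1000),
    Find("F3", 1.0, 1.0, -600, -500),
    Find("F4", 4.0, 4.0, -1100, -900),
]

hits = overlapping(near(FINDS, 0.0, 0.0, 6.0), -1150, -950)
print([f.find_id for f in hits])  # ['F1', 'F4']
```

Composing two small filters keeps each constraint independently testable; a real system would push such predicates into an index rather than scan every record.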
28
28 ETANA Structures
1. Site Organization
–Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
–for bones, seeds, building materials, …
4. Stratigraphic relationships
–above, beneath, coexistent
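Stratigraphic "above/beneath" relations form a directed acyclic graph, so a relative deposition sequence can be read off with a topological sort (the idea behind a Harris matrix). The sketch below is a hedged illustration using Python's standard `graphlib` (Python 3.9+); the locus names and relations are made up, and "coexistent" loci are not modeled (one simple option would be to merge them into a single node before sorting).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical excavation records: (upper, lower) means `upper` lies above `lower`.
ABOVE = [
    ("A", "B"),
    ("B", "D"),
    ("C", "D"),
    ("D", "E"),
]


def deposition_order(above_pairs):
    """Return loci earliest-deposited first.

    Under the law of superposition, a locus lying above another was deposited
    later, so each lower locus is registered as a predecessor of the one above
    it. Contradictory records (X above Y and Y above X) raise CycleError.
    """
    sorter = TopologicalSorter()
    for upper, lower in above_pairs:
        sorter.add(upper, lower)
    return list(sorter.static_order())


order = deposition_order(ABOVE)
print(order)  # E comes first and A last; B and C are not ordered relative to each other

try:
    deposition_order([("X", "Y"), ("Y", "X")])
except CycleError:
    print("inconsistent stratigraphic records")
```

Detecting a cycle is useful in its own right: it flags recording errors (or intrusive features such as pits) that need an archaeologist's attention.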
29
29 ETANA Streams
1. Successive photos and drawings of excavation sites, loci, unearthed artifacts
2. Audio and video recordings of excavation activities and discussions
3. Textual reports
4. 3D models used to reconstruct and visualize archaeological ruins
30
30 5S and DL formal definitions and compositions (April 2004 TOIS)
31
31 Fox & Gonçalves Book Outline Ch. 1. Introduction (Motivation, Synopsis) Part 1 – The “Ss” Part 2 – Higher DL Constructs Part 3 – Advanced Topics Appendix
32
32 Book Parts and Chapters - 1 Ch. 1. Introduction (Motivation, Synopsis) Part 1 – The “Ss” –Ch. 2: Streams –Ch. 3: Structures –Ch. 4: Spaces –Ch. 5: Scenarios –Ch. 6: Societies
33
33 Book Parts and Chapters - 2 Part 2 – Higher DL Constructs –Ch. 7: Collections –Ch. 8: Catalogs –Ch. 9: Repositories and Archives –Ch. 10: Services –Ch. 11: Systems –Ch. 12: Case Studies
34
34 Book Parts and Chapters - 3 Part 3 – Advanced Topics –Ch. 13: Quality –Ch. 14: Integration –Ch. 15: How to build a digital library –Ch. 16: Research Challenges, Future Perspectives Appendix –A: Mathematical preliminaries –B: Formal Definitions: Ss –C: Formal Definitions: DL terms, Minimal DL –D: Formal Definitions: Archeological DL –E: Glossary of terms, mappings
35
35 Chapter 3: (Degree of) Structure
A spectrum: Chaotic (Web), Organized (DLs), Structured (DBs)
36
36 Digital Objects (DOs) Born digital Digitized version of “real” object –Is the DO version the same, better, or worse? –Decision for ETDs: structured + rendered Surrogate for “real” object –Not covered explicitly in metamodel for a minimal DL –Crucial in metamodel for archaeology DL
37
37 Also Important: Epub, SGML, XML
5S perspective: streams, structures, scenarios
Authoring
Rendering, presenting
Tagging, Markup, DOM
Semi-structured information
Dual-publishing, eBooks
Styles (XSL, XSLT)
Structured queries
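A small illustration of markup, a DOM-like tree, and structured queries, using Python's built-in `xml.etree.ElementTree`. The element names (`thesis`, `chapter`, `para`) are hypothetical, not a standard ETD schema. ElementTree supports only a subset of XPath and has no XSLT engine, so a full XSL/XSLT rendering pipeline would need another library (lxml, for instance, provides one); the `render_outline` function below merely stands in for one rendering of the same structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up thesis; element names are illustrative, not a standard schema.
DOC = """
<thesis lang="en">
  <title>Digital Libraries in Practice</title>
  <chapter n="1">
    <title>Introduction</title>
    <para>Why digital libraries matter.</para>
  </chapter>
  <chapter n="2">
    <title>Streams and Structures</title>
    <para>Markup separates structure from rendering.</para>
    <para>The same source can be published as HTML or print.</para>
  </chapter>
</thesis>
"""

root = ET.fromstring(DOC.strip())

# Structured queries over the parsed tree (ElementTree's limited XPath subset).
chapter_titles = [t.text for t in root.findall("./chapter/title")]
ch2_paras = [p.text for p in root.findall("./chapter[@n='2']/para")]


def render_outline(doc_root):
    """One possible rendering of the same structure: a plain-text outline."""
    lines = [doc_root.findtext("title")]
    for ch in doc_root.findall("chapter"):
        lines.append(f"{ch.get('n')}. {ch.findtext('title')}")
    return "\n".join(lines)


print(chapter_titles)  # ['Introduction', 'Streams and Structures']
print(len(ch2_paras))  # 2
print(render_outline(root))
```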
38
38 Chapter 4 Overview (Spaces)
Retrieval models
–Boolean, extended Boolean
–Vector, LSI
–Probabilistic: classical, belief network, inference network, language models
User interfaces and visualization – cont’d
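To illustrate the first two retrieval models listed, here is a minimal sketch of an inverted index supporting a Boolean AND query and a vector-space ranking by cosine similarity. The toy documents are made up, and the tf × log(N/df) weighting is one common choice among many, not the specific weighting used in the book.

```python
import math
from collections import Counter, defaultdict

# Toy collection, made up for illustration.
DOCS = {
    "d1": "digital library services for archaeology",
    "d2": "vector space retrieval model",
    "d3": "digital library retrieval and browsing services",
}


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Boolean model: documents containing every query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def tfidf_vector(tokens, df, n_docs):
    """Weight each term by raw tf * log(N / df); terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df.get(t)}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(docs, index, query):
    """Vector model: order documents by cosine similarity to the query."""
    n = len(docs)
    df = {t: len(ids) for t, ids in index.items()}
    q = tfidf_vector(tokenize(query), df, n)
    scores = {d: cosine(q, tfidf_vector(tokenize(text), df, n)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = build_index(DOCS)
print(sorted(boolean_and(index, ["digital", "services"])))  # ['d1', 'd3']
print(rank(DOCS, index, "retrieval services")[0][0])  # d3
```

The Boolean query only says whether a document matches; the vector model also orders partial matches, which is why d3 (matching both query terms) outranks d1 and d2 (matching one each).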
39
39 User interfaces and visualization 2D interfaces 3D interfaces GIS Other paradigms: trees, graphs, bubbles, coordinated views, … Stepping Stones and Pathways –http://fox.cs.vt.edu/SSP/
40
40 Chapter 6 Overview (Societies)
User communities
–Authors, editors, teachers, students, readers
–Personal(ization), group(ware), community, global
–Accessibility, universal access
Librarians: reference, acquisition, operations
Research community
–Associations, conferences, publications, labs, projects
Economics
–Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints)
–Publishers, catalogers, distributors, sustainability
–Open source, commercial, hybrid
41
41 Chapter 9 Archives & Repositories Open Archives Initiative (OAI) Institutional Repositories Persistent storage of digital objects Coupling of metadata with digital objects Use of “handles” as identifiers for digital objects Put, get, harvest
42
42 OAI - Open Archives Initiative Advocacy for interoperability Standard for transferring metadata among digital libraries –Protocol for Metadata Harvesting (PMH) Simplicity Generality Extensibility Support for PMH => Open Archive (OA)
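The Protocol for Metadata Harvesting works over plain HTTP requests carrying a `verb` parameter, with XML responses; long lists are returned in pages linked by a `resumptionToken`. The sketch below, assuming OAI-PMH 2.0 and the `oai_dc` metadata format, collects record identifiers from a `ListRecords` sequence. The function names and the injected `fetch` callable are hypothetical (the callable keeps the example runnable offline), and error responses such as `noRecordsMatch` are not handled.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument: when resuming, only the verb
    and the token are sent.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def harvest_identifiers(base_url, fetch):
    """Collect record identifiers, following resumption tokens page by page.

    `fetch` is any callable mapping a URL to the response body as a string,
    e.g. a thin wrapper around urllib.request.urlopen.
    """
    identifiers, token = [], None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token=token)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=OAI_NS))
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty resumptionToken marks the last page.
        if token_el is None or not (token_el.text or "").strip():
            return identifiers
        token = token_el.text.strip()


# Offline demo: two canned response pages stand in for a real repository.
def _page(ident, token):
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"<record><header><identifier>{ident}</identifier></header></record>"
        f"<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>"
    )


def fake_fetch(url):
    return _page("oai:demo:2", "") if "resumptionToken=page2" in url else _page("oai:demo:1", "page2")


print(harvest_identifiers("http://example.org/oai", fake_fetch))
# ['oai:demo:1', 'oai:demo:2']
```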
43
43 OAI – Repository Perspective Required: Protocol DO MDO
44
44 OAI – Black Box Perspective (diagram: open archives OA 1 through OA 7)
45
45 Institutional Repositories - 1 “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.” Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA www.arl.org/sparc/IR/IR_Guide_v1.pdf
46
46 Chapter 10 Services Taxonomy of services Ontology, composition, reuse Evaluation Key services in-depth: –Crawling, indexing –Clustering, classifying –Recommending, using social networks –Logging
47
47
48
48 Ontology: Applications
Expand the definition of a minimal DL by characterizing
–typical DL services
–in the context of “employs” and “produces” relationships
Use this characterization to:
–Reason about how DL services can be built from other DL components
–Reason about how services can be composed with other services through extension or reuse
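The "employs"/"produces" characterization lends itself to simple forward chaining: a service can be offered once everything it employs is available, and it then makes what it produces available to other services. The catalogue of services below is a hypothetical illustration, not the ontology defined in the 5S work.

```python
# Hypothetical catalogue: what each service employs and what it produces.
SERVICES = {
    "indexing": {"employs": {"collection"}, "produces": {"index"}},
    "searching": {"employs": {"index", "query"}, "produces": {"result_set"}},
    "browsing": {"employs": {"catalog"}, "produces": {"result_set"}},
    "recommending": {"employs": {"result_set", "user_profile"}, "produces": {"recommendations"}},
}


def composable(services, available):
    """Forward chaining: repeatedly enable any service whose inputs are all available.

    Returns the services that can be offered (in the order they become enabled)
    and the final set of available components.
    """
    have = set(available)
    enabled = []
    changed = True
    while changed:
        changed = False
        for name, spec in services.items():
            if name not in enabled and spec["employs"] <= have:
                enabled.append(name)
                have |= spec["produces"]
                changed = True
    return enabled, have


enabled, have = composable(SERVICES, {"collection", "query"})
print(enabled)  # ['indexing', 'searching']
```

Adding `user_profile` to the starting set would also enable `recommending`, because `searching` already produces a result set; this is the kind of reuse-by-composition the slide refers to.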
49
49
50
50 Ontology: Applications
51
51
52
52 5S and Generating DLs 5S Framework 5S definitions, services taxonomy, ontology 5SL (specification language) 5SGraph (to prepare 5SL) 5SGen (for DL development, incl. DSpace) SchemaMapper for development of union DL
53
53 5SL: a DL design language Domain specific languages –Address a particular class of problems by offering specific abstractions and notations for the domain at hand –Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping. XML-based realization of 5S –Interoperability –Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)
54
54 5SGraph: A DL Modeling Tool
Helps users model their own instances of a digital library (DL) in the 5S language (5SL).
A simple modeling process that enables rapid generation of digital libraries
Features
–5SGraph loads and displays a metamodel in a structured toolbox.
–The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
–5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.
55
55 Overview of 5SGraph: workspace (instance model); structured toolbox (metamodel)
56
56
57
57
58
58
59
59
60
60 5SGen Version 1 – MARIAN as the target system –Focused on rich structures: semantic networks –Behavior attached to nodes/links Version 2 – Shifted for later work to componentized (ODL) approach –Focused on scenarios/societies –Structures/Spaces encapsulated within components (e.g., relational tables, indexes) –Only textual streams supported Version 3 – Into DSpace (practical DL)
61
61 5SLGen – Version 2: ODL, Services, Scenarios
62
62 Tools/Applications
63
63 Diagram labels: 5SGraph; 5S Archaeology MetaModel; ArchDL Expert; ArchDL Designer; Structure Sub-model; Scenario Sub-model; ETANA-DL Union Services Descriptions (Harvesting, Mapping, Searching, Browsing, …); Metadata formats: VN, HD, ETANA-DL; Mapping Tool; Wrapper4VN, Wrapper4HD; Catalogs: VN, HD, Union; XOAI; Inverted Files, Index, Services DB, Browse DB; Search Service, Browse Service, Other ETANA-DL Services; Web Interface; 5SGen Component Pool
64
64 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … Submission & Collection: sub/partner collections www.citidel.org
65
65 Digital library architecture for local and interoperable CITIDEL services
66
66 CITIDEL -> NSDL
A collection project in the National STEM (science, technology, engineering, and mathematics) Education Digital Library: NSDL, the National Science Digital Library, www.nsdl.org
(Next slides courtesy Lee Zia, NSF)
67
67
68
68
69
69 NSDL Information Architecture, essentially as developed by the Technical Infrastructure Workgroup (diagram labels)
Layers: User Interfaces, Usage Enhancement, Collection Building
Portals & Clients
NSDL Services; Other NSDL Services; CI Services: annotation, discussion, personalization, authentication, browsing
Core Services: information retrieval, metadata gathering
Core Collection-Building Services: harvesting, protocols
NSDL Collections; Special Databases; referenced items & collections
Core NSDL “Bus”
70
70 A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: ETD-db, DSpace, ProQuest, …
Collection: local archives, regional collaborations, global union catalog
Project: Networked Digital Library of Theses & Dissertations (NDLTD), www.ndltd.org
71
71
72
72 Student Gets Committee Signatures and Submits ETD (diagram labels: Signed; Grad School)
73
73 What are we doing?
–Helping universities enhance graduate education, publishing, and IPR efforts
–Helping improve the availability and content of theses and dissertations
–Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are information literate and can be more expressive)
74
74 Why ETD? Short Answer For Students: –Gain knowledge and skills for the Information Age –Richer communication (digital information, multimedia, …) For Universities: –Easy way to enter the digital library field and benefit thereby For the World: –Global digital library – large, useful, many services General: –Save time and money –Increased visibility for all associated with research results
75
75 Metamodels in the 5S Framework Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services Minimal DL Minimal ArchDL …
76
76 A Minimal DL in the 5S Framework (diagram labels): Streams, Structures, Spaces, Scenarios, Societies; Structured Stream; hypertext; Descriptive Metadata Specification; Structural Metadata Specification; Digital Object; Repository; Collection; Metadata Catalog; services: indexing, browsing, searching; Minimal DL
77
77 A Minimal ArchDL in the 5S Framework (diagram labels): Streams, Structures, Spaces, Scenarios, Societies; Structured Stream; hypertext; services: indexing, browsing, searching; Descriptive Metadata specification; Arch Descriptive Metadata specification; SpaTemOrg; StraDia; ArchDO; ArchObj; ArchColl; Arch Metadata catalog; ArchDColl; ArchDR; Minimal ArchDL
78
78 Moving from a minimal DL towards a DL reference model (1/2)
Minimal DL → DL reference model
Areas: Multimedia, Annotation, Knowledge management, Practical DL systems, PIM, DL quality, Domain-specific DLs
79
79 Moving from a minimal DL towards a DL reference model (2/2)
–Content-based image retrieval services in a DL
–A superimposed-information-supported DL
–Practical DL generation
80
80 Superimposing information
Superimposed layer: new information/structures
Base layer: existing information from heterogeneous sources (text, images, audio/video documents)
Mark: reference to a base information element
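A minimal sketch of the layering, assuming text-only base documents: a mark is just an address (document id plus character span) into the base layer, and superimposed notes hold marks rather than copies, so the base layer stays untouched. `Mark`, `Note`, `mark_phrase`, and the sample report text are hypothetical; marks into images or audio/video would address regions or time intervals instead.

```python
from dataclasses import dataclass, field

# Base layer: existing content (made-up text); the superimposed layer never edits it.
BASE = {
    "report-7": "Locus 12 contained a cooking pot and two loom weights.",
}


@dataclass(frozen=True)
class Mark:
    """Hypothetical mark: an address into one base document (a character span)."""
    doc_id: str
    start: int
    end: int

    def resolve(self, base):
        """Fetch the referenced excerpt from the base layer on demand."""
        return base[self.doc_id][self.start:self.end]


@dataclass
class Note:
    """Superimposed element: new information linked to base content only via marks."""
    text: str
    marks: list = field(default_factory=list)


def mark_phrase(base, doc_id, phrase):
    """Create a mark for the first occurrence of `phrase` in a base document."""
    start = base[doc_id].index(phrase)
    return Mark(doc_id, start, start + len(phrase))


note = Note("Possible weaving activity", [mark_phrase(BASE, "report-7", "two loom weights")])
print([m.resolve(BASE) for m in note.marks])  # ['two loom weights']
```

Resolving marks lazily means the superimposed layer can be shared or discarded without affecting the sources; the trade-off is that edits to a base document can invalidate stored spans.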
81
81 Preliminary SI-DL metamodel
82
82 Minimal CBIR DL (diagram labels): Stream, Structure, Space, Service, Society; Image Stream; Feature Vector; Image Descriptor; Structured Feature Vector; Image Content Description; Image Digital Object; Image Object; User; Info Need; Image Collection; Visualization; Operation; Content-based Image Searching Service; Image Descriptor Metadata Catalog; Composite Descriptor; KNNQ; RQ
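The two query types named in the metamodel, read here as k-nearest-neighbor (KNNQ) and range (RQ) queries, can be sketched over image feature vectors as below. The descriptors are made up, Euclidean distance stands in for whatever distance function an image descriptor actually defines, and the linear scan is for clarity only; real CBIR systems typically use index structures to avoid comparing the query against every image.

```python
import math

# Made-up image descriptors: image id -> feature vector (e.g. a coarse colour histogram).
FEATURES = {
    "img1": [0.9, 0.1, 0.0],
    "img2": [0.8, 0.2, 0.0],
    "img3": [0.1, 0.1, 0.8],
    "img4": [0.5, 0.5, 0.0],
}


def distance(a, b):
    """Euclidean distance, standing in for a descriptor's own distance function."""
    return math.dist(a, b)  # math.dist requires Python 3.8+


def knn_query(features, query, k):
    """KNNQ: ids of the k images whose vectors are closest to the query."""
    return sorted(features, key=lambda img: distance(features[img], query))[:k]


def range_query(features, query, radius):
    """RQ: ids of all images within `radius` of the query, sorted by id."""
    return sorted(img for img, vec in features.items() if distance(vec, query) <= radius)


q = [0.88, 0.12, 0.0]
print(knn_query(FEATURES, q, 2))  # ['img1', 'img2']
print(range_query(FEATURES, q, 0.6))  # ['img1', 'img2', 'img4']
```

A composite descriptor could be modeled by replacing `distance` with a weighted combination of several per-descriptor distances; the query functions would not need to change.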
83
83 Summary 5S and Generating DLs –5S Framework –5S definitions, services taxonomy, ontology –5SL –5SGraph –5SGen (and DL development) –DL development of union DL –5SGen into DSpace 5S Metamodels –Minimal DL –Archaeology DL –Multimedia (CBIR) DL –Union DL –Practical DL, superimposed information, personal DL, …
84
84 DL Curriculum Project (NSF supporting VT, UNC-CH) Identify, develop and test educational DL modules, guided by - Experts, international collaborators - Computing Curriculum 2001 - 5S framework - Analysis of DL course syllabi …
85
85 CC2001 Information Management Areas (* marks core units)
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
86
86 Why Modular Design Flexibility, e.g., for ETD programs: –Self-study by NDLTD trainers –Self-study by ETD authors –Short courses by NDLTD trainers of ETD authors –A course based on a single module –Course sequence (program) from multiple modules –Plug in modules into an existing course (enhancement) Module 1. Overview + Module 10. DL Education & Research
87
87 Modules
1. Collection Development
2. Digital objects / Composites / Packages
3. Metadata, Cataloging, Author submission
4. Architecture, Interoperability
5. Data visualization
6. Services
7. Intellectual property rights management, Privacy, Protection
8. Social issues / Future of DLs
9. Archiving and Preservation
88
88 Ascertaining Priority Topics
We’ve manually classified and analyzed publications using the 9 modules:
Source | Count
Proceedings: JCDL ’01–’05 | 354
Proceedings: ACM DL ’96–’00 | 189
Magazine articles: D-Lib ’95–’06 | 521
Session titles: JCDL, ACM DL, ECDL | 264
89
89 Conference papers x modules
90
90 Analysis Results: 543 proceedings papers in total. The most popular topics were architecture (module 4) and services (module 6).
91
91 Distribution of D-Lib Magazine Articles across Module Topics
92
92 Analysis Results: 521 articles in total. The most popular topics were architecture (module 4), services (module 6), and social issues (module 8).
93
93 Distribution of Session Titles across Module Topics
94
94 Analysis Results: 264 session titles in total (JCDL, ECDL, ICADL). The most popular topic was services (module 6), followed by architecture (module 4).
95
95 Pointers and Summary http://fox.cs.vt.edu http://fox.cs.vt.edu/talks www.dlib.vt.edu fox@vt.edu DL, 5S Education: CITIDEL, NSDL, NDLTD, LIKES, DLcurric