Virginia Tech, Blacksburg, VA USA

1 Virginia Tech, Blacksburg, VA 24061 USA
RDEC-ACE Discussion: Virginia Tech’s Digital Library Research Laboratory
Jan. 5, SAIC
Edward A. Fox, Virginia Tech, Blacksburg, VA USA

2

3 Acknowledgements (Selected)
Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS, ITR, and DUE grants), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
Our efforts have been made possible through the support of sponsors, faculty, staff, and students, and we gratefully acknowledge their assistance and collaboration. IBM has donated a large amount of equipment. The largest grants have come from NSF, FIPSE, and SURA. ACM has shared content in a variety of efforts related to learning about computing. Many companies, such as Adobe, Microsoft, and OCLC, have provided software and related assistance. Many colleagues have worked on the projects discussed. Students, whether serving as research assistants, preparing a thesis or dissertation, or working on class projects, have helped develop many of the systems and publications from Virginia Tech's digital library initiatives.

4 Acknowledgements: Faculty, Staff
Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

5 Acknowledgements: Students
Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …

6 Stepping Stones & Pathways:
Improving retrieval by chains of relationships between document topics Fernando Das-Neves, Virginia Tech DLRL

7 A Little Experiment
Compare a simple query with a longer version that explicitly includes stepping stones.
Query: “Literary Style in Sherlock Holmes stories”
Note: Numbers are the total relevant web pages in the top 20 Google results for the query made up of the terms at either end of each link.
[Figure: number of relevant documents, simple query vs. query through stepping stones]

8 Another Example
Question: “What is the Relationship between Data Mining and Recommender Systems?”
Naïve results: There are many matches that are possible answers.
Discussion: But many of the pages with co-occurrences give no real information about the requested relationship.
[Figure: the direct link from Data Mining to Recommender Systems vs. pathways through intermediate topics such as Machine Learning, Collaborative Filtering, and Social Networks, with each link labeled by its count of relevant pages]

9 An Alternative Interpretation of a Query in IR:
A query represents two related, separable concepts.
Objective: Retrieve a sequence of documents that support a valid set of chains of relationships between the two concepts.
Input: a query representing two concepts.
Output: two groups of documents, plus a set of stepping stones (document groups, i.e., clusters) connecting the topics by pathways (relations among clusters).

10 Type of Questions Matching Alternative Interpretation
Ill-defined questions with non-enumerated answers, such as “How or why is X related to Y?” or “What is the X of Y?”
Even when queries of the form “give me something about X” lead to relevant documents, the quantity and quality of information in the query result can be increased when relations are made explicit (as a result of our semi-automatic method).

11 Why is this useful? Questions of this type are common.
For example, such questions often arise during research studies, in educational settings (e.g., homework), and in workplace settings that require gathering and relating information. Current systems often handle this type of question inadequately.

12 How to Build Stepping Stones and Pathways?
Our approach uses a belief network to combine content and structure when computing document similarity, including citation and co-citation similarities.
1. Find two relevant document sets, each related to one of the two original sub-queries.
2. Find a diverse set of strong candidates, each connecting the two subsets but as different as possible from the other candidates.
3. Create stepping stones by finding documents similar to those candidates; keep the clusters that are heavily cited, or whose documents are highly correlated (in all aspects).
4. Repeat the process, finding a new stepping stone between each pair of weakly related clusters, until the pathway becomes too long or the similarity is sufficient.
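The loop in the steps above can be sketched in a few lines. This is a minimal sketch under loudly stated assumptions: documents are sparse term-weight vectors, cosine similarity stands in for the belief-network similarity (no citation or co-citation evidence), a cluster is summarized by its centroid, a candidate is scored by the weaker of its two links, and the diversity requirement of step 2 is omitted. The function names and the `threshold` and `max_len` parameters are hypothetical.

```python
import math
from typing import Dict, List

# Assumption: a document is a sparse term-weight vector. The original method
# uses a belief network over content, citations, and co-citations instead.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs: List[Vector]) -> Vector:
    """Average of a group of document vectors, used as the cluster summary."""
    c: Vector = {}
    for d in docs:
        for t, w in d.items():
            c[t] = c.get(t, 0.0) + w / len(docs)
    return c


def stepping_stone(left: Vector, right: Vector, pool: List[Vector], k: int = 2) -> List[Vector]:
    """Pick the k pool documents whose weaker link to either end is strongest."""
    return sorted(pool, key=lambda d: min(cosine(d, left), cosine(d, right)), reverse=True)[:k]


def build_pathway(a_docs: List[Vector], b_docs: List[Vector], pool: List[Vector],
                  threshold: float = 0.3, max_len: int = 5) -> List[Vector]:
    """Insert stepping stones between weakly related neighbors until every link
    reaches `threshold` or the pathway holds `max_len` clusters."""
    path = [centroid(a_docs), centroid(b_docs)]
    while len(path) < max_len:
        weak = [i for i in range(len(path) - 1) if cosine(path[i], path[i + 1]) < threshold]
        if not weak:
            break  # every link is strong enough
        i = weak[0]
        path.insert(i + 1, centroid(stepping_stone(path[i], path[i + 1], pool)))
    return path
```

Scoring by the minimum of the two similarities favors documents that touch both topics over documents that are very close to only one of them, which is the point of a stepping stone.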

13 Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications Marcos André Gonçalves Doctoral defense Virginia Tech, Blacksburg, VA USA

14 Informal 5S Definition: DLs are complex systems that
help satisfy info needs of users (societies);
provide info services (scenarios);
organize info in usable ways (structures);
present info in usable ways (spaces);
communicate info with users (streams).

15 5Ss
S | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content, such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines service managers, responsible for running DL services, and actors, who use those services

16 Hypotheses
A formal theory for DLs can be built based on 5S.
The formalization can serve as a basis for modeling and building high-quality DLs.

17 5S Framework and DL Development (Gonçalves)

18 5SLGen: Automatic DL Generation

19 Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. What are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured, and how?

20 Outline
Motivation: the problem
Hypotheses and research questions
Part 1: Theory: 5S introduction and formal definitions; the formal ontology
Part 2: Tools/Applications: language, visualization, generation, logging
Part 3: Quality
Conclusions, future work

21 5S and DL formal definitions and compositions (April 2004 TOIS)

22 Digital Library Formal Ontology

23 Composition of key infrastructure services

24 Composition of additional services

25 Ontology: Taxonomy of Services
Infrastructure Services
  Repository-Building
    Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting
    Preservational: Conserving, Converting, Copying/Replicating, Translating (format)
  Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating, Visualizing
Information Satisfaction Services: Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching

26 5SL: a DL Modeling language
Domain-specific languages address a particular class of problems by offering specific abstractions and notations for the domain at hand.
Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.
XML-based realization of 5S: interoperability; use of many standard sub-languages (e.g., MIME types, XML Schemas, UML notations).

27 Overview of 5SGraph: workspace (instance model) and structured toolbox (metamodel)

28 5SGen – Version 2: ODL, Services, Scenarios
[Figure: 5SGen Version 2 pipeline. One path: (1) 5SL-Societies model, (2) XPath/JDOM transform, (3) XMI class model, (4) Xmi2Java, (5) Java classes (Model). Another path: (6) 5SL-Scenario model, (7) XPath/JDOM transform, (8) StateChart model, (9) scenario synthesis, (10) deterministic FSM, (11) SMC, (12) Java finite state machine class (Controller). (13) JSP user interface (View). The DL designer draws ODL Search and Browse components from the component pool, which are wrapped and imported; the result is the generated DL services.]
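In 5SGen, scenarios are synthesized into a deterministic finite state machine that serves as the controller of the generated services. The sketch below only illustrates what such a deterministic controller looks like in Python; it is not 5SGen output (which targets Java via SMC), and the search scenario's state and event names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical search scenario; state and event names are illustrative only.
SEARCH_SCENARIO: Dict[Tuple[str, str], str] = {
    ("idle", "submit_query"): "searching",
    ("searching", "results_ready"): "showing_results",
    ("searching", "no_results"): "idle",
    ("showing_results", "select_item"): "showing_item",
    ("showing_item", "back"): "showing_results",
    ("showing_results", "new_query"): "idle",
}


class ScenarioController:
    """Deterministic FSM controller: each (state, event) pair has at most one
    successor, because transitions are stored in a dict keyed by that pair."""

    def __init__(self, transitions: Dict[Tuple[str, str], str], start: str = "idle"):
        self.transitions = transitions
        self.state = start

    def fire(self, event: str) -> str:
        """Apply an event; reject events the scenario does not allow here."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state
```

Keying transitions by `(state, event)` makes determinism structural: a second entry for the same pair would overwrite the first rather than create a nondeterministic choice.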

29 The XML Log Format
Elements: Log, Transaction, SessionId, MachineInfo, Timestamp, Statement, Event, SessionInfo, RegisterInfo, ErrorInfo, StatusInfo, Action, Search, Browse, Update, StoreSysInfo, SearchBy, QueryString, Collection, Catalog, Timeout, PresentationInfo
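To make the log format concrete, the sketch below writes one search transaction using element names from the slide. The nesting (Transaction holding SessionId, MachineInfo, Timestamp, and a Statement whose Action is a Search) is an assumption, since the slide lists names but not the hierarchy; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def search_log_entry(session_id: str, machine: str, timestamp: str,
                     search_by: str, query: str, collection: str) -> str:
    """Build one logged search transaction as an XML string.

    Element names come from the slide; the nesting below is an assumption."""
    log = ET.Element("Log")
    tx = ET.SubElement(log, "Transaction")
    ET.SubElement(tx, "SessionId").text = session_id
    ET.SubElement(tx, "MachineInfo").text = machine
    ET.SubElement(tx, "Timestamp").text = timestamp
    # Assumed: a Statement wraps an Action, here a Search.
    action = ET.SubElement(ET.SubElement(tx, "Statement"), "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "SearchBy").text = search_by
    ET.SubElement(search, "QueryString").text = query
    ET.SubElement(search, "Collection").text = collection
    return ET.tostring(log, encoding="unicode")
```

A structured format like this lets later analysis query logs with XPath-style paths (e.g., every `QueryString` under a `Search`) instead of parsing free-form text lines.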

30 Quality and the Information Life Cycle

31 Rao Shen’s Preliminary Exam: Hypothesis and Research Questions
Hypothesis: The 5S framework provides effective solutions to DL integration.
Can we formally define the DL integration problem?
Can 5S guide integration of domain-focused DLs?
How can such domain-specific DLs be formally modeled?
How can formally defined DL models be integrated into a union DL model?
How can the union DL model help design and implement high-quality integrated DLs?
How can the integration be assessed?

32 DL interoperability approach
Related Work
[Concept map: DL interoperability approaches are intermediary-based or mapping-based. Intermediary-based approaches consist of mediators, wrappers, and agents; mapping-based approaches use schema mapping. Two architectures are used: federation and union archiving. Schema mapping uses hybrid mappers (example: SemInt) or composite mappers (example: LSD).]

33 DL integration formalization
[Concept map: the formalization builds on the same taxonomy of interoperability approaches (intermediary-based: mediator, wrapper, agent; mapping-based: schema mapping; architectures: federation, union archiving), with a composite mapper trained by a genetic algorithm (GA).]

34 Formal Definition of DL Integration
$DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$, where
$R_i$ is a network-accessible repository;
$DM_i$ is a set of metadata catalogs for all collections;
$Serv_i$ is a set of services;
$Soc_i$ is a society.
Union components: UnionRep, UnionCat, UnionServices, UnionSociety
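A minimal sketch of the tuple definition, assuming (this is not stated on the slide) that the union DL is the component-wise union of the $DL_i$ and that each component can be reduced to a set of names. In a real union DL, services such as harvesting and mapping are added on top of the member services; this sketch omits them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DL:
    """One DL_i = (R_i, DM_i, Serv_i, Soc_i), each component simplified to a set of names."""
    repository: FrozenSet[str]  # R_i: digital object identifiers
    catalogs: FrozenSet[str]    # DM_i: metadata catalog names
    services: FrozenSet[str]    # Serv_i: service names
    society: FrozenSet[str]     # Soc_i: actor/community names


def union_dl(dls: List[DL]) -> DL:
    """Assumed construction: UnionRep, UnionCat, UnionServices, and UnionSociety
    are the component-wise unions over all member DLs."""
    return DL(
        repository=frozenset().union(*(d.repository for d in dls)),
        catalogs=frozenset().union(*(d.catalogs for d in dls)),
        services=frozenset().union(*(d.services for d in dls)),
        society=frozenset().union(*(d.society for d in dls)),
    )
```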

35 Architecture of a Union DL
[Figure: layers, top to bottom. Union Society (Archaeologists Society, General Public Society); Union Service (harvesting, mapping, searching, browsing, clustering, visualization) over local services (searching, browsing); Union Catalog (Catalog1, Catalog2); Union Repository (Repository1, Repository2).]

36 Example of Union Service: CitiViz

37 CitiViz: A Visual User Interface to the CITIDEL System
ECDL 2004, Bath, England, September 2004 Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox

38 A Minimal DL in the 5S Framework
[Figure: concepts layered over the 5Ss (Streams, Structures, Spaces, Scenarios, Societies): structured stream, structural metadata specification, descriptive metadata specification, digital object, metadata catalog, collection, repository, hypertext, and the services indexing, browsing, and searching, which together compose a Minimal DL.]

39 A Minimal ArchDL in the 5S Framework
[Figure: the same 5S layering specialized for archaeology: structured stream, descriptive metadata specification, hypertext, and the services indexing, browsing, and searching, plus ArchDO, ArchObj, ArchColl, SpaTemOrg, StraDia, Arch descriptive metadata specification, Arch metadata catalog, ArchDColl, and ArchDR, which together compose a Minimal ArchDL.]

40 ETANA-DL Metadata Format
[Figure: ETANA-DL architecture. Components shown: ArchDL expert and ArchDL designer; 5SGraph with the 5S archaeology metamodel (structure and scenario sub-models); the VN, HD, and ETANA-DL metadata formats, related by a mapping tool; Wrapper4VN and Wrapper4HD over the VN and HD catalogs; ETANA-DL union service descriptions (harvesting, mapping, searching, browsing); union catalog; inverted files and browse DB index; XOAI search and browse services; 5SGen with its component pool; web interface; other ETANA-DL services connected via XOAI.]
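The wrappers and mapping tool in this architecture translate each site's metadata format into the ETANA-DL union format before records enter the union catalog. The sketch below shows that idea as a simple field crosswalk; the field names for VN, HD, and the union format are hypothetical placeholders, not the real ETANA-DL schemas, and fields with no mapping are simply dropped.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical crosswalks: each site's local field name -> union field name.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "VN": {"obj_id": "identifier", "obj_type": "type", "locus": "location"},
    "HD": {"ItemNo": "identifier", "Category": "type", "Square": "location"},
}


def wrap(site: str, record: Record) -> Record:
    """Translate one local record into the union format; unmapped fields are dropped."""
    mapping = CROSSWALKS[site]
    union: Record = {"source": site}  # keep provenance for the union catalog
    for local_field, value in record.items():
        if local_field in mapping:
            union[mapping[local_field]] = value
    return union


def build_union_catalog(harvested: Dict[str, List[Record]]) -> List[Record]:
    """Wrap every harvested record from every site into a single union catalog."""
    return [wrap(site, r) for site, records in harvested.items() for r in records]
```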

41 Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
Domain: computing / information technology
Genre: one-stop shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
Submission & Collection: sub/partner collections

42 www.CITIDEL.org
Led by Virginia Tech, with co-PIs: Fox (director, DL systems), Lee (history), Perez (user interface, Spanish support)
Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang
Partners: College of New Jersey (Knox), Hofstra (Impagliazzo), Villanova (Cassel), Penn State (Giles)

43 Digital library architecture for local and interoperable CITIDEL services

44

45

46 CITIDEL Technology Features
Component architecture (Open Digital Library): reuse and compose redeployable digital library components.
Built using open standards and technologies:
OAI: used to collect DL resources and for DL interoperability
XSL and XML: interface rendering, with multilingual, community-based translation of screens and content (Spanish, …)
Perl: component integration
ESSEX: search engine functionality; very fast, using in-memory processing; includes snapshots for persistence
Multi-scheming (Aaron Krowne, now at Emory U. Library): integrates multiple classifications/views through maps and closure
Extensions: clustering, visualization, personalization, …
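CITIDEL uses OAI to collect resources from partner DLs. The sketch below shows the two halves of an OAI-PMH `ListRecords` harvest: building the request URL and extracting record identifiers, Dublin Core titles, and the resumption token from a response. The base URL and the sample response are made-up examples, and the network fetch is left out so the code runs offline; the verb, arguments, and namespaces follow the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: str = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text: str):
    """Return [(identifier, [titles])] and the resumption token (None if done)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None  # an empty token also means "no more pages"


# Made-up example response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester repeats the request with each returned token until `parse_records` reports `None`, which is how large collections are paged without the provider holding per-client state.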

47 Cluster Search Results from CITIDEL

48 Cluster NDLTD-Computing

49 Naren Ramakrishnan and Saverio Perugini (U. Dayton)
CITIDEL + PIPE
Adds interaction personalization to CITIDEL.
Automatically handles multimodal conversion to cell phones, PDAs, etc.
Can be adapted to any digital data set; it requires only an XML file of the content with its hierarchy maintained.

50 OCKHAM Library Network (NSDL)

51 OCKHAM (Ming Luo) Simplicity (a la OCCAM’s razor)
Supported by Mellon and DLF
Four main ideas: components; lightweight protocols; open reference models (e.g., 5S, OAIS); community perspective and involvement
Funded by NSF in NSDL, with P2P, with Emory, Notre Dame, Oregon State, …

52 OCKHAM Proposed Services
Alerting, Browsing, Cataloging, Conversion (OAI – Z39.50), Pathfinding, Registry (plus others, such as adapted ODL components)

53 A Digital Library Case Study
Project: Networked Digital Library of Theses & Dissertations (NDLTD) (supported by Ming Luo)
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission:
Collection:

54

55

56

57 OCLC SRU Interface => Dr. A.K. Tyagi

58

59

60 ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

61 LOCKSS Extensions: Bing Liu, Xiaoyu Zhang, Ji-Sun Kim
Lots Of Copies Keep Stuff Safe: Stanford (Vicky Reich)
Initial focus on lower levels, journals
Shift to OAI, especially for ETDs
Collaboration with Emory (Martin Halbert)
NDIIPP: AmericanSouth, MetaArchive
Help deploy and adapt; apply in other contexts
Another registry
Set of publisher manifests (information providers)
Set of storage systems (archival storage)

62 Hussein Suleman (Cape Town, South Africa)
[Figure: an open digital library over program, document, image, and video collections, with components connected through OA PMH and XPMH]

63 Example Open Digital Library
[Figure: ETD DL for the Networked Digital Library of Theses and Dissertations. ETD collections (ETD-1 to ETD-4, holding program, document, image, and video content) feed ODLUnion; ODLFilter, ODLSearch, ODLBrowse, and ODLRecent components, connected via PMH, serve a user interface for students and researchers.]
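The figure above shows small components chained so that each one's output feeds the next. The sketch below imitates that composition in-process: plain method calls stand in for the extended OAI-PMH requests the real components exchange, and the class names are hypothetical, loosely modeled on ODLUnion, ODLFilter, and ODLSearch.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class CollectionComponent:
    """A source collection (e.g., one ETD archive) exposing its records."""
    def __init__(self, records: List[Record]):
        self.records = records

    def list_records(self) -> List[Record]:
        return list(self.records)


class UnionComponent:
    """Merge several sources, keeping the first record seen for each id."""
    def __init__(self, *sources):
        self.sources = sources

    def list_records(self) -> List[Record]:
        merged: Dict[str, Record] = {}
        for source in self.sources:
            for r in source.list_records():
                merged.setdefault(r["id"], r)  # de-duplicate across archives
        return list(merged.values())


class FilterComponent:
    """Pass through only the records that satisfy a predicate."""
    def __init__(self, source, keep: Callable[[Record], bool]):
        self.source = source
        self.keep = keep

    def list_records(self) -> List[Record]:
        return [r for r in self.source.list_records() if self.keep(r)]


class SearchComponent:
    """Case-insensitive substring search over titles; returns record ids."""
    def __init__(self, source):
        self.source = source

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return [r["id"] for r in self.source.list_records() if term in r["title"].lower()]
```

Because every component exposes the same `list_records` interface, any of them can act as the source of another, which is what lets the same pieces be recombined into different DLs.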

64 Open Digital Library Deployments
NDLTD
Computer Science Teaching Center
Computing and Information Technology Interactive Digital Educational Library
Open Archives Distributed (NSF, DFG): enhancements to PhysNet
OCKHAM
Open to others through DL-in-a-box

65

66

67 Virginia Tech, Blacksburg, VA 24061 USA
Interest-based User Grouping Model for Collaborative Filtering in Digital Libraries
7th ICADL 2004, Shanghai, P.R. China, Dec. 15, 2004
Edward A. Fox, Seonho Kim, Virginia Tech, Blacksburg, VA USA

68 Some Other Students/Projects
Wensi Xi: matrices, reinforcement, clusters (Microsoft)
Paul Mather: modeling/simulation of large DLs on clusters; characterization: uses, files (NASA)
Ming Luo: personalization aided by demographics
Ryan Richardson: CLIR with concept maps
Xiaoyan Yu: Stepping Stones and Pathways (NSF; Fernando Das Neves completed and returned to Argentina)
Baoping Zhang: physics and classification (NSF, DFG)
Several: TREC with GP
New projects: superimposed information with PSU (NSF NSDL); quality, metasearch, and structure with Emory (IMLS)

69 Conclusion
Many DL/IR areas, projects, and students:
Theory; architecture; modeling and simulation
Systems development and testing to validate the above and demonstrate innovations
Users, interfaces, visualization, usability
Special thanks to AOL for 4 years of Fellowships!

70 Further
Install CiteSeer in the small (from PSU)?
PNNL visualization tools (from Battelle)?
Develop a proposer's assistant?

