1
Managing Change on the Web
Luis Francisco-Revilla, Frank M. Shipman, Richard Furuta, Unmil Karadkar, Avital Arora
Center for the Study of Digital Libraries, Texas A&M University
2
What is this talk about? A system approach to help in managing digital libraries with collections of fluid resources with distributed location and ownership
3
Modern paradigms of digital libraries Pointers rather than the resources Web-based collections NSDL (http://www.ehr.nsf.gov/due/programs/nsdl/) Meta-documents High fluidity Changes vary in relevance Little system aid for assessing relevance of changes
4
This is a problem everybody has: bookmark lists, Yahoo! catalogues, search engine indices
5
Related work
David Johnson, PhD Dissertation, University of Washington: document distance; weighted, asymmetric
Change monitoring systems: AIDE, URL Minder, WatzNew; fine-grained yes/no detection; WebWatcher (evolving)
“Interesting” identification: Syskill & Webert, Do-I-Care-Agent, Letizia; personal, reader specific, profile-based
6
Motivation Managing Walden’s Paths collection Paths are meta-documents Sequential arrangement of Web pages Rhetorically coherent Contextualized Distributed ownership Distributed authorship Continuous revision of the collection
7
Mechanisms for addressing the issue Caching the pages Caching strategies Some changes are desirable Fluid paths Ephemeral paths Rhetorical coherence
8
The real issue
Mechanisms only allowed limited reaction to changes.
Detecting changes is easy, but determining their relevance is difficult.
Humans are still required to determine the significance of changes.
In order to react to changes, an assessment of their relevance is required.
9
The perception of change (overview)
Observe how humans perceive changes of Web pages
Inform and evaluate the approach and design
Questions
1. Do people view the same changes differently when given different amounts of time?
2. What kinds of changes are easily perceived?
3. Of what kinds of changes do users want to be notified?
10
Kinds of change Content changes (what) Presentation changes (how) Structural changes (linking) Behavioral changes
11
Results and implications
Presentation changes were usually perceived as irrelevant.
The desire for notification and the perception of overall change increased with the degree of content change.
Time played a larger role in the perception of structural changes than of content changes.
As the degree of structural change increased, so did the desire for notification.
Links are useful metrics.
12
Path Manager: the system Java based Paths or bookmark lists HTML pages Functional state of the document Original Valid Last-time
13
Algorithms
Variation of Johnson: weighted sum of additions, deletions and modifications for each metric; added metric for structure changes; flexible; asymmetric; lacks normalization
Proportional: determines the proportion of modification for each metric; simple; symmetrical; normalized
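The contrast between the two scoring styles can be illustrated with a minimal Python sketch. The slides do not give the formulas, so everything here is an assumption: each metric (for example, the links on a page) is modelled as a set of items, the function names and weights are hypothetical, and a modified item simply appears as one deletion plus one addition. The proportional score shown is a Jaccard-style distance, swapped in as one plausible normalized, symmetric measure rather than the one Path Manager uses.

```python
def weighted_change(old, new, w_add=1.0, w_del=2.0):
    """Weighted sum of additions and deletions for one metric.

    Hypothetical weights: deletions count twice as much as additions,
    so the score is asymmetric and has no upper bound (not normalized).
    """
    added = len(new - old)      # items present only in the new version
    deleted = len(old - new)    # items present only in the old version
    return w_add * added + w_del * deleted


def proportional_change(old, new):
    """Proportion of the metric's items that changed, in [0, 1].

    Symmetric: swapping old and new gives the same score.
    """
    union = old | new
    if not union:
        return 0.0              # two empty versions: nothing changed
    return len(old ^ new) / len(union)


old_links = {"a", "b", "c"}
new_links = {"b", "c", "d", "e"}
print(weighted_change(old_links, new_links))      # 4.0
print(weighted_change(new_links, old_links))      # 5.0
print(proportional_change(old_links, new_links))  # 0.6
```

In this example the weighted score depends on the direction of comparison (4.0 versus 5.0), while the proportional score is 0.6 either way, which mirrors the asymmetric/symmetric distinction on the slide.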
14
Initial interface
15
Overall change relevance assessment
16
Document signatures Paragraph checksums Headlines Links Keywords Global checksum
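A document signature of this kind might be computed roughly as in the following Python sketch. It is an illustration under stated assumptions, not the Path Manager implementation (which is Java based): the class and function names are hypothetical, MD5 stands in for whatever checksum the system uses, headlines are taken from `h1` to `h6` tags, and keywords are read from a `<meta name="keywords">` tag.

```python
import hashlib
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def checksum(text):
    """Hex digest used as a checksum (MD5 is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class SignatureParser(HTMLParser):
    """Collects paragraph text, headlines, links and meta keywords."""

    def __init__(self):
        super().__init__()
        self.paragraphs, self.headlines = [], []
        self.links, self.keywords = [], []
        self._open_tag = None   # the <p> or heading currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "p" or tag in HEADINGS:
            self._open_tag, self._buffer = tag, []

    def handle_data(self, data):
        if self._open_tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            text = " ".join("".join(self._buffer).split())  # normalize whitespace
            if text:
                target = self.paragraphs if tag == "p" else self.headlines
                target.append(text)
            self._open_tag = None


def document_signature(html):
    """Build a signature with the five components listed on the slide."""
    parser = SignatureParser()
    parser.feed(html)
    parser.close()
    return {
        "paragraph_checksums": [checksum(p) for p in parser.paragraphs],
        "headlines": parser.headlines,
        "links": parser.links,
        "keywords": parser.keywords,
        "global_checksum": checksum(html),
    }
```

Comparing two signatures then avoids storing full page copies: an unchanged global checksum means no change at all, while differing paragraph checksums, headlines, or links point to where a change occurred.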
17
View of change metrics
18
Detailed view of page metrics
19
Path information
20
Web page retrieval and connectivity Potentially slow and unpredictable Parallel retrieval Multi-threaded Multiple attempts and retries Different states Connection state Retrieval state Analysis state
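Parallel, multi-threaded retrieval with retries can be sketched in Python as follows. This is a minimal sketch under assumptions, not the system's actual Java code: the function names, the default of three attempts, the worker count, and the two result states ("retrieved" and "failed") are all hypothetical, and the fetch function is passed in as a parameter so that the retry logic can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def default_fetch(url, timeout=10):
    """Download one page; raises an exception on connection or HTTP errors."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, fetch=default_fetch, attempts=3):
    """Try a URL up to `attempts` times and report the final state."""
    last_error = None
    for _ in range(attempts):
        try:
            return {"url": url, "state": "retrieved", "body": fetch(url)}
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
    return {"url": url, "state": "failed", "error": last_error}


def retrieve_all(urls, fetch=default_fetch, attempts=3, max_workers=8):
    """Retrieve all pages in parallel; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_with_retries(u, fetch, attempts), urls))
```

Because each page is fetched in its own worker thread, one slow or unreachable server delays only its own entry rather than the whole path, and a page that still fails after the last attempt is reported as failed instead of aborting the run.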
21
Challenges and limitations Heuristic identification of document structure (e.g., headings) Indirection Behavior Dynamic pages
22
Conclusions
Managing distributed collections of documents remains challenging and time-consuming, and still requires human assistance.
The Path Manager supports the maintenance of collections of Web pages by recognizing, evaluating and informing the user of relevant changes, and by keeping track of the original, valid and last-time states of Web pages.
The study indicated a desire for structural changes to be included in the determination of overall change.
23
Contact information
Luis Francisco-Revilla l0f0954@csdl.tamu.edu
Frank M. Shipman, III shipman@csdl.tamu.edu
Richard Furuta furuta@csdl.tamu.edu
Unmil Karadkar unmil@csdl.tamu.edu
Avital Arora avital@csdl.tamu.edu