Published by Meredith Heath. Modified over 9 years ago.
1
An XML Log Standard and Tool for Digital Library Logging Analysis Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox Virginia Tech
2
Outline
  Motivation
  Related Work
    Problems with existing DL logs
  The Digital Library Standardized Log Format
    DL log standard design
    DL log format structure
  DL log tool and its implementation
  Conclusions and future work
3
Motivation
Log analysis
  Source of information about:
    How patrons really use DL services
    How systems behave while supporting users' information-seeking activities
  Examples: patterns
  Used to:
    Evaluate
    Enhance services
    Help design user interfaces
    Better allocate resources
Common practice in the web setting
  Supported by web servers, proxy caching
4
Motivation (cont.)
DLs differ from the web
  DL collections are explicitly organized, described, managed, and preserved
  Users have more specific tasks and needs
  Digital objects and collections are more structured
DL logging should offer much richer information and opportunities
  Tradeoff: user privacy
Current DL logs
  Differences in formats and recorded information
  Problems:
    Lack of interoperability
    No reuse of analysis tools
    Limited comparability of log analysis results
5
Related Work
Web servers (Common Log Format)
  Focused on browsing, stateless
bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] "GET /~harley/courses.html HTTP/1.0" 200 1734
bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:22 -0400] "GET /~harley/clip_art/word_icon.gif HTTP/1.0" 200 1050
www4.e-softinc.com - - [22/Oct/1998:00:20:27 -0400] "HEAD / HTTP/1.0" 200 0
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/capehatteras.html HTTP/1.0" 200 328
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0" 200 20222
eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "GET /~tjohnson/pinouts/ HTTP/1.0" 200 26994
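To make the "stateless, browsing-focused" point concrete, here is a minimal sketch of parsing one Common Log Format line of the kind shown on this slide. The function name `parse_clf` and the field names in the returned dictionary are illustrative choices, not part of any standard tool. Note what the record contains: a host, a request, a status, and a byte count, with nothing about sessions, queries, or collections.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 22/Oct/1998:00:20:21 -0400
    # (month abbreviations are parsed with the current locale; English assumed)
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" size means no body was returned
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = ('bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] '
          '"GET /~harley/courses.html HTTP/1.0" 200 1734')
record = parse_clf(sample)
```

Every request is an independent line, so reconstructing a user's search session from such records requires heuristics outside the format itself.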
6
Related Work (cont.)
DL: Greenstone
ADMINISTRATION 37
/fast-cgi-bin/niupepalibrary (a) its-www1.massey.ac.nz (b) [Thu Dec 07 23:47:00 NZDT 2000] (c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, e=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z=130.123.128.4-950647871) (d) "Mozilla/4.08 [en] (Win95; I ;Nav)"
7
Related Work (cont.)
Search engine: OpenText
Mon Sep 28 17:48:42 1998 ----- Starting Search -----
Mon Sep 28 17:48:42 1998 {Transaction Begin}
Mon Sep 28 17:48:42 1998 {RankMode Relevance1}
Mon Sep 28 17:48:42 1998 "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 P0 = "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 R = (*D including (*P0))
Mon Sep 28 17:48:42 1998 R = (((*R rankedby *P0)))
Mon Sep 28 17:48:42 1998 S = (subset.1.10 (*R))
Mon Sep 28 17:48:42 1998 SL0 = (region "OTSummary" within.1 (*S))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.1.1 *S ))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.2.1 *S ))
Mon Sep 28 17:48:42 1998 {Transaction End}
8
Related Work (cont.) Problems with existing DL logs Incompatibility Incompleteness Complexity of analysis Lack of organization Ambiguity Inflexibility Verboseness
9
The Digital Library Standardized Log Format
  Comprehensive
  Reflective of the actual DL system behavior
  Easily readable
  Precise
  Flexible enough to accommodate varying systems
  Succinct enough to be implemented
Concern: user privacy
10
The Digital Library Standardized Log Format- Design (cont.) Capture high level user and system behaviors Hierarchical organization Encapsulated in transactions Interactions between the users and the system or among the system components Log format designed to record a number of different kinds of transactions Examples: 1. Login to the system 2. Submission of search query 3. Browsing a result list 4. Recording of a user failure
11
The Digital Library Standardized Log Format- Design (cont.) Design Reflective of DL behavior Based on the 5S formal theory Unifying, mathematical theory to formally describe the semantics of DL components Guidance for how to organize the log structure
12
The Digital Library Standardized Log Format- Design (cont.)
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
13
The Digital Library Standardized Log Format (cont.) Specification Collection of an extensive, flat set of attributes: query event registering transaction session error browse action timestamp Machine information help search update Sorting rule search catalog collection Result cutoff response
14
The Digital Library Standardized Log Format - Specification Organization in structured logical way XML- XML Schema Standard syntax Guarantee quality, correctness Rich set of basic types help standardization Abundance of XML parsers helps construction of analysis tools
15
The Digital Library Standardized Log Format - Structure Top Level Hierarchy Log Log Entry Transaction SessionId MachineInfo TimeStamp Statement...
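The top-level hierarchy can be sketched by building one entry with a standard XML library. This is an illustration under stated assumptions, not the published schema: the element names `Log`, `Transaction`, `SessionId`, `MachineInfo`, `TimeStamp`, and `Statement` come from the slide, but the exact nesting, the `IpAddress` and `Port` children, and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_transaction(session_id, ip_address, port, timestamp):
    """Build one Transaction element (nesting is an assumption, see lead-in)."""
    transaction = ET.Element("Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    machine = ET.SubElement(transaction, "MachineInfo")
    # Hypothetical children; the slide does not list MachineInfo's contents
    ET.SubElement(machine, "IpAddress").text = ip_address
    ET.SubElement(machine, "Port").text = str(port)
    ET.SubElement(transaction, "TimeStamp").text = timestamp
    # Left empty here; a real entry would carry one of the statement types
    ET.SubElement(transaction, "Statement")
    return transaction

log = ET.Element("Log")
log.append(make_transaction("session-001", "192.0.2.10", 8000,
                            "2002-05-31T20:10:55-05:00"))
xml_text = ET.tostring(log, encoding="unicode")
```

Because every entry shares the same envelope (session, machine, time), an analysis tool can group and order entries without knowing anything about the statement inside.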
16
The Digital Library Standardized Log Format – Structure (cont.) Decomposition of statement into different types AdmInfo Statement SessionInfo Event ErrorInfo HelpInfo RegisterInfo
17
The Digital Library Standardized Log Format – Structure (cont.)
Decomposition of event
Statement: AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo
Event: Action, StatusInfo
Action: Search, Browse, StoreSysInfo, Update
18
The Digital Library Standardized Log Format – Structure (cont.) Search Attributes Search QueryString TimeFrame PresentationInfo SearchBy Format NumberOfResults SortBy CutOff Collection Catalog
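One payoff of recording searches as structured elements is that an analysis question such as "which queries are most frequent?" becomes a few lines of XML processing. The sketch below assumes a log whose `Search` elements carry `QueryString` and `Collection` children, as named on the slide; the surrounding nesting and the sample data are invented for illustration, and `query_counts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample log; only the element names are taken from the slides
SAMPLE_LOG = """
<Log>
  <Transaction><Statement><Event><Action><Search>
    <Collection>Dirline</Collection>
    <QueryString>low back pain</QueryString>
  </Search></Action></Event></Statement></Transaction>
  <Transaction><Statement><Event><Action><Search>
    <Collection>Dirline</Collection>
    <QueryString>Low Back Pain</QueryString>
  </Search></Action></Event></Statement></Transaction>
  <Transaction><Statement><Event><Action><Search>
    <Collection>Dirline</Collection>
    <QueryString>migraine</QueryString>
  </Search></Action></Event></Statement></Transaction>
</Log>
"""

def query_counts(xml_text):
    """Count normalized query strings across all Search elements."""
    root = ET.fromstring(xml_text.strip())
    # iter() finds Search at any depth, so the assumed nesting does not matter
    return Counter(
        (search.findtext("QueryString") or "").strip().lower()
        for search in root.iter("Search")
    )

counts = query_counts(SAMPLE_LOG)
```

The same question asked of the Greenstone or OpenText logs shown earlier would need a hand-written parser for each format.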
19
DL Log Tool and Implementation
Java classes
  XMLLogData: stores data
  XMLLogManager: methods to read and write log information according to the format
  Synchronized reads and writes: avoid conflicts and inconsistencies
Middleware for plugging the DL log tool into a target system
  Events based on target system architecture and implementation
Implemented in the MARIAN DL system
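The role of synchronized reads and writes can be illustrated with a small sketch. The original tool is a pair of Java classes; this is not that code but a Python analogue under assumptions: the class name `LogManagerSketch`, its method names, and the simplified entry layout are all hypothetical. A single lock guards both appends and snapshots so that concurrent writers cannot interleave and a reader never sees a half-built log.

```python
import threading
import xml.etree.ElementTree as ET

class LogManagerSketch:
    """Hypothetical analogue of a log manager with synchronized access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._root = ET.Element("Log")

    def write_log_entry(self, session_id, statement_text):
        # Build the entry outside the lock; only the shared append is guarded
        entry = ET.Element("Transaction")
        ET.SubElement(entry, "SessionId").text = session_id
        ET.SubElement(entry, "Statement").text = statement_text
        with self._lock:
            self._root.append(entry)

    def get_log_data(self):
        # Serialize under the lock so the snapshot is internally consistent
        with self._lock:
            return ET.tostring(self._root, encoding="unicode")

manager = LogManagerSketch()
threads = [
    threading.Thread(target=manager.write_log_entry, args=(f"s{i}", "login"))
    for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
snapshot = manager.get_log_data()
```

Keeping entry construction outside the lock shortens the critical section, which matters when many user-facing threads log at once.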
20
DL Log Tool and Implementation (cont.): the MARIAN DL system Database Layer Search Layer User Interaction Layer Data Analysis, Collection Builders & Loading Tools Webgate Semantic networks persistent storage Generalized inverted index interfaces DL Information networks characterization, indexing and loading Tailored DL Infrastructure generation Database management API Searcher community Semantic network Management API Fusion modules Distributed client communication Structured logging Customization and personalization Query history Multilingual support
21
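DL Log Tool and Implementation (cont.)
[Figure: log middleware attached to the MARIAN User Layer. Labels: XMLLogManager, XMLLogData, Log, c1, c2; calls writeLogEntry (parameters), storelogData (parameters), getLogData (parameters) returning logData; inputs are system events and user events; a DL patron uses the system, and a DL analyst sends an analysis request to an analysis tool and receives a result.]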
22
DL Log Tool and Implementation (cont.) Example 1: Login to the system 987654usr3 Start mhabib 2002-05-31T20:10:55.000-05:00 128.173.244.56 8000
23
DL Log Tool and Implementation (cont.)
Example 2: query all Dirline records about “low back pain”
... Dirline CommunityRecord SearchByAnyParts NonPersistant low back pain 2002-05-31T20:11:07.000-05:00 2002-05-31T20:11:09.000-05:00 List ByRank 217 20...
24
DL Log Tool and Implementation (cont.)
Example 3: Browse an item of the ranked list returned as an answer to the previous search
987654usr3... 5114 University of Washington School of Medicine Multidisciplinary Pain Center (UWPC)...
25
In conclusion Analysis of current DL log formats Need for standardization, common practices, interoperable tools Designed an XML-based log format standard for DL logging analysis Captures a rich, detailed set of system and user behaviors. Implemented format in a log component tool Connected to the MARIAN DL system
26
Future Work Build suite of Components for Evaluation Use log format and tools to evaluate several projects Networked Digital Library of Theses and Dissertations (NDLTD) CITIDEL Broadening the scope of use to other NSDL projects Extend and use log tool with other DL systems and architectures Consider user privacy issues Explore info for personalization
27
Future Work (cont.)
Crosswalks to other standards (e.g. CLF)
  “Not yet other standard”
More challenges
  Distributed logs
  Large settings
  Investigate compression issues to deal with XML verbosity
Promote discussions
  Listserv: dl-log-l@listserv.vt.edu
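As a rough illustration of why compression is a natural answer to XML verbosity, the sketch below gzips a synthetic log made of one repeated entry. The entry layout and values are invented, and a log of identical entries is a best case: real logs with varying session IDs, timestamps, and query strings will compress less, though the repeated tag names still compress well.

```python
import gzip

# Synthetic entry; element names follow the slides, values are invented
entry = ("<Transaction><SessionId>s1</SessionId>"
         "<TimeStamp>2002-05-31T20:10:55-05:00</TimeStamp>"
         "<Statement/></Transaction>")
raw = ("<Log>" + entry * 1000 + "</Log>").encode("utf-8")

compressed = gzip.compress(raw)
# Fraction of the original size that remains after compression
ratio = len(compressed) / len(raw)
```

A practical consequence is that logs can stay verbose and readable on the analyst's side while being stored or shipped in compressed form.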