An XML Log Standard and Tool for Digital Library Logging Analysis. Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox, Virginia Tech.


1 An XML Log Standard and Tool for Digital Library Logging Analysis Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox Virginia Tech

2 Outline
  Motivation
  Related Work
  Problems with existing DL logs
  The Digital Library Standardized Log Format
    DL log standard design
    DL log format structure
  DL log tool and its implementation
  Conclusions and future work

3 Motivation
  Log analysis
    Source of information about:
      How patrons really use DL services
      How systems behave while supporting users' information-seeking activities
      Examples: patterns
    Used to:
      Evaluate and enhance services
      Help design user interfaces
      Allocate resources better
    Common practice in the web setting
      Supported by web servers, proxy caches

4 Motivation (cont.)
  DLs differ from the web
    DL collections are explicitly organized, described, managed, and preserved
    Users have more specific tasks and needs
    Digital objects and collections are more structured
    Consequence: DL logging should offer much richer information and opportunities
    Tradeoff: user privacy
  Current DL logs
    Differences in formats and recorded information
    Problems:
      Lack of interoperability
      No reuse of analysis tools
      Log analysis results are hard to compare

5 Related Work
  Web Servers (Common Log Format)
    Focused on browsing, stateless
  bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] "GET /~harley/courses.html HTTP/1.0" 200 1734
  bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:22 -0400] "GET /~harley/clip_art/word_icon.gif HTTP/1.0" 200 1050
  www4.e-softinc.com - - [22/Oct/1998:00:20:27 -0400] "HEAD / HTTP/1.0" 200 0
  user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/capehatteras.html HTTP/1.0" 200 328
  user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0" 200 20222
  eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "GET /~tjohnson/pinouts/ HTTP/1.0" 200 26994
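The Common Log Format entries above can be split into fields with a short parser. The following is a minimal Python sketch, not part of the original presentation: the regular expression, the function name parse_clf_line, and the dictionary keys are illustrative assumptions, and the sample line is the first entry shown on the slide.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
# (pattern and field names are assumptions made for this sketch)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields into typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = ('bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] '
          '"GET /~harley/courses.html HTTP/1.0" 200 1734')
entry = parse_clf_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Each record carries only the host, time, request, status, and size of a single HTTP request; nothing in it ties requests into a session or a search, which is the statelessness the slide points out.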

6 Related Work (cont.)
  DL: Greenstone
  ADMINISTRATION 37
  /fast-cgi-bin/niupepalibrary
  (a) its-www1.massey.ac.nz
  (b) [Thu Dec 07 23:47:00 NZDT 2000]
  (c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, e=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z=130.123.128.4- 950647871)
  (d) "Mozilla/4.08 [en] (Win95; I ;Nav)"

7 Related Work (cont.)
  Search Engine: OpenText
  Mon Sep 28 17:48:42 1998 ----- Starting Search -----
  Mon Sep 28 17:48:42 1998 {Transaction Begin}
  Mon Sep 28 17:48:42 1998 {RankMode Relevance1}
  Mon Sep 28 17:48:42 1998 "Bacillus thuringiensis "
  Mon Sep 28 17:48:42 1998 P0 = "Bacillus thuringiensis "
  Mon Sep 28 17:48:42 1998 R = (*D including (*P0))
  Mon Sep 28 17:48:42 1998 R = (((*R rankedby *P0)))
  Mon Sep 28 17:48:42 1998 S = (subset.1.10 (*R))
  Mon Sep 28 17:48:42 1998 SL0 = (region "OTSummary" within.1 (*S))
  Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.1.1 *S ))
  Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.2.1 *S ))
  Mon Sep 28 17:48:42 1998 {Transaction End}

8 Related Work (cont.)
  Problems with existing DL logs
    Incompatibility
    Incompleteness
    Complexity of analysis
    Lack of organization
    Ambiguity
    Inflexibility
    Verboseness

9 The Digital Library Standardized Log Format
  Comprehensive
  Reflective of the actual DL system behavior
  Easily readable
  Precise
  Flexible enough to accommodate varying systems
  Succinct enough to be implemented
  Concern: user privacy

10 The Digital Library Standardized Log Format - Design (cont.)
  Capture high-level user and system behaviors
  Hierarchical organization
  Encapsulated in transactions
    Interactions between the users and the system, or among the system components
  Log format designed to record a number of different kinds of transactions
    Examples:
      1. Login to the system
      2. Submission of a search query
      3. Browsing a result list
      4. Recording of a user failure

11 The Digital Library Standardized Log Format - Design (cont.)
  Design reflective of DL behavior
  Based on the 5S formal theory
    Unifying, mathematical theory to formally describe the semantics of DL components
    Guidance for how to organize the log structure

12 The Digital Library Standardized Log Format - Design (cont.)
  5S | Definition | Use in Log Design
  Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
  Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
  Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
  Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
  Societies | Sets of communities and relationships among them | User information

13 The Digital Library Standardized Log Format (cont.)
  Specification
    Collection of an extensive, flat set of attributes:
    query event registering transaction session error browse action timestamp Machine information help search update Sorting rule search catalog collection Result cutoff response

14 The Digital Library Standardized Log Format - Specification
  Organization in a structured, logical way
  XML and XML Schema
    Standard syntax
    Guarantee quality, correctness
    Rich set of basic types helps standardization
    Abundance of XML parsers helps construction of analysis tools

15 The Digital Library Standardized Log Format - Structure
  Top-level hierarchy
  [Diagram nodes: Log, Log Entry, Transaction, SessionId, MachineInfo, TimeStamp, Statement, ...]
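The top-level hierarchy can be illustrated by building one entry with a standard XML library. This is a minimal Python sketch under stated assumptions: the element names Log, Transaction, SessionId, MachineInfo, TimeStamp, and Statement come from the slide, but the exact nesting (one Transaction element per log entry), the MachineInfo children IpAddress and Port, and all sample values are hypothetical and not taken from the actual schema.

```python
import xml.etree.ElementTree as ET


def build_log_entry(session_id, ip_address, port, timestamp):
    """Build one Transaction element (nesting is an assumption of this sketch)."""
    transaction = ET.Element("Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    machine = ET.SubElement(transaction, "MachineInfo")
    # IpAddress and Port are hypothetical child names, not from the schema.
    ET.SubElement(machine, "IpAddress").text = ip_address
    ET.SubElement(machine, "Port").text = str(port)
    ET.SubElement(transaction, "TimeStamp").text = timestamp
    # The Statement is left empty here; the slides decompose it further.
    ET.SubElement(transaction, "Statement")
    return transaction


log = ET.Element("Log")
log.append(build_log_entry("session-1", "192.0.2.10", 8000,
                           "2002-05-31T20:10:55.000-05:00"))
xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)
```

Because the result is ordinary XML, any XML parser can read it back, which is the property the specification slide relies on for building analysis tools.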

16 The Digital Library Standardized Log Format – Structure (cont.)
  Decomposition of Statement into different types:
  AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo

17 The Digital Library Standardized Log Format – Structure (cont.)
  Decomposition of Event
  [Diagram nodes: Statement with AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo; below Event: Action, StatusInfo, Search, Browse, StoreSysInfo, Update]

18 The Digital Library Standardized Log Format – Structure (cont.)
  Search attributes
  [Diagram nodes: Search, QueryString, TimeFrame, PresentationInfo, SearchBy, Format, NumberOfResults, SortBy, CutOff, Collection, Catalog]

19 DL Log Tool and Implementation
  Java classes
    XMLLogData: stores data
    XMLLogManager: methods to read and write log information according to the format
    Synchronized reads and writes: avoid conflicts and inconsistencies
  Middleware to plug the DL log tool into the target system
    Events based on target system architecture and implementation
  Implemented in the MARIAN DL system
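The idea of synchronized reads and writes can be sketched as follows. The original tool is written in Java; this Python sketch is only a hypothetical stand-in, and the names LogData, LogManager, write_log_entry, and get_log_data, as well as the in-memory list used for storage, are assumptions made for illustration. A single lock guards every read and write so that concurrent callers never observe a partially updated log.

```python
import threading
from dataclasses import dataclass


@dataclass
class LogData:
    """Hypothetical stand-in for XMLLogData: one log record."""
    session_id: str
    timestamp: str
    statement: str


class LogManager:
    """Hypothetical stand-in for XMLLogManager with lock-protected access."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def write_log_entry(self, data):
        # Only one thread at a time may modify the log.
        with self._lock:
            self._entries.append(data)

    def get_log_data(self, session_id=None):
        # Readers take the same lock and receive a copy of the matching entries.
        with self._lock:
            return [e for e in self._entries
                    if session_id is None or e.session_id == session_id]


manager = LogManager()


def worker(worker_id):
    # Each worker simulates one session writing 100 events.
    for i in range(100):
        manager.write_log_entry(LogData(f"s{worker_id}", f"t{i}", "Event"))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(manager.get_log_data()))  # 400
```

In Java the same effect would typically come from synchronized methods; the lock here plays that role.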

20 DL Log Tool and Implementation (cont.): the MARIAN DL system
  [Architecture diagram. Layers: Database Layer; Search Layer; User Interaction Layer. Other labels: Data Analysis, Collection Builders & Loading Tools; Webgate; Semantic networks persistent storage; Generalized inverted index interfaces; DL Information networks characterization, indexing and loading; Tailored DL Infrastructure generation; Database management API; Searcher community; Semantic network Management API; Fusion modules; Distributed client communication; Structured logging; Customization and personalization; Query history; Multilingual support]

21 DL Log Tool and Implementation (cont.)
  [Diagram: the log tool connected to the MARIAN User Layer. Labels: XMLLogManager; writeLogEntry (parameters); c1; XMLLogData; c2; Log middleware; System event; storelogData (parameters); User event; Analysis tool; getLogData (parameters); logData; Analysis request; result; DL patron; DL analyst]

22 DL Log Tool and Implementation (cont.)
  Example 1: Login to the system
  987654usr3 Start mhabib 2002-05-31T20:10:55.000-05:00 128.173.244.56 8000

23 DL Log Tool and Implementation
  Example 2: Query all Dirline records about “low back pain”
  ...
  Dirline CommunityRecord SearchByAnyParts NonPersistant low back pain 2002-05-31T20:11:07.000-05:00 2002-05-31T20:11:09.000-05:00 List ByRank 217 20
  ...
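One simple analysis that such an entry supports is measuring how long a search took. The sketch below assumes that the two timestamps in Example 2 mark the start and the end of the search, which the slide does not state explicitly; the function name elapsed_seconds is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start_text, end_text):
    """Return the number of seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    return (end - start).total_seconds()


# Timestamps copied from Example 2; treating them as start and end is an assumption.
duration = elapsed_seconds("2002-05-31T20:11:07.000-05:00",
                           "2002-05-31T20:11:09.000-05:00")
print(duration)  # 2.0
```

Because the format records timestamps with an explicit UTC offset, entries logged by systems in different time zones can be compared directly.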

24 DL Log Tool and Implementation
  Example 3: Browse an item of the ranked list returned as an answer to the previous search
  987654usr3
  ...
  5114 University of Washington School of Medicine Multidisciplinary Pain Center (UWPC)
  ...

25 In conclusion
  Analysis of current DL log formats
    Need for standardization, common practices, interoperable tools
  Designed an XML-based log format standard for DL logging analysis
    Captures a rich, detailed set of system and user behaviors
  Implemented the format in a log component tool
    Connected to the MARIAN DL system

26 Future Work
  Build a suite of components for evaluation
  Use the log format and tools to evaluate several projects
    Networked Digital Library of Theses and Dissertations (NDLTD)
    CITIDEL
    Broadening the scope of use to other NSDL projects
  Extend and use the log tool with other DL systems and architectures
  Consider user privacy issues
  Explore information for personalization

27 Future work
  Crosswalks to other standards (e.g. CLF)
    “Not yet other standard”
  More challenges
    Distributed logs
    Large settings
    Investigate compression to deal with XML verboseness
  Promote discussions
    Listserv: dl-log-l@listserv.vt.edu

