An XML Log Standard and Tool for Digital Library Logging Analysis Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox Virginia Tech

Presentation transcript:

An XML Log Standard and Tool for Digital Library Logging Analysis Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox Virginia Tech

Outline
- Motivation
- Related Work
- Problems with existing DL logs
- The Digital Library Standardized Log Format
- DL log standard design
- DL log format structure
- DL log tool and its implementation
- Conclusions and future work

Motivation
- Log analysis is a source of information about:
  - How patrons really use DL services
  - How systems behave while supporting users' information-seeking activities
  - Examples: patterns
- Used to:
  - Evaluate and enhance services
  - Help design user interfaces
  - Allocate resources better
- Common practice in the web setting
  - Supported by web servers and proxy caches

Motivation (cont.)
- DLs differ from the web
  - DL collections are explicitly organized, described, managed, and preserved
  - Users have more specific tasks and needs
  - Digital objects and collections are more structured
  - -> DL logging should offer much richer information and opportunities
  - -> Tradeoff: user privacy
- Current DL logs
  - Differ in format and in the information recorded
  - Problems:
    - Lack of interoperability
    - No reuse of analysis tools
    - Limited comparability of log analysis results

Related Work
- Web servers (Common Log Format)
  - Focused on browsing, stateless
  - Sample entries:
    bbn-cache-3.cisco.com - - [22/Oct/1998:00:20: ] "GET /~harley/courses.html HTTP/1.0"
    bbn-cache-3.cisco.com - - [22/Oct/1998:00:20: ] "GET /~harley/clip_art/word_icon.gif HTTP/1.0"
    www4.e-softinc.com - - [22/Oct/1998:00:20: ] "HEAD / HTTP/1.0"
    user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20: ] "GET /~lhuang/junior/capehatteras.html HTTP/1.0"
    user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20: ] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0"
    eger-dl01.agria.hu - - [22/Oct/1998:00:20: ] "GET /~tjohnson/pinouts/ HTTP/1.0"
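To make the "focused on browsing, stateless" point concrete, here is a minimal Python sketch that parses one Common Log Format line. The field layout (host, ident, user, timestamp, request, status, size) is the standard CLF layout; the sample line is illustrative and is not one of the slide's entries, which are truncated. The function name `parse_clf` is hypothetical.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Illustrative line, not taken from the slide.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Every field here describes a single HTTP request. Nothing in the record says which session, query, or collection the request belonged to, which is the gap the DL log format is meant to fill.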

Related Work (cont.)
- DL: Greenstone
  /fast-cgi-bin/niupepalibrary
  (a) its-www1.massey.ac.nz
  (b) [Thu Dec 07 23:47:00 NZDT 2000]
  (c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, e=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z= )
  (d) "Mozilla/4.08 [en] (Win95; I ;Nav)"

Related Work (cont.)
- Search engine: OpenText
  Mon Sep 28 17:48: Starting Search
  Mon Sep 28 17:48: {Transaction Begin}
  Mon Sep 28 17:48: {RankMode Relevance1}
  Mon Sep 28 17:48: "Bacillus thuringiensis "
  Mon Sep 28 17:48: P0 = "Bacillus thuringiensis "
  Mon Sep 28 17:48: R = (*D including (*P0))
  Mon Sep 28 17:48: R = (((*R rankedby *P0)))
  Mon Sep 28 17:48: S = (subset.1.10 (*R))
  Mon Sep 28 17:48: SL0 = (region "OTSummary" within.1 (*S))
  Mon Sep 28 17:48: (*SL0 within.1 ( subset.1.1 *S ))
  Mon Sep 28 17:48: (*SL0 within.1 ( subset.2.1 *S ))
  Mon Sep 28 17:48: {Transaction End}

Related Work (cont.)
- Problems with existing DL logs
  - Incompatibility
  - Incompleteness
  - Complexity of analysis
  - Lack of organization
  - Ambiguity
  - Inflexibility
  - Verboseness

The Digital Library Standardized Log Format
- Comprehensive
- Reflective of the actual DL system behavior
- Easily readable
- Precise
- Flexible enough to accommodate varying systems
- Succinct enough to be implemented
- Concern: user privacy

The Digital Library Standardized Log Format - Design (cont.)
- Capture high-level user and system behaviors
- Hierarchical organization
- Encapsulated in transactions
  - Interactions between the users and the system, or among the system components
- Log format designed to record a number of different kinds of transactions
- Examples:
  1. Login to the system
  2. Submission of a search query
  3. Browsing a result list
  4. Recording of a user failure

The Digital Library Standardized Log Format - Design (cont.)
- Design
  - Reflective of DL behavior
  - Based on the 5S formal theory
    - Unifying, mathematical theory to formally describe the semantics of DL components
    - Guidance for how to organize the log structure

The Digital Library Standardized Log Format - Design (cont.)
5S / Definition / Use in Log Design
- Streams
  Definition: Represent static and dynamic multimedia content
  Use in log design: Temporal events, types of digital objects
- Structures
  Definition: Labeled directed graphs; provide organization within the DL
  Use in log design: Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
- Spaces
  Definition: Sets, properties and operations on those sets
  Use in log design: Retrieval mode, presentation information
- Scenarios
  Definition: Sequences of events that modify states of a computation in order to accomplish some functional requirement
  Use in log design: Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
- Societies
  Definition: Sets of communities and relationships among them
  Use in log design: User information

The Digital Library Standardized Log Format (cont.)
- Specification
  - Collection of an extensive, flat set of attributes:
    query, event, registering, transaction, session, error, browse, action, timestamp, machine information, help, search, update, sorting rule, search, catalog, collection, result cutoff, response

The Digital Library Standardized Log Format - Specification
- Organization in a structured, logical way
- XML and XML Schema
  - Standard syntax
  - Guarantees quality and correctness
  - Rich set of basic types helps standardization
  - Abundance of XML parsers helps construction of analysis tools

The Digital Library Standardized Log Format - Structure
- Top-level hierarchy
  [Diagram. Nodes, in the order shown: Log, Log Entry, Transaction, SessionId, MachineInfo, TimeStamp, Statement, ...]
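As a rough illustration of the top-level hierarchy, the following Python sketch builds one transaction with the standard library's `xml.etree.ElementTree`. The element names come from the slide; the nesting (a Log holding Transactions, each with SessionId, MachineInfo, TimeStamp, and Statement children) is an assumption read off the flattened diagram, the "Log Entry" node is left out because its role is not recoverable, and the host name, date, and helper name `build_transaction` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_transaction(session_id, machine_info, statement, when=None):
    """Return a Transaction element; the nesting is assumed, not confirmed."""
    when = when or datetime.now(timezone.utc)
    transaction = ET.Element("Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "MachineInfo").text = machine_info
    ET.SubElement(transaction, "TimeStamp").text = when.isoformat()
    transaction.append(statement)
    return transaction


# Hypothetical statement content: a session start.
statement = ET.Element("Statement")
ET.SubElement(statement, "SessionInfo").text = "Start"

log = ET.Element("Log")
log.append(build_transaction(
    "usr3",
    "host.example.org",  # hypothetical machine name
    statement,
    datetime(2002, 1, 1, 20, 10, tzinfo=timezone.utc),  # hypothetical date
))
print(ET.tostring(log, encoding="unicode"))
```

Because every entry shares the same envelope, an analysis tool can group entries by `SessionId` or sort by `TimeStamp` without knowing what kind of statement each one carries.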

The Digital Library Standardized Log Format - Structure (cont.)
- Decomposition of Statement into different types
  [Diagram. Statement decomposes into: AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo]

The Digital Library Standardized Log Format - Structure (cont.)
- Decomposition of Event
  [Diagram. Statement level: AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo. Below Event: Action, StatusInfo. Lowest level: Search, Browse, Store, SysInfo, Update]

The Digital Library Standardized Log Format - Structure (cont.)
- Search attributes
  [Diagram. Search: QueryString, TimeFrame, PresentationInfo, SearchBy, Format, NumberOfResults, SortBy, CutOff, Collection, Catalog]
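The search attributes can be sketched the same way. In this Python example the element names are taken from the slide, and the values (Dirline, SearchByAnyParts, ByRank, "low back pain") are the ones that survive in the deck's Example 2. Placing SortBy and NumberOfResults under PresentationInfo is an assumption, as are the helper name `build_search` and the result count of 10.

```python
import xml.etree.ElementTree as ET


def build_search(query, collection, search_by, sort_by, number_of_results):
    """Return a Search element; grouping under PresentationInfo is assumed."""
    search = ET.Element("Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "SearchBy").text = search_by
    ET.SubElement(search, "QueryString").text = query
    presentation = ET.SubElement(search, "PresentationInfo")
    ET.SubElement(presentation, "SortBy").text = sort_by
    ET.SubElement(presentation, "NumberOfResults").text = str(number_of_results)
    return search


search = build_search(
    query="low back pain",
    collection="Dirline",
    search_by="SearchByAnyParts",
    sort_by="ByRank",
    number_of_results=10,  # hypothetical value
)

# An analysis tool can read the query back without any string parsing.
print(search.findtext("QueryString"))
print(ET.tostring(search, encoding="unicode"))
```

Compare this with the Greenstone entry earlier in the deck, where the same kind of information sits in a flat run of one- and two-letter CGI arguments.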

DL Log Tool and Implementation
- Java classes
  - XMLLogData: stores the data
  - XMLLogManager: methods to read and write log information according to the format
  - Synchronized reads and writes: avoid conflicts and inconsistencies
- Middleware to plug the log tool into a target system
  - Events based on the target system's architecture and implementation
- Implemented in the MARIAN DL system
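The synchronized reads and writes can be illustrated with a small Python analogue. The original tool is written in Java and its code is not shown in the deck, so this is only a sketch under assumptions: the class name is borrowed from the slide, while the method signatures, the entry layout, and the use of a single `threading.Lock` are hypothetical.

```python
import threading
import xml.etree.ElementTree as ET


class XMLLogManager:
    """Python analogue of the log manager; signatures are hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()
        self._log = ET.Element("Log")

    def write_log_entry(self, session_id, statement_text):
        # Build the entry outside the lock, then append under the lock
        # so concurrent writers cannot interleave.
        entry = ET.Element("Transaction")
        ET.SubElement(entry, "SessionId").text = session_id
        ET.SubElement(entry, "Statement").text = statement_text
        with self._lock:
            self._log.append(entry)

    def get_log_data(self):
        # Readers take the same lock, so they never see a half-written log.
        with self._lock:
            return ET.tostring(self._log, encoding="unicode")


manager = XMLLogManager()
threads = [
    threading.Thread(target=manager.write_log_entry, args=(f"usr{i}", "Login"))
    for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

snapshot = ET.fromstring(manager.get_log_data())
print(len(snapshot.findall("Transaction")))
```

A single lock is the simplest choice and is enough to show the idea; a production logger would more likely queue entries and flush them from one writer thread.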

DL Log Tool and Implementation (cont.): the MARIAN DL system
[Architecture diagram. Layers: Database Layer; Search Layer; User Interaction Layer; Data Analysis, Collection Builders & Loading Tools. Components shown: Webgate; semantic networks persistent storage; generalized inverted index interfaces; DL information networks characterization, indexing and loading; tailored DL infrastructure generation; database management API; searcher community; semantic network management API; fusion modules; distributed client communication; structured logging; customization and personalization; query history; multilingual support]

DL Log Tool and Implementation (cont.)
[Data-flow diagram. Actors: DL patron, DL analyst. Components: MARIAN User Layer, log middleware, XMLLogManager, XMLLogData, analysis tool. Calls: writeLogEntry(parameters), storelogData(parameters), getLogData(parameters). Flows: user event, system event, analysis request, logData, result. Connectors labeled c1 and c2]

DL Log Tool and Implementation (cont.)
- Example 1: Login to the system
  [XML listing; markup lost. Surviving values: usr3 Start mhabib T20:10: :]

DL Log Tool and Implementation
- Example 2: Query all Dirline records about "low back pain"
  [XML listing; markup lost. Surviving values: ... Dirline CommunityRecord SearchByAnyParts NonPersistant low back pain T20:11: : T20:11: :00 List ByRank]

DL Log Tool and Implementation
- Example 3: Browse an item of the ranked list returned as an answer to the previous search
  [XML listing; markup lost. Surviving values: usr University of Washington School of Medicine Multidisciplinary Pain Center (UWPC) ...]

In conclusion
- Analysis of current DL log formats
  - Need for standardization, common practices, interoperable tools
- Designed an XML-based log format standard for DL logging analysis
  - Captures a rich, detailed set of system and user behaviors
- Implemented the format in a log component tool
  - Connected to the MARIAN DL system

Future Work
- Build a suite of components for evaluation
- Use the log format and tools to evaluate several projects
  - Networked Digital Library of Theses and Dissertations (NDLTD)
  - CITIDEL
  - Broaden the scope of use to other NSDL projects
- Extend and use the log tool with other DL systems and architectures
- Consider user privacy issues
- Explore information for personalization

Future Work
- Crosswalks to other standards (e.g. CLF)
  - "Not yet other standard"
- More challenges
  - Distributed logs
  - Large settings
  - Investigate compression issues to deal with XML verboseness
- Promote discussions
  - Listserv:
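On the compression point, a quick Python experiment with the standard `gzip` module shows why general-purpose compression is a natural first thing to try against XML verboseness. The log below is synthetic and hypothetical: one transaction repeated 1000 times, which is far more repetitive than a real log, so the ratio it prints is an upper bound on what to expect rather than a measurement of the actual format.

```python
import gzip

# Synthetic, highly repetitive log; real logs vary more and compress less well.
entry = (
    "<Transaction><SessionId>usr3</SessionId>"
    "<Statement>Login</Statement></Transaction>"
)
raw = ("<Log>" + entry * 1000 + "</Log>").encode("utf-8")

packed = gzip.compress(raw)
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped, ratio {ratio:.3f}")

# Decompression restores the log byte for byte.
restored = gzip.decompress(packed)
```

The repeated tag names are exactly what a dictionary-based compressor removes, so the verbosity costs much less on disk than it does on screen. Whether that holds for distributed or very large logs is the open question the slide raises.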