Logging in Digital Libraries

Presentation transcript:

Logging in Digital Libraries

Last week
– Introduction to quality indicators and the way in which these are formalized and made computable, according to one view
– Making a digital library as good as it can be requires understanding what it is and how it is being used
– Information comes from logs

Another aspect
– A category of quality indicator that comes from seeing what happens when users visit the library
– An important tool: the logs
– All web-based systems have logs of interaction from the outside world to the web server
  – Not specifically designed for digital libraries
– We will look at a proposed standard for digital-library-specific log analysis

This work
– Done by Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox (Virginia Tech)
– and Lillian N. Cassel and Filip Jagodzinski (Villanova)

Motivation
– Log analysis is a source of information about:
  – How patrons really use DL services
  – How systems behave while supporting user information-seeking activities
  – Examples: patterns
– Used to:
  – Evaluate
  – Enhance services
  – Help design user interfaces
  – Better allocate resources
– Common practice in the web setting
  – Supported by web servers, proxy caching

Motivation (cont.)
– DLs differ from the web
  – DL collections are explicitly organized, described, managed, and preserved
  – Users have more specific tasks and needs
  – Digital objects and collections are more structured
– Consequence: DL logging should offer much richer information and opportunities
– Tradeoff: user privacy
– Current DL logs
  – Differences in formats and recorded information
  – Problems:
    – Lack of interoperability
    – No reuse of analysis tools
    – Limited comparability of log analysis results

Related Work
– Problems with existing DL logs
  – Incompatibility
  – Incompleteness
  – Complexity of analysis
  – Lack of organization
  – Ambiguity
  – Inflexibility
  – Verboseness
– Generally, lack of a global view of the need for understanding how the DL is or is not serving its users

The Digital Library Standardized Log Format
– Comprehensive
– Reflective of the actual DL system behavior
– Easily readable
– Precise
– Flexible to accommodate varying systems
– Succinct enough to be implemented
– Concern: user privacy

DL Standardized Log Format Design
– Capture high-level user and system behaviors
  – Hierarchical organization
  – Encapsulated in transactions
– Transactions: interactions between the users and the system or among the system components
– Log format designed to record a number of different kinds of transactions
– Examples:
  – Login to the system
  – Submission of a search query
  – Browsing a result list
  – Recording of a user failure

Log format design (cont.)
– Design
  – Reflective of DL function
  – Based on the 5S formal theory
– 5S: a unifying, mathematical theory to describe formally the semantics of DL components
– Guidance for how to organize the log structure

Log design and 5S

| 5S | Definition | Use in Log Design |
|---|---|---|
| Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects |
| Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme |
| Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information |
| Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios |
| Societies | Sets of communities and relationships among them | User information |

DL Log Format Specification
– Organization in a structured, logical way
  – XML, XML Schema
– Standard syntax
– Guarantee quality, correctness
– Rich set of basic types helps standardization
– Abundance of XML parsers helps construction of analysis tools
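The point about XML parsers can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names `Log`, `Transaction`, `SessionId`, `MachineInfo`, `TimeStamp` and `Statement` come from the structure slides; the exact nesting, the sample values, and the helper name `read_transactions` are assumptions made for illustration, not the published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical log fragment: element names follow the slides, but the
# nesting and all values are illustrative assumptions.
SAMPLE_LOG = """
<Log>
  <Transaction>
    <SessionId>usr3</SessionId>
    <MachineInfo>host-a</MachineInfo>
    <TimeStamp>2000-01-01T00:00:00+00:00</TimeStamp>
    <Statement>Login</Statement>
  </Transaction>
  <Transaction>
    <SessionId>usr7</SessionId>
    <MachineInfo>host-b</MachineInfo>
    <TimeStamp>2000-01-01T00:05:00+00:00</TimeStamp>
    <Statement>Search</Statement>
  </Transaction>
</Log>
"""


def read_transactions(xml_text):
    """Return one dict per Transaction, mapping child tag to its text."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: (child.text or "").strip() for child in tx}
        for tx in root.iter("Transaction")
    ]


transactions = read_transactions(SAMPLE_LOG)
for tx in transactions:
    print(tx["SessionId"], tx["TimeStamp"], tx["Statement"])
```

Because the format is plain XML, an off-the-shelf parser does all the syntactic work and the analysis code only has to know the element names.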

Log Format - Structure
Top-level hierarchy (diagram): Log → Log Entry → Transaction → SessionId, MachineInfo, TimeStamp, Statement, ...

DL Log Format - Structure (cont.)
Decomposition of statement into different types (diagram): Statement → AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo

Log Format - Structure (cont.)
Decomposition of event (diagram):
– Statement → AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo
– Event → Action, StatusInfo
– Action → Search, Browse, StoreSysInfo, Update

DL Log Format Structure (cont.)
Search attributes (diagram nodes):
– Search
– QueryString, TimeFrame, PresentationInfo, SearchBy
– Format, NumberOfResults, SortBy, CutOff
– Collection, Catalog
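As a rough sketch of how a search could be recorded with these attributes, the following builds one entry with `xml.etree.ElementTree`. The element names are taken from the structure slides, but the nesting (`Transaction` > `Statement` > `Event` > `Action` > `Search`, with `Format` and `SortBy` under `PresentationInfo`), the mapping of the sample values to fields, the placeholder timestamp, and the function name `build_search_transaction` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_search_transaction(session_id, timestamp, collection, catalog,
                             search_by, query, fmt, sort_by):
    """Build a hypothetical Transaction element describing one search."""
    tx = ET.Element("Transaction")
    ET.SubElement(tx, "SessionId").text = session_id
    ET.SubElement(tx, "TimeStamp").text = timestamp
    # Assumed nesting: Statement > Event > Action > Search.
    statement = ET.SubElement(tx, "Statement")
    event = ET.SubElement(statement, "Event")
    action = ET.SubElement(event, "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "Catalog").text = catalog
    ET.SubElement(search, "SearchBy").text = search_by
    ET.SubElement(search, "QueryString").text = query
    # Assumed: presentation details are grouped under PresentationInfo.
    presentation = ET.SubElement(search, "PresentationInfo")
    ET.SubElement(presentation, "Format").text = fmt
    ET.SubElement(presentation, "SortBy").text = sort_by
    return tx


entry = build_search_transaction(
    session_id="usr3",
    timestamp="2000-01-01T00:00:00+00:00",  # placeholder, not from the slides
    collection="Dirline",
    catalog="CommunityRecord",
    search_by="SearchByAnyParts",
    query="low back pain",
    fmt="List",
    sort_by="ByRank",
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```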

DL Log Tool Implementation
Architecture diagram:
– DL patron → Digital Library user layer
– User events and system events → log middleware: XMLLogManager, writeLogEntry(parameters)
– XMLLogManager → XMLLogData, storeLogData(parameters)
– DL analyst → analysis request → analysis tool
– Analysis tool → getLogData(parameters) → logData → result
– Connectors labeled c1 and c2
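The flow in this diagram might be sketched as two small classes. The class names `XMLLogManager` and `XMLLogData` and the operations (write a log entry, store log data, get log data) come from the slide; the parameters, the in-memory storage, and the entry layout are assumptions for illustration and not the actual tool's interface.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


class XMLLogData:
    """In-memory stand-in for the log store (assumed, not the real tool)."""

    def __init__(self):
        self._entries = []

    def store_log_data(self, entry):
        self._entries.append(entry)

    def get_log_data(self, session_id=None):
        # With no filter, return a copy of everything recorded so far.
        if session_id is None:
            return list(self._entries)
        return [e for e in self._entries
                if e.findtext("SessionId") == session_id]


class XMLLogManager:
    """Middleware that turns user or system events into XML log entries."""

    def __init__(self, log_data):
        self._log_data = log_data

    def write_log_entry(self, session_id, statement, timestamp=None):
        # Hypothetical parameters; the slide only says "parameters".
        timestamp = timestamp or datetime.now(timezone.utc).isoformat()
        entry = ET.Element("Transaction")
        ET.SubElement(entry, "SessionId").text = session_id
        ET.SubElement(entry, "TimeStamp").text = timestamp
        ET.SubElement(entry, "Statement").text = statement
        self._log_data.store_log_data(entry)
        return entry


store = XMLLogData()
manager = XMLLogManager(store)
manager.write_log_entry("usr3", "Login")   # event from the DL user layer
manager.write_log_entry("usr7", "Search")

# An analysis tool would read entries back through the store.
for entry in store.get_log_data("usr3"):
    print(ET.tostring(entry, encoding="unicode"))
```

Keeping the writer and the store separate mirrors the diagram: the digital library only talks to the middleware, while the analysis tool only reads from the stored log data.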

Log tool example: login
Example 1: Login to the system
Field values surviving from the XML entry: usr3; Start; mhabib; T20:10: :

Log tool example: query a collection
Example 2: query all Dirline records about “low back pain”
Field values surviving from the XML entry: Dirline; CommunityRecord; SearchByAnyParts; NonPersistant; low back pain; T20:11: :; T20:11: :00; List; ByRank
Diagram annotation: Ref to GMT

Log Analyzer Overview
Diagram: XML Log → Log Data → Parser / Error Checker → routine modules (Browse, Search, Error; extracted fields include Query String, User ID, Doc ID) → Databases → Final Report / Statistics. Sample values shown in the diagram: usr, T20:10: …, low back pain, … 5114.
– Step 1: Extract log data; a SAX parser can be employed here
– Step 2: Parse log data and check for log errors (e.g. the server stalls and an incomplete log line is output to the XML log)
– Step 3: The different modules populate various databases and/or increment the appropriate counters. Each module can adjust various databases; for example, one module increments the appropriate month hit counter and records that the user usr3 made a request at time T
– Step 4: Aggregate data and output final statistics; all databases are made available
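The four steps can be sketched with Python's standard `xml.sax` module. This is a minimal illustration, not the analyzer described in the slides: the element names follow the log format slides, while the sample log, the handler class `LogStatsHandler`, the function `analyze_log`, and the choice of counters are assumptions.

```python
import xml.sax
from collections import Counter

# Hypothetical log; element names follow the slides, values are made up.
SAMPLE = """<Log>
  <Transaction><SessionId>usr3</SessionId>
    <Statement><Event><Action><Search>
      <QueryString>low back pain</QueryString>
    </Search></Action></Event></Statement>
  </Transaction>
  <Transaction><SessionId>usr3</SessionId>
    <Statement><Event><Action><Browse/></Action></Event></Statement>
  </Transaction>
</Log>"""


class LogStatsHandler(xml.sax.ContentHandler):
    """Step 3: update counters as elements stream past."""

    def __init__(self):
        super().__init__()
        self.action_counts = Counter()   # hits per action type
        self.session_counts = Counter()  # transactions per session
        self.queries = []
        self._text = []

    def startElement(self, name, attrs):
        self._text = []
        if name in ("Search", "Browse"):
            self.action_counts[name] += 1

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "SessionId" and value:
            self.session_counts[value] += 1
        elif name == "QueryString" and value:
            self.queries.append(value)
        self._text = []


def analyze_log(xml_text):
    """Steps 1, 2 and 4: extract, parse with error checking, aggregate."""
    handler = LogStatsHandler()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except xml.sax.SAXParseException as exc:
        # Step 2: an incomplete or malformed log is reported, not counted.
        return {"error": str(exc)}
    return {
        "actions": dict(handler.action_counts),
        "sessions": dict(handler.session_counts),
        "queries": handler.queries,
    }


report = analyze_log(SAMPLE)
print(report)
```

A streaming parser suits this job because log files can grow large and each counter only needs to see one element at a time.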

Summarizing this class and last week
– Looked at two views of DL quality
  – By examining the components of the DL independent of usage (explicit computation)
  – By looking at the view of the DL obtained by a visitor (log analysis)
– Each is a view that has been widely promulgated and well received, but neither is an industry standard

Next week
– Joseph Lucia, Director of Villanova’s Falvey Library, will talk about what is happening in this very innovative and significant digital library.
– Come prepared with questions and ready to comment on and discuss what he presents.

References
– Gonçalves, M. A., Luo, M., Shen, R., Ali, M. F., and Fox, E. A. “An XML Log Standard and Tool for Digital Library Logging Analysis.” In Research and Advanced Technology for Digital Libraries, 6th European Conference, ECDL 2002, Rome, Italy, September 2002, Proceedings.
– Klas, C., et al. “A Logging Scheme for Comparative Digital Library Evaluation.” In Research and Advanced Technology for Digital Libraries, 10th European Conference, ECDL 2006, Alicante, Spain, September 17-22, 2006, Proceedings.