EUBA: The Emory User Behavior Analysis System
Eugene Agichtein, Qi Guo and Ryan Kelly
Intelligent Information Access Lab, Math & Computer Science Department
Arthur Murphy, Selden Deemer, Kyle Fenton
Emory Libraries
Goals/Motivation
- Evaluate effectiveness of search and discovery with automatic behavioral metrics
- Perform aggregate and longitudinal studies
- Develop tools for usability studies “in the wild”
  - Scale (hundreds/thousands of “participants”)
  - Realistic behavior and tasks
  - On-demand playback of “interesting” sessions
- Unified analysis/query framework for internal and external resource access and usage statistics
  - Web-based query and statistics interface
  - Access auditing, privacy, anonymity enforced
Approach: Client-side instrumentation
- Implemented on top of the Emory installation of the LibX toolbar
- Extended LibX to track UI events:
  - JavaScript patch to sample the mouse movements and other events on pre-specified web search pages
  - Events are encoded into a string and buffered, and periodically sent to the server (on internal library network)
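The encode, buffer, and periodically send steps can be sketched as follows. This is only an illustration of the logic in Python, not the actual LibX JavaScript patch: the class name `EventBuffer`, the comma/semicolon string encoding, the flush interval, and the `send` callback are all assumptions.

```python
import time
from typing import Callable, List
from urllib.parse import quote


class EventBuffer:
    """Buffers encoded UI events and hands them to `send` in batches (hypothetical)."""

    def __init__(self, send: Callable[[str], None],
                 flush_interval_s: float = 5.0,
                 max_events: int = 500,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._flush_interval_s = flush_interval_s
        self._max_events = max_events
        self._clock = clock
        self._events: List[str] = []
        self._last_flush = clock()

    def record(self, kind: str, t_ms: int, *fields: object) -> None:
        # Assumed encoding: "kind,t_ms,field1,field2"; fields are
        # percent-escaped so they cannot contain the separators.
        parts = [kind, str(t_ms)] + [quote(str(f), safe="") for f in fields]
        self._events.append(",".join(parts))
        too_many = len(self._events) >= self._max_events
        too_old = self._clock() - self._last_flush >= self._flush_interval_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Send everything buffered so far as one ";"-joined string.
        self._last_flush = self._clock()
        if not self._events:
            return
        payload = ";".join(self._events)
        self._events = []
        self._send(payload)
```

In a real client `send` would post the string to the logging server; here it is any callable, which keeps the sketch testable without a network.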
Events captured (v0.5, Aug. 2008)
- Button/link clicks, URL changes
  - Name of the button, link, other meta-info
- Mouse movements
  - (x,y) coordinates sampled ~every 10ms
- Scrolling
  - Start, stop position, ~every 10ms
- Text entry, keypress (ctrl-c, ctrl-v)
  - Query text, options changes
- Menu item events
  - Print, bookmark, save (all of them)
- Hover over important elements
- Mouse-in/out of browser
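Sampling coordinates roughly every 10ms amounts to dropping events that arrive too soon after the last one kept. A minimal sketch of that idea, assuming samples arrive as `(t_ms, x, y)` tuples in time order; the function name `downsample` and the tuple layout are hypothetical and not taken from the toolbar code:

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, int, int]  # (t_ms, x, y), assumed layout


def downsample(samples: Iterable[Sample], min_gap_ms: int = 10) -> List[Sample]:
    """Keep a sample only if at least `min_gap_ms` passed since the last kept one."""
    kept: List[Sample] = []
    last_t: Optional[int] = None
    for t, x, y in samples:
        if last_t is None or t - last_t >= min_gap_ms:
            kept.append((t, x, y))
            last_t = t
    return kept
```

The same throttling rule would apply to scroll positions, with the tuple carrying the scroll offset instead of mouse coordinates.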
How it works
- On login to Learning Commons, Firefox is started with
  - If previously opted in (or out), go to homepage
  - Else show consent form
- Store user choice in database; if opted in, also store salted hash string for user login
  - Can track opted-in user behavior over “lifetime”
  - No way to recover login id by dictionary attack
  - Can be removed at any time by deleting mapping
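One way to get a stable pseudonym with these properties is a keyed hash of the login id. The sketch below is an assumption, not the deployed scheme: it uses HMAC-SHA256 with a salt that is kept secret on the server (dictionary-attack resistance depends on that secrecy), and the names `pseudonym` and `ConsentStore` are hypothetical. For brevity it keys every stored choice by the pseudonym.

```python
import hashlib
import hmac
from typing import Dict, Optional


def pseudonym(login_id: str, secret_salt: bytes) -> str:
    """Return a stable hex pseudonym for a login id (assumed HMAC-SHA256)."""
    return hmac.new(secret_salt, login_id.encode("utf-8"), hashlib.sha256).hexdigest()


class ConsentStore:
    """In-memory stand-in for the consent database (hypothetical)."""

    def __init__(self, secret_salt: bytes) -> None:
        self._salt = secret_salt
        self._choices: Dict[str, bool] = {}  # pseudonym -> opted in?

    def record_choice(self, login_id: str, opted_in: bool) -> None:
        self._choices[pseudonym(login_id, self._salt)] = opted_in

    def lookup(self, login_id: str) -> Optional[bool]:
        # None means the user has not answered yet: show the consent form.
        return self._choices.get(pseudonym(login_id, self._salt))

    def forget(self, login_id: str) -> None:
        # Deleting the mapping unlinks future sessions from past ones.
        self._choices.pop(pseudonym(login_id, self._salt), None)
```

Because the same login always maps to the same pseudonym, sessions can be linked over time without storing the login id itself.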
How it works (2 of 3): Consent
How it works (3 of 3): which URLs?
For all visited URLs LibX notifies the server; information varies by type of site:
- Black list (known private sites): only domain name is saved
  - All “ and “mail.*” URLs
- White list (known search/discovery sites): all events captured
  - EUCLID, Primo, Google, Google Scholar, Yahoo and Live search engines, Wikipedia
- Gray list (search results and important public sites): mouse moves and clicks (no keypress/text)
- The rest: only URL, button clicks, and menu items
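The per-site rules can be read as a lookup from host name to capture policy. The sketch below assumes the lists are matched on exact host names plus a `mail.` prefix; the host sets are illustrative placeholders (the `.example` hosts are invented), and the names `Policy`, `classify`, and `loggable_url` are hypothetical.

```python
from enum import Enum
from urllib.parse import urlsplit


class Policy(Enum):
    DOMAIN_ONLY = "black"   # only the domain name is saved
    ALL_EVENTS = "white"    # all events captured
    MOUSE_ONLY = "gray"     # mouse moves and clicks, no keypress/text
    MINIMAL = "rest"        # URL, button clicks, menu items

# Placeholder lists; the real lists are not given in the slides.
BLACK_HOSTS = {"bank.example"}
WHITE_HOSTS = {"www.google.com", "scholar.google.com", "en.wikipedia.org"}
GRAY_HOSTS = {"news.example"}


def classify(url: str) -> Policy:
    host = (urlsplit(url).hostname or "").lower()
    # Private sites are checked first so they always win.
    if host.startswith("mail.") or host in BLACK_HOSTS:
        return Policy.DOMAIN_ONLY
    if host in WHITE_HOSTS:
        return Policy.ALL_EVENTS
    if host in GRAY_HOSTS:
        return Policy.MOUSE_ONLY
    return Policy.MINIMAL


def loggable_url(url: str) -> str:
    """Return what may be stored for this URL under its policy."""
    if classify(url) is Policy.DOMAIN_ONLY:
        return (urlsplit(url).hostname or "").lower()
    return url
```

Checking the black list before the others means a private site is never logged in more detail, even if it also appears on another list by mistake.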
Emory User Behavior Analysis System
- Combines client-side instrumentation, server-side caching, log management, querying, and analysis
  - Client-side instrumentation, data mining/machine learning (Qi Guo)
  - Log DB parsing, indexing, web-based interface for querying, playback, annotation (Ryan Kelly)
- Plan: to release the system to research/library community (2009?)
EUBA Web-based analysis interface
- Prototype:
  - user: test
  - password: notsafe
Future Plans
- Incorporate log data for ranking, discovery, query suggestion, collaborative filtering
- Richer statistics and visualization
- Streamline usability studies
- Comments and suggestions welcome!