Secondary Evidence for User Satisfaction With Community Information Systems

Gregory B. Newby, University of North Carolina at Chapel Hill
ASIS Midyear Meeting, 1999
What do we want to know?
- Who are the information seekers / users?
- What are their needs?
- Are their needs being met?
- Context: the goals and missions of the community net
What else do we want to know?
- Are people viewing sponsorship information?
- Reading policy documents?
- Displaying images?
- Using search engines or indexes?
- Local or remote?
- Browsing or reading?
Possible sources of evidence
- Content analysis: what's available on the system(s)? What questions are asked?
- Sociological research: talk to people, look at what they use the net for, etc.
- Psychological research: evaluate cognitive change in user knowledge, etc.
- Market research: broad data collection from multiple potential audiences
More possible sources of evidence
- Secondary data: artifacts generated by information system use
- Today's focus: analysis of log file entries
  - Web usage statistics
  - Instrumenting online menu systems
  - Login or call history
  - Other system logs (e.g., FTP)
What questions may be asked of secondary data?
- What content is accessed, and with what frequency?
- What paths are followed to content?
- Are entry points, policy documents, or other front-end material bypassed?
- Is content read, skimmed, or skipped?
- What subsets of content are viewed by individuals (patterns of use)?
What's wrong with Web server logs?
- Aggregate-level access to content: not the whole story!
- What are SESSIONS like (a sequence of accesses by a single person)?
- What are the paths from item to item (this transcends a single "referrer" log entry)?
- Are data used linearly (following hyperlinks)?
- How long is spent on a document?
More analysis is feasible. Sample: Web server logs
- Single-line entries for each "hit" (HTTP "GET" or similar request)
- Separate files for errors and referrers
- Sample entry:
  56kdial52.absi.net - - [22/May/1999:20:12: ] "GET /index.html HTTP/1.0"
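Entries like the sample above can be pulled apart with a few lines of code. A minimal sketch in Python, assuming the common httpd log layout (real log lines also carry a status code and byte count, which the slide's excerpt omits; `parse_hit` is a hypothetical helper name):

```python
import re

# Minimal parser for the host / timestamp / request portion of an
# httpd access-log line.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ '          # remote host, identd, auth user
    r'\[(?P<time>[^\]]+)\] '           # timestamp in brackets
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)"'  # request line
)

def parse_hit(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

hit = parse_hit('56kdial52.absi.net - - [22/May/1999:20:12: ] '
                '"GET /index.html HTTP/1.0"')
print(hit["host"], hit["path"])
```

Once each hit is a structured record, the session and path questions below become straightforward grouping and sorting problems.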
Sources of complexity
- Multiple types of servers might run on a single system (e.g., RealServer, a database server, a search engine)
- A Web page visit might involve many files
- Frames and other authoring techniques can confuse analysis
- More than one person might use the same remote computer
Question: Can we get the "story" of a session?
- Yes! Just track through all the "hits" from the same host within a narrow time period
  - Challenge: how narrow a time period?
  - Challenge: some hosts support multiple simultaneous users (but not many)
  - Challenge: many files per page might confuse things (but narrow windows of a few seconds can help)
  - Challenge: what is the structure of the site?
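The tracking step above can be sketched as a simple grouping pass: sort hits by time, and start a new session whenever the same host goes quiet for longer than some idle cutoff. The 30-minute timeout and the `sessionize` helper name are assumptions for illustration, not values from the talk:

```python
from datetime import datetime, timedelta

def sessionize(hits, idle=timedelta(minutes=30)):
    """Group (host, time, path) hits into per-host sessions.

    A new session starts when a host's gap between hits exceeds `idle`.
    """
    sessions = []
    last = {}  # host -> (current session list, time of last hit)
    for host, when, path in sorted(hits, key=lambda h: h[1]):
        sess, prev = last.get(host, (None, None))
        if sess is None or when - prev > idle:
            sess = []
            sessions.append((host, sess))
        sess.append((when, path))
        last[host] = (sess, when)
    return sessions

hits = [
    ("a.example.net", datetime(1999, 5, 22, 20, 12), "/index.html"),
    ("a.example.net", datetime(1999, 5, 22, 20, 14), "/about.html"),
    ("a.example.net", datetime(1999, 5, 22, 23, 0),  "/index.html"),
]
print(len(sessionize(hits)))  # the 23:00 hit starts a second session
```

The choice of cutoff is exactly the "how narrow a time period?" challenge: too short splits one visit into many sessions, too long merges different visitors behind a shared host.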
A sample "GET" might include multiple files
- [20/May/1999:18:44: ] "GET /~gbnewby/inls80/explore2.html HTTP/1.1"
- [20/May/1999:18:44: ] "GET /~gbnewby/inls80/octo.gif HTTP/1.1"
- [20/May/1999:18:44: ] "GET /~gbnewby/inls80/pmail.gif HTTP/1.1"
Here's a "story" (gbn's pages)
  [08/May/1999:09:30: ] "GET /~gbnewby/index_top.html HTTP/1.0"
  [09/May/1999:00:44: ] "GET /~gbnewby/index_top.html HTTP/1.0"
  [09/May/1999:11:43: ] "GET /gbnewby/forms HTTP/1.0"
  [09/May/1999:12:06: ] "GET /gbnewby/forms/ HTTP/1.0"
  [09/May/1999:16:36: ] "GET /~gbnewby HTTP/1.0"
  [09/May/1999:17:44: ] "GET /~gbnewby/ HTTP/1.0"
  [10/May/1999:06:20: ] "GET /gbnewby/review2.html HTTP/1.0"
  [10/May/1999:09:33: ] "GET /gbnewby/vita.html HTTP/1.0"
  [10/May/1999:13:33: ] "GET /gbnewby/inls80/explore1.html HTTP/1.0"
  [11/May/1999:02:43: ] "GET /gbnewby/inls80/explore2.html HTTP/1.0"
  [11/May/1999:09:21: ] "GET /~gbnewby/vita.html HTTP/1.0"
  [11/May/1999:10:05: ] "GET /gbnewby/presentations/security.html HTTP/1.0"
  [11/May/1999:13:35: ] "GET /gbnewby/index_top.html HTTP/1.0"
Question: What are the entry points for particular documents?
- You're on easy street with httpd "referrer" logs, but these are often not kept (for efficiency)
- Otherwise, you don't know where someone came from unless it was from YOUR site
- By looking through a session "story," you can see the path people take to particular pages. Analyze your finding aids!
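Without referrer logs, the first request of each session is a usable proxy for the entry point. A sketch, assuming sessions are lists of (timestamp, path) pairs as produced by whatever sessionizing step precedes this (`entry_points` is a hypothetical helper):

```python
from collections import Counter

def entry_points(sessions):
    """Count the first requested path of each non-empty session."""
    return Counter(sess[0][1] for sess in sessions if sess)

sessions = [
    [("11:08", "/docsouth"), ("11:08", "/docsouth/dasmain.html")],
    [("09:30", "/index.html")],
    [("12:01", "/docsouth"), ("12:02", "/docsouth/search.html")],
]
print(entry_points(sessions).most_common(1))  # [('/docsouth', 2)]
```

If a popular "entry point" is a deep content page rather than the site's front door, that is direct evidence the front-end material is being bypassed.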
Here's a path, including searching and reading
  [20/May/1999:11:08: ] "GET /docsouth HTTP/1.0"
  [20/May/1999:11:08: ] "GET /docsouth/dasmain.html HTTP/1.0"
  [20/May/1999:11:08: ] "GET /docsouth/dasnav.html HTTP/1.0"
  [20/May/1999:11:08: ] "GET /docsouth/images/greensquare.gif HTTP/1.0"
  [20/May/1999:11:08: ] "GET /docsouth/search.html HTTP/1.0"
(Part II. This is via metalab.unc.edu)
  [20/May/1999:11:08: ] "GET /docsouth/images/greenarrow.gif HTTP/1.0"
  [20/May/1999:11:19: ] "GET /docsouth/southlit/southlit.html HTTP/1.0"
  [20/May/1999:11:20: ] "GET /docsouth/southlit/southlitmain.html HTTP/1.0"
  [20/May/1999:11:20: ] "GET /docsouth/southlit/southlitnav.html HTTP/1.0"
(Part III.)
  [20/May/1999:11:38: ] "GET /docsouth/neh/neh.html HTTP/1.0"
  [20/May/1999:11:38: ] "GET /docsouth/neh/nehmain.html HTTP/1.0"
  [20/May/1999:11:38: ] "GET /docsouth/neh/nehnav.html HTTP/1.0"
  [20/May/1999:11:39: ] "GET /docsouth/neh/specialneh.html HTTP/1.0"
  [20/May/1999:11:39: ] "GET /docsouth/neh/texts.html HTTP/1.0"
  [20/May/1999:11:40: ] "GET /docsouth/harriet/menu.html HTTP/1.0"
  [20/May/1999:11:40: ] "GET /docsouth/harriet/small.gif HTTP/1.0"
  [20/May/1999:11:41: ] "GET /docsouth/harriet/harriet.html HTTP/1.0"
  [20/May/1999:11:41: ] "GET /docsouth/harriet/harrietcva.gif HTTP/1.0"
  [20/May/1999:11:41: ] "GET /docsouth/harriet/harriettpa.gif HTTP/1.0"
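Paths like the one above are easier to read once embedded-object hits (images, navigation graphics) are filtered out, leaving the page-to-page trail. A rough sketch, assuming that pages end in .html/.htm or a bare directory path (`page_path` is a hypothetical helper, and the extension heuristic is an assumption that will misclassify some sites):

```python
PAGE_EXTS = (".html", ".htm", "/")

def page_path(session):
    """Keep only hits that look like pages, dropping images and the like."""
    return [path for _, path in session
            if path.endswith(PAGE_EXTS)
            or "." not in path.rsplit("/", 1)[-1]]  # extensionless = page

session = [
    ("11:40", "/docsouth/harriet/menu.html"),
    ("11:40", "/docsouth/harriet/small.gif"),
    ("11:41", "/docsouth/harriet/harriet.html"),
]
print(page_path(session))
# ['/docsouth/harriet/menu.html', '/docsouth/harriet/harriet.html']
```

This is also where knowing the site's structure pays off: a site-specific list of navigation frames and inline images filters far more reliably than any generic rule.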
Question: Where do people go from a particular location?
- Again, your "story" logs can track this
- Again, caching is a particular challenge: a user might follow hyperlinks, but the logs show discontinuities (because the user went via a cached document)
Sample: going from specifics, to index, to sub-index
  4blah18.blahinc.com - - [22/May/1999:00:21: ] "GET /mrm/father.html HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:21: ] "GET /mrm/bluegrass.gif HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:27: ] "GET /index.html HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:27: ] "GET /beige_pale.gif HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:27: ] "GET /pnetlogo.gif HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:28: ] "GET /directory.html HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:28: ] "GET /directory/culture.html HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:28: ] "GET /directory/buggy.jpg HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:28: ] "GET /prairienations/index.htm HTTP/1.0"
  4blah18.blahinc.com - - [22/May/1999:00:30: ] "GET /directory/nature.html HTTP/1.0"
Question: How long is spent on a document?
- Easy: inter-click time within a session
- You could even compute an "average time per document" for some gateway documents (such as user agreements), or infer average time per document by tracking those sessions that "seem" to be contiguous. This is challenging: what if someone goes to another site, or takes a nap?
- Caching is still a problem
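The inter-click measure is just the gaps between successive hits in one session; note that the last document's reading time is unknowable from logs alone, and a long gap may be a nap rather than careful reading. A sketch (`dwell_times` is a hypothetical helper):

```python
from datetime import datetime

def dwell_times(session):
    """Seconds between successive hits in a [(time, path), ...] session.

    Each gap is a rough upper bound on time spent on the earlier page.
    """
    times = [t for t, _ in session]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Timestamps modeled on the DocSouth path above (seconds invented,
# since the log excerpt truncates them).
session = [
    (datetime(1999, 5, 20, 11, 8),  "/docsouth/search.html"),
    (datetime(1999, 5, 20, 11, 19), "/docsouth/southlit/southlit.html"),
    (datetime(1999, 5, 20, 11, 38), "/docsouth/neh/neh.html"),
]
print(dwell_times(session))  # [660.0, 1140.0]
```

For gateway documents such as user agreements, averaging these gaps across many sessions gives the "average time per document" figure, with a cap on implausibly long gaps as one way to handle the nap problem.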
Analysis of other secondary sources of data
- See Newby & Bishop (1997) for instrumentation of menu systems
  - Log choices of menu options
  - Correlate with basic user demographics (collected online)
  - Problem: most modern systems are not login-based; they're Web-based
- Access logs: are people coming in from dial-up lines, academic locations, etc.? Dial-up users = watch your graphics!
Conclusions
- The "easy" automated tools for Web log analysis are insufficient
- They could be extended with some programming effort or utilities
- "Eyeballing" the logs is still useful
- Be cautious about privacy: both your own site's policy and the problems of posting some log data