Published by Blake Glenn. Modified over 9 years ago.
1
Secondary Evidence for User Satisfaction With Community Information Systems
Gregory B. Newby
University of North Carolina at Chapel Hill
ASIS Midyear Meeting 1999
2
What do we want to know?
• Who are information seekers and users?
• What are their needs?
• Are their needs being met?
• Context: the goals and missions of the community net
3
What else do we want to know?
• Are people viewing sponsorship information?
• Reading policy documents?
• Displaying images?
• Using search engines or indexes?
• Local or remote?
• Browsing or reading?
4
Possible sources of evidence
• Content analysis: what’s available on the system(s)? Questions asked.
• Sociological research: talk to people, look at what they use the net for, etc.
• Psychological research: evaluate cognitive change in user knowledge, etc.
• Market research: broad data collection from multiple potential audiences
5
More possible sources of evidence
• Secondary data: artifacts generated by information system use
• Today’s focus: analysis of log file entries
  – Web usage statistics
  – Instrumenting online menu systems
  – Login or call history
  – Other system logs (email, FTP)
6
What questions may be asked of secondary data?
• What content is accessed, and with what frequency?
• What paths are followed to content?
• Are entry points, policy documents, or other front-end material bypassed?
• Is content read, skimmed, or skipped through?
• What subsets of content are viewed by individuals (patterns of use)?
7
What’s wrong with Web server logs?
• Aggregate-level access to content: not the whole story!
• What are SESSIONS like (a session is a sequence of accesses by a single person)?
• What are the paths from item to item? (This transcends a single “referrer” log.)
• Are data used linearly (by following hyperlinks)?
• How much time is spent on a document?
8
More analysis is feasible. Sample: Web server logs
• Single-line entries for each “hit” (HTTP “GET” or similar request)
• Separate files for errors and referrers
• Sample entry:
  56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] "GET /index.html HTTP/1.0" 200 6353
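The sample entry above follows the Common Log Format (host, identity, user, timestamp, request line, status, bytes). As a minimal sketch of how such a line could be split into fields for further analysis, the following Python uses a regular expression. The names `CLF_PATTERN` and `parse_clf_line` are illustrative, not part of the original talk, and month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # %z parses the numeric UTC offset, e.g. -0500
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request line is normally "METHOD path PROTOCOL"
    parts = fields["request"].split()
    fields["method"] = parts[0] if parts else ""
    fields["path"] = parts[1] if len(parts) > 1 else ""
    fields["status"] = int(fields["status"])
    # A "-" size means no body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

entry = parse_clf_line(
    '56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] '
    '"GET /index.html HTTP/1.0" 200 6353'
)
print(entry["host"], entry["path"], entry["status"], entry["size"])
```

Lines that do not match (for example, entries from an error log) come back as `None`, so they can be counted or skipped rather than crashing the analysis.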
9
Sources of complexity:
• Multiple types of servers might be on a single system (e.g., RealServer, database server, search engine)
• A Web page visit might involve many files
• Frames and other authoring techniques can confuse the analysis
• More than one person might use the same remote computer
10
Question: Can we get the “story” of a session?
• Yes! Just track through all the “hits” from the same host within a narrow time period
  – Challenge: how narrow a time period?
  – Challenge: some hosts support multiple simultaneous users (but not many)
  – Challenge: lots of files per page might confuse things (but narrow time frames of +/- a few seconds can help)
  – Challenge: what is the structure of the site?
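The idea of tracking hits from the same host within a time period can be sketched as below. This is one possible approach, not the author's tool: the function name `build_sessions`, the helper `at`, and the 30-minute default gap are assumptions (the slide leaves "how narrow?" open). The timestamps are a subset of the docsouth sample shown elsewhere in these slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_sessions(hits, max_gap=timedelta(minutes=30)):
    """Group (host, time, path) hits into per-host sessions.

    A new session starts whenever the gap since the same host's previous
    hit exceeds max_gap. The 30-minute default is an assumption.
    """
    by_host = defaultdict(list)
    for host, when, path in hits:
        by_host[host].append((when, path))

    sessions = []
    for host, host_hits in by_host.items():
        host_hits.sort(key=lambda hit: hit[0])
        current = [host_hits[0]]
        for previous, hit in zip(host_hits, host_hits[1:]):
            if hit[0] - previous[0] > max_gap:
                sessions.append((host, current))
                current = []
            current.append(hit)
        sessions.append((host, current))
    return sessions

def at(hour, minute, second):
    # Helper for the example only: all sample hits fall on 20 May 1999
    return datetime(1999, 5, 20, hour, minute, second)

hits = [
    ("128.22.40.142", at(11, 8, 34), "/docsouth"),
    ("128.22.40.142", at(11, 8, 45), "/docsouth/dasmain.html"),
    ("128.22.40.142", at(11, 19, 58), "/docsouth/southlit/southlit.html"),
    ("128.22.40.142", at(11, 38, 40), "/docsouth/neh/neh.html"),
]

one_session = build_sessions(hits)
split_sessions = build_sessions(hits, max_gap=timedelta(minutes=10))
print(len(one_session), len(split_sessions))
```

The same four hits form one session with a 30-minute gap and three sessions with a 10-minute gap, which illustrates why the choice of time period matters. This sketch cannot separate multiple simultaneous users behind one host.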
11
Sample “GET” might include multiple files
203.87.57.76 - - [20/May/1999:18:44:48 -0400] "GET /~gbnewby/inls80/explore2.html HTTP/1.1" 200 9681
203.87.57.76 - - [20/May/1999:18:44:50 -0400] "GET /~gbnewby/inls80/octo.gif HTTP/1.1" 200 12053
203.87.57.76 - - [20/May/1999:18:44:53 -0400] "GET /~gbnewby/inls80/pmail.gif HTTP/1.1" 200 593
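One way to keep the extra files from inflating page counts is to fold image requests into the page request that preceded them within a few seconds. The sketch below is a heuristic under stated assumptions: the suffix list `EMBEDDED_SUFFIXES`, the function name `group_page_views`, and the 10-second window are all illustrative choices, not taken from the talk.

```python
from datetime import datetime, timedelta

# Assumed list of file types that are usually embedded in a page
EMBEDDED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")

def group_page_views(hits, window=timedelta(seconds=10)):
    """Fold embedded files into the page view that most likely triggered them.

    hits: list of (time, path) for ONE host, sorted by time.
    Returns a list of dicts with keys "page", "time", and "embedded".
    """
    views = []
    for when, path in hits:
        is_embedded = path.lower().endswith(EMBEDDED_SUFFIXES)
        if is_embedded and views and when - views[-1]["time"] <= window:
            views[-1]["embedded"].append(path)
        else:
            # Either a page, or an embedded-type file with no recent view
            # to attach to (for example, the page itself came from a cache)
            views.append({"page": path, "time": when, "embedded": []})
    return views

sample = [
    (datetime(1999, 5, 20, 18, 44, 48), "/~gbnewby/inls80/explore2.html"),
    (datetime(1999, 5, 20, 18, 44, 50), "/~gbnewby/inls80/octo.gif"),
    (datetime(1999, 5, 20, 18, 44, 53), "/~gbnewby/inls80/pmail.gif"),
]

views = group_page_views(sample)
print(len(views), views[0]["page"], views[0]["embedded"])
```

With the three sample hits above, the two GIF requests are attached to the single view of explore2.html.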
12
Here’s a “story” (gbn’s pages)
116.33.237.26 - - [08/May/1999:09:30:59 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030
116.33.237.26 - - [09/May/1999:00:44:45 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030
116.33.237.26 - - [09/May/1999:11:43:31 -0400] "GET /gbnewby/forms HTTP/1.0" 301 186
116.33.237.26 - - [09/May/1999:12:06:30 -0400] "GET /gbnewby/forms/ HTTP/1.0" 200 1837
116.33.237.26 - - [09/May/1999:16:36:06 -0400] "GET /~gbnewby HTTP/1.0" 301 181
116.33.237.26 - - [09/May/1999:17:44:47 -0400] "GET /~gbnewby/ HTTP/1.0" 200 1355
116.33.237.26 - - [10/May/1999:06:20:22 -0400] "GET /gbnewby/review2.html HTTP/1.0" 200 5178
116.33.237.26 - - [10/May/1999:09:33:51 -0400] "GET /gbnewby/vita.html HTTP/1.0" 200 29487
116.33.237.26 - - [10/May/1999:13:33:30 -0400] "GET /gbnewby/inls80/explore1.html HTTP/1.0" 200 3977
116.33.237.26 - - [11/May/1999:02:43:15 -0400] "GET /gbnewby/inls80/explore2.html HTTP/1.0" 200 9681
116.33.237.26 - - [11/May/1999:09:21:56 -0400] "GET /~gbnewby/vita.html HTTP/1.0" 200 29487
116.33.237.26 - - [11/May/1999:10:05:31 -0400] "GET /gbnewby/presentations/security.html HTTP/1.0" 200 11270
116.33.237.26 - - [11/May/1999:13:35:27 -0400] "GET /gbnewby/index_top.html HTTP/1.0" 200 7030
13
Question: What are entry points for particular documents?
• You’re on easy street with httpd “referrer” logs, but these are often not kept (for efficiency)
• Otherwise, you don’t know where someone came from unless it was from YOUR site
• By looking through a session “story” you can see the path people take to particular pages. Analyze your finding aids!
14
Here’s a path, including searching and reading
128.22.40.142 - - [20/May/1999:11:08:34 -0400] "GET /docsouth HTTP/1.0" 301 307
128.22.40.142 - - [20/May/1999:11:08:45 -0400] "GET /docsouth/dasmain.html HTTP/1.0" 200 2705
128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/dasnav.html HTTP/1.0" 200 679
128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/images/greensquare.gif HTTP/1.0" 200 55
128.22.40.142 - - [20/May/1999:11:08:56 -0400] "GET /docsouth/search.html HTTP/1.0" 200 3778
15
(Part II. This is via metalab.unc.edu)
128.22.40.142 - - [20/May/1999:11:08:57 -0400] "GET /docsouth/images/greenarrow.gif HTTP/1.0" 200 113
128.22.40.142 - - [20/May/1999:11:19:58 -0400] "GET /docsouth/southlit/southlit.html HTTP/1.0" 200 3685
128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitmain.html HTTP/1.0" 200 2583
128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitnav.html HTTP/1.0" 200 789
16
(Part III.)
128.22.40.142 - - [20/May/1999:11:38:40 -0400] "GET /docsouth/neh/neh.html HTTP/1.0" 200 3539
128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehmain.html HTTP/1.0" 200 2743
128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehnav.html HTTP/1.0" 200 759
128.22.40.142 - - [20/May/1999:11:39:21 -0400] "GET /docsouth/neh/specialneh.html HTTP/1.0" 200 16549
128.22.40.142 - - [20/May/1999:11:39:51 -0400] "GET /docsouth/neh/texts.html HTTP/1.0" 200 11999
128.22.40.142 - - [20/May/1999:11:40:16 -0400] "GET /docsouth/harriet/menu.html HTTP/1.0" 200 2085
128.22.40.142 - - [20/May/1999:11:40:27 -0400] "GET /docsouth/harriet/small.gif HTTP/1.0" 200 43701
128.22.40.142 - - [20/May/1999:11:41:01 -0400] "GET /docsouth/harriet/harriet.html HTTP/1.0" 200 217418
128.22.40.142 - - [20/May/1999:11:41:07 -0400] "GET /docsouth/harriet/harrietcva.gif HTTP/1.0" 200 85180
128.22.40.142 - - [20/May/1999:11:41:11 -0400] "GET /docsouth/harriet/harriettpa.gif HTTP/1.0" 200 77742
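Once sessions have been reduced to ordered lists of pages, the entry-point question for a given document can be approximated by counting which page came immediately before it. The sketch below assumes that reduction has already been done. The function name `entry_points` and the label `"(session start)"` are illustrative, and the second session in the example is hypothetical data added only to show the session-start case.

```python
from collections import Counter

def entry_points(sessions, target):
    """Count the page seen immediately before `target` in each session.

    sessions: list of lists of page paths, each in time order.
    "(session start)" marks visits where the target was the first page
    logged, i.e. the visitor arrived from outside the site or via a cache.
    """
    counts = Counter()
    for pages in sessions:
        for index, page in enumerate(pages):
            if page == target:
                previous = pages[index - 1] if index > 0 else "(session start)"
                counts[previous] += 1
    return counts

sessions = [
    # Pages from Part III of the docsouth path (frames and images dropped)
    [
        "/docsouth/neh/neh.html",
        "/docsouth/neh/specialneh.html",
        "/docsouth/neh/texts.html",
        "/docsouth/harriet/menu.html",
        "/docsouth/harriet/harriet.html",
    ],
    # Hypothetical second session that starts directly on the target page
    ["/docsouth/harriet/menu.html", "/docsouth/harriet/harriet.html"],
]

counts = entry_points(sessions, "/docsouth/harriet/menu.html")
print(counts.most_common())
```

A high count for "(session start)" on a deep page suggests that visitors are bypassing the site's front-end material.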
17
Question: Where do people go from a particular location?
• Again, your “story” logs can track this
• Again, caching is a particular challenge. For example, a user might follow hyperlinks, but the logs show discontinuities (because they went via a cached document)
18
Sample: going from specifics, to index, to sub-index
4blah18.blahinc.com - - [22/May/1999:00:21:01 -0500] "GET /mrm/father.html HTTP/1.0" 200 1760
4blah18.blahinc.com - - [22/May/1999:00:21:03 -0500] "GET /mrm/bluegrass.gif HTTP/1.0" 200 26959
4blah18.blahinc.com - - [22/May/1999:00:27:48 -0500] "GET /index.html HTTP/1.0" 200 6216
4blah18.blahinc.com - - [22/May/1999:00:27:51 -0500] "GET /beige_pale.gif HTTP/1.0" 200 2085
4blah18.blahinc.com - - [22/May/1999:00:27:53 -0500] "GET /pnetlogo.gif HTTP/1.0" 200 3861
4blah18.blahinc.com - - [22/May/1999:00:28:07 -0500] "GET /directory.html HTTP/1.0" 302 216
4blah18.blahinc.com - - [22/May/1999:00:28:16 -0500] "GET /directory/culture.html HTTP/1.0" 200 2980
4blah18.blahinc.com - - [22/May/1999:00:28:18 -0500] "GET /directory/buggy.jpg HTTP/1.0" 200 8213
4blah18.blahinc.com - - [22/May/1999:00:28:38 -0500] "GET /prairienations/index.htm HTTP/1.0" 200 9136
4blah18.blahinc.com - - [22/May/1999:00:30:23 -0500] "GET /directory/nature.html HTTP/1.0" 200 6865
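The "where do people go next?" question can be sketched as a table of page-to-page transitions built from session page lists. The function name `next_page_counts` is illustrative. The page list is the sample above with the image requests dropped. Because of caching, the counts reflect only requests that actually reached the server.

```python
from collections import Counter, defaultdict

def next_page_counts(sessions):
    """Map each page to a Counter of the pages requested right after it.

    sessions: list of lists of page paths, each in time order.
    """
    transitions = defaultdict(Counter)
    for pages in sessions:
        for source, destination in zip(pages, pages[1:]):
            transitions[source][destination] += 1
    return transitions

sessions = [
    [
        "/mrm/father.html",
        "/index.html",
        "/directory.html",
        "/directory/culture.html",
        "/prairienations/index.htm",
        "/directory/nature.html",
    ],
]

transitions = next_page_counts(sessions)
print(transitions["/index.html"].most_common())
```

With many sessions loaded, `transitions[page].most_common(5)` would list the five most frequent destinations from a given page.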
19
Question: How long is spent on a document?
• Easy: use the inter-click time from a session
• You could even compute an “average time per document” (AT/D) for some gateway documents (such as user agreements). Or, infer AT/D by tracking those sessions that “seem” to be contiguous. This is challenging: what if someone goes to another site, or takes a nap?
• Caching is still a problem
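Inter-click time can be estimated as the difference between consecutive page requests in one session. The sketch below also drops gaps above a cutoff, as a crude guard against the "takes a nap" case. The function names and the 30-minute cutoff are assumptions. The timestamps are page requests from the docsouth sample in these slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def dwell_times(page_views, max_dwell=timedelta(minutes=30)):
    """Return {path: [seconds, ...]} from consecutive page views in one session.

    page_views: list of (time, path), sorted by time.
    The last page has no following request, so its dwell time is unknown.
    Gaps longer than max_dwell (an assumed cutoff) are discarded.
    """
    samples = defaultdict(list)
    for (start, path), (next_start, _next_path) in zip(page_views, page_views[1:]):
        gap = next_start - start
        if gap <= max_dwell:
            samples[path].append(gap.total_seconds())
    return samples

def average_dwell(samples):
    """Average the collected dwell times per document, in seconds."""
    return {path: sum(values) / len(values) for path, values in samples.items()}

page_views = [
    (datetime(1999, 5, 20, 11, 39, 21), "/docsouth/neh/specialneh.html"),
    (datetime(1999, 5, 20, 11, 39, 51), "/docsouth/neh/texts.html"),
    (datetime(1999, 5, 20, 11, 40, 16), "/docsouth/harriet/menu.html"),
    (datetime(1999, 5, 20, 11, 41, 1), "/docsouth/harriet/harriet.html"),
]

averages = average_dwell(dwell_times(page_views))
print(averages)
```

To average over many sessions, the per-session sample lists would be merged before calling `average_dwell`. Pages served from a cache never appear in the log, so these figures can overstate the time spent on the preceding document.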
20
Analysis of other secondary sources of data
• See Newby & Bishop 1997 for instrumentation of menu systems
  – Log choices of menu options
  – Correlate with basic user demographics (collected online)
  – Problem: most modern systems are not login-based, they’re Web-based
• Access logs: are people coming in from dial-up lines, academic locations, etc.? Dial-up = watch graphics!
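A rough way to ask the access-log question is to classify the remote hostname in each entry. The sketch below is purely heuristic: the hint strings in `DIALUP_HINTS`, the `.edu` test for academic hosts, and the category labels are all assumptions for illustration, and real hostnames will often defeat them.

```python
import re

# A bare IPv4 address means the server did not resolve a hostname
IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

# Hypothetical substrings that ISPs sometimes used in dial-up pool hostnames
DIALUP_HINTS = ("dial", "ppp", "modem")

def classify_host(host):
    """Guess a coarse access category from a log entry's remote host field."""
    name = host.lower()
    if IP_PATTERN.match(name):
        return "unresolved"
    if any(hint in name for hint in DIALUP_HINTS):
        return "dial-up"
    if name.endswith(".edu"):
        return "academic"
    return "other"

for host in ("56kdial52.absi.net", "metalab.unc.edu", "128.22.40.142"):
    print(host, classify_host(host))
```

Tallying these categories over a log gives a first estimate of how many visitors are on slow connections, which bears on how graphics-heavy the pages should be.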
21
Conclusions
• The “easy” automated tools for Web log analysis are insufficient
• They could be extended with some programming effort or utilities
• “Eyeballing” the logs is still useful
• Be cautious about privacy: consider both your own site’s policy and the problems of posting some log data