Guide to the Clickstream Data
Petr Berka
University of Economics, Prague
berka@vse.cz

Web Usage Mining Domain
click-stream - a sequential series of page view requests (a page view is what is displayed in the user's browser at one time),
server session - the click-stream of page views for a single user on a particular web site,
user session - the click-stream of page views for a single user across the entire web.
Clickstream Data, Discovery Challenge 2005

The Clickstream Data
About 3 million records (24 days) from the web server log of an internet shop
Each record contains the time, IP address, session ID, page request, and referer
There are hundreds of thousands of sessions; most of them are very short, with an average length of 16 pages
Each page request in this shop has the same structure: page type / content ID (product ID)
Page types are, for example, dp (detail of product), sb (shopping basket), ct (contact)
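
The "page type / content ID" structure described above can be illustrated with a small parser. This is only a sketch under assumptions: the function name parse_page_request is hypothetical, and the content ID is assumed to be carried in an "id" query parameter, as in requests such as /dp/?id=124.

```python
from urllib.parse import urlsplit, parse_qs


def parse_page_request(request):
    """Split a page request such as '/dp/?id=124' into (page_type, content_id)."""
    parts = urlsplit(request.strip())
    # The first non-empty path segment is taken as the page type;
    # the root page '/' has no segment, so its page type is ''.
    segments = [s for s in parts.path.split("/") if s]
    page_type = segments[0] if segments else ""
    # Assumption: the content ID, if present, is the 'id' query parameter.
    ids = parse_qs(parts.query).get("id")
    content_id = ids[0] if ids else None
    return page_type, content_id
```

Requests without an ID, such as /sb/, yield None as the content ID, so page types and product IDs can be analysed separately.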

Example of the Data
unix time ;IP address ; session ID ; page request; referer
1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124 ;www.google.cz;
1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182 ;
1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/ ;www.seznam.cz;
1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148 ;/dp/?id=124;
1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/dp/?id=148;
1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/ ; /;
1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/sb/;
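
As a rough illustration of how the example records could be turned into sessions, the sketch below splits each line on semicolons and groups the records by session ID. The names SAMPLE, parse_line and group_sessions are hypothetical, and the handling of a missing referer (treated as an empty string) is an assumption, not something the data description specifies.

```python
from collections import defaultdict

# The seven example records shown above, copied as they appear.
SAMPLE = """\
1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124 ;www.google.cz;
1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182 ;
1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/ ;www.seznam.cz;
1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148 ;/dp/?id=124;
1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/dp/?id=148;
1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/ ; /;
1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/sb/;
"""


def parse_line(line):
    """Parse one semicolon-separated log record into a dict."""
    fields = [f.strip() for f in line.split(";")]
    # Assumption: a record with fewer than five fields has an empty referer.
    fields += [""] * (5 - len(fields))
    timestamp, ip, session_id, request, referer = fields[:5]
    return {
        "time": int(timestamp),
        "ip": ip,
        "session": session_id,
        "request": request,
        "referer": referer,
    }


def group_sessions(lines):
    """Group records by session ID, ordering each session by time."""
    sessions = defaultdict(list)
    for line in lines:
        if line.strip():
            record = parse_line(line)
            sessions[record["session"]].append(record)
    for records in sessions.values():
        records.sort(key=lambda r: r["time"])
    return dict(sessions)
```

Calling group_sessions(SAMPLE.splitlines()) yields three sessions; the one starting from www.google.cz visits two product details and then the shopping basket twice.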

Data Description
table “obchod” (shop) - name of the internet shop (7 entries),
table “kategorie” (category) - info about category of products (64 entries),
table “list” (sheet) - info about a specific product of a more detailed type (157 entries),
table “znacka” (brand) - name of the producer or brand of a product (197 entries),
table “tema” (theme) - info about themes discussed in the on-line advice (36 entries)

Data Summary (1/3)
3 617 171 page requests
522 410 sessions
    318 523 single page
    203 887 length > 1
avg. length 16
median 8
mode 2
longest 15 454

Data Summary (2/3)
time spent during a session (hh:mm:ss)
avg. time 00:24:46
median 00:03:08
mode 00:00:09
longest 433:27:53
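
The summary figures above could be reproduced along the following lines. This is a sketch under stated assumptions, not the procedure actually used for the challenge: the input is a hypothetical mapping from session ID to the unix timestamps of its page views, the time spent in a session is assumed to be the difference between its last and first timestamp, and only sessions with length > 1 are summarised. The last assumption is at least consistent with the reported counts, since (3 617 171 - 318 523) / 203 887 is approximately 16.2, matching the reported average length of 16.

```python
from statistics import mean, median, mode


def summarize(values):
    """Return average, median, mode and maximum of a list of numbers."""
    return {
        "avg": mean(values),
        "median": median(values),
        "mode": mode(values),
        "longest": max(values),
    }


def session_stats(sessions):
    """Summarise length and duration of sessions with more than one page view.

    `sessions` maps a session ID to a list of unix timestamps (hypothetical layout).
    """
    multi = [sorted(ts) for ts in sessions.values() if len(ts) > 1]
    lengths = [len(ts) for ts in multi]
    # Assumption: time spent = last timestamp minus first timestamp.
    durations = [ts[-1] - ts[0] for ts in multi]
    return summarize(lengths), summarize(durations)


def hms(seconds):
    """Format a number of seconds as hh:mm:ss (hours may exceed two digits)."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

With this formatting, an average of 1486 seconds is printed as 00:24:46, the same notation as in the summary.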

Data Summary (3/3)
[Figure: distribution of sessions with length > 1]