© 2006 KDnuggets 152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"


1 © 2006 KDnuggets 2: Web Server Log
An extract from KDnuggets web log

152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"

2 Web Server Log – An Example
[Diagram labels: http://www.kdnuggets.com/jobs/ (requested URL), KDnuggets.com Server, Web server log, Page contents]

Web server log:
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET … HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /gps.html HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 …

3 Web (Server) Log – In Depth
A sample web log line:

152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
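As a rough illustration of how the sample line breaks into the fields discussed on the following slides, here is a minimal Python sketch. The regular expression, the group names (`ip`, `name`, `login`, and so on) and the helper `parse_line` are assumptions made for this example, not part of the original slides, and the pattern does not handle quote characters embedded inside a quoted field.

```python
import re

# Hypothetical pattern for the 9-field log line shown above.
# Limitation: a '"' inside the request, referrer or user agent breaks the match.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of raw string fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = (
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] '
    '"GET /jobs/ HTTP/1.1" 200 15140 '
    '"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"'
)

fields = parse_line(sample)
for key, value in fields.items():
    print(f"{key:9s} {value}")
```

All values come back as strings; converting the time, status and size to richer types is left to later steps.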

4 Web log field: IP
152.152.98.11
IP address: can be converted to a host name, such as xyz.example.com

5 Web log fields: Name, Login
- -
First "-": the name of the remote user (usually omitted and replaced by a dash "-")
Second "-": the login of the remote user (also usually omitted and replaced by a dash "-")

6 Web log field: Date/Time/TZ
[16/Nov/2005:16:32:50 -0500]
- Date: DD/Mon/YYYY
- Time: HH:MM:SS
- Time zone: (+|-)HHMM offset from GMT; -0500 is US EST
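A short Python sketch of reading this field with the standard library follows. The helper name `parse_log_time` is an assumption for this example; note that `%b` matches English month abbreviations only under an English or C locale.

```python
from datetime import datetime, timezone

def parse_log_time(field):
    """Parse '[DD/Mon/YYYY:HH:MM:SS +-HHMM]' into a timezone-aware datetime."""
    # %b (month abbreviation) is locale dependent; "Nov" assumes an English/C locale.
    return datetime.strptime(field.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

local_time = parse_log_time("[16/Nov/2005:16:32:50 -0500]")
utc_time = local_time.astimezone(timezone.utc)

print(local_time.isoformat())  # time as logged, with its -05:00 offset
print(utc_time.isoformat())    # same instant expressed in UTC
```

Converting every timestamp to UTC is one simple way to compare requests from logs recorded in different time zones.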

7 Web log field: Request
"GET /jobs/ HTTP/1.1"
- Method: GET, HEAD, POST, OPTIONS, …
- URL: relative to the domain
- HTTP protocol: e.g. HTTP/1.0 or HTTP/1.1
Note: the request is recorded as sent, so it may contain errors, hacks, and any strange thing you can imagine.
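Because the request is logged exactly as sent, a parser should tolerate malformed values. The sketch below is one cautious way to split the field in Python; the function name `split_request` and the choice to keep the raw string for malformed requests are assumptions for this example.

```python
def split_request(request):
    """Split a request field into method, URL and protocol.

    Malformed requests are not rejected; they are returned with the
    parts set to None and the raw text preserved for later inspection.
    """
    parts = request.split()
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        method, url, protocol = parts
        return {"method": method, "url": url, "protocol": protocol, "raw": request}
    return {"method": None, "url": None, "protocol": None, "raw": request}

good = split_request("GET /jobs/ HTTP/1.1")
bad = split_request("GET /a b c HTTP/1.1")  # unescaped spaces in the URL

print(good)
print(bad)
```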

8 Web log field: Status code
200
Status (response) code. The most important ones are:
- 200 – OK (most frequent, hopefully)
- 206 – partial content
- 301 – permanently redirected (e.g. access to /courses is redirected to /courses/)
- 302 – temporarily redirected
- 304 – not modified
- 404 – not found
- …
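For reference, Python's standard library already knows the official reason phrases for these codes. The small sketch below looks them up; the helper name `describe_status` and the "Unknown" fallback are assumptions for this example.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the standard reason phrase for a status code, or 'Unknown'."""
    try:
        return HTTPStatus(int(code)).phrase
    except ValueError:
        # Raised for non-numeric input and for codes HTTPStatus does not define.
        return "Unknown"

for code in ("200", "206", "301", "302", "304", "404"):
    print(code, describe_status(code))
```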

9 Web log field: Object size
15140
Size of the object returned to the client, in bytes.
Can also be "-" if the status code is 304 (not modified).

10 Web log field: Referrer
http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N
- URL the visitor came from (here it was a Google query for "salary for data mining", 2nd page of results, starting from 10)
- Referrer can also be a static page, internal (same domain) or external (different domain), or "-" in case of a direct request (e.g. type-in, bookmark)
- Referrer analysis is very valuable
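The referrer categories above can be sketched in Python with `urllib.parse`. The function name `classify_referrer`, the default `own_host`, and the assumption that the search phrase sits in a `q` or `p` parameter (as in the two sample referrers in this deck) are illustrative choices, not a general rule for all search engines.

```python
from urllib.parse import urlsplit, parse_qs

def classify_referrer(referrer, own_host="www.kdnuggets.com"):
    """Label a referrer as direct, internal or external and pull out a search phrase."""
    if referrer in ("-", ""):
        return {"kind": "direct", "query": None, "start": None}
    parts = urlsplit(referrer)
    kind = "internal" if parts.hostname == own_host else "external"
    params = parse_qs(parts.query)  # decodes '+' to spaces, drops blank values
    # Assumption: the search phrase is in 'q' (Google sample) or 'p' (yisou sample).
    query = params.get("q", params.get("p", [None]))[0]
    start = params.get("start", [None])[0]
    return {"kind": kind, "query": query, "start": start}

google = classify_referrer(
    "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"
)
internal = classify_referrer("http://www.kdnuggets.com/")
direct = classify_referrer("-")

print(google)
print(internal)
print(direct)
```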

11 Web log field: User agent
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
- User agent (browser): http://en.wikipedia.org/wiki/User_agent
- Almost all browsers start with Mozilla, for historic reasons
- In many cases there is additional information:
  - Browser type, version: MSIE 6.0 (Internet Explorer 6.0)
  - OS: Windows NT 5.1 (XP SP2) with .NET Framework 1.1 installed
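A minimal Python sketch of pulling the browser and OS tokens out of this sample string is shown below. The function name `summarize_agent` and its two regular expressions are assumptions that cover only the MSIE-style strings seen in this log; real user agent parsing needs many more rules.

```python
import re

def summarize_agent(agent):
    """Extract a few tokens from an MSIE-style user agent string."""
    msie = re.search(r"MSIE (\d+\.\d+)", agent)
    windows_nt = re.search(r"Windows NT (\d+\.\d+)", agent)
    return {
        "mozilla_prefix": agent.startswith("Mozilla/"),
        "msie_version": msie.group(1) if msie else None,
        "windows_nt": windows_nt.group(1) if windows_nt else None,
    }

summary = summarize_agent(
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
)
print(summary)
```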

12 Web Usage Mining
- Basic
  - Totals
- Simple
  - Request level breakdowns
- Advanced
  - Visit level analysis
  - Target pages; Conversion analysis

13 Web Log Analysis Programs
- Free
  - Analog, awstats, webalizer
  - Google Analytics
- Commercial
  - WebTrends, WebSideStory, …
www.kdnuggets.com/software/web-mining.html

14 Web Usage Mining - Basic
- Totals for each component
  - Hits: total number of requests
  - Files: number of requests that returned a file (status code 200)
  - Pages: number of HTML pages
  - Sites: unique IP addresses
  - Response codes
  - Kbytes: total Kbytes transferred
  - User Agents
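The totals listed above can be sketched in a few lines of Python. The record layout, the function name `basic_totals`, the rule that a "page" is a status-200 request whose URL ends in `/`, `.html` or `.htm`, and counting Files as status-200 requests are all assumptions for this example; tools such as webalizer apply their own configurable definitions. The four records are taken from the log extract on the first slide.

```python
from collections import Counter

PAGE_SUFFIXES = ("/", ".html", ".htm")  # assumption: what counts as an HTML page

def basic_totals(records):
    """Compute simple totals from already-parsed log records."""
    files = [r for r in records if r["status"] == 200]
    pages = [r for r in files if r["url"].split("?")[0].endswith(PAGE_SUFFIXES)]
    return {
        "hits": len(records),
        "files": len(files),
        "pages": len(pages),
        "sites": len({r["ip"] for r in records}),
        "kbytes": sum(r["size"] for r in records) / 1024,
        "status_codes": Counter(r["status"] for r in records),
        "user_agents": len({r["agent"] for r in records}),
    }

records = [
    {"ip": "152.152.98.11", "url": "/jobs/", "status": 200, "size": 15140, "agent": "MSIE 6.0 .NET"},
    {"ip": "252.113.176.247", "url": "/", "status": 200, "size": 12453, "agent": "MSIE 6.0 MyIE2"},
    {"ip": "252.113.176.247", "url": "/kdr.css", "status": 200, "size": 145, "agent": "MSIE 6.0 MyIE2"},
    {"ip": "252.113.176.247", "url": "/images/KDnuggets_logo.gif", "status": 200, "size": 784, "agent": "MSIE 6.0 MyIE2"},
]

totals = basic_totals(records)
print(totals)
```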

15 Example: KDnuggets.com Nov 2005 totals
Monthly Statistics (from webalizer)

Total                 Value
Hits                  1,121,643
Files                 930,468
Pages                 312,889
Kbytes                10,578,535
Unique Sites (IP)     35,942
Unique URLs           6,769
Unique Referrers      7,213
Unique User Agents    2,724

Q: What is the meaning of the difference between Hits and Files?

16 Example: KDnuggets.com Nov 2005 totals, 2
Monthly stats for Files by Status Code

Code                            Hits
Code 200 - OK                   930,468
Code 206 - Partial Content      9,303
Code 301 - Moved Permanently    4,217
Code 302 - Found                457
Code 304 - Not Modified         170,874
Code 404 - Not Found            6,297
Other                           27

Answer: the difference between Hits and Files is the number of requests with a status code other than 200.
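The answer can be checked against the November 2005 figures with a few lines of Python. The numbers are copied from the two webalizer tables in this deck; the variable names are just for this example.

```python
# Requests per status code, from the "by Status Code" table.
status_counts = {
    200: 930_468,
    206: 9_303,
    301: 4_217,
    302: 457,
    304: 170_874,
    404: 6_297,
    "other": 27,
}

hits = 1_121_643                 # from the monthly totals table
files = status_counts[200]       # Files equals the number of status-200 requests
non_200 = sum(n for code, n in status_counts.items() if code != 200)

print("all status codes:", sum(status_counts.values()))
print("Hits - Files:    ", hits - files)
print("non-200 requests:", non_200)
```

The status code counts add up to the Hits total, and Hits minus Files equals the number of non-200 requests.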

17 Difference between Files and Pages
- Q: What is the meaning of the difference between Files and Pages?

18 Difference between Files and Pages
- A: the difference between Files and Pages is the number of non-HTML files (e.g. images, JavaScript, etc.)
- In the November 2005 KDnuggets log, HTML files were about 1/3 of all requests
- However, this data does not separate out bot requests (which are heavily weighted towards HTML pages)
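A quick Python calculation with the November 2005 totals from this deck shows the size of that difference. The variable names are assumptions for this example, and the share of HTML pages depends on whether Files or Hits is used as the denominator.

```python
hits = 1_121_643
files = 930_468
pages = 312_889

non_html_files = files - pages
pages_share_of_files = pages / files
pages_share_of_hits = pages / hits

print("non-HTML files:", non_html_files)
print(f"pages as share of Files: {pages_share_of_files:.1%}")
print(f"pages as share of Hits:  {pages_share_of_hits:.1%}")
```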

19 Notes: web log formats
- We used a web log in the standard Apache format
- Some old logs use a different format without the last 2 fields (referrer and user agent), but these are now rare
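One way to cope with both layouts is to make the last two fields optional in the parsing pattern. The Python sketch below does that; Apache calls the shorter layout the Common Log Format and the longer one the Combined Log Format, while the pattern itself and the helper name `detect_format` are assumptions for this example.

```python
import re

# The trailing referrer and user agent fields are wrapped in an optional group.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

def detect_format(line):
    """Return 'combined', 'common', or None if the line matches neither layout."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return "combined" if match.group("referrer") is not None else "common"

combined_line = (
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 '
    '"http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"'
)
common_line = '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145'

print(detect_format(combined_line))
print(detect_format(common_line))
```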

