Published by Verity Todd. Modified over 9 years ago.
1
© 2006 KDnuggets
2: Web Server Log
An extract from the KDnuggets web log:
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
2
Web Server Log – An Example
Diagram: a browser requests http://www.kdnuggets.com/jobs/ from the KDnuggets.com server, which returns the page contents and records each request in the web server log:
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET … HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /gps.html HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200
…
3
Web (Server) Log – In Depth
A sample web log line:
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
The fields, from left to right: IP, name, login, date/time/time zone, request, status code, object size, referrer, user agent.
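As a minimal sketch (Python, assuming every line follows the Apache combined format and that no field contains an escaped double quote), the sample line can be split into its nine fields with one regular expression. The group names `name` and `login` simply follow the field names used on these slides; `parse_line` is a hypothetical helper.

```python
import re

# One line of the Apache combined format:
# ip name login [date/time tz] "request" status size "referrer" "user agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict with the nine fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None

sample = ('152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" '
          '200 15140 "http://www.google.com/search?q=salary+for+data+mining'
          '&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; '
          'Windows NT 5.1; SV1; .NET CLR 1.1.4322)"')
fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["size"])  # 152.152.98.11 200 15140
```

Real logs do contain lines that break this pattern, so a parser should count and skip non-matching lines rather than fail.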
4
Web log field: IP
152.152.98.11
IP address; it can be converted to a host name, such as xyz.example.com, with a reverse DNS lookup.
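Converting an IP address to a host name means a reverse DNS (PTR) lookup. A minimal Python sketch: `ptr_name` shows the DNS name that is actually queried, and the hypothetical helper `host_name` tries the lookup with the standard `socket.gethostbyaddr` call, falling back to the raw IP when no PTR record exists. The lookup needs network access, and its answer today may differ from the one at the time the log line was written.

```python
import ipaddress
import socket

def ptr_name(ip):
    """DNS name queried by a reverse (PTR) lookup for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer

def host_name(ip):
    """Try a reverse DNS lookup; return the IP itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return ip

print(ptr_name("152.152.98.11"))  # 11.98.152.152.in-addr.arpa
```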
5
Web log fields: Name, Login
- Name: the name of the remote user, as reported by the client's identd service (usually omitted and replaced by a dash "-")
- Login: the login of the remote user, if the request was authenticated (also usually omitted and replaced by a dash "-")
6
Web log field: Date/Time/TZ
[16/Nov/2005:16:32:50 -0500]
- Date: DD/Mon/YYYY
- Time: HH:MM:SS
- Time zone: (+|-)HHMM offset relative to GMT; -0500 is US EST
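Because the time zone is stored in every entry, timestamps can be normalized to GMT/UTC before logs from different servers are compared. A sketch using Python's `datetime.strptime`; `parse_log_time` is a hypothetical helper, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, timezone

def parse_log_time(stamp):
    """Parse '16/Nov/2005:16:32:50 -0500' into a timezone-aware datetime."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

t = parse_log_time("16/Nov/2005:16:32:50 -0500")
print(t.astimezone(timezone.utc))  # 2005-11-16 21:32:50+00:00
```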
7
Web log field: Request
"GET /jobs/ HTTP/1.1"
- Method: GET, HEAD, POST, OPTIONS, …
- URL: relative to the domain
- HTTP protocol: e.g. HTTP/1.0 or HTTP/1.1
Note: the request is recorded exactly as sent, so it may contain errors, hacking attempts, and any other strange thing you can imagine.
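Since the request is logged exactly as sent, code that splits it should expect malformed input. A defensive sketch; `split_request` is a hypothetical helper, and treating anything other than three space-separated parts as malformed is a simplifying assumption (it also rejects old HTTP/0.9 requests such as `GET /`).

```python
def split_request(request):
    """Split a request line into (method, url, protocol), or None if malformed."""
    parts = request.split(" ")
    if len(parts) != 3:
        return None
    method, url, protocol = parts
    if not protocol.startswith("HTTP/"):
        return None
    return method, url, protocol

print(split_request("GET /jobs/ HTTP/1.1"))  # ('GET', '/jobs/', 'HTTP/1.1')
print(split_request("-"))                    # None
```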
8
Web log field: Status code
200
Status (response) code. The most important ones are:
- 200 – OK (most frequent, hopefully)
- 206 – Partial Content (only part of the object was sent, e.g. a resumed download)
- 301 – permanently redirected (e.g. a request for /courses is redirected to /courses/)
- 302 – temporarily redirected
- 304 – not modified (the client's cached copy is still valid)
- 404 – not found
- …
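Python's standard library carries the official reason phrases, which is handy when labeling status-code reports. A small sketch; `status_class` is a hypothetical helper that groups codes by their first digit (2xx success, 3xx redirection, 4xx client error, 5xx server error).

```python
from http import HTTPStatus

# Official reason phrases for the codes listed on this slide.
for code in (200, 206, 301, 302, 304, 404):
    print(code, HTTPStatus(code).phrase)

def status_class(code):
    """Group a status code by its first digit, e.g. 304 -> '3xx'."""
    return f"{code // 100}xx"
```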
9
Web log field: Object size
15140
Size of the object returned to the client, in bytes. It can also be "-" if the status code is 304 (not modified), since no content is sent.
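When summing bytes transferred, the "-" value has to be treated as zero rather than passed to an integer conversion. A minimal sketch using the object sizes of the four log lines in the extract above, plus one hypothetical "-" as it would appear for a 304 response; `object_size` is a hypothetical helper.

```python
def object_size(field):
    """Bytes sent to the client; '-' (no content, e.g. a 304 response) counts as 0."""
    return 0 if field == "-" else int(field)

# Sizes from the four extracted log lines, plus a hypothetical "-" entry.
sizes = ["15140", "12453", "145", "784", "-"]
total = sum(object_size(s) for s in sizes)
print(total)  # 28522
```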
10
Web log field: Referrer
http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N
The URL the visitor came from. Here it was a Google query for "salary for data mining", 2nd page of results (start=10, i.e. results starting from offset 10).
The referrer can also be a static page, either internal (same domain) or external (different domain), or "-" in the case of a direct request (e.g. a typed-in URL or a bookmark).
Referrer analysis is very valuable.
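A sketch of basic referrer analysis in Python: classify each referrer as direct, internal, or external, and pull the search terms and result offset out of a Google search URL. The set `OWN_HOSTS` and both helper functions are assumptions for illustration; other search engines use different parameter names (the yisou.com referrer in the extract above uses `p` instead of `q`).

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the host names under which the site itself is served.
OWN_HOSTS = {"www.kdnuggets.com", "kdnuggets.com"}

def classify_referrer(ref):
    """Return 'direct', 'internal' or 'external' for a referrer field."""
    if ref in ("-", ""):
        return "direct"  # typed-in URL, bookmark, etc.
    host = urlparse(ref).hostname or ""
    return "internal" if host in OWN_HOSTS else "external"

def google_query(ref):
    """Return (search terms, result offset) for a Google search referrer, else None."""
    url = urlparse(ref)
    host = url.hostname or ""
    if not (host == "google.com" or host.endswith(".google.com")) or url.path != "/search":
        return None
    params = parse_qs(url.query)  # decodes '+' to a space; drops empty values such as lr=
    terms = params.get("q", [""])[0]
    offset = int(params.get("start", ["0"])[0])
    return terms, offset

ref = "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"
print(classify_referrer(ref), google_query(ref))  # external ('salary for data mining', 10)
```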
11
Web log field: User agent
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
User agent (browser); see http://en.wikipedia.org/wiki/User_agent
Almost all browsers start with "Mozilla", for historical reasons.
In many cases the string carries additional information:
- Browser type, version: MSIE 6.0 – Internet Explorer 6.0
- OS: Windows NT 5.1 (Windows XP; the SV1 token indicates SP2), with .NET Framework 1.1 installed
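Because almost every user agent starts with "Mozilla", the useful information has to be taken from the tokens inside the parentheses. A hand-written sketch for old Internet Explorer strings only; `parse_msie_agent` is hypothetical, and production analyzers rely on maintained user-agent databases rather than a single regular expression.

```python
import re

def parse_msie_agent(agent):
    """Extract IE version, Windows NT version and .NET CLR versions from an MSIE agent.

    Returns None for anything that is not an MSIE-on-Windows-NT string.
    """
    m = re.search(r"MSIE (\d+\.\d+);.*?Windows NT (\d+\.\d+)", agent)
    if not m:
        return None
    return {
        "browser": "MSIE",
        "version": m.group(1),
        "windows_nt": m.group(2),
        "dotnet_clr": re.findall(r"\.NET CLR ([\d.]+)", agent),
    }

ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
print(parse_msie_agent(ua))
```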
12
Web Usage Mining
- Basic: totals
- Simple: request-level breakdowns
- Advanced: visit-level analysis; target pages; conversion analysis
13
Web Log Analysis Programs
- Free: Analog, awstats, webalizer, Google Analytics
- Commercial: WebTrends, WebSideStory, …
www.kdnuggets.com/software/web-mining.html
14
Web Usage Mining – Basic
Totals for each component:
- Hits – total number of requests
- Files – number of requests that actually returned a file
- Pages – number of HTML pages
- Sites – unique IP addresses
- Response codes
- Kbytes – total Kbytes transferred
- User agents
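A sketch of how these totals could be computed from raw log lines. It assumes webalizer's convention that Files are requests answered with status 200, and assumes a "page" is any URL ending in `/`, `.htm` or `.html` (webalizer makes this configurable); `basic_totals` and `is_page` are hypothetical. The input is the leading fields of the four log lines in the extract above; unique user agents are left out to keep the example short.

```python
import re
from collections import Counter

# Only the leading fields are needed: ip, name, login, [time], "request", status, size.
LINE_RE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) (\S+)')

def is_page(url):
    # Assumption: a "page" is a directory URL or an .htm/.html file.
    path = url.split("?", 1)[0]
    return path.endswith(("/", ".htm", ".html"))

def basic_totals(lines):
    hits = files = pages = nbytes = 0
    sites, codes = set(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not parse
        ip, request, status, size = m.groups()
        hits += 1
        sites.add(ip)
        codes[status] += 1
        if size != "-":
            nbytes += int(size)
        if status == "200":  # webalizer-style "file": request answered with 200
            files += 1
            parts = request.split(" ")
            if len(parts) == 3 and is_page(parts[1]):
                pages += 1
    return {"hits": hits, "files": files, "pages": pages, "sites": len(sites),
            "kbytes": nbytes / 1024, "codes": dict(codes)}

LOG = [
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784',
]
totals = basic_totals(LOG)
print(totals)
```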
15
Example: KDnuggets.com Nov 2005 totals
Monthly Statistics (from webalizer)

Total                   Value
Hits                    1,121,643
Files                   930,468
Pages                   312,889
Kbytes                  10,578,535
Unique Sites (IP)       35,942
Unique URLs             6,769
Unique Referrers        7,213
Unique User Agents      2,724

Q: What is the meaning of the difference between Hits and Files?
16
Example: KDnuggets.com Nov 2005 totals, 2
Monthly stats for Files by Status Code

Code                           Hits
200 - OK                       930,468
206 - Partial Content          9,303
301 - Moved Permanently        4,217
302 - Found                    457
304 - Not Modified             170,874
404 - Not Found                6,297
Other                          27

Answer: the difference between Hits and Files is the number of requests with a status code other than 200.
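The answer can be checked against the two webalizer tables: Hits minus Files equals the sum of all non-200 counts, 191,175 in both cases. Most of the difference comes from 304 responses, i.e. browsers revalidating copies they already have in their cache.

```python
# Nov 2005 webalizer figures for KDnuggets.com, copied from the two tables above.
hits, files = 1_121_643, 930_468
by_code = {"200": 930_468, "206": 9_303, "301": 4_217, "302": 457,
           "304": 170_874, "404": 6_297, "other": 27}

non_200 = sum(n for code, n in by_code.items() if code != "200")
print(hits - files, non_200)          # 191175 191175
print(sum(by_code.values()) == hits)  # True
```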
17
Difference between Files and Pages
Q: What is the meaning of the difference between Files and Pages?
18
Difference between Files and Pages
A: the difference between Files and Pages is the number of non-HTML files (e.g. images, JavaScript, etc.).
In the November 2005 KDnuggets log, HTML pages were about 1/3 of all files served.
However, this data does not separate bot requests, which are heavily weighted towards HTML pages.
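The same arithmetic for Files and Pages, using the Nov 2005 webalizer totals: 617,579 non-HTML files, with pages making up roughly one third of the files served and a little over a quarter of all hits.

```python
# Nov 2005 webalizer totals for KDnuggets.com (monthly statistics table).
hits, files, pages = 1_121_643, 930_468, 312_889

non_html = files - pages
print(non_html)                 # 617579
print(round(pages / files, 3))  # 0.336
print(round(pages / hits, 3))   # 0.279
```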
19
Notes: web log formats
We used a web log in the Apache standard format (the Combined Log Format).
Some old logs use a different format without the last 2 fields (referrer and user agent), the Common Log Format, but these are now rare.
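A sketch of a parser that accepts both formats by making the last two quoted fields optional, so older Common Log Format files and combined logs can be processed by the same code. `LOG_RE` and `log_format` are hypothetical names; the common-format test line is the third line of the extract above with its referrer and user agent removed.

```python
import re

# Common Log Format: seven fields. Combined Log Format adds "referrer" "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)

def log_format(line):
    """Return 'combined', 'common', or None if the line matches neither format."""
    m = LOG_RE.match(line)
    if not m:
        return None
    return "combined" if m.group("agent") is not None else "common"

combined = ('252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" '
            '200 145 "http://www.kdnuggets.com/" '
            '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"')
common = '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145'
print(log_format(combined), log_format(common))  # combined common
```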