Bringing It All Together Analyzing Web Server Log Files Eric Landrieu Lead Developer, PerfMan for Web Servers The Information.

Presentation transcript:

Bringing It All Together
Analyzing Web Server Log Files
Eric Landrieu
Lead Developer, PerfMan for Web Servers
The Information Systems Manager, Inc.

Growth of the Web Server
The web server has become a vital part of the business model
Internet web servers must be reliable, as they are truly an international 24x7x365 sales mechanism
The content of the site(s) can be just as damaging in the user’s eyes as poor performance: we have a two-edged sword

So how do we monitor the web server?
OS-level tools
  – Performance Monitor (Windows NT)
  – SMF, RMF (OS/390)
  – Third-party offerings
“Active” web site monitors (give a client-side view of the site)
Database/application monitoring
Web server log files

So how do we monitor the web server?
No one method can give you the whole picture of your web server’s health and performance
[Diagram: OS Statistics, Log File Analysis, Active Site Monitoring, Database Health]

What’s in the Log Files?
A view of client-server “transactions”: the client request, with the server response
Multiple “transactions” can be required for a single web page
[Diagram: client request "GET /parking/space.asp", server response "404 File Not Found"]

What’s in the Log Files?
Each “transaction” is totally separate in the log file
Any “user-level” data must be manually grouped using criteria available in the particular log file

So what is in these log files?

Information in the log files
Client IP - Usually the IP address, but can be resolved to a DNS name by the web server (not recommended)
File requested by the client (including directory)
Method used in the request (GET, POST, etc.)

Information in the log files
Return Code - Was it successful, and if not, why?
Bytes Sent - Bytes sent back to the client in the response
Referring URL - Where did the user find the link to this request?
Browser String - Tells what browser is being used

Information in the log files
Username - Anonymous or authenticated access
Cookie - The cookie relating to this “transaction”, if any
Bytes Received - Bytes received by the server in the request
Time Taken - Time taken by the server to process the request

Standardized Log Formats
Common Log Format (CLF)
Extended Common Log Format
W3C Standard
Other formats may be product-specific, and many are extensions of the CLF or Extended CLF formats.

Common Log Format
Advantages
  – Supported by just about every web server ever written
Disadvantages
  – Inflexible
  – Contains very limited data: no Bytes Received, Time Taken, User Agent (Browser), or Referer fields available

Common Log Format
[16/Feb/2001:06:59: ] "GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0"
[16/Feb/2001:06:59: ] "GET /java/FixFontHeadline.class HTTP/1.0"
[16/Feb/2001:06:59: ] "GET /graphics/trombone.gif HTTP/1.0"
[16/Feb/2001:06:59: ] "GET /images/joinband.jpg HTTP/1.0"
[16/Feb/2001:07:00: ] "GET /images/parade.jpg HTTP/1.0"
[16/Feb/2001:10:20: ] "GET /schedule.shtml HTTP/1.0"
[16/Feb/2001:10:26: ] "GET /index.shtml HTTP/1.0"
[16/Feb/2001:10:21: ] "GET /about.shtml HTTP/1.0"
[16/Feb/2001:10:26: ] "GET /communty.shtml HTTP/1.0"
[16/Feb/2001:10:18: ] "GET /join.shtml HTTP/1.0"
[16/Feb/2001:10:24: ] "GET /write.shtml HTTP/1.0"
[16/Feb/2001:10:54: ] "GET /robots.txt HTTP/1.0"
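A Common Log Format entry can be split into its fields with a single regular expression. The following is a minimal Python sketch, not part of the original talk: the names `CLF_PATTERN` and `parse_clf` are hypothetical, and the sample entry is invented (the address, time zone, status and size are made up, since those fields did not survive in the slide text above).

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The request field is normally "METHOD path protocol".
    parts = record["request"].split()
    record["method"] = parts[0] if parts else None
    record["path"] = parts[1] if len(parts) > 1 else None
    record["status"] = int(record["status"])
    # A dash in the bytes field means no response body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# Hypothetical sample entry: address, time zone, status and size are invented.
SAMPLE = ('192.0.2.10 - - [16/Feb/2001:06:59:12 -0500] '
          '"GET /images/parade.jpg HTTP/1.0" 200 4512')
record = parse_clf(SAMPLE)
print(record["path"], record["status"], record["bytes"])
```

Lines that do not match return `None` rather than raising, so a damaged entry can be counted and skipped without stopping the analysis.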

Extended Common Log Format
Adds User Agent (Browser) and Referrer to Common Log Format
Advantages
  – Most web servers support it
  – More information available than CLF
Disadvantages
  – Still no Time Taken or Bytes Received
  – Still inflexible

Extended Common Log Format
[16/Feb/2001:06:59: ] "GET /cgi-bin/Count.cgi?df=gecbhome&dd=B HTTP/1.0" " "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
[16/Feb/2001:06:59: ] "GET /java/FixFontHeadline.class HTTP/1.0" "-" "Java 1.1"
[16/Feb/2001:06:59: ] "GET /graphics/trombone.gif HTTP/1.0" " "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
[16/Feb/2001:06:59: ] "GET /images/joinband.jpg HTTP/1.0" " "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
[16/Feb/2001:07:00: ] "GET /images/parade.jpg HTTP/1.0" " "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
[16/Feb/2001:10:20: ] "GET /schedule.shtml HTTP/1.0" "-"
[16/Feb/2001:10:26: ] "GET /index.shtml HTTP/1.0" "-"
[16/Feb/2001:10:21: ] "GET /about.shtml HTTP/1.0" "-"
[16/Feb/2001:10:26: ] "GET /communty.shtml HTTP/1.0" "-"
[16/Feb/2001:10:18: ] "GET /join.shtml HTTP/1.0" "-"
[16/Feb/2001:10:24: ] "GET /write.shtml HTTP/1.0" "-"
[16/Feb/2001:10:54: ] "GET /robots.txt HTTP/1.0" "-" "-"
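The Extended Common Log Format appends two quoted fields, the referrer and the user agent, to the CLF entry. A hedged Python sketch of parsing it follows; `COMBINED_PATTERN` and `parse_combined` are hypothetical names, and both sample entries use invented addresses, status codes, sizes and referrer URL.

```python
import re

# CLF fields followed by "referer" and "user agent", each in double quotes.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # A dash marks a value the client did not supply.
    for key in ("referer", "agent"):
        if record[key] == "-":
            record[key] = None
    return record

# Hypothetical entries: addresses, status, sizes and referrer URL are invented.
BROWSER_HIT = ('192.0.2.10 - - [16/Feb/2001:06:59:12 -0500] '
               '"GET /graphics/trombone.gif HTTP/1.0" 200 2326 '
               '"http://www.example.com/index.shtml" '
               '"Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"')
ROBOT_HIT = ('192.0.2.77 - - [16/Feb/2001:10:54:03 -0500] '
             '"GET /robots.txt HTTP/1.0" 200 68 "-" "-"')

browser = parse_combined(BROWSER_HIT)
robot = parse_combined(ROBOT_HIT)
print(browser["referer"], "|", browser["agent"])
print(robot["referer"], "|", robot["agent"])
```

Entries with no referrer and no user agent, like the `/robots.txt` request, often come from automated clients, which is one reason the two extra fields are useful.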

W3C Extended Log Format
Advantages
  – Very flexible
  – Extensible
Disadvantages
  – Not as universally supported by web servers

W3C Extended Log Format
#Software: Microsoft Internet Information Services 5.0
#Version: 1.0
#Date: :01:20
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs-host cs(User-Agent) cs(Cookie) cs(Referer)
:01: GET /Default.asp HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e
:01: GET /corporate.css HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
:01: GET /images/vDivider2.gif HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
:01: GET /images/toc_quicklink.gif HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
:01: GET /images/region_am.jpg HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
:01: GET /images/orange_square_bullet.gif HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
:01: GET /corpnews/images/org_pointer_2.gif HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF
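Because a W3C Extended log names its own columns in the `#Fields:` directive, a parser can read that directive instead of hard-coding a layout. The sketch below is one possible approach in Python; `parse_w3c` is a hypothetical name, and the sample log uses a shortened field list with invented dates, addresses and values.

```python
def parse_w3c(lines):
    """Yield one dict per entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the #Fields directive affects parsing; others are skipped.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # entry does not match the declared columns
        yield dict(zip(fields, values))

# Hypothetical log: shortened field list, invented dates, addresses and values.
SAMPLE_LOG = [
    "#Software: Microsoft Internet Information Services 5.0",
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes "
    "time-taken cs(User-Agent)",
    "2001-02-16 00:01:20 192.0.2.10 GET /Default.asp 200 7930 141 "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95)",
    "2001-02-16 00:01:21 192.0.2.10 GET /corporate.css 200 1204 16 "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95)",
]

entries = list(parse_w3c(SAMPLE_LOG))
for entry in entries:
    print(entry["cs-uri-stem"], entry["sc-status"], entry["time-taken"])
```

Values are left as strings here; a real analysis would convert `sc-status`, `sc-bytes` and `time-taken` to numbers once it knows which fields the server was configured to write.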

Which Format(s) Does My Web Server Support?

Server                                         Common Log Format   Extended CLF   W3C Extended Log Format
Apache                                         Default             Available*     No
Microsoft IIS                                  Available           No             Default
IBM HTTP Server (WebSphere) (based on Apache)  Default             Available*     No
iPlanet Web Server                             Default             Available*     No
Website Pro (O'Reilly)                         Available           No             Available
Lotus Domino                                   Default             Available      No

Which Format(s) Does My Web Server Support?

Server                                         Common Log Format   Extended CLF   W3C Extended Log Format
AOLServer                                      Default             Available      No
Zeus Web Server                                Default             Available*     No
Xitami                                         Available           Default        No
I/Net Commerce Server/400                      Default             No
WebStar (Mac)                                  Available           No             Available
Servertec Internet Server                      Available

Limitations
-or-
Why we can’t ignore other sources of information

Log File Limitations
Not enough information to get the whole picture on the site’s performance and health
  – We need to correlate the log data with other sources.
OS-level statistics (Performance Monitor, SMF, 3rd party)
“Active” web analysis (e.g. Keynote)
Data on databases or other components of the site
[Diagram: Client, Internet, Web Server and Back End DB, which together make up the Web Site]


Log File Limitations
Only when fit together with the other pieces do we get the complete picture of your total web site health.
[Diagram: Client, Internet, Web Server and Back End DB, which together make up the Web Site]

Log File Limitations
You may also have to deal with log file formats which don’t include all of the information that you would like.
[Diagram: Common Log Format, with Bytes Received, Time Taken, Referrer and User Agent shown alongside it]

Issues With Log Files
User or session level statistics
Caching
Clustering
What constitutes a “site”?

User or Session Level Statistics
The server doesn’t give you statistics for the user (e.g. how long were they on the site?)
You have to mine these yourself from the data available
You will only be able to get approximations with this data, not exact figures

How do we group records for user-level statistics?
Client’s IP address
  – Proxy servers and firewalls with Network Address Translation (NAT) will make all users behind the firewall look like one user
  – If the proxy or firewall has multiple IP addresses (or is an array), multiple accesses of the site by one user may look like multiple users

How do we group records for user-level statistics?
Cookies
  – If the site assigns cookies to track users through the site, you can group the records based upon the cookie
  – Users who disable cookies in their browser cannot be grouped this way
  – Not all log file formats include the cookie

How do we group records for user-level statistics?
User name
  – Useful for an intranet, but you must have the server disallow anonymous access
  – Impractical for most internet sites (except restricted access)
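The three grouping criteria can be combined by preferring the most specific identifier a record carries. The Python sketch below is an illustration only: the fallback order (cookie, then user name, then IP address) is an assumption rather than something the talk prescribes, and `visitor_key`, `group_by_visitor` and the sample records are hypothetical.

```python
from collections import defaultdict

def visitor_key(record):
    """Pick the most specific identifier available for a record.

    Assumed preference order: cookie, then user name, then client IP.
    """
    if record.get("cookie"):
        return ("cookie", record["cookie"])
    if record.get("user"):
        return ("user", record["user"])
    return ("ip", record["host"])

def group_by_visitor(records):
    """Group parsed log records by visitor key."""
    groups = defaultdict(list)
    for record in records:
        groups[visitor_key(record)].append(record)
    return dict(groups)

# Hypothetical records: two users behind one proxy address, one user seen
# from two addresses, and one visitor with cookies disabled.
RECORDS = [
    {"host": "192.0.2.1", "cookie": "ID=aaa", "user": None, "path": "/a"},
    {"host": "192.0.2.1", "cookie": "ID=bbb", "user": None, "path": "/b"},
    {"host": "192.0.2.2", "cookie": "ID=aaa", "user": None, "path": "/c"},
    {"host": "192.0.2.3", "cookie": None, "user": None, "path": "/d"},
]

groups = group_by_visitor(RECORDS)
for key, members in groups.items():
    print(key, [m["path"] for m in members])
```

Grouping by IP address alone would have merged the first two records into one visitor and split the first and third into two, which is the proxy problem described above.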

Caching
Content from the web site may be cached outside of the web server
The web server may not get notification of requests for content that are serviced by these caches
The caches may be in proxy servers, browsers, or elsewhere

Clustering
Each server in a web cluster may maintain its own log file
You have to combine the log files to get information relevant to the entire site
One user accessing your site may get data from multiple servers
You may still want information on each individual server, to verify that they are load-balancing properly
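Combining per-server log files while remembering which server wrote each record can be sketched as a time-ordered merge. This Python example assumes the records are already parsed into dicts with a `time` value and that each server's list is in time order; `merge_cluster_logs` and the server names are hypothetical. It also assumes the servers' clocks agree closely enough for the ordering to be meaningful.

```python
import heapq
from datetime import datetime

def merge_cluster_logs(logs_by_server):
    """Merge per-server record lists into one time-ordered list.

    Each input list must already be sorted by time. Every record is copied
    and tagged with the name of the server that logged it.
    """
    tagged_streams = [
        [dict(record, server=name) for record in records]
        for name, records in logs_by_server.items()
    ]
    return list(heapq.merge(*tagged_streams, key=lambda r: r["time"]))

# Hypothetical two-server cluster with invented timestamps.
LOGS = {
    "web1": [
        {"time": datetime(2001, 2, 16, 10, 18, 0), "path": "/join.shtml"},
        {"time": datetime(2001, 2, 16, 10, 24, 0), "path": "/write.shtml"},
    ],
    "web2": [
        {"time": datetime(2001, 2, 16, 10, 20, 0), "path": "/schedule.shtml"},
    ],
}

merged = merge_cluster_logs(LOGS)
for record in merged:
    print(record["time"].time(), record["server"], record["path"])
```

Keeping the `server` tag on each record means the same merged data can serve both site-wide statistics and per-server checks.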

What constitutes a web site?
You have to decide exactly what you want to call a site:
  – A load-balanced cluster
  – A single site running on a dedicated server
  – A single site on a server running multiple sites
  – A directory within a site on a server
  – Multiple servers which act as your web presence (home, support, e-commerce…)

What good is analyzing log files?
OS-level analysis can’t:
  – Provide user (session)-level info
  – Break down by return code, file type or name, directory, etc.
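The breakdowns that OS-level tools cannot provide are simple counts once the log records are parsed. A minimal Python sketch, assuming each record is a dict with `path` and `status` keys (the function name `breakdown` and the status codes in the sample are hypothetical):

```python
from collections import Counter
import posixpath

def breakdown(records):
    """Count requests by return code, by file extension and by directory."""
    by_status = Counter()
    by_type = Counter()
    by_directory = Counter()
    for record in records:
        # Drop any query string before looking at the file name.
        path = record["path"].split("?", 1)[0]
        by_status[record["status"]] += 1
        by_type[posixpath.splitext(path)[1].lower() or "(none)"] += 1
        by_directory[posixpath.dirname(path) or "/"] += 1
    return by_status, by_type, by_directory

# Hypothetical records; the status codes are invented.
RECORDS = [
    {"path": "/index.shtml", "status": 200},
    {"path": "/images/parade.jpg", "status": 200},
    {"path": "/images/joinband.jpg", "status": 304},
    {"path": "/cgi-bin/Count.cgi?df=gecbhome&dd=B", "status": 200},
    {"path": "/parking/space.asp", "status": 404},
]

by_status, by_type, by_directory = breakdown(RECORDS)
print(dict(by_status))
print(dict(by_type))
print(dict(by_directory))
```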

What good is analyzing log files?
“Active” monitoring:
  – Gives the client-side perspective
  – May not distinguish between a slow link/router and a slow response from the server
  – Some products are concerned only with the response to the testing system, not server load
  – A browser-based product may have trouble with browser incompatibilities

So what’s the key to analyzing log files?
Grouping your log file records into useful statistics that will help you understand what is going on with your site

Example: 404 Errors
When a user gets a 404 Error (File Not Found), they may perceive a lack of “professionalism” or “quality” in your site.
You want to know not only what non-existent files are being requested, but why they are being requested (outdated link?)
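One way to see both what is missing and why it is being requested is to count 404 responses by requested path and referring URL together. The Python sketch below assumes records parsed from a format that includes the referrer; `missing_file_report`, the URLs and the sample records are hypothetical.

```python
from collections import Counter

def missing_file_report(records):
    """Return ((path, referrer), count) pairs for 404s, most frequent first."""
    counts = Counter(
        (r["path"], r.get("referer") or "(no referrer)")
        for r in records
        if r["status"] == 404
    )
    return counts.most_common()

# Hypothetical records: one stale link on an internal page, one typed URL.
RECORDS = [
    {"path": "/parking/space.asp", "status": 404,
     "referer": "http://www.example.com/old-links.html"},
    {"path": "/parking/space.asp", "status": 404,
     "referer": "http://www.example.com/old-links.html"},
    {"path": "/old.html", "status": 404, "referer": None},
    {"path": "/index.shtml", "status": 200, "referer": None},
]

report = missing_file_report(RECORDS)
for (path, referer), count in report:
    print(count, path, "<-", referer)
```

A referrer on your own site points to a link you can fix; a missing referrer suggests a bookmark, a typed address or an automated client.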


Example: User Session Time
You want to get as useful an approximation as possible of how long users are staying at your site (at least, marketing will)
Obviously, the longer they are browsing your site, the more interested they may be in what you have to offer
You can use their first and last requests for files to get a rough approximation
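The first-to-last-request approximation can be sketched as follows in Python. The 30-minute inactivity cut-off used to decide where one session ends and the next begins is an assumption of this sketch, not a figure from the talk, and `session_durations` is a hypothetical name.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the talk

def session_durations(times, timeout=SESSION_TIMEOUT):
    """Split one visitor's request times into sessions.

    Returns a list of (duration, request_count) pairs, where duration is the
    time between the first and last request of the session.
    """
    sessions = []
    start = previous = None
    count = 0
    for moment in sorted(times):
        if previous is not None and moment - previous > timeout:
            # Gap too long: close the current session and start a new one.
            sessions.append((previous - start, count))
            start, count = moment, 0
        if start is None:
            start = moment
        previous = moment
        count += 1
    if start is not None:
        sessions.append((previous - start, count))
    return sessions

# Hypothetical request times for one visitor.
TIMES = [
    datetime(2001, 2, 16, 10, 18, 0),
    datetime(2001, 2, 16, 10, 20, 0),
    datetime(2001, 2, 16, 10, 26, 0),
    datetime(2001, 2, 16, 14, 0, 0),
]

sessions = session_durations(TIMES)
for duration, count in sessions:
    print(duration, count)
```

The result is only an approximation: the time spent reading the last page is never logged, so a single-request session always has a duration of zero.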

Example: User Session Time
Most sessions were very short (1-2 pages)
  – This was an “Entry server” cluster, which passed off to other sites
A few (<20% of total sessions) were very long

Example: Cluster Load-Balancing
Ideally, your clustered servers for the site would be sharing the load equally
If one server is carrying a larger load, it can lead to an overall perceived slowdown of your site (most people going to a heavily loaded server while an idle server sits and does nothing)
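A simple check of load balance is each server's share of total requests and of total bytes sent. The Python sketch below assumes records already tagged with the server that logged them; `load_shares`, the server names and the byte counts are hypothetical.

```python
from collections import Counter

def load_shares(records):
    """Return {server: (share_of_requests, share_of_bytes_sent)}."""
    requests = Counter()
    sent = Counter()
    for record in records:
        requests[record["server"]] += 1
        sent[record["server"]] += record["bytes"]
    total_requests = sum(requests.values())
    total_bytes = sum(sent.values())
    return {
        server: (
            requests[server] / total_requests,
            sent[server] / total_bytes if total_bytes else 0.0,
        )
        for server in requests
    }

# Hypothetical records from a two-server cluster.
RECORDS = [
    {"server": "web1", "bytes": 100},
    {"server": "web1", "bytes": 200},
    {"server": "web1", "bytes": 300},
    {"server": "web2", "bytes": 400},
]

shares = load_shares(RECORDS)
for server, (request_share, byte_share) in sorted(shares.items()):
    print(f"{server}: {request_share:.0%} of requests, {byte_share:.0%} of bytes")
```

Looking at both figures matters because a server can receive a fair share of requests while still carrying most of the bytes, or the reverse.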


So What Should I Take Out Of This?
-or-
Is there a point???

Summary
Web server log file analysis is an important part of monitoring your web servers
Log file analysis alone will not give you the complete picture of your web server, but you can’t get the complete picture without it
Know what is useful in the log files, what limitations are inherent in them, and how to analyze them