Web Servers & Log Analysis What can we learn from looking at Web server logs? - What server resources were requested - When the files were requested - Who requested them (where IP address = who) - How they requested them (browser types & OS) Some assumptions - A request for a resource means the user did receive it - A resource is viewable & understandable to each user - Users are identified within a loose set of parameters How does knowing request patterns affect or help IA?

Types of Web Server Logs Proxy-based - Web access servers to control access or cache popular files Client-based - Local cache files - Browser History file(s) Network-based - Routers, firewalls & access points Server-based - Web servers to serve content

Using Web Servers The Apache Software Foundation Microsoft Internet Information Server (Services) These applications “Serve” - Text - HTML, XML, plain text - Graphics - jpeg, gif, png - CGI, servlets, XMLHttpRequest & other logic - other MIME types such as movies & sound Most servers can log these files - Daily, weekly or monthly - Cannot always log CGI or related logic (specifically or “out of the box”)

How Servers Work Hypertext Transfer Protocol (HTTP) 1. The browser requests a file 2. The request is transferred via the network 3. The server receives the request (& logs it) 4. The server provides the file (& logs it) 5. The browser displays the file Almost all Web servers work this way

Types of Server Logs Access Log - Logs information such as page served or time served Referer Log - Logs name of the server and page that links to current served page - Not always - Can be from any Web site Agent Log - Logs browser type and operating system Mozilla Windows

Log File Format Extended Log File Format - W3C Working Draft WD-logfile Key advantage: - computer storage cost decreases while paper cost rises Every server generates slightly different logs

Extended Log File Formats WWW Consortium Standards Will automatically record much of what is programmatically done now. - faster - more accurate - standard baselines for comparison - graphics standards

What is a log file? A delimited text file with information about what the server is doing - IP Address or Domain name - Date/Time - Method used & Page Requested - Protocol, Response Code & Bytes Returned - Referring Page (sometimes) - User Agent & Operating System p0016c74ea.us.kpmg.com - - [01/Sep/2004:08:17: ] "GET /images/sanchez.jpg HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)"
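The fields listed on this slide can be pulled out of a log line with a regular expression. The following is a minimal sketch, assuming the server writes the common "combined" layout (host, identity, user, timestamp, request, status, bytes, referer, user agent). The slide's own example line is truncated, so the sample line below is invented for illustration, and the names LOG_PATTERN and parse_line are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # a report tool would count this as a corrupt logfile line
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no body was returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Invented sample line (not taken from the slide).
SAMPLE = (
    'host.example.com - - [01/Sep/2004:08:17:05 -0500] '
    '"GET /images/photo.jpg HTTP/1.1" 200 4512 '
    '"http://www.example.com/index.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Because every server generates slightly different logs, the pattern would likely need adjusting for a real log file.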

In search of Reliable Data Not as Foolproof as Paper - You can see when someone is reading a page - You can know the page is turned - You can know the book is checked out No State Information - The same person or another person could be reading pages 1 then page 2 - You really can’t tell how many users you have Server Hits not perfectly Representative - Counters inaccurate - Caching & Robots can influence + & - Floods/Bandwidth can Stop “intended” usage

What is a “hit”? Technically, a hit is simply any file requested from the server - That is logged - That represents (usually) part of a request to “see” a whole Web page Hits combine to represent a “page view” Page views combine to represent an “episode” or “session” - Episode is one activity or question a user performs or requests on a Web site - Session is a series of episodes that embodies all the interactions a user undertakes using a Web site (per time, based on averages around 30 min.)
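Grouping hits into sessions can be sketched as follows, assuming the IP address or host name stands in for the user (a loose identification, as noted earlier) and using the 30-minute figure from this slide as the inactivity cutoff. The function name sessionize and the input shape are hypothetical.

```python
from datetime import datetime, timedelta

# Cutoff taken from the slide's "around 30 min." rule of thumb.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (host, datetime) pairs into sessions per host.

    Returns a dict mapping each host to a list of sessions, where a session
    is a list of request times with no gap longer than `timeout`.
    """
    sessions = {}
    for host, when in sorted(requests, key=lambda r: (r[0], r[1])):
        host_sessions = sessions.setdefault(host, [])
        if host_sessions and when - host_sessions[-1][-1] <= timeout:
            host_sessions[-1].append(when)   # continue the current session
        else:
            host_sessions.append([when])     # gap too long: start a new one
    return sessions


if __name__ == "__main__":
    start = datetime(2004, 9, 1, 8, 0)
    hits = [
        ("10.0.0.1", start),
        ("10.0.0.1", start + timedelta(minutes=10)),
        ("10.0.0.1", start + timedelta(minutes=60)),
        ("10.0.0.2", start + timedelta(minutes=5)),
    ]
    for host, host_sessions in sessionize(hits).items():
        print(host, [len(s) for s in host_sessions])
```

Several people behind one proxy share an address, and one person may appear under several addresses, so counts produced this way are estimates.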

Making Servers More Reliable Keep system setups simple - unique file and directory names - clear, consistent structure Configure CMS for logging/serving Use an FTP server for file transfer - frees up logs and server! Judicious use of links Wise MIME types - some hard/impossible to log

Clever Web Server Setup Redirect CGI to find referrer Use a database - store web content - record usage data create state information with programming - NSAPI - ActiveX Have contact information Have purpose statements

Managing Log Files Backup Store Results or Logs? Beginning New Logs Posting Results

Log Analysis Tools Analog Webalizer Sawmill WebTrends AWStats WWWStat GetStats Perl Scripts Data Mining & Business Intelligence tools

WebTrends A whole industry of analytics Most popular commercial application

Log Analysis Cumulative Sample Program started at Tue-03-Dec :20 local time. Analysed requests from Thu-28-Jul :31 to Mon-02-Dec :59 (858.1 days). Total successful requests: (88 952) Average successful requests per day: (12 707) Total successful requests for pages: (17 492) Total failed requests: (1 649) Total redirected requests: (197) Number of distinct files requested: (2 268) Number of distinct hosts served: (11 284) Number of new hosts served in last 7 days: Corrupt logfile lines: 262 Unwanted logfile entries: 976 Total data transferred: Mbytes ( kbytes) Average data transferred per day: kbytes ( kbytes)
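Headline figures like the ones in this report can be tallied from the response codes in a log. The sketch below is only an illustration: it assumes that 2xx and 304 responses count as successful, other 3xx as redirected, and 4xx/5xx as failed. That grouping is an assumption and may not match any particular tool exactly; classify and summarize are hypothetical names.

```python
from collections import Counter


def classify(status):
    """Map an HTTP status code to a report category (assumed grouping)."""
    if 200 <= status < 300 or status == 304:
        return "successful"
    if 300 <= status < 400:
        return "redirected"
    if 400 <= status < 600:
        return "failed"
    return "other"


def summarize(statuses, days):
    """Count requests per category and the daily average of successes."""
    counts = Counter(classify(s) for s in statuses)
    successful = counts["successful"]
    return {
        "successful": successful,
        "redirected": counts["redirected"],
        "failed": counts["failed"],
        "successful_per_day": successful / days if days else 0.0,
    }


if __name__ == "__main__":
    print(summarize([200, 200, 304, 301, 404, 500], days=2))
```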

How about the iSchool Web site? Our server files are collected constantly - Daily - Weekly - Monthly - Even yearly What does a quick look tell us? - How well is the server working? Uptime, server errors, logging errors - How popular is our site? Number of hits, popular files - Who is visiting the site? Countries, types of companies - What searches led people here?
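The question of which searches led people to the site can be approached through the referring page, when the log records one. This is a rough sketch that assumes the search engine puts the query in a URL parameter named q, p, or query. That list is an assumption, and the function name search_terms and the example URLs are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumed parameter names; real search engines vary and change over time.
QUERY_PARAMS = ("q", "p", "query")


def search_terms(referer):
    """Return the search phrase embedded in a referring URL, or None."""
    if not referer or referer == "-":
        return None  # no referring page was logged
    params = parse_qs(urlparse(referer).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return None


if __name__ == "__main__":
    print(search_terms("http://www.google.com/search?q=web+server+logs&hl=en"))
    print(search_terms("http://www.example.com/page.html"))
```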

UT & its Web server logs UT Web log reports (Figures in parentheses refer to the 7 days to 28-Mar :00). Successful requests: 39,826,634 (39,596,364) Average successful requests per day: 5,690,083 (5,656,623) Successful requests for pages: 4,189,081 (4,154,717) Average successful requests for pages per day: 598,499 (593,530) Failed requests: 442,129 (439,467) Redirected requests: 1,101,849 (1,093,606) Distinct files requested: 479,022 (473,341) Corrupt logfile lines: 427 Data transferred: Gbytes ( Gbytes) Average data transferred per day: Gbytes ( Gbytes)

Neat Analysis Tricks Use a search engine to find references - “link: key to using unique names - use many engines: update times differ and blocking mechanisms differ Use Web searches (or Yahoo, Bloglines…) - look for references - look for IP addresses of users

Neat Tricks, cont. Walking up the Links - follow URLs upward Reverse Sort - look for relations Use your own robot to index - Test

Web Surveys, an alternative Surveys actually ask users what they did, what they sought & if it helped GVU, Nielsen and GNN - Qualitative questions phone web forms - Self-selected sample problems random selection oversample

Analysis of a Very Large Search Log What kinds of patterns can we find? Request = query and results page 280 GB – Six Weeks of Web Queries - Almost 1 Billion Search Requests, 850K valid, 575K queries Million User Sessions (cookie issues) - Large volume, less trendy - Why are unique queries important? Web Users: - Use Short Queries in short sessions % one request - Mostly Look at the First Ten Results only - Seldom Modify Queries Traditional IR Isn’t Accurately Describing Web Search Phrase Searching Could Be Augmented Silverstein, Henzinger, Marais, Moricz (1998)

Analysis of a Very Large Search Log 2.35 Average Terms Per Query - 0 = 20.6% (?) - 1 = 25.8% - 2 = 26.0% (0 to 2 terms combined = 72.4%) Operators Per Query - 0 = 79.6% Terms Predictable First Set of Results Viewed Only = 85% Some (Single Term Phrase) Query Correlation - Augmentation - Taxonomy Input - Robots vs. Humans
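The terms-per-query figures on this slide are a mean plus a percentage breakdown by query length. A small sketch of that calculation follows. It splits queries on whitespace, which is an assumption and not necessarily how the cited study counted terms. The name term_stats and the sample queries are invented.

```python
from collections import Counter


def term_stats(queries):
    """Return (mean terms per query, {term count: percent of queries})."""
    lengths = [len(q.split()) for q in queries]  # whitespace split (assumed)
    if not lengths:
        return 0.0, {}
    mean = sum(lengths) / len(lengths)
    distribution = Counter(lengths)
    shares = {
        n: 100.0 * count / len(lengths)
        for n, count in sorted(distribution.items())
    }
    return mean, shares


if __name__ == "__main__":
    sample = ["", "apache", "web logs", "web server logs"]
    print(term_stats(sample))
```

An empty query counts as zero terms here, which is one way a nonzero share for 0 terms could arise.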

Real Life Information Retrieval 51K Queries from Excite (1997) Search Terms = 2.21 Number of Terms - 1 = 31% 2 = 31% 3 = 18% (80% Combined) Logic & Modifiers (by User) - Infrequent - AND, “+”, “-” Logic & Modifiers (by Query) - 6% of Users - Less Than 10% of Queries - Lots of Mistakes Uniqueness of Queries - 35% successive - 22% modified - 43% identical

Real Life Information Retrieval Queries per user 2.8 Sessions - Flawed Analysis (User ID) - Some Revisits to Query (Result Page Revisits) Page Views - Accurate, but not by User Use of Relevance Feedback (more like this) - Not Used Much (~11%) Terms Used Typical & frequent Mistakes - Typos - Misspellings - Bad (Advanced) Query Formulation Jansen, B. J., Spink, A., Bateman, J., & Saracevic, T. (1998)

Downie & Web Usage Server logs are like library usage User-based analyses - who - where - what File-based analyses - amount Request analyses - conform (loosely) to Zipf’s Law Byte-based analyses
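The Zipf's Law remark can be checked roughly by ranking requested files by how often they were requested and fitting a line to log(frequency) against log(rank). A slope near -1 suggests Zipf-like behavior. The sketch below uses an ordinary least-squares fit and made-up request paths; rank_frequency and zipf_slope are hypothetical names.

```python
import math
from collections import Counter


def rank_frequency(paths):
    """Return [(rank, path, count)], most requested first."""
    counts = Counter(paths)
    return [
        (rank, path, count)
        for rank, (path, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank)."""
    if len(table) < 2:
        raise ValueError("need at least two distinct files to fit a slope")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Invented data: request counts of 12, 6, 4, 3 follow count = 12 / rank.
    paths = ["/a"] * 12 + ["/b"] * 6 + ["/c"] * 4 + ["/d"] * 3
    table = rank_frequency(paths)
    print(table, zipf_slope(table))
```

Real logs conform only loosely, so the fitted slope is a rough indicator, not a test.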

Web use analysis & IA? Another tool to begin to understand how people use your Web provided resources With a small amount of setup, you can learn a large amount Server use can be integrated into site usage for users - Lists of popular pages & more interlinking pages - Adding search terms that found the page to related pages - Adjust metadata to reflect searches that find pages - Add pages to the site index or site map First-cut usability information - Pages 1 & 2 were accessed, but not 3 - Why? - Navigation usage, link ordering and design understanding - Knowing what browsers & OS helps tailor design and media types

BREAK! No Presentation this week - Next week: Asset management, content management & version control Break up media development work Examine current pages, style sheets & designs Set up next set of pair & individual deliverables

Media Development work We need to find & create graphics for the new site Content about: - Austin - UT - iSchool - People at the iSchool - Students at work in the iSchool (classes, labs) Screen grab from videos Search the Web for copyright free images Take our own pictures

Current Pages & Designs First version of main iSchool page template and CSS complete Secondary page template & CSS complete - Some secondary pages already built Index page template set Site map page initially set - Big Map - Main pages map

Next steps In class - Test & evaluate current CSS and templates - Improvise secondary home page based on initial design - Examine new Alumni section - Examine new Course Listing page For homework - Complete secondary page migration to new design - Rotate design work Alumni Site Map Home page design ideas - Picture/Media creation work