Published by Alan Barber. Modified over 9 years ago.
1
The Web Server
Every web site (the collection of HTML/CSS files, data files, scripts and other files) must be stored on a web server.
The term web server is used for two things:
– The hardware (computer) storing the data
– The software that receives requests over the Internet and responds to them
Depending on the expected amount of traffic (number of requests), the web server hardware will probably need to be more capable than a standard PC. The server will need:
– A sufficiently large hard disk to store all of the web site’s content (the hard disk should also be fast)
– Fast network access
– A fast processor
It is possible that the server will host more than one web site, and it’s also possible that a single web site might be spread across multiple computers.
2
Popular Web Servers (software)
By far, the most popular web server is Apache:
– Open source: free, with source code available for modification
– More than ½ of all web servers run Apache
– We will focus on the Apache web server in these notes
Others include:
– Microsoft IIS (Internet Information Services), which runs about ¼ of all web servers and supports FTP, FTPS, SMTP and NNTP as well as HTTP/HTTPS
– Igor Sysoev’s nginx (pronounced “engine x”)
– GWS (Google Web Server), which is a modified version of Apache run by Google
– lighttpd, which is also open source
Why pay for a server?
– Support: a company whose business requires reliable web service may be wary of open source software since the open source community’s support is not at their beck and call
3
How Web Servers Function
The web server receives requests from clients:
– Typically HTTP requests come into a web server over port 80 (or port 443 for HTTPS)
– First, the web server must interpret the request
  – What is the command? The most common command is GET, to retrieve a page; other commands include POST, to submit data (such as posting a message to a bulletin board), and PUT, to upload a file
  – Translate the path: the URL’s path is appended to the server’s “document root”
  – In addition, the path is modified
    – if the path contains a ~ (a user directory)
    – if the path is aliased
    – if the path or file requires redirection
– Now, retrieve the file
– Send back the file (or an error message)
– Log the communication (request, response, error)
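The interpret-and-translate steps above can be sketched in a few lines of Python. This is only an illustration, not how Apache is implemented: the document root, the alias table, and the per-user directory name (`DOCUMENT_ROOT`, `ALIASES`, `USER_DIR`) are assumed values chosen for the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/var/www/html"              # assumed document root
ALIASES = {"/icons/": "/usr/share/icons/"}   # hypothetical alias table
USER_DIR = "public_html"                     # assumed per-user web directory


def parse_request_line(line):
    """Split a request line such as 'GET /index.html HTTP/1.1' into its parts."""
    method, target, version = line.strip().split(" ", 2)
    return method, target, version


def translate_path(target):
    """Map the URL path of a request to a path on the server's file system."""
    # Drop any query string, decode %xx escapes, and collapse '.' and '..'.
    url_path = unquote(urlsplit(target).path)
    path = posixpath.normpath("/" + url_path.lstrip("/"))

    # Aliased prefixes are served from outside the document root.
    for prefix, directory in ALIASES.items():
        if (path + "/").startswith(prefix):
            return directory + path[len(prefix):]

    # A leading ~ selects a user's personal web directory.
    if path.startswith("/~"):
        user, _, rest = path[2:].partition("/")
        return posixpath.join("/home", user, USER_DIR, rest)

    # Default case: append the URL path to the document root.
    return DOCUMENT_ROOT + path
```

For example, `translate_path("/~alice/notes.html")` yields `/home/alice/public_html/notes.html` under these assumptions, while an ordinary path is simply appended to the document root.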
4
Virtual Hosting
Many organizations do not have their own web server (hardware), so they instead purchase space from another company.
The web server (software) must be able to host multiple web sites.
Each web site has its own IP alias, but they all must map to the same hardware IP address
– Or the hardware might take on multiple IP addresses
Thus, the one physical server hosts multiple web sites; this is known as virtual hosting.
Apache is able to do this quite easily:
– Each web site will have its own directory in which to store its web content
– A web site administrator will have access to the web site’s directory but not the rest of the directory space
Specific configuration data can be placed in the web site directory in a file called .htaccess
– This allows the web site administrator to tailor the web server to their needs, for instance by allowing access to directories or having their own password file for users of that web site
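One way to picture virtual hosting is as a lookup from the name the client asked for to that site's directory. The sketch below assumes name-based selection using the HTTP `Host` header; the host names and directories are made up for illustration.

```python
# Hypothetical mapping of site names to their content directories.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "shop.example.org": "/var/www/shop",
}
DEFAULT_ROOT = "/var/www/default"  # assumed fallback for unknown names


def document_root_for(host_header):
    """Pick the document root for the site named in the Host header."""
    # The header may carry a port ("www.example.com:80"); names are case-insensitive.
    host = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```

Every site lives on the same machine, but each request is answered from its own directory, which is the essence of the arrangement described above.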
5
Server Side Scripting
Through HTML/CSS, you can only dictate how the web page should be formatted/treated.
HTML also adds limited functionality such as links and including image or sound files.
For more dynamic pages, you need the pages to either
– Respond to user input through client side scripting
– Perform operations on the server through server side scripting
Client side scripting is executed in the client’s browser.
Server side scripting is executed on the server
– Or on a computer connected to the server that the server contacts (this might be done to provide proper load balancing)
Server side scripting requires that the web server be able to execute specific programming languages:
– PHP and Perl are very common
– Also popular are Python and Ruby
– JavaScript is supposed to be for client side scripting but is also used by some for server side scripting
We will briefly look at PHP later in the semester, but you study server side scripting in detail in CSC/CIT 301.
6
Features of Apache
We study Apache in detail throughout CIT 436; here we will consider a few basic ideas.
– Apache is open source and highly configurable
  – you can make changes to the source code
  – you can configure almost all of the options as you like
– Apache comes with
  – A configuration file
  – Error documents, including multilanguage documents
  – The ability to generate several types of log files
  – The ability to return different files depending on user preferences (e.g., return the index.html file in English rather than German, or return the .jpg version of a file instead of the .gif version)
  – The ability to create virtual hosts
  – The ability to serve as a reverse proxy server
  – The ability to configure Apache differently for different directories, different files and different virtual hosts
  – The ability to handle HTTPS requests
7
Configuring Apache
Most of the configuration goes into one file, httpd.conf.
The configuration breaks down into four different types:
– Server configuration, which includes the number of requests that can be processed at a time, what port(s) to listen on, what email address the server uses, and what default name(s) to search for when a file name is omitted (usually index.html)
– Virtual host configurations: for every virtual host, what IP alias (or address) it has, the document root location for that host’s file space, and specific configurations for that host (as long as they do not contradict server configurations)
– Directory configurations: how to handle a specific directory. For instance, if a file name is omitted and there is no index.html file, should the directory listing appear? Can scripts be run within this directory?
– File configurations: how to handle specific files and specific URLs (similar to directory configurations, but they only impact one file or all files of a given name)
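The directory question raised above (what happens when the file name is omitted) can be sketched as a small decision function. This is a simplified model, not Apache's code: `index_names` and `allow_listing` are hypothetical parameters standing in for the corresponding server and directory settings.

```python
def resolve_directory(files_in_dir, index_names=("index.html",), allow_listing=False):
    """Decide what to return when a URL names a directory rather than a file."""
    # Try each configured default name in order.
    for name in index_names:
        if name in files_in_dir:
            return ("file", name)
    # No default file: show the directory listing only if the directory allows it.
    if allow_listing:
        return ("listing", sorted(files_in_dir))
    # Otherwise refuse the request (403, forbidden).
    return ("forbidden", 403)
```

With listings disabled, a directory holding only `a.txt` produces `("forbidden", 403)`; enabling `allow_listing` for that directory returns the sorted file names instead.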
8
Apache Modules
Apache comes with a number of modules that you can add to your server.
– The speling module (misspelled for some reason) attempts to correct simple URL misspellings
  – For instance, if you omit a letter, transpose two letters or capitalize/lower case a letter incorrectly
  – If you enter www.someserver.com/files/file1.html, this module can correct the URL if
    – files was supposed to be Files or FILES
    – the extension was supposed to be .htm
    – the name of the file is file.html instead of file1.html
  – If there is a file2.html and a file3.html but no file1.html, then both file2.html and file3.html are listed for the user to select from
– The rewrite module allows you to provide rules to rewrite URLs, for instance changing any path under /pub/files or /images to /pub/extra while leaving /pub/scripts unchanged
– Other modules add authentication for HTTPS and allow a database to be used instead of a simple “flat” text file for passwords
– There are dozens of modules available, and you can also install third-party modules
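The kind of matching the speling module performs can be approximated as follows. This is a rough sketch of the idea (ignore case, tolerate one omitted, extra, wrong or transposed character), not the module's actual algorithm; `within_one_edit` and `suggest` are names invented for the example.

```python
def within_one_edit(a, b):
    """True if a and b are equal or differ by one insertion, deletion,
    substitution, or transposition of adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one wrong character
        return (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]]
            and a[diffs[1]] == b[diffs[0]]
        )  # two adjacent characters swapped
    # Lengths differ by one: removing one character from the longer must give the shorter.
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def suggest(requested, existing_names):
    """Return existing names that plausibly match a misspelled request."""
    wanted = requested.lower()
    return sorted(n for n in existing_names if within_one_edit(wanted, n.lower()))
```

A single match could be used to redirect the client directly; several matches, as in the file2.html/file3.html case above, would be offered as a list.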
9
Status Codes
Apache generates a status code after every access; many of these are error codes.
– 200 means success
– 30x codes mean redirection took place
– 40x codes are the various client errors
  – The most common error is 404 (file not found)
  – Others include 400 (bad request), 401 (unauthorized access: the page requires a log in but you did not authenticate or your password was wrong), 403 (forbidden: the file is protected), 408 (request timeout) and 410 (file gone)
– 50x codes mean a server error
You can set up Apache to log each status code differently.
Typically, you will log all errors in an error file
– You might inspect this occasionally to see why errors are occurring and to try to fix them
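Because the first digit of a status code determines its class, a log-processing script can group codes with simple integer division. A minimal sketch, with the class labels chosen here for illustration:

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the broad class of an HTTP status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def describe(code):
    """Combine the code, its standard reason phrase, and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"
```

The standard library's `HTTPStatus` supplies the reason phrases, so `describe(404)` reads `404 Not Found (client error)`.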
10
Logging
Apache logs every transaction:
– Requests
– Errors
– Successful accesses
What is logged?
– You can specify exactly what to log, but it usually includes the client’s IP address, the time and date, the URL requested and the status code
Why are successful requests logged?
– Businesses will often data mine the log file to find trends
  – For instance, if you discover that people often move from page A to page D, then you might link the pages more closely together or move the link to page D higher in page A
  – You might look to see which pages are more popular
  – You might want to see if certain pages are accessed at certain times of day or days of the week
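As a small illustration of mining a log for popular pages, the sketch below parses entries in the widely used Common Log Format and counts successful requests per URL. The sample entries are invented, and a real log may use a different format depending on how logging is configured.

```python
import re
from collections import Counter

# Pattern for one Common Log Format entry: client, timestamp, request line, status, size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample entries for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2024:13:56:02 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2024:13:56:40 +0000] "GET /missing.html HTTP/1.1" 404 209',
]


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def popular_pages(lines):
    """Count successful (status 200) requests per URL."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == 200:
            counts[entry["url"]] += 1
    return counts
```

Calling `popular_pages(SAMPLE_LOG).most_common()` ranks pages by number of successful requests; the same parsed entries could be grouped by hour or weekday to study access times.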
11
Proxy Servers
In a company of dozens or hundreds of users, many of whom might view the same web pages, it becomes more efficient to add a proxy server:
– The server acts as a cache for the entire company
– If a page is already cached in the proxy server, it is returned to the user rather than placing a new request over the Internet; thus it is faster and reduces Internet traffic
The proxy server also provides
– Authentication, to ensure that only proper users are accessing the Internet
– Security and censorship, by limiting what sites, file types, names and content can be returned over the Internet
– Anonymity, by hiding the specific user’s IP address from the web server; only the proxy server’s IP address is known
Squid is the most popular proxy server and, like Apache, it is open source
– Free, configurable and adaptable
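The caching behaviour can be sketched with a dictionary in front of whatever function actually fetches a page. This is a deliberately simplified model: a real proxy such as Squid also expires entries and honours the caching headers sent by web servers, which this sketch ignores. `CachingProxy` and its `fetch` argument are names invented for the example.

```python
class CachingProxy:
    """Return cached pages when possible; otherwise fetch from the origin once."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that retrieves a URL over the Internet
        self._cache = {}             # URL -> page content
        self.origin_requests = 0     # how many requests actually left the company

    def get(self, url):
        if url not in self._cache:
            # Cache miss: go out to the Internet and remember the result.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

If ten users request the same page through one `CachingProxy`, only the first request reaches the origin server; the other nine are answered from the cache.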
12
Reverse Proxy Servers
The proxy server sits between the people in your organization and the Internet.
The reverse proxy server sits between the Internet and your web site.
The main intention of a reverse proxy server is to check requests as they come in, before they reach your web server:
– If you have multiple servers, the reverse proxy server can perform load balancing by selecting the web server that is least active
– The reverse proxy server can examine the URL for mistakes and either correct/redirect the request or discard it so that the web server(s) is not bothered with it
– The reverse proxy server can perform authentication
The idea is that the reverse proxy server mostly acts as a filter between your web site and the Internet.
– Squid can be used as a reverse proxy server, although it is more useful as a forward proxy server
– Apache can serve as a reverse proxy server
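Selecting the least active web server can be sketched by tracking how many requests each back-end server is currently handling. The class and server names below are hypothetical, and "least active" is taken here to mean fewest in-progress requests, which is only one possible measure.

```python
class LeastConnectionsBalancer:
    """Send each new request to the back-end server with the fewest active requests."""

    def __init__(self, backends):
        # Number of requests currently being handled by each server.
        self.active = {name: 0 for name in backends}

    def acquire(self):
        """Choose the least active server and record the new request on it."""
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Record that a request on the given server has finished."""
        self.active[backend] -= 1
```

A reverse proxy would call `acquire()` when a request arrives, forward the request to the returned server, and call `release()` once the response has been sent.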