Published by Axel Adenauer. Modified over 6 years ago.
1
Web Server Design
Week 2
Old Dominion University
Department of Computer Science
CS 495/595 Spring 2008
Michael L. Nelson
1/17/12
2
19.6.1.1 Changes to Simplify Multi-homed Web Servers and Conserve IP Addresses
Both clients and servers MUST support the Host request-header. A client that sends an HTTP/1.1 request MUST send a Host header. Servers MUST report a 400 (Bad Request) error if an HTTP/1.1 request does not include a Host request-header. Servers MUST accept absolute URIs.
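The Host requirement above can be illustrated with a minimal sketch of the server-side check. The function name `check_host_requirement` is hypothetical, and the parsing is deliberately simplified (it does not handle folded header lines); it only shows where the 400 (Bad Request) decision is made.

```python
def check_host_requirement(request_text):
    """Return 400 if an HTTP/1.1 request has no Host header, else 200.

    Hypothetical helper for illustration; a real server would go on to
    dispatch the request instead of returning 200 directly.
    """
    # Keep only the request line and headers; tolerate bare LF line endings.
    head = request_text.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    request_line = lines[0].split()
    if len(request_line) != 3:
        return 400  # malformed request line
    version = request_line[2]
    # Collect header names case-insensitively.
    header_names = set()
    for line in lines[1:]:
        if ":" in line:
            header_names.add(line.split(":", 1)[0].strip().lower())
    if version == "HTTP/1.1" and "host" not in header_names:
        return 400
    return 200
```

An HTTP/1.0 request without Host still passes this check, since the MUST applies to HTTP/1.1 requests.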
3
Absolute & Relative URIs
% telnet 80 | tee 3-1.out
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
GET /~mln/index.html HTTP/1.1
Connection: close
Host:
[deletia]
% telnet 80 | tee 3-2.out
GET HTTP/1.1
[deletia]
[AIHT:~/Desktop/cs595-s06] mln% diff 3-1.out 3-2.out
5c5
< Date: Mon, 23 Jan :54:49 GMT
---
> Date: Mon, 23 Jan :55:02 GMT
4
5.2 The Resource Identified by a Request
The exact resource identified by an Internet request is determined by examining both the Request-URI and the Host header field.
An origin server that does not allow resources to differ by the requested host MAY ignore the Host header field value when determining the resource identified by an HTTP/1.1 request. (But see section 19.6.1.1 for other requirements on Host support in HTTP/1.1.)
An origin server that does differentiate resources based on the host requested (sometimes referred to as virtual hosts or vanity host names) MUST use the following rules for determining the requested resource on an HTTP/1.1 request:
1. If Request-URI is an absoluteURI, the host is part of the Request-URI. Any Host header field value in the request MUST be ignored.
2. If the Request-URI is not an absoluteURI, and the request includes a Host header field, the host is determined by the Host header field value.
3. If the host as determined by rule 1 or 2 is not a valid host on the server, the response MUST be a 400 (Bad Request) error message.

[AIHT:~/Desktop/cs595-s06] mln% telnet 80 | tee 3-4.out
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
GET /~mln/index.html HTTP/1.1
Connection: close
Host: foo.bar.edu

HTTP/ OK
Date: Mon, 23 Jan :59:19 GMT
Server: Apache/ (Unix) ApacheJServ/1.1.2 PHP/4.3.4
Last-Modified: Sun, 29 May :46:53 GMT
ETag: "1c52-14ed-42992d1d"
Accept-Ranges: bytes
Content-Length: 5357
Content-Type: text/html
[deletia]
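The three rules of section 5.2 can be sketched as a small function. This is an illustration for a server that does differentiate resources by host; `effective_host` and the `valid_hosts` set are hypothetical names, and IPv6 literals in the Host header are not handled. A server that ignores Host, as the session above suggests, would skip this check.

```python
from urllib.parse import urlsplit


def effective_host(request_uri, host_header, valid_hosts):
    """Return (status, host) for a request, following RFC 2616 section 5.2.

    valid_hosts is an assumed set of lower-case host names served here.
    """
    parts = urlsplit(request_uri)
    if parts.scheme and parts.netloc:
        # Rule 1: an absoluteURI carries the host; the Host header is ignored.
        host = parts.hostname
    elif host_header:
        # Rule 2: otherwise use the Host header, dropping an optional ":port".
        host = host_header.strip().split(":", 1)[0]
    else:
        host = None
    # Rule 3: an unknown (or missing) host is a 400 (Bad Request).
    if not host or host.lower() not in valid_hosts:
        return 400, None
    return 200, host.lower()
```

With `valid_hosts = {"www.example.edu"}`, a request for `/~mln/index.html` with `Host: foo.bar.edu` is rejected, while the same path with an absolute URI naming `www.example.edu` is accepted regardless of the Host value.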
5
Is This RFC-2616 Compliant?
% telnet www.cs.odu.edu 80
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD HTTP/1.1
Host:
Connection: close

HTTP/ OK
Date: Fri, 30 Jan :17:46 GMT
Server: Apache/2.2.0
Last-Modified: Wed, 14 Jan :45:46 GMT
ETag: " cfe-1247a280"
Accept-Ranges: bytes
Content-Length: 7422
Content-Type: text/html
6
Is This RFC-2616 Compliant?
% telnet www.google.com 80
Trying
Connected to
Escape character is '^]'.
HEAD / HTTP/1.1
Connection: close

HTTP/ OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=d ae:TM= :LM= :S=L0vxDxm20siPrfQi; expires=Sun, 17-Jan :14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Content-Length: 0
Date: Mon, 22 Jan :53:39 GMT

Connection closed by foreign host.
7
This is Compliant…
% telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD / HTTP/1.1
Connection: close

HTTP/ Bad Request
Date: Mon, 22 Jan :56:07 GMT
Server: Apache/2.2.0
Content-Type: text/html; charset=iso

Connection closed by foreign host.
8
Host: Matters…
% telnet www.cs.odu.edu 80
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD / HTTP/1.1
Host:
Connection: close

HTTP/ OK
Date: Mon, 16 Jan :45:56 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Accept-Ranges: bytes
Content-Type: text/html

Connection closed by foreign host.

% telnet 80
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD / HTTP/1.1
Connection: close
Host:

HTTP/ OK
Date: Mon, 16 Jan :46:11 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Last-Modified: Sun, 23 Nov :22:06 GMT
ETag: "19f2-45c638bee2380"
Accept-Ranges: bytes
Content-Length: 6642
Content-Type: text/html

Connection closed by foreign host.
9
3.2.2 http URL (RFC 2616)
The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs.
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path (section 5.1.2). The use of IP addresses in URLs SHOULD be avoided whenever possible (see RFC 1900 [24]). If the abs_path is not present in the URL, it MUST be given as "/" when used as a Request-URI for a resource (section 5.1.2).
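The two defaults in section 3.2.2 (port 80 when the port is empty or missing, "/" when abs_path is missing) can be shown with a short sketch built on Python's `urllib.parse.urlsplit`. The function name `split_http_url` is hypothetical, and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit


def split_http_url(url):
    """Split an http URL into (host, port, abs_path, query).

    Applies the section 3.2.2 defaults: port 80 and abs_path "/".
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        raise ValueError("not an http URL: %r" % url)
    # netloc is "host" or "host:port"; an empty or missing port means 80.
    host, _, port_text = parts.netloc.partition(":")
    port = int(port_text) if port_text else 80
    # A missing abs_path must be sent as "/" in the Request-URI.
    abs_path = parts.path or "/"
    return host, port, abs_path, parts.query
```

For example, `split_http_url("http://abc.com")` yields host `abc.com`, port 80 and path `/`.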
10
3.2.3 URI Comparison (RFC 2616)
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
- A port that is empty or not given is equivalent to the default port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
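The comparison rules of section 3.2.3 can be sketched as a normalization step followed by an ordinary equality test. The names `normalize_http_uri` and `uris_match` are hypothetical. Only escapes of RFC 2396 "unreserved" characters are decoded; upper-casing the hex digits of the remaining escapes is an added assumption, not something section 3.2.3 states.

```python
import re
import string
from urllib.parse import urlsplit

# RFC 2396 "unreserved" = alphanum | mark
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def _decode_unreserved(text):
    """Replace %HH with its character only when that character is unreserved."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Assumption: keep other escapes, with upper-cased hex digits.
        return ch if ch in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def normalize_http_uri(uri):
    parts = urlsplit(uri)
    host, _, port_text = parts.netloc.partition(":")
    port = int(port_text) if port_text else 80   # empty or missing port
    path = _decode_unreserved(parts.path) or "/"  # empty abs_path is "/"
    return (parts.scheme.lower(), host.lower(), port, path, parts.query)


def uris_match(a, b):
    return normalize_http_uri(a) == normalize_http_uri(b)
```

Under these assumptions the three RFC 2616 example URIs compare equal, while `/a%2Fb` and `/a/b` do not, because "/" is reserved.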
11
What can be in a URI? (or: “Learning to Love BNF”) (RFC 2396)
uric = reserved | unreserved | escaped
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
alphanum = alpha | digit
12
alphanum
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
13
reserved
Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
The "reserved" syntax class above refers to those characters that are allowed within a URI, but which may not be allowed within a particular component of the generic URI syntax; they are used as delimiters of the components described in Section 3.
Characters in the "reserved" set are not reserved in all contexts. The set of characters actually reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI changes if the character is replaced with its escaped US-ASCII encoding.
14
escaped
An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
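The escaped triplet can be produced by hand or with Python's standard `urllib.parse.quote` and `unquote`. The helper `escape_octet` and the file name `hello world.txt` are hypothetical, chosen only to show the space character becoming "%20".

```python
from urllib.parse import quote, unquote


def escape_octet(byte_value):
    """Encode one octet as "%" followed by two hexadecimal digits."""
    return "%%%02X" % byte_value


space_escaped = escape_octet(0x20)        # the US-ASCII space character
encoded = quote("hello world.txt")        # escapes the space, keeps the rest
decoded = unquote("hello%20world.txt")    # reverses the escaping
```

Note that `quote` leaves "/" unescaped by default, which suits path components but not every URI component.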
15
Encoding Cheat Sheet from:
16
excluded
Although they are disallowed within the URI syntax, we include here a description of those US-ASCII characters that have been excluded and the reasons for their exclusion.
The control characters in the US-ASCII coded character set are not used within a URI, both because they are non-printable and because they are likely to be misinterpreted by some control mechanisms.
control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
The space character is excluded because significant spaces may disappear and insignificant spaces may be introduced when URI are transcribed or typeset or subjected to the treatment of word-processing programs. Whitespace is also used to delimit URI in many contexts.
space = <US-ASCII coded character 20 hexadecimal>
17
excluded
The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields. The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4). The percent character "%" is excluded because it is used for the encoding of escaped characters.
delims = "<" | ">" | "#" | "%" | <">
Other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters.
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
18
Clients Hide Details!
% telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD world.txt HTTP/1.1
Host:
Connection: close

HTTP/ Not Found
Date: Tue, 17 Jan :27:17 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Content-Type: text/html; charset=iso
19
Spaces Are Encoded
% telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD HTTP/1.1
Host:
Connection: close

HTTP/ OK
Date: Tue, 17 Jan :27:54 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Last-Modified: Tue, 17 Jan :25:55 GMT
ETag: "c-4b6b00cf932d2"
Accept-Ranges: bytes
Content-Length: 12
Content-Type: text/plain
20
Fragments (from RFC-1630)
This represents a part of, fragment of, or a sub-function within, an object. Its syntax and semantics are defined by the application responsible for the object, or the specification of the content type of the object.
…
The fragment-id follows the URL of the whole object from which it is separated by a hash sign (#). If the fragment-id is void, the hash sign may be omitted: A void fragment-id with or without the hash sign means that the URL refers to the whole object.
A reference to a particular part of a document may, including the fragment identifier, look like a URL ending in "#andy", in which case the string "#andy" is not sent to the server, but is retained by the client and used when the whole object had been retrieved.
Stated less clearly in section 4.1, RFC 2396.
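A client can separate the fragment before building the request, so that only the URL of the whole object goes to the server. The sketch below uses Python's standard `urllib.parse.urldefrag`; the function name `split_fragment` and the example URL are hypothetical.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (url_without_fragment, fragment_id).

    The first element is what the client requests from the server; the
    second is kept on the client and applied after retrieval.
    """
    base, fragment = urldefrag(url)
    return base, fragment


to_server, kept_by_client = split_fragment("http://www.example.org/people#andy")
```

Here `to_server` holds the URL of the whole object and `kept_by_client` holds `andy`.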
21
URLs & Filenames?
HIERARCHICAL FORMS
The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose relationship is hierarchical. This enables partial forms of the URI. Substrings consisting of single or double dots ("." or "..") are similarly reserved.
The significance of the slash between two segments is that the segment of the path to the left is more significant than the segment of the path to the right. ("Significance" in this case refers solely to closeness to the root of the hierarchical structure and makes no value judgement!)
Note: The similarity to unix and other disk operating system filename conventions should be taken as purely coincidental, and should not be taken to indicate that URIs should be interpreted as file names.
RFC-1630; restated in various ways in other RFCs
22
MIME Types
Multipurpose Internet Mail Extensions
- RFCs 2045, 2046
- used to populate http “Content-Type” response headers
- Although not part of http, web servers generally have a configurable method of mapping file extensions to MIME types, e.g.:
.jpeg, .jpg    image/jpeg
.pdf           application/pdf
.ppt           application/vnd.ms-powerpoint
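The extension-to-type mapping can be sketched as a small lookup table. The table holds only the entries listed above; the names `MIME_TYPES`, `DEFAULT_TYPE` and `content_type_for`, and the choice of `application/octet-stream` as the fallback, are assumptions for illustration.

```python
import os

# Entries taken from the slide; a real server would load these from config.
MIME_TYPES = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
    ".ppt": "application/vnd.ms-powerpoint",
}

# Assumed default for unknown extensions.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(path):
    """Pick a Content-Type value from the file extension (case-insensitive)."""
    extension = os.path.splitext(path)[1].lower()
    return MIME_TYPES.get(extension, DEFAULT_TYPE)
```

Keeping the default type separate from the table makes it easy to expose both as configuration settings.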
23
TRACE Method
ignore for now
% telnet www.cs.vt.edu 80
Trying
Connected to cs.vt.edu.
Escape character is '^]'.
TRACE / HTTP/1.1
Testing-1: Hello, Sailor!
Testing-2: 69,105
Host:
Connection: close

HTTP/ OK
Date: Mon, 16 Jan :23:32 GMT
Server: Apache/2.2.3 (CentOS)
Transfer-Encoding: chunked
Content-Type: message/http

77
ignore for now
24
TRACE no longer on www.cs.odu.edu
% telnet 80
Trying
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
TRACE / HTTP/1.1
Testing-1: Hello, Sailor!
Testing-2: 69,105
Host:
Connection: close

HTTP/ Method Not Allowed
Date: Mon, 16 Jan :23:14 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Allow:
Content-Length: 353
Content-Type: text/html; charset=iso

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>405 Method Not Allowed</title>
25
OPTIONS to verify…
% telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
OPTIONS / HTTP/1.1
Testing-1: Hello, Sailor!
Testing-2: 69,105
Host:
Connection: close

HTTP/ OK
Date: Mon, 16 Jan :54:43 GMT
Server: Apache/ (Unix) PHP/5.3.5 mod_ssl/ OpenSSL/0.9.8q
Allow: GET,HEAD,POST,OPTIONS
Content-Length: 0
Content-Type: text/html

Connection closed by foreign host.
26
Parsing Lines
2.2 Basic Rules
…
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all protocol elements except the entity-body (see appendix 19.3 for tolerant applications). The end-of-line marker within an entity-body is defined by its associated media type, as described in section 3.7.
CRLF = CR LF
HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab.
OK: Line1: xyz Line2: ab c Line3: pdq
not OK:
19.3 Tolerant Applications
…
The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.
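The line rules above (CRLF terminators, tolerance for a bare LF, and folded continuation lines) can be combined in one small header parser. This is a sketch, not a complete implementation; `parse_headers` is a hypothetical name, and joining a folded line to the previous value with a single space is an assumed simplification.

```python
def parse_headers(raw):
    """Parse a header block into a list of (name, value) pairs.

    Accepts CRLF or bare LF as the line terminator and joins folded
    continuation lines onto the previous header value.
    """
    headers = []
    for line in raw.split("\n"):
        line = line.rstrip("\r")       # ignore the CR that precedes LF
        if line == "":
            break                      # blank line ends the header block
        if line[0] in " \t" and headers:
            # Continuation line: begins with a space or horizontal tab.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError("malformed header line: %r" % line)
        headers.append((name.strip(), value.strip()))
    return headers
```

A continuation line that does not start with whitespace is treated as a new header, and fails here if it has no colon.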
27
Common Log Format
see: http://en.wikipedia.org/wiki/Common_Log_Format
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0"
[16/Jan/2012:18:18: ] "GET /~mln/ HTTP/1.1"
[16/Jan/2012:18:18: ] "HEAD /~mln/ HTTP/1.1"
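A Common Log Format entry has the fields host, ident, authuser, a bracketed timestamp, the quoted request line, the status code and the response size, with "-" standing in for a missing value. The sketch below builds one such line; `clf_line` is a hypothetical name, and the address, time zone and seconds used in the usage note are made-up values, not the ones elided from the log lines above.

```python
from datetime import datetime, timedelta, timezone

# Month names spelled out so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def clf_line(host, ident, user, when, request_line, status, size):
    """Format one access-log entry in Common Log Format.

    `when` must be a timezone-aware datetime so the UTC offset can be shown.
    """
    timestamp = "%02d/%s/%04d:%02d:%02d:%02d %s" % (
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second, when.strftime("%z"))
    size_field = str(size) if size else "-"   # "-" when no body was sent
    return '%s %s %s [%s] "%s" %d %s' % (
        host, ident or "-", user or "-", timestamp,
        request_line, status, size_field)
```

For instance, calling it with host `192.0.2.1`, no ident or user, a timestamp of 16 Jan 2012 18:18:00 at UTC-5, the request line `HEAD /~mln/ HTTP/1.1`, status 200 and size 0 produces a line ending in `200 -`.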
28
Good things to have in a config file(s)…
- MIME types (including default type)
- document root
- your server name (version, assignment, etc.)
- location of your access log, debugging log(s)
- resolve IP addrs --> hostnames (y/n)
- default port # your server will run on
- templates (or pointers to templates) for directory listings, dynamically created entities (e.g., for non-200 responses), log formats
- default timeout (seconds)
- virtual URIs
- redirects (code, URI1, URI2)
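One way to hold the settings listed above is a small typed structure loaded from a text file. The sketch below assumes a JSON config format; every field name and default value in `ServerConfig` is hypothetical and only mirrors part of the list.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ServerConfig:
    # All names and defaults below are illustrative assumptions.
    document_root: str = "./htdocs"
    server_name: str = "ExampleServer/0.1"
    port: int = 8080
    timeout_seconds: int = 15
    access_log: str = "./access.log"
    resolve_hostnames: bool = False
    default_mime_type: str = "application/octet-stream"
    mime_types: dict = field(default_factory=dict)   # extension -> type
    redirects: list = field(default_factory=list)    # [code, URI1, URI2]


def load_config(text):
    """Build a ServerConfig from JSON text; unset keys keep their defaults."""
    return ServerConfig(**json.loads(text))
```

Unknown keys raise a `TypeError` here, which surfaces typos in the config file early.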