The Client-Server Architecture of the WWW
Dr. Kuanchin Chen
Copyright (c) 2010, Dr. Kuanchin Chen
The Hypertext Transfer Protocol (HTTP)
- HTTP governs the messages exchanged between a Web server and a Web client.
- Who put together the HTTP standard? What is the current version of it?
- Links: the main site for HTTP; the HTTP specification
WWW Client-Side Technologies
- (X)HTML
- CSS
- JavaScript (VBScript, PerlScript, etc.)
- XML/XSLT
Server-Side Technologies (I)
Active Server Pages (ASP)
- Languages: VBScript, JavaScript, …
- Platforms: mostly Windows (ASP.NET is supported on more operating systems than ASP, but …)
- Portability: Medium-to-low
- Execution: Interpreted (compiled in .NET)
- Technology: Proprietary
Java Server Pages (JSP)
- Language: Java
- Platforms: Windows, UNIX, Mac OS, …
- Portability: High
- Execution: Compiled
- Technology: Proprietary
Server-Side Technologies (II)
PHP: Hypertext Preprocessor (PHP)
- Language: PHP
- Platforms: Windows, UNIX, Mac OS, …
- Portability: High
- Execution: Interpreted (the engine compiles scripts to bytecode at run time)
- Technology: Open source
Common Gateway Interface (CGI)
- Language: any
- Platforms: Windows, UNIX, Mac OS, …
- Portability: High
- Execution: Interpreted and/or compiled
- Technology: Proprietary or open source
Others
What is a Port?
- A port is a logical connection point through which client and server programs “talk” to each other.
- Each server program or service is assigned a numeric port number.
- Analogy: the server computer’s physical network connection is like a main telephone number, and the port numbers of individual services are like extensions.
- Well-known port numbers: FTP: port 21; SMTP: port 25; HTTP: port 80; IMAP: port 143
- See the IANA (Internet Assigned Numbers Authority) port assignments.
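As a rough illustration of how port numbers show up in practice, the Python sketch below works out which port a URL addresses. The table WELL_KNOWN_PORTS only repeats the numbers listed above; the helper name port_for and the sample URLs are illustrative assumptions, not part of the course material.

```python
from urllib.parse import urlsplit

# Well-known ports from the list above, keyed by URL scheme / service name.
WELL_KNOWN_PORTS = {"ftp": 21, "smtp": 25, "http": 80, "imap": 143}

def port_for(url):
    """Return the port a URL addresses: the explicit one, else the well-known default."""
    parts = urlsplit(url)
    if parts.port is not None:           # an "extension" was given, e.g. http://host:8080/
        return parts.port
    return WELL_KNOWN_PORTS[parts.scheme]

print(port_for("http://www.wmich.edu/library/"))     # no port given: HTTP default
print(port_for("http://localhost:8080/index.htm"))   # explicit port in the URL
```

A browser applies the same rule: when the URL names no port, it connects to the well-known port of the scheme.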
WWW Message Exchange
- Client (e.g., a browser) sends a REQUEST message to the server.
- Server (e.g., a Web server) returns a RESPONSE message to the client.
- Both messages are exchanged over HTTP.
Syntax of the REQUEST Message
Request = Request-Line                                                  (1)
          *(( general-header | request-header | entity-header ) CRLF)   (2)
          CRLF                                                          (3)
          [ message-body ]                                              (4)
Symbols used in the syntax:
1. | denotes OR
2. [ ] denotes OPTIONAL
3. * denotes ZERO OR MORE repetitions of the element that follows
4. CRLF: Carriage Return (ASCII 13) followed by Line Feed (ASCII 10)
Syntax of the Request Line
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Note:
1. Method: POST | GET | others
2. HTTP-Version: e.g., HTTP/1.1
3. SP: a single space character (ASCII 32)
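The Request-Line rule can be checked mechanically. The following Python sketch, with the assumed helper name parse_request_line, splits a request line into its three parts; it is a simplified illustration, not a complete HTTP parser.

```python
def parse_request_line(line):
    """Split 'Method SP Request-URI SP HTTP-Version CRLF' into its three parts."""
    if not line.endswith("\r\n"):
        raise ValueError("a request line must end with CRLF")
    # Exactly two single spaces separate the three elements; anything else
    # makes the unpacking below fail with ValueError.
    method, request_uri, http_version = line[:-2].split(" ")
    return method, request_uri, http_version

print(parse_request_line("GET /library/ HTTP/1.1\r\n"))
```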
The REQUEST Message – A Graphical View
GET / HTTP/1.1 CRLF                       <- the request line
Host: www.yahoo.com CRLF                  <- the header section
From: … CRLF
Accept: text/plain, text/html CRLF
CRLF                                      <- an empty line
                                          <- the optional message body (none here)
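To make the four parts of the REQUEST message concrete, here is a minimal Python sketch that assembles one as text. The function name build_request is an assumption made for illustration; the header values are the ones shown in the graphical view.

```python
CRLF = "\r\n"

def build_request(method, request_uri, headers, body=""):
    """Assemble a request: request line, header lines, an empty line, optional body."""
    lines = [f"{method} {request_uri} HTTP/1.1"]                       # (1) request line
    lines += [f"{name}: {value}" for name, value in headers.items()]   # (2) headers
    return CRLF.join(lines) + CRLF + CRLF + body                       # (3) empty line, (4) body

message = build_request(
    "GET", "/",
    {"Host": "www.yahoo.com", "Accept": "text/plain, text/html"},
)
print(repr(message))
```

Printing with repr makes the otherwise invisible CR and LF characters show up as \r\n.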
Want More Information about the REQUEST Message?
- See the REQUEST section of the HTTP specification.
The Request Message - Examples
Example 1:
GET / HTTP/1.1 CRLF
Host: www.yahoo.com CRLF
CRLF
Example 2:
GET /library/ HTTP/1.1 CRLF
Host: www.wmich.edu CRLF
CRLF
Q: What are the equivalent URLs for the above two examples?
Syntax of the RESPONSE Message
Response = Status-Line                                                   (1)
           *(( general-header | response-header | entity-header ) CRLF)  (2)
           CRLF                                                          (3)
           [ message-body ]                                              (4)
Symbols used in the syntax:
1. | denotes OR
2. [ ] denotes OPTIONAL
3. * denotes ZERO OR MORE repetitions of the element that follows
4. CRLF: Carriage Return (ASCII 13) followed by Line Feed (ASCII 10)
Syntax of the Status Line
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Note:
1. HTTP-Version: e.g., HTTP/1.1
2. Status-Code: see the status code definitions in the HTTP specification.
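A status line can be taken apart in the same way as a request line. The Python sketch below uses an assumed helper name, parse_status_line, and limits the split to two spaces because the Reason-Phrase may itself contain spaces.

```python
def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into its parts."""
    http_version, status_code, reason_phrase = line.rstrip("\r\n").split(" ", 2)
    return http_version, int(status_code), reason_phrase

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```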
Common Status Codes
- 200 OK: the request is fulfilled and the requested document is attached in the MESSAGE BODY section of the response message.
- 404 Not Found: the requested document is not found.
- 500 Internal Server Error: the server cannot fulfill the request (mostly because of problems in the server-side programs).
- For more status code definitions, see the HTTP specification.
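Python's standard library carries a table of status codes, which offers a quick way to look up the reason phrase for a code. The sketch below prints the three codes discussed above; the helper status_class is an assumed name that groups codes by their first digit, following the five classes defined in the HTTP specification.

```python
from http import HTTPStatus

def status_class(code):
    """Name the class of a status code from its first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes[code // 100]

for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(status.value, status.phrase, "-", status_class(code))
```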
The RESPONSE Message – A Graphical View
HTTP/1.1 200 OK                           <- the status line
Date: Wed, 25 Dec … …:45:47 GMT           <- the header section
Content-Type: text/html
Content-length: 7386
CRLF                                      <- an empty line
My Test Page …                            <- the optional message body
The Response Message – An Example
HTTP/1.1 200 OK CRLF
Date: Mon, 08 Sep … …:04:24 GMT CRLF
Content-Type: text/html CRLF
CRLF
Yahoo! …
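The layout of a RESPONSE message (status line, header section, empty line, message body) can be sketched as a small Python parser. The function name parse_response and the sample bytes are assumptions for illustration; real responses need more care (repeated headers, chunked bodies, and so on).

```python
def parse_response(raw):
    """Split raw response bytes into status line, header dict, and body."""
    # The first empty line (CRLF CRLF) separates the head from the message body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return status_line, headers, body

sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")

status_line, headers, body = parse_response(sample)
print(status_line)
print(headers)
print(body)
```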
Exercise
- “Browser-less” request and response messages
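One possible way to try a browser-less exchange is to type the REQUEST message by hand and send it over a plain socket. The Python sketch below does this against a tiny local server so that it runs without Internet access; HelloHandler, raw_http_get, and the page content are hypothetical stand-ins, and the exercise itself may call for a different tool.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real Web server: answers every GET with a tiny page."""
    def do_GET(self):
        body = b"<html><body>My Test Page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet

def raw_http_get(host, port, path="/"):
    """Send a hand-written REQUEST message and return the raw RESPONSE bytes."""
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n"
               "\r\n")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:                      # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    response = raw_http_get("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(response.decode("iso-8859-1"))
```

Pointing raw_http_get at a public host on port 80 sends the same kind of message to a real Web server.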
Server vs. Client?
Web Servers
- Apache
- Internet Information Services (IIS)
- …
Web Clients
- Browsers
- Programs simulating the functionality of browsers
Q: How do you know what server a site uses and how many days a site has been up?
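One clue to the server software is the Server response header, which many (though not all) Web servers include. The Python sketch below reads it from a header block; the function name server_software and the sample header values are made up for illustration.

```python
from email import message_from_string

def server_software(header_block):
    """Return the value of the Server response header, or None if it is absent."""
    # HTTP headers use the same 'Name: value' layout as e-mail headers,
    # so the standard e-mail parser can read them.
    return message_from_string(header_block)["Server"]

sample_headers = ("Server: Apache/2.2.14 (Unix)\r\n"
                  "Content-Type: text/html\r\n"
                  "\r\n")

print(server_software(sample_headers))
print(server_software("Content-Type: text/html\r\n\r\n"))
```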
Introduction to Web Servers
- The Web server market: market share, user community
- Installation & configuration: see a document on WebCT
- Directories/folders: server root, document root
Server Root vs. Document Root
[Figure: the document root and server root directories in Internet Information Services (IIS) and in Apache (actually WAMP, a Windows package that bundles Apache)]
Default File(s)
- Default file: the file to load when no file name is provided in a URL
- Apache and others: index.htm, …
- IIS: default.htm, index.htm, …
- Locations of files and directories
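The interplay between the document root and default files can be sketched in a few lines of Python. Everything named here is hypothetical: the document root path, the order of the default file names, and the set of files that stands in for the disk. Real servers read these settings from their configuration.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/html"                 # hypothetical document root
DEFAULT_FILES = ["index.htm", "default.htm"]    # names taken from the lists above

def resolve(url_path, existing_files):
    """Map a URL path to a file under the document root, or None if nothing matches."""
    relative = url_path.lstrip("/")
    if url_path.endswith("/"):
        # No file name in the URL: try each default file in order.
        for name in DEFAULT_FILES:
            candidate = posixpath.join(DOCUMENT_ROOT, relative, name)
            if candidate in existing_files:
                return candidate
        return None
    candidate = posixpath.join(DOCUMENT_ROOT, relative)
    return candidate if candidate in existing_files else None

files = {"/var/www/html/index.htm", "/var/www/html/library/default.htm"}
print(resolve("/", files))
print(resolve("/library/", files))
print(resolve("/missing.htm", files))
```

A result of None corresponds to the situation in which a server would answer 404 Not Found.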
Checking What Server-Side Technology is Used on a Web Site
- By file extension
- By checking modules installed on the Web server
- By asking around
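The file-extension check is easy to automate, as the Python sketch below suggests. The mapping lists conventional extensions for the technologies covered earlier; the function name guess_technology and the sample URLs are assumptions. A site can hide or remap its extensions, so the result is only a hint.

```python
import posixpath
from urllib.parse import urlsplit

# Conventional file extensions; a server can be configured to use others.
TECHNOLOGY_BY_EXTENSION = {
    ".asp": "ASP",
    ".aspx": "ASP.NET",
    ".jsp": "JSP",
    ".php": "PHP",
    ".cgi": "CGI",
}

def guess_technology(url):
    """Guess the server-side technology from the extension of the URL's path."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return TECHNOLOGY_BY_EXTENSION.get(extension, "unknown")

print(guess_technology("http://www.example.com/cart/view.php?id=3"))
print(guess_technology("http://www.example.com/"))
```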
XHTML Form Methods
GET
- Used to retrieve an online document
- Also used to send HTML form data to a Web server
- Data are sent along with the URL (therefore, they are visible in the address box)
- Limited URL length, therefore limited data volume
- Easier to bookmark dynamic pages
POST
- Data are sent in the message body section (not visible in the address box)
- Larger volume of data is possible
- More secure than GET in that the data do not appear in the URL, although they are not encrypted unless HTTPS is used
Other form methods
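The difference between the two methods can be seen by preparing the same form data both ways. In this Python sketch the URL and the form fields are hypothetical, and nothing is actually sent over the network; the Request objects are only inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

form_data = {"name": "John Doe", "phone": "555-0100"}   # hypothetical form fields
encoded = urlencode(form_data)

# GET: the encoded data travel in the URL, after the question mark.
get_request = Request("http://www.example.com/form.php?" + encoded)

# POST: the URL stays clean and the encoded data go into the message body.
post_request = Request("http://www.example.com/form.php",
                       data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```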
URL Encoding (I)
- An encoded URL: …Server%22
- Certain characters of form data are encoded.
- "...Only alphanumerics [0-9a-zA-Z], the special characters $-_.+!*'(), and reserved characters used for their reserved purposes may be used unencoded within a URL." (RFC 1738)
- Reserved characters that have special meanings in a URL: dollar ("$"), ampersand ("&"), plus ("+"), comma (","), forward slash/virgule ("/"), colon (":"), semicolon (";"), equals ("="), question mark ("?"), 'at' symbol ("@")
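URL Encoding (II)
- Unsafe characters that need to be encoded: "{", "}", "|", "\", "^", "~", "[", "]", and "`".
- How encoding is done: each such character is replaced by "%" followed by the two-digit hexadecimal value of its character code (e.g., a space becomes %20 and a double quote becomes %22).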
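The encoding rule is simple enough to write out by hand. In the Python sketch below, percent_encode is an assumed helper that encodes a single ASCII character, and the last two lines use the standard library's quote and unquote on a made-up string. Note that quote follows the later RFC 3986, so the set of characters it leaves unencoded differs slightly from RFC 1738 (for example, it leaves "~" alone).

```python
from urllib.parse import quote, unquote

def percent_encode(char):
    """Encode one ASCII character as '%' plus its two-digit hexadecimal code."""
    return "%{:02X}".format(ord(char))

# Unsafe characters from the list above, plus the double quote and the space.
for char in '{}|\\^[]`" ':
    print(repr(char), "->", percent_encode(char))

print(quote('"Web Server"'))
print(unquote("%22Web%20Server%22"))
```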
URL Encoding (III)
- …x?name=John+Doe&phone=…
- …sp.net&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
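On the receiving side, a server-side program decodes such query strings back into field names and values. A minimal Python sketch, using only the parts of the example URLs that are visible above:

```python
from urllib.parse import parse_qs

# "+" decodes to a space; keep_blank_values keeps fields that have no value.
print(parse_qs("name=John+Doe&phone=", keep_blank_values=True))

query = "start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official"
fields = parse_qs(query)
for name, values in fields.items():
    print(name, "=", values[0])
```

parse_qs returns a list for each field because the same field name may appear more than once in a query string.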