Presentation is loading. Please wait.

Presentation is loading. Please wait.

Web and the HTTP (HyperText Transfer Protocol)

Similar presentations


Presentation on theme: "Web and the HTTP (HyperText Transfer Protocol)"— Presentation transcript:

1 Web and the HTTP (HyperText Transfer Protocol)

2 What is a protocol and http?
A Protocol is a standard procedure for defining and regulating communication. i.e. TCP, UDP, HTTP, etc. HTTP is the foundation of data communication for the World Wide Web The HTTP is the Web’s application-layer protocol for transferring various forms of data between server and client such as plaintext hypertext image videos and Sounds etc. HTTP:

3 HTTP: how a client and server establish a connection,
It is the standard protocol for communication between web browsers and web servers. It defines: how a client and server establish a connection, how the client requests data from the server how the server responds to that request how data is transferred from the server back to the client. and finally, how the connection is closed It assumes very little about a particular system, and does not keep state between different message exchanges. This makes HTTP a stateless protocol. The communication usually takes place over TCP/IP, The default port for TCP/IP is 80, but other ports can also be used. HTTP:

4 Terminology IP Address: An Internet Protocol address (IP address) is a numerical label assigned to each device (e.g., computer, printer) participating in a  computer network that uses the Internet Protocol for communication. Port Number : A port number is a 16 bit number which when associated with IP address , completes the destination address for a communications session. TCP :Transmission Control Protocol (TCP) is one of the two original core protocols of the Internet Protocol Suite (IP), and is so common that the entire suite is often called TCP/IP. TCP provides reliable, ordered, error-checked delivery of a stream between programs running on computers connected to an intranet or the public Internet. Socket : A socket is nothing but a combination of IP address and port number. It is simply an end while communication. HTTP:

5 HTTP Underneath: Sockets
If you are programming a client, then you would open a socket like this: If you are programming a server, then this is how you open a socket: Socket MyClient; try { MyClient = new Socket("Machine name", PortNumber); } catch (IOException e) { System.out.println(e); } ServerSocket MyService; try { MyServerice = new ServerSocket(PortNumber); } catch (IOException e) { System.out.println(e); } HTTP:

6 Web and HTTP First, a review… web page consists of objects
object can be HTML file, JPEG image, Java applet, audio file,… web page consists of base HTML-file which includes several referenced objects each object is addressable by a URL, e.g., host name path name HTTP:

7 HTTP:

8 HTTP overview HTTP: hypertext transfer protocol
PC running Firefox browser server running Apache Web iphone running Safari browser HTTP request HTTP response HTTP overview HTTP: hypertext transfer protocol Web’s application layer protocol client/server model client: browser that requests, receives, and “displays” Web objects (using HTTP protocol) server: Web server sends objects in response to requests (using HTTP protocol) HTTP:

9 How http works? HTTP is implemented in two programs:
a client program and a server program, executing on different end systems, talk to each other by exchanging HTTP messages. The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces. HTTP:

10 HTTP overview (continued)
HTTP uses TCP: Client initiates TCP connection (creates socket) to server at port 80 Server accepts TCP connection from client HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server) TCP connection closed HTTP:

11 HTTP For each request from client to server, there is a sequence of four steps: The client opens a TCP connection to the server on port 80, by default; other ports may be specified in the URL. The client sends a message to the server requesting the resource at a specified path. The request includes a header, and optionally (depending on the nature of the request) a blank line followed by data for the request. The server sends a response to the client. The response begins with a response code, followed by a header full of metadata, a blank line, and the requested document or an error message. The server closes the connection. HTTP:

12 HTTP overview (continued)
HTTP is “stateless” Server maintains no information about past client requests Server does not remember any previous requests. protocols that maintain “state” are complex! past history (state) must be maintained if server/client crashes, their views of “state” may be inconsistent, must be reconciled HTTP:

13 HTTP connections non-persistent HTTP persistent HTTP
At most one object sent over TCP connection, connection then closed Downloading multiple objects required multiple connections Separate TCP connection is needed to serve each resource (object). persistent HTTP Multiple objects can be sent over single TCP connection between client and server Single TCP connection is needed to serve multiple resources. Server leaves the connection open even after serving the request and closes connection on timeout. HTTP:

14 Typical browser-server interaction
User enters Web address in browser Browser uses DNS to locate IP address Browser opens TCP connection to server Browser sends HTTP request over connection Server sends HTTP response to browser over connection Browser displays body of response in the client area of the browser window HTTP:

15 Browser Steps: Brower-Server Interaction
A. Web Browser: Examines the part of the URL between the double slash and the first single slash For example in , the browser will check This identifies the computer to which you want to connect Because it contains letters, this part of the URL is a domain name, not an IP address Browser sends a request to a DNS server to obtain IP address for From the http: prefix, browser deduces that the protocol is HTTP --Uses port 80 by default It establishes a TCP/IP connection to port 80 at IP address obtained in step 1 HTTP:

16 Browser Steps: Brower-Server Interaction
It deduces from the /index.html that you want to see the file /index.html and sends this request formatted as an HTTP command through the established connection GET /index.html HTTP/1.0 blank line B. Web Server: Running on computer whose IP Address was obtained above receives the request It decodes the request It fetches the file /index.html It sends the file back to the browser on your computer C. Web Browser: The browser displays the contents of the file for you to see Since this file is an HTML file, it translates the HTML codes into fonts, bullets, etc. If the file contains images, it makes more GET requests through the same connection HTTP:

17 references to style.css )
Non-persistent HTTP (contains text, and a references to style.css ) suppose user enters URL: 1a. HTTP client initiates TCP connection to HTTP server (process) at on port 80 1b. HTTP server at host waiting for TCP connection at port 80. “accepts” connection, notifying client 2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object from the folders: Web/SE432/se432.html 3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket time HTTP:

18 Non-persistent HTTP (cont.)
4. HTTP server closes TCP connection. 5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects time 6. Steps 1-5 repeated for each of the referenced objects HTTP:

19 HTTP Methods/1 Communication with an HTTP server follows a request-response pattern: One stateless request followed by one stateless response. Each HTTP request has two or three parts: A start line containing the HTTP method and a path to the resource on which the method (verb) should be executed A header of name-value fields that provide meta-information such as authentication credentials and preferred formats to be used in the request A request body containing a representation of a resource (POST and PUT only) There are four main HTTP methods (verbs), that identify the operations that can be performed: • GET • POST • PUT • DELETE HTTP:

20 HTTP Methods/2 These most common request methods are:
GET: fetch an existing resource The URL contains all the necessary information the server needs to locate and return the resource POST: create a new resource The requests usually carry a payload that specifies the data for the new resource PUT: update an existing resource The payload may contain the updated data for the resource DELETE: delete an existing resource Note that: PUT and DELETE are sometimes considered specialized versions of the POST They may be packaged as POST requests with the payload containing the exact action: create, update or delete. We will see later HTTP:

21 HTTP Methods/3 There are some lesser used methods that HTTP also supports: HEAD: similar to GET, but without the message body It is used to retrieve the server headers for a particular resource, generally to check if the resource has changed, via timestamps. TRACE: used to retrieve the hops (jumps) that a request takes to round trip from the server. Each intermediate proxy or gateway would inject its IP or DNS name into the Via header field. This can be used for diagnostic purposes. OPTIONS: used to retrieve the server capabilities On the client-side, it can be used to modify the request based on what the server can support HTTP:

22 Method types HTTP/1.0: HTTP/1.1: What about HTTP 2 ???
Give me a report HW2 HTTP/1.0: GET POST HEAD asks server to leave requested object out of response HTTP/1.1: GET, POST, HEAD PUT uploads file in entity body to path specified in URL field DELETE deletes file specified in the URL field HTTP:

23 Status Codes With URLs and Methods:
the client can initiate requests to the server. In return, the server responds with status codes and message payloads (i.e. index.html) The status code is important and tells the client how to interpret the server response The HTTP specification defines certain number ranges for specific types of responses: 1xx: Informational Messages 2xx: Successful 3xx: Redirection 4xx: Client Error 5xx: Server Error HTTP:

24 Common Status Codes 200 OK 301 Moved Permanently 302 Moved temporarily
Everything worked, here’s the data 301 Moved Permanently requested object moved, new location specified later in this msg (Location:) 302 Moved temporarily URL temporarily out of service, keep the old one but use this one for now 400 Bad Request There is a syntax error in your request 403 Forbidden You can’t do this, and we won’t tell you why 404 Not Found No such document 408 Request Time-out, 504 Gateway Time-out Request took too long to fulfill for some reason 505 HTTP Version Not Supported HTTP:

25 HTTP Messages Request and Response Message Formats HTTP:

26 HTTP Messages HTTP is the language that web clients and web servers use to talk to each other HTTP is largely “under the hood” but a basic understanding can be helpful two types of HTTP messages: request, response Each message, whether a request or a response, has three parts: The request or the response line A header section The body of the message It’s mandatory to place a new line between the message headers and body. The message can contain one or more headers. message = <start-line> *(<message-header>) CRLF [<message-body>] <start-line> = Request-Line | Status-Line <message-header> = Keyword : Value HTTP:

27 General Headers There are a few headers (general headers) that are shared by both request and response messages: general-header = Cache-Control | Connection | Date | Pragma | Trailer | Transfer-Encoding | Upgrade | Via | Warning HTTP:

28 General Headers Via: header is used in a TRACE message and updated by all intermittent proxies and gateways Pragma: is considered a custom header and may be used to include implementation-specific headers. The most commonly used pragma-directive is Pragma: no-cache, which really is Cache-Control: no-cache under HTTP/1.1. Date: header field is used to timestamp the request / response message Upgrade: is used to switch protocols and allow a smooth transition to a newer protocol. Transfer-Encoding: is generally used to break the response into smaller parts with the Transfer-Encoding: chunked value. This is a new header in HTTP/1.1 and allows for streaming of response to the client instead of one big payload HTTP:

29 Entity headers Request and Response messages may also include entity headers to provide meta-information about the content a.k.a. Message Body or Entity These headers include: entity-header = Allow | Content-Encoding | Content-Language | Content-Length | Content-Location | Content-MD5 | Content-Range | Content-Type | Expires | Last-Modified HTTP:

30 Entity headers All of the Content- prefixed headers provide information about: structure encoding and size of the message body. Some of these headers need to be present if the entity is part of the message. The Expires header indicates a timestamp of when the entity expires. Interestingly, a "never expires" entity is sent with a timestamp of one year into the future. The Last-Modified header indicates the last modification timestamp for the entity HTTP:

31 Request Format The request message has the same generic structure as above (slide 26), except for the request line which looks like: Request-Line = Method SP URI SP HTTP-Version CRLF Method = "OPTIONS" | "HEAD" | "GET" | "POST" | "PUT" | "DELETE" | "TRACE" SP is the space separator between the tokens. HTTP-Version is specified as "HTTP/1.1" and then followed by a new line (CRLF). HTTP:

32 Request Format Thus, a typical request message might look like: Note:
the request line followed by many request headers. The Host header is mandatory for HTTP/1.1 clients.  GET requests do not have a message body, but POST requests can contain the post data in the body GET HTTP/1.1 Host: Connection: keep-alive Cache-Control: no-cache Pragma: no-cache Accept: text/html, application/xhtml+xml, application/xml; q=0.9, */*; q=0.8 HTTP:

33 Request Format A client may indicate that it’s willing to reuse a socket by including a Connection field in the HTTP request header with the value Keep-Alive: Connection: Keep-Alive In Java, the URL class transparently supports HTTP Keep-Alive unless explicitly turned off. That is, it will reuse a socket if you connect to the same server again before the server has closed the connection. HTTP:

34 Request Format The Accept prefixed headers indicate the acceptable:
media-types languages and character sets on the client. From, Host, Referrer and User-Agent identify details about the client that initiated the request The If- prefixed headers are used to: make a request more conditional, and the server returns the resource only if the condition matches. Otherwise, it returns a 304 Not Modified. The condition can be based on a timestamp or an ETag (a hash of the entity). HTTP:

35 Response Format The response format is similar to the request message, except for the status line and headers. The status line has the following structure: HTTP-Version is sent as HTTP/1.1 The Status-Code is one of the many statuses discussed earlier. The Reason-Phrase is a human-readable version of the status code. A typical status line for a successful response might look like so: Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF HTTP/ OK HTTP:

36 Response headers The response headers are also fairly limited, and the full set is given below: Age : is the time in seconds since the message was generated on the server. ETag : is the MD5 hash of the entity and used to check for modifications. Location : is used when sending a redirection and contains the new URL. Server : identifies the server generating the message. response-header = Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | WWW-Authenticate HTTP:

37 Example

38 What the client does, part I
The client sends a message to the server at a particular port (80 is the default) The first part of the message is the request line, containing: A method (HTTP command) such as GET or POST A document address, and An HTTP version number Example: GET /index.html HTTP/1.0 HTTP:

39 What the client does, part II
The second part of a request is optional header information, such as: What the client software is What formats it can accept All information is in the form Name: Value Example: User-Agent: Mozilla/2.02Gold (WinNT; I) Accept: image/gif, image/jpeg, */* A blank line ends the header HTTP:

40 Client request headers
Accept: type/subtype, type/subtype, ... Specifies media types that the client prefers to accept Accept-Language: en, fr, de Preferred language (For example: English, French, German) User-Agent: string The browser or other client program sending the request From: address of user of client program Cookie: name=value Information about a cookie for that URL Multiple cookies can be separated by commas HTTP:

41 What the client does, part III
The third part of a request (after the blank line) is the entity-body, which contains optional data The entity-body part is used mostly by POST requests The entity-body part is always empty for a GET request HTTP:

42 HTTP request message: general format
line header lines body method sp cr lf version URL value header field name ~ entity body HTTP:

43 Example: HTTP request message
ASCII (human-readable format) request line (GET, POST, HEAD commands) header lines carriage return, line feed at start of line indicates end of header lines GET /index.html HTTP/1.1\r\n Host: www-net.cs.umass.edu\r\n User-Agent: Firefox/3.6.10\r\n Accept: text/html,application/xhtml+xml\r\n Accept-Language: en-us,en;q=0.5\r\n Accept-Encoding: gzip,deflate\r\n Accept-Charset: ISO ,utf-8;q=0.7\r\n Keep-Alive: 115\r\n Connection: keep-alive\r\n \r\n carriage return character line-feed character HTTP:

44 Uploading form input POST method: URL method:
web page often includes form input input is uploaded to server in entity body URL method: uses GET method input is uploaded in URL field of request line: HTTP:

45 What the server does, part I
The server response is also in three parts The first part is the status line, which tells: The HTTP version A status code A short description of what the status code means Example: HTTP/ Not Found Status codes are in groups: Informational The request was successful The request was redirected The request failed A server error occurred HTTP:

46 What the server does, part II
The second part of the response is header information, ended by a blank line Example: Content-Length: 2532 Connection: Close Server: Microsoft IIS / 8.5 Date: Sun, 01 Oct :24:50 GMT Content-Type: text/html Cache-control: private Set-Cookie: PREF=ID=05302a93093ec661:TM= :LM= :S=yNWNjraftUz299RH; expires=Sun,17-Jan-2038,19:14:07 GMT; path=/; domain=.google.com All on one line HTTP:

47 Server response headers
Server: Microsoft IIS / 8.5 Name and version of the server Content-Type: type/subtype Should be of a type and subtype specified by the client’s Accept header Set-Cookie: name=value; options Requests the client to store a cookie with the given name and value HTTP:

48 What the server does, part III
The third part of a server response is the entity body This is often an HTML page But it can also be a jpeg, a gif, plain text, etc. anything the browser (or other client) is prepared to accept HTTP:

49 HTTP response message status line (protocol status code status phrase)
HTTP/ OK\r\n Date: Sun, 26 Sep :09:20 GMT\r\n Server: Apache/ (CentOS)\r\n Last-Modified: Tue, 30 Oct :00:02 GMT\r\n ETag: "17dc6-a5c-bf716880"\r\n Accept-Ranges: bytes\r\n Content-Length: 2652\r\n Keep-Alive: timeout=10, max=100\r\n Connection: Keep-Alive\r\n Content-Type: text/html; charset=ISO \r\n \r\n data data data data data ... header lines data, e.g., requested HTML file HTTP:

50 The GetResponses program, I
Here’s just the skeleton of the program that provided the output on the last slide: import java.net.*; import java.io.*; public class GetResponses { public static void main(String [ ] args) { try { interesting code goes here } catch(Exception e) { e.printStackTrace(); } } } HTTP:

51 The GetResponses program, II
Here’s the interesting part of the code: URL url = new URL(args[0]); URLConnection c = url.openConnection(); System.out.println("Status line: "); System.out.println('\t' + c.getHeaderField(0)); System.out.println("Response headers:"); String value = ""; int n = 1; while (true){ value = c.getHeaderField(n); if (value == null) break; System.out.println('\t' + c.getHeaderFieldKey(n++) ": " + value); } HTTP:

52 Viewing the response There is a header viewer at (with nasty jittery advertisements) Example 2.3 (GetResponses) in the Gittleman book does the same thing Here’s an example (from GetResponses): % java GetResponses /index.html Status line: HTTP/ OK Response headers: Date: Wed, 10 Sep :26:53 GMT Server: Apache/ (Unix) PHP/4.2.2 mod_perl/ mod_ssl/ OpenSSL/0.9.6e Last-Modified: Tue, 09 Sep :24:50 GMT ETag: "1c1ad f5e2902” Accept-Ranges: bytes Content-Length: Keep-Alive: timeout=15, max= Connection: Keep-Alive Content-Type: text/html HTTP:


Download ppt "Web and the HTTP (HyperText Transfer Protocol)"

Similar presentations


Ads by Google