IT Engineering Instructor: Rezvan Shiravi Rezvan_shiravi@yahoo.com HTTP
Questions Q.1) How do web server and client browser talk to each other? Q.2) What is the common protocol? Q.3) How are resources identified? Q.4) What are requests & responses? Q.5) Can/Should server know its clients? Q.6) Who can influence the communication between server & client? Q.7) Is everything public?
Common Protocols For two remote machines to "understand" each other, they should "speak the same language" and coordinate their "talk". The solution is to use protocols. Examples: FTP – File Transfer Protocol, SMTP – Simple Mail Transfer Protocol, NNTP – Network News Transfer Protocol, HTTP – HyperText Transfer Protocol
Internet Protocol Layers The Internet protocols are generally divided into four layers: Application - e.g., HTTP Transport - e.g., TCP Network - e.g., IP Link - e.g., Ethernet What are the 4 layers?
HTTP Hyper-Text Transfer Protocol An application protocol for sending hypertext (e.g. HTML) Can also transport non-hypertext data Have you downloaded files via HTTP? A web browser is an HTTP client There are HTTP clients other than browsers - the technical term is user agent A web server is an HTTP server Other HTTP Clients?
Key HTTP Terms User agent: Browsers, spiders Client and Server systems Connection: TCP Message Resource: Object or service. Identified by URI/URN/URL Entity: Representation of a resource with a header and body URI/URN/URL?
An HTTP Session A basic HTTP session has 4 phases: Client opens the connection (a TCP connection) Client makes a request Server sends a response Server closes the connection
HTTP Transaction
Stateless Protocol HTTP is a stateless protocol Once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is keep-alive) High performance & Low complexity What are the difficulties in working with a stateless protocol? How would you implement a site for buying some items? So why don’t we have states in HTTP? Stateless Protocol?
TCP Connection for an HTTP connection Before systems can exchange HTTP messages, they must establish a TCP connection (steps 1, 2, and 3 in the figure). Once the TCP connection is available, the client sends the server an HTTP request. The final two steps, 6 and 7, show the closing of the TCP connection.
Persistent Connections… HTML pages often contain a number of elements (e.g. images) that must all be fetched from the server, and each one requires a separate HTTP request. Opening a TCP connection for each request is slow and taxes both the client and the server: TCP uses a three-way handshake when establishing a connection (client sends SYN, server replies SYN/ACK, client responds with ACK), so there is significant latency in establishing a connection. Persistent connections: multiple requests on one TCP connection
...Persistent Connections The client uses the Connection header to request that the connection be kept open Connection: Keep-Alive After the last request, client requests connection to be closed Connection: Close Otherwise, server will keep it open "forever"
Persistent Connections Persistent Connections: A client can issue many HTTP requests over a single TCP connection.
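The cost of opening one connection per request can be estimated with a small back-of-the-envelope sketch. It assumes a simplified model that is not from the slides: the handshake costs one round trip before the first request can be sent, each request/response pair costs one more round trip, and transfer time, pipelining, and parallel connections are ignored. The function name and the sample numbers (10 objects, 100 ms round-trip time) are illustrative only.

```python
def fetch_time_ms(num_objects: int, rtt_ms: float, persistent: bool) -> float:
    """Estimate the time to fetch num_objects sequentially (simplified model)."""
    if num_objects == 0:
        return 0.0
    # One handshake in total when the connection is reused,
    # otherwise one handshake per object.
    handshakes = 1 if persistent else num_objects
    # Every handshake and every request/response costs one round trip.
    return (handshakes + num_objects) * rtt_ms


# Illustrative numbers only: a page with 10 objects and a 100 ms round trip.
separate = fetch_time_ms(10, 100, persistent=False)
reused = fetch_time_ms(10, 100, persistent=True)
print(f"separate connections: {separate} ms, persistent connection: {reused} ms")
```

Under this model the handshake overhead grows with the number of objects when connections are not reused, and stays constant when they are.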
Pipelining HTTP Pipelining allows the user agent to issue requests for multiple items without waiting for responses to arrive Overlap requests and responses Supported in Firefox & IE ver. 7.0
[Figure: the browser sends an HTTP request for http://www.therationaledge.com to a proxy server, which forwards it to the web server www.therationaledge.com:80 (backed by its file system); the HTTP response returns along the same path]
[Figure: a chain of proxies between client and server: Department Proxy Server, University Proxy Server, Iran Proxy Server, then Web Server www.therationaledge.com:80]
Intermediate HTTP Systems… Proxy: For performance enhancements. Caches pages. Gateway: For security. Used as firewall boundary. Tunnel: Simple relay Origin server: Holds original content
Intermediate HTTP Systems Proxy, Gateway?
Resources A resource is a chunk of information A resource can be A file A dynamically created page What we see on the browser can be a combination of some resources Each resource must be identified uniquely URI (Uniform Resource Identifier) Common practical URI is URL Uniform Resource Locator
Uniform Resource Identifiers Also called URL, where L=="Locator" Three parts: protocol: http, ftp, telnet, gopher, file host: a DNS host name or IP address file: Unix syntax file name Protocol and host are optional File? Optional?
URL <protocol(scheme)>://<user>:<pass>@<host>:<port>/<path>?<query>#<frag> http://www.aut.ac.ir ftp://kernel.org/pub http://www.bing.com/search?q=web&go=&qs=n&form=QBLH&pq=web&sc=0-0&sp=-1 file://c:\windows\ file:///home/bahador/work
URL Scheme: the application layer protocol HTTP: The web protocol HTTPS: Secure HTTP FTP: File transfer protocol File: Access to a local file mailto: Send email to given address …
URL Path: the path of the object on the specified host with respect to web server (document) root directory E.g. web server root directory: /var/www/ http://www.example.com/1.html /var/www/1.html http://www.example.com/1/2/3.jpg /var/www/1/2/3.jpg
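The mapping from URL path to file can be sketched as follows. This is a minimal illustration, not how any particular web server implements it: the function name `url_to_file` is hypothetical, the document root `/var/www` is taken from the example above, and the check against escaping the root with `..` is an added assumption about what a server would want.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_to_file(url: str, doc_root: str = "/var/www") -> str:
    """Map the path of a URL onto a file below doc_root (illustrative only)."""
    # Take only the path component and decode %XX escapes.
    path = unquote(urlsplit(url).path)
    # Join with the document root and collapse "." and ".." segments.
    full = posixpath.normpath(posixpath.join(doc_root, path.lstrip("/")))
    # Refuse paths that would leave the document root.
    if full != doc_root and not full.startswith(doc_root + "/"):
        raise ValueError(f"path escapes document root: {path!r}")
    return full


print(url_to_file("http://www.example.com/1.html"))
print(url_to_file("http://www.example.com/1/2/3.jpg"))
```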
URL Query: a mechanism to pass information from client to active pages or forms Fill information in a university registration form Ask Google to search a phrase Starts with “?” “&” is the border between multiple parameters http://www.example.com/submit.php?name=ali&family=karimi
URL Frag: A name for a part of a resource, e.g. a section in a document http://www.example.com/paper.html#results Handled by the browser: the browser gets the whole resource (doc) from the server, and at display time it jumps to the specified part
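The URL components described on the last few slides can be pulled apart with Python's standard `urllib.parse` module. The sketch below is only an illustration: `describe_url` is a hypothetical helper, and the sample URL combines the query and fragment examples from the slides with an assumed port of 8080.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the components named on the slides."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,              # None when the URL gives no port
        "path": parts.path,
        "query": parse_qs(parts.query),  # "&"-separated name=value pairs
        "fragment": parts.fragment,      # never sent to the server
    }


# Hypothetical URL assembled from the slide examples.
sample = "http://www.example.com:8080/submit.php?name=ali&family=karimi#results"
for name, value in describe_url(sample).items():
    print(f"{name}: {value}")
```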
HTTP Protocol A complete request contains one or more lines of text First line (request-line) contains the request: Method Path Protocol GET /index.html HTTP/1.1 Additional lines are headers (later) The response consists of a status line, headers, and the content The first line (the status-line) looks like: Protocol status-code message HTTP/1.1 200 OK
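As a small illustration of the two first lines described above, the following sketch splits a request-line and a status-line into their three fields. The function names are hypothetical, and the parsing is deliberately naive: it assumes single spaces between fields and a non-empty status message.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'Method Path Protocol' into its three fields."""
    method, path, protocol = line.strip().split(" ", 2)
    return method, path, protocol


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'Protocol status-code message'; the message may contain spaces."""
    protocol, code, message = line.strip().split(" ", 2)
    return protocol, int(code), message


print(parse_request_line("GET /index.html HTTP/1.1"))
print(parse_status_line("HTTP/1.1 200 OK"))
```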
HTTP Request Methods GET: gets a document. HEAD: like GET, but returns only the headers. POST: transfers a block of data with the request. DELETE: removes the resource. PUT: stores the message body on the server as the specified resource. TRACE: the server echoes back the received message; used for troubleshooting & debugging
HTTP Request Methods OPTIONS: requests the list of methods the server supports for the resource. CONNECT: creates an HTTP tunnel. The client asks the server (which is a proxy/gateway) to create a TCP connection to the specified destination. After that TCP connection is established, all data sent on the TCP connection between client & server is copied to the new TCP connection
GET Request A request to get a resource from the Web The most frequently used method The request has no message body Parameters can be sent in the request URL http://www.google.com/search?sourceid=navclient-ff&ie=UTF-8&rls=GGGL,GGGL:2006-30,GGGL:en&q=iran
HEAD Request A HEAD request asks the server to return the response headers only, and not the actual resource (No message body) Useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth File size Testing hypertext links for Validity Accessibility Recent modifications
Post Request… POST request can send data to the server POST is mostly used in form-filling The data filled into the form are translated by the browser into a special format and sent to the program on the server using the POST command
Post Request (cont.) Sending a block of data with the request, in the message body Usually extra headers to describe this message body like Content-Type: and Content-Length URL is a URL of a program to handle the sent data, not a simple html file The HTTP response is normally the output of a program, not a static file
Post Example Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: frog@ceit.aut.ac.ir
User-Agent: MyOwnHTTPTool-MyBrowser/1.0.0.8.9
Content-Type: application/x-www-form-urlencoded
Content-Length: 39

home=Tehran+No.13&favorite+flavor=flies
Content-Type?
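The "special format" the browser uses for form data can be reproduced with `urllib.parse.urlencode`. The sketch below builds a POST message like the one above and computes Content-Length from the encoded body instead of hard-coding it. `build_form_post` is a hypothetical helper, and the optional From and User-Agent headers are omitted.

```python
from urllib.parse import urlencode


def build_form_post(path: str, fields: dict) -> bytes:
    """Build an HTTP/1.0 POST carrying form fields in the message body."""
    # urlencode joins name=value pairs with "&" and turns spaces into "+".
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "\r\n"                              # blank line ends the headers
    )
    return head.encode("ascii") + body


message = build_form_post(
    "/path/register.cgi",
    {"home": "Tehran No.13", "favorite flavor": "flies"},
)
print(message.decode("ascii"))
```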
Common HTTP Statuses…
Successful 2xx: 200 - OK, 201 - Created, 202 - Accepted
Redirection 3xx: 302 - Temporarily moved (redirect)
Client Error 4xx: 401 - Unauthorized, 404 - Not Found
Common HTTP Statuses (cont.) Server Error 5xx 500 Internal Server Error Common with broken CGI programs 502 Bad Gateway The server, while acting as a gateway or proxy, received an invalid response from the upstream server it accessed in attempting to fulfill the request. 503 Service Unavailable The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. Bad Gateway?
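The first digit of a status code determines its class, which a short sketch can make concrete. It uses Python's standard `http.HTTPStatus` enum for the reason phrases; `describe_status` and the `CLASSES` table are hypothetical names. Note that Python's phrase for 302 is "Found", which differs from the wording on the slide.

```python
from http import HTTPStatus

# Status classes keyed by the first digit of the code.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return 'code phrase [class]' for a numeric HTTP status code."""
    status_class = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not in Python's table
        phrase = "(unregistered code)"
    return f"{code} {phrase} [{status_class}]"


for code in (200, 404, 502, 503):
    print(describe_status(code))
```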
Sample Request
GET /tbassemi/ HTTP/1.1
Host: pages.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Cookie: __utmz=76351879.1158993258.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=76351879.1503535206.1158993258.1159624300.1159625890.4; PHPSESSID=g7h5h7jfav0gvjcq9mai5sbao3
http request? q? deflate?
Sample Response
HTTP/1.1 200 OK
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Date: Tue, 10 Oct 2006 11:17:09 GMT
Server: Apache/2.0
X-Powered-By: PHP/5.1.2
Content-Length: 3489
Content-Type: text/html; charset=UTF-8

<html> <head> <meta http-equiv="Content-Language" content="en-us"> <title>Personal Page</title> </head> …
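The `q` values in the Accept-* headers are relative preferences between 0 and 1, and an item without `q` counts as 1. A minimal sketch of how a server might rank the alternatives is shown below. `parse_quality_list` is a hypothetical helper that ignores parameters other than `q` and does no error handling.

```python
def parse_quality_list(header_value: str) -> list[tuple[str, float]]:
    """Return (item, q) pairs from an Accept-* header, most preferred first."""
    items = []
    for part in header_value.split(","):
        fields = [field.strip() for field in part.split(";")]
        name, q = fields[0], 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if name:
            items.append((name, q))
    # sorted() is stable, so items with equal q keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_quality_list("en-us,en;q=0.5"))
print(parse_quality_list("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
```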
HTTP Request Three components: Request line: method path version Headers (optional) General headers Request headers - info about client Entity headers - info about document being sent Blank line (CRLF) Entity body (optional)
Format of a Request
method sp URL sp version cr lf     (request line)
header : value cr lf               (header lines)
...
header : value cr lf
cr lf                              (blank line)
Entity Body
Request Example
GET /index.html HTTP/1.1           (method, request URL, version)
Accept: image/gif, image/jpeg      (headers)
User-Agent: Mozilla/4.0
Host: ce.aut.ac.ir:80
Connection: Keep-Alive
[blank line here]
HTTP Response Three components: Status line: protocol status-code status-message Headers (optional) General headers Response headers - info about server Entity headers - info about document being sent Blank line (CRLF) Entity body (optional) Headers?
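The request format can be sketched in a few lines of code: a request line, one line per header, a blank line, then the optional entity body, with every line ending in CR LF. `build_request` is a hypothetical helper for illustration and does no validation of its arguments.

```python
def build_request(method: str, path: str, headers: dict,
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Serialize an HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]                            # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    head = "\r\n".join(lines) + "\r\n\r\n"  # the extra CRLF is the blank line
    return head.encode("iso-8859-1") + body


request = build_request(
    "GET",
    "/index.html",
    {"Host": "ce.aut.ac.ir:80", "Connection": "Keep-Alive"},
)
print(request)
```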
Redirection Sometimes the web server redirects the client to visit another page instead Different from HTML redirect with META tag <meta http-equiv="refresh" content="10; url=http://www.aut.ac.ir" /> This is just a normal HTTP response Status code 3xx Location header tells us the new location The client requests the new location
Sample Redirection Response
HTTP/1.1 302 Moved Temporarily
Server: Netscape-Enterprise/6.0
Date: Tue, 05 Apr 2005 20:02:38 GMT
Location: http://www.aut.ac.ir/
Content-length: 0
Content-type: text/html
Connection: close
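A client decides whether to follow a redirect by looking at the status code and the Location header. The sketch below parses a raw response and extracts the redirect target. Both function names are hypothetical, the parser is simplified (no repeated or folded headers), and a real client would also cap the number of redirects it follows.

```python
def parse_response_head(raw: bytes) -> tuple[int, dict, bytes]:
    """Split a raw response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    _protocol, code, _message = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), headers, body


def redirect_target(raw: bytes):
    """Return the Location of a 3xx response, or None if it is not a redirect."""
    code, headers, _body = parse_response_head(raw)
    if 300 <= code < 400:
        return headers.get("location")
    return None


sample = (
    b"HTTP/1.1 302 Moved Temporarily\r\n"
    b"Location: http://www.aut.ac.ir/\r\n"
    b"Content-length: 0\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)
print(redirect_target(sample))
```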
Cookies
Cookies HTTP is a stateless protocol: the server does not remember its clients. How can pages be personalized (e.g. a personal portal)? Use HTTP headers such as Client-ip or From? Browsers do not usually send them. Find the client IP address from the TCP connection? NAT gets in the way: Network Address Translation (NAT) is the process of modifying IP address information in IPv4 headers while in transit across a traffic routing device, so many clients can appear to share one address
Identifier assigned by server to each user/session Cookies are simply chunks of data sent from a web server to a client with the expectation that the client will resubmit the data in any subsequent request to the server The client browser isn’t intended to read or understand the data - it just passes it back
Cookies Types: Session cookies: to identify a session. Persistent cookies: to identify a client. How it works: the server asks the client to remember the ID (Set-Cookie header in the response message); the client gives the ID back to the server in each request (Cookie header in request messages); the server customizes responses according to the cookie. Limitation: JavaScript code may attempt to read the contents of cookies - a security concern
Cookies If the client "supports cookies", then it sends the Cookie header in subsequent requests Specifies name-value pairs Cookie: name="B Assemi"; phone="21-6454"; PHPSESSID=g7h5h7jfav0gvjcq9mai5sbao3 A server can set any number of cookies in a response The client sends them all in subsequent requests
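The server side of this exchange can be sketched as a small in-memory session store. Everything here is an illustrative assumption rather than a production design: the class name `SessionStore`, the cookie name `SID`, and the `visits` counter are made up, sessions never expire, and they are lost when the process restarts.

```python
import secrets
from http.cookies import SimpleCookie


class SessionStore:
    """Map session IDs (carried in a cookie) to per-client data."""

    def __init__(self):
        self._sessions = {}

    def handle_request(self, cookie_header=None):
        """Return (Set-Cookie value or None, session data) for one request."""
        jar = SimpleCookie(cookie_header or "")
        morsel = jar.get("SID")  # "SID" is a hypothetical cookie name
        if morsel is not None and morsel.value in self._sessions:
            # Known client: no new cookie needed, reuse its stored data.
            return None, self._sessions[morsel.value]
        # Unknown client: create an unguessable ID and ask it to remember it.
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = {"visits": 0}
        return f"SID={session_id}; Path=/", self._sessions[session_id]


store = SessionStore()
set_cookie, session = store.handle_request()        # first visit, no Cookie header
session["visits"] += 1
cookie_header = set_cookie.split(";")[0]            # what the client sends back
_, same_session = store.handle_request(cookie_header)
print(same_session["visits"])
```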
Cookie Example
Server response:
HTTP/1.0 200 OK
Set-Cookie: acct=04382374;domain=.amazon.com;Expires=Sun, 16-Feb-2006 04:38:14 GMT;Path=/
Client request:
GET /order.pl HTTP/1.0
Cookie: acct=04382374
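The client's half of this exchange can be sketched with Python's standard `http.cookies` module: parse the Set-Cookie value, keep the name-value pair, and drop the attributes (domain, Expires, Path) when echoing it back. `cookie_header_from` is a hypothetical helper. A real browser would also check the domain, path, and expiry before sending the cookie, which this sketch does not do.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value: str) -> str:
    """Turn a Set-Cookie value into the Cookie header value sent back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Attributes such as Domain, Expires and Path stay with the client;
    # only name=value pairs go back to the server.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


received = "acct=04382374; Domain=.amazon.com; Expires=Sun, 16-Feb-2006 04:38:14 GMT; Path=/"
print("Cookie:", cookie_header_from(received))
```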
Caching
proxy server
Multiple proxies
Cache servers Cache servers are proxy servers that relay requests and responses. Keep a local copy of any responses they receive.
Caching Benefits Reduce redundant data transfer Reduce network bottleneck Reduce load on server Reduce delay Some things shouldn’t be cached Client and server cooperate on caching
Caching Some things may be cached for a limited time. Object lifetime is specified by the server: Expires header: absolute expiration time; Cache-Control: max-age: relative expiration time. If the requested object has not expired, the cache server gives it to the client. If the requested object has expired, its freshness must be checked with the origin server
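The expiry decision a cache server makes can be sketched as a single function. This is a simplified illustration under stated assumptions: `is_fresh` and its parameters are hypothetical names, the age is measured from the moment the cache stored the response, and other Cache-Control directives (such as no-cache) are ignored. When both values are present, max-age takes precedence over Expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Example storage time, taken from the Date header of the sample response.
EXAMPLE_STORED_AT = datetime(2006, 10, 10, 11, 17, 9, tzinfo=timezone.utc)


def is_fresh(stored_at: datetime, now: datetime,
             max_age=None, expires=None) -> bool:
    """Decide whether a cached object may be served without revalidation."""
    if max_age is not None:
        # Relative lifetime: seconds counted from when the object was stored.
        return now - stored_at < timedelta(seconds=max_age)
    if expires is not None:
        # Absolute lifetime: an HTTP date such as the one in an Expires header.
        return now < parsedate_to_datetime(expires)
    return False  # no lifetime information: treat as expired


later = EXAMPLE_STORED_AT + timedelta(minutes=4)
print(is_fresh(EXAMPLE_STORED_AT, later, max_age=300))
print(is_fresh(EXAMPLE_STORED_AT, later, expires="Tue, 10 Oct 2006 11:20:00 GMT"))
```

When `is_fresh` returns False, the cache would have to check the object with the origin server before reusing it.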