HTTP and Web Server

HTTP
HTTP is the protocol used by Web applications to exchange information.
HTTP is a simple, stateless request-response protocol.
HTTP runs over TCP.
Do not confuse HTTP with HTML, which is a markup language for describing a web page’s format.

Web Page
A web page consists of objects.
An object is simply a file: an HTML file, a JPEG image, an audio clip, etc.
A web page has a base HTML file that may reference several objects via each object’s URL.
A web browser first fetches the page’s base HTML file and then fetches the objects referenced in the base file from a web server.

Uniform Resource Locator (URL)
A URL is a pointer to an object or a service on the Internet.
It has five components:
–Protocol or application
–Hostname
–TCP/IP port
–Path name
–File name
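The five components can be illustrated with Python's standard `urllib.parse` module. This is a minimal sketch, not part of the original slides: the function name `split_url`, the default-port table, and the example URL are assumptions chosen for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed default ports for the two most common Web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_url(url):
    """Break a URL into protocol, hostname, port, path name, and file name."""
    parts = urlsplit(url)
    # Separate the directory part of the path from the final file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        # Fall back to the scheme's default port when none is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": directory,
        "file": filename,
    }

print(split_url("http://www.example.com:8080/docs/index.html"))
```

When the URL omits the port, the sketch falls back to the scheme's conventional port (80 for `http`).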

URL Protocols

TCP Connections
HTTP 1.0 uses nonpersistent connections.
–To fetch each object, the web browser must open a new TCP connection to the web server.
–Serial open: Mosaic
–Parallel open: Netscape
Persistent connections are the default mode for HTTP 1.1.
–The web browser opens only one TCP connection to the web server.
–All objects are then transferred over this TCP connection.

Why Use Persistent Connections?
If we use nonpersistent connections:
–Parallel open
“The easiest way to kill the Internet,” said a famous network researcher, V.B.
As shown earlier, the packet drop rate at the bottleneck router grows as the number of competing TCP connections grows.
–Serial open
Better for congestion control, but suffers a longer download delay.
Why? Each object needs a new TCP connection, whose 3-way handshake takes 1.5 RTT to finish. The client can send its request together with the third handshake segment, after 1 RTT, so by the time the requested object returns, 2 RTT have passed.

Persistent Connections
If we use persistent connections:
–Internet congestion is better controlled.
–The download delay per object is reduced from 2 RTT to only 1 RTT. There is no 3-way handshake, only the request/response delay.
To further reduce download time, pipelining can be used (the default mode).
–Requests can be sent before the response to the previous request returns.
–All objects can be requested and returned in one RTT (if they are small).
The server closes the connection after a certain period of idle time.
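The delay figures quoted on these slides can be turned into a rough back-of-the-envelope model. The sketch below is an illustration under simplifying assumptions: objects are small, transmission and server processing time are ignored, and for the persistent modes the connection is assumed to be already open (for example from fetching the base file). The function name and mode strings are hypothetical.

```python
def download_delay_rtts(num_objects, mode):
    """Approximate delay, in RTTs, to fetch num_objects small objects."""
    if num_objects == 0:
        return 0
    if mode == "nonpersistent-serial":
        # Each object pays 1 RTT for the handshake plus 1 RTT request/response.
        return 2 * num_objects
    if mode == "persistent":
        # Connection already open: 1 RTT request/response per object.
        return num_objects
    if mode == "persistent-pipelined":
        # All requests are sent back to back; responses return about 1 RTT later.
        return 1
    raise ValueError(f"unknown mode: {mode}")

for mode in ("nonpersistent-serial", "persistent", "persistent-pipelined"):
    print(mode, download_delay_rtts(10, mode))
```

For a page with 10 small referenced objects, the model gives 20, 10, and 1 RTT respectively, matching the per-object figures of 2 RTT and 1 RTT stated above.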

HTTP Request Message
[Figure: example request message. Annotations: the entity body is used by POST requests; one header indicates that the client does not want to use a persistent connection.]
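Since the example message itself did not survive in the transcript, here is a minimal sketch of how an HTTP/1.1 request message is laid out: a request line, header lines, a blank line, and an optional entity body. The helper `build_request`, the host name, and the form data are assumptions for illustration; `Connection: close` is the standard HTTP/1.1 header a client sends when it does not want a persistent connection.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble the bytes of a simple HTTP/1.1 request message."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The entity body (used by POST) needs its length announced.
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the headers from the entity body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

get_msg = build_request("GET", "/index.html", "www.example.com",
                        {"Connection": "close"})
post_msg = build_request("POST", "/login", "www.example.com", body=b"a=1&b=2")
print(get_msg.decode("ascii"))
print(post_msg.decode("ascii"))
```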

HTTP Response Message
[Figure: example response message. Annotation: some of the headers are used for cache control.]

Methods (Requests)
GET
–Retrieve whatever information is identified by the URL.
HEAD
–Identical to GET, but does not return the entity body.
POST
–Transfer some information to the server. The function performed depends on the requested URL.
PUT
–Overwrite a file.
DELETE
–Delete a file.
OPTIONS
–Ask a server to return its features and capabilities.
TRACE
–Echo-back service for debugging purposes.

Content Negotiation
Because the capabilities (e.g., network bandwidth, screen resolution and size, CPU power, etc.) of users, browsers, and agents may be very different, HTTP provides a content negotiation mechanism that can be used to choose the best representation of an object.
Server-driven negotiation
–The server decides which representation is best for a client and sends back the chosen representation.
Agent-driven negotiation
–The server just returns the various representation options. The client makes the decision.
Transparent negotiation (transparent from the client’s viewpoint)
–A proxy server can do the agent-driven negotiation on behalf of the client after getting a list of representation options from the origin server.

Statelessness Causes Problems
HTTP is a stateless protocol, which makes it robust in the presence of crashes.
However, in some applications, the web server may want to keep state for a client.
–Example 1: Only allow an already authenticated user to access some web pages.
–Example 2: Implement shopping carts, which accumulate (remember) the goods that a user has ordered.
–Example 3: Personalized commercial product advertisements (know your preferences, know your personal home page, etc.)
The authentication headers and cookies are two solutions.

Authentication
The client first sends an ordinary request to the server.
The server responds with an empty body and a “401 Authorization Required” status code.
–A WWW-Authenticate: header is included, which specifies how to do the authentication.
The client resends the request, this time including an Authorization: header.
–This header typically includes the username and password information.
After obtaining the first object, the client continues to send the cached username and password in subsequent requests.
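The exchange above can be sketched with the HTTP Basic scheme, in which the Authorization header carries `username:password` encoded in Base64. The slides do not say which scheme is meant, so Basic is an assumption here, as are the function names, the realm string, and the sample credentials. Base64 is an encoding, not encryption, so Basic authentication should only be used over a protected channel.

```python
import base64

def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def handle_request(headers, valid_user=("alice", "secret")):
    """Toy server check: 401 with a challenge unless credentials match."""
    expected = basic_authorization(*valid_user)
    if headers.get("Authorization") != expected:
        # Challenge the client and tell it which scheme to use.
        return 401, {"WWW-Authenticate": 'Basic realm="example"'}
    return 200, {}

# First request has no credentials, so the server answers 401.
print(handle_request({}))
# The client retries with the Authorization header and is accepted.
print(handle_request({"Authorization": basic_authorization("alice", "secret")}))
```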

Cookies
A client contacts a web site for the first time.
The server’s response includes a Set-Cookie: header.
–This line includes an identification number.
–E.g., Set-Cookie: 1678453
When the client receives the response message and sees Set-Cookie:, it appends the line to its cookie file.
In subsequent requests to the same server, the client includes Cookie: 1678453 in its request headers.
In this manner, the server does not know the user’s name, but it does know that this user is the same user that made requests before.
–A shopping cart can thus be implemented.
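A shopping cart built on this mechanism might look like the following sketch. It follows the slide's simplified form in which the cookie is a bare identification number (real cookies are name=value pairs with optional attributes); the class name `CartServer`, its method, and the item names are hypothetical.

```python
import itertools

class CartServer:
    """Toy server that recognizes returning clients by a cookie ID."""

    def __init__(self, first_id=1678453):
        self._next_id = itertools.count(first_id)
        self._carts = {}  # cookie ID -> list of items

    def handle(self, request_headers, add_item=None):
        cookie = request_headers.get("Cookie")
        response_headers = {}
        if cookie is None or cookie not in self._carts:
            # First contact: issue a new ID and ask the client to store it.
            cookie = str(next(self._next_id))
            self._carts[cookie] = []
            response_headers["Set-Cookie"] = cookie
        if add_item is not None:
            self._carts[cookie].append(add_item)
        return response_headers, list(self._carts[cookie])

server = CartServer()
headers, cart = server.handle({}, add_item="book")
print(headers, cart)
# The client echoes the ID back, so the server finds the same cart.
print(server.handle({"Cookie": headers["Set-Cookie"]}, add_item="pen"))
```

The server never learns who the user is; it only links requests that carry the same ID.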

Cookie Applications
The server requires authentication but does not want to hassle a user with a password prompt every time.
The server wants to remember a user’s preferences so that it can provide targeted advertisements.
The server wants to implement a shopping cart, or to log a user in to his home page automatically when the user contacts a web site such as Yahoo.
Problem: Cookies pose problems for nomadic users who access the same site from different machines.
Problem: There are also many privacy issues.

Cache Control Mechanism
To reduce network bandwidth usage and download delay, caches are often used.
A cache can be employed at the client, server, or proxy side.
–Client side: very good for the “back” button
–Server side: multiple cache servers sitting in front of the server to do load balancing
–Proxy side: multiple clients share the cache in the proxy server
A proxy server acts as both a client and a server.
–It receives Web requests from clients.
–If its cache has the requested objects, these objects are returned to the clients.
–Otherwise, it makes requests to the origin server to download the requested objects into its cache, and then returns the objects to the clients.
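The proxy behavior described above can be sketched in a few lines. This is an illustration only: the class name `CachingProxy`, the `fetch_from_origin` callback, and the example URLs are assumptions, and the sketch ignores expiration and validation entirely.

```python
class CachingProxy:
    """Toy proxy: serves from cache, otherwise fetches from the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> object body
        self._cache = {}
        self.origin_requests = 0

    def get(self, url):
        if url not in self._cache:
            # Cache miss: act as a client toward the origin server.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        # Act as a server toward the requesting client.
        return self._cache[url]

# A stand-in origin server for demonstration purposes.
proxy = CachingProxy(lambda url: f"body of {url}")
proxy.get("http://www.example.com/a.html")
proxy.get("http://www.example.com/a.html")  # served from the cache
print(proxy.origin_requests)
```

Two client requests for the same URL cause only one request to the origin, which is where the bandwidth saving comes from.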

Expiration Model
If a web page specifies its lifetime (i.e., when its validity expires), then the client can check a cached copy to determine whether it needs to download the page from the origin server again.
There are two headers designed for this purpose:
–Expires: Thu, 01 Dec 1994 16:00:00 GMT
–Cache-Control: max-age=3600
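A client-side freshness check based on these two headers could be sketched as follows. The function name `is_fresh` and its parameters are assumptions; the rule that `max-age` takes precedence over `Expires` when both are present follows the HTTP/1.1 caching rules. Timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers, fetched_at, now):
    """Return True if a cached response may be reused without refetching."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            # max-age is a lifetime in seconds counted from the fetch time.
            return now < fetched_at + timedelta(seconds=int(value))
    expires = headers.get("Expires")
    if expires:
        # Expires is an absolute HTTP date.
        return now < parsedate_to_datetime(expires)
    # No lifetime information: treat the copy as stale.
    return False

fetched = datetime(1994, 12, 1, 15, 0, 0, tzinfo=timezone.utc)
later = fetched + timedelta(minutes=30)
print(is_fresh({"Cache-Control": "max-age=3600"}, fetched, later))
print(is_fresh({"Expires": "Thu, 01 Dec 1994 16:00:00 GMT"}, fetched, later))
```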

Conditional GET
To save network bandwidth, we do not want a client to retrieve an object again if it is still the same as the copy cached on the client.
By using the Last-Modified and If-Modified-Since headers, the GET method becomes the conditional GET.
The client retrieves the object only if it has been modified since the last download.

Conditional GET Example
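The original example figure is not present in the transcript. As a stand-in, the server side of a conditional GET can be sketched as below: the server answers 304 Not Modified with an empty body when the object has not changed since the date the client supplies. The function name and arguments are hypothetical, and `last_modified` is assumed to be a UTC timestamp with whole seconds.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(request_headers, body, last_modified):
    """Return (status, headers, body) for a GET that may be conditional."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # The client's copy is still current: send no body.
        return 304, {}, b""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, body

modified = datetime(1994, 12, 1, 16, 0, 0, tzinfo=timezone.utc)
status, headers, data = conditional_get({}, b"<html>...</html>", modified)
print(status, headers)
# The client revalidates using the date it received earlier.
print(conditional_get({"If-Modified-Since": headers["Last-Modified"]},
                      b"<html>...</html>", modified)[0])
```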

E-Tag Method
Besides comparing modification times, there is another way to check whether two objects are still the same.
This method compares an entity tag, which can be any number or text string (e.g., ETag: "abc").
The E-tag functions as a generation number. Whenever the object is modified at the origin server, the object’s E-tag changes.
When an object is downloaded, its E-tag is downloaded with it to the client.
Later, if the client wants to access the same object, its request includes the downloaded E-tag (in an If-None-Match header).
By comparing the current and submitted E-tags, the server knows whether it should return a more recent object to the client.
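A minimal sketch of E-tag validation follows. The slide describes the tag as a generation number; this sketch instead derives the tag from a hash of the content, which is one common alternative and is an assumption here, as are the function names. The client is assumed to send the tag back in the standard `If-None-Match` request header.

```python
import hashlib

def make_etag(body):
    """Derive a quoted entity tag from the object's content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def get_with_etag(request_headers, body):
    """Return (status, headers, body), using 304 when the tags match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Same tag: the client's cached copy is still current.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

status, headers, _ = get_with_etag({}, b"version 1")
print(status, headers)
# Revalidation with an unchanged object, then with a modified one.
print(get_with_etag({"If-None-Match": headers["ETag"]}, b"version 1")[0])
print(get_with_etag({"If-None-Match": headers["ETag"]}, b"version 2")[0])
```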

No Cache
In some situations, we may not want our web pages to be cached. For example:
–A commercial web site (e.g., Yahoo) wants to know its real hit count to price its advertisements. Caching would hide its users.
–You are afraid of losing control over your content. People just keep spreading your content without coming back to your origin site to download new content.
Methods that you can use to disable caching:
–Cache-Control: max-age=0
–Cache-Control: no-cache
–Expires: (set to a time that has already passed)
–Pragma: no-cache

Request Redirection
A server can redirect a received request to another server or proxy if it does not have the requested object and knows that some other server or proxy may have it.
This feature can be used as a cache mechanism to reduce network bandwidth usage or download delay.
This feature can also be used as a load balancing mechanism for a very busy web site such as Yahoo (yahoo.com -> yahoo3.com):
–LAN: in a closet (e.g., a web cluster)
–WAN: spread over the globe
The method designed for this purpose:
–Location: http://www.yahoo3.com/ (a response header, sent with a 3xx status code)
Note: This mechanism is a layer-7 mechanism. It is easy to use, but its performance is not very good, because every redirected request costs an extra round trip.
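Redirection used for load balancing can be sketched as a front-end server that hands out replica addresses in turn. The round-robin policy, the function name `make_redirector`, the 302 status code, and the replica host names are all assumptions for illustration; the slides only specify that the Location response header is used.

```python
import itertools

def make_redirector(replicas):
    """Return a handler that redirects each request to the next replica."""
    cycle = itertools.cycle(replicas)

    def redirect(path):
        # 302 tells the client to repeat the request at the given URL.
        return 302, {"Location": f"http://{next(cycle)}{path}"}

    return redirect

redirect = make_redirector(["replica1.example.com", "replica2.example.com"])
print(redirect("/index.html"))
print(redirect("/index.html"))
print(redirect("/news.html"))
```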

Referer
If you have a web server hosting several web pages, are you interested in knowing which web sites have links pointing to your web pages?
This information may be useful:
–Know which web sites are taking advantage of your web pages (may be a good or a bad thing)
–Find a better cache location/hierarchy to serve your web pages, etc.
This information must of course be provided by the client (it is optional).
The method used for this purpose is:
–Referer: http://www.csie.nctu.edu.tw/office.

Preferred Language
When we connect to the Google search engine, how is it smart enough to return a Chinese web page instead of an English web page?
The client needs to provide its language preference explicitly. Otherwise, Google has no direct way to know which language you prefer to see.
Method designed for this purpose:
–Accept-Language: zh, en
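How a server might pick a language from this header can be sketched as follows. The header may carry optional quality values (for example `en;q=0.5`) that rank the preferences. The function names, the fallback default, and the simple exact-match policy are assumptions; real servers also handle language ranges such as `zh-TW`.

```python
def parse_accept_language(header):
    """Return (tag, quality) pairs sorted from most to least preferred."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        quality = 1.0  # a missing q value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag.strip():
            prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal qualities keep the header's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

def choose_language(header, available, default="en"):
    """Pick the client's most preferred language that the server offers."""
    for tag, quality in parse_accept_language(header):
        if quality > 0 and tag in available:
            return tag
    return default

print(choose_language("zh, en", {"en", "zh"}))
print(choose_language("fr;q=0.9, en;q=0.5", {"en", "zh"}))
```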

Dynamic Pages
Some web pages are generated dynamically.
The content of these pages may depend on the user’s identity (e.g., cookies) or on user input, not only on the URL.
These pages are therefore usually not cacheable.
There are many ways to generate dynamic web pages. One of them is CGI (Common Gateway Interface).
CGI is not a programming language, but rather a specification that allows an HTTP server to exchange information with other programs (PHP, Perl scripts, C programs, ...).

Form and CGI Example
[Figure: a login form with User Name and Password fields]
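A CGI program receives the request through environment variables and standard input, and writes the response headers, a blank line, and the body to standard output. The sketch below shows a hypothetical Python CGI script for a login form submitted with GET; the form field name `username` and the greeting page are assumptions, since the original form markup is not in the transcript. A real login form should submit the password with POST over a protected connection.

```python
import html
import os
from urllib.parse import parse_qs

def cgi_login_page(environ):
    """Build a CGI response from the QUERY_STRING environment variable."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    user = fields.get("username", [""])[0]
    # Escape user input before placing it in the generated HTML.
    body = f"<html><body>Hello, {html.escape(user)}</body></html>"
    # CGI output: headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # The web server sets QUERY_STRING before running the script.
    print(cgi_login_page(os.environ), end="")
```

Because the output depends on user input rather than only on the URL path, a page like this is a typical example of dynamic content.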

Virtual Hosts
It is economical for a web hosting company to use just one web server to serve many web sites, rather than one server for each web site.
For example, www.abc.com, www.xyz.com, and www.nctu.edu.tw can all be served by the same server.
To do this, we can configure the web hosting machine with many IP addresses (although it has only one physical network interface card), then tell the web server which IP address maps to which web site.
Of course, you also need to configure the DNS server so that it returns the matching IP address for each site name, e.g., the address assigned to www.abc.com.

Virtual Hosts Example
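The original configuration example is not in the transcript. The idea of IP-based virtual hosting can be sketched with two lookup tables: DNS maps each site name to one of the machine's IP addresses, and the web server maps the local address a connection arrived on to a site's document root. All addresses (taken from the 192.0.2.0/24 documentation range), directory paths, and function names below are hypothetical.

```python
# What the DNS server would be configured to return (assumed addresses).
DNS_RECORDS = {
    "www.abc.com": "192.0.2.10",
    "www.xyz.com": "192.0.2.11",
    "www.nctu.edu.tw": "192.0.2.12",
}

# What the web server would be told: local IP address -> document root.
DOCUMENT_ROOTS = {
    "192.0.2.10": "/srv/www/abc",
    "192.0.2.11": "/srv/www/xyz",
    "192.0.2.12": "/srv/www/nctu",
}

def site_for_connection(local_ip):
    """Pick the site by the local address the client connected to."""
    return DOCUMENT_ROOTS.get(local_ip)

def resolve_and_serve(hostname):
    """Follow a request end to end: DNS lookup, then server-side mapping."""
    ip = DNS_RECORDS[hostname]
    return ip, site_for_connection(ip)

print(resolve_and_serve("www.abc.com"))
print(resolve_and_serve("www.xyz.com"))
```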

Access Audio/Video Servers
Nowadays most web browsers support downloading audio and video files.
To play audio/video files, however, a web browser usually needs a helper program that understands the complicated formats of these files.
The problem with implementing this function is that before the browser contacts the web server, it does not know the format of the A/V file to be downloaded. Thus it cannot fork an appropriate helper program in advance to receive the downloaded file.

A Naive Implementation
Instead, the browser needs to receive the whole A/V file, then fork the helper program, then transfer the file to the helper program.
This design is inefficient, especially when the file is very large.

A Better Implementation for Whole Download
To solve the problem, the file downloaded first is a short metafile describing the format of the A/V file.
The browser, upon learning the format of the A/V file, forks the helper program and asks it to receive and play back the A/V file.
In this case, the A/V file is stored on the same web server and is transferred back by that server in the same way as normal HTML files (over TCP).

A Better Implementation for Whole Download

A Better Implementation for Streaming
If the media is a stream such as a movie or music clip and the transfer is a streaming transfer (i.e., the helper can start playing the media without receiving the whole file), a normal web server does not know how to deal with streaming.
To solve this problem, we need to run a streaming server and let the helper program contact the streaming server directly.
The streaming server usually uses UDP to send the A/V packets to the helper program.

A Better Implementation for Streaming
