Carnegie Mellon 15-213/18-243: Introduction to Computer Systems Instructors: Anthony Rowe and Gregory Kesden (c) 1998 - 2010. All Rights Reserved. All.

Slides:



Advertisements
Similar presentations
Hypertext Transfer PROTOCOL ----HTTP Sen Wang CSE5232 Network Programming.
Advertisements

TCP/IP Protocol Suite 1 Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 22 World Wide Web and HTTP.
World Wide Web Basics Original version by Carolyn Watters (Dalhousie U. Computer Science)
HTTP – HyperText Transfer Protocol
Lecture 7, : The Internet, Summer : The Internet Lecture 7: Web Services I David O’Hallaron School of Computer Science and Department.
1 HTTP – HyperText Transfer Protocol Part 1. 2 Common Protocols In order for two remote machines to “ understand ” each other they should –‘‘ speak the.
Data and Computer Communications Eighth Edition by William Stallings Lecture slides by Lawrie Brown Chapter 23 – Internet Applications Internet Directory.
Web architecture Dr Jim Briggs Web architecture.
Web Services Nov 26, 2002 Topics HTTP Serving static content Serving dynamic content class27.ppt “The course that gives CMU its Zip!”
Lecture 7 TELNET Protocol & HyperText Transfer Protocol CPE 401 / 601 Computer Network Systems slides are modified from Dave Hollinger.
The World Wide Web and the Internet Dr Jim Briggs 1WUCM1.
Chapter 2 Application Layer Computer Networking: A Top Down Approach Featuring the Internet, 3 rd edition. Jim Kurose, Keith Ross Addison-Wesley, July.
1 The HyperText Transfer Protocol: HTTP Nick Smith Stuart Alley Tara Tjaden.
Definitions, Definitions, Definitions Lead to Understanding.
The abs_path in a URI If the abs_path is not present in the URL, it must be given as "/" in a Request-URI for a resource. Thus, if a user points a browser.
Hypertext Transport Protocol CS Dick Steflik.
 What is it ? What is it ?  URI,URN,URL URI,URN,URL  HTTP – methods HTTP – methods  HTTP Request Packets HTTP Request Packets  HTTP Request Headers.
Rensselaer Polytechnic Institute CSC-432 – Operating Systems David Goldschmidt, Ph.D.
Web Servers Web server software is a product that works with the operating system The server computer can run more than one software product such as .
HTTP; The World Wide Web Protocol
Simple Web Services. Internet Basics The Internet is based on a communication protocol named TCP (Transmission Control Protocol) TCP allows programs running.
COMP3016 Web Technologies Introduction and Discussion What is the Web?
1 HTML and CGI Scripting CSC8304 – Computing Environments for Bioinformatics - Lecture 10.
HTTP Protocol Specification
FTP (File Transfer Protocol) & Telnet
Simple Web Services. Internet Basics The Internet is based on a communication protocol named TCP (Transmission Control Protocol) TCP allows programs running.
HyperText Transfer Protocol (HTTP).  HTTP is the protocol that supports communication between web browsers and web servers.  A “Web Server” is a HTTP.
CP476 Internet Computing Lecture 5 : HTTP, WWW and URL 1 Lecture 5. WWW, HTTP and URL Objective: to review the concepts of WWW to understand how HTTP works.
TCP/IP Protocol Suite 1 Chapter 22 Upon completion you will be able to: World Wide Web: HTTP Understand the components of a browser and a server Understand.
Application Layer 2 Figures from Kurose and Ross
Rensselaer Polytechnic Institute Shivkumar Kalvanaraman, Biplab Sikdar 1 The Web: the http protocol http: hypertext transfer protocol Web’s application.
Maryam Elahi University of Calgary – CPSC 441.  HTTP stands for Hypertext Transfer Protocol.  Used to deliver virtually all files and other data (collectively.
© Janice Regan, CMPT 128, Jan 2007 CMPT 371 Data Communications and Networking HTTP 0.
Copyright (c) 2010, Dr. Kuanchin Chen1 The Client-Server Architecture of the WWW Dr. Kuanchin Chen.
Sistem Jaringan dan Komunikasi Data #9. DNS The Internet Directory Service  the Domain Name Service (DNS) provides mapping between host name & IP address.
Networking Programming --Web Server(I) 1. Outline Web History Web Servers –HTTP Protocol –Web Content –CGI Suggested Reading: –
Web Services Topics HTTP Serving static content
WebServer A Web server is a program that, using the client/server model and the World Wide Web's Hypertext Transfer Protocol (HTTP), serves the files that.
The HyperText Transfer Protocol. History HTTP has been in use since 1990 (HTTP/0.9) HTTP/1.0 was defined in RFC 1945 (May 1996) and included metainformation.
Hui Zhang, Fall Computer Networking Web, HTTP, Caching.
HTTP Hypertext Transfer Protocol
HTTP Hypertext Transfer Protocol RFC 1945 (HTTP 1.0) RFC 2616 (HTTP 1.1)
1 Welcome to CSC 301 Web Programming Charles Frank.
Web Client-Server Server Client Hypertext link TCP port 80.
HTTP1 Hypertext Transfer Protocol (HTTP) After this lecture, you should be able to:  Know how Web Browsers and Web Servers communicate via HTTP Protocol.
Appendix E: Overview of HTTP ©SoftMoore ConsultingSlide 1.
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
CITA 310 Section 2 HTTP (Selected Topics from Textbook Chapter 6)
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 1 Fundamentals.
Web Services November 30, 2004 Topics HTTP Serving static content Serving dynamic content “The course that gives CMU its Zip!” class26.ppt.
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Internet Applications (Cont’d) Basic Internet Applications – World Wide Web (WWW) Browser Architecture Static Documents Dynamic Documents Active Documents.
EE 122: Lecture 21 (HyperText Transfer Protocol - HTTP) Ion Stoica Nov 20, 2001 (*)
CS 6401 The World Wide Web Outline Background Structure Protocols.
Overview of Servlets and JSP
LURP Details. LURP Lab Details  1.Given a GET … call a proxy CGI script in the same way you would for a normal CGI request  2.This UDP perl.
Simple Web Services. Internet Basics The Internet is based on a communication protocol named TCP (Transmission Control Protocol) TCP allows programs running.
© Janice Regan, CMPT 128, Jan 2007 CMPT 371 Data Communications and Networking HTTP 0.
HTTP Protocol Amanda Burrows. HTTP Protocol The HTTP protocol is used to send HTML documents through the Internet. The HTTP protocol sends the HTML documents.
Tiny http client and server
HTTP – An overview.
Web Development Web Servers.
HTTP Protocol Specification
Tutorial (4): HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
IS333D: MULTI-TIER APPLICATION DEVELOPMENT
HTTP Hypertext Transfer Protocol
EE 122: HyperText Transfer Protocol (HTTP)
HTTP Hypertext Transfer Protocol
Presentation transcript:

Carnegie Mellon /18-243: Introduction to Computer Systems Instructors: Anthony Rowe and Gregory Kesden (c) All Rights Reserved. All work contained herein is copyrighted and used by permission of the authors. Contact for permission or for more 23 rd Lecture, 20 April 2010 Web Services

Carnegie Mellon Today HTTP Web Servers Proxies

Carnegie Mellon Client-server web communication Clients and servers communicate using the HyperText Transfer Protocol (HTTP)  Client and server establish TCP connection  Client requests content  Server responds with requested content  Client and server close connection (usually) Current version is HTTP/1.1  RFC 2616, June, 1999 

Web History 1945:  Vannevar Bush, “As we may think”, Atlantic Monthly, July,  Describes the idea of a distributed hypertext system.  A “memex” that mimics the “web of trails” in our minds. “Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

Web History 1989:  Tim Berners-Lee (CERN) writes internal proposal to develop a distributed hypertext system.  Connects “a web of notes with links.”  Intended to help CERN physicists in large projects share and manage information 1990:  Tim BL writes a graphical browser for Next machines.

Web History (cont) 1992  NCSA server released  26 WWW servers worldwide 1993  Marc Andreessen releases first version of NCSA Mosaic browser  Mosaic version released for (Windows, Mac, Unix).  Web (port 80) traffic at 1% of NSFNET backbone traffic.  Over 200 WWW servers worldwide  Andreessen and colleagues leave NCSA to form “Mosaic Communications Corp” (predecessor to Netscape).

Internet Hosts  How many of the 2 32 IP addresses have registered domain names?

Carnegie Mellon Web Content Web servers return content to clients  content: a sequence of bytes with an associated MIME (Multipurpose Internet Mail Extensions) type Example MIME types  text/htmlHTML document  text/plainUnformatted text  image/gifBinary image encoded in GIF format  image/jpegBinary image encoded in JPEG format  video/mpegVideo encoded with MPEG

– , F’09 Static and Dynamic Content The content returned in HTTP responses can be either static or dynamic. Static content: content stored in files and retrieved in response to an HTTP request Examples: HTML files, images, audio clips. Dynamic content: content produced on-the-fly in response to an HTTP request Example: content produced by a program executed by the server on behalf of the client. Bottom line: All Web content is associated with a “file” that is managed by the server.

Carnegie Mellon URL: Universal Resource Locator URL: A name to identify an object managed by a server URLs for static content:     Identifies a file called index.html, managed by a Web server at that is listening on port 80 URLs for dynamic content:   Identifies an executable file called find, managed by a Web server at that has 2 input parameters: – s, which has a value of “all” – q, which has a value of “the net” (note whitespace changed to ‘+’ because of URL syntax rules)

– , F’09 How Clients and Servers Use URLs Example URL: Clients use prefix ( to infer: What kind of server to contact (Web server) Where the server is ( What port it is listening on (80) Servers use suffix (/index.html) to: Determine if request is for static or dynamic content. No hard and fast rules for this. Convention: executables reside in cgi-bin directory Find file on file system. Initial “/” in suffix denotes home directory for requested content. Minimal suffix is “/”, which all servers expand to some default home page (e.g., index.html).

Carnegie Mellon Anatomy of an HTTP Transaction unix> telnet 80Client: open connection to server Trying Telnet prints 3 lines to the terminal Connected to aol.com. Escape character is '^]'. GET / HTTP/1.1Client: request line host: required HTTP/1.1 HOST header Client: empty line terminates headers. HTTP/ OKServer: response line MIME-Version: 1.0Server: followed by five response headers Date: Mon, 08 Jan :59:42 GMT Server: NaviServer/2.0 AOLserver/2.3.3 Content-Type: text/htmlServer: expect HTML in the response body Content-Length: 42092Server: expect 42,092 bytes Server: empty line (“\r\n”) terminates hdr Server: first HTML line in response body...Server: 766 lines of HTML not shown. Server: last HTML line in response body Connection closed by foreign host.Server: closes connection unix>Client: closes connection and terminates unix> telnet 80Client: open connection to server Trying Telnet prints 3 lines to the terminal Connected to aol.com. Escape character is '^]'. GET / HTTP/1.1Client: request line host: required HTTP/1.1 HOST header Client: empty line terminates headers. HTTP/ OKServer: response line MIME-Version: 1.0Server: followed by five response headers Date: Mon, 08 Jan :59:42 GMT Server: NaviServer/2.0 AOLserver/2.3.3 Content-Type: text/htmlServer: expect HTML in the response body Content-Length: 42092Server: expect 42,092 bytes Server: empty line (“\r\n”) terminates hdr Server: first HTML line in response body...Server: 766 lines of HTML not shown. Server: last HTML line in response body Connection closed by foreign host.Server: closes connection unix>Client: closes connection and terminates

Carnegie Mellon HTTP Requests HTTP request is a request line, followed by zero or more request headers, followed by a blank line (CRLF), followed by an optional message body Request line describes the object that is desired  Method is a verb describing the action to be taken (details on next slide)  Request-URI is typically the URL naming the object desired  A URL is a type of URI (Uniform Resource Identifier)  See  HTTP-Version is the HTTP version of request (HTTP/1.0 or HTTP/1.1) Request = Request-Line *(header CRLF) CRLF [message-body] Request = Request-Line *(header CRLF) CRLF [message-body] Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Carnegie Mellon HTTP Requests (cont) HTTP methods:  GET : Retrieve an object (web page, etc)  Workhorse method (99% of requests)  “Conditional GET”if header includes If-Modified-Since, If-Match, etc.  POST : Send data to the server (can also be used to retrieve dynamic content)  Arguments for dynamic content are in the request body  HEAD : Retrieve metadata about an object (validity, modification time, etc)  Like GET but no data in response body  OPTIONS : Get server or object attributes  PUT : Write a file to the server!  DELETE : Delete an object (file) on the server!  TRACE : Echo request in response body  Useful for debugging Request = Request-Line *(header CRLF) CRLF [message-body] Request = Request-Line *(header CRLF) CRLF [message-body] Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Carnegie Mellon HTTP Requests (cont) Request Headers provide additional information to the server  Modify the request in some form Examples:  Host:  If-Modified-Since: Mon, 19 Aug :43:31 GMT  User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7...Safari/533.4  Connection: Keep-Alive  Transfer-Encoding: chunked Message Body contains data objects  Data for POST, PUT, TRACE methods Request = Request-Line *(header CRLF) CRLF [message-body] Request = Request-Line *(header CRLF) CRLF [message-body] header = Header-Name COLON SP Value

Carnegie Mellon HTTP Responses HTTP response format is similar to request, except first line is a “status line”  HTTP-Version is the HTTP version of request (HTTP/1.0 or HTTP/1.1)  Status-code is numeric status, Reason-Phrase is corresponding English  200 OK Request was handled without error  301Moved Permanently Object is in new location  404Not found Server couldn’t find the file Response headers, similar to request headers  Content-Type: text/html  Content-Length: (in bytes)  Location: new-folder/new-object-name.html Response = status-line *(header CRLF) CRLF [message-body] Response = status-line *(header CRLF) CRLF [message-body] status-line = HTTP-Version SP Status-code SP Reason-Phrase CRLF

Carnegie Mellon HTTP Versions HTTP/1.0 (1990) got the web up and running  HTTP/1.0 uses a new TCP connection for each transaction (i.e. request-reply) HTTP/1.1 (1999) added performance features  Supports persistent connections  Multiple transactions over the same connection, amortizing TCP startup  Connection: Keep-Alive  Supports pipelined connections  Multiple transactions at the same time over single a TCP connection  Requires HOST header  Host:  Supports chunked encoding (described later)  Transfer-Encoding: chunked  Adds additional support for caching web objects

Carnegie Mellon GET Request to from Safari GET /index.shtml HTTP/1.1Host: Mozilla/5.0 (Macintosh...Safari...Accept: */*Accept-Language: en-usAccept- Encoding: gzip, deflateCookie: __unam=74ba...webstats- cmu=cmu Connection: keep-aliveCRLF (\r\n) Request-URI is just the suffix, not the entire URL

Carnegie Mellon GET Response From Server HTTP/ OKDate: Tue, 20 Apr :50:27 GMTServer: Apache/ (Unix)... mod_ssl/ OpenSSL/0.9.6m+Keep-Alive: timeout=5, max=99Connection: Keep- AliveTransfer-Encoding: chunkedContent-Type: text/htmlCRLF (\r\n)... Carnegie Mellon University...

Carnegie Mellon HTTPS Provides encrypted channel for Web Requests/Responses Implemented as normal HTTP over Secure Socket Layer (SSL) Abstracts away security issues from Web Server Developer  Need to match certificate with host makes virtual hosting a challenge HTTPS HTTP TCP IP HTTP SSL TCP IP

Carnegie Mellon Today HTTP and Static Content Web Servers Proxies

Carnegie Mellon Serving Content Content is either Static or Dynamic  Static content is generally served from a file system and rarely changes  Dynamic content varies from request to request and is generated per request  Client doesn’t know the difference /images/logos/ps_logo2.png /#q=Carnegie+Mellon Static ContentDynamic Content

Carnegie Mellon Static Content Generally very fast  Limited by IO speed of disk/network Pages are served directly from the file system  Or memory  Or one large file (Facebook’s Haystack) Facebook’s Haystack image server writes many images to one contiguous file, where they can be quickly read by the web server

Carnegie Mellon Dynamic Content – Single Process Web server generates certain pages by itself Pros  [Potentially] very fast Cons  Limited to single language  Upgrade path can be complicated  Bugs can crash the server Who actually does this?  Almost no one  Except Facebook  HipHop compiles PHP to C++ with its own server built in

Carnegie Mellon Dynamic Content – Common Gateway Interface First widely used approach to the problem Similar to executing a shell remotely  Server forks  Executes a program in cgi-bin/ directory with request as input  Server forwards output of program back to Client Pros  Simple interface  Language independent Cons  Slow  Resource intensive Who actually does this?  Almost no one, anymore

Carnegie Mellon Dynamic Content – Embedded Interpreter Language interpreter is loaded into the web server Pros  Works well for Apache ecosystem  Possible to support many languages  Allows web programming in easier, safe languages Cons  Module systems complicate web server architectures  Modules must be written for each web server Who actually does this?  Apache’s mod_py, mod_php, etc.  Many others

Carnegie Mellon Dynamic Content – Out-of-Process Handlers “Handler” program stays alive Communication is performed over sockets Pros  Simple interface  Language independent  Far less process maintenance cost Cons  Slightly slower than Single Process Who actually does this?  FastCGI: Apache, Lighttpd, Nginx, many others  Other: Mongrel2

Carnegie Mellon Today HTTP Web Servers Proxies

Carnegie Mellon Proxies A proxy is an intermediary between a client and an origin server  To the client, the proxy acts like a server  To the server, the proxy acts like a client Client Proxy Origin Server 1. Client request 2. Proxy request 3. Server response 4. Proxy response

Carnegie Mellon Why Proxies? Can perform useful functions as requests and responses pass by  Caching, logging, anonymization, filtering

Carnegie Mellon Putting it Together

Carnegie Mellon Servicing Web Page Request

Carnegie Mellon Client ➙ Proxy GET HTTP/1.1 Host: www-2.cs.cmu.edu User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/ Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/ *;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Proxy-Connection: keep-alive CRLF (\r\n) GET HTTP/1.1 Host: www-2.cs.cmu.edu User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/ Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/ *;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Proxy-Connection: keep-alive CRLF (\r\n) The browser sends a complete URL for the Request-URI

Carnegie Mellon Proxy ➙ Server GET /~bryant/test.html HTTP/1.1 Host: www-2.cs.cmu.edu User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/ Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/ *;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive CRLF (\r\n) GET /~bryant/test.html HTTP/1.1 Host: www-2.cs.cmu.edu User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/ Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/ *;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive CRLF (\r\n) The proxy sends a Request-URI that is just the object path

Carnegie Mellon Server ➙ Proxy ➙ Client Chunked Transfer Encoding  Alternate way of specifying content length  Each “chunk” prefixed with chunk length, postfixed with CRLF  Used to allow the server to start sending data without knowing the final size of the entire content  Also, headers (like security hashes that must be calculated) can be sent after the dynamic data to which they refer has been already been sent  See HTTP/ OK Date: Mon, 29 Nov :27:15 GMT Server: Apache/ (Unix) mod_ssl/ OpenSSL/0.9.6 mod_pubcookie/a5/ Transfer-Encoding: chunked Content-Type: text/html \r\n HTTP/ OK Date: Mon, 29 Nov :27:15 GMT Server: Apache/ (Unix) mod_ssl/ OpenSSL/0.9.6 mod_pubcookie/a5/ Transfer-Encoding: chunked Content-Type: text/html \r\n

Carnegie Mellon Server ➙ Proxy ➙ Client (cont) 2ec Some Tests Current Teaching: Bryant's teaching Introduction to Computer Systems (Fall '04). Nonexistent file Nonexistent host Fun Downloads Google CMU Yahoo NFL Back to Randy Bryant's home page CRLF (\r\n) 0\r\n \r\n 2ec Some Tests Current Teaching: Bryant's teaching Introduction to Computer Systems (Fall '04). Nonexistent file Nonexistent host Fun Downloads Google CMU Yahoo NFL Back to Randy Bryant's home page CRLF (\r\n) 0\r\n \r\n First Chunk: 0x2ec = 748 bytes Second Chunk: 0 bytes (indicates last chunk)

Carnegie Mellon For More Information Study the Tiny Web server described in your text  Tiny is a sequential Web server (one request at a time)  Serves static and dynamic content to real browsers  text files, HTML files, GIF and JPEG images  220 lines of commented C code  Also comes with an implementation of the CGI script for the addition portal See the HTTP/1.1 standard:   RFCs are standards documents, but still remarkably readable

Carnegie Mellon Summary HTTP Web Servers Proxies Next Time:  Concurrency -- doing lots of stuff at the same time