COMP 655: Distributed/Operating Systems
Summer 2011
Dr. Chunbo Chu
Week 10: Web
10/6/2015 Distributed Systems - COMP 655
The Web
Origin and overview of the web
Drill-down on distributed system aspects
– Communication
– Processes
– Naming
– Synchronization
– Replication (especially caching)
– Fault tolerance
– Security
Origin of the web
CERN (European particle physics lab)
Purpose: facilitate document sharing
– Large user community
– Geographically dispersed
Founder: Tim Berners-Lee
Use exploded in the late 1990s
– Graphical user interfaces (Mosaic and descendants)
– Huge amounts of content
– Search engines
– Interactive pages
Definition of the Web
Many standards
– HTML
– HTTP
– DNS
– URL, URI, URN
– XML
– DOM
Standards bodies: W3C, IETF
A word about RFCs
Standards track
– Proposed standard
– Draft standard (at least two independent and interoperable implementations)
– Internet standard (also has an STD number; for example, IP is STD-005 and RFC-0791)
“Off-track”
– Experimental
– Informational
– Historic(al)
See RFC 2026 for details
Yet more words about RFCs
Before using an RFC, check the Obsolete RFC list, or find it on the Active RFC list.
I use the RFC index at faqs.org because I find it a bit easier to use than the IETF’s list. Remember: if there’s a conflict, the IETF is the authority.
Overall structure
What’s in a web page?
Client-side script
Some web pages are XML
XML document type definition
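A minimal Python sketch of what a DTD does and does not buy you, assuming a hypothetical `<note>` document type invented for illustration. Python's `xml.etree.ElementTree` (built on the expat parser) checks only that a document is well-formed; it reads the internal DTD but does not validate against it, so a document missing a required element still parses. Enforcing the DTD needs a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML page whose internal DTD says <note> must contain <to> then <body>.
VALID_DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Class</to><body>Hello</body></note>"""

# Same DTD, but the required <to> element is missing, so the document is invalid.
INVALID_DOC = VALID_DOC.replace("<to>Class</to>", "")


def child_tags(doc):
    """Parse doc (well-formedness check only) and list the tags under the root."""
    root = ET.fromstring(doc)
    return [child.tag for child in root]


print(child_tags(VALID_DOC))    # ['to', 'body']
print(child_tags(INVALID_DOC))  # ['body']  (still parses: no DTD validation)
```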
Other document types
CGI – early Web interaction
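A minimal CGI script in Python, as a sketch. The web server passes request data to the script through environment variables such as `REQUEST_METHOD` and `QUERY_STRING`, and the script writes a header block, a blank line, and then the page to standard output. The `name` query parameter is hypothetical. Because the server starts a fresh process for every request, nothing in the script survives from one request to the next.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete output of a CGI script from its environment variables."""
    method = environ.get("REQUEST_METHOD", "GET")
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["world"])[0]  # 'name' is a hypothetical form field
    body = f"<html><body><p>{method}: hello, {escape(name)}</p></body></html>\n"
    # Header lines, then a blank line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server runs this script as a new process for each request.
    sys.stdout.write(cgi_response(os.environ))
```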
Problems with CGI
A new process per request
Wide variety in server-side runtime environments
Solutions
– Server-side scripting (JSP, ASP, PHP)
– Servlets
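Servlets are a Java technology; as a rough Python analogue (a swapped-in technique, not servlets themselves), a WSGI application (PEP 3333) is loaded once by a long-running server process and then called once per request. This sketch assumes Python's standard `wsgiref` development server and a hypothetical `name` parameter; the request counter shows state that a per-request CGI process could not keep.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

request_count = 0  # lives as long as the server process, unlike a CGI process


def application(environ, start_response):
    """Called once per request by a long-running WSGI server."""
    global request_count
    request_count += 1
    name = parse_qs(environ.get("QUERY_STRING", "")).get("name", ["world"])[0]
    body = f"hello, {name} (request #{request_count})\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Run the application in wsgiref's development server (port chosen arbitrarily)."""
    with make_server("127.0.0.1", port, application) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/?name=Ada` twice should show the counter increasing, because the same process handles both requests.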
Problems with browsers
Browser-based user interfaces tend to be clunky and limited
Solutions:
– Client-side scripting
– Applets
– More recently, AJAX
An example: ajax.html
See http://en.wikipedia.org/wiki/AJAX for more information
Server-side scripts and servlets
Nothing’s perfect
What Web technology has big problems with server-side page generation?
Communication on the web: HTTP
TCP-based client/server protocol
– Create connection
– Send request
– Send response
– Close connection
HTTP/1.1 reduces connection overhead with persistent connections
HTTP connections: non-persistent and persistent
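The difference between non-persistent and persistent connections can be observed with Python's standard library. This sketch runs a throwaway `http.server` on a free local port and fetches three hypothetical paths (a page and two images) with `http.client`, counting how many TCP connections the server accepted. Sending `Connection: close` forces one connection per request (non-persistent); omitting it lets HTTP/1.1 reuse a single connection.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows http.server to keep a connection open
    connections = 0                # TCP connections accepted (setup() runs once per connection)

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length marks where this response ends, so the
        # connection can carry another request afterwards.
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:  # the client sent "Connection: close"
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(paths, persistent):
    """GET each path in turn and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = http.server.HTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        headers = {} if persistent else {"Connection": "close"}
        for path in paths:
            conn.request("GET", path, headers=headers)
            conn.getresponse().read()  # finish this response before sending the next request
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


paths = ["/page.html", "/a.gif", "/b.gif"]  # hypothetical page and images
print(fetch(paths, persistent=True))   # 1
print(fetch(paths, persistent=False))  # 3
```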
HTTP request types
HTTP request example
GET /xyzzy HTTP/1.1
Connection: Keep-Alive
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
Host: laptop:1215
If-Modified-Since: Sun, 27 Jun :58:28 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Request line: type (GET), path (/xyzzy), protocol (HTTP/1.1); the remaining lines are headers
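A sketch of how a server might split a request like this one into its labeled parts: type, path, protocol, and headers. It is deliberately simplified and is not a complete HTTP parser: it reads only the request head, lowercases header names because they are case-insensitive, and ignores details such as repeated headers. Only a subset of the slide's headers is used.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request head into (method, path, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]           # ignore any message body
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")  # type, path, protocol
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = ("GET /xyzzy HTTP/1.1\r\n"
       "Connection: Keep-Alive\r\n"
       "Accept-Language: en-us\r\n"
       "Host: laptop:1215\r\n"
       "\r\n")
method, path, version, headers = parse_request(raw)
print(method, path, version)  # GET /xyzzy HTTP/1.1
print(headers["host"])        # laptop:1215
```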
HTTP header types
Processes
Browsers
Proxies
Apache web server framework
Browser with plug-in
Web proxy
Most browsers today support FTP directly, so proxies are rarely needed to reach other protocols. However, proxies are still used for shared caching.
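A shared proxy cache can be sketched as a table of responses keyed by URL, so a page fetched for one user is served from the cache to the next. This Python sketch assumes least-recently-used eviction and a hypothetical `fetch` callable standing in for the origin server; the `example.com` URLs are placeholders. It ignores the HTTP expiration and `Cache-Control` rules that a real proxy must honor.

```python
from collections import OrderedDict


class ProxyCache:
    """Shared cache of responses keyed by URL, evicting the least recently used entry."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # hypothetical callable: url -> response body
        self.entries = OrderedDict()  # url -> body, oldest use first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.entries:
            self.hits += 1
            self.entries.move_to_end(url)     # mark as most recently used
            return self.entries[url]
        self.misses += 1
        body = self.fetch(url)                # go to the origin server
        self.entries[url] = body
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return body


origin_calls = []


def fake_origin(url):
    origin_calls.append(url)
    return f"<contents of {url}>"


cache = ProxyCache(capacity=2, fetch=fake_origin)
cache.get("http://example.com/a")  # user 1: miss, fetched from the origin
cache.get("http://example.com/a")  # user 2: hit, origin not contacted
print(cache.hits, cache.misses, len(origin_calls))  # 1 1 1
```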
Apache
Server cluster – simple-minded
Web Server Clusters A scalable content-aware cluster of Web servers
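A sketch contrasting the two front-end policies, assuming three hypothetical back-end servers. The simple-minded front end hands out requests round-robin without looking at them; the content-aware one inspects the requested path. Hashing the path, as here, is only one possible content-aware policy: it sends every request for a given document to the same server, so that server's cache stays useful.

```python
import hashlib
from itertools import cycle

SERVERS = ["web1", "web2", "web3"]  # hypothetical back-end server names


def round_robin(servers):
    """Simple-minded front end: hand requests to servers in turn, ignoring content."""
    pool = cycle(servers)
    return lambda path: next(pool)


def content_aware(servers):
    """Content-aware front end: the requested path decides which server gets it."""
    def choose(path):
        # A stable hash, so the same path maps to the same server across runs.
        digest = hashlib.sha1(path.encode("utf-8")).digest()
        return servers[int.from_bytes(digest[:4], "big") % len(servers)]
    return choose


requests = ["/index.html", "/logo.gif", "/index.html", "/index.html"]
rr, ca = round_robin(SERVERS), content_aware(SERVERS)
print([rr(p) for p in requests])  # ['web1', 'web2', 'web3', 'web1']
print(len({ca(p) for p in requests if p == "/index.html"}))  # 1
```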
Web naming
URI
URL
URN
URI examples from RFC 2396
ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
-- http scheme for Hypertext Transfer Protocol services
-- mailto scheme for electronic mail addresses
news:comp.infosystems. -- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol
More examples on page 670
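Python's `urllib.parse.urlsplit` can split the complete RFC 2396 example URIs into scheme, authority, and path, as a quick illustration. The scheme tells the client which protocol to use, the authority names the host, and `unquote` decodes escapes such as `%20`.

```python
from urllib.parse import unquote, urlsplit

examples = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "telnet://melvyl.ucop.edu/",
]

for uri in examples:
    parts = urlsplit(uri)
    # scheme -> which protocol to use; netloc -> which host; path -> interpreted by that host
    print(parts.scheme, parts.netloc, unquote(parts.path))

# Expected output:
# ftp ftp.is.co.za /rfc/rfc1808.txt
# gopher spinaltap.micro.umn.edu /00/Weather/California/Los Angeles
# telnet melvyl.ucop.edu /
```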
Naming – URL – how to access
Naming – URN – true resource identifier
RFC 2648 defines a URN namespace for IETF documents.
RFC 2141 defines URN syntax.
RFC 3406 is a BCP (Best Current Practice) for defining URN namespaces.
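A sketch of splitting a URN into its namespace identifier (NID) and namespace-specific string (NSS), following the RFC 2141 form `urn:<NID>:<NSS>`. The example uses the IETF namespace from RFC 2648 (`urn:ietf:rfc:<number>`). The `parse_urn` function is hypothetical and skips RFC 2141's character rules.

```python
def parse_urn(urn):
    """Split a URN of the form urn:<NID>:<NSS> into (NID, NSS)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    # The "urn" prefix and the NID are case-insensitive; the NSS is left as written.
    return nid.lower(), nss


print(parse_urn("urn:ietf:rfc:2648"))  # ('ietf', 'rfc:2648')
```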
Activity – hitting a web page
Check your understanding: draw a UML sequence diagram showing the interaction of key software elements when a browser hits a web page containing graphics.
Assume the web page and the images are on different servers.
“Classes” in the diagram should include
– Browser
– DNS resolver
– DNS server
– Server for the page
– Server for the images
Not much to synchronize …
Generally, web clients don’t exchange information with other clients, and servers don’t exchange information with other servers.
Most documents have a single author, so write/write conflicts are rare.
However, WebDAV provides a simple locking and versioning scheme
– Locks are connection-independent
– Handling abandoned locks is left to the implementation
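A sketch of connection-independent locking, assuming a hypothetical in-memory `LockTable`. A client receives a lock token and may present it on any later connection. Since handling abandoned locks is up to the implementation, this sketch picks one option: every lock expires after a timeout. The token format and the whole `LockTable` API are illustrative assumptions, not WebDAV's wire protocol.

```python
import time
import uuid


class LockTable:
    """Hypothetical WebDAV-style write locks, held by token rather than by connection."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # resource path -> (lock token, expiry time)

    def _active(self, path):
        """Return the unexpired lock on path, or None."""
        entry = self.locks.get(path)
        if entry is not None and entry[1] <= self.clock():
            del self.locks[path]  # abandoned lock: its timeout has passed
            return None
        return entry

    def lock(self, path, timeout_s):
        if self._active(path) is not None:
            raise RuntimeError(f"{path} is already locked")
        token = f"urn:uuid:{uuid.uuid4()}"  # assumed token format
        self.locks[path] = (token, self.clock() + timeout_s)
        return token

    def may_write(self, path, token=None):
        entry = self._active(path)
        return entry is None or entry[0] == token

    def unlock(self, path, token):
        entry = self._active(path)
        if entry is not None and entry[0] == token:
            del self.locks[path]


now = [0.0]  # fake clock so the example is deterministic
table = LockTable(clock=lambda: now[0])
token = table.lock("/notes.html", timeout_s=60)
print(table.may_write("/notes.html", token))    # True: the token works on any connection
print(table.may_write("/notes.html", "other"))  # False
now[0] = 61.0                                   # the holder vanished without unlocking
print(table.may_write("/notes.html", "other"))  # True: the abandoned lock expired
```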
Replication – client and proxy
Virtually all browsers can cache
Many organizations run proxy servers
Some proxies can cooperate
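Client caches usually revalidate a stored copy rather than trusting it forever. A sketch of a conditional GET using the `If-Modified-Since` header: the browser sends back the `Last-Modified` date it stored, and the server answers `304 Not Modified` when the page has not changed since then. Server and browser are simulated as plain Python objects here, and the modification times are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def origin_response(last_modified, if_modified_since=None):
    """Simulated origin server: 304 if the client's copy is still current, else 200."""
    if if_modified_since is not None and last_modified <= parsedate_to_datetime(if_modified_since):
        return 304, None
    return 200, f"<html>version of {last_modified:%Y-%m-%d %H:%M}</html>"


class BrowserCache:
    """Simulated client cache that revalidates with If-Modified-Since."""

    def __init__(self):
        self.store = {}  # url -> (Last-Modified header value, cached body)

    def get(self, url, server_last_modified):
        # server_last_modified stands in for the page's state on the origin server.
        cached = self.store.get(url)
        header = cached[0] if cached else None  # sent as If-Modified-Since
        status, body = origin_response(server_last_modified, header)
        if status == 304:
            return "cache", cached[1]  # reuse our copy; no body was transferred
        self.store[url] = (format_datetime(server_last_modified, usegmt=True), body)
        return "network", body


cache = BrowserCache()
t1 = datetime(2011, 7, 1, 9, 30, tzinfo=timezone.utc)  # hypothetical modification times
t2 = datetime(2011, 7, 2, 8, 0, tzinfo=timezone.utc)
print(cache.get("/index.html", t1)[0])  # network
print(cache.get("/index.html", t1)[0])  # cache (server answered 304)
print(cache.get("/index.html", t2)[0])  # network (page changed)
```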
Security on the Web
If using client authentication
NOTE: the protocol uses both public-key and private-key (symmetric) encryption, for performance reasons
NOTE: the client has to use the same server for the entire session
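A sketch of the TLS settings behind client authentication, using Python's `ssl` module. The handshake itself (public-key cryptography to authenticate and agree on keys, then symmetric keys for the bulk data) is carried out by the library; this code only configures which side must present a certificate. The certificate, key, and CA file paths are hypothetical parameters and are skipped when not given.

```python
import ssl


def server_context(certfile=None, keyfile=None, client_ca_file=None):
    """TLS settings for a server that requires clients to present certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)     # the server's own certificate and key
    if client_ca_file:
        ctx.load_verify_locations(client_ca_file)  # CAs trusted to sign client certificates
    return ctx


def client_context(certfile=None, keyfile=None):
    """TLS settings for a client that verifies the server and can prove its own identity."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # checks server cert and hostname
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the client's certificate for client authentication
    return ctx
```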