Distributed Systems - Comp 655
1/2/2019

The Web
- Origin and overview of the web
- Drill-down on distributed system aspects
  - Communication
  - Processes
  - Naming
  - Synchronization
  - Replication (especially caching)
  - Fault tolerance
  - Security

Origin of the web
- CERN (European particle physics lab)
- Purpose: facilitate document sharing
  - Large user community
  - Geographically dispersed
- Founder: Tim Berners-Lee
- Use exploded in the late 1990s
  - Graphical user interfaces (Mosaic and descendants)
  - Huge amounts of content
  - Search engines
  - Interactive pages

Definition of the Web
- Many standards
  - HTML
  - HTTP
  - DNS
  - URL, URI, URN
  - XML
  - DOM
- W3C
- IETF

A word about RFCs
- Standards track
  - Proposed standard
  - Draft standard (at least two independent and interoperable implementations)
  - Internet standard (also has an STD number; for example, IP is STD-005 and RFC-0791)
- “Off-track”
  - Experimental
  - Informational
  - Historic(al)
- See RFC 2026 for details

Yet more words about RFCs
- Before using an RFC, check the Obsolete RFC list or find it on the Active RFC list
- I use the RFC index at faqs.org because I find it a bit easier to use than the IETF’s list
- Remember: if there’s a conflict, the IETF is the authority

Overall structure

What’s in a web page?
[Figure callout: client-side script]

Some web pages are XML

XML document type definition

Other document types

CGI – early Web interaction

Problems with CGI
- Process per request
- Wide variety in server-side runtime environments
Solutions
- Server-side scripting (JSP, ASP, PHP)
- Servlets

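The "process per request" cost can be made concrete with a small sketch. It assumes a hypothetical front end, `handle_request`, that follows the CGI convention of passing the query in the `QUERY_STRING` environment variable and reading the response from the child's standard output. The script text and all names are illustrative and are not taken from any particular web server.

```python
import os
import subprocess
import sys

# A stand-in CGI program (hypothetical): reads QUERY_STRING, writes headers,
# a blank line, and then the body to standard output.
CGI_SCRIPT = r'''
import os
from urllib.parse import parse_qs
query = parse_qs(os.environ.get("QUERY_STRING", ""))
name = query.get("name", ["world"])[0]
print("Content-Type: text/plain")
print()
print(f"Hello, {name} from process {os.getpid()}")
'''


def handle_request(query_string):
    """Serve one request the CGI way: start a brand-new process for it."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env, capture_output=True, text=True, check=True,
    )
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body.strip()


if __name__ == "__main__":
    # Each call pays for process creation and interpreter start-up.
    for query in ("name=Ada", "name=Grace"):
        print(handle_request(query))
```

Every request reports a different process ID, which is the overhead that server-side scripting and servlets avoid by handling requests inside a long-running server process.
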
Problems with browsers
- Browser-based user interfaces tend to be clunky and limited
- Solutions:
  - Client-side scripting
  - Applets
  - More recently, AJAX
    - An example: http://www.javarss.com/ajax/j2ee-ajax.html
    - See http://en.wikipedia.org/wiki/AJAX for more information

Server-side scripts and servlets

Nothing’s perfect
- What Web technology has big problems with server-side page generation?

Communication on the web: HTTP
- TCP-based client/server protocol
  - Create connection
  - Send request
  - Send response
  - Close connection
- HTTP/1.1 reduces connection overhead with persistent connections

HTTP connections
[Figure panels: non-persistent, persistent]

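The difference between non-persistent and persistent connections can be observed on a single machine. The sketch below is only an illustration under stated assumptions: it starts a throwaway server on the loopback interface using Python's standard `http.server`, and the names `CountingHandler` and `count_connections` are invented for this example. It counts how many TCP connections the server accepts while a client fetches several paths.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(paths, persistent):
    """Fetch every path from a local server; return the number of TCP connections used."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        if persistent:
            conn = http.client.HTTPConnection("127.0.0.1", port)
            for path in paths:
                conn.request("GET", path)
                conn.getresponse().read()  # drain the body before reusing the connection
            conn.close()
        else:
            for path in paths:
                conn = http.client.HTTPConnection("127.0.0.1", port)
                conn.request("GET", path, headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    paths = ["/page.html", "/a.gif", "/b.gif"]
    print("non-persistent:", count_connections(paths, persistent=False))
    print("persistent:    ", count_connections(paths, persistent=True))
```

With three paths, the non-persistent run pays for three connection set-ups and the persistent run pays for one, which is the saving HTTP/1.1 is after.
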
HTTP request types

HTTP request example
Request line (type, path, protocol):
    GET /xyzzy HTTP/1.1
Headers:
    Connection: Keep-Alive
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
    Accept-Language: en-us
    Host: laptop:1215
    If-Modified-Since: Sun, 27 Jun 2004 00:58:28 GMT
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

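To see how the parts of the request fit together, here is a minimal parsing sketch. The function name `parse_request` is made up for this example, and the parser is deliberately simplified: it assumes well-formed input and keeps only the last value of a repeated header.

```python
def parse_request(raw):
    """Split an HTTP request head into (method, path, protocol, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]      # everything before the blank line
    lines = head.split("\r\n")
    method, path, protocol = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, protocol, headers


raw = (
    "GET /xyzzy HTTP/1.1\r\n"
    "Connection: Keep-Alive\r\n"
    "Accept-Language: en-us\r\n"
    "Host: laptop:1215\r\n"
    "If-Modified-Since: Sun, 27 Jun 2004 00:58:28 GMT\r\n"
    "\r\n"
)
method, path, protocol, headers = parse_request(raw)
print(method, path, protocol)   # GET /xyzzy HTTP/1.1
print(headers["host"])          # laptop:1215
```

Splitting each header at the first colon matters here, because values such as `laptop:1215` and the `If-Modified-Since` timestamp contain colons of their own.
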
HTTP header types

Processes
- Browsers
- Proxies
- Apache web server framework

Browser with plug-in

Web proxy
- Most browsers today support FTP
- However, proxies are still used for shared caching

Apache modules
- www.apache.org

Server cluster – simple minded

Server cluster - clever

Web naming
- URI
- URL
- URN

URI examples from RFC 2396
- ftp://ftp.is.co.za/rfc/rfc1808.txt
  ftp scheme for File Transfer Protocol services
- gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
  gopher scheme for Gopher and Gopher+ Protocol services
- http://www.math.uio.no/faq/compression-faq/part1.html
  http scheme for Hypertext Transfer Protocol services
- mailto:mduerst@ifi.unizh.ch
  mailto scheme for electronic mail addresses
- news:comp.infosystems.www.servers.unix
  news scheme for USENET news groups and articles
- telnet://melvyl.ucop.edu/
  telnet scheme for interactive services via the TELNET Protocol
- More examples on page 670

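The common structure behind these examples (a scheme, an optional authority, and a path) can be explored with Python's standard `urllib.parse` module. The helper name `describe` is invented for this sketch, and it reports only three of the URI components.

```python
from urllib.parse import unquote, urlsplit


def describe(uri):
    """Return the scheme, authority, and percent-decoded path of a URI."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,      # empty when the URI has no "//" part
        "path": unquote(parts.path),    # "%20" becomes a space
    }


examples = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]
for uri in examples:
    print(describe(uri))
```

The `mailto` and `news` examples have no authority component, so everything after the scheme ends up in the path.
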
Naming – URL – how to access

Naming – URN – true resource identifier
- RFC 2648 defines a URN namespace for IETF documents
- RFC 2141 defines URN syntax
- RFC 3406 is a BCP (Best Current Practice) for defining URN namespaces

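As a rough illustration of the RFC 2141 syntax, a URN has the form `urn:<NID>:<NSS>`, where the NID names the namespace and the NSS is the namespace-specific string. The sketch below is simplified: `parse_urn` is a made-up helper, and it checks the NID but does not validate the characters allowed in the NSS.

```python
import re

# Simplified from RFC 2141: "urn:" <NID> ":" <NSS>, where the NID is 1 to 32
# letters, digits, or hyphens and must not start with a hyphen.
URN_PATTERN = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE)


def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    return nid.lower(), nss   # the NID is case-insensitive; the NSS is left as written


print(parse_urn("urn:ietf:rfc:2141"))   # ('ietf', 'rfc:2141')
```

In the `ietf` namespace from RFC 2648, the NSS `rfc:2141` identifies the document itself, independent of where a copy happens to be stored.
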
Activity – hitting a web page
- Check your understanding: draw a UML sequence diagram showing the interaction of key software elements when a browser hits a web page containing graphics
- Assume the web page and the images are on different servers
- “Classes” in the diagram should include
  - Browser
  - DNS resolver
  - DNS server
  - Server for the page
  - Server for the images

Not much to synchronize …
- Generally, web clients don’t exchange information with other clients, and servers don’t exchange information with other servers
- Most documents have a single author, so there are few write/write conflicts
- However, WebDAV provides a simple locking and versioning scheme
  - Locks are connection-independent
  - Handling abandoned locks is left to the implementation

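A connection-independent lock can be sketched as a table keyed by resource, where the holder proves ownership with a token rather than by keeping a connection open. Everything below is an assumption made for illustration: the `LockTable` class is hypothetical, and expiring locks after a timeout is just one possible policy for abandoned locks, since that choice is left to the implementation.

```python
import uuid


class LockTable:
    """Exclusive write locks identified by tokens, not by connections."""

    def __init__(self, clock):
        self.clock = clock   # callable returning the current time in seconds
        self.locks = {}      # resource -> (token, expiry time)

    def _current(self, resource):
        entry = self.locks.get(resource)
        if entry is not None and entry[1] <= self.clock():
            del self.locks[resource]   # abandoned lock has timed out
            entry = None
        return entry

    def lock(self, resource, timeout):
        """Return a new lock token, or None if the resource is already locked."""
        if self._current(resource) is not None:
            return None
        token = f"opaquelocktoken:{uuid.uuid4()}"
        self.locks[resource] = (token, self.clock() + timeout)
        return token

    def can_write(self, resource, token=None):
        """A write is allowed if the resource is unlocked or the token matches."""
        entry = self._current(resource)
        return entry is None or entry[0] == token

    def unlock(self, resource, token):
        entry = self._current(resource)
        if entry is not None and entry[0] == token:
            del self.locks[resource]
            return True
        return False
```

Because the token travels with each request, a client can take the lock on one connection, disconnect, and later write over a different connection. Passing `time.monotonic` as the clock gives real timeouts, and a fake clock makes the expiry behavior easy to test.
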
Replication – client and proxy
- Many organizations run proxy servers
- Some proxies can cooperate
- Virtually all browsers can cache

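One way a browser or proxy cache stays consistent with the origin is the conditional GET: it sends `If-Modified-Since` with the timestamp of its cached copy, and the server answers 304 (Not Modified) without a body if nothing has changed. The sketch below models this in memory. The `Origin` and `CachingProxy` classes are hypothetical, and revalidating on every request is a simplification, since real caches also use freshness lifetimes.

```python
from email.utils import parsedate_to_datetime


class Origin:
    """Hypothetical origin server holding a single document."""

    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified   # HTTP date string
        self.full_responses = 0              # how many times the body was sent

    def get(self, if_modified_since=None):
        if if_modified_since is not None:
            changed = parsedate_to_datetime(self.last_modified)
            cached = parsedate_to_datetime(if_modified_since)
            if changed <= cached:
                return 304, None, self.last_modified   # no body needed
        self.full_responses += 1
        return 200, self.body, self.last_modified


class CachingProxy:
    """Keeps one cached copy and revalidates it on every request."""

    def __init__(self, origin):
        self.origin = origin
        self.entry = None   # (body, last_modified) of the cached copy

    def get(self):
        if self.entry is None:
            status, body, modified = self.origin.get()
        else:
            status, body, modified = self.origin.get(if_modified_since=self.entry[1])
        if status == 200:
            self.entry = (body, modified)   # store or refresh the copy
        return self.entry[0]


origin = Origin(b"version 1", "Sun, 27 Jun 2004 00:58:28 GMT")
proxy = CachingProxy(origin)
proxy.get()
proxy.get()
print(origin.full_responses)   # 1: the second request was answered with 304
```

The saving is in bytes transferred, not round trips: each request still contacts the origin, but an unchanged document is never resent.
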
Replication – server side
- Server clusters
- Mirror sites
- Content delivery networks (CDNs)
  - For example, Akamai

CDN operation
- In Akamai’s CDN, embedded document URLs get resolved to the “closest” CDN server

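What "closest" means depends on the CDN's own metrics, which are not described here. Purely as an illustration, the sketch below assumes closeness is the lowest measured round-trip time. The table `RTT_MS`, the host names, and the function `resolve` are all invented for this example and do not reflect Akamai's actual algorithm.

```python
# Hypothetical measurements: round-trip time in milliseconds from each
# client region to each replica server.
RTT_MS = {
    "europe": {"cdn-eu.example.net": 12, "cdn-us.example.net": 95},
    "north-america": {"cdn-eu.example.net": 98, "cdn-us.example.net": 15},
}


def resolve(client_region):
    """Return the replica with the lowest round-trip time for this region."""
    candidates = RTT_MS[client_region]
    return min(candidates, key=candidates.get)


print(resolve("europe"))          # cdn-eu.example.net
print(resolve("north-america"))   # cdn-us.example.net
```

The same embedded URL therefore leads different clients to different servers, which is what distinguishes this form of replication from a fixed list of mirror sites.
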
Security on the Web
- NOTE: uses both public-key and secret-key (symmetric) encryption, for performance reasons
- NOTE: if using client authentication, the client has to use the same server for the entire session