1
Web basics
HTTP
  – http://www.ietf.org/rfc/rfc2616.txt
  – http://www2002.org/CDROM/refereed/444/
URI/L/Ns
  – http://www.ietf.org/rfc/rfc2396.txt
HTML
  – http://www.w3.org/TR/html401/
2
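HTTP operation
Basic (top) vs. with intermediaries
[Diagram, top: User Agent sends a Request to the Origin Server, which returns a Response]
[Diagram, bottom: User Agent and Origin Server joined by a request chain and a response chain that pass through intermediaries]
Intermediaries: proxies, gateways, tunnels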
3
HTTP Terminology
User Agent (UA): program acting on behalf of user.
Resource: data object or service identified by a URI.
Origin server (OS): server originating a resource.
Connection: transport session initiated by UA (but not always direct to OS). Typically TCP or SSL.
4
HTTP Terminology
Message: formatted sequence of bytes:
  – Request: from client to server
  – Response: from server to client
Message = start line + headers + body
5
Request and response messages
Request:
  GET /index.html HTTP/1.1
  Host: www.hello.ucsc.edu
  User-Agent: Mozilla
Response:
  HTTP/1.1 200 OK
  Content-Length: 45
  Content-Language: en-us
  Content-Type: text/html

  Hello world
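The "start line + headers + body" layout can be seen by splitting the two messages from this slide by hand. This is a minimal sketch, not a full HTTP parser: the helper name `parse_message` is hypothetical, the Content-Length is set to 11 here (an assumption, so that it matches the short "Hello world" body), and repeated or folded headers are not handled.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.hello.ucsc.edu\r\n"
    b"User-Agent: Mozilla\r\n"
    b"\r\n"
)
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 11\r\n"  # assumption: adjusted to match this body
    b"Content-Language: en-us\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world"
)

req_line, req_headers, req_body = parse_message(request)
method, target, version = req_line.split(" ")
status_line, resp_headers, resp_body = parse_message(response)

print(method, target, version)
print(status_line, resp_headers["content-type"], resp_body)
```

A GET request normally has an empty body, which is why `req_body` comes back as an empty byte string.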
6
Requests
GET, HEAD, POST
PUT, DELETE
OPTIONS, TRACE, CONNECT
7
Common request headers
Host (required), User-Agent
Referer
Authorization
If-Modified-Since, Cache-Control
Accept[-Language/-Charset/-Encoding]
8
Common response codes
200 OK
301 Moved Permanently, 307 Temporary Redirect
400 Bad Request
401 Unauthorized, 403 Forbidden
404 Not Found
500 Internal Server Error
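The first digit of a status code gives its class, which is often all a crawler needs in order to decide what to do next. A small sketch using Python's standard `http.HTTPStatus` for the official reason phrases; the `status_class` helper and its labels are illustrative names, not part of any spec.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    return CLASSES.get(code // 100, "unknown")


for code in (200, 301, 307, 400, 401, 403, 404, 500):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```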
9
Common response headers
Content-Type, Content-Length, Content-Language
Date, Last-Modified, Expires
Location [for 3xx responses]
Server
10
Response generation
Theory (top) vs. practice
Theory: Resource -> Variant -> Instance -> Entity -> Message
  – Selection (negotiation, UA optimization)
  – Content encoding (gzip)
  – Instance manipulations (range, delta)
  – Transfer encoding (chunking, encryption)
Practice: Resource -> Variant/Instance -> Message
  – Selection (UA optimization)
Understanding the full model is necessary for a good understanding of caching, but we are going to ignore caching.
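To make the layering concrete, the sketch below applies a content encoding (gzip) and then a transfer encoding (chunking) to a body, and undoes them in the reverse order on the receiving side. The function names and the 16-byte chunk size are assumptions made for illustration; chunk extensions are skipped and trailers are not handled.

```python
import gzip


def encode_chunked(data: bytes, size: int = 16) -> bytes:
    """Wrap data in HTTP/1.1 chunked transfer encoding."""
    out = b""
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        # Each chunk is: hex length, CRLF, chunk bytes, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # A zero-length chunk terminates the body.
    return out + b"0\r\n\r\n"


def decode_chunked(raw: bytes) -> bytes:
    """Undo chunked transfer encoding (no trailer support)."""
    body, pos = b"", 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # Ignore any chunk extension after ';'.
        size = int(raw[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk


original = b"Hello world"
on_the_wire = encode_chunked(gzip.compress(original))   # sender side
received = gzip.decompress(decode_chunked(on_the_wire))  # receiver side
print(received)
```

The receiver removes the transfer encoding first because it belongs to the message, and only then removes the content encoding.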
11
Cookies
Not part of the official HTTP spec, but see:
  – http://www.ietf.org/rfc/rfc2109.txt
  – http://www.ietf.org/rfc/rfc2965.txt
Adding state to a “stateless” protocol
OS adds a Set-Cookie header to the response:
  – Set-Cookie: sid=113a8fbc;version=1;path=/
UA adds a Cookie header to future requests:
  – Cookie: $Version=1; sid=113a8fbc; $Path=/
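The round trip can be sketched as follows: the UA parses the Set-Cookie value it received and builds the Cookie value it will send back. The output follows RFC 2109's ordering, with `$Version` first and `$Path` after the cookie itself. Both function names are hypothetical, and the naive split on ";" would break on quoted values that contain semicolons.

```python
def parse_set_cookie(header_value: str):
    """Return (name, value, attributes) from a Set-Cookie header value."""
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    # The first pair is the cookie itself; the rest are attributes.
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()
    return name.strip(), value.strip(), attrs


def cookie_header(name: str, value: str, attrs: dict) -> str:
    """Build an RFC 2109 style Cookie header value for a later request."""
    pieces = []
    if "version" in attrs:
        pieces.append("$Version=" + attrs["version"])
    pieces.append(name + "=" + value)
    if "path" in attrs:
        pieces.append("$Path=" + attrs["path"])
    return "; ".join(pieces)


name, value, attrs = parse_set_cookie("sid=113a8fbc;version=1;path=/")
print("Cookie:", cookie_header(name, value, attrs))
```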
12
URI/L/N
Uniform Resource…
  – Name: a persistent identifier (under development)
  – Locator: (perhaps transient) locator information. Typically: address plus access method
  – Identifier: either a URN or URL
RFC 2396 provides syntactic rules that all URIs must obey
13
HTTP URLs
http://host:port/path?query
  – “Fragments” are not strictly part of URLs
  – Relative URIs
Canonicalization
  – Aggressively avoid false distinctions
  – But always keep a working URL
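One possible reading of these canonicalization rules, sketched with Python's standard `urllib.parse`: resolve relative URIs against a base, lowercase the scheme and host, drop the default port and the fragment, and use "/" for an empty path. The function name `canonicalize` and the exact set of rules are assumptions; a real archiver would likely add more. Only transformations that keep the URL working are applied; userinfo is dropped, which is a simplification.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


def canonicalize(url: str, base: Optional[str] = None) -> str:
    """Return a canonical form of an HTTP URL (illustrative rules only)."""
    if base is not None:
        url = urljoin(base, url)           # resolve relative URIs
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # host names are case-insensitive
    port = parts.port
    if port is None or (scheme == "http" and port == 80):
        netloc = host                      # drop the default port
    else:
        netloc = "%s:%d" % (host, port)
    path = parts.path or "/"               # empty path means "/"
    # The fragment is dropped: it is not sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(canonicalize("HTTP://WWW.Foo.com:80/a/b.html?x=1#top"))
print(canonicalize("../c.html", base="http://www.foo.com/a/b.html"))
print(canonicalize("http://www.foo.com"))
```

The path and query are left alone here, since changing their case or order could yield a URL that no longer works on some servers.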
14
HTML
Do a bit of review on the way frames and JavaScript work
15
Problems for Archiving
Links obscured by increasing use of Flash, JavaScript, DHTML, PDF, Word, …
Soft-404’s, 30x’s (Big pain!!)
  – Great example of non-cooperation
Browser-specific content
Servers lie about content
  – E.g., incorrect or missing Content-Type
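When the Content-Type is wrong or missing, one common workaround is to look at the first bytes of the body. The sketch below checks a few well-known file signatures and falls back to the declared type; the function name `sniff_type` and the short signature list are illustrative assumptions, not a complete sniffing algorithm.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers").
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_type(body: bytes, declared: Optional[str] = None) -> str:
    """Guess a media type from the body, falling back to the declared one."""
    for prefix, mime in MAGIC:
        if body.startswith(prefix):
            return mime
    # Look for an HTML opening near the start, ignoring case and whitespace.
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return declared or "application/octet-stream"


print(sniff_type(b"%PDF-1.4 ...", declared="text/html"))
print(sniff_type(b"  <HTML><body>Hello world</body></HTML>"))
print(sniff_type(b"plain bytes", declared="text/plain"))
```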
16
Problems for Archiving
Aliasing
  – Material is copied
  – Host has multiple names (www.foo.com and foo.com typically the same)
  – Resource has multiple names (e.g., case-insensitivity)
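One simple way to spot aliases after fetching is to group URLs by a hash of the bytes they returned. This is a sketch under the assumption that aliased pages are byte-for-byte identical, which often fails in practice (timestamps, ads, session IDs). The function name `group_aliases` and the sample URLs and bodies are made up for illustration.

```python
import hashlib
from collections import defaultdict


def group_aliases(pages):
    """Group URLs whose bodies are byte-identical.

    pages: iterable of (url, body_bytes) pairs.
    Returns a list of URL lists, one per body seen under more than one URL.
    """
    groups = defaultdict(list)
    for url, body in pages:
        groups[hashlib.sha256(body).hexdigest()].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


pages = [
    ("http://www.foo.com/", b"<html>Hello world</html>"),
    ("http://foo.com/", b"<html>Hello world</html>"),
    ("http://foo.com/Index.html", b"<html>Hello world</html>"),
    ("http://foo.com/other.html", b"<html>Something else</html>"),
]
aliases = group_aliases(pages)
print(aliases)
```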
17
Problems for Archiving
And this ignores spamming!