Chapter 9. HTTP Protocol


Chapter 9

HTTP Protocol: HTTP is a text-based protocol (in contrast to binary protocols) that runs on top of TCP. It consists of basically a couple of request methods (GET and POST) and a much larger variety of replies (200 OK and more, over 40 in total). Replies are divided into 5 groups:
 Informational (1xx)
 Success (2xx)
 Redirection (3xx)
 Client Error (4xx)
 Server Error (5xx)
HTTP messages begin with a plain-text header whose individual lines are terminated by \r\n. The header section as a whole is terminated by an empty line (i.e. \r\n\r\n) and is followed by an optional body.
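The text framing described above can be seen by building a request by hand. A minimal sketch (the host and path are placeholders, not from the original slides):

```python
# Build a raw HTTP/1.1 GET request by hand to show the text framing:
# each header line ends in \r\n, and a blank line (\r\n\r\n) ends the header.
def build_get_request(host, path='/'):
    lines = [
        'GET %s HTTP/1.1' % path,
        'Host: %s' % host,
        'Connection: close',
    ]
    return '\r\n'.join(lines) + '\r\n\r\n'

request = build_get_request('www.example.com', '/index.html')
print(request)
```

Sending this string over a TCP socket connected to port 80 is all a minimal HTTP client does.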

HTTP GET: Getting the RFC defining HTTP. Exercise: find out the meaning of each of the header fields and values.

HTTP 200 OK: This is the response to the previous GET. Body to follow:

Reading the Header in Python: The Python code that generates the previous traffic.

[Spr14Homework]$ python
Python (default, Nov, 16:18:42) [GCC (Red Hat)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib, urllib2
>>> url = "..."
>>> rawreply = urllib2.urlopen(url).read()

URL vs URI: URIs identify a resource. URLs both identify a resource and indicate how to fetch it. Characters that are not alphanumerics or in the set "$-_.+!*'()," must be expressed as % followed by two hex digits. (URL anatomy: host:port, path elements, script parameters.)
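The percent-encoding rule can be demonstrated directly. The slides use Python 2's urllib module; this sketch uses the Python 3 equivalent, urllib.parse:

```python
from urllib.parse import quote, unquote

# Percent-encode a path component: '/' is not alphanumeric and is
# significant inside URLs, so with safe='' (disabling the default
# exemption for '/') it becomes %2F.
encoded = quote('Nord/LB', safe='')
print(encoded)           # Nord%2FLB

# unquote() reverses the transformation.
print(unquote(encoded))  # Nord/LB
```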

Parsing URLs: Parsing methods break a complex URL into its component parts. The general form is scheme://netloc/path;parameters?query#fragment.

>>> from urlparse import urlparse, urldefrag, parse_qs, parse_qsl
>>> p = urlparse('http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96')
>>> p
ParseResult(scheme='http', netloc='example.com:8080', path='/Nord%2FLB/logo', params='', query='shape=square&dpi=96', fragment='')

NOTE: The result is a (named) tuple, not a dictionary. Example: 'path' could point to a servlet and 'parameters' could tell us which version of the servlet to use (1.1, 1.2, etc.).
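In Python 3 the same functions live in urllib.parse rather than urlparse. A sketch using the same example URL:

```python
from urllib.parse import urlparse

# Break a URL into its six components; the result is a named tuple,
# so the fields are available both by index and by attribute.
p = urlparse('http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96')
print(p.scheme)  # http
print(p.netloc)  # example.com:8080
print(p.path)    # /Nord%2FLB/logo
print(p.query)   # shape=square&dpi=96
print(p.port)    # 8080 (parsed out of netloc for convenience)
```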

Parsing Continued: Suppose you want key-value pairs for everything. parse_qs returns a dictionary mapping each key to a list of values; parse_qsl preserves order by returning a list (the trailing 'l') of key-value tuples instead. Anchors (fragments) are also part of a URL but are not sent to the server; they are used by the client to locate a tag in the reply document. You can split them off a URL using urldefrag():

>>> p.query
'shape=square&dpi=96'
>>> parse_qs(p.query)
{'shape': ['square'], 'dpi': ['96']}
>>> u = '...'
>>> urldefrag(u)
('...', 'urlparse.urldefrag')
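The difference between the two query parsers, and the fragment split, can be sketched in Python 3's urllib.parse (the URL below is a hypothetical example, not the one from the slides):

```python
from urllib.parse import parse_qs, parse_qsl, urldefrag

query = 'shape=square&dpi=96'
print(parse_qs(query))   # {'shape': ['square'], 'dpi': ['96']}
print(parse_qsl(query))  # [('shape', 'square'), ('dpi', '96')]

# The fragment stays on the client; urldefrag() splits it off.
url, fragment = urldefrag('http://example.com/docs.html#section-2')
print(url)       # http://example.com/docs.html
print(fragment)  # section-2
```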

Rebuilding a URL from a ParseResult: It is also possible to go backwards. Since the query is the only complicated component, we can build a query from scratch and then create a new ParseResult object around it:

>>> p.geturl()
'http://example.com:8080/Nord%2FLB/logo?shape=square&dpi=96'
>>> import urllib
>>> import urlparse
>>> query = urllib.urlencode({'company': 'Nord/LB', 'report': 'sales'})
>>> query
'report=sales&company=Nord%2FLB'
>>> p = urlparse.ParseResult('https', 'example.com:8080', 'data/path', None, query, None)
>>> p.geturl()
'https://example.com:8080/data/path?report=sales&company=Nord%2FLB'
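The same round trip in Python 3 uses urllib.parse.urlencode() plus urlunparse() (a sketch; note that Python 3 dicts preserve insertion order, so the query keys come out in the order given):

```python
from urllib.parse import urlencode, urlunparse

# urlencode() percent-encodes a dict into a query string.
query = urlencode({'company': 'Nord/LB', 'report': 'sales'})
print(query)  # company=Nord%2FLB&report=sales

# urlunparse() assembles the six components
# (scheme, netloc, path, params, query, fragment) back into a URL.
url = urlunparse(('https', 'example.com:8080', '/data/path', '', query, ''))
print(url)  # https://example.com:8080/data/path?company=Nord%2FLB&report=sales
```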

Relative URLs: If a URL is “relative” then it is relative to some “base”. The base is found from the URL used to fetch the page containing the relative link. Relative URLs are particularly useful if an entire sub-tree of a website is moved to a new parent directory. All the relative links continue to function without any editing.

urlparse.urljoin(): urljoin() combines a relative URL with a base to give a single absolute URL. Its rules:
 A relative URL joined to a base path that ends in / is appended to that path.
 A relative URL joined to a base path that does not end in / replaces the last component of the path.
 A relative URL can refer to the current location as '.'.
 A relative URL can refer to the parent of the current location as '..'.

urlparse.urljoin() (cont):
 '..' will not take you above the top level of the current website.
 A relative URL that starts with / starts at the top level of the current website.
 If the "relative" URL is in fact absolute, urljoin() simply replaces one absolute URL with the other.
This is useful, and it implies we can always apply urljoin() to the current path plus the next link.
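The rules above can be exercised with Python 3's urllib.parse.urljoin() (the base and relative URLs here are hypothetical examples):

```python
from urllib.parse import urljoin

base_dir  = 'http://www.example.com/about/'  # path ends in /
base_file = 'http://www.example.com/about'   # path does not end in /

print(urljoin(base_dir, 'grants'))     # appended: http://www.example.com/about/grants
print(urljoin(base_file, 'grants'))    # replaces last component: http://www.example.com/grants
print(urljoin(base_dir, './mission'))  # '.' is current location: http://www.example.com/about/mission
print(urljoin(base_dir, '../news'))    # '..' is parent: http://www.example.com/news
print(urljoin(base_dir, '/dev/'))      # leading / starts at top level: http://www.example.com/dev/
print(urljoin(base_dir, 'http://other.example.org/x'))  # absolute replaces absolute
```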

Easy Access to HTTP headers: Every time we send off a GET or POST and get a reply, we'd like to be able to see the headers. urllib2 lets you build an opener whose open() method is customized with handlers of your own devising. You accomplish this by subclassing a couple of urllib2 and httplib classes and then overriding methods in those classes so that your methods are invoked instead of the default ones. So that the correct functionality still occurs, it is important that your methods also invoke the superclass methods.

Opening a link with your own opener:

>>> from verbose_http import VerboseHTTPHandler
>>> import urllib, urllib2
>>> opener = urllib2.build_opener(VerboseHTTPHandler)
>>> opener.open('http://serendipomatic.org/')
--------------------------------------------------
GET / HTTP/1.1
Accept-Encoding: identity
Host: serendipomatic.org
Connection: close
User-Agent: Python-urllib/
-------------------- Response --------------------
HTTP/1.1 200 OK
Date: Mon, 24 Mar :08:00 GMT
Server: Apache/ (Amazon)
Vary: Cookie
Set-Cookie: csrftoken=HybJc4Gx5GgYsucjhF3f34aF7rMtzmcU; \
    expires=Mon, 23-Mar :08:00 GMT; Max-Age= ; Path=/
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
<addinfourl at ... whose fp = ...>

verbose_handler.py: Replace the standard opener with one of your own, specialized to use your versions of the following object methods:

#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 9 - verbose_http.py
# HTTP request handler for urllib2 that prints requests and responses.
import StringIO, httplib, urllib2

# your VerboseHTTPHandler extends urllib2.HTTPHandler
class VerboseHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):  # overrides HTTPHandler.http_open
        return self.do_open(VerboseHTTPConnection, req)

verbose_handler.py (cont):

class VerboseHTTPResponse(httplib.HTTPResponse):
    def _read_status(self):
        s = self.fp.read()
        print '-' * 20, 'Response', '-' * 20
        print s.split('\r\n\r\n')[0]
        self.fp = StringIO.StringIO(s)
        return httplib.HTTPResponse._read_status(self)

class VerboseHTTPConnection(httplib.HTTPConnection):
    response_class = VerboseHTTPResponse
    def send(self, s):
        print '-' * 50
        print s.strip()
        httplib.HTTPConnection.send(self, s)

GET: In the beginning there was only GET. info is an object with many fields.

>>> from verbose_http import VerboseHTTPHandler
>>> import urllib, urllib2
>>> opener = urllib2.build_opener(VerboseHTTPHandler)
>>> info = opener.open('...')
--------------------------------------------------
GET /rfc/rfc2616.txt HTTP/1.1
Accept-Encoding: identity
Host: ...
Connection: close
User-Agent: Python-urllib/2.7
>>> info.code, info.msg
(200, 'OK')
>>> sorted(info.headers.keys())
['accept-ranges', 'connection', 'content-length', 'content-type', 'date', 'etag', 'last-modified', 'server', 'vary']
>>> info.headers['content-type']
'text/plain'
>>>

Why a Host header? Originally each IP address supported only one web server, at port 80. Over time we needed to put multiple sites at the same (IP, port) pair, so the httpd further demultiplexes the GET request using the Host header field to determine which website the request should be delivered to. This is called virtual hosting.
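The demultiplexing step can be sketched as a lookup table on the server side (the host names and document roots below are hypothetical, not from any real server):

```python
# Map each Host header value to a document root: this is the essence of
# name-based virtual hosting -- one (IP, port) pair, many websites.
VIRTUAL_HOSTS = {
    'www.alpha.example': '/srv/www/alpha',
    'www.beta.example':  '/srv/www/beta',
}

def document_root(host_header, default='/srv/www/default'):
    # Clients may include a port ("host:8080") and hostnames are
    # case-insensitive, so normalize before looking up.
    hostname = host_header.split(':')[0].lower()
    return VIRTUAL_HOSTS.get(hostname, default)

print(document_root('www.alpha.example'))      # /srv/www/alpha
print(document_root('WWW.BETA.EXAMPLE:8080'))  # /srv/www/beta
print(document_root('unknown.example'))        # /srv/www/default
```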

GET (cont): info can also be read like a file, starting right after the end of the header.

>>> info.read(200)
'\n\n\n\n\n\n Network Working Group R. Fielding\n Request for Comments: 2616 UC Irvine\n Obsoletes: 2068 '
>>>

Return Codes: 200 range – things are OK. 300 range – some kind of redirection. 400 range – client problem (invalid URL, for example). 500 range – server problem.
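Since the grouping is just the first digit of the code, a classifier is a single integer division (a sketch; the class names follow the 1xx–5xx groups listed earlier):

```python
# Map an HTTP status code to its class via its hundreds digit.
CLASSES = {
    1: 'Informational',
    2: 'Success',
    3: 'Redirection',
    4: 'Client Error',
    5: 'Server Error',
}

def status_class(code):
    return CLASSES.get(code // 100, 'Unknown')

print(status_class(200))  # Success
print(status_class(301))  # Redirection
print(status_class(404))  # Client Error
print(status_class(503))  # Server Error
```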

300s: 301 – Moved Permanently: the 'Location' header field has the new, permanent address. 303 – See Other: in response to an original POST or PUT, go to the 'Location' URL and issue a GET; the original request is considered to have succeeded. 304 – Not Modified: the request came with a page timestamp and the page hasn't changed since that time, so the requester already has the most recent copy. 307 – Temporary Redirect: like 303, but re-issue the original POST to the 'Location' URL instead of a GET.

301:
>>> info = opener.open('...')
--------------------------------------------------
GET / HTTP/1.1
...
-------------------- Response --------------------
HTTP/1.1 301 Moved Permanently
Location: ...
--------------------------------------------------
GET / HTTP/1.1
Accept-Encoding: identity
Host: ...
-------------------- Response --------------------
HTTP/1.1 200 OK
Date: Mon, 07 Apr :18:06 GMT
...

301 (cont):
>>> info = opener.open('...')
--------------------------------------------------
GET / HTTP/1.1
Accept-Encoding: identity
Host: ...
Connection: close
User-Agent: Python-urllib/
-------------------- Response --------------------
HTTP/1.1 301 Moved Permanently
content-length: 0
date: Mon, 07 Apr :27:27 UTC
location: ...
server: tfe
set-cookie: guest_id=v1%3A...; Domain=.twitter.com; Path=/; Expires=Wed, 06-Apr :27:27 UTC
Connection: close
>>>

NOTE: This time urllib2 doesn't follow the redirect, perhaps because it is going to a secure site.
NOTE: Obviously Google and Twitter have different opinions of 'WWW'.

Handle Your Own Redirect:
>>> class NoRedirectHandler(urllib2.HTTPRedirectHandler):
...     def http_error_302(self, req, fp, code, msg, headers):
...         return
...     http_error_301 = http_error_303 = http_error_307 = http_error_302
...
>>> no_redirect_opener = urllib2.build_opener(NoRedirectHandler)
>>> no_redirect_opener.open('...')
urllib2.HTTPError: HTTP Error 301: Moved Permanently
>>>
