Lecture 13: Dynamic Web Servers & Common Gateway Interface. CPE 401 / 601 Computer Network Systems. Slides are modified from Dave Hollinger.

2 Web Server
- Talks HTTP.
- Looks at the METHOD and URI to determine what the client wants.
- For GET, the URI is often just the path of a file, relative to some directory on the web server.
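The mapping from a GET URI to a file can be sketched in a few lines of C. This is only an illustration: the function name `map_uri_to_path` and the document root `/usr/www` used in the usage note are assumptions, not something the slides specify, and the ".." check is a deliberately crude guard.

```c
#include <stdio.h>
#include <string.h>

/* Build "<docroot><uri>" in out (hypothetical helper).
   Returns 0 on success, -1 if the URI is rejected or does not fit. */
int map_uri_to_path(const char *docroot, const char *uri,
                    char *out, size_t cap)
{
    int n;

    if (uri[0] != '/')               /* only absolute paths are handled */
        return -1;
    if (strstr(uri, "..") != NULL)   /* crude guard against leaving docroot */
        return -1;

    n = snprintf(out, cap, "%s%s", docroot, uri);
    if (n < 0 || (size_t)n >= cap)   /* error or truncated result */
        return -1;
    return 0;
}
```

With a document root of `/usr/www`, the request `GET /foo/blah` would map to `/usr/www/foo/blah`.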

3 [Figure: the request GET /foo/blah is resolved against the server’s directory tree. Node labels: /, usr, bin, www, etc, foo, fun, gif, blah.]

4 In the good old days...
Years ago:
- The WWW was made up of (mostly) static documents.
- Each URL corresponded to a single file stored on some hard disk.
Today:
- Many of the documents on the WWW are built at request time.
- A URL doesn’t necessarily correspond to a single file.

5 Dynamic Documents
Dynamic documents can provide:
- automation of web site maintenance
- customized advertising
- database access
- shopping carts
- date and time service
- ...

6 Web Programming
- Writing programs that create dynamic documents has become very important.
- There are a number of general approaches:
  - Create a custom server for each desired service. Each is available on a different port.
  - Have the web server run external programs.
  - Develop a really smart web server: SSI, scripting, server APIs.

7 Custom Server
- Write a TCP server that watches a “well known” port for requests.
- Develop a mapping from HTTP requests to service requests.
- Send back HTML (or whatever) that is created/selected by the server process.
- Have to handle HTTP errors, headers, etc.

8 An Example Custom Server
- We want to provide a time and date service.
- Anyone in the world can find out the date and time according to our computer!
- We don’t care what is in the HTTP request; our reply doesn’t depend on it.
- We assume the request comes from a browser that wants the content formatted as an HTML document.

9 WWW-based time and date server
- Listen on a well known TCP port.
- Loop forever:
  - Accept a connection.
  - Find out the current time and date.
  - Convert the time and date to a string.
  - Send back some HTTP headers (Content-Type).
  - Send the string wrapped in HTML formatting.
  - Close the connection.
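The steps above can be sketched in C roughly as follows. The names `build_time_response` and `serve_time_forever`, the timestamp format, and the HTML wording are all illustrative assumptions; the sketch also includes a status line, which the slide does not list, and skips most error reporting.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Format a complete HTTP response carrying `now` as an HTML page.
   Returns the response length, or -1 if buf is too small. */
int build_time_response(char *buf, size_t cap, time_t now)
{
    char stamp[64];
    struct tm tm_buf;
    int n;

    if (localtime_r(&now, &tm_buf) == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        return -1;

    n = snprintf(buf, cap,
                 "HTTP/1.0 200 OK\r\n"
                 "Content-Type: text/html\r\n"
                 "\r\n"
                 "<html><body><p>The time here is %s</p></body></html>\r\n",
                 stamp);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Listen on `port` and answer every connection with the current time.
   The request is never parsed, since the reply does not depend on it. */
int serve_time_forever(unsigned short port)
{
    struct sockaddr_in addr;
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 16) < 0) {
        close(listener);
        return -1;
    }

    for (;;) {                               /* loop forever */
        char reply[512];
        int len;
        int conn = accept(listener, NULL, NULL);

        if (conn < 0)
            continue;
        len = build_time_response(reply, sizeof reply, time(NULL));
        if (len > 0 && write(conn, reply, (size_t)len) < 0) {
            /* client went away; nothing useful to do in this sketch */
        }
        close(conn);
    }
}
```

A real server would probably read the request before closing the connection, since closing a socket with unread data can reset it before the browser has seen the reply.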

10 Another Example: Counter
- Keep track of how many times our server is hit each day.
- Report on the number of hits our server got on any day in the past!
- The reply now does depend on the request.
- We have to remember that the request comes from an HTTP client, so we need to accept HTTP requests.

11 Time & Date Hit Server
- Each request comes as a string (URI) specifying a resource.
- Our requests will look like this: /mm/dd/yyyy
- A URL for our service would produce a request like: GET /02/10/2000 HTTP/1.1

12 New code
- Record the “hit” in the database.
- Read the request and parse it into month, day, year.
- Look up the hits for that month, day, year in the database.
- Send back some HTTP headers (Content-Type).
- Create an HTML table and send it back to the client.
- Close the connection.
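The parsing step could look something like this in C. The function name `parse_hit_request` and the simple range checks are assumptions made for the sketch; the database lookup is left out entirely.

```c
#include <stdio.h>
#include <string.h>

/* Parse a request line of the form "GET /mm/dd/yyyy HTTP/x.y".
   Returns 0 and fills month/day/year on success, -1 otherwise. */
int parse_hit_request(const char *line, int *month, int *day, int *year)
{
    char method[8];
    int m, d, y;

    /* %7s bounds the method so it cannot overflow the 8-byte buffer */
    if (sscanf(line, "%7s /%2d/%2d/%4d", method, &m, &d, &y) != 4)
        return -1;
    if (strcmp(method, "GET") != 0)
        return -1;
    if (m < 1 || m > 12 || d < 1 || d > 31 || y < 1)
        return -1;               /* coarse sanity check, not a calendar */

    *month = m;
    *day = d;
    *year = y;
    return 0;
}
```

For the request line `GET /02/10/2000 HTTP/1.1` this yields month 2, day 10, year 2000; a request for some other resource, such as `/index.html`, is rejected.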

13 Drawbacks to Custom Server Approach
- We might have lots of ideas for custom services.
- Each requires a dedicated address (port).
- Each needs to include:
  - basic TCP server code
  - parsing of HTTP requests
  - error handling
  - headers
  - access control

14 Another Approach
- Take a general purpose web server (that can handle static documents) and have it process requested documents as it sends them to the client.
- The documents could contain commands that the server understands; the server includes some kind of interpreter.

15 Example Smart Server
- Have the server read each HTML file as it sends it to the client.
- The server could look for a specially tagged region containing some command.
- The server doesn’t send this part to the client; instead it interprets the command and sends the result to the client.
- Everything else is sent normally.

16 Example Document
An example page, titled “Home Page”, with the embedded server commands shown in brackets:
Welcome to [include fancygraphic]
The current time is [time].
Today is [date].
Visit our sponsor: [random sponsor]

17 Real Life - Server Side Includes
- Many real web servers support this idea, but not the syntax we’ve shown.
- Server Side Includes (SSI) provides a set of commands that a server will interpret.
- Typically the server is configured to look for commands only in specially marked documents, so normal documents aren’t slowed down.

18 SSI Directives
- SSI commands are called directives.
- Directives are embedded in HTML comments.
- A comment looks like this: <!-- comment text -->
- A directive looks like this: <!--#directive attribute="value" -->

19 Some SSI Directives
- SSI servers keep a number of useful things in environment variables: DOCUMENT_NAME, DOCUMENT_URI
- echo: inserts the value of an environment variable into the page, for example after the text “This page is located at”.
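To make the echo directive concrete, here is a minimal sketch of how a server might expand it. It assumes the common SSI form `<!--#echo var="NAME" -->`, looks values up with `getenv()`, and substitutes the text `(none)` for unset variables; the function name `expand_echo` and these choices are illustrative, not taken from the slides or from any particular server.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy `in` to `out`, replacing each <!--#echo var="NAME" --> with the
   value of environment variable NAME.
   Returns 0 on success, -1 if the output buffer is too small. */
int expand_echo(const char *in, char *out, size_t cap)
{
    static const char open[] = "<!--#echo var=\"";
    size_t used = 0;

    if (cap == 0)
        return -1;

    while (*in != '\0') {
        if (strncmp(in, open, sizeof open - 1) == 0) {
            const char *name = in + sizeof open - 1;
            const char *name_end = strchr(name, '"');
            const char *close = name_end ? strstr(name_end, "-->") : NULL;
            char var[64];

            if (close != NULL && (size_t)(name_end - name) < sizeof var) {
                const char *value;
                size_t len;

                memcpy(var, name, (size_t)(name_end - name));
                var[name_end - name] = '\0';
                value = getenv(var);
                if (value == NULL)
                    value = "(none)";        /* placeholder for unset vars */
                len = strlen(value);
                if (used + len >= cap)
                    return -1;
                memcpy(out + used, value, len);
                used += len;
                in = close + 3;              /* skip past "-->" */
                continue;
            }
        }
        if (used + 1 >= cap)
            return -1;
        out[used++] = *in++;                 /* ordinary text: copy as is */
    }
    out[used] = '\0';
    return 0;
}
```

Everything outside a recognized directive is copied through unchanged, which matches the idea that the rest of the document is sent normally.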

20 SSI Directives
- include: inserts the contents of a text file.
- flastmod: inserts the time and date that a file was last modified, for example after the text “Last modified:”.

21 SSI Directives (cont.)
- exec: runs an external program and inserts the output of the program, for example after the text “Current users:”.
- Danger! Danger! Danger!

22 More Power
- Some servers support elaborate scripting languages.
- Scripts are embedded in HTML documents; the server interprets the script:
  - Microsoft Active Server Pages (ASP): JScript, VBScript, PerlScript
  - Netscape LiveWire: JavaScript, SQL connection library
- There are others...

23 Server Mapping and APIs
- Some servers include a programming interface that allows us to extend the capabilities of the server by writing modules.
- Specific URLs are mapped to specific modules instead of to files.
- We could write our server as a module and merge it with the web server.

24 External Programs
- Another approach is to provide a standard interface between external programs and web servers.
- We can run the same program from any web server.
- The web server handles all the HTTP; we focus on the special service only.
- It doesn’t matter what language we use to write the external program.

25 Common Gateway Interface
- CGI is a standard interface to external programs supported by most (if not all) web servers.
- The interface that is defined by CGI includes:
  - Identification of the service (the external program).
  - A mechanism for passing the request to the external program.


27 CGI Programming
- We will focus on CGI programming.
- CGI programs are often written in scripting languages (Perl, Tcl, etc.); we will concentrate on C.

28 CGI Programming
[Figure: the CLIENT sends an http request to the HTTP SERVER and receives an http response; the HTTP SERVER starts the CGI Program using setenv(), dup(), fork(), exec(), ...]

29 Common Gateway Interface
CGI is a standard mechanism for:
- Associating URLs with programs that can be run by a web server.
- A protocol (of sorts) for how the request is passed to the external program.
- How the external program sends the response to the client.

30 CGI URLs
- There is some mapping between URLs and CGI programs provided by a web server.
- The exact mapping is not standardized; the web server admin can set it up.
- Typically, requests that start with /CGI-BIN/, /cgi-bin/ or /cgi/, etc. refer to CGI programs, not to static documents.

31 Request CGI program
- The web server sets some environment variables with information about the request.
- The web server fork()s and the child process exec()s the CGI program.
- The CGI program gets information about the request from environment variables.

32 STDIN, STDOUT
- Before calling exec(), the child process sets up pipes so that stdin comes from the web server and stdout goes to the web server.
- In some cases part of the request is read from stdin.
- Anything written to stdout is forwarded by the web server to the client.
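The server side of this handoff might be sketched as below for a GET request. The function name `run_cgi` is hypothetical, only the stdout pipe is set up (a POST would need a second pipe feeding the child's stdin), and a real server sets many more environment variables than the two shown.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Run the program at `path` as a CGI child for a GET request and collect
   what it writes to stdout. Returns the number of bytes captured, or -1
   on failure (including output that does not fit in `out`). */
long run_cgi(const char *path, const char *query, char *out, size_t cap)
{
    int from_child[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    int status;

    if (cap == 0 || pipe(from_child) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }

    if (pid == 0) {                          /* child: becomes the CGI program */
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("QUERY_STRING", query, 1);
        close(from_child[0]);
        dup2(from_child[1], STDOUT_FILENO);  /* stdout now goes to the server */
        close(from_child[1]);
        execl(path, path, (char *)NULL);
        _exit(127);                          /* reached only if exec failed */
    }

    /* parent: the web server side */
    close(from_child[1]);
    while (used < cap - 1 &&
           (n = read(from_child[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(from_child[0]);

    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (long)used;
}
```

The captured bytes are what the server would forward to the client, after adding its own status line and headers.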

33 [Figure: the HTTP SERVER passes the request to the CGI Program through environment variables and stdin; the CGI Program returns its output through stdout.]


35 Request Method: Get
- GET requests can include a query string as part of the URL: GET /cgi-bin/login?mgunes HTTP/1.0
- In this request line, GET is the request method, /cgi-bin/login is the resource name, ‘?’ is the delimiter, and mgunes is the query string.

36 /cgi-bin/login?mgunes
- The web server treats everything before the ‘?’ delimiter as the resource name.
- In this case the resource name is the name of a program.
- Everything after the ‘?’ is a string that is passed to the CGI program.
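Splitting at the delimiter is a one-liner with `strchr()`. The helper name `split_query` is an assumption for this sketch, and it modifies the URI buffer in place.

```c
#include <string.h>

/* Split `uri` in place at the first '?'. Afterwards `uri` holds only the
   resource name; the return value points at the query string (an empty
   string if the URI had no '?'). */
char *split_query(char *uri)
{
    char *q = strchr(uri, '?');

    if (q == NULL)
        return uri + strlen(uri);   /* points at the terminating '\0' */
    *q = '\0';                      /* cut the resource name off here */
    return q + 1;
}
```

Applied to `/cgi-bin/login?mgunes`, the buffer is left holding `/cgi-bin/login` and the returned pointer refers to `mgunes`.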

37 Simple GET queries - ISINDEX
- You can put an <ISINDEX> tag inside an HTML document.
- The browser will create a text box that allows the user to enter a single string.
- If an ACTION is specified in the ISINDEX tag, when the user presses Enter, a request will be sent to the server specified as the ACTION.

38 ISINDEX Example
- The example page shows the prompt “Enter a string:”, an ISINDEX text box, and the instruction “Press Enter to submit your query.”
- If you enter the string “blahblah”, the browser will send a request to the HTTP server that looks like this: GET /search.cgi?blahblah HTTP/1.1

39 What the CGI sees
- The CGI program gets REQUEST_METHOD using getenv:
    char *method;
    method = getenv("REQUEST_METHOD");
    if (method == NULL) … /* error! */

40 Getting the GET
- If the request method is GET:
    if (strcasecmp(method, "get") == 0)
- The next step is to get the query string from the environment variable QUERY_STRING:
    char *query;
    query = getenv("QUERY_STRING");

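41 Send back HTTP Response and Headers
- The CGI program can send back an HTTP status line:
    printf("HTTP/1.1 200 OK\r\n");
- and headers:
    printf("Content-type: text/html\r\n");
    printf("\r\n");
- Note: under the CGI specification (RFC 3875), only non-parsed-header (NPH) scripts write the status line themselves; an ordinary CGI program uses a Status: header instead, and some servers reject a raw status line.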

42 Important!
- A CGI program doesn’t have to send a status line; the HTTP server will do this for you if you don’t.
- A CGI program must always send back at least one header line indicating the data type of the content (usually text/html).
- The web server will typically throw in a few header lines of its own: Date, Server, Connection.

43 Simple GET handler
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *method, *query;
        method = getenv("REQUEST_METHOD");
        if (method == NULL) … /* error! */
        query = getenv("QUERY_STRING");
        if (query == NULL) query = "";   /* no query string was supplied */
        printf("Content-type: text/html\r\n\r\n");
        printf("Your query was %s\n", query);
        return 0;
    }

44 URL-encoding
- Browsers use an encoding when sending query strings that include special characters.
- Most non-alphanumeric characters are encoded as a ‘%’ followed by 2 hex digits (the character’s ASCII code):
  - ‘=’ (which is hex 3D) becomes “%3D”
  - ‘&’ becomes “%26”
- The space character ‘ ’ is replaced by ‘+’. Why? (think about project 2 parsing…)
- The ‘+’ character is replaced by “%2B”.
- “foo=6 + 7” becomes “foo%3D6+%2B+7”
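Reversing this encoding on the CGI side could be sketched as follows. The names `hex_value` and `url_decode` are made up for the example, and the sketch copies a malformed `%` sequence through unchanged rather than reporting an error, which is a design choice rather than a rule.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode a URL-encoded string in place: '+' becomes a space and "%XX"
   becomes the byte with hex value XX. Returns s. */
char *url_decode(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (*src == '%' &&
                   hex_value(src[1]) >= 0 && hex_value(src[2]) >= 0) {
            *dst++ = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            *dst++ = *src++;        /* ordinary or malformed: copy as is */
        }
    }
    *dst = '\0';
    return s;
}
```

Decoding in place is safe because the decoded text is never longer than the encoded text. Note that an encoded `%00` would decode to a NUL byte and cut the C string short, so a careful program may want to reject it.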

45 Security!!!
- It is a very bad idea to build a command line containing user input!
- What if the user submits: “ ; rm -r *; ”
- The command line becomes: grep ; rm -r *; /usr/dict/words
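One hedge against this kind of input is to accept only strings made of characters known to be harmless before they get anywhere near a command. The sketch below uses a letters-and-digits whitelist; the function name `is_plain_word` and the choice of whitelist are assumptions for illustration, and avoiding the shell altogether (for example by calling exec with an argument vector) is generally the more robust fix.

```c
#include <ctype.h>

/* Return 1 if `word` is non-empty and made only of letters and digits,
   0 otherwise. */
int is_plain_word(const char *word)
{
    if (*word == '\0')
        return 0;
    for (; *word != '\0'; word++) {
        if (!isalnum((unsigned char)*word))
            return 0;            /* shell metacharacters end up here */
    }
    return 1;
}
```

With this check in place, the submission `; rm -r *;` is refused, while an ordinary search word such as `hello` passes.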

46 Beyond ISINDEX - Forms
- Many web services require more than a simple ISINDEX.
- HTML includes support for forms:
  - lots of field types
  - user answers all kinds of annoying questions
  - the entire contents of the form must be stuck together and put in QUERY_STRING by the web server.

47 Form Fields
- Each field within a form has a name and a value.
- The browser creates a query that includes a sequence of “name=value” substrings and sticks them together separated by the ‘&’ character.
- If the user types in “Mehmet H.” as the name and “none” for occupation, the query would look like this: “name=Mehmet+H%2E&occupation=none”
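Separating such a query into its fields can be sketched as below. The `struct field` type and the function name `split_fields` are invented for this example; the values are left URL-encoded, and pairs without an ‘=’ are simply skipped.

```c
#include <string.h>

struct field {
    char *name;     /* points into the caller's query buffer */
    char *value;    /* still URL-encoded */
};

/* Split "n1=v1&n2=v2..." in place into at most `max` fields.
   Returns the number of fields stored. */
int split_fields(char *query, struct field *fields, int max)
{
    int count = 0;
    char *pair = query;

    while (pair != NULL && *pair != '\0' && count < max) {
        char *next = strchr(pair, '&');
        char *eq;

        if (next != NULL)
            *next++ = '\0';         /* terminate this pair, step past '&' */
        eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';             /* separate name from value */
            fields[count].name = pair;
            fields[count].value = eq + 1;
            count++;
        }
        pair = next;
    }
    return count;
}
```

Splitting has to happen before URL-decoding: once “%26” or “%3D” has been turned back into ‘&’ or ‘=’, those characters can no longer be told apart from the real separators.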

48 HTML Forms
- Each form includes a METHOD that determines what HTTP method is used to submit the request.
- Each form includes an ACTION that determines where the request is made.

49 An HTML Form
- The example form has two text fields, labeled “Name:” and “Occupation:”.

50 What a CGI will get
- The query (from the environment variable QUERY_STRING) will be a URL-encoded string containing the name, value pairs of all form fields.
- The CGI must decode the query and separate the individual fields.

51 HTTP Method: POST
- The HTTP POST method delivers data from the browser as the content of the request.
- The GET method delivers data (the query) as part of the URI.
- HTML form using POST: set the form method to POST instead of GET.

52 GET vs. POST
When using forms it’s generally better to use POST:
- there are limits on the maximum size of a GET query string (environment variable)
- a POST query string doesn’t show up in the browser as part of the current URL.

53 CGI reading POST
- If REQUEST_METHOD is POST, the query is coming in on STDIN.
- The environment variable CONTENT_LENGTH tells us how much data to read.

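54 Possible Problem
    char buff[100];
    char *clen = getenv("CONTENT_LENGTH");
    if (clen == NULL) … /* handle error */
    int len = atoi(clen);
    if (read(0, buff, len) < 0) … /* handle error */
    pray_for(!hacker);
- The problem: len comes from the client and is never checked against the size of buff, so a large CONTENT_LENGTH overflows the buffer.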
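A more defensive version of the read might look like this. It is a sketch under stated assumptions: the function name `read_post_body` is hypothetical, an oversized body is rejected outright rather than truncated, and the caller supplies the buffer.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly the number of bytes announced in `clen` from `fd` into
   `buf`, which has room for `cap` bytes including the terminator.
   Returns the number of bytes read, or -1 if the length is missing,
   malformed, too large for buf, or the input ends early. */
long read_post_body(int fd, const char *clen, char *buf, size_t cap)
{
    char *end;
    long len;
    size_t got = 0;

    if (clen == NULL || *clen == '\0' || cap == 0)
        return -1;
    errno = 0;
    len = strtol(clen, &end, 10);
    if (errno != 0 || *end != '\0' || len < 0)
        return -1;                       /* not a clean decimal number */
    if ((unsigned long)len >= cap)
        return -1;                       /* would not fit with the '\0' */

    while (got < (size_t)len) {          /* read() may return short counts */
        ssize_t n = read(fd, buf + got, (size_t)len - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    buf[got] = '\0';
    return (long)got;
}
```

A CGI program would call it as `read_post_body(STDIN_FILENO, getenv("CONTENT_LENGTH"), buff, sizeof buff)` and treat a return value of -1 as a bad request.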

55 CGI Method summary
GET:
- REQUEST_METHOD is “GET”
- QUERY_STRING is the query
POST:
- REQUEST_METHOD is “POST”
- CONTENT_LENGTH is the size of the query
- the query can be read from STDIN

