CS 290C: Formal Models for Web Software
Lecture 10: Language Based Modeling and Analysis of Navigation Errors
Instructor: Tevfik Bultan

Web Application Modeling
So far we have discussed various approaches for modeling, analyzing and verifying web applications.
We have seen two main approaches:
– Model driven development approaches, where the application is specified or enhanced using a formal model. For example: WebML, navigation state machines.
– Reverse engineering approaches, where a formal model is extracted from the application. For example: extracting a state machine model for navigation by analyzing the links that are inserted in web pages.

Model Driven Development Approach
The model driven development approach enables:
– Specification of the behavior of the application at a high level of abstraction, making it easier to develop applications.
– Automatic or semi-automatic generation of the actual implementation from the high level models.
– Separation of concerns, achieved by specifying different concerns about the application (such as the data model or the navigation constraints) using different specification mechanisms.
However, model driven development requires the developers to learn and use the modeling languages.
There is also a concern about the mapping between the actual implementation and the model (they have to be maintained together).

Reverse Engineering Approach
Reverse engineering approaches do not require developers to learn a new specification language.
Since reverse engineering approaches extract a model directly from the code, there are no maintenance issues (when the application changes, we can extract a new model).
However, reverse engineering is hard:
– Extracting sound models using static analysis can lead to very approximate models that do not contain much information, and extracting more precise models can be undecidable.
– Extracting models by observing runtime behavior is not sound and cannot be used to guarantee correctness.

How About a Language Based Approach?
Both model driven development and reverse engineering approaches can be considered software engineering approaches.
Another option is a programming language based approach.
Can we model the problems that appear in Web applications in programming language terms, and possibly suggest solutions using programming language mechanisms (such as type checking)?

A Language Based Approach
Today I will discuss the following paper, which presents a language based approach for modeling and analyzing navigation problems in Web applications:
“Modeling Web Interactions and Errors,” S. Krishnamurthi, R. B. Findler, P. Graunke, and M. Felleisen.

Web Applications
A Web program’s execution consists of a series of interactions between a Web browser and a Web server.
When a browser submits an HTTP request whose URL points to a Web program, the server invokes the program with the request using some protocol:
– CGI, Java servlets, ASP.NET
It then waits for the program to terminate and turns the program’s output into a response that the browser can display, i.e., it returns a Web page.
Each such program is called a “script” since it only reads some inputs and writes some output.

Web Applications
This simple request-response style of programming using scripts makes the design of multi-stage Web interactions difficult.
A multi-stage interactive Web program consists of many scripts, each handling one request:
– These scripts communicate with each other via external media, since they must remember the earlier part of the interaction.
– Forcing scripts to communicate this way causes problems, since it leads to unstated and easily violated invariants.

Web Applications
Use of the Web browser creates further complications:
– A browser is designed to let a user navigate a web of hyperlinked nodes.
– When a user uses this power to navigate an interaction with an application, many unexpected scenarios can happen:
  – The user can backtrack to an earlier stage of the interaction.
  – The user can duplicate a page and generate parallel interactions.

A Language Based Approach
We will first describe a formal model that captures the essence of Web application behavior.
Then we will investigate the use of language based techniques to address the navigation problems.

A Formal Model
A Web application (W) consists of:
– a server (S), and
– a client (C)
The server consists of:
– a storage, and
– a dispatcher
The dispatcher contains:
– a table (P) that associates URLs with programs, and
– an evaluator that applies programs from the table to the submitted form

A Formal Model
Every page is simply a form (F) that contains:
– the URL to which the form is submitted, and
– a set of form fields
Each field has a name and a value that can be edited by the client.
The client stores:
– the current form, and
– the sequence of all the forms that have been visited by the client so far (cached pages)

Web Program Behavior
The behavior of the Web program is described using three types of actions:
– Fill-form: Corresponds to the client editing the values of fields in the current form. The modified form becomes the current form and is added to the cache.
– Switch: Makes a form from the cache the current form.
– Submit: Dispatches on the current form’s URL to find a program in the table P. This program accesses the server state and the current form, updates the server state, and generates a new form, which becomes the current form.

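The three actions can be sketched as a small executable model. This is only an illustration, not the formal semantics from the paper: the class names (`Form`, `Server`, `Client`), the `count_hits` program, the `/count` URL, and the choice to also cache the form returned by a submit are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Form:
    url: str                                  # URL the form is submitted to
    fields: Tuple[Tuple[str, str], ...] = ()  # (field name, value) pairs

    def with_field(self, name: str, value: str) -> "Form":
        updated = dict(self.fields)
        updated[name] = value
        return Form(self.url, tuple(sorted(updated.items())))


# A program maps (server storage, submitted form) to (new storage, new form).
Program = Callable[[dict, Form], Tuple[dict, Form]]


@dataclass
class Server:
    table: Dict[str, Program]                 # the table P: URL -> program
    storage: dict = field(default_factory=dict)

    def dispatch(self, form: Form) -> Form:
        program = self.table[form.url]        # dispatch on the form's URL
        self.storage, new_form = program(self.storage, form)
        return new_form


@dataclass
class Client:
    current: Form
    cache: List[Form] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.cache:                    # the starting page counts as visited
            self.cache.append(self.current)

    def fill_form(self, name: str, value: str) -> None:
        self.current = self.current.with_field(name, value)
        self.cache.append(self.current)

    def switch(self, index: int) -> None:
        self.current = self.cache[index]      # e.g. the back button

    def submit(self, server: Server) -> None:
        self.current = server.dispatch(self.current)
        self.cache.append(self.current)       # assumption: responses are cached too


def count_hits(storage: dict, form: Form) -> Tuple[dict, Form]:
    # Hypothetical program: counts how many submissions it has processed.
    hits = storage.get("hits", 0) + 1
    return {**storage, "hits": hits}, Form("/count", (("hits", str(hits)),))


server = Server(table={"/count": count_hits})
client = Client(current=Form("/count"))
client.submit(server)   # first submission
client.switch(0)        # go back to the cached starting form
client.submit(server)   # resubmit the old form
print(server.storage["hits"], len(client.cache))
```

The last three calls show why Switch matters: the server state advances on every submit, even when the client resubmits a form it had already left behind.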
A Simple Web Programming Language
A simple functional programming language can be specified to characterize the basic operations that are required to write a web application:
– Extract a field from a form
– Construct a new form
– Modify fields of a form
To allow stateful programming, we can introduce read and write operations that give access to the server storage.

Navigation Problems
Two navigation problems can be characterized formally in this model:
– Script communication problem: A script accepts a different type of form than what is delivered to it. For example, the script tries to access a field that does not exist in the form.
– HTTP observer problem: Since the HTTP protocol does not allow a proper implementation of the observer pattern (which enables independent observers to be notified of state changes), a page received by the client can become outdated when the MVC model changes in the server.

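A tiny illustration of the script communication problem, using plain dictionaries as forms. The two scripts, their URLs, and the field names are hypothetical; the point is only that nothing checks the form one script produces against the fields the next script reads.

```python
def search(form):
    # Hypothetical script: generates a form for "/book" with a single field.
    return {"url": "/book", "fields": {"flight": form["fields"]["query"]}}


def book(form):
    # Hypothetical script: expects both a "flight" and a "seat" field.
    fields = form["fields"]
    return "Booked " + fields["flight"] + ", seat " + fields["seat"]


page = search({"url": "/search", "fields": {"query": "LAX-JFK"}})
try:
    book(page)
    outcome = "ok"
except KeyError as missing:
    # The consuming script gets stuck on a field the producer never supplied.
    outcome = f"stuck: no field {missing}"
print(outcome)
```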
Script Communication Problem and Types
The main issue in the script communication problem is the type mismatch between the forms generated and consumed by different scripts.
Since these scripts are loosely coupled programs, there is no standard type checking mechanism that can be used to make sure that these type mismatches do not happen.
Checking all scripts together is not feasible, since they are developed incrementally, may reside on different Web servers, and may be written using different programming languages.

An Incremental Type System for Web Applications
The proposed solution is the following:
– When the Web server receives a request for a URL that is not already in its table, it installs the relevant program.
– Before installing the relevant program, it checks that there is no type mismatch between the input form and the program being installed (internal consistency check).
– Furthermore, it generates the type constraints that this newly installed program imposes on the other programs in the server that it interacts with (these become external consistency checks).
If either the internal or the external check fails, the program is rejected, resulting in an error.

A Simple Typed Web Programming Language
The simple functional Web programming language can be extended with types by requiring type declarations for function arguments.
The type system for this language shows how external type checking can be done:
– While traversing the program, the type system generates a set of type constraints on external programs.
– Each constraint states a condition such as: a program associated with a particular URL should consume Web forms of a particular type.

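The install-time checks can be sketched as follows. This is a simplified illustration rather than the paper's type system: a form type is reduced to a set of field names, a script is described only by the fields it consumes and the forms it produces for other URLs, and "compatible" is taken to mean that the form supplies at least the fields the script reads. The names `Script`, `Server.install`, and `TypeMismatch` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


class TypeMismatch(Exception):
    """Raised when a program is rejected at install time."""


@dataclass
class Script:
    consumes: FrozenSet[str]                       # fields read from the input form
    produces: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # produces maps a target URL to the fields of the forms sent to it


@dataclass
class Server:
    installed: Dict[str, Script] = field(default_factory=dict)
    # constraints[url] lists the form types that url is required to consume
    constraints: Dict[str, List[FrozenSet[str]]] = field(default_factory=dict)

    def install(self, url: str, script: Script, form_fields: FrozenSet[str]) -> None:
        # Internal check: the submitted form must supply what the script reads.
        if not script.consumes <= form_fields:
            raise TypeMismatch(f"{url}: input form lacks required fields")
        # External checks recorded earlier by already installed programs.
        for required in self.constraints.get(url, []):
            if not script.consumes <= required:
                raise TypeMismatch(f"{url}: violates an existing constraint")
        # External checks this script imposes on already installed programs.
        for target, produced in script.produces.items():
            other = script if target == url else self.installed.get(target)
            if other is not None and not other.consumes <= produced:
                raise TypeMismatch(f"{url}: produces a bad form for {target}")
        # All checks passed: install and remember the new constraints.
        self.installed[url] = script
        for target, produced in script.produces.items():
            self.constraints.setdefault(target, []).append(produced)


server = Server()
search = Script(consumes=frozenset({"query"}),
                produces={"/book": frozenset({"flight"})})
server.install("/search", search, frozenset({"query"}))

book_bad = Script(consumes=frozenset({"flight", "seat"}))
try:
    server.install("/book", book_bad, frozenset({"flight", "seat"}))
    rejected = False
except TypeMismatch:
    rejected = True     # "/search" only ever sends a "flight" field

book_ok = Script(consumes=frozenset({"flight"}))
server.install("/book", book_ok, frozenset({"flight"}))
```

The rejected script passes its own internal check; it fails only because of the constraint that the earlier installation of `/search` recorded for `/book`, which is the incremental part of the scheme.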
Solving Script Communication Problem with Type Checking
Using type checking with this incremental system, it can be guaranteed that:
– Scripts do not get stuck when they are processing appropriately typed forms
– The server does not apply scripts to forms with the wrong types

Solving the HTTP Observer Problem with Timestamps
The server keeps track of the number of processed submissions (this represents time).
The external storage is changed so that it maps each location to a value plus the timestamp of the last write.
The server also maintains the set of all storage locations read or written during the execution of a script (called a carrier set, CS):
– When the server sends a page to the consumer, it adds the current timestamp and this set of locations as an extra hidden field.

Solving the HTTP Observer Problem with Timestamps
A form with carrier set CS and timestamp T submitted to a server is out of date if and only if any of the locations in CS has a timestamp at the server that is greater than T.
A runtime error can be generated when out of date forms are submitted, preventing the execution of scripts with out of date data:
– This approach solves the example problem of booking an unintended flight.
However, this approach can also generate false positives (for example, a page counter value may make the form out of date):
– So the programmers must specify which reads or writes are relevant, and an error is generated only when a relevant field is out of date.

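The timestamp scheme can be sketched as below. The script interface (`read`, `write`, and a `relevant` flag), the `Page` record, and the flight scripts are assumptions made for this illustration; only the out-of-date test itself follows the definition on the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Optional, Tuple


class OutOfDate(Exception):
    """Raised when a stale form is submitted."""


@dataclass(frozen=True)
class Page:
    body: Any
    timestamp: int            # T: server time when the page was generated
    carrier: FrozenSet[str]   # CS: relevant locations read or written


@dataclass
class Server:
    clock: int = 0            # number of processed submissions
    storage: Dict[str, Tuple[Any, int]] = field(default_factory=dict)

    def is_out_of_date(self, page: Page) -> bool:
        # Stale iff some location in CS was written after time T.
        return any(self.storage[loc][1] > page.timestamp
                   for loc in page.carrier if loc in self.storage)

    def submit(self, script, page: Optional[Page] = None) -> Page:
        if page is not None and self.is_out_of_date(page):
            raise OutOfDate("form refers to data that has since changed")
        self.clock += 1
        carrier = set()

        def read(loc, relevant=True):
            if relevant:
                carrier.add(loc)
            return self.storage[loc][0]

        def write(loc, value, relevant=True):
            if relevant:
                carrier.add(loc)
            self.storage[loc] = (value, self.clock)  # stamp the write

        body = script(read, write)
        return Page(body, self.clock, frozenset(carrier))


def choose(flight):
    # Hypothetical script: records the selected flight on the server.
    def script(read, write):
        write("selected", flight)
        return f"Book {flight}?"
    return script


def book(read, write):
    # Hypothetical script: books whatever flight the server has recorded.
    return "Booked " + read("selected")


server = Server()
page_a = server.submit(choose("A"))   # first window
page_b = server.submit(choose("B"))   # duplicated window, different flight
try:
    result = server.submit(book, page_a).body   # confirm from the old window
except OutOfDate:
    result = "rejected"
fresh = server.submit(book, page_b).body
print(result, "/", fresh)
```

Without the check, confirming from the first window would book flight B even though that page shows flight A. Passing `relevant=False` to `read` or `write` keeps a location such as a page counter out of the carrier set, which is one way to express the programmer-specified relevance mentioned above.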