Data Model Property Inference and Repair


1 Data Model Property Inference and Repair
Jaideep Nijjar and Tevfik Bultan, Verification Lab, Department of Computer Science, University of California, Santa Barbara. ISSTA 2013

2 Motivation
Web applications influence every aspect of our lives.
Our dependence on web applications keeps increasing.
It would be nice if we could improve the dependability of web applications.

3 Acknowledgement: NSF Support

4 Three-Tier Architecture
Browser Web Server Backend Database

5 Three-Tier Arch. + MVC Pattern
Browser Web Server Controller Views Model Backend Database

6 Model-View-Controller Pattern
MVC has become the standard way to structure web applications. MVC frameworks exist for the popular web development languages: Ruby on Rails, Zend for PHP, CakePHP, Struts for Java, Django for Python.
Benefits of the MVC pattern: separation of concerns, modularity, abstraction.
Our focus: MVC-based web applications.
[Diagram: User Request, Controller, Model, DB, View, User Response]
Notes: The user makes a request via her browser. Controllers receive user input and initiate a response by making calls on model objects. Models talk to the database, store and validate data, and perform the business logic. Controllers generate views (the UI: HTML, CSS, JavaScript, etc.), and the controller returns the view to the user.
Our idea: exploit these design principles when modeling and verifying MVC web applications. In particular, we focus on verifying the Model, i.e., the data model.

7 Data Model
The data model is the heart of the web application.
It specifies the set of objects and the associations (i.e., relations) between them.
Using an object-relational mapping (ORM), the data model is mapped to the back-end datastore.
Any error in the data model can have serious consequences for the dependability of the application.

8 Our Data Model Analysis Approach
[Figure: analysis pipeline. Active Records -> Model Extraction -> Data Model (Schema + Constraints) -> Property Inference -> Data Model + Inferred Properties -> Verification -> Failing Properties -> Repair Generation. Property Inference takes the data model schema and produces the inferred properties: Orphan Prevention, Transitive Relations, Delete Propagation.]
Notes: Inference is based on the data model schema, which is extracted from the object-relational mapping. From it we infer a set of properties about the data model that we expect to hold.
The verification component is our earlier work, including bounded and unbounded verification.
The heuristics for inferring properties are described next.

9 Outline Motivation Overview of Our Approach Rails Data Models
Basic Relations Options to Extend Relations Formalization of Semantics Verification Techniques Property Inference Property Repair Experiments Conclusions and Future Work

10 A Rails Data Model Example
A sample Active Records specification for a social networking application in which users have profiles that store their photo and video files. Photos and videos can be tagged by users, and users can have different roles. (Note: the body of the Photo class is the same as that of the Video class.)

    class User < ActiveRecord::Base
      has_and_belongs_to_many :roles
      has_one :profile, :dependent => :destroy
      has_many :photos, :through => :profile
    end

    class Role < ActiveRecord::Base
      has_and_belongs_to_many :users
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
      has_many :photos, :dependent => :destroy
      has_many :videos, :dependent => :destroy,
               :conditions => "format='mp4'"
    end

    class Tag < ActiveRecord::Base
      belongs_to :taggable, :polymorphic => true
    end

    class Video < ActiveRecord::Base
      belongs_to :profile
      has_many :tags, :as => :taggable
    end

    class Photo < ActiveRecord::Base
      ...
    end

[Class diagram: User, Role, Profile, Photo, Video, Tag, and the polymorphic Taggable, annotated with multiplicities and the condition format='mp4' on the Profile-Video relation]

11 Rails Data Models
Data model verification: analyzing the relations between the data objects.
Relations are specified in Rails using association declarations inside the ActiveRecord files.
Three basic relations: one-to-one, one-to-many, many-to-many.
Extensions to the basic relations using options: :through, :conditions, :polymorphic, :dependent.

12 Three Basic Relations in Rails
One-to-One

    class User < ActiveRecord::Base
      has_one :profile
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
    end

[Diagram: User 1 -- 0..1 Profile]

One-to-Many

    class Profile < ActiveRecord::Base
      has_many :videos
    end

    class Video < ActiveRecord::Base
      belongs_to :profile
    end

[Diagram: Profile 1 -- * Video]

Notes: Relationships are expressed by adding a pair of declarations to the corresponding Rails models of the related objects. Rails automatically matches the relation name with the appropriate class by the name given. The notation ChildClass < ParentClass expresses inheritance; the ActiveRecord::Base class contains all the database-connection functionality.

13 Three Basic Relations in Rails
Many-to-Many

    class User < ActiveRecord::Base
      has_and_belongs_to_many :roles
    end

    class Role < ActiveRecord::Base
      has_and_belongs_to_many :users
    end

[Diagram: User * -- * Role]

14 Extensions to the Basic Relations
:through option: to express transitive relations.
:conditions option: to relate a subset of objects to another class.
:polymorphic option: to express polymorphic relationships.
:dependent option: on delete, this option expresses whether to delete the associated objects or not.
Notes: For the sake of time, I will focus only on the :through and :dependent options for the rest of the talk.

15 The :through Option

    class User < ActiveRecord::Base
      has_one :profile
      has_many :photos, :through => :profile
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
      has_many :photos
    end

    class Photo < ActiveRecord::Base
      belongs_to :profile
    end

[Diagram: User 1 -- 0..1 Profile 1 -- * Photo, plus a dashed User -- * Photo relation]
Notes: The :through option can be set on either the has_one or the has_many declaration; this is an example with has_many. The dashed line is the join of the other two relations. It exists for ease of access: here, reaching a user's photos directly from the User class rather than going through the Profile class (just as one might reach book editions directly from an Author class rather than going through the Book class).

16 The :dependent Option

    class User < ActiveRecord::Base
      has_one :profile, :dependent => :destroy
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
      has_many :photos, :dependent => :destroy
    end

[Diagram: User 1 -- 0..1 Profile 1 -- * Photo]
:delete directly deletes the associated objects without looking at their dependencies.
:destroy first checks whether the associated objects themselves have associations with the :dependent option set and recursively propagates the delete to the associated objects.
Notes: The User class has the :dependent option set on its :profile relation. Thus, when a User object is deleted, the associated Profile object is deleted. Further, since the :dependent option is set to :destroy, any relation in the Profile class with the :dependent option set will also have its objects deleted. In this scenario, the delete is also propagated to the Photos associated with the deleted Profile object. If the :dependent option in User were set to :delete instead of :destroy, the related Profile object would be deleted, but not the Photos associated with that Profile. This may lead to dangling references if the option is not set correctly. Note that options can be combined, e.g., the :dependent option with the :conditions option.
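The difference between :delete and :destroy described on this slide can be illustrated with a toy model in plain Ruby. This is only a sketch of the semantics as the slide states them, not ActiveRecord itself; the `DEPENDENTS` table and the `removed_on_destroy` helper are hypothetical names introduced for illustration.

```ruby
# Toy in-memory model of the two delete modes (not ActiveRecord itself).
# DEPENDENTS maps a class name to [child class, mode] pairs, mirroring
# the :dependent options in the User/Profile/Photo example.
DEPENDENTS = {
  "User"    => [["Profile", :destroy]],
  "Profile" => [["Photo", :destroy]],
  "Photo"   => []
}.freeze

# Returns the class names whose objects are removed when an object of
# class `klass` is destroyed.
def removed_on_destroy(klass, dependents = DEPENDENTS)
  removed = [klass]
  dependents.fetch(klass, []).each do |child, mode|
    case mode
    when :destroy
      # :destroy recurses into the child's own :dependent associations.
      removed.concat(removed_on_destroy(child, dependents))
    when :delete
      # :delete removes the child objects directly and stops there.
      removed << child
    end
  end
  removed
end
```

With the table above, destroying a User reaches Profile and Photo. If the User entry is changed to `[["Profile", :delete]]`, the Photo objects are left behind, which is the dangling-reference situation the notes warn about.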

17 Formalizing Data Model Semantics
Formal data model: M = <S, C, D>
S: Data model schema. The sets and relations of the data model, e.g., {Photo, Profile, Role, Tag, Video, User} and the relations between them.
C: Constraints on the relations. Cardinality constraints, transitive relations, conditional relations, polymorphic relations.
D: Dependency constraints about deletions. They express conditions on two consecutive instances of a relation such that deletion of an object from the first instance leads to the other instance, possibly by deleting more objects (based on the :dependent option).
Notes: We can construct a formal data model representing the objects and their relationships in a Ruby on Rails application. We define it as a 3-tuple <S, C, D>, where S is the data model schema, C is a set of relational constraints, and D is a set of dependency constraints. The relational constraints in C express all the constraints on the relations. One example is the :conditions option, which limits the relation to those objects that meet a certain criterion. In the running example, Video objects are only related to a Profile object if their format field is mp4. The formalization of this constraint defines a set of objects o_V' that is a subset of the Video objects o_V (the Video objects whose format field is "mp4") and restricts the relation r_P-V between the Profile and Video objects to that subset.
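The :conditions constraint described in the notes can be written out roughly as follows. This is a sketch that reuses the slide's symbols; the name o_P for the set of Profile objects is an assumption, and the exact form used by the authors may differ.

```latex
o_{V'} = \{\, v \in o_V \mid v.\mathit{format} = \text{``mp4''} \,\} \subseteq o_V,
\qquad
r_{P-V} \subseteq o_P \times o_{V'}
```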

18 Outline Motivation Overview of Our Approach Rails Data Models
Verification Techniques Bounded Verification Unbounded Verification Property Inference Property Repair Experiments Conclusions and Future Work

19 Verification Overview
[Figure: verification pipeline. Active Records + properties -> Model Extraction -> formal data model + properties. Bounded verification (takes a bound): Alloy Translator -> formula -> Alloy Analyzer -> instance or unsat -> Results Interpreter. Unbounded verification: SMT-LIB Translator -> formula -> SMT Solver (Z3) -> instance or unsat or unknown -> Results Interpreter. Outcomes: Property Verified; Property Failed + Counterexample; Unknown.]
Notes: Input: Active Records + properties.
Formal data model: based on the formalization just presented.
Bounded verification has an extra input, the bound, which is the maximum number of instances of each object to instantiate during verification. Unbounded verification covers all instances of the formal data model.
Unknown is a possible output (for unbounded verification only), since the theory of uninterpreted functions with quantifiers is undecidable.
Counterexamples are a benefit of these solvers: when a property fails, they return a sample data model instance for which the property does not hold. Not all tools do that (e.g., theorem provers).
Next part of the talk: translation from the formal data model to the input languages of the solvers.

20 Sample Translation to Alloy
Rails input:

    class User < ActiveRecord::Base
      has_one :profile
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
    end

Alloy output:

    sig Profile {}
    sig User {}
    one sig State {
      profiles: set Profile,
      users: set User,
      relation: Profile lone -> one User
    }

Notes: The keyword sig is used in Alloy to define a set of objects, so a sig is created for each class in the input Rails data model. In this example, a sig is declared for the Profile and User classes. We also create a State sig, which we use to define the state of a data model instance. Since we need to instantiate exactly one State object when checking properties, we prepend the sig declaration with the multiplicity one.
The State sig contains fields that hold the set of all objects and the related object pairs. In this example it contains three fields. The first, profiles, is a binary relation between State and Profile objects. It uses the multiplicity operator set, meaning "zero or more": the state of a data model instance may contain zero or more Profile objects. The State sig contains a similar field for User objects. Finally, the one-to-one relation between Profile and User objects is translated as another field in the State sig. Named relation, it is defined to be a mapping between Profile and User objects, and it uses the multiplicity operators to constrain that mapping to be one-to-one.
IDAVER does this translation automatically.

21 Sample Translation to SMT-LIB
Rails input:

    class User < ActiveRecord::Base
      has_one :profile
    end

    class Profile < ActiveRecord::Base
      belongs_to :user
    end

SMT-LIB output:

    (declare-sort User)
    (declare-sort Profile)
    (declare-fun relation (Profile) User)
    (assert (forall ((p1 Profile) (p2 Profile))
      (=> (not (= p1 p2))
          (not (= (relation p1) (relation p2))))))

Notes: Types in SMT-LIB are declared using the declare-sort command. We use this command to declare types for User and Profile. The relation is translated as an uninterpreted function. Uninterpreted functions are created in SMT-LIB using the declare-fun command. We use it to declare an uninterpreted function named relation whose domain is Profile and whose range is User. Since a function can map multiple elements of the domain to the same element of the range, and we want a one-to-one relation, we constrain the function to be one-to-one to obtain the desired semantics. This constraint is expressed using the assert command, as shown above.

22 Property Inference: Motivation
Verification techniques require properties as input.
Effectiveness depends on the quality of the properties written: a verification tool cannot find an error if a property that exposes the error is not provided as input.
Manually writing properties is time-consuming and error-prone, lacks thoroughness, and requires familiarity with the modeling language. Many errors can be missed, and people avoid verification because of the manual effort required.
We propose techniques for automatically inferring properties about the data model of a web application. These properties can then be checked using a verification technique like the ones discussed earlier.
Inference is based on the data model schema: a directed, annotated graph that represents the relations. The schema is extracted from the object-relational mapping.

23 Data Model Schema Example
Extracted from the ORM of a customer relationship management application.
Nodes = object classes. Edges = the different relation types.
We explore the structure of the graph and look for patterns. Based on these patterns, we infer properties.

24 Outline Motivation Overview of Our Approach Rails Data Models
Verification Techniques Property Inference Orphan Prevention Pattern Transitive Relation Pattern Delete Propagation Pattern Property Repair Experiments Conclusions and Future Work

25 Property Inference: Overview
Identify patterns in the data model schema graph that indicate that a certain property should hold in the data model.
Extract the data model schema from the ActiveRecords declarations.
Search for the identified patterns in the data model schema graph.
If a match is found, report the corresponding property.

26 Orphan Prevention Pattern
For objects of a class that has only one relation: when the object it is related to is deleted but the object itself is not, such an object becomes orphaned.
Orphan chains can also occur.
The heuristic looks at all one-to-many and one-to-one relations to identify all potential orphans or orphan chains.
It infers that deleting an object does not create orphans.
Notes: An orphan candidate has one incoming relation. When an object is deleted, its possible orphans should be deleted along with it.
[Diagram: a node related to nodes 1 ... n]

27 Transitive Relation Pattern
Looks at the one-to-one and one-to-many relations in the schema.
Finds paths of relations of length > 1.
If there is a direct edge between the first and last node of the path, infers that this edge should be transitive, i.e., the composition of the other relations on the path.
[Diagram: path through nodes 1, 2, ..., n with a direct edge from the first node to the last]

28 Delete Propagation Pattern
Look at the schema with all many-to-many and transitive relations removed.
Remove cycles in the graph by collapsing each strongly connected component into a single node. (A directed graph is called strongly connected if there is a path from each node in the graph to every other node.)
Assign each node a level indicating its depth in the graph:
Root nodes (those with no incoming edges) are at level 0.
Each remaining node is at a level one more than the maximum level of its predecessors.
Propagate deletes if the difference between the levels of the two nodes is one.
Notes: We have three different heuristics for inferring three types of properties. The intuition here is that if the difference between the levels of the nodes is greater than one, there could be other classes between these two classes that are related to both of them, and so propagating the delete could lead to inconsistencies between the relations.
[Diagram: example graph with a cycle collapsed into node c, and nodes labeled level=0, level=1, level=2]

29 Repair Generation
Check the inferred properties on the formal data model.
If a property fails, we can point out the option that needs to be set in the data model to make the inferred property hold.
For the delete propagation and orphan prevention patterns: set the :dependent option accordingly to propagate the deletes.
For the transitive property: set the :through option accordingly to make a relation the composition of two other relations.
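The mapping from a failing property to a suggested repair might look like the sketch below. The property tuples and the wording of the suggestions are hypothetical; only the choice of option per property type comes from the slide.

```ruby
# Hypothetical failing-property tuples: [type, class_a, class_b].
def suggest_repair(property)
  type, a, b = property
  case type
  when :delete_propagates, :no_orphans
    # Both patterns are repaired by propagating deletes.
    "In #{a}: set :dependent on the association to #{b}"
  when :transitive
    # The direct relation should become a composition of other relations.
    "In #{a}: set :through on the association to #{b}"
  else
    raise ArgumentError, "unknown property type: #{type}"
  end
end

hint = suggest_repair([:delete_propagates, "Blog", "Comment"])
```

A real repair would also have to pick the value of :dependent and the intermediate relation for :through, which this sketch leaves open.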

30 Repair Examples

    class User < ActiveRecord::Base
      has_one :preference, :conditions => "is_active=true",
              :dependent => :destroy
      has_many :contexts, :dependent => :destroy
      has_many :todos, :through => :contexts
    end

    class Preference < ActiveRecord::Base
      belongs_to :user
    end

    class Context < ActiveRecord::Base
      belongs_to :user
      has_many :todos, :dependent => :delete
    end

    class Todo < ActiveRecord::Base
      belongs_to :context
      # belongs_to :user
      has_and_belongs_to_many :tags
    end

    class Tag < ActiveRecord::Base
      has_and_belongs_to_many :todos
    end

31 Outline Motivation Overview of Our Approach Rails Data Models
Verification Techniques Property Inference Property Repair Experiments Conclusions and Future Work

32 Experiments on Five Applications
Application | LOC | Classes | Data Model
LovdByLess | 3787 | 61 | 13
Substruct | 15639 | 85 | 17
Tracks | 6062 | 44 |
FatFreeCRM | 12069 | 54 | 20
OSR | 4295 | 41 | 15

33 A Social Networking Application
LovdByLess: a social networking application
Users can write blog entries.
Users can comment on a friend's blog entry.
The friend deletes the blog entry.
Notes: (The update can also be something like 'Amy uploaded a picture'.) After the blog entry is deleted, an error occurs. How the error occurs: deletePropagates[Blog, Comment] fails, i.e., the comments are not deleted when the blog entry is deleted. This is a data model error: the Comment entries still exist in the database. It further causes a bug in the application when listing the user's 'recent activity' on her dashboard.

34 A Social Networking Application
A friend writes a blog entry.
The user comments on the friend's blog entry.
The friend deletes the blog entry.
Notes: After the blog entry is deleted, an error occurs: deletePropagates[Blog, Comment] fails, so the Comment entries still exist in the database, which causes a bug when listing the user's 'recent activity' on her dashboard.

35 A Failing Inferred Property
deletePropagates property inferred for LovdByLess: deletePropagates[Blog, Comment]
[Diagram: Blog -> Comment, delete should propagate]
Notes: When a blog entry is deleted, the associated comments are not deleted. When the UI processes such a comment, it cannot find the associated blog entry to create the link and the proper display string (which is created on the fly), so it compensates by displaying an empty string.

36 A Todo List Application
Tracks: a todo list application
Todos can be organized by Contexts (e.g., School, Work, Home).
Users can also create Recurring Todos (say, a recurring todo "feed the dog" placed in the Home context).
Delete the Context, then try to edit the Recurring Todo: the application crashes.

37 A Failing Inferred Property
Data model and application error: deletePropagates property inferred for Tracks.
[Diagram: Context -> RecurringTodo, delete should propagate]
Notes: The associated RecurringTodo objects should have been deleted (Todos are), but the delete is not propagated because the :dependent option is not set correctly. Thus, when the user goes to modify the RecurringTodo, the application crashes because it does not expect a null reference to Context.

38 False Positives
deletePropagates property inferred for FatFreeCRM: deleting an Account should propagate to the associated Contacts.
[Diagram: Account -> Contact, delete should propagate]
But in FatFreeCRM it is valid to have a contact that is not associated with any account.

39 False Positives
transitive property inferred for LovdByLess.
[Diagram: User, ForumTopic, ForumPost]
The relation is not transitive because of the semantics of the application: users need not post only to forum topics that they created, as transitivity would require.

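40 Experiment Results
Columns: Application, Property Type (deletePropagates, noOrphans, transitive), # Inferred, # Timeout, # Failed
Surviving values, in transcript order (empty cells were lost, so column alignment cannot be recovered):
LovdByLess: deletePropagates 13 10, noOrphans, transitive 1
Substruct: 27 16 2 4
Tracks: 15 6 12
FatFreeCRM: 32 19 5
OSR: 7
Notes: 0-32 properties inferred, a reasonable number (not so many as to be overwhelming). 3 timeouts.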

41 Failed Properties by Category
Columns: Property Type (deletePropagates, noOrphans, transitive), # Data Model & Application Errors, # Data Model Errors, # Failures Due to Rails Limitations, # False Positives
Surviving values, in transcript order (column alignment cannot be recovered): deletePropagates 1 9, noOrphans, transitive 3 5 7 18 6 12
Notes: The properties that failed can be categorized into four different types. 29% are false positives, which is reasonable.

42 Conclusions and Future Work
It is possible to extract formal specifications from MVC-based data models and analyze them We can automatically infer properties and find errors in real-world web applications Most of the errors come from the fact that developers are not using the ActiveRecords extensions properly This breaks the modularity, separation of concerns and abstraction principles provided by the MVC pattern We are working on analyzing actions that update the data store We are also investigating verifiable-model-driven development for data models

43 Related Work
Automated Discovery of Likely Program Invariants
Daikon [Ernst et al, ICSE 1999] discovers likely invariants by observing the runtime behavior of a program.
[Guo et al, ISSTA 2006] extends this style of analysis and applies it to the inference of abstract data types.
Instead of observing the runtime behavior of programs, we analyze the static structure of an extracted data model to infer properties.
Static Verification of Inferred Properties
[Nimmer and Ernst, 2001] integrate Daikon with ESC/Java, a static verification tool for Java programs.
We focus on data model verification in web applications; our domain and technique are different.

44 Related Work
Verification of Web Applications
[Krishnamurthi et al, Springer 2006] focuses on correct handling of control flow given the unique characteristics of web applications.
Works such as [Hallé et al, ASE 2010] and [Han et al, MoDELS 2007] use state machines to formally model navigation behavior.
The focus has largely been on navigation modeling and verification. In contrast to these works, we focus on analyzing the data model.
Formal Modeling of Web Applications
WebAlloy [Chang, 2009]: the user specifies the data model and access control policies; the implementation is automatically synthesized.
WebML [Ceri et al, Computer Networks 2000]: a modeling language developed specifically for modeling web applications; no verification.
In contrast, we perform model extraction (reverse engineering).

45 Related Work
Verification of Ruby on Rails applications
Rubicon [Near et al, FSE 2012] verifies the Controller, whereas we verify the Data Model.
It requires manual specification of application behavior (specification files describing the expected behavior of each Controller), whereas we verify manually written properties.
It is limited to bounded verification and also uses the Alloy Analyzer.
Data Model Verification using Alloy
[Cunha and Pacheco, SEFM 2009] maps relational database schemas to Alloy; not automated.
[Wang et al, ASWEC 2006] translates ORA-SS (Object-Relationship-Attribute model for Semi-Structured data) specifications to Alloy and uses the Alloy Analyzer to produce instances of the data model to show consistency.
[Borbar et al, Trends 2005] uses Alloy to discover bugs in browser and business logic interactions, a different class of bugs than the data-model-related bugs we focus on.
The last two works are limited to bounded verification.

46 Related Work
Unbounded Verification of Alloy Specifications using SMT Solvers
[Ghazi et al, FM 2011]: approach not implemented.
A more challenging domain, since the Alloy language contains constructs such as transitive closure, which do not appear in the data models we extract.
Specification and Analysis of Conceptual Data Models
[Smaragdakis et al ASE 2009, McGill et al ISSTA 2011] use Object Role Modeling (ORM), a popular conceptual modeling language, to express the data model and constraints.
Their focus is on checking consistency (whether any instances of a data model exist) and on efficiently generating test data that respect the constraints expressed in ORM.
Our focus is on extracting the data model from an existing application and verifying that desirable properties hold, as opposed to generating test cases from a model (reverse engineering vs. forward engineering).

47 Questions?

