Model-Based Crawling of Rich Internet Applications
Software Security Research Group (SSRG), University of Ottawa, in collaboration with IBM
CASCON 2011

Software Security Research Group (SSRG), University of Ottawa & IBM
SSRG Members
University of Ottawa
–Prof. Guy-Vincent Jourdan
–Prof. Gregor v. Bochmann
–Suryakant Choudhary (Master student)
–Emre Dincturk (PhD student)
–Khaled Ben Hafaiedh (PhD student)
–Seyed M. Mir Taheri (PhD student)
–Ali Moosavi (Master student)
In collaboration with Research and Development, IBM® Rational® AppScan® Enterprise
–Iosif Viorel Onut (PhD)

Outline
Introduction
Problem Statement and Our Approach
First Crawling Strategy: Hypercube
Second Crawling Strategy: Menu
Implementation and Results
Conclusion

Project Presentation
© 2010 IBM Corporation
Collaborators
–IBM and uOttawa
Scope
–Crawling RIA applications
IBM Products
–Rational AppScan Enterprise
–Rational Policy Tester
–Rational AppScan Standard
Timeline
–3 years in
–July-August ’10: AppScan Enterprise prototype
–June-July ’11: AppScan Enterprise prototype 2

Where can you find us?
Technology Showcase
–One of the 6 innovation impact projects
Website
–

IBM Rational AppScan Enterprise Edition
Product overview
IBM Security Solutions

IBM Rational AppScan Suite – Comprehensive Application Vulnerability Management
[Figure: AppScan products mapped to the lifecycle phases REQUIREMENTS, CODE, BUILD, QA, PRE-PROD, PRODUCTION and SECURITY]
Products shown: AppScan Standard, AppScan Source, AppScan Tester, AppScan Build, AppScan Enterprise, AppScan Reporting Console, AppScan onDemand
Activities shown:
–Security Requirements Definition: security requirements defined before design & implementation
–Build security testing into the IDE
–Automate security / compliance testing in the build process
–Security / compliance testing incorporated into testing & remediation workflows
–Outsourced testing for security audits & production site monitoring
–Security & compliance testing, oversight, control, policy, audits
Application Security Best Practices – Secure Engineering Framework
Legend: Dynamic Analysis/Blackbox, Static Analysis/Whitebox

AppScan Enterprise Edition capabilities
Large scale application security testing
–Client-server architecture designed to scale
–Multiple users running multiple assessments
–Centralized repository of all assessments
–Scheduling and automation of assessments
–REST-style API for automation and integrations
Enterprise visibility of security risk
–High-level dashboards
–Detailed security issues reports, advisories and fix recommendations
–Correlation of results discovered using dynamic and static analysis techniques
–Over 40 compliance reports like PCI, GLBA, SOX
Governance & collaboration
–User roles & access permissions
–Test policies
–Issue management
–Defect tracking systems integration

AppScan Enterprise Workflows
Information Security
–Schedule and automate assessments
–Conduct assessments with AppScan Standard and AppScan Source and publish findings for remediation and trending
–Tools: AppScan Standard Edition, AppScan Source Edition
Build automation
–Source code analysis for security issues as part of build verification
–Publish findings for remediation and trending
–Tools: AppScan Source for Automation, AppScan Standard Edition CLI
Compliance Officers
–Review compliance reports
Management
–Review most common security issues
–View trends
–Assess risk
–Evaluate progress
Development & QA
–Conduct assessments
–View assessment results
–Remediate issues
–Assign issue status

View detailed security issues reports
–Security issues identified with static analysis
–Security issues identified with dynamic analysis
–Aggregated and correlated results
–Remediation tasks
–Security risk assessment

Obtain a high-level view of the security of your applications
–Compare the number of issues across teams and applications
–Identify top security issues and risks
–View trending of the number of issues by severity over time
–Monitor the progress of issue resolution

Assess regulatory compliance risk
Over 40 compliance reports, including:
–The Payment Card Industry Data Security Standard (PCI)
–VISA CISP
–Children's Online Privacy Protection Act (COPPA)
–Financial Services (GLBA)
–Healthcare Services (HIPAA)
–Sarbanes-Oxley Act (SOX)

Problem Statement
Challenges developing the AppScan family of products:
–Application languages/frameworks are constantly evolving
–Deep static or dynamic analysis involves heavy computation
Product challenge:
–More clients are moving to RIAs (Rich Internet Applications)
–Crawling/analyzing RIAs is challenging due to their dynamic nature

Introduction: Crawling
Exploring a web application automatically
–Discovering existing pages and their connections
–The result is a model of the application
Important for searching and testing applications

Introduction: Traditional Web Applications
Navigation is achieved using links (URLs)
Synchronous communication

Introduction: RIAs
More interactive and responsive web apps
–Page changes via client-side code (JavaScript)
–Asynchronous communication

Introduction: Crawling
Crawling extracts a “model” of the application
–States are “distinct” pages
–Transitions are triggered by event executions

Introduction: Crawling RIAs
Events: JavaScript events such as onclick
Reset: going back to the initial page by loading the URL of the application
Extract the complete model while trying to discover every state as soon as possible
–Minimize the number of resets
–Minimize the number of event executions

Our Approach: Model-Based Crawling
When the crawl starts, we know nothing about the application except the initial state.
Designing an efficient crawling strategy is not possible without some anticipation of the structure of the application.
Use the anticipation (the expected structure of the application) to guide the crawling.
During crawling, revise the strategy as we learn more about the application.

Our Approach: Model-Based Crawling
What kind of anticipation about the structure of the application?
–Structural properties that we expect the application to have
–A meta-model representing a class of application models that have common behavioral characteristics
How to come up with a meta-model?
–Intuition and synthesis based on looking at a large number of web applications
–We have defined two meta-models: the Hypercube meta-model and the Menu meta-model
How to select a meta-model for the crawl of an application?
–By intuition, after informal browsing through the application

Our Approach: Model-Based Crawling - Strategy
Designing a model-based crawling strategy:
–Select a meta-model
–Design an efficient strategy to crawl applications that follow the meta-model, in two phases:
  –State Exploration: give priority to exploring new states
  –Transition Exploration: traverse all transitions not covered in the state exploration phase
–Define the strategy for handling violations, i.e. cases where the discovered model of the application does not follow all the rules of the meta-model
Why two phases?
–Since in many cases the application is too complex to be crawled completely, it is important to explore as many states as possible in the given time.
–Unless we have explored all transitions in the application, we cannot be sure that we have found all states.

Hypercube Meta-Model
We assume that the application model has two characteristics:
–Enabled events: if the current state has n enabled events, then the execution of one of these events leads to a state which has exactly the same events enabled, except the one that was executed.
–Independence of the order of execution of events: the state reached after a certain set of events has been executed does not depend on the order in which they were executed.

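The two characteristics can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a state is identified by the set of events executed so far; the names `INITIAL_EVENTS`, `enabled_events`, `execute` and `run` are hypothetical and are not taken from the tool.

```python
from itertools import permutations

# Assumption for illustration: a state is the frozenset of events already executed.
INITIAL_EVENTS = frozenset({"e1", "e2", "e3", "e4"})

def enabled_events(state):
    """Characteristic 1: every initial event is enabled except those already executed."""
    return INITIAL_EVENTS - state

def execute(state, event):
    """Executing an enabled event leads to the state that also contains that event."""
    if event not in enabled_events(state):
        raise ValueError(f"{event} is not enabled in this state")
    return state | {event}

def run(sequence):
    """Execute a sequence of events starting from the initial state."""
    state = frozenset()
    for event in sequence:
        state = execute(state, event)
    return state

# Characteristic 2: the reached state does not depend on the execution order.
reached = {run(order) for order in permutations(["e1", "e2", "e3"])}
```

Under this encoding all six orderings of the same three events collapse into a single state, which is exactly what makes the anticipated model a hypercube.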
Hypercube Meta-Model
[Figure: example of a hypercube of dimension 4 (the initial state contains 4 enabled events)]

Hypercube Strategy: State Exploration
The minimal number of paths needed to cover all states of the hypercube can be calculated using an existing algorithm
–The $2^n$ states can be covered by $\binom{n}{\lfloor n/2 \rfloor}$ paths out of the $n!$ possible paths
–Case of n = 4: 16 states covered by 6 paths out of 24 paths

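The counts above can be checked with a short Python sketch. It uses the bracket-matching construction of a symmetric chain decomposition, which is one well-known way to split the states into the minimum number of chains; this is an assumption for illustration and not necessarily the algorithm the slide refers to. A state is encoded as a tuple of bits (1 = event executed), and `chain_key` and `symmetric_chains` are hypothetical names.

```python
from itertools import product
from math import comb, factorial

def chain_key(bits):
    """Match each 1 with the nearest unmatched 0 to its left (like brackets).
    Unmatched positions are cleared, so all members of a chain share one key."""
    stack, matched = [], set()
    for i, b in enumerate(bits):
        if b == 0:
            stack.append(i)
        elif stack:
            matched.update((stack.pop(), i))
    return tuple(b if i in matched else 0 for i, b in enumerate(bits))

def symmetric_chains(n):
    """Group all 2^n states into chains; each chain adds one event per step."""
    chains = {}
    for bits in product((0, 1), repeat=n):
        chains.setdefault(chain_key(bits), []).append(bits)
    return [sorted(chain, key=sum) for chain in chains.values()]

n = 4
chains = symmetric_chains(n)
print(2 ** n, comb(n, n // 2), factorial(n), len(chains))  # prints: 16 6 24 6
```

Each chain can be turned into a crawl path by first executing, from the initial state, the events of the chain's smallest element and then following the chain.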
Hypercube Strategy: Transition Exploration
Objective: traverse every transition of the application at least once
Some of the transitions are already traversed during the state exploration
For the hypercube meta-model, the optimal transition exploration strategy is given by a simple approach:
–From the initial state, go to the closest state that has an unexecuted event.
–Keep executing unexecuted events as long as the reached state has one.
–When a state with no unexecuted event is reached, reset the application and start over from the initial state.

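The three steps can be simulated on an ideal hypercube with the sketch below. It is a simplified illustration under stated assumptions: states are sets of executed events, no transition has been traversed beforehand (the real strategy starts from what state exploration already covered), and the function name `explore_transitions` is hypothetical.

```python
from collections import deque

def explore_transitions(n):
    """Simulate the walk/execute/reset loop on an n-dimensional hypercube.
    Returns (event executions, resets)."""
    events = frozenset(range(n))
    traversed = set()              # (state, event) pairs already executed
    total = n * 2 ** (n - 1)       # number of transitions in the hypercube
    executions = resets = 0

    def untraversed(state):
        return [e for e in sorted(events - state) if (state, e) not in traversed]

    def closest_with_work(start):
        """Breadth-first search for the nearest state with an unexecuted event."""
        queue, seen = deque([(start, [])]), {start}
        while queue:
            state, path = queue.popleft()
            if untraversed(state):
                return path
            for e in sorted(events - state):
                nxt = state | {e}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [e]))
        return None

    while len(traversed) < total:
        state = frozenset()                    # we are at the initial state
        for e in closest_with_work(state):     # walk over known transitions
            state |= {e}
            executions += 1
        while untraversed(state):              # keep executing unexecuted events
            e = untraversed(state)[0]
            traversed.add((state, e))
            state |= {e}
            executions += 1
        if len(traversed) < total:
            resets += 1                        # dead end: reset and start over
    return executions, resets
```

For example, `explore_transitions(2)` covers the 4 transitions of a square with 4 event executions and 1 reset.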
Hypercube Strategy: Violations
We distinguish the following types of violations:
–Unexpected Split: we expect to reach a known state, but the state actually reached is a new state.
–Unexpected Merge: we unexpectedly reach a known state.
–Appearing Events: the reached state has an event we did not expect.
–Disappearing Events: some events that we expect in the reached state are not enabled.

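A minimal sketch of how the four violation types could be detected after one event execution. The function name, its parameters and the convention that `expected_state` is `None` when a new state is anticipated are all assumptions made for this illustration.

```python
def classify_violations(expected_state, reached_state, known_states,
                        expected_events, actual_events):
    """Return the violation types observed after one event execution.

    expected_state: a known state id, or None if a new state was anticipated.
    expected_events / actual_events: sets of events enabled in the reached state.
    """
    violations = []
    if reached_state in known_states:
        if reached_state != expected_state:
            violations.append("unexpected merge")   # landed on a known state
    elif expected_state is not None:
        violations.append("unexpected split")       # expected known, got new
    if actual_events - expected_events:
        violations.append("appearing events")
    if expected_events - actual_events:
        violations.append("disappearing events")
    return violations
```

An empty result means the execution matched the anticipated model; several violation types can be reported for the same execution.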
Hypercube Strategy: Violation Handling Example
We update the anticipated model using the meta-model characteristics
[Figure: three panels: initial anticipated model; actual model discovered so far (violation: Appearing Events); updated anticipated model]

“Menu” Meta-Model
Hypothesis:
–“The result of an event execution is independent of the state where the event has been executed and will always result in the same resultant state”
–One-to-one mapping between events and states
[Figure: states S, S’, S1 and S2 connected by events E1 and E2]

Menu Strategy
Primary goal: find all the states of the application as soon as possible.
State Exploration Phase
–Find all states that are anticipated by the meta-model
Transition Exploration Phase
–Execute events that are left unexecuted after the state exploration phase

State Exploration Phase
Goal: find all the states as quickly as possible
Two important steps:
–Categorize the events seen so far into sets with priorities.
–Execute all the events from a higher-priority set before executing any event from a lower-priority set.
Assumption behind event categorization:
–An event that has not been executed anywhere in the application is more likely to lead to a new state than an event that has already been executed at some state.
–Among events that have already been executed, one with a shorter execution history is more likely to lead to a new state than one with a longer execution history.

State Exploration Phase (Cont.)
Categorize events based on event execution history
Event categories:
–Global Unexecuted Events: not executed at any state discovered so far
–Local Unexecuted Events: not executed at the current state, but executed at some other state
Local Unexecuted Events are further divided into:
–Uncategorized Events: events executed once
–Menu Events: events executed twice that led to the same resultant state (events following our assumption)
–Non-Menu Events: events executed twice that led to different resultant states (events violating our assumption)

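The categories can be derived from a simple execution history, as in the sketch below. The data layout (a dictionary from event to a list of `(source_state, resulting_state)` pairs) and the function name are assumptions for illustration; the sketch also treats "executed twice" as "executed at least twice".

```python
def categorize(event, state, history):
    """Categorize `event` as seen from `state`.

    history maps an event to the list of (source_state, resulting_state)
    pairs recorded for its executions so far.
    """
    executions = history.get(event, [])
    if any(src == state for src, _ in executions):
        return "executed here"            # nothing left to explore at this state
    if not executions:
        return "global unexecuted"        # never executed anywhere
    if len(executions) == 1:
        return "uncategorized"            # executed once, elsewhere
    results = {dst for _, dst in executions}
    return "menu" if len(results) == 1 else "non-menu"

history = {
    "home": [("s0", "s1"), ("s2", "s1")],   # same result twice
    "next": [("s0", "s2"), ("s1", "s3")],   # different results
    "help": [("s0", "s4")],                 # executed once
}
```

With this history, `categorize("home", "s3", history)` yields `"menu"`, while an event that does not appear in the history at all is `"global unexecuted"`.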
State Exploration Phase (Cont.)
An aggressive path construction policy
–Goal: find a new state, if one exists, at every step of execution
–A single path corresponds to the execution of one or more events
G = (V, E), where V = set of states discovered and E = set of transitions discovered (event executions)
–G is the instance of the application discovered so far
From the current state, find the shortest path to the state that has the next event to execute.

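Finding the shortest path in G can be sketched with a plain breadth-first search, assuming every event execution has the same cost. The graph encoding (a dictionary from state to `{event: next_state}`) and the predicate `has_target_event` are hypothetical choices for this example.

```python
from collections import deque

def shortest_path(graph, start, has_target_event):
    """Return the shortest event sequence from `start` to a state accepted by
    `has_target_event`, or None if no such state is reachable.

    graph maps each discovered state to {event: next_state}.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if has_target_event(state):
            return path
        for event, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None

graph = {
    "S0": {"a": "S1", "b": "S2"},
    "S1": {"c": "S3"},
    "S2": {},
    "S3": {},
}
path = shortest_path(graph, "S0", lambda s: s == "S3")
```

Here `path` is the event sequence `["a", "c"]`; a result of `None` signals that the target can only be reached after a reset.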
State Exploration Phase (Cont.)
Follow the hypothesis:
–We assume that every event in the application follows our hypothesis (except for Non-Menu events)
We assume the results of uncategorized and menu events seen at any intermediate state.
–We assume the result to be the same as what we have seen in previous executions.

State Exploration Phase (Cont.)
States:
–S0: current state
–S1, S2: intermediate states in the path
–S3: state at which to execute the next event
Events:
–E0: known transition (event already executed at state S0)
–E1: uncategorized event (note: E1 has not been executed at S1)
–E2: menu event (note: E2 has not been executed at S2)
[Figure: path from S0 through S1 and S2 to S3 via E0, E1 and E2, followed by the unexecuted event at S3; the results of E1 and E2 are assumed]

State Exploration Phase (Cont.)
Violations:
–Whether the assumption turns out to be correct or incorrect, categorize the event (if the assumption involved an uncategorized event): correct means Menu category, incorrect means Non-Menu category
–If the assumption is incorrect:
  –New state: we found a new state at a shorter path
  –Known state (worst case): we drop all information and continue from this state as usual

State Exploration Phase (Cont.)
We never execute Menu events explicitly in the state exploration phase.
–The state exploration phase tries to execute all events found, except for the categorized Menu events.
–However, we keep track of these events and their assumed resultant states (based on the execution history of these events).
Next phase: Transition Exploration Phase
–The transition exploration phase verifies all the assumptions made about these unexecuted categorized Menu events.

Transition Exploration Phase
Try to validate all the assumptions about event execution results made during the first phase.
–We do not expect to find any new state in this phase if the application follows our hypothesis.
After the state exploration phase, the only events remaining are the ones we categorized as menu events, i.e. the ones for which consecutive executions led to the same resultant state.
Build a graph G’’ with the remaining events as edges and their incident states as vertices.

Transition Exploration Phase (cont.)
Situation:
–A graph with unexecuted events as edges, and assumed resultant states for these events.
Requirement:
–Verify all assumptions, i.e. execute all the events
–Traverse all the edges of the graph G’’ at minimum cost (minimum event executions and resets)
–Equivalent to the edge-traversal problem on a graph: the Chinese Postman Tour
–Chinese Postman Tour: given a graph G, find a least-cost path that goes through every edge of the graph.

Transition Exploration Phase (cont.)
Violations:
–S2: state at which to execute the next event in the sequence
–E1, E2: event execution sequence (only E1 is a menu event, i.e. has an assumption about its resultant state)
–E1 causes a violation: its resultant state is S3
[Figure: states S0, S1, S2 and S3 with events E1, E2 and Ex; the next event to execute is Ez]

Transition Exploration Phase (cont.)
We ignore any violations
–We try to re-align with the sequence of event executions specified by the tour
If E1 results in the violation state S3, we find a path from S3 to S2 and continue with the transition exploration phase.

Transition Exploration Phase (cont.)
New state:
–The execution of an event in the transition exploration phase might result in the discovery of a new state
–In that case, go back to the state exploration phase, to follow our primary goal.
[Figure: flowchart alternating between the State Exploration Phase and the Transition Exploration Phase: a new state found during transition exploration leads back to state exploration, and the crawl finishes when no transitions remain]

Implementation and Results
The Hypercube and Menu strategies are implemented in a prototype of Rational AppScan
Performance evaluation on several real RIAs and test applications
Comparison with Breadth-First and Depth-First crawling strategies

Implementation and Results
It is reasonable to use the “shortest path” from the current state to the state which has the next transition to explore
We need an estimate of the cost of event executions and resets.
–For simplicity, we assume each event has the same cost.
–But a reset is usually more time-consuming than an average event execution.
An input parameter for an implementation of a strategy is the cost of reset (“R”)

Implementation and Results
“R” is an estimate of the relative cost of a reset compared to an average event execution.
–It can be different for different applications
We measure “R” as the ratio of the average time it takes to reset to the average time it takes to execute a random collection of events in the application.
Each strategy implemented in our tool, including Depth-First and Breadth-First, uses R for determining the shortest path

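A minimal sketch of how R could be estimated from timing samples and then used when choosing a route. The function names, the timing values and the simple "direct path versus reset" comparison are assumptions for illustration, not the tool's actual implementation.

```python
from statistics import mean

def estimate_reset_cost(reset_times, event_times):
    """R = average reset time / average event execution time."""
    return mean(reset_times) / mean(event_times)

def cheapest_route(direct_len, from_initial_len, R):
    """Compare walking directly with resetting first.

    direct_len: events on the path from the current state (None if unreachable).
    from_initial_len: events on the path from the initial state.
    Returns the chosen route and its cost in event-execution units.
    """
    via_reset = R + from_initial_len
    if direct_len is not None and direct_len <= via_reset:
        return ("direct", direct_len)
    return ("reset", via_reset)

# Hypothetical timing samples in milliseconds.
R = estimate_reset_cost(reset_times=[36, 36], event_times=[2, 2])
```

With these sample timings R is 18, so a direct path of 25 events loses to a reset followed by 3 events (cost 21), while a direct path of 5 events wins.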
Implementation and Results
Clipmarks: a real RIA for sharing parts of web pages with other users

Efficiency Comparison
We compare strategies based on
–State Discovery: the total cost incurred by the strategy to discover all (or a fraction of) the states.
–Transition Discovery: the total cost incurred by the strategy to discover all (or a fraction of) the transitions.
The total cost for discovering states (or transitions):
–Cost = (number of event executions) + R × (number of resets)
–R = 18 for Clipmarks
For state discovery, the optimal cost is also presented

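The cost formula is easy to apply in code. In the sketch below only R = 18 comes from the slides; the event and reset counts are made-up numbers used to show the arithmetic, not measured results.

```python
def crawl_cost(event_executions, resets, R):
    """Cost = (number of event executions) + R * (number of resets)."""
    return event_executions + R * resets

R_CLIPMARKS = 18  # reset cost reported for Clipmarks

# Hypothetical counts for two imaginary crawls, for illustration only.
cost_a = crawl_cost(event_executions=100, resets=5, R=R_CLIPMARKS)   # 100 + 18*5
cost_b = crawl_cost(event_executions=150, resets=1, R=R_CLIPMARKS)   # 150 + 18*1
```

The example shows why resets matter: the crawl with fewer event executions (`cost_a` = 190) ends up more expensive than the one with more executions but a single reset (`cost_b` = 168).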
[Figure: State Discovery (Logarithmic Scale)]

[Figure: Transition Discovery (Logarithmic Scale)]

Conclusions
Introduced Model-Based Crawling:
–Two meta-models and corresponding crawling strategies
Presented a performance evaluation of the strategies on a real RIA
Some of our current directions are:
–Investigating what other meta-models can be used
–The concepts of “important” states and events that should be given exploration priority
–A “sister project” on Distributed Crawling and Security Assessment of Rich Internet Applications, exploring the possibilities of using concurrent crawlers, possibly running in the cloud

Thank You