1
Software Security Research Group (SSRG), University of Ottawa
In collaboration with IBM
Solving Some Modeling Challenges when Testing Rich Internet Applications for Security
2
SSRG Members, University of Ottawa: Prof. Guy-Vincent Jourdan, Prof. Gregor v. Bochmann, Suryakant Choudhary (Master's student), Emre Dincturk (PhD student), Khaled Ben Hafaiedh (PhD student), Seyed M. Mir Taheri (PhD student), Ali Moosavi (Master's student). In collaboration with Research and Development, IBM® Security AppScan® Enterprise: Iosif Viorel Onut (PhD).
We are a research group from the University of Ottawa working on efficient crawling strategies for rich Internet applications. We also work closely with IBM Research and Development, helping IBM's web application security assessment tool, IBM Security AppScan, access and analyse newer web application technologies.
3
IBM Rational AppScan Enterprise Edition
IBM Security Solutions – Product overview
4
IBM Rational AppScan Suite – comprehensive application vulnerability management across the lifecycle: Security Requirements, Code, Build, QA, Pre-Prod, Production. Products such as AppScan Source, AppScan Build, AppScan Tester, AppScan Standard, AppScan Enterprise, AppScan onDemand and the AppScan Reporting Console cover both dynamic analysis (black box) and static analysis (white box): security requirements defined before design and implementation, security testing built into the IDE and automated in the build process, security and compliance testing incorporated into testing and remediation workflows, oversight, control, policy and audits, and outsourced testing for security audits and production site monitoring, following application security best practices (Secure Engineering Framework).
5
AppScan Enterprise Edition capabilities
Large-scale application security testing: client-server architecture designed to scale, multiple users running multiple assessments, a centralized repository of all assessments, scheduling and automation of assessments, and a REST-style API for automation and integrations. Enterprise visibility of security risk: high-level dashboards; detailed security issue reports, advisories and fix recommendations; correlation of results discovered using dynamic and static analysis techniques; over 40 compliance reports such as PCI, GLBA and SOX. Governance and collaboration: user roles and access permissions, test policies, issue management, and defect tracking system integration.
6
AppScan Enterprise Workflows
Management: review the most common security issues, view trends, assess risk, evaluate progress. Development & QA: conduct assessments, view assessment results, remediate issues, assign issue status. Compliance officers: review compliance reports. Build automation: source code analysis for security issues as part of build verification, publishing findings to AppScan Enterprise for remediation and trending (tools: AppScan Source for Automation, AppScan Standard Edition CLI). Information security: schedule and automate assessments, conduct assessments with AppScan Standard and AppScan Source, and publish findings for remediation and trending (tools: AppScan Standard Edition, AppScan Source Edition). Analysts are not listed because there is nothing specific for them.
7
View detailed security issues reports
Security issues identified with static analysis; security issues identified with dynamic analysis; aggregated and correlated results; remediation tasks; security risk assessment.
8
Obtain a high-level view of the security of your applications
Compare the number of issues across teams and applications; identify top security issues and risks; view trends in the number of issues by severity over time; monitor the progress of issue resolution.
9
Assess regulatory compliance risk
Over 40 compliance reports, including: the Payment Card Industry Data Security Standard (PCI), VISA CISP, the Children's Online Privacy Protection Act (COPPA), financial services (GLBA), healthcare services (HIPAA), and the Sarbanes-Oxley Act (SOX).
10
Introduction: Traditional Web Applications
Navigation is achieved by following links (URLs); communication is synchronous. Initially, web applications were mostly static HTML pages on the client side, and each page had a unique Uniform Resource Locator (URL). The client (browser) generates a request for a URL, the server creates the response and replies, and the client replaces the previous content with the new page. The communication is synchronous: the client has to wait for the response.
11
Introduction: Rich Internet Applications
More interactive and responsive web apps; page changes via client-side code (JavaScript); asynchronous communication. In recent years, Rich Internet Applications (RIAs) have become the new trend for web applications, defying the notion that web applications run exclusively on the server side. One of the most important shifts is the migration of server-side application logic to client-side scripting, most often in the form of JavaScript. RIAs introduced two important changes: (1) dynamic modification of the web page using client-side scripts, and (2) asynchronous communication with the server. These changes broke the notion of one URL per web page: a complex web application can now be addressed by a single URL. They made web applications more usable and interactive, but also introduced new challenges, one of the most important being the difficulty of automatically crawling such applications.
12
Crawling and web application security testing
All parts of the application must be discovered before we can analyze them for security. Why are automatic crawling algorithms important for security testing? Most RIAs are too large for manual exploration; we need both efficiency and coverage. Concerns about the security of web applications have grown along with their popularity, and one response has been the development of automated tools for testing web applications for security. The effectiveness of a security scanner depends not only on the quality and coverage of its test cases but also on how efficiently it discovers the pages (client states) of the application. This activity of automatic exploration is called crawling, and its result is a "model" of the application: the discovered client states and the ways to move from one state to another within the application. Only after obtaining such a model can a security scanner know which states exist and how to reach them in order to apply its security tests. Crawling is therefore a necessary step for security scanning, just as it is for content indexing or for testing for any other purpose, such as accessibility. Traditional crawling techniques are no longer sufficient for rich Internet applications: most RIAs are complex applications with very large state spaces, and manual crawling is impossible, or at best very time-consuming, if every page of the application is to be discovered and analyzed.
13
What we present: techniques and approaches to make web application security assessment tools perform better. How do we improve performance? By making the tools efficient, analysing only what is important and ignoring irrelevant information, and by making rich Internet applications accessible to them. Primarily motivated by the aim of making security scanners usable on RIAs, our research group has been working in collaboration with IBM to design efficient RIA crawling techniques.
14
Web application crawlers
Main components: the crawling strategy (the algorithm that guides the crawler) and state equivalence (the algorithm that indicates what should be considered new). A web application crawler is a computer program that browses a web application in an automated, methodical manner. It is intended to interact with the web application the way a user would. The crawling strategy guides the exploration process: which URL (HTML link) or event from the current web page should be explored next. State equivalence, in simple terms, helps the crawler decide whether it has already visited a given page.
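To make these two components concrete, here is a minimal sketch of a generic crawl loop in Python; the browser, strategy and equivalent objects are hypothetical stand-ins of our own, not AppScan components, and a real crawler would be considerably more involved.

# Minimal sketch of a generic RIA crawl loop. `browser`, `strategy` and
# `equivalent` are hypothetical stand-ins, not AppScan components.
def crawl(browser, start_url, strategy, equivalent):
    browser.load(start_url)
    known_states = [browser.current_state()]    # discovered client states
    model = {}                                  # state -> {event: resulting state}
    while True:
        # The crawling strategy decides which (state, event) pair to explore next.
        choice = strategy.next_exploration(model, known_states)
        if choice is None:
            return model                        # nothing left to explore
        state, event = choice
        strategy.move_to(browser, state)        # reload the URL and/or replay events
        browser.execute(event)
        new_state = browser.current_state()
        # State equivalence decides whether the reached state is genuinely new.
        match = next((s for s in known_states if equivalent(s, new_state)), None)
        if match is None:
            known_states.append(new_state)
            match = new_state
        model.setdefault(state, {})[event] = match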
15
State Equivalence of Client States
Decides whether two client states of an application should be considered different or the same. Why is it important? A poor choice can lead to infinite runs or state explosion, or to incomplete coverage of the application.
During crawling, the crawler navigates through the states of the application by following the discovered links or executing events on the current state. When the crawler reaches a state, it must know whether it has already been there; otherwise it would regard every state as new, keep exploring the same states over and over, and never finish crawling or build a meaningful model. The choice of an appropriate equivalence relation must therefore be made carefully. If the equivalence method is too stringent (such as strict equality), too many states are produced, resulting in state explosion, long runs and in some cases infinite runs. If it is too lax, client states that are in reality different may be merged, leading to an incomplete, simplified model. The choice of state equivalence also lets the purpose of the crawl be reflected in the crawling strategy: a search engine and a security testing tool, for example, may need different notions of equivalence.
16
Techniques: (1) Load-Reload, discovering non-relevant dynamic content of web pages; (2) identifying session variables and parameters. We will discuss techniques that help improve the state equivalence definition.
17
1. Load-Reload: Discovering non-relevant dynamic content of web pages
Extracting the relevant information from a page. Web pages often contain bits of content that change very often but are not important for deciding whether two states are equivalent. When determining whether two states are equivalent, we want to be able to ignore these constantly changing but irrelevant portions of the page. This matters because failing to identify data that should be ignored can cause an equivalence function to evaluate to false when it otherwise would not. Thus one of the important challenges when defining a state equivalence function is to exclude from the comparison the portions of the page/DOM that introduce false differences. The most common current solution is to manually configure the crawler, on a case-by-case basis, to ignore certain types of objects that are known to change over time, such as session IDs and cookies. This is inefficient and inaccurate, since the resulting list is usually incomplete. Another solution is to use regular expressions to identify the portions of the DOM that can be ignored; the main problems here are the difficulty of creating the regular expressions and the fact that they differ from site to site. Automating the detection of irrelevant page sections is desirable because those differences are also page-specific: the irrelevant parts vary from page to page even within the same website.
18
What we propose: reload the web page (URL) to determine which parts of the content are relevant, and calculate Delta(X), the content that changed between the two loads.
19
What we propose (2): for any web page X, Delta(X) is the collection of XPaths of the contents that are not relevant, e.g. Delta(X) = {html\body\div\, …}. A sketch of how it could be computed follows.
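As an illustration only (the function names and the use of the requests and lxml libraries are our own choices; the talk does not prescribe an implementation), the Load-Reload idea could be sketched roughly as follows: load the same URL twice, map each element's XPath to its text, and treat the XPaths whose content differs between the two loads as Delta(X), to be ignored when comparing client states.

import requests
from lxml import html

def dom_map(page_source):
    """Map each element's XPath to its direct text content."""
    root = html.fromstring(page_source)
    tree = root.getroottree()
    return {tree.getpath(el): (el.text or "").strip() for el in root.iter()}

def delta(url):
    """Delta(X): XPaths whose content differs between two loads of the same URL."""
    first = dom_map(requests.get(url).text)
    second = dom_map(requests.get(url).text)
    return {p for p in set(first) | set(second) if first.get(p) != second.get(p)}

def states_equivalent(dom_a, dom_b, ignored_paths):
    """Compare two client states while ignoring the irrelevant Delta(X) parts."""
    a = {p: t for p, t in dom_map(dom_a).items() if p not in ignored_paths}
    b = {p: t for p, t in dom_map(dom_b).items() if p not in ignored_paths}
    return a == b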
20
Example
21
Example (2)
22
What we propose (3): Delta(X) is purpose- and application-dependent.
A few ways to compute it: use proxies, supplement the automatic detection algorithm with manual identification, etc. Alternatively, one could keep track of the parts of the DOM that do not change over time and consider only those in the DOM comparison; those common parts of the DOM then act as a mask over the current DOM. Regardless of the technique, the effect is the same: Delta(X) is excluded from the data sent to the DOM comparison function.
23
2. Identifying Session Variables and Parameters
What is a session? A session is a conversation between the server and a client. Why should a session be maintained? Because HTTP is stateless: when a series of requests and responses comes from the same client, the server cannot by itself identify which client they are coming from. Web sites usually track users as they download different pages; user tracking is useful for identifying user behavior, such as purchasing behavior on a shopping-oriented website.
24
Identifying Session Variables and Parameters (2)
Session tracking methods: user authorization, hidden fields, URL rewriting, cookies, and session tracking APIs. Problems we address: redundant crawling, which can result in crawler traps or infinite runs, and session termination, which leads to incomplete coverage if the application requires a session throughout. Although URL rewriting is not the most common method, many web application servers offer it as built-in functionality so that the application can run with browsers that do not accept cookies; this can produce multiple URLs for the same web page, differing only in the session identifiers. Another problem for automated crawlers that cannot detect session identifiers is session termination: if the client fails to provide the correct session identifiers, the web application terminates the session and the crawl results in poor application coverage. Current solutions to these problems are not reliable. They are based on heuristics such as known session-ID name patterns or the entropy of the values, and they have two weaknesses: (1) they rely on expert knowledge to create, and (2) they depend on the common practices servers use to populate these values. In other words, current solutions require human intervention and are ineffective against a server that does not follow one of the common practices.
25
What we propose: make two recordings of the log-in sequence on the same website, using the same user input (e.g. the same user name and password) and the same user actions. The proposed algorithm requires that both recordings respect this rule; failing to do so leads to invalid results. It is also important that the session is invalidated between the two recordings, or that the second recording is made once the crawl has reached an out-of-session state. Since the inputs and actions are identical, values that differ between the two recordings are candidates for session variables, as sketched below.
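A rough sketch of how the comparison could be implemented (the recording format below, a list of requests with url, cookies and body_params fields, is a hypothetical format of our own, not AppScan's): because the two recordings use identical inputs and actions, any parameter whose value differs between them is a likely session variable.

from urllib.parse import urlsplit, parse_qsl

def request_params(request):
    """Collect name/value pairs from the URL query string, cookies and body."""
    params = dict(parse_qsl(urlsplit(request["url"]).query))
    params.update(request.get("cookies", {}))
    params.update(request.get("body_params", {}))
    return params

def candidate_session_variables(recording_a, recording_b):
    """Names whose values differ between two identical log-in recordings."""
    candidates = set()
    for req_a, req_b in zip(recording_a, recording_b):
        pa, pb = request_params(req_a), request_params(req_b)
        for name in set(pa) & set(pb):
            if pa[name] != pb[name]:   # same action, different value: likely session data
                candidates.add(name)
    return candidates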
26
Example
27
3. Crawling Strategies For RIAs
Crawling extracts a “model” of the application that consists of states, which are “distinct” web pages, and transitions, which are triggered by event executions. The strategy decides how the exploration of the application should proceed.
28
Standard Crawling Strategies
Breadth-First and Depth-First are not flexible: they do not adapt themselves to the application. Breadth-First often goes back to the initial page, which increases the number of reloads (loading the URL); Depth-First requires traversing long paths, which increases the number of event executions. Both follow their fixed logic regardless of the application or the information available, and in an RIA going back is costly, as the toy cost sketch below illustrates.
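To illustrate the two costs with a toy example of our own (not taken from the talk), the sketch below counts URL reloads and event executions for a tiny four-state application in which returning to an already-visited state requires reloading the URL and replaying that state's event path.

# Toy cost accounting for crawling a small application; states s0-s3 and
# events e1-e4 are made up for illustration.
app = {                       # state -> {event: resulting state}
    "s0": {"e1": "s1", "e2": "s2"},
    "s1": {"e3": "s3"},
    "s2": {"e4": "s3"},
    "s3": {},
}
depth = {"s0": 0, "s1": 1, "s2": 1, "s3": 2}   # events needed to replay from s0

def cost(exploration_order):
    """Count reloads and event executions for a given (state, event) order."""
    current, reloads, events = "s0", 0, 0
    for state, event in exploration_order:
        if state != current:          # an RIA offers no cheap way to "go back":
            reloads += 1              # reload the URL ...
            events += depth[state]    # ... and replay the path to the state
        events += 1                   # execute the chosen event
        current = app[state][event]
    return reloads, events

bfs = [("s0", "e1"), ("s0", "e2"), ("s1", "e3"), ("s2", "e4")]
dfs = [("s0", "e1"), ("s1", "e3"), ("s0", "e2"), ("s2", "e4")]
print("breadth-first:", cost(bfs))   # reloads whenever the next event is on a different state
print("depth-first:  ", cost(dfs))   # fewer reloads here; replays grow with path length

In this tiny example breadth-first needs three reloads and six event executions against one reload and four executions for depth-first; in a deep application the long paths replayed after each depth-first reset become the dominant cost instead.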
29
What we propose: model-based crawling.
A model is an assumption about the structure of the application. We specify a good strategy for crawling any application that follows the model, and we specify how to adapt the crawling strategy when the application being crawled deviates from the model.
30
What we propose (2): existing models are the Hypercube Model and the Probability Model.
Hypercube Model: events are independent, and the set of enabled events at a state is the same as at the initial state, minus the events executed to reach that state. Probability Model: statistics gathered about the results of event executions are used to guide the exploration strategy.
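As a toy illustration of the hypercube assumption (our own example, not from the talk): if the events enabled at the initial state are independent, each reachable state is characterized by the subset of those events already executed, so the states and transitions form a hypercube over the initial event set.

from itertools import combinations

initial_events = ["e1", "e2", "e3"]          # events enabled at the initial state

def hypercube_states(events):
    """Each state = the set of executed events; the remaining events stay enabled."""
    states = {}
    for r in range(len(events) + 1):
        for executed in combinations(events, r):
            executed = frozenset(executed)
            states[executed] = set(events) - executed
    return states

states = hypercube_states(initial_events)    # 2**3 = 8 states for 3 independent events
for executed in sorted(states, key=lambda s: (len(s), sorted(s))):
    print(sorted(executed), "-> still enabled:", sorted(states[executed]))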
31
Conclusion: crawling is essential for automated security testing of web applications. We introduced two techniques to enhance security testing: identifying and ignoring irrelevant web page contents, and identifying and ignoring session information. We have also worked on new crawling algorithms.
32
Thank You !
33
Demonstration: Rich Internet Application Security Testing with IBM® Security AppScan® Enterprise
34
DEMO – IBM® Security AppScan® Enterprise
IBM Security AppScan Enterprise is an automated web application scanner. We added RIA crawling capability to a prototype of AppScan, and we will demonstrate how the tool's coverage increases with this capability.
35
DEMO – Test Site (Altoro Mutual)
36
DEMO – Results Without RIA Crawling
37
DEMO - Results With RIA Crawling