Model-based Crawling of Rich Internet Applications
M. Emre Dincturk, Suryakant Choudhary, Guy-Vincent Jourdan, Gregor v. Bochmann, Iosif Viorel Onut
School of Electrical Engineering and Computer Science - University of Ottawa

Introduction – RIAs and Crawling

Crawling is the process of automatically exploring a web application to discover its states. A crawling strategy is an algorithm that decides how the crawl proceeds; Breadth-First and Depth-First are the standard crawling strategies. The result of crawling is a "model" of the application: a directed graph that represents the discovered states and the connections between them.

Rich Internet Applications (RIAs) are a new generation of web applications that are more interactive and responsive than traditional web applications. The key factor that enhances user interaction in RIAs is the ability to execute code (such as JavaScript) on the client side (the web browser) to modify the current page into a new one. Client-side code execution is triggered either by user-interaction events (e.g., mouse clicks) or by time events (e.g., time-outs). In addition, using technologies such as AJAX (Asynchronous JavaScript and XML), a RIA can communicate with the server asynchronously.

Methodology – Model-based Crawling

Our approach to designing efficient crawling strategies is called "model-based crawling". Its main idea is:
• to define a meta-model, a set of assumptions about the behavior of the application,
• to design a crawling strategy that is optimized for the case where the application follows these assumptions, and
• to specify how to adapt the strategy when crawling applications that do not satisfy these assumptions.

Results

We currently have three crawling strategies that use the model-based crawling approach.
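To make the crawling definitions above concrete, here is a minimal, self-contained sketch: it explores a toy application breadth-first and records the result as a directed graph (the model). The `APP` dictionary and all function names are hypothetical stand-ins; a real RIA crawler would drive a browser and execute DOM events instead of looking up a table.

```python
from collections import deque

# Toy stand-in for a web application: each state maps each enabled
# event to the state it leads to. In a real RIA crawler, "executing
# an event" means triggering a DOM event in a browser.
APP = {
    "s0": {"e1": "s1", "e2": "s2"},
    "s1": {"back": "s0"},
    "s2": {"e3": "s1"},
}

def crawl_breadth_first(app, initial_state):
    """Explore every (state, event) pair in breadth-first order and
    return the discovered model: a directed graph mapping each state
    to {event: resulting_state}."""
    model = {}
    frontier = deque([initial_state])
    discovered = {initial_state}
    while frontier:
        state = frontier.popleft()
        model[state] = dict(app[state])  # record outgoing transitions
        for next_state in app[state].values():
            if next_state not in discovered:
                discovered.add(next_state)
                frontier.append(next_state)
    return model

model = crawl_breadth_first(APP, "s0")
print(sorted(model))  # ['s0', 's1', 's2']
```

The efficiency question studied in this poster is precisely the order in which such a loop picks its next event: Breadth-First and Depth-First explore blindly, while a model-based strategy uses its meta-model to pick events that are likely to reveal new states sooner.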
It is important to note that although each model-based strategy makes some initial assumptions about the application's behavior, any RIA (including those that do not follow these assumptions) can be crawled with any model-based strategy.

Motivation and Aim

Two important motivations for crawling are:
• Content indexing: to make the content of an application searchable by Web users.
• Testing: to detect security vulnerabilities, to find accessibility issues, or to test functionality.
Traditional crawlers only follow links (URLs) to discover new pages. As a result, large portions of RIAs, which are reachable only by executing events, are neither searchable nor testable. For RIAs, event-based crawling techniques are needed.
• We aim at crawling RIAs efficiently. We define crawling efficiency as the ability to discover the states of the application as early as possible (using as few events and resets as possible).
• The standard crawling strategies, Breadth-First and Depth-First, are not efficient for RIAs since they are not designed for the specific needs of RIAs.

Example: Figure 2 shows a page in a RIA, a file manager for the Web; in this page, a file is currently selected. When the info button (highlighted with a red border) is clicked, the page is modified into a new one (shown in Figure 3). In a RIA, each distinct page is called a client-state (or simply a state) of the application.

Figure 1. Asynchronous communication pattern in RIAs.
Figure 2. A page in a RIA.
Figure 3. The page after the info button is clicked.

Implementation

Our algorithms are implemented in a prototype of IBM® Security AppScan® Enterprise.

Conclusion & Future Work

• The experimental results show that the model-based crawling strategies are more efficient than the other existing crawling strategies.
• Some areas we are currently working on are:
  – techniques to infer a meta-model for the application during crawling,
  – crawling algorithms for complex RIAs, i.e., applications whose state space is too large to be crawled exhaustively in a reasonable time (such as widget-based applications or applications that display a large catalogue of similar items),
  – techniques to prevent the crawler from getting stuck in one part of the application, so that a bird's-eye view of the application can be obtained earlier, and
  – improving our crawler prototype so that it can crawl a large number of real RIAs.

Acknowledgments

This work is supported in part by IBM and the Natural Sciences and Engineering Research Council of Canada.

DISCLAIMER: The views expressed in this poster are the sole responsibility of the authors and do not necessarily reflect those of the Center for Advanced Studies of IBM.

Model-based Crawling Strategies

1. The Hypercube Strategy is based on the Hypercube meta-model, whose assumptions are:
• the state reached by a sequence of events from the initial state is independent of the order of the events, and
• the events enabled at a state are those at the initial state minus those executed to reach that state.
With these assumptions, the model of the application is anticipated to be a hypercube structure. The Hypercube strategy is an optimal strategy for applications that follow the hypercube meta-model.

Figure 4. A four-dimensional hypercube is the anticipated model for an application whose initial state has 4 events.

2. The Menu Strategy is based on the Menu meta-model, whose assumption is:
• the result of an event execution is independent of the state in which the event is executed, and always leads to the same resultant state.
The Menu strategy first categorizes each event into one of three categories (menu, self-loop, or other) and prioritizes events based on their category and their closeness to the current state.
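The categorization step of the Menu strategy can be sketched as follows. This is a hypothetical illustration rather than the authors' implementation: it classifies an event from the (source state, resulting state) pairs observed for it so far.

```python
def categorize_event(observations):
    """Classify an event under the Menu meta-model, given the list of
    (source_state, resulting_state) pairs observed for it so far:
      - 'self-loop': the event always leaves the current state unchanged,
      - 'menu':      the event always leads to the same resulting state,
      - 'other':     anything else."""
    if all(src == dst for src, dst in observations):
        return "self-loop"
    if len({dst for _, dst in observations}) == 1:
        return "menu"
    return "other"

# A menu event leads to the same state no matter where it is executed:
print(categorize_event([("s1", "s9"), ("s2", "s9")]))  # menu
print(categorize_event([("s1", "s1"), ("s2", "s2")]))  # self-loop
print(categorize_event([("s1", "s2"), ("s3", "s4")]))  # other
```

Under the Menu assumption, once an event has been classified as "menu", executing it again from an unexplored state is unlikely to discover anything new, so the strategy gives priority to unclassified events close to the current state.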
Figure 5 (top). A model showing two "menu" events (e1 and e2) and a "self-loop" event (e3).
Figure 6 (right). A page in a RIA with menu events (within the red border).

3. The Probability Strategy is based on a statistical model whose assumption is:
• an event that was often observed to lead to new states in the past is more likely to lead to new states in the future.
The strategy simply prioritizes events based on their probability of discovering a new state and their closeness to the current state. The probability of an event e is estimated dynamically during crawling using the Bayesian formula P(e) = (p_s + S(e)) / (p_n + N(e)), where S(e) is the number of times the event e discovered a new state, N(e) is the number of times e was executed from different states, and p_s and p_n are pre-set parameters that set an initial probability.

We have also implemented a tool to visualize the extracted model of an application.

Figure 7. Architecture of our RIA crawler.
Figure 8. Data-flow diagram for the visualization tool.

Crawling Efficiency (costs of discovering all states)

Figure 9. Performance of different strategies in discovering the states of 4 applications (log scale).

Costs of Complete Crawls

Table 1. The costs, including the cost of resets, at the time the crawl terminates.
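The Probability strategy's estimate can be sketched as below. The exact formula did not survive in this transcript, so this is a reconstruction from the stated roles of S(e), N(e), p_s and p_n: P(e) = (p_s + S(e)) / (p_n + N(e)), which yields the initial probability p_s / p_n before any observations. The default values chosen here are hypothetical.

```python
def event_probability(s_e, n_e, p_s=1.0, p_n=2.0):
    """Estimated probability that executing event e discovers a new
    state. s_e = S(e), the number of times e discovered a new state;
    n_e = N(e), the number of times e was executed from different
    states. p_s and p_n act as a prior: with no observations the
    estimate is p_s / p_n (0.5 with the hypothetical defaults here)."""
    return (p_s + s_e) / (p_n + n_e)

print(event_probability(0, 0))  # 0.5  (initial probability)
print(event_probability(3, 3))  # 0.8  (often productive -> prioritized)
print(event_probability(0, 8))  # 0.1  (rarely productive -> deprioritized)
```

The crawler would then pick, among the unexecuted (state, event) pairs, one maximizing this estimate, discounted by the cost of reaching that state from the current one.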