TEMPLATE DESIGN © 2008 www.PosterPresentations.com Non-URL-Based Crawling strategy :  In a RIA one URL corresponds to many states of DOM. Unlike traditional.

Slides:

Advertisements

Similar presentations

An Overview of ABFT in cloud computing

Advertisements

CONCEPTUAL WEB-BASED FRAMEWORK IN AN INTERACTIVE VIRTUAL ENVIRONMENT FOR DISTANCE LEARNING Amal Oraifige, Graham Oakes, Anthony Felton, David Heesom, Kevin.

Pulan Yu School of Informatics Indiana University Bloomington Web service based Varuna.Net.

IEEE/FIPA WG Mobile Agents Ulrich Pinsdorf Fraunhofer-Institute IGD, Germany Dept. Security Technology

Building Cloud-ready Video Transcoding System for Content Delivery Networks(CDNs) Zhenyun Zhuang and Chun Guo Speaker: 饒展榕.

1 MDV, April 2010 Some Modeling Challenges when Testing Rich Internet Applications for Security Kamara Benjamin, Gregor v. Bochmann Guy-Vincent Jourdan,

IP Traceback in Cloud Computing Through Deterministic Flow Marking Mouiad Abid Hani Presentation figures are from references given on slide 21. By Presented.

Towards Autonomic Adaptive Scaling of General Purpose Virtual Worlds Deploying a large-scale OpenSim grid using OpenStack cloud infrastructure and Chef.

Embedded Web Hyung-min Koo. 2 Table of Contents Introduction of Embedded Web Introduction of Embedded Web Advantages of Embedded Web Advantages of Embedded.

Design of Fault Tolerant Data Flow in Ptolemy II Mark McKelvin EE290 N, Fall 2004 Final Project.

Sanzaru Capability-Based Interactions for Web Applications Raluca Sauciuc Shaunak Chatterjee University of California, Berkeley Motivation Limitations.

Optimal Crawling Strategies for Web Search Engines Wolf, Sethuraman, Ozsen Presented By Rajat Teotia.

Web Programming Language Dr. Ken Cosh Week 1 (Introduction)

Client/Server Architectures

CHAPTER FIVE Enterprise Architectures. Enterprise Architecture (Introduction) An enterprise-wide plan for managing and implementing corporate data assets.

Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.

TEMPLATE DESIGN © Efficient Crawling of Complex Rich Internet Applications Ali Moosavi, Salman Hooshmand, Gregor v. Bochmann,

Software Security Research Group (SSRG), University of Ottawa in collaboration with IBM Software Security Research Group (SSRG), University of Ottawa In.

Solving Some Modeling Challenges when Testing Rich Internet Applications for Security Software Security Research Group (SSRG), University of Ottawa In.

TEMPLATE DESIGN © Non-URL-Based Crawling strategy :  In a RIA one URL corresponds to many states of DOM. Unlike traditional.

On P2P Collaboration Infrastructures Manfred Hauswirth, Ivana Podnar, Stefan Decker Infrastructure for Collaborative Enterprise, th IEEE International.

2011/08/09 Sunwook Bae. Contents Paper Info Introduction Overall Architecture Resource Management Evaluation Conclusion References.

An Introduction to IBM Systems Director

Consensus-based Distributed Estimation in Camera Networks - A. T. Kamal, J. A. Farrell, A. K. Roy-Chowdhury University of California, Riverside

A Web Crawler Design for Data Mining

Secure Search Engine Ivan Zhou Xinyi Dong. Introduction  The Secure Search Engine project is a search engine that utilizes special modules to test the.

CH2 System models.

DISTRIBUTED COMPUTING

Active Monitoring in GRID environments using Mobile Agent technology Orazio Tomarchio Andrea Calvagna Dipartimento di Ingegneria Informatica e delle Telecomunicazioni.

Software Security Research Group (SSRG), University of Ottawa in collaboration with IBM Software Security Research Group (SSRG), University of Ottawa In.

1 Vulnerability Analysis and Patches Management Using Secure Mobile Agents Presented by: Muhammad Awais Shibli.

Parallel and Distributed IR. 2 Papers on Parallel and Distributed IR Introduction Paper A: Inverted file partitioning schemes in Multiple Disk Systems.

Composing Adaptive Software Authors Philip K. McKinley, Seyed Masoud Sadjadi, Eric P. Kasten, Betty H.C. Cheng Presented by Ana Rodriguez June 21, 2006.

The roots of innovation Future and Emerging Technologies (FET) Future and Emerging Technologies (FET) The roots of innovation Proactive initiative on:

Jon Perez, Mikel Azkarate-askasua, Antonio Perez

Locating Mobile Agents in Distributed Computing Environment.

Advanced Computer Networks Topic 2: Characterization of Distributed Systems.

A security framework combining access control and trust management for mobile e-commerce applications Gregor v.Bochmann, Zhen Zhang, Carlisle Adams School.

Building Rich Web Applications with Ajax Linda Dailey Paulson IEEE – Computer, October 05 (Vol.38, No.10) Presented by Jingming Zhang.

Using SaaS and Cloud computing For “On Demand” E Learning Services Application to Navigation and Fishing Simulator Author Maha KHEMAJA, Nouha AMMARI, Fayssal.

Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.

By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.

Mobile Agent Migration Problem Yingyue Xu. Energy efficiency requirement of sensor networks Mobile agent computing paradigm Data fusion, distributed processing.

DynaRIA: a Tool for Ajax Web Application Comprehension Dipartimento di Informatica e Sistemistica University of Naples “Federico II”, Italy Domenico Amalfitano.

PIMRC 2007 A lightweight approach for providing Location Based Content Retrieval Anastasios Zafeiropoulos, Emmanuel Solidakis, Stavroula Zoi, Nikolaos.

Research of P2P Architecture based on Cloud Computing Speaker : 吳靖緯 MA0G0101.

1 Gregor v. Bochmann, University of Ottawa ICTSS 2015 Sharjah and Dubai (UAE), November 2015 Gregor v. Bochmann School of Electrical Engineering and Computer.

Crawling Rich Internet Applications: The State of the Art Software Security Research Group (SSRG) University of Ottawa In collaboration with IBM Suryakant.

TEMPLATE DESIGN © Non-URL-Based Crawling strategy :  In a RIA one URL corresponds to many states of DOM. Unlike traditional.

Web Technologies Lecture 6 State preservation. Motivation How to keep user data while navigating on a website? – Authenticate only once – Store wish list.

Euro-Par, HASTE: An Adaptive Middleware for Supporting Time-Critical Event Handling in Distributed Environments ICAC 2008 Conference June 2 nd,

TEMPLATE DESIGN © Crawling is the process of automatically exploring a web application to discover the states of the application.

TUMOR BURDEN ANALYSIS ON CT BY AUTOMATED LIVER AND TUMOR SEGMENTATION RAMSHEEJA.RR Roll : No 19 Guide SREERAJ.R ( Head Of Department, CSE)

1 EUROPEAN COMMISSION Tempus JEP – – 2006 Supporting and facilitating active uptake to Information and Communication Technologies for University.

National Aeronautics and Space Administration Jet Propulsion Laboratory March 17, 2009 Workflow Orchestration: Conducting Science Efficiently on the Grid.

VIEWS b.ppt-1 Managing Intelligent Decision Support Networks in Biosurveillance PHIN 2008, Session G1, August 27, 2008 Mohammad Hashemian, MS, Zaruhi.

UbiCrawler : a scalable fully distributed Web crawler P. Boldi, B. Codenotti, M. Santini, and S. Vigna, SPE Vol.34 No.2 pages , Feb Kyoung.

1 Visual Computing Institute | Prof. Dr. Torsten W. Kuhlen Virtual Reality & Immersive Visualization Till Petersen-Krauß | GUI Testing | GUI.

A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.

TEMPLATE DESIGN © Automatic Classification of Parameters and Cookies Ali Reza Farid Amin 1, Gregor v. Bochmann 1, Guy-Vincent.

WEB STRUCTURE MINING SUBMITTED BY: BLESSY JOHN R7A ROLL NO:18.

Software Security Research Group (SSRG),

Authors: Jiang Xie, Ian F. Akyildiz

Improving searches through community clustering of information

UbiCrawler: a scalable fully distributed Web crawler

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S

Steven Whitham Jeremy Woods

Model-Driven Analysis Frameworks for Embedded Systems

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S

Presentation transcript:

TEMPLATE DESIGN © Non-URL-Based Crawling strategy :  In a RIA one URL corresponds to many states of DOM. Unlike traditional websites in which every call to server would change the whole DOM and the page URL, RIA relies on small AJAX updates that does not necessarily modify the page URL:  Traditional distributed crawlers rely heavily on URL in order to partition the search space. Underlying assumption for this strategy is a one to one correspondence between the URL and the state of DOM which does not hold in RIA.  Therefore we propose to partition the search space based on events. Crawling Strategy: Reduce the workload by choosing the events to execute using Greedy algorithm. Crawling Efficiency : Discover states as soon as possible, using Probabilistic model. Reference: [Benjamin 2010] K. Benjamin, G. v. Bochmann, G.-V. Jourdan and V. Onut, Some modeling challenges when testing Rich Internet Applications for security, First Intern. Workshop on Modeling and Detection of Vulnerabilities (MDV 2010), Paris, France, April pages. Pdist-RIA Crawler: A P2P architecture to crawl RIAs Seyed M. Mirtaheri, Gregor v. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut School of Information Technology and Engineering - University of Ottawa Introduction Rich Internet Applications (RIAs) allow better user interaction and responsiveness than traditional web applications. Thanks to new technologies like AJAX (Asynchronous JavaScript and XML), Rich Internet Applications can communicate with the server asynchronously. This allows continuous user interactions. Figure 1: AJAX enabled RIA communication pattern. Test Beds Acknowledgments This work is supported in part by Center for Advanced Studies, IBM Canada. DISCLAIMER The views expressed in this poster are the sole responsibility of the authors and do not necessarily reflect those of the Center for Advanced Studies of IBM. Motivation and Aim Future Work We are currently working on distributed crawling of RIA in a cloud environment. We plan to add fault tolerance to our strategy so that if some of the nodes crash rest of the nodes continue without interruption. Once we have a working implementation of the system we plan to optimize it based on different infrastructure parameters such as cost of communication or the processing power available to different nodes. Experimental Results Security of RIA and automating security testing are important, ongoing, and growing concerns. One important aspect of this automation is the crawling of RIAs i.e. reaching all possible states of the application from the initial state. Being able to do so automatically is also valuable for search engines and accessibility assessment. Background Crawling Strategy :  Breath-First Search  Bounded Depth-First Search  Based on page weight  Model Based Crawling  Hypercube Model  Menu Model  Greedy algorithm  Probabilistic model Partitioning strategies : Mostly use server related matrix as primary tool to partition search space:  Page URL  Server IP address  Server geographical location  [ Loo 2004] describes distributed web crawling by hashing the URL Architecture of AJAX Crawl (Appeared in [Duda 2009]) Figure 2: [Loo 2004] System Architecture References: [Amalfitano 2010] D. Amalfitano, A.R. Fasoline, P. Tramontana, Techniques and tools for Rich Internet Application testing, in Proc. WSE, 2010, pp [Mesbah 2008] A. Mesbah and A. van Deursen, A Component- and Push-based Architectural Style for Ajax Applications. 2008, Journal of Systems and Software (JSS) 81(12): [Duda 2009] C. Duda, G. Frey, D. Kossmann, R. Matter, C. Zhou, AJAX Crawl: Making AJAX Applications Searchable, Proc. IEEE Intern. Conf. on Data Engineering, Shanghai, 2009, pp. 78- [Loo 2004] B.T. Loo, O. Cooper, S.Krishnamurthy, Distributed Web Crawling over DHTs, Technical report, University of California, Berkeley, 2004, [Exposto 2008] J. Exposto, J. Macedo, A. Pina, A. Alves and J. Rufino, Efficient partitioning strategies for distributed web crawling, in Information Networking: Towards Ubiquitous Networking and Services, Springer LNCS 5200 (2008), pp [Boldi 2003] P. Boldi, B. Codenotti, M. Santini, S.Vigna, UbiCrawler: A Scalable Fully Distributed Web Crawler, Software: Practice & Experience, Vol. 34, 2003, p [Dincturk 2014] Dincturk, Mustafa Emre and Jourdan, Guy-Vincent and Bochmann, Gregor von and Onut, Iosif Viorel: Model Based Crawling, ACM Transactions on the WEB [Dincturk 2012] Dincturk, Mustafa Emre and Choudhary, Suryakant and Bochmann, Gregor von and Jourdan, Guy-Vincent and Onut, Iosif Viorel: A Statistical Approach for Efficient Crawling of Rich Internet Applications, Proceedings of the 12th international conference on Web engineering Crawling of RIA applications is an expensive and time consuming process due to their large number of states. To accelerate this operation we distribute the operation over many nodes in an elastic cloud environment. Algorithm Crawling Efficiency Nodes act autonomously and independently. Each node starts at Init state, when get tasks go to Active state, when has nothing to do goes to idle state, and finally terminates when termination order arrives. Proposed Architecture A virtual ring is created based on breath-first-search traversal of the nodes. A termination token goes around this wring that keeps the list of states IDs and the number of states visited by each node. When all states are visited on all nodes, a termination order is broadcasted. Crawling efficiency measures how early in the crawl new states are discovered: Conclusion Distributed Greedy algorithm has the best performance in terms to total time it takes to crawl a website. Distributed Probabilistic model is the most efficient algorithm and discovers states early in the crawl. Four target web applications used to measure the performance of the distributed web crawler: