Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering

1 A Grid-Based Middleware’s Support for Processing Distributed Data Streams
Liang Chen (Advisor: Gagan Agrawal), Computer Science & Engineering

2 Introduction-Motivation
Data stream processing and analysis. A data stream consists of data that arrive continuously and need to be processed in real time. Data stream applications include online network intrusion detection, sensor networks, network fault management systems for telecommunication network elements, and computer-vision-based surveillance. Common features of data streams: continuous arrival, enormous volume, real-time constraints, and data sources that may be distributed.

3 Introduction-Motivation
A Network Fault Management System analyzes streams of alarm messages. (Figure: switches in Network X and the Network Fault Management System.)

4 Introduction-Motivation
Computer Vision Based Surveillance

5 Introduction-Motivation
Challenges and possible solutions. Challenge 1: the processing is data- and/or computation-intensive. (Figure: switches in Network X.)

6 Introduction-Motivation
Challenges and possible solutions. Challenge 1: the processing is data- and/or computation-intensive. Solution: Grid computing technologies. (Figure: switches in the network.)

7 Introduction-Motivation
Challenges and possible solutions. Challenge 1: the processing is data- and/or computation-intensive. Solution: Grid computing technologies. Challenge 2: real-time analysis is required. Solution: self-adaptation functionality is desired.

8 Introduction-Motivation
From the point of view of developers interested in data stream applications: they would like to concentrate on the applications themselves, and would not like to spend effort on Grid computing or on adaptation functionality.

9 Introduction-Our Approach
A middleware that is based on Grid standards and tools and provides self-adaptation functionality. The middleware is referred to as GATES (Grid-based AdapTive Execution on Stream). Application stages are automatically distributed to suitable computing nodes, and the application automatically adapts to a varying environment without the developer having to implement the adaptation algorithms.

10 System Architecture and Design (From Application Perspective)
Break the task down into several sub-tasks so that the sub-tasks form a pipeline. Implement each sub-task in Java. Write an XML configuration file so that the sub-tasks can be deployed automatically, i.e., specify how many stages (sub-tasks) the pipeline has and where the code implementing each sub-task resides. Launch the application by running a Java program (StreamClient.class) provided by GATES.
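A local, single-process sketch of the first two steps (splitting a task into pipelined sub-tasks and writing each one in Java) may help. The `Stage` interface and the `chain` and `runDemo` names are hypothetical; this is not the GATES `StreamProcessing` API, and it covers neither the XML configuration nor deployment through `StreamClient`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LocalPipelineDemo {
    // Hypothetical stage abstraction standing in for one pipelined sub-task.
    interface Stage<I, O> {
        O process(I input);
    }

    // Compose two sub-tasks so that the output of the first feeds the second.
    static <A, B, C> Function<A, C> chain(Stage<A, B> first, Stage<B, C> second) {
        return a -> second.process(first.process(a));
    }

    // Run every raw record through a two-stage pipeline: parse, then threshold.
    static List<Boolean> runDemo(List<String> records) {
        Stage<String, Integer> parse = Integer::parseInt;   // Stage A
        Stage<Integer, Boolean> aboveTen = v -> v > 10;     // Stage B
        Function<String, Boolean> pipeline = chain(parse, aboveTen);
        List<Boolean> results = new ArrayList<>();
        for (String raw : records) {
            results.add(pipeline.apply(raw));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(List.of("5", "12", "40"))); // [false, true, true]
    }
}
```

In GATES each stage would instead run as a separately deployed service, with the XML file describing how the stages are laid out.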

11 System Architecture and Design (Architecture)

12 System Architecture and Design (Architecture)
(Figure: Stages A, B and C of an application running on GATES. Legend: Grid services of GATES; stages of an application; queues between Grid services; buffers for applications.)
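The architecture couples stages through queues. As a rough single-JVM analogue, assuming threads stand in for GATES Grid services, the sketch below connects Stages A, B and C with bounded `java.util.concurrent.ArrayBlockingQueue` instances. The class name, the arithmetic in each stage and the end-of-stream sentinel are illustrative assumptions, not part of GATES.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

class QueuedPipelineDemo {
    // End-of-stream sentinel (assumption: no real item equals this value).
    static final Integer END = Integer.MIN_VALUE;

    // Start a thread that takes items from `in`, applies `op`, and puts results on `out`.
    static void stage(BlockingQueue<Integer> in, BlockingQueue<Integer> out, UnaryOperator<Integer> op) {
        new Thread(() -> {
            try {
                while (true) {
                    Integer item = in.take();
                    if (item.equals(END)) {   // forward the end marker and stop
                        out.put(END);
                        return;
                    }
                    out.put(op.apply(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    static List<Integer> run(List<Integer> source) throws InterruptedException {
        // Bounded queues: source->A, A->B, B->C, C->sink.
        BlockingQueue<Integer> qa = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> qb = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> qc = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> sink = new ArrayBlockingQueue<>(4);
        stage(qa, qb, x -> x + 1);    // Stage A
        stage(qb, qc, x -> x * 2);    // Stage B
        stage(qc, sink, x -> x - 3);  // Stage C

        // Feed the source from its own thread so a full queue never blocks the reader below.
        new Thread(() -> {
            try {
                for (Integer x : source) qa.put(x);
                qa.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        List<Integer> results = new ArrayList<>();
        for (Integer r = sink.take(); !r.equals(END); r = sink.take()) {
            results.add(r);
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(1, 2, 3))); // [1, 3, 5]
    }
}
```

Because the queues are bounded, a slow stage lets its input queue fill up and eventually blocks the stage upstream; a backlog of this kind is roughly what a queue-based adaptation scheme can observe.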

13 System Architecture and Design (Example)
public class SamplingStage implements StreamProcessing {
    private double samplingRatio;
    public void init() { … }
    public void work(Buffer in, Buffer out) {
        // Give GATES information about the adjustment parameter (bounds min and max)
        GATES.Information_About_Adjustment_Parameter(min, max, 1);
        while (true) {
            // Use the parameter value currently suggested by GATES
            samplingRatio = GATES.getSuggestedParameter();
            Image img = getFromBufferInGATES(in);
            Image imgSample = sampling(img, samplingRatio);
            putToBufferInGATES(imgSample, out);
        }
    }
}

14 Self-adaptation Algorithm
Given the long-term factor of the queue at each stage, we want to improve the method of adjusting the value of an adaptation parameter. Two questions arise: (1) Should the adaptation parameter be modified, and if so, in which direction? (2) How should its new value be found?
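The slides do not define how a queue's long-term factor is computed. One plausible reading, assumed here, is a smoothed measure of queue occupancy: the sketch below keeps an exponentially weighted moving average of queue length over capacity. The weight `alpha` and the 0.3/0.7 thresholds separating LIGHT, NORMAL and HEAVY are arbitrary choices for illustration, not values from GATES.

```java
class LongTermFactor {
    // Coarse load states (hypothetical names).
    enum Load { LIGHT, NORMAL, HEAVY }

    private final double alpha;  // weight given to the newest observation (assumed)
    private double factor;       // smoothed occupancy in [0, 1]; starts at 0 (empty queue)

    LongTermFactor(double alpha) {
        this.alpha = alpha;
    }

    // Record one observation of the queue: current length relative to capacity.
    void observe(int queueLength, int capacity) {
        double occupancy = (double) queueLength / capacity;
        factor = alpha * occupancy + (1 - alpha) * factor;
    }

    double value() {
        return factor;
    }

    // Map the smoothed factor to a load state (thresholds are assumptions).
    Load state() {
        if (factor < 0.3) return Load.LIGHT;
        if (factor > 0.7) return Load.HEAVY;
        return Load.NORMAL;
    }

    public static void main(String[] args) {
        LongTermFactor f = new LongTermFactor(0.5);
        f.observe(10, 10);  // factor = 0.5
        f.observe(10, 10);  // factor = 0.75
        System.out.println(f.state()); // HEAVY
    }
}
```

Smoothing keeps a single burst from flipping the state, which is presumably why a long-term rather than instantaneous measure is used.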

15 Enhanced Self-adaptation Algorithm
Should the adaptation parameter be modified, and if so, in which direction? The answer depends on the load status of the queues at two consecutive stages.
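The slides state only that the direction depends on the load of the queues at two consecutive stages; the detailed table of load states did not survive. The rule below is an assumed example for a parameter where a larger value means more work and more output (for instance a higher sampling ratio). The enum names and the rule itself are hypothetical.

```java
class AdjustmentDirection {
    enum Load { LIGHT, NORMAL, HEAVY }
    enum Direction { INCREASE, DECREASE, KEEP }

    // current: load of this stage's input queue; next: load of the next stage's input queue.
    static Direction decide(Load current, Load next) {
        // A backlog at either stage means the pipeline cannot keep up: do less work.
        if (current == Load.HEAVY || next == Load.HEAVY) return Direction.DECREASE;
        // Both queues lightly loaded: spare capacity, so raise the parameter.
        if (current == Load.LIGHT && next == Load.LIGHT) return Direction.INCREASE;
        // Otherwise treat the state as balanced and leave the parameter unchanged.
        return Direction.KEEP;
    }

    public static void main(String[] args) {
        for (Load cur : Load.values()) {
            for (Load next : Load.values()) {
                System.out.println(cur + " / " + next + " -> " + decide(cur, next));
            }
        }
    }
}
```

Under this rule adjustment stops once no queue is overloaded and the two queues are not both lightly loaded; the convergent states of the actual algorithm (slide 16) may differ.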

16 Enhanced Self-adaptation Algorithm
(Figure: load states of the queues in a pipeline of Stages A, B and C, with axis "Performance Parameter", grouped into convergent states and non-convergent states.)

17 Enhanced Self-adaptation Algorithm
Summary of Load States

18 Enhanced Self-adaptation Algorithm
How to determine the new value of the adaptation parameter. Previous method: linear update, i.e., increase or decrease the value by a fixed amount; a proper fixed amount is hard to find. Enhanced method: binary tree search.

19 Enhanced Self-adaptation Algorithm
(Figure: one binary search step. Before the update: left border, current value, new value, right border. After the update: left border, current value, right border.)
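The binary search update can be sketched as follows, assuming the parameter lives in a known range [min, max]. Each step moves the value halfway toward the border in the chosen direction and turns the old value into the opposite border, so the search interval halves; unlike a linear update, no fixed step size has to be tuned. The class and method names are hypothetical.

```java
class BinarySearchUpdate {
    private double left;     // lower border of the search interval
    private double right;    // upper border of the search interval
    private double current;  // current value of the adaptation parameter

    BinarySearchUpdate(double min, double max, double initial) {
        left = min;
        right = max;
        current = initial;
    }

    // Move halfway toward the right border; the old value becomes the new left border.
    double increase() {
        left = current;
        current = (current + right) / 2;
        return current;
    }

    // Move halfway toward the left border; the old value becomes the new right border.
    double decrease() {
        right = current;
        current = (left + current) / 2;
        return current;
    }

    double value() {
        return current;
    }

    public static void main(String[] args) {
        BinarySearchUpdate p = new BinarySearchUpdate(0.0, 1.0, 0.5);
        System.out.println(p.increase()); // 0.75
        System.out.println(p.decrease()); // 0.625
        System.out.println(p.increase()); // 0.6875
    }
}
```

Once the interval has shrunk, it would have to be widened again when conditions change; the slides do not describe such a reset policy, so it is omitted here.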

20

21 Data Mining Applications & System Evaluation
Two data mining applications. CluStream: clustering data arriving in data streams.
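CluStream (Aggarwal et al.) summarizes the stream online with micro-clusters, each stored as an additive cluster-feature vector. The sketch below is a simplified reading, not the application's code: it keeps per-dimension sums and sums of squares plus timestamp statistics, and shows that two summaries merge without the raw points. It omits the online policy for creating, deleting and merging micro-clusters and the offline macro-clustering step.

```java
import java.util.Arrays;

class MicroCluster {
    final double[] cf1x;  // per-dimension sum of coordinates
    final double[] cf2x;  // per-dimension sum of squared coordinates
    double cf1t;          // sum of timestamps (kept for CluStream's time-based parts; unused here)
    double cf2t;          // sum of squared timestamps (likewise unused here)
    long n;               // number of points absorbed

    MicroCluster(int dims) {
        cf1x = new double[dims];
        cf2x = new double[dims];
    }

    // Absorb one point arriving at time t; the summary is updated incrementally.
    void add(double[] x, double t) {
        for (int i = 0; i < x.length; i++) {
            cf1x[i] += x[i];
            cf2x[i] += x[i] * x[i];
        }
        cf1t += t;
        cf2t += t * t;
        n++;
    }

    // Summaries are additive, so merging is a component-wise sum.
    void merge(MicroCluster other) {
        for (int i = 0; i < cf1x.length; i++) {
            cf1x[i] += other.cf1x[i];
            cf2x[i] += other.cf2x[i];
        }
        cf1t += other.cf1t;
        cf2t += other.cf2t;
        n += other.n;
    }

    double[] centroid() {
        double[] c = new double[cf1x.length];
        for (int i = 0; i < c.length; i++) c[i] = cf1x[i] / n;
        return c;
    }

    // Root-mean-square distance of the absorbed points from the centroid.
    double radius() {
        double sum = 0;
        for (int i = 0; i < cf1x.length; i++) {
            double mean = cf1x[i] / n;
            sum += cf2x[i] / n - mean * mean;
        }
        return Math.sqrt(Math.max(sum, 0));
    }

    public static void main(String[] args) {
        MicroCluster a = new MicroCluster(2);
        a.add(new double[] {1, 2}, 1);
        MicroCluster b = new MicroCluster(2);
        b.add(new double[] {3, 4}, 2);
        a.merge(b);
        System.out.println(Arrays.toString(a.centroid()) + " " + a.radius()); // [2.0, 3.0] 1.4142135623730951
    }
}
```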

22 Data Mining Applications & System Evaluation
Dist-Freq-Counting: finding frequent itemsets from distributed streams
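A heavily simplified sketch of the distributed counting idea: each source counts items locally, and a coordinator merges the counts and reports those whose global frequency reaches a support threshold. It counts single items exactly, whereas the application mines itemsets from streams, where approximate, bounded-memory counting would typically be needed. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DistributedFrequentItems {
    // Count the items seen at one source.
    static Map<String, Long> localCounts(List<String> stream) {
        Map<String, Long> counts = new HashMap<>();
        for (String item : stream) counts.merge(item, 1L, Long::sum);
        return counts;
    }

    // Merge per-source counts; keep items whose share of all occurrences is at least `support`.
    static Set<String> frequentItems(List<Map<String, Long>> perSource, double support) {
        Map<String, Long> global = new HashMap<>();
        long total = 0;
        for (Map<String, Long> counts : perSource) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Long::sum);
                total += e.getValue();
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> e : global.entrySet()) {
            if (e.getValue() >= support * total) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> siteA = localCounts(List.of("a", "b", "a", "c"));
        Map<String, Long> siteB = localCounts(List.of("a", "b", "d", "b"));
        System.out.println(frequentItems(List.of(siteA, siteB), 0.3)); // [a, b]
    }
}
```

Only the compact per-source counts travel to the coordinator, not the raw streams.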

23 Data Mining Applications & System Evaluation

24 Data Mining Applications & System Evaluation

25 Data Mining Applications & System Evaluation

26 Data Mining Applications & System Evaluation

27 Data Mining Applications & System Evaluation

28 Data Mining Applications & System Evaluation

29 Data Mining Applications & System Evaluation

30 Data Mining Applications & System Evaluation

31 Data Mining Applications & System Evaluation

32 Resource Allocation Schemes
Problem definition: Grid resource scheduling for pipelined processing and real-time distributed streaming applications. Mapping workflows onto the Grid is an NP-complete problem. Static part: the resource allocation problem for GATES is to determine a deployment configuration. Dynamic part.

33 Static Allocation Scheme
Static allocation problem: determining a deployment configuration. Objective: automatically generate a deployment configuration from information about the available resources. A deployment configuration specifies the number of data sources and their locations; the destination; the number of stages in the pipeline; the number of instances of each stage; how the instances connect to each other; and the node on which each instance is placed.
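The slides list what a deployment configuration contains but not how the static scheme chooses one. The sketch below covers only the last item, placing stage instances on nodes, with a greedy heuristic assumed for illustration: each instance goes to the node with the most remaining capacity. Capacities, demands and all names are hypothetical, and the heuristic is not claimed to be the authors' scheme.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyPlacement {
    // A candidate computing node with an abstract amount of free capacity.
    static final class Node {
        final String name;
        double free;

        Node(String name, double free) {
            this.name = name;
            this.free = free;
        }
    }

    // One stage instance and the node chosen for it.
    static final class Placement {
        final String stage;
        final int instance;
        final String node;

        Placement(String stage, int instance, String node) {
            this.stage = stage;
            this.instance = instance;
            this.node = node;
        }

        @Override
        public String toString() {
            return stage + "#" + instance + "->" + node;
        }
    }

    // Place instances[s] copies of stage s, each needing demand[s] capacity,
    // always on the node with the most remaining capacity (first node wins ties).
    static List<Placement> place(String[] stages, int[] instances, double[] demand, List<Node> nodes) {
        List<Placement> plan = new ArrayList<>();
        for (int s = 0; s < stages.length; s++) {
            for (int k = 0; k < instances[s]; k++) {
                Node best = null;
                for (Node n : nodes) {
                    if (best == null || n.free > best.free) best = n;
                }
                if (best == null || best.free < demand[s]) {
                    throw new IllegalStateException("not enough capacity for stage " + stages[s]);
                }
                best.free -= demand[s];
                plan.add(new Placement(stages[s], k, best.name));
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("n1", 10), new Node("n2", 6));
        List<Placement> plan = place(new String[] {"A", "B", "C"}, new int[] {1, 2, 1},
                new double[] {4, 3, 2}, nodes);
        System.out.println(plan); // [A#0->n1, B#0->n1, B#1->n2, C#0->n1]
    }
}
```

Since mapping workflows onto the Grid is NP-complete, a heuristic of this kind yields a feasible but not necessarily optimal configuration.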

34 Static Allocation Scheme
Examples of deployment configurations

35 Related work
Grid resource allocation: Condor, Realtor, ACDS, etc. Main difference: our work focuses on Grid resource allocation for workflow applications. Adaptation through a middleware: Cheng et al.’s adaptation framework, SWiFT, Conductor, DART, ROAM. Main difference: our work focuses on general support for adaptation at run time.

36 Summary
Grid computing could be an effective solution for distributed data stream processing. GATES provides distributed processing, exploits Grid web services, self-adapts to meet real-time constraints, and includes Grid resource allocation schemes.

