Published by Philomena Park. Modified over 6 years ago.
1
A Grid-Based Middleware’s Support for Processing Distributed Data Streams
Liang Chen
Advisor: Gagan Agrawal
Computer Science & Engineering
2
Introduction-Motivation
Data stream processing and analysis
Data stream: data arrive continuously and need to be processed in real time
Data stream applications:
- Online network intrusion detection
- Sensor networks
- Network fault management system for telecommunication network elements
- Computer vision based surveillance
Common features of data streams:
- Continuous arrival
- Enormous volume
- Real-time constraints
- Data sources could be distributed
3
Introduction-Motivation
Network Fault Management System analyzing alarm message streams
[Figure: Switch Network and Network Fault Management System]
4
Introduction-Motivation
Computer Vision Based Surveillance
5
Introduction-Motivation
Challenges & possible solutions
Challenge 1: Data and/or computation intensive
[Figure: Switch Network]
6
Introduction-Motivation
Challenges & possible solutions
Challenge 1: Data and/or computation intensive
Solution: Grid computing technologies
[Figure: Switch Network]
7
Introduction-Motivation
Challenges & possible solutions
Challenge 1: Data and/or computation intensive
Solution: Grid computing technologies
Challenge 2: Real-time analysis is required
Solution: Self-adaptation functionality is desired
8
Introduction-Motivation
From the point of view of developers who are interested in data stream applications:
- They would like to concentrate on the applications themselves
- They would not like to spend effort on
  - Grid computing
  - Adaptation functions
9
Introduction-Our Approach
A middleware that is based on Grid standards and tools and provides self-adaptation functionality
The middleware is referred to as GATES (Grid-based AdapTive Execution on Streams)
- Applications are automatically distributed to proper computing nodes
- Applications adapt automatically to a varying environment, without developers having to implement the adaptation algorithms themselves
10
System Architecture and Design (From Application Perspective)
- Break a task down into several sub-tasks so that the sub-tasks form a pipeline
- Implement each sub-task in Java
- Write an XML configuration file so that the sub-tasks can be deployed automatically, i.e.
  - specify how many stages (sub-tasks) the pipeline has
  - specify where the code implementing the sub-tasks resides
- Launch the application by running a Java program (StreamClient.class) provided by GATES
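The idea of sub-tasks forming a pipeline can be sketched in plain Java. This is a minimal single-process illustration, not the GATES API: the class `PipelineSketch`, the method `run`, and the use of an empty `Optional` as an end-of-stream marker are all assumptions made for this example. In GATES itself the stages are deployed as Grid services according to the XML configuration file. Error handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a pipeline of sub-tasks; not the GATES API.
final class PipelineSketch {
    // Runs every stage in its own thread; adjacent stages are linked by a queue.
    // An empty Optional marks the end of the stream (an assumption of this sketch).
    static <T> List<T> run(List<T> input, List<UnaryOperator<T>> stages) {
        List<BlockingQueue<Optional<T>>> queues = new ArrayList<>();
        for (int i = 0; i <= stages.size(); i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < stages.size(); i++) {
            BlockingQueue<Optional<T>> in = queues.get(i);
            BlockingQueue<Optional<T>> out = queues.get(i + 1);
            UnaryOperator<T> stage = stages.get(i);
            Thread worker = new Thread(() -> {
                try {
                    for (Optional<T> item = in.take(); item.isPresent(); item = in.take()) {
                        out.put(Optional.of(stage.apply(item.get())));
                    }
                    out.put(Optional.empty()); // forward the marker so the next stage stops
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        List<T> results = new ArrayList<>();
        try {
            for (T item : input) {
                queues.get(0).put(Optional.of(item));
            }
            queues.get(0).put(Optional.empty());
            BlockingQueue<Optional<T>> last = queues.get(stages.size());
            for (Optional<T> item = last.take(); item.isPresent(); item = last.take()) {
                results.add(item.get());
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }
}
```

Each stage only sees its input and output queue, which is what lets a middleware place the stages on different nodes without changing the stage code.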
11
System Architecture and Design (Architecture)
12
System Architecture and Design (Architecture)
[Figure: pipeline of Stage A, Stage B, and Stage C. Legend: Grid services of the GATES; stages of an application; queues between Grid services; buffers for applications]
13
System Architecture and Design (Example)
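// Slide pseudocode: the hyphenated names are placeholders, not valid Java identifiers.
// The slide lists the two GATES calls after the loop; their placement here is inferred.
public class Sampling-Stage implements StreamProcessing {
    …
    void init() {…}

    void work(buffer in, buffer out) {
        // Tell GATES the range of the adaptation parameter.
        GATES.Information-About-Adjustment-Parameter(min, max, 1);
        while (true) {
            // Use the value that GATES currently suggests.
            sampling-ratio = GATES.getSuggestedParameter();
            Image img = get-from-buffer-in-GATES(in);
            Image img-sample = Sampling(img, sampling-ratio);
            put-to-buffer-in-GATES(img-sample, out);
        }
    }
}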
14
Self-adaptation Algorithm
Given a queue’s long-term factor at each stage, we want to improve the method of adjusting the value of an adaptation parameter
- Should the adaptation parameter be modified, and if so, in which direction?
- How do we find a new value (update the value) of the adaptation parameter?
15
Enhanced Self-adaptation Algorithm
Should the adaptation parameter be modified, and if so, in which direction?
The answer depends on the load status of the queues at two consecutive stages
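A decision of this kind can be sketched as a small lookup on the two queue states. The three load states, the three directions, and the rule itself are assumptions made for illustration; they are not the load-state table from the slides. The sketch also assumes that a larger parameter value means more data or more work per item.

```java
// Hypothetical sketch; the states and the rule are assumptions, not the slides' table.
final class AdaptationDirection {
    enum Load { UNDERLOADED, NORMAL, OVERLOADED }
    enum Direction { DECREASE, KEEP, INCREASE }

    // Assumes a larger parameter value produces more data or work per item.
    static Direction decide(Load thisStage, Load nextStage) {
        // Any overloaded queue means the pipeline cannot keep up: back off.
        if (thisStage == Load.OVERLOADED || nextStage == Load.OVERLOADED) {
            return Direction.DECREASE;
        }
        // Both queues have spare capacity: there is room to do more.
        if (thisStage == Load.UNDERLOADED && nextStage == Load.UNDERLOADED) {
            return Direction.INCREASE;
        }
        // Otherwise leave the parameter where it is.
        return Direction.KEEP;
    }
}
```

Looking at two consecutive queues rather than one avoids raising the parameter at a stage whose own queue is idle while the stage after it is already falling behind.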
16
Enhanced Self-adaptation Algorithm
[Figure: Performance Parameter; load states of stages A, B, and C, grouped into Convergent States and Non-Convergent States]
17
Enhanced Self-adaptation Algorithm
Summary of Load States
18
Enhanced Self-adaptation Algorithm
How to determine the new value for the adaptation parameter
- Linear update: increase or decrease by a fixed value
  - Hard to find a proper fixed value
  - Previous method
- Binary tree search
19
Enhanced Self-adaptation Algorithm
[Figure: two rows of markers. Row 1: Left Border, Current Value, New Value, Right Border. Row 2: Left Border, Current Value, Right Border.]
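The two update styles can be compared in a short sketch. The class `BinarySearchUpdate` and its methods are hypothetical, and the border-update rule (the old value becomes the border on the side being moved away from) is an assumption read off the border labels, not a confirmed description of the GATES implementation.

```java
// Hypothetical sketch of a binary-search style update; not the GATES implementation.
final class BinarySearchUpdate {
    private double left;     // left border of the search interval
    private double right;    // right border of the search interval
    private double current;  // current value of the adaptation parameter

    BinarySearchUpdate(double min, double max, double initial) {
        if (min > initial || initial > max) {
            throw new IllegalArgumentException("initial must lie in [min, max]");
        }
        this.left = min;
        this.right = max;
        this.current = initial;
    }

    // Move halfway toward the right border; the old value becomes the left border.
    double increase() {
        left = current;
        current = (current + right) / 2.0;
        return current;
    }

    // Move halfway toward the left border; the old value becomes the right border.
    double decrease() {
        right = current;
        current = (left + current) / 2.0;
        return current;
    }

    // Linear update for comparison: a fixed step, clamped to [min, max].
    static double linearUpdate(double value, double step, double min, double max) {
        return Math.max(min, Math.min(max, value + step));
    }
}
```

The halving step needs no hand-tuned step size, which is the difficulty noted for the linear update. In this sketch the interval only shrinks, so a real system would presumably need to reset the borders when conditions change.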
21
Data Mining Applications & System Evaluation
Two data mining applications
CluStream: clustering data arriving in data streams
22
Data Mining Applications & System Evaluation
Dist-Freq-Counting: finding frequent itemsets from distributed streams
23-31
Data Mining Applications & System Evaluation
[No text recovered for these slides]
32
Resource Allocation Schemes
Problem definition
- Grid resource scheduling for pipelined processing and real-time distributed streaming applications
- Mapping workflows onto a Grid is an NP-complete problem
- Static part: the resource allocation problem for GATES is to determine a deployment configuration
- Dynamic part
33
Static Allocation Scheme
Static allocation problem: determining a deployment configuration
Objective: automatically generate a deployment configuration according to the information about available resources
- The number of data sources and their locations
- The destination
- The number of stages forming the pipeline
- The number of instances of each stage
- How the instances connect to each other
- The node where each instance is placed
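The items listed above can be gathered into one data structure. The sketch below is a hypothetical model written for illustration: the class `DeploymentConfig`, its field names, and the validity rule (links only join consecutive stages) are assumptions, not the format GATES uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a deployment configuration; all names are assumptions.
final class DeploymentConfig {
    // One running copy of a stage, placed on a node.
    static final class Instance {
        final int stage;    // index of the pipeline stage, starting at 0
        final String node;  // node where this instance is placed
        Instance(int stage, String node) {
            this.stage = stage;
            this.node = node;
        }
    }

    final List<String> sourceNodes = new ArrayList<>();  // data source locations
    String destinationNode;                              // where results are delivered
    int stageCount;                                      // number of stages in the pipeline
    final List<Instance> instances = new ArrayList<>();  // instances of all stages
    final List<int[]> links = new ArrayList<>();         // {fromInstance, toInstance} pairs

    // Adds an instance and returns its index for use in connect().
    int addInstance(int stage, String node) {
        instances.add(new Instance(stage, node));
        return instances.size() - 1;
    }

    void connect(int from, int to) {
        links.add(new int[] {from, to});
    }

    // True if sources and destination are set, every stage has at least one
    // instance, and every link joins a stage to the stage right after it.
    boolean isValid() {
        if (destinationNode == null || sourceNodes.isEmpty()) return false;
        boolean[] seen = new boolean[stageCount];
        for (Instance instance : instances) {
            if (instance.stage < 0 || instance.stage >= stageCount) return false;
            seen[instance.stage] = true;
        }
        for (boolean s : seen) {
            if (!s) return false;
        }
        for (int[] link : links) {
            if (instances.get(link[1]).stage != instances.get(link[0]).stage + 1) return false;
        }
        return true;
    }
}
```

A static allocation scheme would then amount to filling in such a structure from the description of the available resources.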
34
Static Allocation Scheme
Examples of deployment configurations
35
Related work
Grid resource allocation
- Condor
- Realtor
- ACDS
- etc.
Main difference: our work focuses on Grid resource allocation for workflow applications
Adaptation through a middleware
- Cheng et al.’s adaptation framework
- SWiFT
- Conductor
- DART
- ROAM
Main difference: our work focuses on general support for adaptation at run time
36
Summary
Grid computing could be an effective solution for distributed data stream processing
GATES
- Distributed processing
- Exploit grid web services
- Self-adaptation to meet the real-time constraints
- Grid resource allocation schemes