GATES: A Grid-Based Middleware for Processing Distributed Data Streams


1 GATES: A Grid-Based Middleware for Processing Distributed Data Streams
Liang Chen, Kolagatla Reddy, Gagan Agrawal
Department of Computer Science and Engineering, The Ohio State University
{chenlia, reddyk,

2 Streaming Data Model
Continuous data arrival and processing.
Emerging model for data processing.
Sources that produce data continuously: sensors, long-running simulations.
WAN bandwidths are growing faster than disk bandwidths.
Active topic in many computer science communities: databases, data mining, networking, ...

3 Summary/Limitations of Current Work
Current work focuses on centralized processing of a stream from a single source (databases, data mining) or on communication only (networking).
Many applications involve distributed processing of streams and streams from multiple sources.

4 Motivating Application
[Figure: a Network Fault Management System monitoring a switch network, with a fault marked X]

5 Motivating Application (2)
Computer Vision Based Surveillance

6 Motivating Application (3)
Tatabe et al. CCGRID 2002

7 Features of Distributed Streaming Processing Applications
Data sources could be distributed over a WAN.
Continuous data arrival.
Enormous volume: probably can't communicate it all to one site.
Results from analysis may be desired at multiple sites.
Real-time constraints.
In short: a real-time, high-throughput, distributed processing problem.

8 Motivation: Challenges & Possible Solutions
Challenge 1: data-, communication-, and/or compute-intensive.
[Figure: switch network with a fault marked X]

9 Motivation: Challenges & Possible Solutions
Challenge 1: data- and/or computation-intensive. Solution: Grid computing technologies.
[Figure: switch network]

10 Motivation: Challenges & Possible Solutions
Challenge 1: data- and/or computation-intensive. Solution: Grid computing technologies.
Challenge 2: real-time analysis is required. Solution: self-adaptation functionality is desired.

11 Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing:
would like Grid standards and interfaces to be abstracted away;
would like an adaptation function;
would like to focus on algorithms only.
GATES is a middleware for Grid-based, self-adapting data stream processing.

12 Roadmap
GATES architecture and API
Adaptation algorithm
Evaluation
Related work
Conclusion
On-going & future work

13 GATES: Grid-based AdapTive Execution on Streams
Targets (distributed) processing of (distributed) data streams.
Built on the OGSA model.
Self-adaptation to meet real-time constraints on processing.

14 GATES and Grid-Standards
[Figure: layer diagram with the labels Internet, Globus-OGSA, GATES, Applications, Web service]

15 Using GATES
Break down the analysis into several sub-tasks that form a pipeline.
Implement each sub-task in Java.
Write an XML configuration file so that the sub-tasks are deployed automatically.
Launch the application by running a Java program (StreamClient.class) provided by GATES.
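The first two steps (splitting the analysis into pipelined sub-tasks and writing each one in Java) can be illustrated with a minimal, self-contained sketch. It is not GATES code: the class `PipelineSketch`, its methods, and the three toy sub-tasks are hypothetical names chosen for illustration, and the stages run in one process instead of being deployed from an XML configuration file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the GATES API.
final class PipelineSketch {

    /** Chains sub-tasks so that the output of each stage feeds the next one. */
    static <T> Function<T, T> pipeline(List<Function<T, T>> stages) {
        Function<T, T> chain = Function.identity();
        for (Function<T, T> stage : stages) {
            chain = chain.andThen(stage);
        }
        return chain;
    }

    /** Sub-task 1: keep every second reading (a crude sampling step). */
    static List<Integer> sample(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i += 2) {
            out.add(in.get(i));
        }
        return out;
    }

    /** Sub-task 2: drop negative readings. */
    static List<Integer> filter(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int v : in) {
            if (v >= 0) {
                out.add(v);
            }
        }
        return out;
    }

    /** Sub-task 3: reduce the batch to a one-element list holding the sum. */
    static List<Integer> summarize(List<Integer> in) {
        int sum = 0;
        for (int v : in) {
            sum += v;
        }
        List<Integer> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    /** Assembles the three sub-tasks into one application. */
    static Function<List<Integer>, List<Integer>> buildApp() {
        List<Function<List<Integer>, List<Integer>>> stages = new ArrayList<>();
        stages.add(PipelineSketch::sample);
        stages.add(PipelineSketch::filter);
        stages.add(PipelineSketch::summarize);
        return pipeline(stages);
    }

    public static void main(String[] args) {
        // sample -> [4, -3, 6], filter -> [4, 6], summarize -> [10]
        System.out.println(buildApp().apply(List.of(4, 9, -3, 7, 6, 1)));
    }
}
```

Keeping each sub-task a plain function of its input batch is what lets a middleware place the stages on different hosts; the composition step here only stands in for that deployment.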

16 System Architecture

17 Adaptation for Real-time Processing
Analysis on streaming data is approximate.
The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters), for example the sampling rate or the size of a summary structure.
Application developers can expose these parameters along with a range of values.

18 API for Adaptation
    public class Sampling-Stage implements StreamProcessing {
        void init() {…}
        void work(buffer in, buffer out) {
            while (true) {
                Image img = get-from-buffer-in-GATES(in);
                Image img-sample = Sampling(img, sampling-ratio);
                put-to-buffer-in-GATES(img-sample, out);
            }
        }
    }
Calls highlighted on the slide:
    GATES.Information-About-Adjustment-Parameter(min, max, 1)
    sampling-ratio = GATES.getSuggestedParameter();
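The slide's code is pseudocode (its hyphenated identifiers are not valid Java), so the pattern it shows can be sketched in compilable form under some loudly stated assumptions. `ParameterAdvisor`, `FixedAdvisor`, and `SamplingStageSketch` are hypothetical stand-ins, not the GATES API; the third argument of the registration call is assumed here to be an initial value, which the slide does not say; and integers replace images to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the middleware side; this is NOT the GATES API.
interface ParameterAdvisor {
    /** Exposes a parameter range; treating the third argument as an initial value is an assumption. */
    void register(double min, double max, double initial);

    /** Returns the value the middleware currently suggests. */
    double suggested();
}

// Trivial advisor that clamps an externally set value to the registered range.
final class FixedAdvisor implements ParameterAdvisor {
    private double min;
    private double max;
    private double value;

    @Override
    public void register(double min, double max, double initial) {
        this.min = min;
        this.max = max;
        this.value = clamp(initial);
    }

    /** Simulates the middleware changing its suggestion. */
    void set(double v) {
        value = clamp(v);
    }

    @Override
    public double suggested() {
        return value;
    }

    private double clamp(double v) {
        return Math.max(min, Math.min(max, v));
    }
}

// A sampling stage that re-reads the suggested ratio for every batch.
final class SamplingStageSketch {
    private final ParameterAdvisor advisor;

    SamplingStageSketch(ParameterAdvisor advisor) {
        this.advisor = advisor;
        advisor.register(0.1, 1.0, 1.0); // expose the sampling ratio and its range
    }

    /** Keeps about ratio * size items by taking evenly spaced elements. */
    List<Integer> process(List<Integer> batch) {
        double ratio = advisor.suggested();
        int keep = (int) Math.ceil(batch.size() * ratio);
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < keep; k++) {
            out.add(batch.get((int) ((long) k * batch.size() / keep)));
        }
        return out;
    }

    public static void main(String[] args) {
        FixedAdvisor advisor = new FixedAdvisor();
        SamplingStageSketch stage = new SamplingStageSketch(advisor);
        List<Integer> batch = List.of(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(stage.process(batch)); // ratio 1.0: every item is kept
        advisor.set(0.5);
        System.out.println(stage.process(batch)); // ratio 0.5: [0, 2, 4, 6]
    }
}
```

Reading the suggestion once per batch, instead of once at start-up, is what allows the accuracy and execution-rate trade-off to shift while the stream is running.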

19 Self-Adaptation Approach
[Figure: application stages A, B, and C hosted on GATES Grid services and connected through buffers and queues. Legend: buffers, queues, Grid services of GATES, stages of an application]

20 Query Theory and Heuristic algorithm
Adaptation algorithm
Goal
Issues:
No specific information about applications.
Filter out short-term bursts while staying sensitive to long-term behavior.
Quickly find converged values of the adjustment parameters.
Basic idea: Query Theory and heuristic algorithm.
[Figure: stages A, B, C]
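The requirement of ignoring short-term bursts while still reacting to long-term load can be illustrated with a generic sketch. This is not the GATES adaptation algorithm, whose equations do not survive in this transcript; it swaps in a plain exponential moving average over observed queue lengths with a fixed adjustment step. The class name, thresholds, and step size are all assumptions made for illustration.

```java
// Generic illustration, NOT the GATES algorithm: smooth the observed queue length
// with an exponential moving average and adjust a parameter only when the
// smoothed load drifts away from a target.
final class QueueAdaptationSketch {
    private final double min;
    private final double max;
    private final double alpha;   // smoothing weight given to the newest observation
    private final double target;  // desired long-term queue length
    private final double step;    // fixed adjustment step (an assumption)
    private double smoothed;
    private double value;

    QueueAdaptationSketch(double min, double max, double alpha, double target, double step) {
        this.min = min;
        this.max = max;
        this.alpha = alpha;
        this.target = target;
        this.step = step;
        this.smoothed = target; // start at the target so a single spike is damped
        this.value = max;       // start with the most accurate setting
    }

    /** Feeds one queue-length observation and returns the suggested parameter value. */
    double observe(int queueLength) {
        smoothed = alpha * queueLength + (1 - alpha) * smoothed;
        if (smoothed > 1.2 * target) {
            value -= step; // sustained overload: trade accuracy for speed
        } else if (smoothed < 0.8 * target) {
            value += step; // sustained slack: recover accuracy
        }
        value = Math.max(min, Math.min(max, value));
        return value;
    }

    public static void main(String[] args) {
        QueueAdaptationSketch ctl = new QueueAdaptationSketch(0.1, 1.0, 0.1, 10.0, 0.1);
        System.out.println(ctl.observe(25)); // a single burst leaves the value unchanged
        double v = 0;
        for (int i = 0; i < 20; i++) {
            v = ctl.observe(25); // sustained load pushes the value down
        }
        System.out.println(v);
    }
}
```

A small `alpha` makes the controller slower to react but more robust to bursts; a scheme that must converge quickly would typically also shrink the step over time, which this sketch leaves out.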

21 Adaptation algorithm Equations

22 Evaluation
Two applications: a counting sample application and a computational steering application.
Three experiments were conducted. The first ran the counting sample application on GATES; the other two ran the computational steering application.

23 The Experiment One: Non-adaptive Vs. Adaptive version
Performance comparison

Network bandwidth (KB/s)   40 (sec.)   80 (sec.)   120 (sec.)   160 (sec.)   Adaptive version (KB/s)
1                          462.3       612.9       459.9        671          463.5
10                         187.7       193.3       509.1        302.1        234.9
100                        246.4       466.7       296.2        371.6        387.1
1000                       240.4       298.8       307.7        478          399.9

Accuracy comparison (same columns; only the values that survive in the transcript are listed, in order, so rows with fewer than five values cannot be aligned to columns)

Network bandwidth 1 KB/s:      0.891   0.962   0.981   0.987   0.986
Network bandwidth 10 KB/s:     0.896   0.963   0.983   0.992
Network bandwidth 100 KB/s:    0.887   0.957   0.979   0.988   0.974
Network bandwidth 1000 KB/s:   0.879   0.989

24 Self-Adaptation with Different Processing Requirements

25 Self-Adaptation with Different Data Generation Rates

26 Related work
dQUOB (dynamic QUery Objects)
DataCutter
A lot of work on adaptation: adaptation for real-time processing of streams.
Streaming database systems: support DB operations, usually centralized.

27 Conclusion
High-volume, distributed stream processing is in our future.
Grid computing could be an effective solution for distributed data stream processing.
GATES: distributed processing; exploits grid web services; self-adaptation to meet real-time constraints.

28 On-going and Future Work
Continuous (dynamic) resource discovery and monitoring.
Resource reallocation (self-mobility).
Larger application (time-varying visualization).
Generalize the adaptation algorithm.
More evaluation studies.

