GATES: A Grid-Based Middleware for Processing Distributed Data Streams

Liang Chen, Kolagatla Reddy, Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University
{chenlia, reddyk,

Streaming Data Model
Continuous data arrival and processing
Emerging model for data processing
  Sources that produce data continuously: sensors, long-running simulations
  WAN bandwidths growing faster than disk bandwidths
Active topic in many computer science communities
  Databases
  Data mining
  Networking
  …

Summary/Limitations of Current Work
Focus on
  centralized processing of a stream from a single source (databases, data mining)
  communication only (networking)
Many applications involve
  distributed processing of streams
  streams from multiple sources

Motivating Application
[Figure: Network Fault Management System and switch network]

Motivating Application (2)
Computer Vision Based Surveillance

Motivating Application (3)
Tatebe et al., CCGRID 2002

Features of Distributed Streaming Processing Applications
Data sources could be distributed
  Over a WAN
Continuous data arrival
Enormous volume
  Probably can't communicate it all to one site
Results from analysis may be desired at multiple sites
Real-time constraints
A real-time, high-throughput, distributed processing problem

Motivation: Challenges & Possible Solutions
Challenge 1: Data-, communication-, and/or compute-intensive
[Figure: switch network]

Motivation: Challenges & Possible Solutions
Challenge 1: Data- and/or computation-intensive
  Solution: Grid computing technologies
[Figure: switch network]

Motivation: Challenges & Possible Solutions
Challenge 1: Data- and/or computation-intensive
  Solution: Grid computing technologies
Challenge 2: Real-time analysis is required
  Solution: Self-adaptation functionality is desired

Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing
  Would like to have abstracted
    Grid standards and interfaces
    Adaptation function
  Would like to focus on algorithms only
GATES is a middleware for Grid-based self-adapting data stream processing

Roadmap
GATES architecture and API
Adaptation algorithm
Evaluation
Related work
Conclusion
On-going & future work

GATES: Grid-based AdapTive Execution on Streams
Targets (distributed) processing of (distributed) data streams
Built on the OGSA model
Self-adaptation to meet real-time constraints on processing

GATES and Grid-Standards
[Figure: layers labeled Internet, Globus-OGSA, GATES, Applications, and Web service]

Using GATES
Break down the analysis into several sub-tasks that make a pipeline
Implement each sub-task in Java
Write an XML configuration file so that the sub-tasks are deployed automatically
Launch the application by running a Java program (StreamClient.class) provided by GATES
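The first two steps can be illustrated with a minimal sketch of an analysis split into sub-tasks that feed one another. Everything here is hypothetical and only meant to show the pipeline shape: the class `PipelineSketch`, its three stage methods, and the use of in-memory lists are assumptions for illustration, not the GATES interfaces, and no XML deployment or `StreamClient` launch is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of an analysis broken into pipelined sub-tasks. */
final class PipelineSketch {
    /** Sub-task A (assumed): keep every other item to reduce volume. */
    static List<Integer> sample(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i += 2) {
            out.add(in.get(i));
        }
        return out;
    }

    /** Sub-task B (assumed): keep only readings above a fixed threshold of 10. */
    static List<Integer> keepLarge(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int value : in) {
            if (value > 10) {
                out.add(value);
            }
        }
        return out;
    }

    /** Sub-task C (assumed): summarize what is left as a single count. */
    static List<Integer> count(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        out.add(in.size());
        return out;
    }

    /** Runs the sub-tasks in order, passing each output to the next stage. */
    static List<Integer> run(List<Integer> input) {
        List<Function<List<Integer>, List<Integer>>> stages = new ArrayList<>();
        stages.add(PipelineSketch::sample);
        stages.add(PipelineSketch::keepLarge);
        stages.add(PipelineSketch::count);
        List<Integer> data = input;
        for (Function<List<Integer>, List<Integer>> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(5, 20, 30, 40, 50, 3))); // prints [2]
    }
}
```

In a real deployment each stage would presumably run as its own service and exchange data through buffers rather than method calls; the sketch only shows how the decomposition lets each sub-task be written independently.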

System Architecture

Adaptation for Real-time Processing
Analysis on streaming data is approximate
The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
  Sampling rate
  Size of summary structure
Application developers can expose these parameters and a range of values

API for Adaptation
public class Sampling-Stage implements StreamProcessing {
    …
    void init() {…}
    void work(buffer in, buffer out) {
        while (true) {
            Image img = get-from-buffer-in-GATES(in);
            Image img-sample = Sampling(img, sampling-ratio);
            put-to-buffer-in-GATES(img-sample, out);
        }
    }
}
GATES.Information-About-Adjustment-Parameter(min, max, 1)
sampling-ratio = GATES.getSuggestedParameter();
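To make the parameter exchange on this slide concrete, here is a small self-contained sketch, assuming the stage declares a range for its sampling ratio and then reads back whatever value the middleware currently suggests. `AdjustableParameter`, `SamplingStageSketch`, and their methods are hypothetical stand-ins written for this example; they are not the GATES classes, and the stage samples a list of integers rather than images.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the middleware side of the parameter exchange. */
final class AdjustableParameter {
    private final double min;
    private final double max;
    private double suggested;

    AdjustableParameter(double min, double max, double initial) {
        this.min = min;
        this.max = max;
        suggest(initial);
    }

    /** Called by the (assumed) middleware; the value is clamped to the declared range. */
    void suggest(double value) {
        suggested = Math.max(min, Math.min(max, value));
    }

    /** Called by the stage before it processes the next batch. */
    double getSuggested() {
        return suggested;
    }
}

/** Hypothetical sampling stage that honors the suggested ratio. */
final class SamplingStageSketch {
    // The stage exposes the parameter and its legal range once, up front.
    private final AdjustableParameter samplingRatio = new AdjustableParameter(0.1, 1.0, 1.0);

    AdjustableParameter parameter() {
        return samplingRatio;
    }

    /** Forwards roughly ratio * in.size() items, spread evenly over the input. */
    List<Integer> work(List<Integer> in) {
        double ratio = samplingRatio.getSuggested();
        List<Integer> out = new ArrayList<>();
        double credit = 0.0;
        for (int item : in) {
            credit += ratio;
            if (credit >= 1.0) {
                out.add(item);
                credit -= 1.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SamplingStageSketch stage = new SamplingStageSketch();
        stage.parameter().suggest(0.5); // the middleware lowers the ratio under load
        System.out.println(stage.work(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)));
    }
}
```

The point of the design is that the stage never decides the ratio itself: it only declares what values are acceptable and applies the current suggestion, which keeps the adaptation policy out of application code.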

Self-Adaptation Approach
[Figure: Stage A, Stage B, Stage C; legend: buffers, queues, Grid services of GATES, stages of an application]

Adaptation algorithm
Goal
Issues
  No specific information about applications
  Filter out short-term bursts while staying sensitive to long-term behavior
  Quickly find converged values of adjustment parameters
Basic idea
  Queuing theory and heuristic algorithm
[Figure: stages A, B, C]
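The transcript does not preserve the algorithm itself, so the following is only a sketch of one way the stated issues could be addressed, not the method used in GATES. It assumes the controller sees nothing but a stage's queue length, smooths it with an exponentially weighted moving average so that a single burst is ignored, and steps the adjustment parameter down or up when the smoothed load stays outside a band. The class name, the smoothing factor, the watermarks, and the step size are all assumptions chosen for illustration.

```java
/** Hypothetical queue-driven controller; not the algorithm from the slides. */
final class QueueAdaptationSketch {
    private static final double ALPHA = 0.2;          // assumed smoothing factor
    private static final double LOW_WATERMARK = 10.0;  // assumed: below this, raise accuracy
    private static final double HIGH_WATERMARK = 50.0; // assumed: above this, shed load

    private final double min;
    private final double max;
    private final double step;
    private double smoothedLength = 0.0;
    private double parameter;

    QueueAdaptationSketch(double min, double max, double step) {
        this.min = min;
        this.max = max;
        this.step = step;
        this.parameter = max; // start at the most accurate setting
    }

    /** Feeds one queue-length observation and returns the suggested parameter. */
    double observe(int queueLength) {
        // The moving average reacts slowly, so one burst barely moves it,
        // while a sustained backlog eventually crosses the high watermark.
        smoothedLength = ALPHA * queueLength + (1.0 - ALPHA) * smoothedLength;
        if (smoothedLength > HIGH_WATERMARK) {
            parameter = Math.max(min, parameter - step);
        } else if (smoothedLength < LOW_WATERMARK) {
            parameter = Math.min(max, parameter + step);
        }
        return parameter;
    }

    public static void main(String[] args) {
        QueueAdaptationSketch controller = new QueueAdaptationSketch(0.1, 1.0, 0.1);
        for (int i = 0; i < 5; i++) {
            System.out.println(controller.observe(100)); // sustained backlog
        }
    }
}
```

A fixed step is the simplest choice but converges slowly; a controller that aims to find converged values quickly would likely vary the step, which this sketch leaves out.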

Adaptation algorithm Equations

Evaluation
Two applications
  A counting sample application
  A computational steering application
Three experiments were conducted
  The first ran the counting sample application on GATES
  The other two ran the computational steering application

Experiment One: Non-adaptive vs. Adaptive Version
Performance comparison
Network Bandwidth (Kilo-Byte/Sec.) | 40 (sec.) | 80 (sec.) | 120 (sec.) | 160 (sec.) | Adaptive Version
1: 462.3 612.9 459.9 671 463.5
10: 187.7 193.3 509.1 302.1 234.9
100: 246.4 466.7 296.2 371.6 387.1
1000: 240.4 298.8 307.7 478 399.9

Accuracy comparison
Network Bandwidth (Kilo-Byte/Sec.) | 40 (sec.) | 80 (sec.) | 120 (sec.) | 160 (sec.) | Adaptive Version
1: 0.891 0.962 0.981 0.987 0.986
10: 0.896 0.963 0.983 0.992
100: 0.887 0.957 0.979 0.988 0.974
1000: 0.879 0.989

Self-Adaptation with Different Processing Requirements

Self-Adaptation with Different Data Generation Rates

Related work
dQUOB (dynamic QUery Objects)
DataCutter
A lot of work on adaptation
  Adaptation for real-time processing of streams
Streaming database systems
  Support DB operations, usually centralized

Conclusion
High-volume, distributed stream processing is in our future
Grid computing could be an effective solution for distributed data stream processing
GATES
  Distributed processing
  Exploits grid web services
  Self-adaptation to meet real-time constraints

On-going and Future Work
Continuous (dynamic) resource discovery & monitoring
Resource reallocation (self-mobility)
Larger application (time-varying visualization)
Generalize the adaptation algorithm
More evaluation studies
