Published by Pierce Chase. Modified over 5 years ago.
1
GATES: A Grid-Based Middleware for Processing Distributed Data Streams
Liang Chen, Kolagatla Reddy, Gagan Agrawal Department of Computer Science and Engineering The Ohio State University {chenlia, reddyk,
2
Streaming Data Model
Continuous data arrival and processing
Emerging model for data processing
  Sources that produce data continuously: sensors, long-running simulations
  WAN bandwidths growing faster than disk bandwidths
Active topic in many computer science communities
  Databases
  Data mining
  Networking
  …
3
Summary/Limitations of Current Work
Current work focuses on
  centralized processing of a stream from a single source (databases, data mining)
  communication only (networking)
Many applications involve
  distributed processing of streams
  streams from multiple sources
4
Motivating Application
Network Fault Management System
[Figure: a network fault management system attached to a switch network]
5
Motivating Application (2)
Computer Vision Based Surveillance
6
Motivating Application (3)
Tatebe et al., CCGRID 2002
7
Features of Distributed Streaming Processing Applications
Data sources could be distributed
  Over a WAN
Continuous data arrival
Enormous volume
  Probably can't communicate it all to one site
Results from analysis may be desired at multiple sites
Real-time constraints
A real-time, high-throughput, distributed processing problem
8
Motivation: Challenges & Possible Solutions
Challenge 1: Data-, communication-, and/or compute-intensive
[Figure: switch network]
9
Motivation: Challenges & Possible Solutions
Challenge 1: Data- and/or computation-intensive
  Solution: Grid computing technologies
[Figure: switch network]
10
Motivation: Challenges & Possible Solutions
Challenge 1: Data- and/or computation-intensive
  Solution: Grid computing technologies
Challenge 2: Real-time analysis is required
  Solution: Self-adaptation functionality is desired
11
Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing
  Would like Grid standards and interfaces, and the adaptation function, to be abstracted away
  Would like to focus on algorithms only
GATES is a middleware for Grid-based, self-adapting data stream processing
12
Roadmap
GATES architecture and API
Adaptation algorithm
Evaluation
Related work
Conclusion
On-going & future work
13
GATES: Grid-based AdapTive Execution on Streams
Targets (distributed) processing of (distributed) data streams
Built on the OGSA model
Self-adaptation to meet real-time constraints on processing
14
GATES and Grid-Standards
[Figure: layer diagram with the labels Internet, Globus-OGSA, GATES, Applications, Web service]
15
Using GATES
Break the analysis down into several sub-tasks that form a pipeline
Implement each sub-task in Java
Write an XML configuration file so that the sub-tasks are deployed automatically
Launch the application by running a Java program (StreamClient.class) provided by GATES
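The first two steps can be pictured with a small, self-contained Java sketch. It is only an illustration of splitting an analysis into pipelined sub-tasks: the `Stage` interface, the `sample` and `mean` stages, and the `run` helper are hypothetical names invented here, not part of the GATES API, and the XML deployment and `StreamClient` launch steps are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: an analysis split into pipelined sub-tasks.
// None of these names come from GATES; they only illustrate the decomposition.
class PipelineSketch {
    /** One sub-task: consumes a batch of readings and emits a (usually smaller) batch. */
    interface Stage extends Function<List<Double>, List<Double>> {}

    /** Sub-task 1: keep every k-th reading (k >= 1). */
    static Stage sample(int k) {
        return in -> {
            List<Double> out = new ArrayList<>();
            for (int i = 0; i < in.size(); i += k) out.add(in.get(i));
            return out;
        };
    }

    /** Sub-task 2: reduce a batch to its mean, emitted as a one-element batch. */
    static Stage mean() {
        return in -> {
            double sum = 0;
            for (double v : in) sum += v;
            return List.of(in.isEmpty() ? 0.0 : sum / in.size());
        };
    }

    /** Runs the stages in order, feeding each stage's output to the next one. */
    static List<Double> run(List<Stage> stages, List<Double> batch) {
        List<Double> current = batch;
        for (Stage s : stages) current = s.apply(current);
        return current;
    }

    public static void main(String[] args) {
        List<Double> batch = List.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0);
        // sample(2) keeps 1.0, 3.0, 5.0; mean() then emits their average.
        System.out.println(run(List.of(sample(2), mean()), batch)); // prints [3.0]
    }
}
```

In a real deployment each stage would run on a different node and exchange data over the network; the in-process list passing above stands in for that.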
16
System Architecture
17
Adaptation for Real-time Processing
Analysis on streaming data is approximate
The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
  Sampling rate
  Size of summary structure
Application developers can expose these parameters and a range of values
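As a rough illustration of that trade-off, the following Java sketch treats the sampling rate as an adaptation parameter with a developer-supplied range. The `Parameter` class, the deterministic sampling rule, and the scaled-sum estimate are assumptions made for this example only; they are not taken from GATES.

```java
// Hypothetical sketch: a sampling rate exposed as an adaptation parameter.
// Lower rates touch fewer elements (faster) but give a less exact estimate.
class SamplingTradeoffSketch {
    /** An adaptation parameter together with the range the developer exposes. */
    static final class Parameter {
        final double min, max;
        private double value;

        Parameter(double min, double max, double initial) {
            if (min > max) throw new IllegalArgumentException("min > max");
            this.min = min;
            this.max = max;
            set(initial);
        }

        /** Stores a suggested value, clamped to [min, max]. */
        void set(double suggested) { value = Math.max(min, Math.min(max, suggested)); }

        double get() { return value; }
    }

    /**
     * Estimates the sum of data while reading only a fraction `rate` of it
     * (0 < rate <= 1). Element i is kept when floor((i+1)*rate) > floor(i*rate),
     * and the sampled sum is scaled back up by 1/rate.
     */
    static double estimateSum(double[] data, double rate) {
        double sum = 0;
        for (int i = 0; i < data.length; i++) {
            boolean keep = Math.floor((i + 1) * rate) > Math.floor(i * rate);
            if (keep) sum += data[i];
        }
        return sum / rate;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        Parameter rate = new Parameter(0.1, 1.0, 1.0);
        System.out.println(estimateSum(data, rate.get())); // exact: 36.0
        rate.set(0.5);                                      // half the work
        System.out.println(estimateSum(data, rate.get())); // approximate: 40.0
    }
}
```

With the rate at 1.0 every element is read and the sum is exact; at 0.5 only four elements are read and the estimate drifts from 36.0 to 40.0.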
18
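API for Adaptation
public class SamplingStage implements StreamProcessing {
    …
    void init() { … }
    void work(Buffer in, Buffer out) {
        while (true) {
            // Take the next image from the input buffer managed by GATES.
            Image img = getFromBufferInGATES(in);
            // Sample the image at the current sampling ratio.
            Image imgSample = sampling(img, samplingRatio);
            // Hand the sampled image to the output buffer managed by GATES.
            putToBufferInGATES(imgSample, out);
        }
    }
}
Exposing the adaptation parameter and its range: GATES.informationAboutAdjustmentParameter(min, max, 1);
Reading the value suggested by the middleware: samplingRatio = GATES.getSuggestedParameter();
(Identifiers are the slide's hyphenated pseudocode names written in camelCase.)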
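To make the shape of such a stage concrete without the middleware, here is a self-contained Java sketch. Everything in it is a stand-in chosen for illustration: the `DoubleSupplier` plays the role of the call that returns the suggested parameter, plain queues replace the GATES buffers, an `int[]` replaces an image, and the loop stops when the input is drained instead of running forever.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.DoubleSupplier;

// Hypothetical stand-alone version of a sampling stage. The supplier stands in
// for the middleware's suggested-parameter call; it is not the GATES API.
class SamplingStageSketch {
    /** Keeps roughly ratio * frame.length evenly spaced elements (at least one). */
    static int[] downsample(int[] frame, double ratio) {
        if (frame.length == 0) return new int[0];
        int keep = (int) Math.max(1, Math.round(frame.length * ratio));
        keep = Math.min(keep, frame.length);
        double stride = (double) frame.length / keep;
        int[] out = new int[keep];
        for (int i = 0; i < keep; i++) out[i] = frame[(int) (i * stride)];
        return out;
    }

    /** Drains the input queue, re-reading the suggested ratio before each frame. */
    static void work(Queue<int[]> in, Queue<int[]> out, DoubleSupplier suggestedRatio) {
        while (!in.isEmpty()) {
            int[] frame = in.remove();
            double ratio = suggestedRatio.getAsDouble();
            out.add(downsample(frame, ratio));
        }
    }

    public static void main(String[] args) {
        Queue<int[]> in = new ArrayDeque<>();
        Queue<int[]> out = new ArrayDeque<>();
        in.add(new int[] {0, 1, 2, 3, 4, 5, 6, 7});
        in.add(new int[] {0, 1, 2, 3, 4, 5, 6, 7});

        // Pretend the middleware lowers its suggestion from 0.5 to 0.25.
        double[] suggestions = {0.5, 0.25};
        int[] next = {0};
        work(in, out, () -> suggestions[next[0]++]);

        for (int[] frame : out) System.out.println(frame.length); // prints 4, then 2
    }
}
```

Reading the ratio once per frame lets a change in the suggestion take effect on the very next item, which is the behavior the slide's loop relies on.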
19
Self-Adaptation Approach
[Figure: application stages A, B, and C in a pipeline. Legend: buffers; queues; Grid services of GATES; stages of an application]
20
Adaptation algorithm
Goal
Issues
  No specific information about applications
  Filter out short-term bursts while staying sensitive to long-term behavior
  Quickly find converged values of adjustment parameters
Basic idea
  Queuing theory and a heuristic algorithm
[Figure: stages A, B, and C]
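One generic way to meet the second issue (ignore short bursts, react to sustained load) is to smooth the observed queue length before acting on it. The Java sketch below uses an exponentially weighted moving average with two watermarks. It is an assumed, textbook-style heuristic written for illustration and is not the adaptation algorithm used by GATES; the class name, constructor arguments, and thresholds are all hypothetical.

```java
// Hypothetical heuristic, not the GATES algorithm: adjust a parameter from a
// smoothed queue length so that short bursts are filtered out.
class QueueAdaptationSketch {
    private final double min, max, step, alpha, low, high;
    private double value;      // current parameter value, e.g. a sampling ratio
    private double smoothed;   // exponentially weighted average of queue length

    QueueAdaptationSketch(double min, double max, double step,
                          double alpha, double low, double high) {
        this.min = min;
        this.max = max;
        this.step = step;
        this.alpha = alpha;   // weight of the newest observation, in (0, 1]
        this.low = low;       // below this smoothed length, do more work per item
        this.high = high;     // above this smoothed length, do less work per item
        this.value = max;     // start at full accuracy
        this.smoothed = 0.0;
    }

    /** Feeds one queue-length observation and returns the updated parameter. */
    double observe(int queueLength) {
        smoothed = alpha * queueLength + (1 - alpha) * smoothed;
        if (smoothed > high) {
            value = Math.max(min, value - step);   // overloaded: lower the ratio
        } else if (smoothed < low) {
            value = Math.min(max, value + step);   // idle: raise the ratio
        }
        return value;
    }

    public static void main(String[] args) {
        QueueAdaptationSketch c = new QueueAdaptationSketch(0.1, 1.0, 0.1, 0.2, 2.0, 8.0);
        // A sustained backlog of 20 items: the value only drops once the
        // smoothed length has climbed past the high watermark.
        for (int i = 0; i < 5; i++) System.out.println(c.observe(20));
    }
}
```

A smaller `alpha` filters bursts more aggressively but also slows convergence, so it trades off against the third issue listed above.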
21
Adaptation algorithm Equations
22
Evaluation
Two applications
  A counting sample application
  A computational steering application
Three experiments were conducted
  The first ran the counting sample application on GATES
  The other two ran the computational steering application
23
Experiment One: Non-adaptive vs. Adaptive Version
Performance comparison
Network bandwidth (KB/s)   40 (sec.)   80      120     160     Adaptive version (KB/s)
1                          462.3       612.9   459.9   671     463.5
10                         187.7       193.3   509.1   302.1   234.9
100                        246.4       466.7   296.2   371.6   387.1
1000                       240.4       298.8   307.7   478     399.9

Accuracy comparison
Columns: Network bandwidth (KB/s); 40 (sec.); 80 (sec.); 120 (sec.); 160 (sec.); Adaptive version (KB/s)
1: 0.891, 0.962, 0.981, 0.987, 0.986
10: 0.896, 0.963, 0.983, 0.992
100: 0.887, 0.957, 0.979, 0.988, 0.974
1000: 0.879, 0.989
24
Self-Adaptation with Different Processing Requirements
25
Self-Adaptation with Different Data Generation Rates
26
Related work
dQUOB (dynamic QUery OBjects)
DataCutter
A lot of work on adaptation
Adaptation for real-time processing of streams
Streaming database systems
  Support DB operations, usually centralized
27
Conclusion
High-volume, distributed stream processing is in our future
Grid computing could be an effective solution for distributed data stream processing
GATES
  Distributed processing
  Exploits Grid web services
  Self-adaptation to meet real-time constraints
28
On-going and Future Work
Continuous (dynamic) resource discovery & monitoring
Resource reallocation (self-mobility)
Larger application (time-varying visualization)
Generalize the adaptation algorithm
More evaluation studies