GATES: A Grid-Based Middleware for Processing Distributed Data Streams

GATES: A Grid-Based Middleware for Processing Distributed Data Streams Liang Chen, Kolagatla Reddy, Gagan Agrawal Department of Computer Science and Engineering The Ohio State University {chenlia, reddyk, agrawal}@cis.ohio-state.edu

Streaming Data Model
- Continuous data arrival and processing
- Emerging model for data processing
- Sources that produce data continuously: sensors, long-running simulations
- WAN bandwidths growing faster than disk bandwidths
- Active topic in many computer science communities: databases, data mining, networking, ...

Summary/Limitations of Current Work
- Focus on
  - centralized processing of a stream from a single source (databases, data mining)
  - communication only (networking)
- Many applications involve
  - distributed processing of streams
  - streams from multiple sources

Motivating Application: Network Fault Management System
[Figure labels: Switch Network, Network Fault Management System]

Motivating Application (2) Computer Vision Based Surveillance

Motivating Application (3) Tatebe et al. CCGRID 2002

Features of Distributed Streaming Processing Applications
- Data sources could be distributed over a WAN
- Continuous data arrival
- Enormous volume: probably can’t communicate it all to one site
- Results from analysis may be desired at multiple sites
- Real-time constraints
- A real-time, high-throughput, distributed processing problem

Motivation: Challenges & Possible Solutions
- Challenge 1: data-, communication-, and/or compute-intensive

Motivation: Challenges & Possible Solutions
- Challenge 1: data- and/or computation-intensive
  - Solution: Grid computing technologies

Motivation: Challenges & Possible Solutions
- Challenge 1: data- and/or computation-intensive
  - Solution: Grid computing technologies
- Challenge 2: real-time analysis is required
  - Solution: self-adaptation functionality is desired

Need for a Grid-Based Stream Processing Middleware
- Application developers interested in data stream processing
  - would like to have abstracted Grid standards and interfaces
  - would like to have an adaptation function
  - would like to focus on algorithms only
- GATES is a middleware for Grid-based, self-adapting data stream processing

Roadmap
- GATES architecture and API
- Adaptation algorithm
- Evaluation
- Related work
- Conclusion
- On-going & future work

GATES
- Grid-based AdapTive Execution on Streams
- Targets (distributed) processing of (distributed) data streams
- Built on the OGSA model
- Self-adaptation to meet real-time constraints on processing

GATES and Grid Standards
[Figure labels: Internet, Globus-OGSA, GATES, Applications, Web service]

Using GATES
1. Break down the analysis into several sub-tasks that form a pipeline.
2. Implement each sub-task in Java.
3. Write an XML configuration file so that the sub-tasks are deployed automatically.
4. Launch the application by running a Java program (StreamClient.class) provided by GATES.
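The first two steps above (splitting the analysis into sub-tasks that form a pipeline and writing each sub-task in Java) can be pictured with a minimal sketch. It does not use the GATES API: the class `PipelineSketch`, the methods `exampleStages` and `runPipeline`, and the two toy stages are hypothetical names chosen for illustration, and the stages run sequentially in one process rather than as deployed Grid services.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch (not the GATES API): an analysis split into pipelined sub-tasks. */
public class PipelineSketch {

    /** Two toy sub-tasks; a real application would put its analysis steps here. */
    public static List<Function<Integer, Integer>> exampleStages() {
        List<Function<Integer, Integer>> stages = new ArrayList<>();
        stages.add(x -> x * 2); // sub-task 1: scale each item
        stages.add(x -> x + 1); // sub-task 2: offset each item
        return stages;
    }

    /** Pushes every item of the source through the stages in order. */
    public static List<Integer> runPipeline(List<Integer> source,
                                            List<Function<Integer, Integer>> stages) {
        List<Integer> results = new ArrayList<>();
        for (Integer item : source) {
            Integer value = item;
            for (Function<Integer, Integer> stage : stages) {
                value = stage.apply(value);
            }
            results.add(value);
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [3, 5, 7]
        System.out.println(runPipeline(Arrays.asList(1, 2, 3), exampleStages()));
    }
}
```

In a real deployment each stage would presumably run at a different site and exchange data through buffers; the sketch only shows the decomposition itself.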

System Architecture

Adaptation for Real-time Processing
- Analysis on streaming data is approximate
- The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
  - Sampling rate
  - Size of summary structure
- Application developers can expose these parameters and a range of values
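One way to picture "a parameter and a range of values" is a small holder object that the application creates and the middleware later updates. The sketch below is an assumption for illustration only: `AdaptationParameterSketch`, `suggest`, and `get` are hypothetical names, not GATES classes or methods. The one property it shows is that any suggested value is clamped to the range the developer exposed.

```java
/** Hypothetical sketch (not the GATES API): a tunable parameter with a developer-given range. */
public class AdaptationParameterSketch {
    private final double min;
    private final double max;
    private double value;

    public AdaptationParameterSketch(double min, double max, double initial) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
        this.value = clamp(initial);
    }

    /** Keeps a candidate value inside [min, max]. */
    private double clamp(double candidate) {
        return Math.max(min, Math.min(max, candidate));
    }

    /** Called by the (assumed) middleware side; out-of-range suggestions are clamped. */
    public void suggest(double candidate) {
        value = clamp(candidate);
    }

    /** Called by the application to read the current setting. */
    public double get() {
        return value;
    }

    public static void main(String[] args) {
        AdaptationParameterSketch samplingRate = new AdaptationParameterSketch(0.1, 1.0, 0.5);
        samplingRate.suggest(2.0);              // above the range
        System.out.println(samplingRate.get()); // prints 1.0
        samplingRate.suggest(0.0);              // below the range
        System.out.println(samplingRate.get()); // prints 0.1
    }
}
```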

API for Adaptation

    public class Sampling-Stage implements StreamProcessing {
        …
        void init() { … }
        void work(buffer in, buffer out) {
            while (true) {
                Image img = get-from-buffer-in-GATES(in);
                Image img-sample = Sampling(img, sampling-ratio);
                put-to-buffer-in-GATES(img-sample, out);
            }
        }
    }

    GATES.Information-About-Adjustment-Parameter(min, max, 1)
    sampling-ratio = GATES.getSuggestedParameter();
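The slide code is schematic (its hyphenated names are not legal Java identifiers), so a compilable approximation may help. The sketch below is an assumption, not the GATES API: `SamplingStageSketch`, the `ParameterSource` interface, and the `sample` helper are hypothetical, plain queues stand in for the GATES buffers, an `int[]` stands in for an image, and the loop drains the input queue instead of running forever. It keeps the pattern from the slide: ask for the suggested sampling ratio, sample the input, pass the result on.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

/** Hypothetical sketch (not the GATES API) of a sampling stage with a tunable ratio. */
public class SamplingStageSketch {

    /** Stand-in for the middleware call that suggests a parameter value. */
    public interface ParameterSource {
        double getSuggestedParameter();
    }

    /** Systematic sampling: keeps every k-th element, where k = round(1 / ratio). */
    public static int[] sample(int[] data, double ratio) {
        if (ratio <= 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in (0, 1]");
        }
        int step = (int) Math.min(Integer.MAX_VALUE, Math.round(1.0 / ratio));
        int count = (int) (((long) data.length + step - 1) / step);
        int[] kept = new int[count];
        for (int i = 0; i < count; i++) {
            kept[i] = data[i * step];
        }
        return kept;
    }

    /** Drains the input queue, re-reading the suggested ratio before each item. */
    public static void work(Queue<int[]> in, Queue<int[]> out, ParameterSource params) {
        int[] item;
        while ((item = in.poll()) != null) {
            double samplingRatio = params.getSuggestedParameter();
            out.add(sample(item, samplingRatio));
        }
    }

    public static void main(String[] args) {
        Queue<int[]> in = new ArrayDeque<>();
        Queue<int[]> out = new ArrayDeque<>();
        in.add(new int[] {0, 1, 2, 3, 4, 5, 6, 7});
        work(in, out, () -> 0.5);
        System.out.println(Arrays.toString(out.peek())); // prints [0, 2, 4, 6]
    }
}
```

Reading the ratio inside the loop is the design point: the stage picks up a new suggestion on the next item without being restarted.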

Self-Adaptation Approach
[Figure: Stage A, Stage B, Stage C; legend: buffers, queues, Grid services of GATES, stages of an application]

Adaptation Algorithm
- Goal
- Issues
  - No specific information about applications
  - Filter out short-term bursts while staying sensitive to long-term behavior
  - Quickly find converged values of the adjustment parameters
- Basic idea: queuing theory and a heuristic algorithm
[Figure labels: A, B, C]
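The transcript does not reproduce the actual adaptation equations, so the following is only a generic illustration of the stated issues, not the GATES algorithm. It assumes a simple scheme: smooth the observed queue length with an exponential moving average (so a short burst is damped while a sustained backlog still shows up), then step the sampling ratio down when the smoothed value crosses a high-water mark and up when it falls below a low-water mark. All names, thresholds, and constants are hypothetical.

```java
/** Generic illustration (not the GATES algorithm): queue-driven tuning of a sampling ratio. */
public class QueueAdaptationSketch {
    private final double min;       // smallest allowed ratio
    private final double max;       // largest allowed ratio
    private final double alpha;     // smoothing weight of the newest observation
    private final double step;      // how far one adjustment moves the ratio
    private final double lowWater;  // below this smoothed length, accuracy can be raised
    private final double highWater; // above this smoothed length, load must be reduced
    private double smoothed = 0.0;  // exponential moving average of the queue length
    private double ratio;

    public QueueAdaptationSketch(double min, double max, double alpha, double step,
                                 double lowWater, double highWater) {
        this.min = min;
        this.max = max;
        this.alpha = alpha;
        this.step = step;
        this.lowWater = lowWater;
        this.highWater = highWater;
        this.ratio = max; // start at full accuracy
    }

    /** Feeds one queue-length observation and returns the ratio to use next. */
    public double observe(int queueLength) {
        smoothed = alpha * queueLength + (1.0 - alpha) * smoothed;
        if (smoothed > highWater) {
            ratio = Math.max(min, ratio - step); // sustained backlog: sample less
        } else if (smoothed < lowWater) {
            ratio = Math.min(max, ratio + step); // spare capacity: sample more
        }
        return ratio;
    }

    public static void main(String[] args) {
        QueueAdaptationSketch tuner = new QueueAdaptationSketch(0.1, 1.0, 0.2, 0.1, 10.0, 50.0);
        System.out.println(tuner.observe(100)); // single burst: ratio is unchanged
        for (int i = 0; i < 3; i++) {
            tuner.observe(100);                 // backlog persists
        }
        System.out.println(tuner.observe(100)); // ratio has been lowered by now
    }
}
```

A smaller `alpha` filters bursts more aggressively but reacts more slowly to a lasting change, which is the tension the "Issues" bullets describe.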

Adaptation algorithm Equations

Evaluation
- Two applications
  - A counting sample application
  - A computational steering application
- Three experiments were conducted
  - The first ran the counting sample application on GATES
  - The other two ran the computational steering application

Experiment One: Non-adaptive vs. Adaptive Version

Performance comparison
Network Bandwidth (Kilo-Byte/Sec.)  40 (sec.)  80 (sec.)  120 (sec.)  160 (sec.)  Adaptive Version (Kilo-Byte/Sec.)
1  462.3  612.9  459.9  671  463.5
10  187.7  193.3  509.1  302.1  234.9
100  246.4  466.7  296.2  371.6  387.1
1000  240.4  298.8  307.7  478  399.9

Accuracy comparison
Network Bandwidth (Kilo-Byte/Sec.)  40 (sec.)  80 (sec.)  120 (sec.)  160 (sec.)  Adaptive Version (Kilo-Byte/Sec.)
1  0.891  0.962  0.981  0.987  0.986
10  0.896  0.963  0.983  0.992
100  0.887  0.957  0.979  0.988  0.974
1000  0.879  0.989

Self-Adaptation with Different Processing Requirements

Self-Adaptation with Different Data Generation Rates

Related Work
- dQUOB (dynamic QUery Objects)
- DataCutter
- A lot of work on adaptation
- Adaptation for real-time processing of streams
- Streaming database systems: support DB operations, usually centralized

Conclusion
- High-volume, distributed stream processing is in our future
- Grid computing could be an effective solution for distributed data stream processing
- GATES
  - Distributed processing
  - Exploits Grid web services
  - Self-adaptation to meet real-time constraints

On-going and Future Work
- Continuous (dynamic) resource discovery & monitoring
- Resource reallocation (self-mobility)
- Larger application (time-varying visualization)
- Generalize the adaptation algorithm
- More evaluation studies