1 A Grid-Based Middleware for Processing Distributed Data Streams Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering.


1 A Grid-Based Middleware for Processing Distributed Data Streams Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering

2 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation function Self-Adaptation Algorithm –Explain the algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

3 Introduction- Motivation What is a data stream? –Data stream: data arrive continuously –Enormous volume, and must be processed online –Need to be processed in real-time –Data sources could be distributed Data Stream Applications: –Online network intrusion detection –Sensor networks –Network Fault Management system for telecommunication network elements

4 Introduction - Motivation A Network Fault Management (NFM) system analyzing distributed alarm streams [Figure: a switch network and the NFM system]

5 Introduction- Motivation Challenges –Data and/or computation intensive –System can be easily overloaded

6 Introduction- Motivation Possible solutions –Grid computing technologies –Automatically adjust processing rate

7 Introduction- Motivation The needs for processing distributed data streams –A middleware running in Grid –Allocate Grid resources –Provide self-adaptation function

8 Introduction- Our Approach We implemented a middleware to meet these needs. Five contributions of our work:
1. Utilizing existing grid standards: Liang Chen, K. Reddy and G. Agrawal, “GATES: A Grid-Based Middleware for Processing Distributed Data Streams”. HPDC.
2. Providing self-adaptation functionality: Liang Chen and G. Agrawal, “Supporting Self-Adaptation in Streaming Data Mining Applications”. IPDPS.
3. Supporting automatic resource allocation: Liang Chen and G. Agrawal, “A Static Resource Allocation Framework for Grid-Based Streaming Applications”. Concurrency and Computation: Practice and Experience, Volume 18, Issue 6.
4. Supporting efficient dynamic migration: Liang Chen, Q. Zhu and G. Agrawal, “Supporting Dynamic Migration in Tightly Coupled Grid Applications”. SC.
5. Studying an adaptive rendering application

9 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation algorithms Self-Adaptation Algorithm –Introduce the algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

10 System Architecture and Design (Architecture) –Uses Globus Toolkit 3.0, built on OGSA –Allows users to specify their algorithms, implemented in Java –Takes care of plugging user-defined algorithms into the system and running them in the Grid –Applications need to be broken down into a number of pipelined stages

11 System Architecture and Design (Architecture) [Figure: an application divided into Stage A, Stage B, and Stage C. Legend: GATES services; stages of an application; queues between Grid services; buffers for applications]

12 System Architecture and Design (GATES API Functions)
public class SecondStage implements StreamProcessing {
    …
    void work(buffer in, buffer out) {
        …
        while (true) {
            data = GATES.getFromInputBuffer(in);
            interResults = processing(data);
            GATES.putToOutputBuffer(out, interResults);
        }
    }
}

13 Adaptation Parameter Definition: –A parameter in an application –Changing the parameter’s value changes the processing rate of the application and also affects the accuracy of the processing Two kinds of adaptation parameters –Performance parameter –Accuracy parameter –Example: sampling rate is an accuracy parameter [Figure: accuracy and processing rate as functions of an accuracy parameter and of a performance parameter]

14 Pseudocode Again, with Self-adaptation API Functions
public class SecondStage implements StreamProcessing {
    …
    // Initialize the sampling rate
    samplingRate = (max + min) / 2;
    void work(buffer in, buffer out) {
        GATES.specifyAccuracyPara(samplingRate, max, min);
        while (true) {
            data = GATES.getFromInputBuffer(in);
            interResults = processing(data, samplingRate);
            GATES.putToOutputBuffer(out, interResults);
            samplingRate = GATES.getSuggestedValue();
        }
    }
}

15 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation function Self-Adaptation Algorithm –Explain the algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

16 Self-Adaptation Algorithm View the system as a pipeline (stages A, B, C) –To ensure real-time processing, a balanced pipeline is needed –When the average queue length is too small or too large, the queue is underloaded or overloaded, and the pipeline is not balanced –Measure the average lengths of the queues in the pipeline –When GATES.getSuggestedValue() is invoked, use a heuristic to determine a new value for the adaptation parameter according to the measured lengths

17 Self-adaptation Algorithm –How we measure the average queue length –The heuristic for adjusting an adaptation parameter: should the adaptation parameter be modified, and if so, in which direction? How do we find a new value (update the value) of the adaptation parameter?

18 Self-adaptation Algorithm Should the adaptation parameter be modified, and if so, in which direction? –The answer is related to the pipeline’s load state.
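The idea of measuring average queue lengths and mapping them to a load state can be sketched as follows. This is only an illustrative sketch, not the GATES implementation: the class name `QueueMonitor`, the sliding-window average, and the two thresholds `low` and `high` are all assumptions introduced here, since the slides do not say how the average is computed or where the boundaries lie.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Load state of one queue in the pipeline (state names follow the slides). */
enum LoadState { LIGHTLY_LOADED, PROPERLY_LOADED, OVERLOADED }

/** Hypothetical helper: sliding-window average of observed queue lengths. */
class QueueMonitor {
    private final Deque<Integer> window = new ArrayDeque<>();
    private final int windowSize;
    private final double low;   // assumed threshold: below this the queue is lightly loaded
    private final double high;  // assumed threshold: above this the queue is overloaded
    private long sum = 0;

    QueueMonitor(int windowSize, double low, double high) {
        if (windowSize <= 0 || low > high) {
            throw new IllegalArgumentException("invalid window size or thresholds");
        }
        this.windowSize = windowSize;
        this.low = low;
        this.high = high;
    }

    /** Records one observation and drops the oldest one once the window is full. */
    void record(int queueLength) {
        window.addLast(queueLength);
        sum += queueLength;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
    }

    /** Average queue length over the current window (0 if nothing was recorded). */
    double average() {
        return window.isEmpty() ? 0.0 : (double) sum / window.size();
    }

    /** Classifies the queue by comparing the average with the two thresholds. */
    LoadState state() {
        double avg = average();
        if (avg < low) return LoadState.LIGHTLY_LOADED;
        if (avg > high) return LoadState.OVERLOADED;
        return LoadState.PROPERLY_LOADED;
    }
}
```

One monitor per queue would give the per-stage load states from which the direction of the parameter change is decided.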

19 Self-adaptation Algorithm Performance Parameter [Figure: load states of the three-stage pipeline A, B, C, grouped into convergent states and non-convergent states. Legend: overloaded; properly loaded; lightly loaded]

20 Self-adaptation Algorithm Summary of Load States

21 Self-adaptation Algorithm How to determine a new value for the adaptation parameter –Linear update: increase or decrease by a fixed value Hard to find a proper fixed value –Binary search

22 Self-adaptation Algorithm [Figure: binary search over the parameter range. Labels: Left Border, Current Value, Right Border, New Value]
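A binary-search style update of an adaptation parameter might look like the sketch below. It is a minimal illustration under assumptions: the class `BinarySearchTuner` and its methods are hypothetical names, and the rule for moving the borders (the old value becomes the left or right border, and the new value is the midpoint of the remaining interval) is one plausible reading of the figure labels, not a confirmed description of the GATES heuristic. The starting value (max + min) / 2 follows the pseudocode slide.

```java
/** Hypothetical binary-search tuner for a single adaptation parameter. */
class BinarySearchTuner {
    private double left;     // left border of the current search interval
    private double right;    // right border of the current search interval
    private double current;  // current parameter value

    BinarySearchTuner(double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.left = min;
        this.right = max;
        this.current = (min + max) / 2.0;  // same starting point as the pseudocode slide
    }

    double current() {
        return current;
    }

    /** Moves the value up: the old value becomes the left border. */
    double increase() {
        left = current;
        current = (current + right) / 2.0;
        return current;
    }

    /** Moves the value down: the old value becomes the right border. */
    double decrease() {
        right = current;
        current = (left + current) / 2.0;
        return current;
    }
}
```

Compared with a linear update, the step size here shrinks as the interval narrows, so no fixed increment has to be chosen in advance.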

23

24 Self-adaptation Algorithm Two Data mining applications –Clustream: Clustering data-points in streams

25 Data Mining Applications & System Evaluation Dist-Freq-Counting: finding frequent itemsets from distributed streams

26 Data Mining Applications & System Evaluation

27 Data Mining Applications & System Evaluation

28 Data Mining Applications & System Evaluation

29 Data Mining Applications & System Evaluation

30 Data Mining Applications & System Evaluation

31 Data Mining Applications & System Evaluation

32 Data Mining Applications & System Evaluation

33 Data Mining Applications & System Evaluation

34 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation algorithms Self-Adaptation Algorithm –Explain the algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

35 Resource Allocation Schemes Problem Definition –Grid resource allocation for pipelined applications that process distributed streaming data in real-time is challenging –The scheme consists of two parts –Static part: allocate resources before an application runs –Dynamic part: re-allocate resources at run time –A framework to monitor resources and support dynamic resource allocation

36 Static Allocation Scheme –Static allocation problem: determining a deployment configuration –Objective: automatically generate a deployment configuration according to the information of available resources –The number of data sources and their location –The destination –The number of stages making up a pipeline –(?) The number of instances of each stage –(?) How the instances connect to each other –(?) The node where each instance is placed

37 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation algorithms Improved Self-Adaptation –self-adaptation algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

38 Dynamic Migration- Motivation –Grid resources vary frequently –Dynamically allocating new resources and migrating applications to the new resources improves performance –Checkpointing is a classic method to support dynamic migration: take a snapshot of the system’s running state, transmit it to a remote site, restore the execution context, and restart processes –Disadvantages of checkpointing: platform dependent, inefficient, and requires a lot of implementation effort –Our approach is based on the Light-weight Summary Structure (LSS)

39 Dynamic Migration-LSS Processing structure:
...
while (true) {
    read_data_from_streams();
    process_data();
    accumulate_intermediate_results();
    reset_auxiliary_structures();
}
...
–The data structure storing summary information is the light-weight summary structure (LSS) –All other structures are auxiliary structures

40 Dynamic Migration-LSS Two observations with respect to LSS –The size of the LSS is much smaller than that of the total memory –Auxiliary structures are usually reset at the end of each loop, so it is unnecessary to migrate them when migration occurs at the end of a loop LSS can be used to support dynamic migration –GATES provides an API function to allocate a block of memory to be the LSS –An application stores summary information in the LSS –Only the LSS is transmitted to a new node at the end of the loop, and it is restored at the new node
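The following sketch illustrates the LSS idea: only the summary structure is exported at the end of a loop iteration, and a fresh stage on another node resumes from it. Everything here is an assumption made for illustration. `CountingStage` and its methods are hypothetical, a simple frequency map stands in for the summary, and standard Java serialization stands in for the GATES API function that allocates and transmits the LSS, which the slides do not show.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/** Hypothetical stage that counts how often each number occurs in a stream. */
class CountingStage {
    // Summary information that must survive migration (plays the role of the LSS).
    private HashMap<Integer, Long> summary = new HashMap<>();
    // Auxiliary structure: refilled and reset in every loop iteration, never migrated.
    private final List<Integer> batch = new ArrayList<>();

    /** One loop iteration: read, process, accumulate, reset. */
    void processBatch(int[] data) {
        for (int v : data) batch.add(v);                      // read data from the stream
        for (int v : batch) summary.merge(v, 1L, Long::sum);  // accumulate intermediate results
        batch.clear();                                        // reset auxiliary structures
    }

    /** Serializes only the summary; call this at the end of an iteration. */
    byte[] exportSummary() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(summary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Creates a fresh stage (for example on a new node) from an exported summary. */
    @SuppressWarnings("unchecked")
    static CountingStage restore(byte[] lss) {
        CountingStage stage = new CountingStage();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(lss))) {
            stage.summary = (HashMap<Integer, Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return stage;
    }

    long count(int value) {
        return summary.getOrDefault(value, 0L);
    }
}
```

Because the auxiliary batch is empty at the end of each iteration, exporting only `summary` loses no state, which is the observation the slide relies on.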

41 Dynamic Migration– supported by GATES

42 Dynamic Migration Advantages of using LSS –Efficient: only the LSS is migrated –Does not impact the accuracy of processing –Supports migration across heterogeneous platforms –Reduces application developers’ effort in making applications capable of migration

43 Dynamic Migration

44 Dynamic Migration Evaluation –Three applications Counting sample –LSS stores intermediate top M frequently occurring numbers Clustream, clustering data points in streams –LSS stores micro-clusters computed at the second stage Dist-Freq-Counting, finding frequent itemsets in distributed streams. –LSS stores unprocessed itemsets

45 Dynamic Migration Memory usage of LSS

46 Dynamic Migration Migration using LSS is efficient

47 Dynamic Migration Migration using LSS is efficient

48 Dynamic Migration Benefits of migration in a dynamic environment

49 Dynamic Migration Memory usage of LSS

50 Dynamic Migration Migration using LSS is efficient

51 Dynamic Migration Migration using LSS is efficient

52 Dynamic Migration Benefits of migration in a dynamic environment

53 Dynamic Migration LSS migration does not impact processing accuracy –The counting sample application was used –We compared the average accuracy of the processing results from the non-migration and the migration versions: they are 97.28% and 97.51% accurate, respectively

54 Roadmap Introduction –Motivation –Our approach and challenges System Overview and Initial Evaluation –Introduce system architecture and design –Discuss the self-adaptation algorithms Self-Adaptation Algorithm –Explain the algorithm –Evaluate the system by using two data mining applications Resource Allocation Schemes Dynamic Migration –Motivation –Light-weight summary structure (LSS) –How applications utilize the dynamic migration –Evaluation Adaptive Volume Rendering Related work Conclusion and Future work

55 Adaptive Volume Rendering Motivation – Grid computing is needed Visualization involves large volumes of data We focus on streaming volume data Interactively visualizing volume data in real-time is needed –Computationally intensive –Resource consuming –Real-time processing cannot be guaranteed The places where data are generated are distributed A typical client-server architecture is not scalable –Network bandwidth of wide-area networks is low –Computing capability of a normal desktop is not enough Grid techniques would be a good solution –Divide the procedure into stages organized in a pipeline –Allocate nodes close to the data source to pre-process volume data –The size of intermediate results is much smaller

56 Adaptive Volume Rendering Motivation – GATES is desirable –Automatic adaptation is desirable Volume rendering algorithms running on a grid need to be highly adaptive Adaptation usually achieved by manually adjusting adaptation parameters Such manual parameter adaptation is very challenging in a grid environment –Automatic resource allocation is desirable Grid environment is highly changeable –The GATES middleware could fulfill the needs Grid-based Provide the self-adaptation function to applications Automatically allocate Grid resources

57 Adaptive Volume Rendering Overall design –Two pipelined steps – the first step: Build octrees from volume data –An octree is a tree data structure in which each internal node has up to 8 children –Here, we use an octree to represent multiresolution information for a volume –The procedure to build an octree for a volume is as follows: »Divide the volume space into 8 subvolumes and create 8 child nodes »For each subvolume, calculate the standard deviation of all voxels in the subvolume, and store the deviation in the corresponding child node »If the deviation is larger than a pre-defined value, divide the subvolume and repeat the above procedure. Otherwise, stop

58 Adaptive Volume Rendering Overall design –Two pipelined steps – the second step: Use an octree and its corresponding volume to render images Provided an error tolerance (or user-defined resolution), use DFS to traverse the octree and stop at the nodes where the deviation is less than the resolution or error tolerance. Project the corresponding 3D-subvolumes to an image
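The two steps described on the previous two slides (building an octree from voxel deviations, then traversing it depth-first up to an error tolerance) can be sketched as below. This is a simplified illustration under assumptions, not the presented implementation: the volume is taken to be a cube of floats with a power-of-two side length, the class and method names are hypothetical, the root also stores a deviation, and the projection of the selected subvolumes to an image is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** One octree node covering the cube [x, x+size) x [y, y+size) x [z, z+size). */
class OctreeNode {
    final int x, y, z, size;
    final double deviation;  // standard deviation of the voxels in this subvolume
    final List<OctreeNode> children = new ArrayList<>();

    OctreeNode(int x, int y, int z, int size, double deviation) {
        this.x = x; this.y = y; this.z = z;
        this.size = size;
        this.deviation = deviation;
    }
}

class Octree {
    /** Population standard deviation of all voxels in a cubic subvolume. */
    static double stdDev(float[][][] v, int x, int y, int z, int size) {
        double sum = 0.0, sumSq = 0.0;
        for (int i = x; i < x + size; i++)
            for (int j = y; j < y + size; j++)
                for (int k = z; k < z + size; k++) {
                    sum += v[i][j][k];
                    sumSq += (double) v[i][j][k] * v[i][j][k];
                }
        double n = (double) size * size * size;
        double mean = sum / n;
        return Math.sqrt(Math.max(0.0, sumSq / n - mean * mean));
    }

    /** Step 1: subdivide while the deviation exceeds the pre-defined threshold. */
    static OctreeNode build(float[][][] v, int x, int y, int z, int size, double threshold) {
        OctreeNode node = new OctreeNode(x, y, z, size, stdDev(v, x, y, z, size));
        if (node.deviation > threshold && size > 1) {
            int h = size / 2;
            for (int dx = 0; dx < 2; dx++)
                for (int dy = 0; dy < 2; dy++)
                    for (int dz = 0; dz < 2; dz++)
                        node.children.add(
                            build(v, x + dx * h, y + dy * h, z + dz * h, h, threshold));
        }
        return node;
    }

    /** Step 2: depth-first traversal that stops where the deviation is below the tolerance. */
    static void select(OctreeNode node, double tolerance, List<OctreeNode> out) {
        if (node.deviation < tolerance || node.children.isEmpty()) {
            out.add(node);  // this subvolume would be projected to the image
            return;
        }
        for (OctreeNode child : node.children) {
            select(child, tolerance, out);
        }
    }
}
```

A larger tolerance stops the traversal higher in the tree, so fewer and coarser subvolumes are rendered; this is why error tolerance can act as a tunable parameter trading accuracy for speed.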

59 Adaptive Volume Rendering

60 Adaptive Volume Rendering Make the rendering self-adaptive –Two adaptation parameters used in the third stage Error Tolerance – performance parameter Image Size – accuracy parameter –Only one adaptation parameter can be adjusted by GATES. So we fix one and adjust the other

61

62 Adaptive Volume Rendering Experiment 1

63 Adaptive Volume Rendering [Figure panels: 100 kbps, 150 kbps, 200 kbps, 250 kbps]

64 Adaptive Volume Rendering Experiment 2

65 Adaptive Volume Rendering Experiment 3: compare the performance of two implementations –Java-imple –C-imple

66 Adaptive Volume Rendering Experiment 3: compare the performance of two implementations

67 Related Work Middleware for data stream processing –DataCutter, Stampede –Differences: in a cluster, no self-adaptation, not specifically for real-time processing Continuous query systems –STREAM, dQUOB, TelegraphCQ, NiagaraCQ –Differences: centralized, no adaptation support Distributed continuous query systems –Aurora*, Medusa, Borealis –Differences: continuous queries, not in a Grid environment In-network aggregation in sensor networks Stream-based overlay networks

68 Related work Grid Resource Allocation –Condor, Realtor, ACDS –Main difference: our work focuses on Grid resource allocation for workflow applications Adaptation Through a Middleware –Cheng et al.’s adaptation framework, SWiFT, Conductor, DART, ROAM –Main difference: our work focuses on general support for adaptation at run time Dynamic Migration in Grid Environment –Condor, XCATS, Charm++ –Main difference: our work uses LSS

69 Conclusion Grid computing could be an effective solution for distributed data stream processing GATES –Distributed processing –Exploit grid web services –Self-adaptation to meet the real-time constraints –Grid resource allocation schemes and dynamic migration

70 Future Work CPU cycles and network bandwidth –Currently, only network bandwidth is considered as a constraint when scheduling Grid resources –Little related work proposes a metric that integrates both for pipelined applications Port GATES from GT3 to GT4 Support fault-tolerance and high availability Further relieve the programming burden on application developers –Specify meta-data Support distributed continuous queries –Specify a set of query operators

71 Acknowledgements My advisor, Prof. Agrawal, proposed the idea of implementing the middleware and gave a lot of advice on the direction of my research Prof. Shen gave a lot of help with implementing the rendering application and provided much of the write-up for Chapter 7

72 Questions? No more questions? Thanks!