Resource Allocation in a Middleware for Streaming Data


1 Resource Allocation in a Middleware for Streaming Data
Liang Chen Gagan Agrawal Ohio State University

2 Outline
Context of the Work
Problem Definition and Complexity
Algorithm and Examples
Evaluation
Summary and Future Work

3 Streaming Data Model
Continuous data arrival and processing
An emerging model for data processing
Sources that produce data continuously: sensors, long-running simulations
WAN bandwidths are growing faster than disk bandwidths
An active topic in many computer science communities: databases, data mining, networking, ...

4 Summary/Limitations of Current Work
Current work focuses on centralized processing of a stream from a single source (databases, data mining), or on communication only (networking)
Many applications involve distributed processing of streams, and streams from multiple sources

5 Motivating Application
[Figure: a Network Fault Management System monitoring a switch network]

6 Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing would like to:
have Grid standards and interfaces abstracted away
have an adaptation function provided
focus on algorithms only
GATES is a middleware for Grid-based Self-adapting Data Stream Processing

7 System Architecture

8 Problem Definition
The resource allocation problem: determining a deployment configuration
Objective: automatically generate a deployment configuration from information about the available resources
A deployment configuration specifies:
the number of data sources and their locations
the destination
the number of stages that make up the pipeline
the number of instances of each stage
how the instances connect to each other
the node where each instance is placed
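The items a deployment configuration must specify can be held in a small record type. The Python sketch below is only an illustration under assumptions: the class and field names (`StageInstance`, `DeploymentConfig`, `instances_per_stage`) are hypothetical and are not taken from GATES.

```python
from dataclasses import dataclass, field


@dataclass
class StageInstance:
    """One instance of a pipeline stage (hypothetical representation)."""
    stage: int                  # index of the stage in the pipeline (0 = first stage)
    node: str                   # computing node where this instance is placed
    downstream: list = field(default_factory=list)  # ids of instances it sends data to


@dataclass
class DeploymentConfig:
    """Hypothetical container for the items listed on the Problem Definition slide."""
    sources: dict               # data source name -> node where it is located
    destination: str            # node that receives the final results
    num_stages: int             # number of stages that make up the pipeline
    instances: dict = field(default_factory=dict)   # instance id -> StageInstance

    def instances_per_stage(self) -> dict:
        """Count how many instances each stage has."""
        counts = {}
        for inst in self.instances.values():
            counts[inst.stage] = counts.get(inst.stage, 0) + 1
        return counts


# Two sources, each feeding its own first-stage instance; both feed one final instance.
cfg = DeploymentConfig(
    sources={"s1": "nodeA", "s2": "nodeB"},
    destination="nodeD",
    num_stages=2,
    instances={
        "i1": StageInstance(0, "nodeA", ["i3"]),
        "i2": StageInstance(0, "nodeB", ["i3"]),
        "i3": StageInstance(1, "nodeD"),
    },
)
print(cfg.instances_per_stage())  # {0: 2, 1: 1}
```

Connections are stored as downstream instance ids so that the "how the instances connect" item stays separate from the "where each instance is placed" item.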

9 Examples
[Figure: examples of deployment configurations]

10 Problem Complexity
Challenge: given an application with
m stages,
n data sources, and
k available computing nodes for placing the stages' instances,
the number of possible configurations is:
$$F(2, n, k) = 1$$
$$F(m, n, k) = \sum_{i=1}^{\min(n,k)} s_n(i) \cdot F(m-1, i, k-i) \cdot P_k^i$$
$$F(3, n, k) \ge n^n \quad (k \ge n)$$
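The recurrence can be evaluated directly to see how fast the search space grows. This sketch assumes one reading of the symbols, which the slide does not define: $s_n(i)$ as the Stirling number of the second kind (ways to group the n input streams onto i instances of the next stage) and $P_k^i = k!/(k-i)!$ (ordered placements of those i instances on k nodes). Under that reading $F(3, n, k)$ works out to exactly $k^n$, so the bound $F(3, n, k) \ge n^n$ holds whenever $k \ge n$. The function names `stirling2` and `num_configs` are hypothetical.

```python
from functools import lru_cache
from math import perm


@lru_cache(maxsize=None)
def stirling2(n: int, i: int) -> int:
    """Stirling number of the second kind: ways to split n items into i non-empty groups."""
    if n == i:
        return 1          # includes S(0, 0) = 1
    if i == 0 or i > n:
        return 0
    return i * stirling2(n - 1, i) + stirling2(n - 1, i - 1)


@lru_cache(maxsize=None)
def num_configs(m: int, n: int, k: int) -> int:
    """F(m, n, k) for m stages, n data sources, k free nodes (assumed reading of the slide)."""
    if m == 2:
        return 1          # base case F(2, n, k) = 1
    # i = number of instances of the stage fed directly by the n inputs;
    # those i instances then act as the inputs of an (m-1)-stage problem on k-i nodes.
    return sum(
        stirling2(n, i) * num_configs(m - 1, i, k - i) * perm(k, i)
        for i in range(1, min(n, k) + 1)
    )


for n in range(1, 6):
    print(n, num_configs(3, n, n), n ** n)   # the two counts agree when k = n
```

With only 5 sources and 5 nodes a three-stage pipeline already has 3125 configurations, which is why exhaustive enumeration is impractical.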

11 Our Approach O(nk2) v.s. (nn) Goal: Assumptions:
Determine a sub-optimal configuration Assumptions: Network bandwidths are the critical resources rather than CPU capabilities Bandwidths of networks inside a cluster are larger than bandwidths of network connecting clusters We know the topology of networks, the list of available clusters, and resource information of the clusters We know where data sources and destination are

12 Algorithm
Observation:
The data arrival rates are very high at the first one or two stages
The arrival rates decrease significantly at the following stages
Prim's algorithm is used to construct a Minimum Spanning Tree (MST)
Three steps:
1. Create a key path corresponding to each data source (Prim's algorithm, MST)
2. Merge the key paths to create a layout tree
3. Map each node in the layout tree to a computing node
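Step 2, merging the key paths into a layout tree, can be sketched as building a child-to-parent map rooted at the destination. This is a minimal sketch under an assumed merge rule that the slides do not state: paths are merged in order, and once a path reaches a node that is already in the tree it reuses the existing route from there. `merge_key_paths` and `children_of` are hypothetical helper names.

```python
def merge_key_paths(key_paths):
    """Merge per-source key paths into one layout tree (hypothetical merge rule).

    key_paths: list of node lists, each running from a data source to the destination.
    Returns a child -> parent map; the destination (the root) has no entry.
    """
    parent = {}
    for path in key_paths:
        for child, par in zip(path, path[1:]):
            if child in parent:
                break          # already connected to the destination: reuse that route
            parent[child] = par
    return parent


def children_of(parent):
    """Invert a child -> parent map into parent -> list of children."""
    out = {}
    for child, par in parent.items():
        out.setdefault(par, []).append(child)
    return out


tree = merge_key_paths([
    ["s1", "a", "c", "d"],
    ["s2", "b", "c", "d"],
    ["s3", "a", "e", "d"],   # joins at "a", so the suffix through "e" is dropped
])
print(children_of(tree)["c"])  # ['a', 'b']
```

Because each node keeps the first parent it is given and every path ends at the destination, the result is always a tree (no cycles), which is what step 3 needs before mapping tree nodes to computing nodes.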

13 Algorithm Network topology

14 Algorithm
Make each data source a starting node and apply the Minimum Spanning Tree algorithm (Prim's) to the graph
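The per-source step can be sketched in Python with a heap-based Prim's algorithm started at the data source. Two things here are assumptions rather than details from the slides: the edge costs (for example inverse bandwidth, so that high-bandwidth links are preferred) and the definition of the key path as the path from the source to the destination inside that source's spanning tree. `prim_tree` and `key_path` are hypothetical names.

```python
import heapq


def prim_tree(graph, start):
    """Prim's algorithm grown from `start`; returns a child -> parent map of the spanning tree.

    graph: dict node -> dict neighbor -> edge cost (hypothetical cost, e.g. 1/bandwidth).
    """
    parent = {}
    visited = set()
    heap = [(0.0, start, None)]              # (edge cost, node, tree node it attaches to)
    while heap:
        cost, node, via = heapq.heappop(heap)
        if node in visited:
            continue                          # a cheaper edge already attached this node
        visited.add(node)
        parent[node] = via
        for nbr, w in graph[node].items():
            if nbr not in visited:
                heapq.heappush(heap, (w, nbr, node))
    return parent


def key_path(graph, source, destination):
    """Path from `source` to `destination` in the Prim tree rooted at `source`.

    Raises KeyError if the destination is unreachable.
    """
    parent = prim_tree(graph, source)
    path = [destination]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


# Small undirected example topology; costs are illustrative only.
graph = {
    "s": {"a": 1, "b": 4},
    "a": {"s": 1, "b": 2, "d": 6},
    "b": {"s": 4, "a": 2, "d": 3},
    "d": {"a": 6, "b": 3},
}
print(key_path(graph, "s", "d"))  # ['s', 'a', 'b', 'd']
```

Note that the key path follows the spanning tree, so it is not necessarily the cheapest single route from source to destination; calling `key_path` once per data source yields the set of key paths that step 2 merges.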

15 Algorithm

16 Key path

17 Algorithm

18

19 Algorithm

20 Algorithm

21 Algorithm
Issues:
When the number of tree nodes along a key path is larger than the number of stages, transporters are automatically added
When the number of tree nodes along a key path is smaller than the number of stages, additional stages are deployed at the parent node of the data source
Optimization: removing unnecessary transporters
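The two cases for a key path can be sketched as a small placement function. This is a hedged illustration, not the authors' procedure: it reads the second case as the path having fewer nodes than stages, and it assumes a placement rule the slides leave open, namely that early stages go next to the source (where arrival rates are highest), the last stage goes at the destination, and any leftover nodes in between host transporters. `assign_stages` and the `"transporter"` label are hypothetical.

```python
TRANSPORTER = "transporter"  # hypothetical label for a pure forwarding instance


def assign_stages(path, num_stages):
    """Map pipeline stages onto the nodes of one key path (hypothetical placement rule).

    path: nodes from the data source (path[0]) to the destination (path[-1]).
    num_stages: number of pipeline stages to place (at least 1).
    Returns a list of (node, [stage indices or TRANSPORTER]) pairs for path[1:].
    """
    hosts = path[1:]                      # nodes that can host stage instances
    slots = [[] for _ in hosts]
    extra = num_stages - len(hosts)
    if extra >= 0:
        # Enough or too few nodes: stack the surplus stages on the source's parent node,
        # then place one stage per remaining node.
        slots[0].extend(range(extra + 1))
        for offset, slot in enumerate(slots[1:], start=1):
            slot.append(extra + offset)
    else:
        # More nodes than stages: early stages near the source, last stage at the
        # destination, transporters on the nodes in between.
        for s in range(num_stages - 1):
            slots[s].append(s)
        for slot in slots[num_stages - 1:-1]:
            slot.append(TRANSPORTER)
        slots[-1].append(num_stages - 1)
    return list(zip(hosts, slots))


print(assign_stages(["s", "a", "d"], 3))  # [('a', [0, 1]), ('d', [2])]
```

With a longer path, for example `assign_stages(["s", "a", "b", "c", "d"], 2)`, nodes `b` and `c` only forward data; those transporter slots are the candidates the optimization step would try to remove.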

22 Algorithm --- optimization

23 Evaluation
We compare the deployment configuration created and optimized by our algorithm against the best manually chosen configuration and against all possible configurations

24 Evaluation Environment
Network topologies were randomly generated
Application: the distributed counting sampling application
manual-config: a configuration determined manually
auto-config: a configuration generated by the algorithm
opt-config: a configuration optimized by removing unnecessary transporters

25 Evaluation --- experiment 1

26 Evaluation --- experiment 2

27 Evaluation --- experiment 3

28 Future Work
So far we consider only network bandwidth as the bottleneck
Dynamic resource allocation
Dependences among stages
Evaluation with more applications

29 Related Work
Condor's matchmaking
Realtor
dQUOB
Work from database projects

30 Summary

