Published by Bela Patel. Modified over 6 years ago.
1
Resource Allocation in a Middleware for Streaming Data
Liang Chen, Gagan Agrawal
Ohio State University
2
Outline
Context of the Work
Problem Definition and Complexity
Algorithm and Examples
Evaluation
Summary and Future Work
3
Streaming Data Model
Continuous data arrival and processing
Emerging model for data processing
  Sources that produce data continuously: sensors, long-running simulations
  WAN bandwidths growing faster than disk bandwidths
Active topic in many computer science communities
  Databases
  Data Mining
  Networking
  ...
4
Summary/Limitations of Current Work
Focus on
  centralized processing of a stream from a single source (databases, data mining)
  communication only (networking)
Many applications involve
  distributed processing of streams
  streams from multiple sources
5
Motivating Application
[Figure: a Network Fault Management System connected to a switch network]
6
Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing
  Would like to have abstracted
    Grid standards and interfaces
    Adaptation function
  Would like to focus on algorithms only
GATES is a middleware for Grid-based Self-adapting Data Stream Processing
7
System Architecture
8
Problem Definition
The resource allocation problem: determining a deployment configuration
Objective: automatically generate a deployment configuration from information about the available resources
A deployment configuration covers:
  The number of data sources and their locations
  The destination
  The number of stages that make up the pipeline
  The number of instances of each stage
  How the instances connect to each other
  The node where each instance is placed
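The items of a deployment configuration listed above can be held in a small record type. The Python sketch below is illustrative only: the class and field names (`StageInstance`, `DeploymentConfiguration`, `links`, and the node and instance names in the example) are hypothetical and are not taken from GATES.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StageInstance:
    stage: int  # index of the pipeline stage this instance runs (1-based)
    node: str   # computing node where the instance is placed


@dataclass
class DeploymentConfiguration:
    sources: Dict[str, str]   # data source name -> node where it is located
    destination: str          # node that receives the final results
    num_stages: int           # number of stages in the pipeline
    instances: Dict[str, StageInstance] = field(default_factory=dict)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)

    def instances_per_stage(self) -> Dict[int, int]:
        """Count how many instances each stage has."""
        counts = {stage: 0 for stage in range(1, self.num_stages + 1)}
        for instance in self.instances.values():
            counts[instance.stage] += 1
        return counts


# Hypothetical example: two sources, a two-stage pipeline, three instances.
config = DeploymentConfiguration(
    sources={"s1": "n1", "s2": "n2"}, destination="n5", num_stages=2
)
config.instances["A1"] = StageInstance(stage=1, node="n3")
config.instances["A2"] = StageInstance(stage=1, node="n4")
config.instances["B1"] = StageInstance(stage=2, node="n5")
config.links += [("s1", "A1"), ("s2", "A2"), ("A1", "B1"), ("A2", "B1")]
```

Each field corresponds to one item in the list, so a resource allocator only has to fill in this one structure.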
9
Examples
[Figure: examples of deployment configurations]
10
Problem Complexity
Challenge: given an application having
  m stages
  n data sources
  k available computing nodes for placement of the stages' instances
the number of possible configurations is:
  $F(2, n, k) = 1$
  $F(m, n, k) = \sum_{i} \left( s_n(i) \cdot F(m-1, i, k-i) \cdot P_k^i \right)$
  $F(3, n, k) \geq n^n$
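To get a feel for how fast the count of configurations grows, the recurrence can be evaluated numerically. The sketch below rests on assumptions that the slide does not state: that s_n(i) is the Stirling number of the second kind (the number of ways to split n sources into i non-empty groups), that P_k^i is the number of ordered selections of i nodes out of k, and that the sum runs over i from 1 to min(n, k). The function names are hypothetical.

```python
from functools import lru_cache
from math import perm


@lru_cache(maxsize=None)
def stirling2(n, i):
    """Stirling number of the second kind (assumed meaning of s_n(i))."""
    if n == i:
        return 1
    if i == 0 or i > n:
        return 0
    return i * stirling2(n - 1, i) + stirling2(n - 1, i - 1)


@lru_cache(maxsize=None)
def count_configurations(m, n, k):
    """Evaluate F(m, n, k) under the assumptions stated in the lead-in."""
    if m < 2:
        raise ValueError("the recurrence is defined for m >= 2")
    if m == 2:
        return 1
    return sum(
        stirling2(n, i) * count_configurations(m - 1, i, k - i) * perm(k, i)
        for i in range(1, min(n, k) + 1)  # assumed summation range
    )


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, count_configurations(3, n, n))
```

Under these assumptions F(3, n, k) collapses to k^n, which is at least n^n whenever k >= n, in line with the bound on the slide.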
11
Our Approach
Complexity: $O(nk^2)$ vs. $n^n$
Goal: determine a sub-optimal configuration
Assumptions:
  Network bandwidths, rather than CPU capabilities, are the critical resources
  Bandwidths of networks inside a cluster are larger than bandwidths of the networks connecting clusters
  We know the network topology, the list of available clusters, and the resource information of the clusters
  We know where the data sources and the destination are
12
Algorithm
Observation:
  The data arrival rates are very high at the first one or two stages
  The arrival rates decrease significantly at the following stages
Prim's algorithm is used to construct a minimum spanning tree (MST)
Three steps:
  Create a key path corresponding to each data source (Prim's algorithm, MST)
  Merge the key paths to create a layout tree
  Map each node in the layout tree to a computing node
13
Algorithm Network topology
14
Algorithm
Starting from each data source, apply the minimum spanning tree algorithm (Prim) to the graph
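A minimal Python sketch of this step follows, extended to show how key paths might then be merged into a layout tree. It is not the authors' implementation. It assumes each link carries a numeric cost where lower is better (for example, something derived from bandwidth), that a key path is the tree path from a data source to the destination, and that a later path joins the existing tree at the first node it shares with it. The function names and the example graph are hypothetical.

```python
import heapq


def prim_tree(graph, source):
    """Grow a minimum spanning tree from `source` with Prim's algorithm.

    `graph` maps node -> {neighbor: link cost}. Returns node -> parent in the tree.
    """
    parent = {source: None}
    frontier = [(cost, source, nbr) for nbr, cost in graph[source].items()]
    heapq.heapify(frontier)
    while frontier:
        _, u, v = heapq.heappop(frontier)
        if v in parent:
            continue  # v already joined the tree through a cheaper edge
        parent[v] = u
        for nbr, cost in graph[v].items():
            if nbr not in parent:
                heapq.heappush(frontier, (cost, v, nbr))
    return parent


def key_path(parent, destination):
    """Walk tree edges from the destination back to the source, then reverse."""
    path = []
    node = destination
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def merge_key_paths(paths):
    """Merge key paths into a layout tree: node -> next hop toward the destination."""
    next_hop = {}
    for path in paths:
        for node, nxt in zip(path, path[1:]):
            if node in next_hop:
                break  # join the branch that is already part of the layout tree
            next_hop[node] = nxt
    return next_hop


# Hypothetical topology: two data sources, two intermediate nodes, one destination.
graph = {
    "src1": {"a": 1, "b": 4},
    "src2": {"a": 2, "b": 1},
    "a": {"src1": 1, "src2": 2, "b": 1, "dest": 3},
    "b": {"src1": 4, "src2": 1, "a": 1, "dest": 1},
    "dest": {"a": 3, "b": 1},
}
paths = [key_path(prim_tree(graph, s), "dest") for s in ("src1", "src2")]
layout = merge_key_paths(paths)
```

Running Prim's algorithm once per data source keeps the cost polynomial in the number of sources and nodes, in contrast to enumerating every configuration.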
15
Algorithm
16
Key path
17
Algorithm
19
Algorithm
20
Algorithm
21
Algorithm
Issues:
  When the number of tree nodes along a key path is larger than the number of stages
    Transporters are automatically added
  When the number of tree nodes along a key path is smaller than the number of stages
    Additional stages are deployed at the parent node of the data source
Optimization
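The two cases can be illustrated with a short Python sketch. It assumes that the second case refers to a key path with fewer nodes than stages, that stages are placed in order starting at the parent node of the data source, and that a node left without a stage acts as a transporter that only forwards data. The function name and this placement policy are hypothetical and are not taken from GATES.

```python
def assign_stages(path_nodes, num_stages):
    """Map each node on a key path to the list of pipeline stages it hosts.

    `path_nodes` lists the computing nodes from the data source's parent node
    toward the destination. An empty stage list marks a transporter.
    """
    if not path_nodes or num_stages < 1:
        raise ValueError("need at least one node and one stage")
    placement = {node: [] for node in path_nodes}
    # Number of stages that do not get a node of their own.
    extra = max(0, num_stages - len(path_nodes))
    # Hypothetical policy: the parent node of the data source absorbs the extras.
    placement[path_nodes[0]] = list(range(1, extra + 2))
    for stage, node in enumerate(path_nodes[1:], start=extra + 2):
        if stage <= num_stages:
            placement[node] = [stage]
        # Otherwise the node keeps its empty list and acts as a transporter.
    return placement


# More nodes than stages: the trailing nodes become transporters.
long_path = assign_stages(["a", "b", "c", "d"], num_stages=2)
# Fewer nodes than stages: the first node hosts the additional stages.
short_path = assign_stages(["a", "b"], num_stages=4)
```

Placing the early stages closest to the source follows the observation that arrival rates are highest at the first one or two stages.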
22
Algorithm --- optimization
23
Evaluation
The deployment configuration created and optimized by our algorithm vs. the best one chosen manually vs. all possible configurations
24
Evaluation Environment
Network topologies were randomly generated
The distributed counting sampling application
  manual-config: a configuration determined manually
  auto-config: a configuration generated by the algorithm
  opt-config: a configuration optimized by removing unnecessary transporters
25
Evaluation --- experiment 1
26
Evaluation --- experiment 2
27
Evaluation --- experiment 3
28
Future work
We currently consider only network bandwidth as the bottleneck
Dynamic resource allocation
Dependences among the various stages
Evaluation with more applications
29
Related Work
Condor's matchmaking
Realtor
dQUOB
Work from database projects
30
Summary