Resource Allocation in a Middleware for Streaming Data

Liang Chen Gagan Agrawal Ohio State University

Outline
  Context of the Work
  Problem Definition and Complexity
  Algorithm and Examples
  Evaluation
  Summary and Future Work

Streaming Data Model
  Continuous data arrival and processing
  Emerging model for data processing
    Sources that produce data continuously: sensors, long-running simulations
    WAN bandwidths growing faster than disk bandwidths
  Active topic in many computer science communities
    Databases, data mining, networking, ...

Summary/Limitations of Current Work
  Current work focuses on
    centralized processing of a stream from a single source (databases, data mining)
    communication only (networking)
  Many applications involve
    distributed processing of streams
    streams from multiple sources

Motivating Application
[Figure: a network fault management system monitoring a switch network]

Need for a Grid-Based Stream Processing Middleware
  Application developers interested in data stream processing
    would like to have abstracted away
      Grid standards and interfaces
      the adaptation function
    would like to focus on algorithms only
  GATES is a middleware for Grid-based self-adapting data stream processing

System Architecture

Problem Definition
  The resource allocation problem: determining a deployment configuration
  Objective: automatically generate a deployment configuration from information about the available resources
  A deployment configuration involves
    the number of data sources and their locations
    the destination
    the number of stages that make up the pipeline
    the number of instances of each stage
    how the instances connect to each other
    the node where each instance is placed
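The elements of a deployment configuration listed above can be captured in a small data structure. The following is only a minimal sketch: the class and field names (`StageInstance`, `DeploymentConfig`, `downstream`, and so on) are hypothetical and are not taken from GATES.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StageInstance:
    """One instance of a pipeline stage (hypothetical representation)."""
    stage: int                 # index of the pipeline stage, 0 = first stage
    node: str                  # computing node where this instance is placed
    downstream: List[int] = field(default_factory=list)  # indices of instances it feeds


@dataclass
class DeploymentConfig:
    """A deployment configuration (hypothetical representation)."""
    sources: Dict[str, str]    # data source name -> location
    destination: str           # where the final results are delivered
    num_stages: int            # number of stages in the pipeline
    instances: List[StageInstance] = field(default_factory=list)

    def instances_per_stage(self) -> Dict[int, int]:
        """Count how many instances each stage has."""
        counts = {s: 0 for s in range(self.num_stages)}
        for inst in self.instances:
            counts[inst.stage] += 1
        return counts
```

With two first-stage instances feeding one second-stage instance, `instances_per_stage()` reports two instances for stage 0 and one for stage 1.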

Examples
  Examples of deployment configurations

Problem Complexity
  Challenge: given an application having
    m stages
    n data sources
    k available computing nodes for placement of stages’ instances
  the number of possible configurations is
    $F(2, n, k) = 1$
    $F(m, n, k) = \sum_{i} s_n(i) \cdot F(m-1, i, k-i) \cdot P_k^i$
    $F(3, n, k) \ge n^n$
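To get a feel for how quickly the number of configurations grows, the recurrence can be evaluated directly. The sketch below rests on assumptions that the slide does not state: that s_n(i) is the Stirling number of the second kind (ways to split n streams into i groups), that P_k^i is the number of ordered selections of i nodes out of k, and that the sum runs over i from 1 to min(n, k). Under this reading, F(3, n, k) equals k^n whenever k >= n, which is consistent with the bound F(3, n, k) >= n^n.

```python
from functools import lru_cache
from math import perm


@lru_cache(maxsize=None)
def stirling2(n: int, i: int) -> int:
    """Stirling number of the second kind (assumed meaning of s_n(i))."""
    if n == i:
        return 1
    if i == 0 or i > n:
        return 0
    return i * stirling2(n - 1, i) + stirling2(n - 1, i - 1)


@lru_cache(maxsize=None)
def count_configs(m: int, n: int, k: int) -> int:
    """Evaluate F(m, n, k) under the assumptions stated in the lead-in."""
    if m == 2:
        return 1  # base case from the slide: F(2, n, k) = 1
    return sum(
        stirling2(n, i) * count_configs(m - 1, i, k - i) * perm(k, i)
        for i in range(1, min(n, k) + 1)  # assumed summation range
    )
```

For example, `count_configs(3, 4, 4)` evaluates to 256, which is 4^4.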

Our Approach O(nk2) v.s. (nn) Goal: Assumptions:
Determine a sub-optimal configuration Assumptions: Network bandwidths are the critical resources rather than CPU capabilities Bandwidths of networks inside a cluster are larger than bandwidths of network connecting clusters We know the topology of networks, the list of available clusters, and resource information of the clusters We know where data sources and destination are

Algorithm
  Observation:
    The data arrival rates are very high at the first one or two stages
    The arrival rates decrease significantly at the following stages
  Prim’s algorithm is used to construct a minimum spanning tree (MST)
  Three steps:
    1. Create a key path corresponding to each data source (Prim’s algorithm, MST)
    2. Merge the key paths to create a layout tree
    3. Map each node in the layout tree to a computing node
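The merge step can be illustrated with a short sketch. It assumes that each key path has already been computed and is given as a list of node names running from a data source to the destination. The function name `merge_key_paths` and the rule of keeping the first next hop seen for a node are illustrative choices, not details taken from the slides.

```python
from typing import Dict, List


def merge_key_paths(paths: List[List[str]]) -> Dict[str, str]:
    """Merge source-to-destination paths into a tree stored as node -> next hop.

    Assumption: if two paths disagree on the next hop of a shared node,
    the first one seen wins, so every node keeps exactly one parent.
    """
    next_hop: Dict[str, str] = {}
    for path in paths:
        for child, parent in zip(path, path[1:]):
            next_hop.setdefault(child, parent)
    return next_hop


def route(next_hop: Dict[str, str], start: str) -> List[str]:
    """Follow next hops from a node until the root of the layout tree."""
    hops = [start]
    while hops[-1] in next_hop:
        hops.append(next_hop[hops[-1]])
    return hops
```

Two paths that share a suffix, such as `["s1", "a", "b", "dst"]` and `["s2", "b", "dst"]`, merge into a tree in which both sources reach `dst` through `b`.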

Algorithm
  Network topology

Algorithm
  Make each data source a starting node and apply the minimum spanning tree algorithm (Prim) to the graph
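A minimal sketch of this step follows. It grows a minimum spanning tree from a data source with Prim's algorithm and then reads off the tree path from that source to the destination. Several things here are assumptions rather than details from the slides: the graph is an adjacency dictionary, edge weights are a link cost in which a lower value means a better link (for instance the inverse of bandwidth), the destination is reachable, and the key path is taken to be the tree path between source and destination.

```python
import heapq
from typing import Dict, List, Optional

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost}


def prim_parents(graph: Graph, start: str) -> Dict[str, Optional[str]]:
    """Grow an MST from `start` with Prim's algorithm; return the parent map."""
    parent: Dict[str, Optional[str]] = {start: None}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap:
        _, u, v = heapq.heappop(heap)   # cheapest edge leaving the tree
        if v in parent:
            continue                    # v was already added to the tree
        parent[v] = u
        for x, wx in graph[v].items():
            if x not in parent:
                heapq.heappush(heap, (wx, v, x))
    return parent


def key_path(graph: Graph, source: str, destination: str) -> List[str]:
    """Tree path from `source` to `destination` (assumed meaning of a key path)."""
    parent = prim_parents(graph, source)
    path = [destination]
    while path[-1] != source:
        path.append(parent[path[-1]])   # walk back toward the source
    return path[::-1]
```

Calling `key_path` once per data source yields one path per source, which is the input expected by the merge step.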

Algorithm

Key path

Algorithm

Algorithm Issues
  When the number of tree nodes along a key path is larger than the number of stages
    Transporters are automatically added
  When the number of tree nodes along a path is smaller than the number of stages
    Additional stages are deployed at the parent node of the data source
  Optimization

Algorithm --- optimization

Evaluation
  The deployment configuration created and optimized by our algorithm
    vs. the best one chosen manually
    vs. all possible configurations

Evaluation Environment
  Network topologies were randomly generated
  The distributed counting sampling application
  Configurations compared:
    manual-config: a configuration determined manually
    auto-config: a configuration generated by the algorithm
    opt-config: a configuration optimized by removing unnecessary transporters

Evaluation --- experiment 1

Future Work
  So far we consider only network bandwidth as the bottleneck
  Dynamic resource allocation
  Dependences among the various stages
  Evaluation with more applications

Related Work
  Condor’s matchmaking
  Realtor
  dQUOB
  Work from database projects

Summary
