Published by Michelle Marchand. Modified over 10 years ago.
1
Transparent and Flexible Network Management for Big Data Processing in the Cloud
Anupam Das, Curtis Yu, Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang
4
Data processing Network
5
Schedule computation
6
Schedule communication
33% of average job running time
7
FlowComb: network management framework for Big Data processing
1. What is the traffic demand?
2. Which path to choose?
3. How to change the path?
8
Demand prediction
Use application semantics information to effectively and transparently infer network transfers (possibly before they start)
9
Demand prediction
Agents on Hadoop nodes analyze Hadoop logs, query nodes, and predict data transfers.
Agent on each Hadoop node:
Parses TaskTracker logs to identify reducers and size of map output
Parses JobTracker logs to identify finished mappers
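The prediction step can be illustrated with a minimal sketch. It assumes the agents have already extracted two facts from the logs: which mappers have finished (with their map output size) and which nodes run reducers. The names `Transfer` and `predict_transfers` are hypothetical, and the even split of map output across reducers is an assumption made for illustration, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One predicted mapper-to-reducer transfer (hypothetical record type)."""
    src: str
    dst: str
    size_bytes: int


def predict_transfers(finished_mappers, reducers):
    """Predict shuffle transfers from log-derived facts.

    finished_mappers: dict mapping mapper host -> map output size in bytes
    reducers: list of hosts that run reducers
    """
    transfers = []
    if not reducers:
        return transfers
    for mapper_host, output_bytes in finished_mappers.items():
        # Assumption: map output is partitioned evenly across reducers.
        share = output_bytes // len(reducers)
        for reducer_host in reducers:
            # A reducer on the mapper's own host reads locally,
            # so no network transfer is predicted for it.
            if reducer_host != mapper_host:
                transfers.append(Transfer(mapper_host, reducer_host, share))
    return transfers
```

Because a transfer is predicted as soon as a mapper is known to have finished, the prediction can be available before the reducer actually starts fetching.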
10
Flow scheduling
Reroute flows on paths with sufficient available bandwidth
11
Flow scheduling
Where? Centralized decision engine
Which flows? FIFO
Reroute? If congestion on default path
Which path? First with available bandwidth
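The four scheduling decisions can be sketched as a small centralized routine. This is an illustration under stated assumptions, not the authors' implementation: the function name `schedule`, the data layout (paths as tuples of link names, the default path listed first), and the fallback to the default path when no candidate has room are all hypothetical choices.

```python
from collections import deque


def schedule(flows, paths, available):
    """Assign each flow a path, processing flows in FIFO order.

    flows: iterable of (flow_id, src, dst, demand_mbps) in arrival order
    paths: dict (src, dst) -> list of candidate paths, each a tuple of
           link names; the first entry is the default path
    available: dict link name -> available bandwidth in Mbps (updated in place)
    Returns a dict flow_id -> chosen path.
    """
    queue = deque(flows)  # FIFO: oldest predicted flow is handled first
    assignment = {}
    while queue:
        flow_id, src, dst, demand = queue.popleft()
        candidates = paths[(src, dst)]
        # Assumption: stay on the default path if no candidate has room.
        chosen = candidates[0]
        # The default path is checked first, so a flow is rerouted only
        # when the default is congested; the first path that fits wins.
        for path in candidates:
            if all(available[link] >= demand for link in path):
                chosen = path
                break
        for link in chosen:
            available[link] = max(0, available[link] - demand)
        assignment[flow_id] = chosen
    return assignment
```

For example, with two single-link paths of 100 Mbps each and two 60 Mbps flows between the same hosts, the first flow keeps the default path and the second is moved to the alternative.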
12
Flow control
Use OpenFlow to install new forwarding rules in the network and enforce the new paths
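To make the idea of enforcing a path concrete, the sketch below turns a chosen path into one forwarding rule per switch, each matching the flow and forwarding it out of a given port. The rules are plain dictionaries; the function name `rules_for_path` and the field names are hypothetical and do not correspond to any particular controller's API. A real deployment would hand equivalent match and output information to its OpenFlow controller.

```python
def rules_for_path(src_ip, dst_ip, hops):
    """Build one forwarding rule per switch on a path.

    hops: list of (switch_id, out_port) pairs, in path order.
    Each rule matches the flow by source and destination IP address
    and forwards matching packets out of the given port.
    """
    return [
        {
            "switch": switch_id,
            "match": {"ipv4_src": src_ip, "ipv4_dst": dst_ip},
            "action": {"output": out_port},
        }
        for switch_id, out_port in hops
    ]
```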
13
System Architecture
Components: Hadoop Cluster (Master, Slaves) with FlowComb agent; FlowComb Middleware (PFS); OpenFlow Controller
1. Analyze Hadoop logs
2. Extract flow information
3. Schedule upcoming flows
4. Set up flow paths
5. Install routing rules
14
Experiments
15
Does the network matter?
Link capacity (Mbps)    Avg. processing time (min)
100                     39
50                      53 (x1.3)
25                      67 (x1.7)
10                      146 (x3.7)
4 times slower!
16
Can FlowComb predict transfers?
28% of transfers detected before they start (and 56% before they end)
17
How quickly can FlowComb change paths?
[Figure: chart with segments labeled 10%, 70%, 20%]
60% before transfer midpoint
18
Can FlowComb reduce processing time?
36% faster than Hadoop without FlowComb (and 28% faster than Hadoop with ECMP)
19
FlowComb
Network management platform for Big Data processing that is transparent to applications and quick and accurate in detecting their demand
Uses application semantics to detect data transfers (sometimes before they even start)
20
Testbed
21
OpenFlow network Controller
22
Hadoop sort performance
[Figure: plot with axes Time (s) and Avg utilization (MBps); series: FlowComb, baseline]