Programming Your Network at Run-Time for Big Data Applications Guohui Wang, T. S. Eugene Ng, Anees Shaikh Presented by Jon Logan

Objectives  Why Change Dynamically?   Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation  Scheduling  Patterns  Constructing the Network  Implementation & Overhead  Future Work  Conclusions  Shortcomings & Discussion

Why Change Dynamically?  With advances in Software-Defined Networking (SDN), we can change the network structure dynamically  Big Data applications often involve large amounts of data being transferred from one node to another  If you're not careful, the network can become a bottleneck  Essentially, we want to tailor the network layout to the demands of currently running or imminently executing applications  Throughout the paper and this presentation, Hadoop is used as a representative "Big Data" application

Hadoop Essentials Image source:

Objectives  Why Change Dynamically?  Hadoop Essentials  How this is accomplished   SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation  Scheduling  Patterns  Constructing the Network  Implementation & Overhead  Future Work  Conclusions  Shortcomings & Discussion

How is this accomplished?  The paper is based on the idea of optical switches  Optical switches allow fibre-optic links to be changed quickly  They cite a transition time on the order of tens of ms  Assumes a hybrid electrical-optical network  ToR switches are connected to two aggregation networks  One of them is packet-switched Ethernet (SLOW)  One of them is connected to a MEMS-based optical switch (FAST)  Each ToR switch is connected to multiple optical uplinks  Typically 4-6 uplinks  The network is controlled through an SDN controller  Manages physical connectivity between ToR switches  Manages forwarding at ToR switches using OpenFlow rules
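
A minimal sketch (Python; all names are mine, not the paper's) of the hybrid setup this slide describes: each rack's ToR switch keeps one always-on Ethernet uplink plus a few reconfigurable optical uplinks, and the SDN controller decides which rack pairs get an optical circuit.

```python
from dataclasses import dataclass, field

@dataclass
class ToRSwitch:
    rack_id: int
    ethernet_uplink_gbps: float = 0.1   # residual packet-switched capacity (slow path)
    optical_uplinks: int = 4            # typically 4-6 per ToR in the paper's setting
    optical_link_gbps: float = 10.0     # per-uplink optical capacity (fast path)

@dataclass
class HybridNetwork:
    tors: dict = field(default_factory=dict)            # rack_id -> ToRSwitch
    optical_circuits: set = field(default_factory=set)  # {(rack_a, rack_b), ...} currently lit

    def connect(self, a: int, b: int) -> None:
        """SDN-controller action: light an optical circuit between two racks."""
        used_a = sum(1 for c in self.optical_circuits if a in c)
        used_b = sum(1 for c in self.optical_circuits if b in c)
        if used_a >= self.tors[a].optical_uplinks or used_b >= self.tors[b].optical_uplinks:
            raise ValueError("no free optical uplink on one of the ToR switches")
        self.optical_circuits.add((min(a, b), max(a, b)))
```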

SDN Master Interaction  Hadoop jobs are coordinated through a master node  It is responsible for scheduling, managing requests, task placement, etc.  All switches are controlled through an SDN controller  The paper proposes interaction between the job master and the SDN controller

SDN Master Interaction  Proposes that the SDN controller  Accepts traffic demand matrices from application controllers  These describe the volume and policy requirements for traffic exchanged between different racks  Issues network configuration commands to adjust the topology accordingly  The application master can also use topology information provided by the SDN controller for more effective job scheduling/placement  This means that the application controller must be able to predict network usage
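
A sketch of the interface this slide describes, with assumed function names and a deliberately simplified controller policy: the application master hands over a rack-to-rack demand matrix, and the controller turns it into circuit requests. The paper's actual topology construction (aggregation trees, torus) is richer than the greedy pairing shown here.

```python
import numpy as np

def demand_matrix(num_racks, flows):
    """Rack-to-rack demand matrix the application master (e.g. the Hadoop job
    tracker) would report to the SDN controller.
    `flows` is a list of (src_rack, dst_rack, bytes) estimates."""
    demand = np.zeros((num_racks, num_racks))
    for src, dst, nbytes in flows:
        if src != dst:                       # intra-rack traffic never leaves the ToR
            demand[src, dst] += nbytes
    return demand

def pick_circuits(demand, max_circuits):
    """Toy controller policy: light optical circuits for the heaviest rack pairs.
    Illustrates only the matrix-in, circuits-out interface."""
    n = demand.shape[0]
    pairs = sorted(((demand[i, j] + demand[j, i], i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    return [(i, j) for volume, i, j in pairs[:max_circuits] if volume > 0]
```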

Objectives  Why Change Dynamically?  Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns   Why Application Aware?  Traffic Estimation  Scheduling  Patterns  Constructing the Network  Implementation & Overhead  Future Work  Conclusions  Shortcomings & Discussion

Traffic Patterns of Big Data Traffic falls into three categories:  Bulk Transfer  Data Aggregation (Partitioning)  Control Messages

Control Traffic  Is typically latency-sensitive, but not large in volume  Can simply be handled by the Ethernet network  In the paper's "implementation", control messages are sent over the packet-switched (Ethernet) network using the default routes

Data Aggregation / Partitioning  Data must be partitioned or aggregated between one server and a large number of other servers  Ex. mapper output must be aggregated to (potentially) all reducers  In parallel database systems, most operations require merging/splitting of data from multiple tables  Data aggregation requires high bandwidth to exchange large volumes of data between large numbers of servers  If the network is oversubscribed, aggregation may be the bottleneck  This is the main traffic pattern that the paper tries to address

Why Application Aware?  Current approaches for setting up optical circuits rely on network-level statistics to estimate demand  It is difficult to estimate real application traffic based solely on this information  Without more precise information, circuits may be configured between the wrong locations  "Circuit flapping" may also occur from repeated corrections

An Example Configuration  An 8-to-1 aggregation  Ex. 8 mappers outputting to 1 reducer  Each rack has a ToR switch with 3 optical links  Each optical link is capable of 10 Gbps  Minimum circuit reconfiguration interval is set to 1 second  Residual Ethernet bandwidth is limited to 100 Mbps  Each source rack wants to transfer 200 MB of data to the aggregation rack

A Naïve Approach  This task can be implemented in 3 rounds  In each round, 3 racks are connected directly to the aggregation rack  Repeat 3 times  This will require up to 3.16 seconds (the paper says 2.16 seconds)  If one rack is not configured to use the optical link correctly, it may have to fall back to Ethernet and take up to 16 seconds!
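
The timing numbers on this slide follow directly from the parameters given two slides back; a quick check (Python, assuming each circuit configuration must be held for the full 1-second minimum interval):

```python
data_gbit  = 200 * 8 / 1000      # 200 MB per rack = 1.6 gigabits
t_optical  = data_gbit / 10      # at 10 Gbps: 0.16 s per rack transfer
t_ethernet = data_gbit / 0.1     # at 100 Mbps residual Ethernet: 16 s per rack transfer

# 8 source racks, 3 optical uplinks at the aggregation rack -> 3 rounds (3 + 3 + 2 racks).
# The first two rounds are pinned to the 1 s minimum reconfiguration interval;
# the last round only needs the transfer time itself.
t_naive = 2 * max(1.0, t_optical) + t_optical   # = 2.16 s, the paper's figure
# Charging a full 1 s hold for every round gives 3 * 1.0 + 0.16 = 3.16 s,
# which appears to be where the presenter's number comes from.
```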

A Better Approach  If we "chain" racks together, since we know the application demands, we could do this same transfer in just 1.48 seconds (the paper states 480 ms), requiring only 1 round of switching

Objectives  Why Change Dynamically?  Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation   Scheduling  Patterns  Constructing the Network  Implementation & Overhead  Future Work  Conclusions  Shortcomings & Discussion

Traffic Estimation  In order to know how to allocate resources, we need to estimate demand  This is left up to the master node (in the case of Hadoop, the job tracker)  It must report a traffic demand matrix to the controller  The job tracker has information about the placement of mappers and reducers on a per-job basis  Computing the source and destination racks is easy  Computing the demand is not so easy

Estimating Demand  The paper assumes that more input data = more output data  This is not necessarily true  Ex. if your input is a list of URLs, a longer URL does not necessarily mean more data!  By looking at intermediate data, you can predict the shuffle demand of map tasks before they complete  Glosses over the fact that mappers start transferring data before completing  Essentially, it argues that more input data means more shuffle data
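
A sketch of how the job tracker could turn this assumption into numbers (Python; the field names and the uniform hash-partitioning split are my simplifications): extrapolate each map task's final intermediate size from what it has produced so far, then spread it across the racks hosting the job's reducers.

```python
from collections import defaultdict

def estimate_shuffle_demand(map_tasks, reducer_nodes, rack_of):
    """Per-job rack-to-rack shuffle estimate.

    map_tasks:     list of dicts like {"node": n, "intermediate_bytes": b, "done_fraction": f}
    reducer_nodes: nodes hosting the job's reduce tasks
    rack_of:       dict mapping a node to its rack id
    """
    demand = defaultdict(float)                         # (src_rack, dst_rack) -> bytes
    for task in map_tasks:
        done = max(task["done_fraction"], 1e-6)
        projected = task["intermediate_bytes"] / done   # extrapolate final intermediate size
        share = projected / len(reducer_nodes)          # assume a uniform split across reducers
        for r in reducer_nodes:
            src, dst = rack_of[task["node"]], rack_of[r]
            if src != dst:                              # intra-rack traffic stays on the ToR
                demand[(src, dst)] += share
    return demand
```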

Hadoop Job Scheduling  Is currently FIFO (plus priorities)  Data locality is considered in the placement of map tasks to reduce network traffic  Reducers are scheduled randomly  Hadoop could potentially change its scheduling based on real-time network topology

Bin Packing Placement  Rack-based bin packing placement for reduce tasks  Attempts to minimize the number of racks utilized  Reduces the number of ToR switches that need to be reconfigured  The paper is not clear on how this is actually accomplished, or whether it is based on network demand  Hadoop has a concept of "slots" for reducers, somewhat negating any real "bin packing" problem, if it were not for network usage  This would also require machines to be able to handle the huge amount of bandwidth that could be sent to them (up to 30 Gbps in their scenario) in order to make it worthwhile
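
The slide notes that the paper does not spell the algorithm out; one plausible reading is a first-fit packing of reduce tasks into as few racks as possible, subject to per-rack reduce slots (Python sketch, names assumed):

```python
def pack_reducers(num_reduce_tasks, racks, free_slots):
    """First-fit packing of a job's reduce tasks onto racks.

    racks:      rack ids, e.g. pre-sorted by free slots (descending) to use the fewest racks
    free_slots: {rack_id: available reduce slots}
    Returns {rack_id: number of reduce tasks assigned}.
    """
    assignment, remaining = {}, num_reduce_tasks
    for rack in racks:
        if remaining == 0:
            break
        take = min(free_slots[rack], remaining)
        if take > 0:
            assignment[rack] = take
            remaining -= take
    if remaining:
        raise ValueError("not enough reduce slots in the cluster")
    return assignment
```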

Batch Processing  Would essentially process entire batches of jobs together, within a time interval T  The job tracker selects those with the greatest estimated volume and requests that the SDN controller configure the network to best handle these jobs  It is not clear how you estimate this! The previous discussion only talked about already-running jobs  Tasks in earlier batches have higher priority  Helps aggregate traffic from multiple jobs to create long-duration traffic that is suitable for optical paths  Can be implemented as a "simple extension" to the Hadoop job scheduler  In reality, it wouldn't be "simple" by any means
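
A sketch of the batching step as I read it (Python; the function names and the batch size are assumptions): every interval T, pick the queued jobs with the largest estimated traffic volume and hand their merged demand to the SDN controller.

```python
from collections import defaultdict

def select_batch(queued_jobs, batch_size):
    """Pick up to `batch_size` jobs with the largest estimated shuffle volume for the
    next interval; jobs admitted in earlier batches would keep higher priority."""
    ranked = sorted(queued_jobs, key=lambda job: job["estimated_bytes"], reverse=True)
    return ranked[:batch_size]

def merged_demand(batch):
    """Combine per-job rack-to-rack estimates into the single demand matrix
    that is reported to the SDN controller for this batch."""
    total = defaultdict(float)
    for job in batch:
        for rack_pair, volume in job["demand"].items():
            total[rack_pair] += volume
    return total
```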

Objectives  Why Change Dynamically?   Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation  Scheduling  Patterns   Constructing the Network  Implementation & Overhead  Future Work  Conclusions  Shortcomings & Discussion

Topology and Routing for Aggregation Patterns  The major issue with Hadoop jobs is the intermediate data between mappers and reducers  It is essentially an N-to-M shuffle, where N is the number of mappers and M is the number of reducers

Single Aggregation Pattern  This is the case when multiple mappers need to send output to a single reducer  N-to-1 aggregation  As discussed earlier, we can construct a 2-hop aggregation tree in this case (ex. 8-to-1)  We can place racks with higher traffic demand "closer" to the aggregator in the tree  Ex. racks 5, 1, and 6 (the direct neighbors in the example) should be the ones with the highest demand, to reduce the number of hops
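
A sketch of the placement idea on this slide (Python, assumed names): give the direct (one-hop) positions next to the aggregator to the racks with the highest demand and relay everyone else through them, forming the 2-hop tree.

```python
def build_aggregation_tree(demand_by_rack, aggregator, uplinks=3):
    """Two-hop aggregation tree: the `uplinks` highest-demand racks get direct
    circuits to the aggregator; the remaining racks are attached behind them
    round-robin and relay their traffic over one extra hop.
    Returns {child_rack: parent_rack}."""
    ranked = sorted(demand_by_rack, key=demand_by_rack.get, reverse=True)
    direct, rest = ranked[:uplinks], ranked[uplinks:]
    parent = {rack: aggregator for rack in direct}
    for i, rack in enumerate(rest):
        parent[rack] = direct[i % len(direct)]   # relay through a direct neighbor
    return parent

# e.g. an 8-to-1 aggregation with 3 uplinks at the aggregator: the three heaviest
# senders get one-hop circuits, the other five relay through them.
```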

Data Shuffling Pattern  Is essentially an N-to-M aggregation  Ex. 8-to-4 shuffling  The paper relies on a hypercube or torus topology to achieve this  We want to place racks with high demand close to each other  Reduces the amount of multi-hop traffic  Constructing an optimal torus topology is difficult due to the large search space  A greedy heuristic algorithm can be used  It places racks into a 2-D coordinate space and connects each row and each column into a ring
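
The paper's exact heuristic is not reproduced here; the sketch below (Python, assumed names) only illustrates the general greedy idea: place the heaviest racks first, and drop each rack into the free grid cell closest to the already-placed racks it talks to most. The rows and columns of the resulting grid would then each be wired into a ring to form the torus.

```python
import itertools

def place_on_grid(demand, rows, cols):
    """Greedy rack placement on a rows x cols grid.

    demand: {rack: {other_rack: bytes}} pairwise traffic estimates
    Returns {rack: (row, col)}. Plain Manhattan distance is used for
    simplicity; a real torus would wrap around at the edges.
    """
    racks = sorted(demand, key=lambda r: sum(demand[r].values()), reverse=True)
    cells = list(itertools.product(range(rows), range(cols)))
    position = {}
    for rack in racks:
        def cost(cell):
            return sum(demand[rack].get(other, 0) *
                       (abs(cell[0] - pos[0]) + abs(cell[1] - pos[1]))
                       for other, pos in position.items())
        free = [c for c in cells if c not in position.values()]
        position[rack] = min(free, key=cost)   # first rack has zero cost everywhere
    return position
```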

Constructing the Torus Topology

Constructing the Network  A routing scheme well suited for shuffling traffic is a per-destination spanning tree  Build a spanning tree rooted at each aggregator rack  Traffic destined for the aggregator rack will be routed over this tree  When an optical link is selected, increase its weight to favor other links for subsequent spanning trees  This allows us to exploit all available links and achieve better load balancing and multi-pathing among multiple spanning trees
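
One way to realize this routing scheme (Python sketch; the use of Dijkstra shortest-path trees and the unit initial link weights are my assumptions): build a tree rooted at each aggregator over the current optical topology, and after each tree bump the weight of every link it used so the next tree prefers different circuits.

```python
import heapq

def spanning_tree(adj, weights, root):
    """Shortest-path tree rooted at `root` (Dijkstra). Returns {node: parent}."""
    dist, parent, done = {root: 0.0}, {root: None}, set()
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in adj[u]:
            nd = d + weights[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent

def trees_for_aggregators(adj, aggregators, penalty=1.0):
    """One tree per aggregator rack; links used by earlier trees get heavier,
    so later trees spread load across different optical circuits."""
    weights = {frozenset((u, v)): 1.0 for u in adj for v in adj[u]}
    trees = {}
    for root in aggregators:
        parent = spanning_tree(adj, weights, root)
        trees[root] = parent
        for child, par in parent.items():
            if par is not None:
                weights[frozenset((child, par))] += penalty
    return trees
```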

Partially Overlapping Aggregations  Some aggregations may overlap in source or destination racks  Building a torus network would give poor utilization  S1' and S3' are essentially N-to-1 aggregations  S2' is essentially an N-to-2 aggregation  Can use the previously discussed configuration algorithms to schedule the network  Depending on available links, we could schedule them either concurrently or consecutively  Allows path sharing among aggregations, improving the utilization of circuits

Objectives  Why Change Dynamically?  Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation  Scheduling  Patterns  Constructing the Network  Implementation & Overhead   Future Work  Conclusions  Shortcomings & Discussion

Implementation and Overhead  To implement this, we need to install OpenFlow rules on ToR switches and issue commands to reconfigure the optical switches  Commercial optical switches can switch in less than 10 ms  Run-time routing configuration over a dynamic network requires rapid and frequent table updates on a potentially large number of switches  Routing configuration has to be done within a short period of time  This requires the SDN controller to be scalable and responsive  We want to minimize the number of rules required  Reduces table size (which is limited)  Reduces delays in reconfiguring the network

Implementation  We can use the VLAN field on packets to tag the destination rack  Each rack is assigned one VLAN ID  Packets sent to a given destination rack will all carry the same VLAN ID  Packet tagging could also be implemented at the server kernel level or using hypervisor virtual switches  Servers can look up the VLAN tag in a repository based on the destination  We would need at most N rules on each switch, where N is the number of racks  Most MR jobs last for several minutes (the paper cites tens of seconds or more)  The largest MR jobs use hundreds of servers  This equals tens of racks (given typical numbers of servers per rack)  Commercial switches can install more than 700 rules per second  They estimate tens of ms to reconfigure the network for a typical MR job
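
A sketch of the VLAN-tagging rule layout (Python; the rule dictionaries and the rack-to-VLAN mapping are illustrative, not a real OpenFlow controller API): each ToR switch needs at most one rule per destination rack, matching the VLAN tag and forwarding on the port chosen by the spanning trees.

```python
def vlan_rules(next_hop_port, vlan_base=100):
    """Forwarding rules for one ToR switch: at most N rules, one per destination rack.

    next_hop_port: {dst_rack: output_port}, derived from the per-destination
                   spanning trees over the current optical topology.
    """
    return [{"match": {"vlan_vid": vlan_base + dst_rack},   # hypothetical rack -> VLAN mapping
             "action": {"output": port}}
            for dst_rack, port in next_hop_port.items()]

# With tens of racks this is tens of rules per switch; at the cited install rate of
# ~700 rules/s, pushing them takes tens of milliseconds, consistent with the slide's
# estimate for reconfiguring the network for a typical MR job.
```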

Implementation  We need to be careful when rerouting across multiple switches  We need to avoid potential transient errors and forwarding loops  Proposed solutions for this require a significant number of extra rules on each switch  It is unknown how much delay this approach adds to reaching a consistent state during topology updates

Objectives  Why Change Dynamically?  Hadoop Essentials  How this is accomplished  SDN Master Interaction  Traffic Patterns  Why Application Aware?  Traffic Estimation  Scheduling  Patterns  Constructing the Network  Implementation & Overhead  Future Work   Conclusions  Shortcomings & Discussion

Future Work  Fault tolerance, fairness, and priority  Fairness and priority of the network topology among different applications  Must be handled by the SDN controller  Traffic engineering  Could potentially allow rerouting over multiple paths, even if optical switches are not available

Conclusion  The paper claims the analysis shows the great promise of integrated network control  Although the discussion primarily relied on Hadoop, most Big Data applications have similar traffic patterns  The aggregation patterns can be applied to those as well  The study serves as a "step towards tight and dynamic interaction between applications and networks" using SDN

Shortcomings / Discussion  This relies heavily on the ability to predict application usage  It is not as simple as they portray it to be  More input is not necessarily more output!  The paper also seems to lack any real evaluation of the proposal  No actual data; no data even realistically modeled  Assumes 100 Mbps residual Ethernet, which seems low (1 Gbps is the bare minimum in modern deployments)  Assumes that mappers would not have consistent load  If they go with their assumption that more input = more output, and it scales linearly, this is not true!  Mappers are all (except for the last one) generally given roughly equal chunks of data (unless you have a bizarre input split)  Therefore, mappers should have consistent network load (if their assumptions are valid)