Transparent and Flexible Network Management for Big Data Processing in the Cloud Anupam Das Curtis Yu Cristian Lumezanu Yueping Zhang Vishal Singh Guofei Jiang

Presentation transcript:

Transparent and Flexible Network Management for Big Data Processing in the Cloud Anupam Das Curtis Yu Cristian Lumezanu Yueping Zhang Vishal Singh Guofei Jiang

Data processing Network

Schedule computation

Schedule communication: 33% of average job running time

FlowComb: network management framework for Big Data processing
1. What is the traffic demand?
2. Which path to choose?
3. How to change the path?

Demand prediction: use application semantics information to effectively and transparently infer network transfers (possibly before they start)

Demand prediction: agents on Hadoop nodes analyze Hadoop logs, query nodes, and predict data transfers.
On each Hadoop node, the agent:
- parses TaskTracker logs to identify reducers and the size of map output
- parses JobTracker logs to identify finished mappers
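
The log-parsing step can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `MAP_DONE` and `REDUCER_START` line formats are invented stand-ins (real Hadoop TaskTracker and JobTracker logs look different), and splitting each map output evenly across reducers is a simplification, not something the slides state.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified log formats (NOT real Hadoop log lines).
MAP_DONE = re.compile(r"MAP_DONE task=(\S+) host=(\S+) output_bytes=(\d+)")
REDUCER_START = re.compile(r"REDUCER_START task=(\S+) host=(\S+)")


@dataclass(frozen=True)
class PredictedTransfer:
    src_host: str
    dst_host: str
    size_bytes: int


def predict_transfers(log_lines):
    """Predict mapper-to-reducer transfers from (hypothetical) log lines."""
    map_outputs = []  # (mapper host, map output size in bytes)
    reducer_hosts = []
    for line in log_lines:
        done = MAP_DONE.search(line)
        if done:
            map_outputs.append((done.group(2), int(done.group(3))))
            continue
        started = REDUCER_START.search(line)
        if started:
            reducer_hosts.append(started.group(2))

    transfers = []
    for src, size in map_outputs:
        for dst in reducer_hosts:
            if src == dst:
                continue  # local data needs no network transfer
            # Assumption: map output is split evenly across all reducers.
            transfers.append(
                PredictedTransfer(src, dst, size // len(reducer_hosts))
            )
    return transfers


if __name__ == "__main__":
    sample = [
        "MAP_DONE task=m1 host=h1 output_bytes=1000",
        "REDUCER_START task=r1 host=h2",
        "REDUCER_START task=r2 host=h3",
    ]
    for transfer in predict_transfers(sample):
        print(transfer)
```

Because a transfer is predicted as soon as a finished mapper and a known reducer are both visible in the logs, a sketch like this can emit a prediction before the reducer actually starts fetching data.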

Flow scheduling: reroute flows on paths with sufficient available bandwidth

Flow scheduling
Where? Centralized decision engine
Which flows? FIFO
Reroute? If congestion on default path
Which path? First with available bandwidth
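
The four scheduling choices above (centralized, FIFO, reroute on congestion, first path with available bandwidth) can be sketched in Python as follows. The data model is an assumption for illustration: links are plain string names, a path is a list of links, "congestion" is taken to mean that the path's bottleneck link has less available bandwidth than the flow demands, and the function names are hypothetical.

```python
from collections import deque


def bottleneck(path, available):
    """Available bandwidth of a path is that of its most loaded link."""
    return min(available[link] for link in path)


def schedule(flows, available):
    """Assign a path to each flow in FIFO order.

    flows: iterable of (flow_id, demand, default_path, alternate_paths),
           already in arrival order.
    available: dict mapping link name to available bandwidth.
    Returns a dict mapping flow_id to the chosen path.
    """
    available = dict(available)  # work on a copy; caller's view is untouched
    decisions = {}
    queue = deque(flows)
    while queue:
        flow_id, demand, default_path, alternates = queue.popleft()
        chosen = default_path
        # Reroute only if the default path cannot carry the flow.
        if bottleneck(default_path, available) < demand:
            for candidate in alternates:
                if bottleneck(candidate, available) >= demand:
                    chosen = candidate  # first fit, not best fit
                    break
        # Reserve bandwidth on the chosen path (clamped at zero).
        for link in chosen:
            available[link] = max(0, available[link] - demand)
        decisions[flow_id] = chosen
    return decisions


if __name__ == "__main__":
    links = {"a": 10, "b": 100, "c": 100}
    flows = [
        ("f1", 50, ["a"], [["b"]]),
        ("f2", 60, ["b"], [["a"], ["c"]]),
        ("f3", 5, ["a"], []),
    ]
    print(schedule(flows, links))
```

If no alternate path has enough bandwidth, this sketch leaves the flow on its default path; the slides do not say what the real system does in that case.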

Flow control: use OpenFlow to install new forwarding rules in the network and enforce the new paths
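
Enforcing a path amounts to installing one forwarding rule per switch along it. The Python sketch below only builds such rules as plain dictionaries; it does not talk to a real OpenFlow controller, and the field names (`ip_src`, `ip_dst`, `output`) and the `(switch, out_port)` hop representation are hypothetical choices made for illustration.

```python
def rules_for_path(src_ip, dst_ip, hops):
    """Build one forwarding rule per hop for the flow src_ip -> dst_ip.

    hops: list of (switch_id, out_port) pairs in path order.
    Returns a list of rule dictionaries (a hypothetical representation,
    not an actual OpenFlow message format).
    """
    return [
        {
            "switch": switch_id,
            "match": {"ip_src": src_ip, "ip_dst": dst_ip},
            "action": {"output": out_port},
        }
        for switch_id, out_port in hops
    ]


if __name__ == "__main__":
    path = [("s1", 2), ("s3", 1), ("s2", 4)]
    for rule in rules_for_path("10.0.0.1", "10.0.0.9", path):
        print(rule)
```

A controller would then push each rule to the named switch, so every switch on the new path forwards the matching flow out of the chosen port.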

System Architecture
Components: Hadoop cluster (master, slaves), PFS, FlowComb agent, FlowComb middleware, OpenFlow controller
1. Analyze Hadoop logs
2. Extract flow information
3. Schedule upcoming flows
4. Set up flow paths
5. Install routing rules

Experiments

Does the network matter?
Table columns: Link capacity (Mbps), Avg. processing time (min)
Surviving entries: (x1.3), 2567 (x1.7), (x3.7)
4 times slower!

Can FlowComb predict transfers? 28% of transfers detected before they start (and 56% before they end)

How quickly can FlowComb change paths? 10%, 70%, 20%; 60% before transfer midpoint

Can FlowComb reduce processing time? 36% faster than Hadoop without FlowComb (and 28% faster than Hadoop with ECMP)

FlowComb: network management platform for Big Data processing that
- is transparent to applications
- is quick and accurate in detecting their demand
- uses application semantics to detect data transfers (sometimes before they even start)

Testbed

OpenFlow network Controller

Figure: Hadoop sort performance, FlowComb vs. baseline. Axes: time (s), average utilization (MBps).