Coflow: A Networking Abstraction for Cluster Applications
Mosharaf Chowdhury, Ion Stoica (UC Berkeley)



Cluster Applications: Multi-Stage Data Flows
Computation interleaved with communication.
- Computation: distributed; runs on many machines
- Communication: structured; between machine groups

Communication Abstraction
A flow:
- A sequence of packets
- Independent
- Often the unit for network scheduling, traffic engineering, load balancing, etc.
Multiple parallel flows:
- Independent
- Yet semantically bound
- Shared objective: minimize completion time

Coflow
A collection of flows between two groups of machines that are bound together by application-specific semantics.
Captures:
1. Structure
2. Shared objective
3. Semantics

We Want To…
Better schedule the network
- Intra-coflow
- Inter-coflow
Write the communication layer of a new application
- Without reinventing the wheel
Add unsupported coflows to an application, or replace an existing coflow implementation
- Independent of applications

Coflow API
The Coflow API sits between cluster applications and the network (a physically or logically centralized controller).

Coflow API Goals
1. Separate intent from mechanisms
2. Convey application-specific semantics to the network

Coflow API
create(SHUFFLE) → handle
put(handle, id, content)
get(handle, id) → content
terminate(handle)
The driver creates the coflow; when the shuffle finishes, the MapReduce job finishes.
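To make the four calls on this slide concrete, here is a minimal in-memory sketch of the create/put/get/terminate API driving a MapReduce shuffle. The CoflowClient class and its dictionary-backed "transport" are illustrative stand-ins, not the paper's actual implementation.

```python
class CoflowClient:
    """Minimal in-memory mock of the create/put/get/terminate API."""

    def __init__(self):
        self._coflows = {}
        self._next_handle = 0

    def create(self, pattern):
        # Driver registers a new coflow and receives an opaque handle.
        handle = self._next_handle
        self._next_handle += 1
        self._coflows[handle] = {"pattern": pattern, "data": {}}
        return handle

    def put(self, handle, piece_id, content):
        # A sender (e.g. a mapper) publishes one piece of the coflow.
        self._coflows[handle]["data"][piece_id] = content

    def get(self, handle, piece_id):
        # A receiver (e.g. a reducer) fetches one piece of the coflow.
        return self._coflows[handle]["data"][piece_id]

    def terminate(self, handle):
        # Driver tears the coflow down once the stage has finished.
        del self._coflows[handle]

# Usage: the driver creates the shuffle, mappers put, reducers get.
client = CoflowClient()
s = client.create("SHUFFLE")
client.put(s, "map0->reduce0", b"partition-0")
assert client.get(s, "map0->reduce0") == b"partition-0"
client.terminate(s)
```

The point of the abstraction is that the application only states what it transfers and in which pattern; how bytes move is left to the network layer behind the handle.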

Coflow Flexibility
Choice of algorithms
- Default
- WSS [1]
Choice of mechanism
- App vs. network layer
- Pull vs. push
(Figure: a shuffle between mappers and reducers.)
[1] Orchestra, SIGCOMM 2011

Coflow Flexibility
A broadcast from the driver (JobTracker), alongside the shuffle from mappers to reducers:
b ← create(BCAST)
…
put(b, id, content)
…
get(b, id)
…

Coflow Flexibility
Ordering the broadcast before the shuffle:
b ← create(BCAST)
s ← create(SHUFFLE, ord=[b ~> s])
put(b, id, content)
…
terminate(b)
get(b, id)
put(s, id_s1)
…
(Figure: driver (JobTracker), mappers, reducers, shuffle.)
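The ord=[b ~> s] argument on this slide declares that coflow b should finish before coflow s. A hedged sketch of how an application could express that ordering; the Coflow record and create wrapper are hypothetical, with only the names mirroring the slide's pseudocode.

```python
from dataclasses import dataclass, field

@dataclass
class Coflow:
    name: str
    pattern: str
    # Coflows that must finish before this one starts.
    after: list = field(default_factory=list)

def create(name, pattern, ord=None):
    # ord=[x] mirrors the slide's ord=[b ~> s]: x precedes this coflow.
    return Coflow(name, pattern, after=list(ord or []))

b = create("b", "BCAST")
s = create("s", "SHUFFLE", ord=[b])

# A network scheduler can now serve b's flows before any of s's flows.
assert s.after == [b]
```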

Throughput-Sensitive Applications
Goal: minimize completion time.
(Figure: network state after 2 seconds.)

Throughput-Sensitive Applications
Goal: minimize completion time.
(Figure: network state after 2, 4, and 7 seconds.)

Throughput-Sensitive Applications
Goal: minimize completion time.
Free up resources without hurting application-perceived communication time.
(Figure: network state after 2 and 7 seconds.)

Latency-Sensitive Applications (HotNets 2012)
(Figure: a partition-aggregate tree with a top-level aggregator, mid-level aggregators, and workers.)

Latency-Sensitive Applications
Meet deadlines [1][2]; limit impact to as few requests as possible.
(Figure: top-level aggregator, mid-level aggregators, workers.)
[1] D3, SIGCOMM 2011
[2] PDQ, SIGCOMM 2012

One More Thing…
1. Critical path scheduling
2. OpenTCP
3. Structured streams
4. …

Coflow
A semantically-bound collection of flows that conveys application intent to the network:
- Allows better management of network resources
- Provides greater flexibility in designing applications

Critical Path Scheduling
The communication of a cluster application is represented by a partially-ordered set of coflows, and network allocation takes place among these partially-ordered sets of coflows.
(Figure: a DAG of coflow stages.)
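One way to see why the partial order matters: the longest (critical) path of coflow sizes through the DAG lower-bounds how fast the application can finish, so a scheduler should favor coflows on that path. A small sketch under made-up sizes and dependencies; none of these numbers come from the talk.

```python
def critical_path(sizes, deps):
    """Longest weighted path in a coflow DAG.
    sizes: {coflow: size}; deps: {coflow: [predecessor coflows]}."""
    memo = {}

    def finish(c):
        # Earliest finish of c = its own size plus the slowest predecessor.
        if c not in memo:
            memo[c] = sizes[c] + max(
                (finish(p) for p in deps.get(c, [])), default=0
            )
        return memo[c]

    return max(finish(c) for c in sizes)

# broadcast -> shuffle -> aggregation, plus an independent side shuffle
sizes = {"bcast": 1, "shuffle": 5, "agg": 2, "side": 3}
deps = {"shuffle": ["bcast"], "agg": ["shuffle"]}
assert critical_path(sizes, deps) == 8  # bcast + shuffle + agg
```

Here "side" never constrains completion time, so bandwidth spent accelerating it beyond its slack is wasted; the critical-path chain is what allocation should track.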

Coflow API

Operation                                    Caller
create(PATTERN, [opt]) → handle              Driver
put(handle, id, content, [opt]) → result     Sender
get(handle, id, [opt]) → content             Receiver
terminate(handle, [opt]) → result            Driver

Throughput-Sensitive Applications
Goal: minimize completion time [1].
(Figure: a MapReduce framework's data flow; the map stage's local shuffle finishes, then the shuffle finishes, then the reduce stage and the job finish.)
[1] Orchestra, SIGCOMM 2011

Coflow Resource Allocation: 1. Weights [Across Apps]
Weighted sharing between Job 1 and Job 2:
shuffle1 ← create(SHUFFLE, weight=1)
shuffle2 ← create(SHUFFLE, weight=2)
…
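A sketch of what weighted sharing could mean on a contended link: bandwidth split in proportion to each coflow's weight. The weight=1/weight=2 values come from the slide; the allocate function and the 300-unit capacity are illustrative assumptions.

```python
def allocate(capacity, weights):
    """Split link capacity among coflows in proportion to their weights."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}

# shuffle1 <- create(SHUFFLE, weight=1); shuffle2 <- create(SHUFFLE, weight=2)
shares = allocate(300, {"shuffle1": 1, "shuffle2": 2})
assert shares == {"shuffle1": 100.0, "shuffle2": 200.0}
```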

Coflow Resource Allocation: 2. Priorities [Across Apps]
Strict prioritization between Job 1 and Job 2:
shuffle1 ← create(SHUFFLE, pri=3)
shuffle2 ← create(SHUFFLE, pri=5)
…
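In contrast to weighted sharing, strict priority gives the highest-priority active coflow the entire link until it finishes. The pri=3/pri=5 values come from the slide; the allocation function is a sketch, and it assumes (as an illustration, not from the source) that a larger pri value means higher priority.

```python
def allocate_strict(capacity, priorities):
    """Give the full link to the highest-priority coflow, none to the rest."""
    top = max(priorities, key=priorities.get)
    return {name: (capacity if name == top else 0) for name in priorities}

# shuffle1 <- create(SHUFFLE, pri=3); shuffle2 <- create(SHUFFLE, pri=5)
shares = allocate_strict(300, {"shuffle1": 3, "shuffle2": 5})
assert shares == {"shuffle1": 0, "shuffle2": 300}
```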

Coflow Resource Allocation: 3. Dependencies [Within Apps]
The broadcast (b) finishes before shuffle2, and shuffle2 finishes before the aggregation (agg):
b ← create(BCAST)
shuffle2 ← create(SHUFFLE, ord=[b ~> shuffle2])
agg ← create(AGGR, ord=[shuffle2 ~> agg])
(Figure: Job 1's shuffle1 runs independently; Job 2's broadcast, shuffle2, and aggregation form a chain.)
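The slide's dependency chain (b before shuffle2 before agg) can be resolved into a valid start order by topological sort, which is a plausible first step for any scheduler that respects ord constraints. The data structures here are illustrative, not from the paper.

```python
def topo_order(deps):
    """deps: {coflow: [coflows it must wait for]} -> a valid start order."""
    order, done = [], set()

    def visit(c):
        if c in done:
            return
        for p in deps.get(c, []):
            visit(p)  # schedule every predecessor first
        done.add(c)
        order.append(c)

    for c in deps:
        visit(c)
    return order

# b ~> shuffle2 ~> agg, as on the slide
deps = {"b": [], "shuffle2": ["b"], "agg": ["shuffle2"]}
assert topo_order(deps) == ["b", "shuffle2", "agg"]
```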

Coflow Resource Allocation
The communication of a cluster application is represented by a partially-ordered set of coflows; network allocation takes place among these partially-ordered sets of coflows.
(Figure: a DAG of coflow stages.)