EECS 582 Final Review
Mosharaf Chowdhury
EECS 582 – F16


Stats on the 11 Papers We've Reviewed

Programming Models

Programming Models

MapReduce
- Provides scalability and fault tolerance with little programming effort
- Doesn't work well for iterative algorithms

Spark
- RDDs suit iterative workloads well
- Lineage-based fault tolerance avoids checkpointing
- Easy to use
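The MapReduce model above can be summarized in a few lines of plain Python. This is an illustrative sketch, not Hadoop's or Google's actual API; `map_phase`, `shuffle`, and `run_job` are hypothetical names. The point is that the programmer writes only the map and reduce functions, while the framework handles partitioning and grouping by key:

```python
from collections import defaultdict

def map_phase(document):
    # map(): emit (word, 1) for every word in one input record
    return [(word, 1) for word in document.split()]

def shuffle(emitted):
    # the framework's job: group all intermediate values by key
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # reduce(): sum the counts for one word
    return key, sum(values)

def run_job(documents):
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())

counts = run_job(["spark beats hadoop", "hadoop runs mapreduce"])
```

Spark's advantage for iterative algorithms is that intermediate results like `counts` could stay in memory across iterations (with lineage for recovery), instead of being written to disk between every pair of map and reduce phases.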

Operating Systems

Operating Systems

Borg
- Classifies jobs into long-running and short, with different priorities, preempting if required
- Hides details of allocation and failures from programmers
- Shows that centralized schedulers can be scalable

Mesos
- Two-level scheduling with resource offers
- Frameworks can choose to accept or reject offers
- Failure handling is left to the applications
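The two-level design can be sketched in a few lines. This is a toy model, not the real Mesos API: the `Master` and `Framework` classes and a single "cpus" resource are illustrative assumptions. The first level (the master) decides whom to offer resources to; the second level (each framework) decides whether to accept:

```python
class Framework:
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed
        self.accepted = 0

    def consider(self, offer_cpus):
        # Second level: the framework accepts only offers it can use.
        if self.accepted < self.cpus_needed and offer_cpus > 0:
            take = min(offer_cpus, self.cpus_needed - self.accepted)
            self.accepted += take
            return take          # accept (part of) the offer
        return 0                 # reject the offer

class Master:
    def __init__(self, total_cpus):
        self.free_cpus = total_cpus

    def offer_round(self, frameworks):
        # First level: the master decides whom to offer resources to
        # (a simple fixed order here; real Mesos offers by DRF shares),
        # but never decides what runs on them.
        for fw in frameworks:
            taken = fw.consider(self.free_cpus)
            self.free_cpus -= taken

fws = [Framework("spark", cpus_needed=3), Framework("mpi", cpus_needed=2)]
master = Master(total_cpus=4)
master.offer_round(fws)
```

Because the master never inspects framework internals, it stays simple and scalable; the price is that a framework holding out for better offers can leave resources idle.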

Resource Allocation

Resource Allocation

Omega
- Schedules a mix of batch and interactive jobs with good placement
- Optimistic concurrency control targeted toward larger clusters
- Shared-state scheduler

DRF
- Generalizes max-min fairness to multiple resource types and heterogeneous clusters
- Satisfies many desirable properties, maximizing utilization and fairness while leaving no incentive to cheat
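DRF's progressive-filling algorithm is short enough to sketch directly, using the running example from the DRF paper: a cluster with 9 CPUs and 18 GB, user A needing <1 CPU, 4 GB> per task and user B needing <3 CPUs, 1 GB>. Repeatedly launch one task for the user with the smallest dominant share (the largest fraction of any single resource that user holds):

```python
def drf(total, demands):
    used = [0.0] * len(total)
    tasks = {user: 0 for user in demands}
    while True:
        def dom_share(user):
            # dominant share = max over resources of (usage / capacity)
            return max(d * tasks[user] / t
                       for d, t in zip(demands[user], total))
        # users whose next task still fits in the cluster
        feasible = [u for u in demands
                    if all(used[i] + demands[u][i] <= total[i]
                           for i in range(len(total)))]
        if not feasible:
            return tasks
        u = min(feasible, key=dom_share)   # smallest dominant share first
        tasks[u] += 1
        used = [used[i] + demands[u][i] for i in range(len(total))]

# 9 CPUs, 18 GB; A: <1 CPU, 4 GB> per task; B: <3 CPUs, 1 GB> per task
alloc = drf(total=[9, 18], demands={"A": [1, 4], "B": [3, 1]})
```

The run ends with A holding 3 tasks (dominant share 12/18 of memory) and B holding 2 tasks (dominant share 6/9 of CPU): both dominant shares equalize at 2/3, which is exactly the DRF allocation from the paper.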

File System

File System

GFS
- Workload-guided design: appends and large reads over a small number of huge files
- Centralized design with replication for fault tolerance

FDS
- Data and compute are NOT collocated
- Exploits full-bisection-bandwidth networks
- Stores everything as blobs to maximize sequential I/O
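FDS can afford non-collocated data and compute partly because placement is deterministic: each fixed-size "tract" of a blob is mapped to a server by hashing, so any client can compute where data lives without a per-read metadata lookup. A rough sketch of that idea (the table layout and hashing details here are simplified assumptions, not FDS's exact scheme):

```python
import hashlib

def tract_server(blob_guid, tract_number, locator_table):
    # hash the blob's GUID once, then step through the table per tract,
    # so consecutive tracts of one blob land on different servers
    h = int(hashlib.md5(blob_guid.encode()).hexdigest(), 16)
    index = (h + tract_number) % len(locator_table)
    return locator_table[index]

servers = ["s0", "s1", "s2", "s3"]
# consecutive tracts spread across servers -> reads proceed in parallel
placement = [tract_server("blob-42", t, servers) for t in range(4)]
```

Striping consecutive tracts across distinct servers is what lets FDS turn the network's full bisection bandwidth into sequential-looking I/O from the client's perspective.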

Memory Management

Memory Management

PACMan
- Coordinated caching for DFSes
- The all-or-nothing property dictates its two eviction policies
- Prefers small jobs

EC-Cache
- Alternative to replication that erasure-codes objects instead
- Improves performance and tail latency by exploiting parallel I/O, and improves load balancing by splitting individual objects
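The EC-Cache idea can be shown with the simplest possible code: split an object into k splits plus one XOR parity, then read any k of the k+1 units in parallel, so one slow or lost unit never delays the read. (This single-parity sketch tolerates only one loss; EC-Cache itself uses Reed-Solomon codes, which tolerate several.)

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    size = -(-len(data) // k)                  # ceil(len/k)
    splits = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = reduce(xor_bytes, splits)
    return splits + [parity]                   # k data units + 1 parity

def decode(units, missing, k):
    # any single missing unit is the XOR of the k units that survive
    survivors = [u for i, u in enumerate(units) if i != missing]
    recovered = reduce(xor_bytes, survivors)
    data_units = list(units[:k])
    if missing < k:
        data_units[missing] = recovered
    return b"".join(data_units)

units = encode(b"object-contents!", k=4)
lost = units[1]                                # pretend this split is slow
restored = decode(units, missing=1, k=4)
```

Reading k-of-(k+1) splits also spreads load at fine granularity: hot objects occupy many cache servers a little, instead of one replica's server a lot, which is where the tail-latency win comes from.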

Networking

Networking

Fat Tree
- DC network topology that provides full bisection bandwidth by arranging commodity switches into multiple stages
- Approximates a Clos topology
- Global scheduling to minimize congestion (Hedera)

Varys
- Coflow abstraction to exploit application-level semantics
- Heuristics to order coflows and allocate rates using all-or-nothing allocation
- Introduced the concurrent open shop scheduling problem with coupled resources
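The fat-tree's sizes follow directly from the port count k of the identical commodity switches, using the topology's standard formulas: k pods, each with k/2 edge and k/2 aggregation switches; (k/2)^2 core switches; and k/2 hosts per edge switch, for k^3/4 hosts total. A quick calculator:

```python
def fat_tree(k):
    # k-ary fat tree built entirely from k-port commodity switches
    assert k % 2 == 0, "port count must be even"
    edge = k * (k // 2)            # k pods * k/2 edge switches each
    agg = k * (k // 2)             # k pods * k/2 aggregation switches each
    core = (k // 2) ** 2
    hosts = k ** 3 // 4            # (k/2 hosts per edge switch) * edge
    return {"edge": edge, "agg": agg, "core": core, "hosts": hosts}

sizes = fat_tree(4)                # the paper's running example, k = 4
```

With k = 4 this gives 16 hosts on 20 identical 4-port switches; with k = 48 commodity switches, the same formulas scale to 27,648 hosts, which is the economic argument of the paper.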

Final Poster and Paper

Posters
- A good way to interact with others and get feedback
- Mileage may vary, but it's important to be able to talk about what you do

Research paper
- The key part
- Should be written like the papers you've read
- As if you'd submit it to a workshop after ~3 more months of work, or to a conference after ~6 more months of work
- See "How to Write a Great Research Paper" by Simon Peyton Jones

Rough Outline [8 Pages w/o References]

- Abstract
- Introduction (highlight the importance and give the intuition of the solution)
- Motivation (use data and simple examples)
- Overview (summarize your overall solution so that readers can follow later)
- Core Idea (main contribution, with challenges and how you address them)
- Implementation (discuss non-obvious parts of your implementation)
- Evaluation (convince readers that it works, and show when it fails)
- Related Work (let readers know that you know your competition!)
- Discussion (know your limitations and possible workarounds)
- Conclusion (summarize and point out future work)