EECS 582 Final Review
Mosharaf Chowdhury
EECS 582 – F16
Stats on the 11 Papers We’ve Reviewed
Programming Models

MapReduce
- Exposes scalability and fault tolerance to programmers with little distributed-systems experience
- Doesn't work well for iterative algorithms

Spark
- RDDs suit iterative workloads well
- Lineage-based fault tolerance avoids checkpointing
- Ease of use
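To make the lineage bullet concrete, here is a minimal single-machine sketch. `ToyRDD` and its methods are hypothetical names for illustration, not Spark's API; the only point is that a lost cached result can be rebuilt by replaying the recorded transformations instead of restoring a checkpoint.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT Spark's API."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.source = source   # base data (only set on the root)
        self.parent = parent   # lineage: the dataset this one derives from
        self.fn = fn           # lineage: the transformation applied to parent
        self.cache: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, fn=fn)

    def persist(self) -> "ToyRDD":
        # Materialize and keep the result in memory.
        self.cache = self.compute()
        return self

    def lose_cache(self) -> None:
        # Simulate a failure that drops the in-memory copy.
        self.cache = None

    def compute(self) -> List[int]:
        if self.cache is not None:
            return self.cache
        if self.parent is None:
            return list(self.source or [])
        # Recompute from lineage: replay fn over the parent's data.
        return [self.fn(x) for x in self.parent.compute()]
```

For example, after `squares = ToyRDD(source=[1, 2, 3]).map(lambda x: x * x).persist()` and `squares.lose_cache()`, calling `squares.compute()` rebuilds the data from the base list.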
Operating Systems

Borg
- Classifies jobs as long-running or short-lived, with different priorities, preempting when required
- Hides details of allocation and failures from programmers
- Shows that a centralized scheduler can be scalable

Mesos
- Two-level scheduling with resource offers
- Frameworks can choose to accept or reject offers
- Failure handling is left to the apps
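The resource-offer idea can be sketched in a few lines. This is a deliberately simplified model, not the Mesos API: `Offer`, `Framework`, and `run_offers` are hypothetical names, each framework wants exactly one slot, and offers are shown to frameworks in list order rather than by any real allocation policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    node: str
    cpus: int
    mem_gb: int


@dataclass
class Framework:
    name: str
    need_cpus: int
    need_mem_gb: int

    def accepts(self, offer: Offer) -> bool:
        # Second level: the framework, not the master, decides placement.
        return offer.cpus >= self.need_cpus and offer.mem_gb >= self.need_mem_gb


def run_offers(offers: List[Offer], frameworks: List[Framework]) -> Dict[str, str]:
    """First level: hand each offer to frameworks until one accepts it."""
    placements: Dict[str, str] = {}
    for offer in offers:
        for fw in frameworks:
            if fw.name in placements:
                continue  # already placed (one slot each, by assumption)
            if fw.accepts(offer):
                placements[fw.name] = offer.node
                break  # offer consumed; a rejected offer moves to the next framework
    return placements
```

With offers for a small node and a large node, a framework that needs the large node rejects the first offer, which then goes to a smaller framework.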
Resource Allocation

Omega
- Schedules a mix of batch and interactive jobs with good placement
- Optimistic concurrency control targeted toward larger clusters
- Shared-state scheduler

DRF
- Generalization of max-min allocation to multiple resources and heterogeneous clusters
- Satisfies many properties that promote utilization and fairness while giving users no incentive to cheat (strategy-proofness)
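A minimal sketch of DRF as progressive filling may help: repeatedly give one task to the user with the smallest dominant share, and stop when that user's next task no longer fits. The function name `drf` and the tie-breaking by insertion order are assumptions of this sketch; the numbers in the usage note are the running example from the DRF paper.

```python
from fractions import Fraction
from typing import Dict, Sequence


def drf(capacity: Sequence[int], demands: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Return the number of tasks each user gets under progressive filling.

    capacity: total amount of each resource.
    demands:  per-task demand vector for each user (same resource order).
    """
    used = [0] * len(capacity)
    tasks = {user: 0 for user in demands}

    def dominant_share(user: str) -> Fraction:
        # Largest fraction of any single resource this user currently holds.
        return max(Fraction(tasks[user] * d, c)
                   for d, c in zip(demands[user], capacity))

    while True:
        # Ties go to the user listed first (an assumption of this sketch).
        user = min(demands, key=dominant_share)
        need = demands[user]
        if any(u + n > c for u, n, c in zip(used, need, capacity)):
            return tasks  # the next task does not fit: cluster is full
        used = [u + n for u, n in zip(used, need)]
        tasks[user] += 1
```

Calling `drf((9, 18), {"A": (1, 4), "B": (3, 1)})` models a cluster with 9 CPUs and 18 GB of memory: user A ends with 3 tasks and user B with 2, so both hold a dominant share of 2/3.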
File System

GFS
- Workload-guided design: appends and large reads over a small number of huge files
- Centralized design with replication for fault tolerance

FDS
- Data and compute are NOT collocated
- Exploits full bisection bandwidth networks
- Stores everything as blobs to maximize sequential I/O
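As a rough illustration of the centralized, replicated design, the sketch below has a client turn a byte offset into a chunk index and ask a single master where the replicas live. `ToyMaster` and `locate` are hypothetical names; the 64 MB chunk size and 3 replicas follow the defaults described in the GFS paper, and the round-robin placement is purely an assumption of this sketch.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed-size chunks


class ToyMaster:
    """Hypothetical single master holding all chunk metadata in memory."""

    def __init__(self, servers: List[str], replicas: int = 3) -> None:
        if replicas > len(servers):
            raise ValueError("need at least as many servers as replicas")
        self.servers = servers
        self.replicas = replicas
        self.chunks: Dict[Tuple[str, int], List[str]] = {}
        self.next_server = 0

    def locate(self, path: str, offset: int) -> Tuple[int, List[str]]:
        """Map (file, byte offset) to (chunk index, replica locations)."""
        index = offset // CHUNK_SIZE
        key = (path, index)
        if key not in self.chunks:
            # Round-robin placement; real placement policies are richer.
            self.chunks[key] = [
                self.servers[(self.next_server + i) % len(self.servers)]
                for i in range(self.replicas)
            ]
            self.next_server += 1
        return index, self.chunks[key]
```

After the lookup, the client would read the data directly from one of the returned servers, so the master stays off the data path.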
Memory Management

PACMan
- Coordinated caching for DFSes
- All-or-nothing property dictates two eviction policies
- Prefers small jobs

EC-Cache
- Uses erasure coding as an alternative to replication
- Splits individual objects to exploit parallel I/O and balance load better, improving performance and tail latency
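The splitting idea can be illustrated with the simplest possible erasure code: two data splits plus one XOR parity split, so any two of the three splits recover the object. This is only a sketch under that assumption; XOR parity stands in for the stronger codes a real system would use, and `encode` and `decode` are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(obj: bytes) -> Tuple[List[bytes], int]:
    """Split obj into two halves and add one XOR parity split."""
    half = (len(obj) + 1) // 2
    a = obj[:half]
    b = obj[half:].ljust(half, b"\0")  # pad so both halves match in length
    return [a, b, xor_bytes(a, b)], len(obj)


def decode(splits: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the object from any two of the three splits (None = missing)."""
    a, b, parity = splits
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return (a + b)[:length]  # drop the padding
```

Each split can live on a different cache server and be fetched in parallel. This layout costs 1.5x the object size in memory, compared with 2x for keeping two full replicas, while still tolerating the loss of any one split.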
Networking

Fat Tree
- DC network topology that provides full bisection bandwidth by arranging commodity switches into multiple stages
- Approximates a Clos topology
- Global scheduling to minimize congestion (Hedera)

Varys
- Coflow abstraction to exploit application-level semantics
- Heuristics to order coflows and allocate rates, exploiting the all-or-nothing property
- Introduced the concurrent open shop scheduling with coupled resources problem
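The sizing arithmetic of a k-ary fat tree is easy to check with a few lines of code. The sketch assumes the standard three-tier construction built from identical k-port switches; `fat_tree_size` is a hypothetical helper name.

```python
from typing import Dict


def fat_tree_size(k: int) -> Dict[str, int]:
    """Component counts for a three-tier fat tree of k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k * half * half,          # k^3 / 4
    }
```

With 4-port switches this gives 16 hosts behind 20 switches; with 48-port switches, `fat_tree_size(48)["hosts"]` is 27,648.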
Final Poster and Paper
- Posters are a good way to interact with others and get feedback
  - Mileage may vary, but it's important to be able to talk about what you do
- Research paper
  - The key part
  - Should be written like the papers you've read
  - As if you'd submit it to a workshop after ~3 more months of work, or to a conference after ~6 more months of work
  - "How to Write a Great Research Paper" by Simon Peyton Jones
9/7/16
Rough Outline [8 Pages w/o References]
- Abstract
- Introduction (Highlight the importance and give intuition of solution)
- Motivation (Use data and simple examples)
- Overview (Summarize your overall solution so that readers can follow later)
- Core Idea (Main contribution w/ challenges and how you address them)
- Implementation (Discuss non-obvious parts of your implementation)
- Evaluation (Convince readers that it works and when it fails)
- Related Work (Let readers know that you know your competition!)
- Discussion (Know your limitations and possible workarounds)
- Conclusion (Summarize and point out future work)