PACMan: Coordinated Memory Caching for Parallel Jobs Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica NSDI 2012
Motivation
[Figure: jobs arriving at the cluster scheduler]
In-Memory Caching
The majority of jobs are small in size, and the input data of most jobs can be cached in 32GB of memory: 92% of jobs in Facebook's Hadoop cluster fit in memory. IO-intensive phases constitute a significant portion of datacenter execution: 79% of runtime and 69% of resources.
PACMan: Parallel All-or-nothing Cache MANager
Globally coordinates access to distributed memory caches across machines. Two main tasks: support queries for the set of machines where a block is cached, and mediate cache replacement globally across machines.
PACMan Coordinator
Keeps track of changes made by clients and maintains a mapping between every cached block and the machines that cache it. Implements the cache eviction policies, LIFE and LFU-F. A secondary coordinator runs on cold standby as a backup.
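A minimal sketch of the coordinator's bookkeeping, using hypothetical class and method names (not the actual PACMan code): it maps each cached block to the machines holding it, is updated by client reports, and answers location queries.

    # Sketch of the coordinator's bookkeeping (hypothetical names, not the
    # actual PACMan implementation): block -> set of machines caching it,
    # updated by client reports and queried for memory-local scheduling.
    from collections import defaultdict

    class CacheCoordinator:
        def __init__(self):
            self.block_locations = defaultdict(set)   # block_id -> {machines}

        def report_added(self, block_id, machine):
            # A client reports that it cached block_id locally.
            self.block_locations[block_id].add(machine)

        def report_evicted(self, block_id, machine):
            # A client reports that it evicted block_id (e.g., per LIFE/LFU-F).
            self.block_locations[block_id].discard(machine)
            if not self.block_locations[block_id]:
                del self.block_locations[block_id]

        def lookup(self, block_id):
            # Answer: on which machines is this block cached?
            return set(self.block_locations.get(block_id, ()))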
PACMan Clients
Service requests to existing cache blocks and manage new blocks. Data is cached at the destination, rather than the source.
What is the optimal eviction policy?
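A minimal sketch of the client-side read path under this design (data cached at the destination); the helper names and objects below are assumptions for illustration, not the actual client API.

    # Sketch of a PACMan client's read path (hypothetical helpers): serve a
    # hit from local memory; on a miss, read from the DFS, cache the block at
    # this machine (the destination), and report it to the coordinator.
    def read_block(block_id, local_cache, dfs, coordinator, machine):
        data = local_cache.get(block_id)
        if data is not None:
            return data                          # cache hit
        data = dfs.read(block_id)                # cache miss: read from disk
        local_cache[block_id] = data             # cache at the destination
        coordinator.report_added(block_id, machine)
        return data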
Key Insight: All-or-Nothing Property
Tasks of small jobs run simultaneously in a single wave, so a job finishes only when its slowest task finishes. All-or-nothing: unless all of a job's inputs are cached, there is no improvement in completion time.
[Figure: task timelines on two slots with uncached vs. cached input, showing that completion time drops only when every task's input is cached]
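A small illustration of this property, with made-up task durations: a single-wave job's completion time is the maximum over its tasks, so partial caching does not help.

    # Illustration of the all-or-nothing property for a single-wave job.
    # Task durations below are made up, not from the paper.
    UNCACHED, CACHED = 10.0, 4.0    # task duration with uncached vs. cached input

    def wave_completion_time(inputs_cached):
        # A wave finishes when its slowest task finishes.
        return max(CACHED if cached else UNCACHED for cached in inputs_cached)

    print(wave_completion_time([False, False]))  # 10.0 -> nothing cached
    print(wave_completion_time([True, False]))   # 10.0 -> partial caching, no benefit
    print(wave_completion_time([True, True]))    #  4.0 -> all inputs cached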
Problem of Traditional Policies
Simply maximizing the hit rate may not improve performance. Example: Job 1 and Job 2, where Job 2 depends on the result of Job 1; caching only some blocks of each job leaves both jobs waiting on uncached tasks, so neither completes faster.
Sticky policy: evict blocks of incomplete (partially cached) files first.
Eviction Policy – LIFE
Goal: minimize the average completion time of jobs. Are there any incomplete files? If yes, evict the largest incomplete file; if no, evict the largest complete file. "Largest" means the file with the largest wave-width, where wave-width is the number of parallel tasks of a job.
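A minimal sketch of LIFE's victim selection as described on this slide, using a hypothetical File record (not PACMan's actual data structures); the accesses field is included for the LFU-F sketch that follows.

    # Sketch of LIFE's victim selection (hypothetical File record): prefer
    # incomplete files, and among the candidates evict the file with the
    # largest wave-width.
    from dataclasses import dataclass

    @dataclass
    class File:
        name: str
        wave_width: int    # number of parallel tasks that read this file
        complete: bool     # are all of the file's blocks currently cached?
        accesses: int = 0  # access count, used by the LFU-F sketch below

    def life_victim(cached_files):
        incomplete = [f for f in cached_files if not f.complete]
        candidates = incomplete if incomplete else cached_files
        return max(candidates, key=lambda f: f.wave_width)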
Eviction Policy – LFU-F
Goal: maximize the resource efficiency (utilization) of the cluster. Are there any incomplete files? If yes, evict the least-accessed incomplete file; if no, evict the least-accessed complete file.
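A companion sketch of LFU-F's selection, reusing the hypothetical File record from the LIFE sketch above: evict incomplete files first, choosing the least frequently accessed candidate.

    # Sketch of LFU-F's victim selection (reuses the File record defined in
    # the LIFE sketch above): incomplete files first, least-accessed wins.
    def lfu_f_victim(cached_files):
        incomplete = [f for f in cached_files if not f.complete]
        candidates = incomplete if incomplete else cached_files
        return min(candidates, key=lambda f: f.accesses)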
Eviction Policy – LIFE vs LFU-F
LIFE evicts the file with the highest wave-width; LFU-F evicts the file with the lowest access frequency. Example: Job 1 – wave-width 3, capacity 3, frequency 2; Job 2 – wave-width 2, capacity 4, frequency 1. LIFE evicts Job 1 and keeps Job 2 fully cached (capacity required: 4); LFU-F evicts Job 2 and keeps Job 1 fully cached (capacity required: 3).
[Figure: cache layouts of Job 1 and Job 2 across five slots under each policy]
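Applying the two sketches above to this slide's example (again with the hypothetical File record) reproduces the different choices:

    # Usage example: the two policies pick different victims for this scenario.
    files = [
        File("Job 1 input", wave_width=3, complete=True, accesses=2),
        File("Job 2 input", wave_width=2, complete=True, accesses=1),
    ]
    print(life_victim(files).name)   # "Job 1 input" -> largest wave-width
    print(lfu_f_victim(files).name)  # "Job 2 input" -> least frequently accessed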
Results: PACMan vs Hadoop
Significant reduction in completion time for small jobs; better efficiency for large jobs.
Results: PACMan vs Traditional Policies
LIFE performs significantly better than MIN, despite having a lower hit ratio for most applications. The sticky policy helps LFU-F achieve better cluster efficiency.
Summary
Most datacenter workloads are small in size and can fit in memory. PACMan is a coordinated cache management system that takes into account the all-or-nothing nature of parallel jobs to improve completion time (LIFE) and resource utilization (LFU-F). It achieves a 53% improvement in runtime and a 54% improvement in resource utilization over Hadoop.
Discussion & Questions
How "fair" is PACMan? Will it favor or prioritize certain types of jobs over others, and is that okay? Are there workloads where the all-or-nothing property does not hold?
Scalability of PACMan
The PACMan client saturates at roughly the same number of simultaneous tasks per machine as Hadoop, for block sizes of 64/128/256MB. The coordinator maintains a constant ~1.2ms latency up to 10,300 requests per second, significantly better than Hadoop's bottleneck of 3,200 requests per second.
Evaluation
Experimental platform: a 100-node cluster on Amazon EC2. Each machine has 34.2GB of memory (20GB allocated to the PACMan cache), 13 cores, and 850GB of storage. Traces from Facebook and Bing.