Container-Based Job Management for Fair Resource Sharing
Jue Hong, Pavan Balaji, Gaojin Wen, Bibo Tu, Junming Yan, Chengzhong Xu, and Shengzhong Feng
Oracle Corporation · Argonne National Laboratory · Chinese Academy of Sciences · Tencent Inc.

Resource Isolation Requirements
 Resource contention is a major problem on multicore systems: memory, the network, and shared caches are all contended.
 As nodes grow fatter (higher per-node core counts), this problem will only get worse.
 Techniques that isolate each OS process into a virtual domain with its own set of resources can mitigate such contention.
 The idea is not that isolation helps all applications:
– Some applications tolerate contention in exchange for the ability to shift resources between processes dynamically over time.
– For others, reduced contention can have a large impact.

Current Resource Isolation Models
 Virtual machine (VM)-level sharing
– Good resource isolation
– High overhead for control, setup, and program execution
 Process-level sharing
– Hard to track multi-process jobs
– Lacks fine-grained isolation of CPU, network, etc.
 OS-level virtualization: resource containers
– LRP, VServer, OpenVZ, Linux Containers (LXC)
– Fine-grained partitioning of resources within a single OS
– Low overhead: instructions run natively on the CPU
– Some are in the mainstream Linux kernel (e.g., LXC)

How Resource Containers Work
[Figure omitted in transcript.]
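As a concrete illustration of the mechanism underneath LXC, here is a minimal sketch of partitioning CPU and memory with the kernel's cgroup (v1) interface. It assumes the cpu and memory controllers are mounted under /sys/fs/cgroup (mount points vary by distribution); the function and group names are illustrative, not from the talk.

```python
# Minimal sketch of OS-level resource partitioning via cgroup v1,
# the mechanism underneath LXC. Assumes the cpu and memory controllers
# are mounted under /sys/fs/cgroup (mount points vary by distribution).
import os

CGROOT = "/sys/fs/cgroup"

def make_partition(name, cpu_shares, mem_limit_bytes):
    """Create a cgroup for one job and cap its CPU weight and memory."""
    cpu_dir = os.path.join(CGROOT, "cpu", name)
    mem_dir = os.path.join(CGROOT, "memory", name)
    os.makedirs(cpu_dir, exist_ok=True)
    os.makedirs(mem_dir, exist_ok=True)
    # cpu.shares is a relative weight: a group with 2048 shares gets
    # twice the CPU of a group with 1024 when both are busy.
    with open(os.path.join(cpu_dir, "cpu.shares"), "w") as f:
        f.write(str(cpu_shares))
    with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(mem_limit_bytes))

def attach(name, pid):
    """Move a process into the partition; its children follow it."""
    for ctrl in ("cpu", "memory"):
        with open(os.path.join(CGROOT, ctrl, name, "tasks"), "w") as f:
            f.write(str(pid))
```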

Concerns with Resource Containers
 Resource containers provide resource isolation, but no mechanism to schedule jobs based on their resource requirements:
– Which processes can execute, which must be delayed, and which can be "admitted" onto a node for execution.
 Resource containers interact notoriously badly with the external tools used in parallel programming:
– Debugger tools (e.g., Totalview or DDT) require the PID of each process, which is hidden inside a resource container.
– Resource-usage information is likewise hidden inside each container and not exposed outside.

Resource Container Interaction with Tools
[Figure: mpiexec launches a job whose outside-container PID is 126 but whose inside-container PID is 100; a tool such as Totalview sees only the inside PID 100, which is meaningless on the host.]

Primary Contributions
 Idea: use Linux Containers to implement server-level resource control.
 Goal: make resource containers a usable model in HPC environments.
 Contributions:
– A general container-based job management module (CJMM).
– A resource-aware management scheme showing how to apply the CJMM.
– Modifications to the resource container framework so that it exposes information such as PIDs and resource usage, to better interact with external tools.

Container-based Job Management (CJMM)
 Architecture of a typical cluster computing system:
[Figure omitted in transcript.]
 The CJMM plugs into the execution engine, taking over job execution, resource provisioning, and isolation.

Container-based Job Management: Design
 JobManager
– Starts jobs and manages their containers.
– Assigns and accounts for the server's resource usage.
 Container
– Represents the data structure and operations of a real container.
– Obtains the real-time resource usage of the underlying container.
A skeleton of these two components is sketched below.
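The paper describes this design but not its code; the following hypothetical skeleton shows how the two components could fit together, with the real lxc-execute command standing in for the modified startup path. All class and method names are illustrative assumptions.

```python
# Hypothetical skeleton of the two CJMM components described above;
# the paper gives the design, not this code. Container wraps one real
# LXC container; JobManager tracks per-server resource accounting.
import subprocess

class Container:
    def __init__(self, name):
        self.name = name
        self.pid = None  # outside-container (top-level) PID of the job

    def start(self, argv):
        # 'lxc-execute -n <name> -- <cmd>' runs a command in a container.
        # proc.pid below is the launcher's PID; the paper's modified
        # startup mechanism instead reports the job's own outside-
        # container PID (see the implementation-issues slide).
        proc = subprocess.Popen(["lxc-execute", "-n", self.name, "--"] + argv)
        self.pid = proc.pid
        return proc

    def usage(self):
        # Real-time usage read from the CGroup files (sketched later).
        raise NotImplementedError

class JobManager:
    def __init__(self, capacity):
        self.capacity = capacity   # e.g. {"cpu": 4.0, "mem": 2 << 30}
        self.containers = {}

    def submit(self, job_id, argv):
        c = Container(job_id)
        c.start(argv)
        self.containers[job_id] = c
        return c
```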

Container-based Job Management: Implementation Issues
 Job-startup mechanism
– Because of a container's hierarchical PIDs, stock LXC provided no direct way to obtain a job's outside-container PID once the job runs inside a container.
– We modify the startup mechanism so that the CJMM obtains the job's top-level PID.
 Usage-information retrieval
– We implement methods that compute a container's real-time resource usage with the help of CGroup (see the sketch below).
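A sketch of the usage-retrieval side, assuming the cgroup v1 accounting files cpuacct.usage and memory.usage_in_bytes; cpuacct.usage is cumulative CPU time in nanoseconds, so CPU utilization is computed as a delta over a sampling interval. The function name and mount point are assumptions.

```python
# Sketch of real-time usage retrieval from cgroup v1 accounting files.
import os, time

CGROOT = "/sys/fs/cgroup"

def read_usage(name, interval=1.0):
    cpu_file = os.path.join(CGROOT, "cpuacct", name, "cpuacct.usage")
    mem_file = os.path.join(CGROOT, "memory", name, "memory.usage_in_bytes")
    with open(cpu_file) as f:
        t0 = int(f.read())
    time.sleep(interval)
    with open(cpu_file) as f:
        t1 = int(f.read())
    with open(mem_file) as f:
        mem = int(f.read())
    # (t1 - t0) ns of CPU time over 'interval' seconds => cores in use.
    return {"cpu_cores": (t1 - t0) / (interval * 1e9), "mem_bytes": mem}
```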

Exposing Resource Container Information
[Figure: with the modified container framework, Totalview sees the job's outside-container PID 126 directly, instead of the inside-container PID 100.]
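For comparison only: the talk achieves this by modifying LXC's startup mechanism. On kernels 4.1 and newer, which postdate this work, the same inside/outside mapping can be read from /proc/<pid>/status, whose NSpid line lists a process's PID in every nested PID namespace. A sketch:

```python
# Read a process's PID in each nested PID namespace from NSpid.
# Requires Linux >= 4.1 (newer than the kernel used in the talk).
def pid_in_namespaces(host_pid):
    with open("/proc/%d/status" % host_pid) as f:
        for line in f:
            if line.startswith("NSpid:"):
                # e.g. "NSpid:\t126\t100" -> host PID 126, container PID 100
                return [int(p) for p in line.split()[1:]]
    return [host_pid]  # process is not in a nested PID namespace
```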

Applying CJMM
 A resource-aware management scheme on TCluster:
– TCluster: a traditional cluster computing system without resource-aware features.
– Integrating TCluster with CJMM: the CJMM-based executor enables resource-aware scheduling and dispatching.

Applying CJMM
 Architecture of resource-aware TCluster:
[Figure omitted in transcript.]

Applying CJMM
 Implementation of resource-aware TCluster:
– Scheduling: employs the DRF (Dominant Resource Fairness) scheduling algorithm.
– Dispatching: finds the server whose available resources most closely match the job's required resources.
– Matching metric: the Affinity Number, the Euclidean distance between the two resource vectors (see the sketch below).
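A minimal sketch of the dispatching rule as described: compute the Affinity Number as the Euclidean distance between the job's demand vector and each server's free-resource vector, and pick the server with the smallest distance. The function names and the explicit feasibility filter are illustrative assumptions, not from the paper.

```python
# Affinity-Number-based dispatch: Euclidean distance between a job's
# demand vector and a server's free-resource vector, smallest wins.
import math

def affinity(free, demand):
    return math.sqrt(sum((f - d) ** 2 for f, d in zip(free, demand)))

def dispatch(servers, demand):
    """servers: {name: free-resource vector, e.g. (cpus, mem_gb)}."""
    feasible = {s: free for s, free in servers.items()
                if all(f >= d for f, d in zip(free, demand))}
    if not feasible:
        return None  # delay the job: no server can admit it now
    return min(feasible, key=lambda s: affinity(feasible[s], demand))

# Example: a (2 CPUs, 4 GB) job goes to the closest-matching server.
print(dispatch({"s1": (8, 16), "s2": (3, 5), "s3": (4, 8)}, (2, 4)))  # "s2"
```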

Performance Evaluation: Experimental Setup
 OS: SUSE Linux Enterprise 11 SP1, x86_64
 LXC toolkit
 Network: 1 Gb Ethernet, same rack
 Servers: 6 servers, each with four 3 GHz Intel Xeon CPUs and 2 GB of memory
 CPU workload: two CPU-intensive compute programs, one single-process and one multi-process
 Memory workload: a memory-intensive program that continuously allocates and touches memory

Performance Evaluation: CPU Usage
 The CPU resource ratio across the multi-process job and the three single-process jobs is set to 8:4:2:1.
 Number of processes in the multi-process job: 3, 4, 5, 6, 7, 10, 12, and 24.

Performance Evaluation: Memory Usage
[Figure omitted in transcript.]

Performance Evaluation: Bomb-like Programs
[Figure omitted in transcript.]

Performance Evaluation: Resource Utilization
 We deploy 10 jobs with different resource requirements and compare three dispatch policies: first-fit, best-fit, and Affinity-Number-based best-fit.
[Figures: the jobs' resource requirements, and the average resource utilization of each server under each policy.]

Performance Evaluation: Overhead
 Testbed: an IBM x3550 server with a quad-core Intel Xeon CPU and 15 GB of memory.
 Workloads: GeekBench and UnixBench, used to evaluate the overhead on CPU, memory, disk I/O, and system operations.

Performance Evaluation: Overhead – CPU and Memory
[Figure: CPU and memory overhead; a higher score is better.]

Performance Evaluation: Overhead – Disk I/O and System Operations
[Figures: disk I/O overhead and system-operation overhead.]

Conclusion
 To enable on-server resource control for fair resource sharing, we propose:
– A general container-based job management module (CJMM), and
– A resource-aware management scheme showing how to apply the CJMM.
 Experiments show that our approach controls resource sharing well and has very low overhead.

Personnel Acknowledgments
Current staff members: Antonio Pena (postdoc), Wesley Bland (postdoc), Junchao Zhang (postdoc), Huiwei Lu (postdoc), Yan Li (postdoc), Ken Raffenetti (s/w developer), Yuqing Xiong (visiting researcher)
Past staff members: James S. Dinan (postdoc), Ralf Gunter (research associate), David J. Goodell (developer), Darius T. Buntinas (developer)
Argonne collaborators (partial): Rajeev Thakur (deputy director), Marc Snir (division director), Pete Beckman (scientist), Fangfang Xia (asst. scientist), Jeff Hammond (asst. scientist)
External collaborators (partial): Ahmad Afsahi (Queen's, Canada), Andrew Chien (U. Chicago), Wu-chun Feng (Virginia Tech), William Gropp (UIUC), Jue Hong (SIAT, Shenzhen), Yutaka Ishikawa (U. Tokyo, Japan), Laxmikant Kale (UIUC), Guangming Tan (ICT, Beijing), Yanjie Wei (SIAT, Shenzhen), Qing Yi (UC Colorado Springs), Yunquan Zhang (ISCAS, Beijing), Xiaobo Zhou (UC Colorado Springs)
Current and past students: Alex Brooks (Ph.D.), Xiuxia Zhang (Ph.D.), Chaoran Yang (Ph.D.), Min Si (Ph.D.), Huiwei Lu (Ph.D.), Yan Li (Ph.D.), David Ozog (Ph.D.), Palden Lama (Ph.D.), Xin Zhao (Ph.D.), Ziaul Haque Olive (Ph.D.), Md. Humayun Arafat (Ph.D.), Qingpeng Niu (Ph.D.), Li Rao (M.S.), Lukasz Wesolowski (Ph.D.), Feng Ji (Ph.D.), John Jenkins (Ph.D.), Ashwin Aji (Ph.D.), Shucai Xiao (Ph.D.), Sreeram Potluri (Ph.D.), Piotr Fidkowski (Ph.D.), James S. Dinan (Ph.D.), Gopalakrishnan Santhanaraman (Ph.D.), Ping Lai (Ph.D.), Rajesh Sudarsan (Ph.D.), Thomas Scogland (Ph.D.), Ganesh Narayanaswamy (M.S.)

Thank You! Webpage: