
An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats
Joseph P. White, Ph.D., Scientific Programmer
Center for Computational Research, University at Buffalo, SUNY
XSEDE14, July 13-18, 2014

Outline
– Motivation
– Overview of tools (XDMoD, tacc_stats)
– Background
– Results
– Conclusions
– Discussion

Co-authors
Robert L. DeLeon (UB), Thomas R. Furlani (UB), Steven M. Gallo (UB), Matthew D. Jones (UB), Amin Ghadersohi (UB), Cynthia D. Cornelius (UB), Abani K. Patra (UB), James C. Browne (UTexas), William L. Barth (TACC), John Hammond (TACC)

Motivation
Node sharing benefits:
– increases throughput by up to 26%
– increases energy efficiency by up to 22% (Breslow et al.)
Node sharing disadvantages:
– resource contention
The number of cores per node keeps increasing
Ulterior motive:
– prove out the toolset

A. D. Breslow, L. Porter, A. Tiwari, M. Laurenzano, L. Carrington, D. M. Tullsen, and A. E. Snavely. The case for colocation of HPC workloads. Concurrency and Computation: Practice and Experience.

Tools
XDMoD
– NSF-funded open-source tool that provides a wide range of usage and performance metrics on XSEDE systems
– Web-based interface
– Powerful charting features
tacc_stats
– Low-overhead collection of system-wide performance data
– Runs on every node of a resource; collects data at job start, at job end, and periodically during the job:
  CPU usage
  Hardware performance counters
  Memory usage
  I/O usage
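To make the collection model concrete, here is a minimal sketch (not tacc_stats itself) of interval-based sampling of system-wide counters from /proc. The /proc file layouts are standard Linux; the sampling interval and output format are arbitrary choices for this illustration.

```python
# Minimal sketch of a periodic node-level sampler in the spirit of tacc_stats.
# This is NOT tacc_stats itself; it only illustrates low-overhead, interval-based
# collection of system-wide counters from /proc.
import time

def read_cpu_jiffies():
    """Return (total, idle) jiffies from the aggregate 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]      # first line is the aggregate "cpu" line
    values = [int(v) for v in fields]
    return sum(values), values[3]              # idle is the 4th column

def read_meminfo_kb():
    """Return a dict of /proc/meminfo fields (values in kB for most fields)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])
    return info

def sample_forever(interval_s=600):
    """Print CPU utilization and memory usage once per interval (600 s is arbitrary)."""
    prev_total, prev_idle = read_cpu_jiffies()
    while True:
        time.sleep(interval_s)
        total, idle = read_cpu_jiffies()
        busy_frac = 1.0 - (idle - prev_idle) / max(total - prev_total, 1)
        mem = read_meminfo_kb()
        used_kb = mem["MemTotal"] - mem["MemFree"]
        print(f"cpu_busy={busy_frac:.3f} mem_used_kb={used_kb}")
        prev_total, prev_idle = total, idle

if __name__ == "__main__":
    sample_forever()
```

The real collector also reads hardware performance counters and filesystem/interconnect I/O counters, as listed above.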

Data flow

XDMoD Data Sources

Background
CCR's HPC resource "Rush":
– cores
– Heterogeneous cluster: 8, 12, 16 or 32 cores per node
– InfiniBand interconnect
– Panasas parallel filesystem
– SLURM resource manager
  node sharing enabled by default
  cgroup plugin used to isolate jobs
Academic computing center: a higher percentage of smaller jobs than on large XSEDE resources
All data from Jan-Feb 2014 (~370,000 jobs)

Number of jobs by job size

Results
Exclusive jobs: no other jobs ran concurrently on the allocated node(s) (left-hand side of plots)
Shared jobs: at least one other job was running on the allocated node(s) (right-hand side)
Metrics compared:
– Process memory usage
– Total OS memory usage
– LLC read miss rates
– Job exit status
– Parallel filesystem bandwidth
– InfiniBand interconnect bandwidth
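The exclusive/shared split can be reconstructed from job accounting records with a node-by-node time-overlap check; the sketch below shows one way to do it. The Job fields and example hostnames are hypothetical stand-ins, not the schema actually used by XDMoD or tacc_stats.

```python
# Sketch: label each job "exclusive" or "shared" by checking whether any other
# job overlapped it in time on any of its allocated nodes.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    nodes: list        # hostnames allocated to the job
    start: float       # epoch seconds
    end: float         # epoch seconds

def classify(jobs):
    """Return {job_id: 'shared' | 'exclusive'}."""
    by_node = defaultdict(list)
    for job in jobs:
        for node in job.nodes:
            by_node[node].append(job)

    labels = {job.job_id: "exclusive" for job in jobs}
    for node_jobs in by_node.values():
        node_jobs.sort(key=lambda j: j.start)
        for i, a in enumerate(node_jobs):
            for b in node_jobs[i + 1:]:
                if b.start >= a.end:
                    break                      # sorted by start: no later job overlaps a
                labels[a.job_id] = labels[b.job_id] = "shared"
    return labels

jobs = [
    Job(1, ["cpn-d07-01"], 0, 3600),
    Job(2, ["cpn-d07-01"], 1800, 5400),        # overlaps job 1 on the same node
    Job(3, ["cpn-d07-02"], 0, 3600),
]
print(classify(jobs))                          # {1: 'shared', 2: 'shared', 3: 'exclusive'}
```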

Memory usage per core
(MemUsed - FilePages - Slab) from /sys/devices/system/node/node0/meminfo
[Histograms: memory usage per core (GB), exclusive jobs vs. shared jobs]
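For reference, the per-core memory metric named on this slide is easy to compute from the per-NUMA-node meminfo files; the sketch below shows the arithmetic. Summing over all NUMA nodes and dividing by the node's core count are this example's assumptions about how the per-core normalization was done.

```python
# Sketch: application memory usage per core, computed as (MemUsed - FilePages - Slab),
# i.e. memory in use minus page cache and kernel slab, from the per-NUMA-node
# meminfo files. The per-core normalization here is an assumption.
import glob

def node_meminfo(path):
    """Parse one /sys/devices/system/node/node*/meminfo file into a dict of kB values."""
    values = {}
    with open(path) as f:
        for line in f:
            # Lines look like: "Node 0 MemUsed:        12345678 kB"
            parts = line.split()
            values[parts[2].rstrip(":")] = int(parts[3])
    return values

def app_memory_per_core_gb(cores_per_node):
    total_kb = 0
    for path in glob.glob("/sys/devices/system/node/node*/meminfo"):
        m = node_meminfo(path)
        total_kb += m["MemUsed"] - m["FilePages"] - m["Slab"]
    return total_kb / (1024.0 ** 2) / cores_per_node

print(f"{app_memory_per_core_gb(cores_per_node=8):.2f} GB/core")
```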

Total memory usage per core (4 GB/core nodes)
[Histograms: total OS memory usage per core (GB), exclusive jobs vs. shared jobs]

Last-level cache (LLC) read miss rate per socket
UNC_LLC_MISS:READ on the Intel Westmere uncore
Gives an upper-bound estimate of DRAM bandwidth
[Histograms: LLC read miss rate (10^6/s), exclusive jobs vs. shared jobs]
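Each LLC read miss fetches at most one cache line from DRAM, so the miss rate converts to an upper bound on memory read bandwidth by multiplying by the 64-byte line size (correct for Westmere). A tiny worked sketch, with made-up sample numbers:

```python
# Sketch: convert an LLC read-miss rate into an upper-bound estimate of DRAM
# read bandwidth. Each miss transfers at most one 64-byte cache line.
CACHE_LINE_BYTES = 64

def dram_read_bw_upper_bound_gbs(llc_read_misses_per_sec):
    """Upper bound on DRAM read bandwidth, in GB/s, implied by an LLC read-miss rate."""
    return llc_read_misses_per_sec * CACHE_LINE_BYTES / 1e9

# e.g. 50e6 misses/s per socket -> at most ~3.2 GB/s of DRAM read traffic
print(f"{dram_read_bw_upper_bound_gbs(50e6):.1f} GB/s")
```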

Job exit status reported by SLURM
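A breakdown like this can be pulled straight from SLURM accounting; the following sketch shells out to sacct and tallies job end states. The date range covers the study period but is otherwise arbitrary, and this only illustrates the idea, not the pipeline used here; field availability can vary with SLURM version.

```python
# Sketch: count SLURM job end states (COMPLETED, FAILED, TIMEOUT, ...) over a
# date range using sacct.
import subprocess
from collections import Counter

def job_state_counts(start="2014-01-01", end="2014-03-01"):
    out = subprocess.run(
        ["sacct", "--allusers", "--noheader", "--parsable2",
         "--starttime", start, "--endtime", end,
         "--format", "JobID,State,ExitCode"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter()
    for line in out.splitlines():
        job_id, state, _exit_code = line.split("|")
        if "." in job_id:                   # skip job steps, count whole jobs only
            continue
        counts[state.split()[0]] += 1       # e.g. "CANCELLED by 123" -> "CANCELLED"
    return counts

print(job_state_counts())
```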

Panasas parallel filesystem write rate per node
[Histograms: write rate per node (B/s), exclusive jobs vs. shared jobs]

InfiniBand write rate per node
[Histograms: write rate, log10(B/s), exclusive jobs vs. shared jobs]
Peaks truncated: ~45,000 for exclusive jobs, ~80,000 for shared jobs
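Per-node InfiniBand traffic of this kind is available from sysfs port counters; the sketch below samples one counter twice and converts the difference to B/s. The device name, port number, and the 4-byte counter units are assumptions about a typical mlx4/mlx5 HCA, and this is not how tacc_stats necessarily reads the counters.

```python
# Sketch: estimate InfiniBand transmit bandwidth by sampling a sysfs port counter
# twice and differencing. "mlx4_0" and port "1" are example values; port_xmit_data
# is commonly reported in 4-byte units, hence the *4.
import math
import time

COUNTER = "/sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data"

def read_counter(path):
    with open(path) as f:
        return int(f.read())

def ib_write_rate_log10_bps(interval_s=10):
    before = read_counter(COUNTER)
    time.sleep(interval_s)
    after = read_counter(COUNTER)
    bytes_per_sec = (after - before) * 4 / interval_s   # counter assumed to be in 4-byte words
    return math.log10(bytes_per_sec) if bytes_per_sec > 0 else float("-inf")

print(f"log10(write rate B/s) = {ib_write_rate_log10_bps():.2f}")
```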

Conclusions
– Little difference on average between shared and exclusive jobs on Rush
– The majority of jobs use far less than the maximum available resources
– We have created data collection/processing software that makes it easy to evaluate system usage

Discussion
Limitations of the current work:
– Unable to determine the impact (if any) on job wall time
– Comparing overall average values for jobs
– Statistics for jobs on shared nodes are convolved
– Exit code is not a reliable way to determine failure

Future work
– Use Application Kernels to get a detailed analysis of interference
– Many more metrics are now available:
  FLOPS
  CPU clock cycles per instruction (CPI)
  CPU clock cycles per L1D cache load (CPLD)
– Add support for per-job metrics on shared nodes
– Study classes of applications
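CPI and CPLD are simple ratios of per-job hardware counter totals (cycles over instructions retired, and cycles over L1D cache loads). A minimal sketch with placeholder counter names and made-up totals:

```python
# Sketch: derived CPU metrics from raw hardware-counter totals accumulated over a job.
# The dictionary keys are generic placeholders, not the exact counter names used by tacc_stats.
def cpi(counters):
    """CPU clock cycles per instruction (CPI)."""
    return counters["cpu_cycles"] / counters["instructions_retired"]

def cpld(counters):
    """CPU clock cycles per L1D cache load (CPLD)."""
    return counters["cpu_cycles"] / counters["l1d_cache_loads"]

job_totals = {
    "cpu_cycles": 2.4e15,              # made-up example numbers
    "instructions_retired": 1.8e15,
    "l1d_cache_loads": 6.0e14,
}
print(f"CPI  = {cpi(job_totals):.2f}")     # ~1.33
print(f"CPLD = {cpld(job_totals):.2f}")    # ~4.00
```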

Questions
BOF: "XDMoD: A Tool for Comprehensive Resource Management of HPC Systems" – 6:00pm-7:00pm tomorrow, Room A602
XDMoD –
tacc_stats –
Contact info –

Acknowledgments
This work is supported by the National Science Foundation under grant number OCI and grant number OCI for the Technology Audit Service (TAS) for XSEDE.