PAPI for Blue Gene/Q: The 5 BGPM Components Heike Jagode and Shirley Moore Innovative Computing Laboratory University of Tennessee-Knoxville

Presentation transcript:

PAPI for Blue Gene/Q: The 5 BGPM Components Heike Jagode and Shirley Moore Innovative Computing Laboratory University of Tennessee-Knoxville ESP Code for “Q” Workshop Argonne National Laboratory March 19-21, 2012

Overview
Introduction
Processor Unit (PUnit) Component
L2 Unit Component
I/O Unit Component
Network Component
Compute Node Kernel Unit (CNKUnit) Component

Introduction
Very little effort was put into hardware performance monitoring tools for BG/P, the predecessor of BG/Q, and the HPC community was left with rather poor and incomplete methods.
To eliminate this limitation on BG/Q, we planned carefully and collaborated closely with IBM's Performance Group.
Result: five new components were added to PAPI to support hardware performance monitoring for the BG/Q network, the I/O system, and the Compute Node Kernel, in addition to the processing cores.

PUnit Component
Each of the 18 A2 CPU cores has a local UPC module. Each of these modules provides 24 counters (14-bit) to sample A2 events, L1-cache-related events, floating-point operations, etc.
The local UPC module is broken down into five internal sub-modules: FU, XU, IU, LSU, and MMU. The sub-modules can be identified directly from the PUnit event names (see the next slide for examples).
The BGPM PUnit interfaces with these modules.
PAPI uses the BGPM interface.
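To make the figures on this slide concrete, the following minimal sketch works out the total number of PUnit counters on a chip and the largest value a 14-bit counter can hold, and shows one way a sub-module could be read off an event name. The `PEVT_<SUBMODULE>_...` naming pattern and the example event names are assumptions made for illustration only; the real names should be taken from `papi_native_avail`.

```python
# Figures from the slide: 18 A2 cores, 24 counters per local UPC module, 14 bits each.
CORES = 18
COUNTERS_PER_CORE = 24
COUNTER_BITS = 14

total_punit_counters = CORES * COUNTERS_PER_CORE  # counters across all cores
max_counter_value = (1 << COUNTER_BITS) - 1       # largest value before wrap-around

# Sub-module labels as listed on the slide.
SUBMODULES = ("FU", "XU", "IU", "LSU", "MMU")


def submodule_of(event_name):
    """Return the sub-module token embedded in an event name, or None.

    Assumption (hypothetical): names look like PEVT_<SUBMODULE>_<REST>.
    """
    parts = event_name.split("_")
    if len(parts) >= 3 and parts[0] == "PEVT" and parts[1] in SUBMODULES:
        return parts[1]
    return None


# Hypothetical event names, for illustration only.
examples = ["PEVT_LSU_EXAMPLE_LOADS", "PEVT_IU_EXAMPLE_STALLS", "PEVT_CYCLES"]
by_submodule = {name: submodule_of(name) for name in examples}

print(total_punit_counters, max_counter_value)
print(by_submodule)
```

A name that does not carry one of the five labels, such as the last example, simply maps to `None` in this sketch.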

PUnit Events (Native | Presets)
Currently, there are 269 native PUnit events available.
Out of 107 possible PAPI predefined events, 41 are currently available, of which 12 are derived events.

L2 Unit Component
The shared L2 cache is split into 16 separate slices.
Each of the 16 L2 memory slices (per chip) has an L2 UPC module that provides 6 counters (node-wide).

L2 Unit Native Events
Currently, there are 32 L2 Unit events available.
The BG/Q processor has two DDR3 memory controllers, each interfacing with eight slices of the L2 cache to handle their cache misses (one controller for each half of the 16 cores on the chip).
The counting hardware can either keep the counts from each slice separate or combine them into single values (the default).
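The difference between separate and combined L2 counts can be sketched as follows. This is only a model of the arithmetic, not the BGPM implementation: the sample miss counts are made up, and the mapping of slices 0-7 to one memory controller and slices 8-15 to the other is an assumption for illustration.

```python
SLICES = 16                # L2 slices per chip (from the slide)
SLICES_PER_CONTROLLER = 8  # each DDR3 controller serves eight slices


def combine(per_slice_counts):
    """Combined mode: fold the 16 per-slice counts into one node-wide value."""
    if len(per_slice_counts) != SLICES:
        raise ValueError("expected one count per L2 slice")
    return sum(per_slice_counts)


def per_controller(per_slice_counts):
    """Sum the counts for each group of eight slices.

    Assumption: slices 0-7 belong to one controller and 8-15 to the other.
    """
    if len(per_slice_counts) != SLICES:
        raise ValueError("expected one count per L2 slice")
    return [
        sum(per_slice_counts[start:start + SLICES_PER_CONTROLLER])
        for start in range(0, SLICES, SLICES_PER_CONTROLLER)
    ]


# Made-up per-slice miss counts, for illustration only.
misses = [10, 12, 9, 11, 10, 13, 8, 10, 20, 22, 19, 21, 20, 23, 18, 20]

print(combine(misses))         # single combined value
print(per_controller(misses))  # one value per memory controller
```

Keeping the counts separate is what makes an imbalance between the two halves visible; the combined value hides it.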

I/O Unit Component
The Message, PCIe, and DevBus modules, collectively referred to as the I/O modules, together provide 43 counters (node-wide).

I/O Unit Native Events
Currently, there are 44 I/O Unit events available.
The three I/O sub-modules can be identified directly from the I/O Unit event names.

Network Unit Component
The 5D-Torus network provides a local UPC network module with 66 counters: each of the 11 links has six 64-bit counters.
At present, a PAPI user cannot select which network links to attach to. All network links are attached, and this is hard-coded in the PAPI NWUnit component.
The BGPM NWUnit interfaces with the network modules.
The PAPI Network Unit Component interfaces with BGPM.
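The counter arithmetic for the network module, and the effect of having every link attached, can be sketched as below. The sketch assumes that a reading is reported as the sum over all attached links; that aggregation rule and the sample values are assumptions for illustration, not something stated on the slide.

```python
LINKS = 11             # links in the 5D torus (from the slide)
COUNTERS_PER_LINK = 6  # 64-bit counters per link (from the slide)

total_network_counters = LINKS * COUNTERS_PER_LINK


def sum_over_links(per_link):
    """Fold per-link readings into one value per counter slot.

    per_link is a list of LINKS lists, each holding COUNTERS_PER_LINK values.
    Assumption: with all links attached, a reading is the sum over the links.
    """
    if len(per_link) != LINKS or any(len(c) != COUNTERS_PER_LINK for c in per_link):
        raise ValueError("expected 11 links with 6 counters each")
    return [sum(link[slot] for link in per_link) for slot in range(COUNTERS_PER_LINK)]


# Made-up readings: link i, slot j holds i + j.
readings = [[link + slot for slot in range(COUNTERS_PER_LINK)] for link in range(LINKS)]

print(total_network_counters)
print(sum_over_links(readings))
```

Under this assumption, traffic on an individual link cannot be recovered from the reported value, which is the practical consequence of the hard-coded attachment.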

Network Unit Native Events
Currently, there are 31 Network Unit events available.

CNK Unit Component
CNK is the lightweight Compute Node Kernel that runs on all 16 compute cores.
BGPM offers a “virtual” CNK Unit that has software counters collected by the kernel (kernel counter values are read via a system call).
Currently, there are 29 CNK Unit events available.

Overflow and Multiplexing
Overflow: Only the local UPC module, L2, and I/O UPC hardware support performance monitor interrupts when a programmed counter overflows. For that reason, only the PUnit, L2Unit, and I/OUnit provide overflow support in BGPM and PAPI.
Multiplexing: PAPI supports multiplexing on the BG/Q platform. The BGPM PUnit does not directly implement multiplexing of event sets, but it supports it indirectly by providing a multiplexed event set type.
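The two points on this slide can be sketched in a few lines. The lookup table restates which components provide overflow support. The scaling function shows the generic time-slice extrapolation commonly used when counters are multiplexed; it is a textbook approximation chosen for illustration, not a description of how BGPM or PAPI compute multiplexed values, and the sample numbers are made up.

```python
# Overflow support per component, as stated on the slide.
OVERFLOW_SUPPORT = {
    "PUnit": True,
    "L2Unit": True,
    "I/OUnit": True,
    "NWUnit": False,
    "CNKUnit": False,
}


def supports_overflow(component):
    """Return True if the named component provides overflow support."""
    return OVERFLOW_SUPPORT[component]


def estimate_full_run(raw_count, active_time, total_time):
    """Extrapolate a multiplexed count to the whole measurement interval.

    Generic time-slice scaling (an assumption, not the BGPM algorithm):
    an event counted for only part of the run is scaled by total / active.
    """
    if active_time <= 0 or active_time > total_time:
        raise ValueError("active_time must be in (0, total_time]")
    return raw_count * total_time / active_time


# Made-up numbers: the event was live for 2 of 6 seconds and counted 1500.
estimate = estimate_full_run(1500, active_time=2.0, total_time=6.0)
print(supports_overflow("PUnit"), supports_overflow("NWUnit"), estimate)
```

The estimate is only as good as the assumption that the event rate is uniform over the run, which is why multiplexed counts are best treated as approximations.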

PAPI on VEAS
Installed in /soft/libraries/papi bgq-beta-4/
Utilities in the bin directory: papi_avail, papi_native_avail, etc.
Run on a compute node using qsub, e.g.:
qsub -n 1 --mode c1 -t 10 papi_avail
Man pages in the share directory:
MANPATH="/soft/libraries/papi bgq-beta-4/share/man:$MANPATH"
Examples in /home/shirley/papi_beta4:
cd /home/shirley
cp -R papi_beta4 ~
cd ~/papi_beta4/src/ctests
Run on a compute node using qsub, e.g.:
qsub -n 1 --mode c1 -t 10 ./matrix-hl