Common software needs and opportunities for HPCs
Tom LeCompte, High Energy Physics Division, Argonne National Laboratory
(A man wearing a bow tie giving a slightly too-long "elevator speech")

2. The Landscape
• HEP code evolved in a world where CPU was expensive and memory was cheap.
• This is changing.
• This is changing whether you are talking about:
  – Supercomputers
  – Farms of commodity PCs
  – GPUs
  – Xeon Phis
  – Etc.
• This, coupled with the fact that we don't have the resources to rewrite our code from scratch, forms the central problem for HEP computing. This talk discusses one (of several) facets of addressing it.

3. A Success Story
• This is a picture of Alpgen running on Mira.
  – Mira is the #5 supercomputer: a BlueGene/Q with 786,432 cores.
• We are using the entire machine here – running 1,572,864 simultaneous instances of Alpgen.
  – I don't know exactly which job card we used here:
    · These could be events that the Grid couldn't make (highly filtered Z+5 jets).
    · These could be ordinary events where we are supplementing Grid production.
  – Credit: Taylor Childers (ANL/HEP), Tom Uram (ANL/ALCF), Doug Benjamin (Duke)
• While we were running, this was maybe 10x-15x the ATLAS grid.
[Figure labels: "Mira at another time" / "Mira when we were using it"]

4. How Did This Happen?
• Not by accident, and not all at once.
  – I/O had to be substantially reworked.
  – Memory usage had to be reworked – the optimization is complex:
    · One can use memory to run more processes per node.
    · Or one can use memory to buffer data.
    · But you can't use the same byte of memory twice.
  – A long and ultimately unnecessary detour into the world of random number seeds (see the sketch below):
    · We use up seeds 1.5 million times faster than a single-threaded Alpgen would.
  – We are running ~5x faster than when we started:
    · Not algorithmic improvements – just using the processors more efficiently.
    · We have reason to suspect at least another 2x is possible.
  – All without breaking (or irreversibly forking) the code base.
• I could tell you similar stories about Sherpa and Geant:
  – But not in six minutes.
  – And those stories are less far along.
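The random-seed point above is where a small sketch may help. This is a minimal, hypothetical illustration (the function name make_instance_rng and the base-seed value are invented; it is not the actual Alpgen seed handling): when ~1.5 million instances run at once, each one needs its own independent seed stream, for example derived from a single base seed plus the instance's rank.

```cpp
// Hypothetical sketch of per-instance seeding for a massively parallel run.
// This is NOT the Alpgen implementation; it only illustrates why a run with
// ~1.5 million simultaneous instances burns through seeds so much faster
// than a single-threaded job, and how each instance can get its own stream.
#include <cstdint>
#include <iostream>
#include <random>

// Combine a common base seed with the instance's rank so that every
// instance gets a distinct, well-mixed seed sequence.
std::mt19937_64 make_instance_rng(std::uint64_t base_seed, std::uint64_t rank) {
    std::seed_seq seq{base_seed, rank};
    return std::mt19937_64(seq);
}

int main() {
    const std::uint64_t base_seed = 20140515;   // one value for the whole run
    const std::uint64_t n_instances = 1572864;  // as on Mira, for illustration

    // In the real job each instance would call this once with its own rank;
    // here we just show three of them drawing different numbers.
    const std::uint64_t sample_ranks[] = {0, 1, n_instances - 1};
    for (std::uint64_t rank : sample_ranks) {
        auto rng = make_instance_rng(base_seed, rank);
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        std::cout << "rank " << rank << ": " << uniform(rng) << "\n";
    }
    return 0;
}
```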

5. Evolving Our Code
• Memory/CPU ratio
  – A BlueGene/Q node has 16 GB of memory for 16 CPUs and 64 hardware threads.
  – Not too different from a Xeon Phi with 16 GB for 61 single-threaded cores.
  – Our biggest payoff on memory optimization is from releasing memory when it is no longer needed (see the sketch below):
    · Example: once a PDF set is selected, the others no longer need to be kept in memory.
    · This is architecture independent.
    · This requires understanding of the code – i.e. code authors are in a better position to do this than "HPC optimizers".
• Code quality
  – We found a bug in Alpgen: an array out of bounds – it crashes on BG/Q, but does not crash on Intel.
  – Those of you who have been through this before (32→64 bit, or VAX→Unix) know that changing platforms exposes problems – problems that can now be fixed.
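As an illustration of the "release memory when it is no longer needed" point above, here is a minimal hypothetical sketch (PdfSet, load_pdf_grid and the set names are invented stand-ins, not LHAPDF or Alpgen code): once the run commits to a single PDF set, dropping the others returns their memory, which can then go toward running more processes per node or buffering data.

```cpp
// Hypothetical sketch of "release memory once a PDF set is selected".
// PdfSet and load_pdf_grid are invented stand-ins, not a real PDF library API.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct PdfSet {
    std::string name;
    std::vector<double> grid;  // stands in for a large interpolation table
};

PdfSet load_pdf_grid(const std::string& name) {
    // In reality this would read a multi-megabyte grid file from disk.
    return PdfSet{name, std::vector<double>(10000000, 0.0)};
}

int main() {
    // Naive approach: keep every candidate PDF set resident for the whole run.
    std::map<std::string, PdfSet> sets;
    for (const std::string& name : {"CTEQ6L1", "MSTW2008", "NNPDF23"}) {
        sets.emplace(name, load_pdf_grid(name));
    }

    // Once the run has committed to one set, the others are dead weight.
    const std::string chosen = "CTEQ6L1";
    PdfSet selected = std::move(sets.at(chosen));
    sets.clear();  // frees the unused grids; more instances now fit per node

    std::cout << "Keeping only " << selected.name
              << " (" << selected.grid.size() << " grid points)\n";
    return 0;
}
```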

6. "Without Breaking The Code"
• Rationalizing the I/O (see the sketch below)
  – Example: using a RAMdisk for output until the very end, when the output is then consolidated.
  – Cordon off the new I/O behind preprocessor directives.
  – Obviously, the actual code here is application specific – but the ideas behind it are more general.
• Sensible use of PDFs
  – We could use preprocessor directives here as well, but…
  – Nobody needs to switch PDFs mid-run.
  – Shouldn't this be rolled back into the main code base?
In the process of getting code to run efficiently, we are seeing patterns emerge. Many of these patterns will help with other memory-constrained architectures. Some may even help with today's single-threaded machines. How do we best share this knowledge with each other?
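A minimal hypothetical sketch of the "cordon off the new I/O behind preprocessor directives" idea (the USE_RAMDISK_OUTPUT macro, the paths and the consolidate_output helper are invented, not the actual Alpgen changes): the HPC build stages output to a node-local RAMdisk and consolidates it once at the end, while the default build keeps its original behaviour.

```cpp
// Hypothetical sketch of guarding HPC-specific I/O with preprocessor directives.
// USE_RAMDISK_OUTPUT, the paths and consolidate_output() are invented names;
// the real changes are application specific, as the slide notes.
#include <cstdio>
#include <string>

#ifdef USE_RAMDISK_OUTPUT
// On the HPC build, write events to a node-local RAMdisk...
static const std::string kOutputDir = "/dev/shm";
#else
// ...while the stock build keeps writing straight to the working directory.
static const std::string kOutputDir = ".";
#endif

void write_event(std::FILE* out, int event_id) {
    // Stand-in for the real per-event output.
    std::fprintf(out, "event %d\n", event_id);
}

#ifdef USE_RAMDISK_OUTPUT
void consolidate_output(const std::string& ramdisk_file) {
    // At the very end of the job, copy/merge the RAMdisk file to permanent
    // storage in one large write instead of millions of small ones.
    std::printf("consolidating %s to the parallel filesystem\n",
                ramdisk_file.c_str());
}
#endif

int main() {
    const std::string path = kOutputDir + "/events.dat";
    if (std::FILE* out = std::fopen(path.c_str(), "w")) {
        for (int i = 0; i < 1000; ++i) write_event(out, i);
        std::fclose(out);
#ifdef USE_RAMDISK_OUTPUT
        consolidate_output(path);
#endif
    }
    return 0;
}
```

Either way the events produced are the same; only where the intermediate output lives changes, which is what keeps the modification from breaking or forking the code base.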

7. Summary
• Widely used HEP code runs on supercomputers.
  – And not just x86-based supercomputers with a lot of memory.
• These provide us with a platform that has many of the challenges we will eventually face – but with an immediate payoff (delivering computing to our communities) in addition to the long-term one (evolving our code).
• Making progress requires collaboration between machine experts and software experts (us). Fortunately, they are really nice guys, so this is not so scary.
• Certain common patterns are emerging – we shouldn't all rediscover them ourselves.