LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.


1 HPC Software Development at LLNL
Presented to College of St. Rose, Albany
Feb. 11, 2013
Lawrence Livermore National Security, LLC
LLNL-PRES-XXXXXX
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

2 LLNL has some of the world’s largest supercomputers
Lawrence Livermore National Laboratory LLNL-PRES-xxxxxx
- Sequoia
  - #1 in the world, June 2012
  - IBM Blue Gene/Q
  - 96 racks, 98,304 nodes
  - 1.5 million cores
  - 5-D torus network
  - Transactional memory
  - Runs a lightweight, Linux-like OS
  - Login nodes are Power7, but compute nodes use PowerPC A2 cores, so code must be cross-compiled
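In a 5-D torus, each node has two neighbors per dimension, and the links wrap around at the edges. The following is a minimal sketch in C of how neighbor coordinates could be computed. The extents 16 x 16 x 16 x 12 x 2 are an assumption for illustration (the slide gives no shape; these were picked because their product equals the slide's 98,304 nodes), and the names torus_dims, torus_node_count, and torus_neighbor are hypothetical.

```c
#include <assert.h>

#define NDIMS 5

/* Assumed torus extents (illustrative only, not taken from the slides).
 * Their product, 16*16*16*12*2 = 98,304, matches the slide's node count. */
static const int torus_dims[NDIMS] = {16, 16, 16, 12, 2};

/* Total number of nodes: the product of the extents. */
long torus_node_count(void)
{
    long n = 1;
    for (int d = 0; d < NDIMS; d++)
        n *= torus_dims[d];
    return n;
}

/* Copy coord into out, then step by delta (+1 or -1) along dimension d,
 * wrapping around at the edge of the torus. */
void torus_neighbor(const int coord[NDIMS], int d, int delta, int out[NDIMS])
{
    assert(d >= 0 && d < NDIMS);
    assert(delta == 1 || delta == -1);
    for (int i = 0; i < NDIMS; i++)
        out[i] = coord[i];
    out[d] = (coord[d] + delta + torus_dims[d]) % torus_dims[d];
}
```

Stepping -1 from coordinate 0 in a dimension of extent 16 wraps to 15. In the dimension of extent 2, the +1 and -1 neighbors are the same node.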

3 LLNL has some of the world’s largest supercomputers
- Zin
  - Intel Sandy Bridge
  - 2,916 16-core nodes
  - 46,656 processors (2,916 × 16)
  - InfiniBand fat-tree interconnect
  - Commodity parts
  - Runs TOSS, LLNL’s Red Hat Linux distro

4 LLNL has some of the world’s largest supercomputers
- Others
  - Almost 30 clusters total
  - See http://computing.llnl.gov

5 Supercomputers run very large-scale simulations
- Multi-physics simulations
- Material Strength
- Laser-Plasma Interaction
- Quantum Chromodynamics
- Fluid Dynamics
- Lots of complicated numerical methods for solving equations:
  - Adaptive Mesh Refinement (AMR)
  - Adaptive Multigrid
  - Unstructured Mesh
  - Structured Mesh
[Figures: NIF Target; Supernova; AMR Fluid Interface]

6 Structure of the Lab
1. Code teams
  - Work on physics applications
  - Larger code teams are 20+ people
    - Software developers
    - Applied mathematicians
    - Physicists
  - Work to meet milestones for lab missions
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

7 Structure of the Lab
2. Livermore Computing (LC)
  - Run supercomputing center
  - Development Environment Group
    - Works with application teams to improve code performance
    - Knows about compilers, debuggers, performance tools
    - Develops performance tools
  - Software Development Group
    - Develops
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

8 Structure of the Lab
3. Center for Applied Scientific Computing (CASC)
  - Most CS researchers are in CASC
  - Large groups doing:
    - Performance analysis tools
    - Power optimization
    - Resilience
    - Source-to-source compilers
    - FPGAs and new architectures
    - Applied math and numerical analysis
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

9 Performance Tools Research
- Write software to measure the performance of other software
  - Profiling
  - Tracing
  - Debugging
  - Visualization
- Tools themselves need to perform well:
  - Parallel algorithms
  - Scalability and low overhead are important
[Diagram: Code Teams; Tools Research (CASC); Development Environment Group (DEG)]
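Profiling and tracing differ mainly in what the tool keeps, which is one reason overhead matters: a profile aggregates measurements per routine in constant memory, while a trace stores one record per event and grows with the run. The following is a minimal sketch in C of both. All type and function names here (profile_entry, trace_buffer, profile_add, trace_add) and the fixed capacity are hypothetical and not taken from any LLNL tool.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAPACITY 1024

/* Profile: one aggregate record per routine, constant memory. */
struct profile_entry {
    long   calls;
    double total_ns;
};

/* Trace: one record per event, so memory grows with the event count. */
struct trace_event {
    int    routine_id;
    double start_ns;
    double duration_ns;
};

struct trace_buffer {
    struct trace_event events[TRACE_CAPACITY];
    size_t count;    /* events stored so far */
    size_t dropped;  /* events discarded because the buffer was full */
};

/* Fold one measurement into the running totals. */
void profile_add(struct profile_entry *p, double duration_ns)
{
    assert(p != NULL);
    p->calls += 1;
    p->total_ns += duration_ns;
}

/* Append one event. Returns 1 if stored, 0 if the buffer was full. */
int trace_add(struct trace_buffer *t, int routine_id,
              double start_ns, double duration_ns)
{
    assert(t != NULL);
    if (t->count == TRACE_CAPACITY) {
        t->dropped += 1;
        return 0;
    }
    t->events[t->count].routine_id = routine_id;
    t->events[t->count].start_ns = start_ns;
    t->events[t->count].duration_ns = duration_ns;
    t->count += 1;
    return 1;
}
```

In this sketch a full trace buffer simply drops events and counts them. A real tool would more likely flush the buffer to disk or to an aggregation process, which is where scalability concerns come in.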

10 Development Environment
- Application codes are written in many languages
  - Fortran, C, C++, Python
  - Some applications have been around for 50+ years
- Tools are typically written in C/C++
  - Tools typically run as part of an application
  - Need to be able to link with application environment
- Non-parallel parts of tools are often in Python
  - GUI front-end
  - Scripts
  - Some data analysis

11 We’ve started using Atlassian tools for collaboration
- http://www.atlassian.com
  - Confluence Wiki
  - JIRA Bug Tracker
  - Stash git repo hosting
- Several advantages for our distributed environment:
  - Scale to lots of users
  - Fine-grained permissions allow us to stay within our security model

12 Simple example: Measuring MPI
- Parallel applications use the MPI library for communication
- We want to measure time spent in MPI calls
  - Also interested in other metrics
  - Semantics, parameters, etc.
- We write a lot of interposer libraries
[Diagram: a parallel application is made up of many single processes. In each process, the application calls MPI_Send(), and the MPI library implements both MPI_Send() and PMPI_Send().]

13 Simple example: Measuring MPI
- Parallel applications use the MPI library for communication
- We want to measure time spent in MPI calls
  - Also interested in other metrics
  - Semantics, parameters, etc.
- We write a lot of interposer libraries
[Diagram: the application calls MPI_Send(). The tool interposer library implements MPI_Send() and calls PMPI_Send(). The MPI library implements both MPI_Send() and PMPI_Send().]

14 Example Interposer Code
- This function intercepts calls from the application
- It does its own measurement
- Then it calls the MPI library
- Allows us to measure time spent in particular routines

    #include <mpi.h>

    /* get_time_ns() and record_time() are the tool's own helpers (not shown). */
    int MPI_Bcast(void *buffer, int count, MPI_Datatype dtype,
                  int root, MPI_Comm comm) {
        double start = get_time_ns();
        /* Forward to the real implementation through the PMPI entry point. */
        int rc = PMPI_Bcast(buffer, count, dtype, root, comm);
        double duration = get_time_ns() - start;
        record_time(MPI_Bcast, duration);
        return rc;  /* hand the library's return code back to the application */
    }
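The wrap, time, and forward pattern of the interposer can be tried without an MPI installation. The following is a minimal sketch in C under stated assumptions: real_broadcast is a hypothetical stand-in for the role PMPI_Bcast plays, broadcast stands in for the tool's MPI_Bcast, two counters stand in for record_time, and get_time_ns is one possible implementation built on C11 timespec_get (the slides do not show the tool's actual timer).

```c
#include <assert.h>
#include <time.h>

/* Nanoseconds since the first call, using C11 timespec_get.
 * This is wall-clock time; a real tool would likely prefer a monotonic
 * clock. Not thread-safe as written. */
double get_time_ns(void)
{
    static struct timespec t0;
    static int initialized = 0;
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);
    if (!initialized) {
        t0 = ts;
        initialized = 1;
    }
    return (double)(ts.tv_sec - t0.tv_sec) * 1e9
         + (double)(ts.tv_nsec - t0.tv_nsec);
}

/* Hypothetical stand-in for the library's real entry point.
 * Returns 0 on success. */
int real_broadcast(int *buffer, int count)
{
    for (int i = 0; i < count; i++)
        buffer[i] += 1;
    return 0;
}

/* Stand-ins for whatever record_time() would store. */
static long   wrapped_calls;
static double wrapped_total_ns;

/* Wrapper with the same signature as the real routine: time the call,
 * record the measurement, and forward the return code unchanged. */
int broadcast(int *buffer, int count)
{
    double start = get_time_ns();
    int rc = real_broadcast(buffer, count);
    double duration = get_time_ns() - start;
    wrapped_calls += 1;
    wrapped_total_ns += duration;
    return rc;
}
```

The caller sees the same signature and the same return code as the underlying routine, so the application does not need to change when the wrapper is linked in.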

15 Another type of problem: communication optimization
- See other slide set.
