LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Published by Kevin McGee. Modified over 9 years ago.

1 HPC Software Development at LLNL
Presented to College of St. Rose, Albany, Feb. 11, 2013
Lawrence Livermore National Security, LLC
LLNL-PRES-XXXXXX. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

2 LLNL has some of the world’s largest supercomputers
- Sequoia
  - #1 in the world, June 2012
  - IBM Blue Gene/Q
  - 96 racks, 98,304 nodes
  - 1.5 million cores
  - 5-D torus network
  - Transactional memory
  - Runs a lightweight, Linux-like OS
  - Login nodes are Power7, but compute nodes use PowerPC A2 cores, so code must be cross-compiled
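
The node and core counts above, and the wraparound links of a torus network, can be illustrated with a short C sketch. It assumes 16 application cores per Blue Gene/Q node and uses torus dimension sizes (16 x 16 x 16 x 12 x 2) picked so that their product matches the 98,304 nodes on the slide. The dimension sizes, constants, and function names are illustrative assumptions, not taken from the slides.

```c
#define TORUS_NDIMS 5
#define CORES_PER_NODE 16   /* assumption: application cores per node */

/* Assumed dimension sizes; their product is 98,304. */
const int SEQUOIA_LIKE_DIMS[TORUS_NDIMS] = {16, 16, 16, 12, 2};

/* Number of nodes in a torus: the product of its dimension sizes. */
long torus_node_count(const int dims[TORUS_NDIMS])
{
    long nodes = 1;
    for (int d = 0; d < TORUS_NDIMS; d++)
        nodes *= dims[d];
    return nodes;
}

/* Total cores for a given node count. */
long total_cores(long nodes)
{
    return nodes * CORES_PER_NODE;
}

/*
 * Compute the coordinates of the neighbor one hop away along dimension
 * `dim`. `step` is +1 or -1; coordinates wrap around at the edges, which
 * is what makes the network a torus rather than a mesh.
 */
void torus_neighbor(const int dims[TORUS_NDIMS], const int coord[TORUS_NDIMS],
                    int dim, int step, int out[TORUS_NDIMS])
{
    for (int d = 0; d < TORUS_NDIMS; d++)
        out[d] = coord[d];
    out[dim] = (coord[dim] + step + dims[dim]) % dims[dim];
}
```

Under these assumptions, torus_node_count(SEQUOIA_LIKE_DIMS) gives 98,304 nodes, total_cores(98304) gives 1,572,864 cores (the "1.5 million" on the slide), and 98,304 nodes over 96 racks works out to 1,024 nodes per rack.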

3 LLNL has some of the world’s largest supercomputers
- Zin
  - Intel Sandy Bridge
  - 2,916 16-core nodes
  - 45,656 processors
  - InfiniBand fat-tree interconnect
  - Commodity parts
  - Runs TOSS, LLNL’s Red Hat Linux distro

4 LLNL has some of the world’s largest supercomputers
- Others
  - Almost 30 clusters total
  - See http://computing.llnl.gov

5 Supercomputers run very large-scale simulations
- Multi-physics simulations
- Material strength
- Laser-plasma interaction
- Quantum chromodynamics
- Fluid dynamics
- Lots of complicated numerical methods for solving equations:
  - Adaptive Mesh Refinement (AMR)
  - Adaptive multigrid
  - Unstructured mesh
  - Structured mesh
[Figures: NIF Target; Supernova; AMR Fluid Interface]

6 Structure of the Lab
1. Code teams
- Work on physics applications
- Larger code teams are 20+ people
  - Software developers
  - Applied mathematicians
  - Physicists
- Work to meet milestones for lab missions
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

7 Structure of the Lab
2. Livermore Computing (LC)
- Runs the supercomputing center
- Development Environment Group
  - Works with application teams to improve code performance
  - Knows about compilers, debuggers, performance tools
  - Develops performance tools
- Software Development Group
  - Develops
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

8 Structure of the Lab
3. Center for Applied Scientific Computing (CASC)
- Most CS researchers are in CASC
- Large groups doing:
  - Performance analysis tools
  - Power optimization
  - Resilience
  - Source-to-source compilers
  - FPGAs and new architectures
  - Applied math and numerical analysis
[Diagram: Code Teams; Researchers (CASC); Production Computing (LC)]

9 Performance Tools Research
- Write software to measure the performance of other software
  - Profiling
  - Tracing
  - Debugging
  - Visualization
- Tools themselves need to perform well:
  - Parallel algorithms
  - Scalability and low overhead are important
[Diagram: Code Teams; Tools Research (CASC); Development Environment Group (DEG)]
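
One common way to keep tracing overhead low is to record events into a fixed-size ring buffer, so that each record is a constant-time write with no memory allocation. The sketch below is a minimal illustration of that idea, not code from any LLNL tool; every name in it (trace_event, trace_buffer, trace_record, TRACE_CAPACITY) is hypothetical. It also shows how a profile-style total can be derived from the retained trace events.

```c
#include <stddef.h>

#define TRACE_CAPACITY 4   /* tiny on purpose; a real tool would use far more */

/* One traced event: which region ran, and when. */
struct trace_event {
    int    region_id;
    double start_ns;
    double end_ns;
};

/* Fixed-size ring buffer: recording is O(1) and never allocates. */
struct trace_buffer {
    struct trace_event events[TRACE_CAPACITY];
    size_t next;       /* slot the next event is written to */
    size_t recorded;   /* total number of events ever recorded */
};

void trace_init(struct trace_buffer *tb)
{
    tb->next = 0;
    tb->recorded = 0;
}

/* Append an event, overwriting the oldest one once the buffer is full. */
void trace_record(struct trace_buffer *tb, int region_id,
                  double start_ns, double end_ns)
{
    struct trace_event *e = &tb->events[tb->next];
    e->region_id = region_id;
    e->start_ns = start_ns;
    e->end_ns = end_ns;
    tb->next = (tb->next + 1) % TRACE_CAPACITY;
    tb->recorded++;
}

/* Number of events currently held (at most TRACE_CAPACITY). */
size_t trace_size(const struct trace_buffer *tb)
{
    return tb->recorded < TRACE_CAPACITY ? tb->recorded : TRACE_CAPACITY;
}

/* Profile-style summary: total time in one region over the held events. */
double trace_total_ns(const struct trace_buffer *tb, int region_id)
{
    double total = 0.0;
    size_t n = trace_size(tb);
    for (size_t i = 0; i < n; i++)
        if (tb->events[i].region_id == region_id)
            total += tb->events[i].end_ns - tb->events[i].start_ns;
    return total;
}
```

The trade-off in this design is that old events are lost once the buffer wraps. A profile keeps only running totals and so never grows, while a trace keeps per-event detail at the cost of memory or, as here, of history.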

10 Development Environment
- Application codes are written in many languages
  - Fortran, C, C++, Python
  - Some applications have been around for 50+ years
- Tools are typically written in C/C++
  - Tools typically run as part of an application
  - Need to be able to link with the application environment
- Non-parallel parts of tools are often in Python
  - GUI front-end
  - Scripts
  - Some data analysis

11 We’ve started using Atlassian tools for collaboration
- http://www.atlassian.com
  - Confluence wiki
  - JIRA bug tracker
  - Stash git repo hosting
- Several advantages for our distributed environment:
  - Scale to lots of users
  - Fine-grained permissions allow us to stay within our security model

12 Simple example: Measuring MPI
- Parallel applications use the MPI library for communication
- We want to measure time spent in MPI calls
  - Also interested in other metrics: semantics, parameters, etc.
- We write a lot of interposer libraries
[Diagram: a parallel application made up of many single processes. In each process, the application calls MPI_Send(), and the MPI library implements both MPI_Send() and PMPI_Send().]

13 Simple example: Measuring MPI
- Parallel applications use the MPI library for communication
- We want to measure time spent in MPI calls
  - Also interested in other metrics: semantics, parameters, etc.
- We write a lot of interposer libraries
[Diagram: the application calls MPI_Send(). The tool interposer library implements MPI_Send() and calls PMPI_Send(). The MPI library implements both MPI_Send() and PMPI_Send().]

14 Example Interposer Code
- This function intercepts calls from the application
- It does its own measurement
- Then it calls the MPI library
- Allows us to measure time spent in particular routines

    #include <mpi.h>

    int MPI_Bcast(void *buffer, int count, MPI_Datatype dtype,
                  int root, MPI_Comm comm) {
        double start = get_time_ns();
        /* Forward to the real implementation through its PMPI entry point. */
        int rc = PMPI_Bcast(buffer, count, dtype, root, comm);
        double duration = get_time_ns() - start;
        record_time(MPI_Bcast, duration);
        return rc;  /* hand the MPI return code back to the application */
    }
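
The interposer code on this slide calls two helpers, get_time_ns() and record_time(), that are not shown. The following is a minimal sketch of what such helpers might look like, assuming a POSIX system with clock_gettime(). It is not the authors' implementation: the table layout, MAX_TIMERS, total_time_ns(), and call_count() are hypothetical, and as a simplification this version keys each timer on a name string rather than on the function itself as the slide does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <string.h>
#include <time.h>

#define MAX_TIMERS 64   /* hypothetical fixed table size */

/* Accumulated statistics for one named routine. */
struct timer_entry {
    const char *name;   /* assumed to be a string literal (static lifetime) */
    double total_ns;
    long calls;
};

static struct timer_entry timers[MAX_TIMERS];
static int num_timers = 0;

/* Wall-clock time in nanoseconds, read from a monotonic clock. */
double get_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Linear search is fine for a handful of routines; returns NULL if absent. */
static struct timer_entry *lookup(const char *name)
{
    for (int i = 0; i < num_timers; i++)
        if (strcmp(timers[i].name, name) == 0)
            return &timers[i];
    return NULL;
}

/* Add one measurement to the running total for `name`.
 * Not thread-safe; measurements are dropped if the table is full. */
void record_time(const char *name, double duration_ns)
{
    struct timer_entry *e = lookup(name);
    if (e == NULL) {
        if (num_timers == MAX_TIMERS)
            return;
        e = &timers[num_timers++];
        e->name = name;
        e->total_ns = 0.0;
        e->calls = 0;
    }
    e->total_ns += duration_ns;
    e->calls++;
}

/* Total time recorded for `name`, or 0 if it was never recorded. */
double total_time_ns(const char *name)
{
    struct timer_entry *e = lookup(name);
    return e ? e->total_ns : 0.0;
}

/* Number of measurements recorded for `name`. */
long call_count(const char *name)
{
    struct timer_entry *e = lookup(name);
    return e ? e->calls : 0;
}
```

With helpers like these, a tool could report the totals at the end of the run, for example by printing total_time_ns("MPI_Bcast") and call_count("MPI_Bcast") from a wrapper around MPI_Finalize().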

15 Another type of problem: communication optimization
- See other slide set.
