Presentation transcript:

HPC Software Development at LLNL
Presented to the College of St. Rose, Albany, Feb. 11, 2013
LLNL-PRES-XXXXXX. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.

LLNL has some of the world's largest supercomputers
• Sequoia (#1 in the world, June 2012)
— IBM Blue Gene/Q
— 96 racks, 98,304 nodes
— 1.5 million cores
— 5-D torus network
— Transactional memory
— Runs a lightweight, Linux-like OS
— Login nodes are Power7, but compute nodes are PowerPC A2 cores; requires cross-compiling

LLNL has some of the world's largest supercomputers
• Zin
— Multi-core Intel Sandy Bridge nodes
— 45,656 processors
— InfiniBand fat-tree interconnect
— Commodity parts
— Runs TOSS, LLNL's Red Hat-based Linux distribution

LLNL has some of the world's largest supercomputers
• Others
— Almost 30 clusters in total
— See the Livermore Computing website for the full list

Supercomputers run very large-scale simulations
• Multi-physics simulations
— Material strength
— Laser-plasma interaction
— Quantum chromodynamics
— Fluid dynamics
• Lots of complicated numerical methods for solving equations:
— Adaptive Mesh Refinement (AMR)
— Adaptive multigrid
— Unstructured mesh
— Structured mesh
(Images: NIF target, supernova, AMR fluid interface)

Structure of the Lab
1. Code teams
• Work on physics applications
• Larger code teams are 20+ people:
— Software developers
— Applied mathematicians
— Physicists
• Work to meet milestones for lab missions
(Diagram: Code Teams / Researchers (CASC) / Production Computing (LC))

Structure of the Lab
2. Livermore Computing (LC)
• Runs the supercomputing center
• Development Environment Group
— Works with application teams to improve code performance
— Knows about compilers, debuggers, performance tools
— Develops performance tools
• Software Development Group
— Develops

Structure of the Lab
3. Center for Applied Scientific Computing (CASC)
• Most CS researchers are in CASC
• Large groups doing:
— Performance analysis tools
— Power optimization
— Resilience
— Source-to-source compilers
— FPGAs and new architectures
— Applied math and numerical analysis

Performance Tools Research
• Write software to measure the performance of other software:
— Profiling
— Tracing
— Debugging
— Visualization
• Tools themselves need to perform well:
— Parallel algorithms
— Scalability and low overhead are important
(Diagram: Code Teams / Tools Research (CASC) / Development Environment Group (DEG))

Development Environment
• Application codes are written in many languages
— Fortran, C, C++, Python
— Some applications have been around for 50+ years
• Tools are typically written in C/C++
— Tools typically run as part of an application
— Need to be able to link with the application environment
• Non-parallel parts of tools are often in Python
— GUI front-end scripts
— Some data analysis
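One common way for a C/C++ tool to run as part of the application (not stated on the slide; a minimal sketch assuming a GCC/Clang toolchain on Linux) is to package it as a library whose setup code runs automatically inside the application's process once the library is linked in or preloaded:

    /* toolinit.c -- hypothetical sketch: a tool library whose setup and
     * teardown run inside the application process.  The constructor and
     * destructor attributes are a GCC/Clang feature. */
    #include <stdio.h>
    #include <time.h>

    static struct timespec tool_start;

    /* Runs inside the application process before main(). */
    __attribute__((constructor))
    static void tool_init(void)
    {
        clock_gettime(CLOCK_MONOTONIC, &tool_start);
        fprintf(stderr, "[tool] attached to application\n");
    }

    /* Runs when the application exits. */
    __attribute__((destructor))
    static void tool_finish(void)
    {
        struct timespec end;
        clock_gettime(CLOCK_MONOTONIC, &end);
        double secs = (end.tv_sec - tool_start.tv_sec)
                    + (end.tv_nsec - tool_start.tv_nsec) / 1e9;
        fprintf(stderr, "[tool] application ran for %.3f s\n", secs);
    }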

We've started using Atlassian tools for collaboration
• Confluence (wiki)
• JIRA (bug tracker)
• Stash (git repo hosting)
• Several advantages for our distributed environment:
— Scale to lots of users
— Fine-grained permissions allow us to stay within our security model

Measuring MPI
• Parallel applications use the MPI library for communication
• We want to measure time spent in MPI calls
— Also interested in other metrics: semantics, parameters, etc.
• We write a lot of interposer libraries
• Simple example:
(Diagram: a parallel application is made up of many single processes; in each process the application calls MPI_Send(), which is implemented by the MPI library; the library also implements PMPI_Send())

Measuring MPI
• Parallel applications use the MPI library for communication
• We want to measure time spent in MPI calls
— Also interested in other metrics: semantics, parameters, etc.
• We write a lot of interposer libraries
• Simple example:
(Diagram: the application calls MPI_Send(); a tool interposer library implements MPI_Send(), does its measurement, and calls PMPI_Send() in the MPI library, which implements both MPI_Send() and PMPI_Send())
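The mechanism behind this diagram is MPI's profiling interface: every MPI routine is also available under a name-shifted PMPI_ name, so a tool library can define its own MPI_Send() and forward to PMPI_Send(). Below is a minimal interposer sketch that also records call counts and bytes sent (illustrative code, not from the slides; the MPI_Send prototype is shown in pre-MPI-3 form to match the slide code and should match the mpi.h in use):

    /* Sketch: count calls and bytes for MPI_Send, then forward via PMPI. */
    #include <mpi.h>
    #include <stdio.h>

    static long long send_calls = 0;
    static long long send_bytes = 0;

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int type_size = 0;
        PMPI_Type_size(datatype, &type_size);       /* inspect call parameters */
        send_calls += 1;
        send_bytes += (long long)count * type_size;

        /* Forward to the MPI library through its name-shifted entry point. */
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

    /* Report per-process totals when the application shuts MPI down. */
    int MPI_Finalize(void)
    {
        int rank = 0;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "[rank %d] MPI_Send: %lld calls, %lld bytes\n",
                rank, send_calls, send_bytes);
        return PMPI_Finalize();
    }

Linked ahead of the MPI library, or preloaded into a dynamically linked executable, these definitions intercept the application's MPI_Send calls without recompiling the application.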

Example Interposer Code
• This wrapper intercepts MPI_Bcast calls from the application
• It does its own measurement
• Then it calls the MPI library
• Allows us to measure time spent in particular routines

    int MPI_Bcast(void *buffer, int count, MPI_Datatype dtype,
                  int root, MPI_Comm comm)
    {
        double start = get_time_ns();              /* timestamp before the call */
        int err = PMPI_Bcast(buffer, count, dtype, root, comm);
        double duration = get_time_ns() - start;   /* time spent inside MPI */
        record_time("MPI_Bcast", duration);
        return err;                                /* preserve the MPI return code */
    }
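The wrapper relies on two helpers that the slide does not show. One possible minimal implementation, placed in the same source file as the wrapper, is sketched below (hypothetical code, not LLNL's actual tool; a real tool would aggregate times per routine and per process):

    /* Hypothetical support code for the MPI_Bcast wrapper above. */
    #include <stdio.h>
    #include <time.h>

    static double total_ns = 0.0;

    /* Wall-clock time in nanoseconds (POSIX clock_gettime). */
    static double get_time_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
    }

    /* Accumulate and report the time attributed to a routine. */
    static void record_time(const char *routine, double duration_ns)
    {
        total_ns += duration_ns;
        fprintf(stderr, "%s: %.0f ns (running total %.0f ns)\n",
                routine, duration_ns, total_ns);
    }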

Another type of problem: communication optimization
• See the other slide set.