An Accelerated Strategic Computing Initiative (ASCI) Academic Strategic Alliances Program (ASAP) Center at The University of Chicago
The Center for Astrophysical Thermonuclear Flashes

The Flash Code
Bruce Fryxell, Leader, Code Group
Year 3 Site Review
Argonne National Laboratory
Oct. 30, 2000

Outline
- Talk 1 – Bruce Fryxell
  - Overview of Flash
  - Adaptive Mesh Refinement
  - Performance and Scaling
  - Year 3 Integrated Calculation
- Talk 2 – Paul Ricker
  - Current production version of Flash
  - Flash code architecture
  - Flash physics modules
  - Code verification
- Talk 3 – Andrew Siegel
  - Development version of Flash
  - The future of Flash

The Flash Code Group
- Bruce Fryxell – Group Leader
- Andrew Siegel – Code Architect
- Architecture Team: Caceres, Ricker, Riley, Vladimirova, Young
- Physics Modules: Calder, Dursi, Olson, Ricker, Timmes, Tufo, Zingale
- Development, Maintenance, Testing: Calder, Linde, Mignone, Olson, Ricker, Timmes, Tufo, Weirs, Zingale

Overview of Flash
Module diagram: Driver (time dependent or steady), Mesh, Hydro, Nuclear Burning, EOS, Gravity, Diffusion, Initialization, Parallel I/O

Year Three Upgrades
- Evolution to object-oriented code architecture (see P. Ricker, A. Siegel talks)
- PARAMESH
  - PARAMESH 1
    - "SHMEM" emulation replaced by native MPI
    - Unnecessary barriers removed
  - PARAMESH 2 (K. Olson poster)
    - Elimination of permanent guard cell storage
    - Capability to advance the solution at all refinement levels instead of just at leaf blocks
    - Adaptivity in time
    - Guard cell filling in one direction at a time
- New and upgraded physics modules (see P. Ricker talk, many posters)

Other Accomplishments
- Parallel I/O
  - HDF5 (see the sketch below)
  - 10x improvement in I/O throughput
- Documentation
  - Comprehensive user manual
  - The physics and algorithms used in Flash
- Code release
  - Friendly users – May 2000
  - Astrophysics community – Oct. 2000
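
A minimal sketch of writing block-structured data to an HDF5 file from Python with h5py, included only to illustrate the kind of output involved; the file name, dataset names, and layout are hypothetical and do not reproduce FLASH's actual checkpoint format.

```python
import numpy as np
import h5py

# Illustrative only: one 8x8x8 cell-centered array per block, plus the
# refinement level of each block and a simulation-time attribute.
nblocks, nxb = 4, 8
density = np.random.rand(nblocks, nxb, nxb, nxb)
refine_level = np.array([1, 2, 2, 2], dtype=np.int32)

with h5py.File("checkpoint_0000.h5", "w") as f:   # hypothetical file name
    f.create_dataset("density", data=density)
    f.create_dataset("refine_level", data=refine_level)
    f.attrs["time"] = 0.0
```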

Adaptive Mesh Refinement
- Reduces time to solution and improves accuracy by concentrating grid points in regions which require high resolution
- PARAMESH (NASA / GSFC)
  - Block-structured refinement (8 x 8 x 8 blocks)
  - User-defined refinement criterion – currently using second derivatives of density and pressure (see the sketch below)
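
To make the refinement criterion concrete, here is a one-dimensional sketch of a normalized second-derivative indicator; it conveys the general idea only and is not the multidimensional estimator actually used by FLASH/PARAMESH.

```python
import numpy as np

def second_derivative_indicator(u, eps=1e-2):
    # |u[i+1] - 2 u[i] + u[i-1]| normalized by the first differences,
    # with a small floor eps*|u[i]| to avoid division by zero.
    num = np.abs(u[2:] - 2.0 * u[1:-1] + u[:-2])
    den = np.abs(u[2:] - u[1:-1]) + np.abs(u[1:-1] - u[:-2]) + eps * np.abs(u[1:-1])
    return num / den

# Hypothetical block with a pressure jump: flag it when the indicator is large.
pressure = np.concatenate([np.ones(8), 10.0 * np.ones(8)])
refine_block = second_derivative_indicator(pressure).max() > 0.8
print(refine_block)   # True
```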

Flash / PARAMESH Block
(Figure: a single block, showing the layer of guard cells surrounding the interior cells)

PARAMESH Tree Structure
- Each block contains n^d zones in d dimensions
- Blocks stored in a 2^d-tree data structure (sketch below)
- Factor of 2 refinement per level
- Blocks assigned indices via a space-filling curve
(Figure: block tree organized by refinement level)
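
A minimal sketch of such a 2^d-tree of blocks in illustrative Python (PARAMESH itself is Fortran and its data structures differ): refining a block creates 2^d children, each covering half the extent in every dimension while still carrying n^d zones.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Block:
    level: int                          # refinement level
    lower: Tuple[float, ...]            # physical lower corner
    size: Tuple[float, ...]             # physical extent per dimension
    children: List["Block"] = field(default_factory=list)

    def refine(self) -> None:
        """Split into 2^d children, halving the extent in every dimension."""
        d = len(self.size)
        half = tuple(s / 2.0 for s in self.size)
        for i in range(2 ** d):
            offset = tuple(half[k] if (i >> k) & 1 else 0.0 for k in range(d))
            lower = tuple(lo + o for lo, o in zip(self.lower, offset))
            self.children.append(Block(self.level + 1, lower, half))

root = Block(level=1, lower=(0.0, 0.0), size=(1.0, 1.0))   # 2D case: a quadtree
root.refine()
print(len(root.children))   # 4
```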

Load Balancing
- Work-weighted Morton space-filling curve (sketch below)
- Performance insensitive to choice of space-filling curve
- Refinement and redistribution of blocks every four time steps
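
As an illustration of the work-weighted space-filling-curve idea (a simplified, assumed scheme rather than PARAMESH's actual implementation): sort blocks by their Morton (Z-order) key, then cut the ordered list into contiguous pieces of roughly equal total work.

```python
def morton_key_2d(ix: int, iy: int, bits: int = 10) -> int:
    """Interleave the bits of integer block coordinates into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def partition_by_work(keys, work, nprocs):
    """Greedy cut of the Morton-ordered block list into nprocs contiguous
    chunks of roughly equal total work; returns the owning rank per block."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    target = sum(work) / nprocs
    owner, accum, rank = [0] * len(keys), 0.0, 0
    for i in order:
        if accum >= target and rank < nprocs - 1:
            rank, accum = rank + 1, 0.0
        owner[i] = rank
        accum += work[i]
    return owner

# Hypothetical example: four blocks; the first two carry twice the work.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
keys = [morton_key_2d(ix, iy) for ix, iy in coords]
print(partition_by_work(keys, work=[2.0, 2.0, 1.0, 1.0], nprocs=2))   # [0, 0, 1, 1]
```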

Example – X-ray Burst

Performance Optimization
- Single processor tuning
  - Reduction in number of square roots and divides
  - Loop fusion to eliminate unneeded arrays
  - Elimination of scratch arrays
  - Removal of unnecessary array copies and initializations
  - Replacement of string comparisons by integer comparisons
  - Use of vendor-supplied math libraries
  - Modification of often-used routines to permit in-lining on ASCI Red
- Result
  - 90 Mflop/s on a 250 MHz R10000 (64-bit)

Performance Optimization
- Parallel optimization
  - Use of Jumpshot to identify problem areas
  - Removal of unnecessary barriers
  - Packing of small messages in the tree portion of the code (see the sketch below)
- Result
  - Good scaling to thousands of processors
  - 238 Gflop/s on 6420 processors of ASCI Red for the Year 3 integrated calculation
  - 2000 Gordon Bell Prize finalist
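
To illustrate the message-packing idea, here is a small mpi4py sketch of an assumed pattern (not FLASH's actual MPI code): gather all per-block values bound for the same neighbor into one buffer and exchange it in a single call, instead of sending one tiny message per block.

```python
import numpy as np
from mpi4py import MPI   # run under mpirun/mpiexec with at least two ranks

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nblocks = 64
per_block = np.full(nblocks, float(rank))   # one small value per local block

if size > 1:
    dest = (rank + 1) % size                # ring exchange, for illustration only
    src = (rank - 1) % size
    recv_buf = np.empty(nblocks, dtype=np.float64)
    # One aggregated exchange instead of nblocks tiny point-to-point messages.
    comm.Sendrecv(per_block, dest=dest, recvbuf=recv_buf, source=src)
    assert recv_buf[0] == float(src)
```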

Scaling
- Constant-work-per-processor scaling
  - Shock tube simulation
  - Two-dimensional
  - Hydrodynamics, adaptive mesh refinement, gamma-law equation of state
  - Relatively high communication-to-computation cost

Scaling – Constant Work Per Processor
(Figure: scaling plots; Flash 1.6 – May 30, 2000)

Scaling
- Fixed-problem-size scaling
  - Cellular detonation
  - Three-dimensional
  - Uses most of the major physics modules in the code
  - Relatively low communication-to-computation cost

Scaling – Fixed Problem Size
(Figure: scaling plots)

Summary of Scaling
- As the number of blocks per processor decreases, a larger fraction of the blocks must get their guard cell information from off processor (see the estimate below)
- This causes deviation from ideal scaling when the number of blocks per processor drops too low
- Of the three ASCI machines, this effect is most noticeable on Red, due to its relatively small memory per processor
- Significant variation in timings on Nirvana between identical simulations
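
A crude surface-to-volume estimate of this effect (an illustrative model, not a measured FLASH result): if each processor owns a roughly cubic patch of blocks, the fraction of blocks on the patch surface, whose guard cells need off-processor data, grows quickly as the per-processor block count shrinks.

```python
def offprocessor_block_fraction(blocks_per_proc: int) -> float:
    """Fraction of blocks on the surface of an n x n x n patch,
    where n is the cube root of the per-processor block count."""
    n = round(blocks_per_proc ** (1.0 / 3.0))
    if n <= 2:
        return 1.0
    return 1.0 - (n - 2) ** 3 / float(n ** 3)

for b in (512, 64, 8):
    print(b, round(offprocessor_block_fraction(b), 2))   # 0.58, 0.88, 1.0
```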

Summary of Scaling
- Significant improvement in cross-box scaling on Nirvana can be achieved by tuning MPI environment variables
- Scalability on Blue Pacific is highly dependent on operating system revisions
- Parallel efficiency for memory-bound jobs
  - > 90% on Blue Pacific and Red
  - > 75% on Nirvana
- Typical performance – 10-15% of peak on 1024 processors

Integrated Calculation
Cellular Detonation in a Type Ia Supernova
See also: J. Truran talk, F. Timmes poster, evening demos

Why a Cellular Detonation?
- Two of our target astrophysics calculations (X-ray bursts and Type Ia supernovae) involve detonations
- We cannot resolve the structure of the detonation front in a calculation which contains the entire star
- We want to study a small portion of the detonation front to see whether a subgrid model is necessary to compute
  - The detonation speed
  - The nucleosynthesis
- This problem exercises most of the major modules in the code and thus serves as a good test of overall code performance

Integrated Calculation
- 1000 processors on ASCI Blue Pacific
- Effective grid size (if fully refined)
  - 256 x 256 x 5120 = 335 million grid points
- Actual grid size
  - 6 million points at beginning of calculation
  - 45 million points at end of calculation
- Savings from using AMR (see the check below)
  - 40-50x for first half of calculation
  - 7x at end of calculation
- Total wall clock time ~ 70 hours
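
A quick arithmetic check of the AMR savings quoted above (no new data, just ratios of the point counts): the fully refined grid holds about 56 times as many points as the mesh carried at the start of the run and about 7.5 times as many as at the end.

```python
effective = 256 * 256 * 5120                 # fully refined grid: 335,544,320 points
points_start, points_end = 6e6, 45e6         # actual point counts quoted above
print(round(effective / points_start, 1))    # ~55.9x at the start
print(round(effective / points_end, 1))      # ~7.5x at the end
```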

Integrated Calculation
- Generated 1.2 Tbyte of data
- Half of the wall clock time was required for I/O
- 0.2 Tbyte transferred to ANL over the network for visualization
- Used GridFTP to transfer files (see the check below)
  - 7 parallel streams to 7 separate disks
  - Throughput ~ 4 Mbytes/s
  - Total transfer time < 1 day
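
A back-of-the-envelope check of the quoted transfer time (arithmetic only): 0.2 Tbyte at an aggregate throughput of about 4 Mbytes/s works out to roughly 14 hours, consistent with "< 1 day".

```python
tbytes_moved = 0.2e12           # 0.2 Tbyte transferred to ANL
rate = 4.0e6                    # ~4 Mbytes/s aggregate over the 7 streams
hours = tbytes_moved / rate / 3600.0
print(round(hours, 1))          # ~13.9 hours
```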

Integrated Calculation
(Figure: results for the 6-level and 5-level runs plotted against simulation time, in units of 10^-8 s)

Summary
- Substantial progress made in Year 3 in improving and extending Flash
- Flash is now being used to address many of our target astrophysics problems and is producing important scientific results
- Flash achieves good performance on all three ASCI computers and scales to thousands of processors
- Large 3D integrated calculation completed on ASCI Blue Pacific and data successfully transferred back to Chicago for analysis and visualization