TAXI Code Overview and Status
Timmie Smith
November 14, 2003

Transport Problem Example
- Oil well logging tool
  - Shaft dug at possible well location
  - Radioactive sources placed in the shaft with monitoring equipment
- Simulation allows for verification of new techniques and tools

Problem Specification
- Given a configuration at time T:
  - Spatial domain divided into cells
  - Energy range divided into groups
  - Quadrature set chosen for angular integrals
  - Fixed sources, boundary conditions, and initial conditions
- Compute:
  - Particle flux at time T + ΔT
  - Reaction rates, etc., during the time step

TAXI Data Structures
- Spatial domain decomposed into Cells; Cells are stored in a grid
- A Cell contains Elements; Elements are the base spatial data structure
- Material definitions stored as in time-dependent neutronics problems:
  - Number densities stored by cell
  - Microscopic cross sections stored by isotope
[Diagram: Grid contains Cells; each Cell contains Elements]

Aggregation in TAXI
- Angle sets: angles in an AngleSet generate the same dependences
- Energy group sets
- Cell sets
- The code computes an aggregation; the user can specify one instead
- The work function operates on a GroupSet-AngleSet-CellSet combination (see the sketch below)
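A minimal sketch of what these aggregates and the work function might look like; the type and member names below are illustrative assumptions, not the actual TAXI declarations:

    #include <vector>

    // Hypothetical sketch: each set aggregates indices that are processed
    // together by a single invocation of the work function.
    struct AngleSet { int id; std::vector<int> angle_ids; };  // angles with identical dependences
    struct GroupSet { int id; std::vector<int> group_ids; };  // energy groups solved together
    struct CellSet  { int id; std::vector<int> cell_ids;  };  // cells swept as one unit

    // The work function operates on one GroupSet-AngleSet-CellSet combination.
    void work(const GroupSet& gs, const AngleSet& as, const CellSet& cs);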

STAPL Overview
- Standard Template Adaptive Parallel Library
- Superset of the C++ STL
- pContainers - generic distributed containers
- pAlgorithms - generic parallel algorithms
- pRange - abstract view of partitioned data
  - binds pContainers and pAlgorithms
  - view of distributed data, random access, data dependences
- Example: parallel sort of a distributed vector

    stapl::pvector pv(100);
    // ... initialize pv ...
    stapl::p_sort( pv.get_prange() );

The STAPL Programming Environment
[Diagram: User Code sits on pAlgorithms, pContainers, and pRange; these run on the ARMI communication library, which maps to MPI, OpenMP, Pthreads, or native threading]

Grid Overview
- ArbitraryPGrid is derived from BaseSpatialGrid and pGraph
- Grid is templated on:
  - cell type (brick, polyhedral, etc.)
  - element type (whole_cell, corner, ...)
- It contains instantiations of:
  - cells (one per graph vertex)
  - faces (really graph edges)
- It contains methods (sketched below) for:
  - putting outgoing-surface intensities into graph edges
  - getting incoming-surface intensities from graph edges
  - both operate on one (angleset, groupset) pair at a time
[Diagram: graph vertex = Cell; graph edge = face]
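A rough sketch of that surface-intensity interface; the method names are guesses for illustration and may not match the real ArbitraryPGrid API:

    // Hypothetical interface sketch. One call transfers the intensities
    // for a single (angleset, groupset) pair, as described above.
    class ArbitraryPGrid /* : public BaseSpatialGrid, plus pGraph storage */ {
    public:
      // Store outgoing-surface intensities on the graph edges (faces).
      void put_outgoing(int angleset_id, int groupset_id);
      // Retrieve incoming-surface intensities from the graph edges.
      void get_incoming(int angleset_id, int groupset_id);
    };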

Cell Overview
- The Cell class doesn't contain much:
  - list of isotopes and their number densities
  - list of elements
  - list of vertex indices
- The "element" concept gives us great generality and flexibility
- There is one element for each "fundamental" spatial unknown. For example:
  - simple methods with one unknown per cell use the "whole-cell" element
  - methods with one unknown per vertex per cell use the "corner" element
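As a concrete illustration, the description suggests roughly the following shape; all member names are assumptions:

    #include <vector>

    struct Element { /* sketched on the next slide */ };

    // Hypothetical sketch of the lightweight Cell described above.
    struct Cell {
      std::vector<int>     isotope_ids;       // isotopes present in the cell
      std::vector<double>  number_densities;  // one number density per isotope
      std::vector<Element> elements;          // one per "fundamental" spatial unknown
      std::vector<int>     vertex_indices;    // indices into the grid's vertex list
    };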

Element Overview
- Elements contain:
  - volumetric sources (scattering and total)
  - the solution (angular moments for generating the scattering source; also the angular intensities in a time-dependent problem)
- Most of the code is unaware of problem geometry or discretization method; it just loops over elements
- The work function processes the Cells in a STAPL pRange
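Filling in the Element stub from the previous sketch (again, the member names are illustrative assumptions):

    #include <vector>

    // Hypothetical sketch: one Element per fundamental spatial unknown.
    struct Element {
      double scattering_source;                 // volumetric scattering source
      double total_source;                      // total volumetric source
      std::vector<double> angular_moments;      // used to generate the scattering source
      std::vector<double> angular_intensities;  // kept for time-dependent problems
    };

Because most of the code only loops over elements, the geometry and discretization choice stay hidden behind this one abstraction.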

TAXI Setup
- Energy and Angle objects constructed
- Grid partition decided
  - Specialized STAPL scheduling algorithms used
  - Partition information distributed
- Grid constructed in parallel
  - Cells and Elements created and inserted
- pRanges built on the grid
  - Specialized STAPL scheduler determines pRanges
  - pRange dependence graphs are directed

TAXI Algorithm
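In outline, the solve sweeps each GroupSet-AngleSet-CellSet combination in dependence order. The loop below is a hedged sketch assembled from the aggregation and grid sketches above, not the actual TAXI source; the sweep_order helper and per-set id members are assumptions:

    // Hypothetical sketch of the sweep structure.
    for (const GroupSet& gs : group_sets) {
      for (const AngleSet& as : angle_sets) {
        // CellSets are visited in the order given by the pRange
        // dependence graph for this angle set.
        for (const CellSet& cs : sweep_order(as)) {
          grid.get_incoming(as.id, gs.id);  // pull face intensities from upstream
          work(gs, as, cs);                 // solve the cells in this set
          grid.put_outgoing(as.id, gs.id);  // push face intensities downstream
        }
      }
    }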

TAXI Implementation History
- C++ selected for implementation
  - Wanted to use the STL and STAPL
- Prototype implemented with C++ and MPI
  - STAPL components weren't fully implemented
  - STAPL didn't support distributed memory systems
- STAPL has advanced
  - The ARMI runtime system provides a consistent base for development
  - Allows STAPL to run on distributed memory systems
  - pContainers provide data distribution and parallel operations
  - pRange supports hierarchical decomposition of data
  - Scheduling components incorporated into STAPL

STAPLing TAXI
- Custom data structures
  - Distributed 3D vector class
    - User-managed distribution
    - 3D indexing (e.g., foo[x][y][z])
    - Stored Cells and CellSets
  - CellSet class
    - Explicit list of graph vertex ids (the Cells in the CellSet)
    - The order of the id list provided the cell sweep order
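A minimal sketch of the 3D-indexed distributed vector idea; the class name, layout, and flat-offset scheme are assumptions for illustration:

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: the user decides which (x,y,z) slab lives on each
    // processor; locally, 3D indices map to a flat offset.
    class Distributed3DVector {
      std::vector<double> local_;  // this processor's portion of the domain
      int ny_ = 0, nz_ = 0;        // local extents in y and z
    public:
      Distributed3DVector(int nx, int ny, int nz)
        : local_(static_cast<std::size_t>(nx) * ny * nz), ny_(ny), nz_(nz) {}
      // foo.at(x,y,z) stands in for the foo[x][y][z] proxy chain.
      double& at(int x, int y, int z) {
        return local_[(static_cast<std::size_t>(x) * ny_ + y) * nz_ + z];
      }
    };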

STAPLing TAXI
- pGraph stores Cells
  - Each pGraph vertex is a Cell
  - pGraph edges identify neighboring cells
  - Arbitrary grids now possible; 2D grids also possible
- pRange used to represent a CellSet
  - General representation of dependences between CellSets
  - Multiple execution orders of the cells in a subrange possible
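Schematically, storing the grid in the pGraph might look like the following; the add_vertex/add_edge calls are generic graph operations used for illustration, not verified STAPL signatures of the era, and Cell is the hypothetical struct sketched earlier:

    // Hypothetical sketch: cells as vertices, shared faces as edges.
    stapl::pGraph<Cell> grid;
    size_t a = grid.add_vertex(Cell());  // each vertex holds one Cell
    size_t b = grid.add_vertex(Cell());
    grid.add_edge(a, b);                 // an edge marks a shared face
    // Because connectivity is explicit, arbitrary (and 2D) grids fall out
    // naturally instead of being limited to a regular 3D box.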

pRange Dependence Graph
- Numbers are cellset indices
- Colors indicate processors
[Figure: cellset dependence graphs for angle-set A, angle-set B, angle-set C, and angle-set D]

Adding a reflecting boundary
[Figure: dependence graphs for angle-sets A through D with one reflecting boundary]

A second reflecting boundary
[Figure: dependence graphs for angle-sets A through D with a second reflecting boundary]

Opposing reflecting boundaries
[Figure: dependence graphs for angle-sets A through D with opposing reflecting boundaries]

STAPLing TAXI: Abstracting Communication
- Explicit MPI calls in the driver function replaced with calls to ARMI methods
- pGraph uses ARMI to transfer information between threads when necessary
- ARMI fences replace MPI barriers
  - Fences required during problem initialization
  - Most fences handled inside STAPL
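The ARMI library of this period provided one-sided RMI calls plus a collective fence. A hedged sketch of the replacement pattern; the exact signatures, handle types, and the GridClass::receive_faces method are assumptions:

    // Before: explicit MPI in the driver.
    //   MPI_Send(face_data, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    //   MPI_Barrier(MPI_COMM_WORLD);

    // After: ARMI methods; the pGraph issues these internally when needed.
    stapl::async_rmi(dest_thread, grid_handle,
                     &GridClass::receive_faces, face_data);  // one-sided transfer
    stapl::rmi_fence();  // fence replaces the MPI barrier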

Effects of STAPLing: Problem Time to Completion
- Average 2.16 times slower, but improving rapidly
- Currently 15 global barriers in the ASCI code
- There is opportunity for improvement

Effects of STAPLing: Reduced Code Size
- Code reduced by ~6,000 lines; currently 13,000 lines and shrinking
- Elimination of classes:
  - MPI wrappers
  - XYZSpace
  - CellSet
- Code reorganization:
  - Problem class reorganized
  - Explicit grid distribution eliminated
  - Use of pRanges

TAXI Driver

    void driver() {
      BaseProblem* problem = read_problem(*input_stream, *output_stream, *error_stream);
      problem->solve();
      problem->report();
    }

    BaseProblem* read_problem(istream& input_stream, ostream& output_stream,
                              ostream& error_stream) {
      ProblemInput* pinput = new ProblemInput(input_stream);
      Hex_cell_creator creator(*pinput);
      BaseEnergyAggregation egs;
      XMLObject xml_input = pinput->getTagObject("prototype")[0];
      egs.init_from_input(xml_input, output_stream);
      asci_preprocessing::ASCI_Regular_pRange_Preprocessor* sched =
          build_brick_scheduler(creator.dimx(), creator.dimy(), creator.dimz(),
                                egs, *pinput);
      BaseProblem* bp = new GenericSweepProblem(pinput, creator, egs, sched,
                                                output_stream, error_stream);
      return bp;
    }

Questions?