Loads Balanced with CQoS Nicole Lemaster, Damian Rouson, Jaideep Ray Sandia National Laboratories Sponsor: DOE CCA Meeting – January 22, 2009
Computational Quality of Service Definition: The ability to change a simulation code (a collection of CCA components) on-the-fly in order to maintain optimality –Optimality: Determined by a user-defined cost function of simulation behavior and/or solution properties Choose components to make the resultant code –Fast –Robust –Accurate Component behavior/performance depends on the problem at hand (i.e. the input) –Requirement: quantification of problem “difficulty” –Requirement: metrics for component performance and input difficulty Competing constraints
Theoretical Needs What metrics define performance? –FLOPS, iterations to convergence, % load imbalance What metrics define robustness? –Convergence failure, bad load-imbalance, etc. What metrics define accuracy? –Global, local, statistical, deterministic, etc. All metrics are functions of the component’s input We need a model that, given component inputs and machine characteristics, predicts the component’s performance
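One of the performance metrics listed above, percent load imbalance, can be made concrete. The following is a minimal sketch that assumes the common definition (maximum load over mean load, minus one, times 100); the slides do not say which definition is intended, and the function name is hypothetical.

```python
def percent_load_imbalance(loads):
    """Percent by which the busiest rank exceeds the mean load.

    Assumed definition (not stated in the slides): (max / mean - 1) * 100.
    """
    if not loads:
        raise ValueError("loads must be non-empty")
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    return (max(loads) / mean - 1.0) * 100.0


# Four ranks; one carries twice the work of the others (roughly 60 percent).
example = percent_load_imbalance([10.0, 10.0, 10.0, 20.0])
```

A perfectly balanced run gives 0 under this definition, so the metric can feed directly into a cost function or a robustness threshold.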
Practical Needs To make performance models, we need –A collection of components to choose from –A test harness for components –A performance measurement tool – TAU –A database for storing empirical performance data –Statistical tools for model making To use performance models and make adaptive codes, we need a control system that contains –An optimization system to choose the “best” component –A feedback system that can take corrective action if a bad component is chosen by the optimization system
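The model-making step can be illustrated with an ordinary least-squares fit of runtime against problem size. This is only a sketch under stated assumptions: the linear model form, the sample measurements, and the function names are hypothetical, and real data would come from the performance measurements stored in the database.

```python
def fit_linear_model(sizes, runtimes):
    """Least-squares fit of runtime = a + b * size; returns (a, b)."""
    n = len(sizes)
    if n < 2 or n != len(runtimes):
        raise ValueError("need at least two matching samples")
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_runtime(model, size):
    """Evaluate the fitted model at a new problem size."""
    a, b = model
    return a + b * size


# Made-up measurements (cells, seconds) lying exactly on 0.5 + 0.001 * size.
sizes = [1000, 2000, 4000, 8000]
runtimes = [1.5, 2.5, 4.5, 8.5]
model = fit_linear_model(sizes, runtimes)
predicted = predict_runtime(model, 16000)
```

A production model would likely include machine characteristics and more than one input variable; the single-variable fit is chosen here only to keep the example short.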
Our Interests Problem: Create a control system that can choose the best load-balancer for a simulation –Load per grid cell varies in both space and time over Cartesian mesh What numerical techniques lead to imbalance? –Adaptive Cartesian meshes –Some operator-split constructions What applications show such behavior? –Hydrocarbon combustion –Astrophysics – Type II supernovae Net result: Simulation becomes load-imbalanced
Considerations Solution: Repartition frequently using a fast, dynamic load-balancer –Speed is achieved mainly by sacrificing partition quality –Some load-balancers favor load balance; others minimize communication time or data migration during repartitioning Physics and numerics determine whether the simulation is computation- or communication-dominated, so –The same load-balancer may not work throughout the run –We need to choose a load-balancer anew every time we repartition
Control System Configuration What would a control infrastructure for an analytical control law look like? [Diagram components: Driver, Mesh, Mesh Characterizer, Meta-Partitioner (if-then-else), Partitioner-A, Partitioner-B, Partitioner-C. Connection labels: Mesh; Partitions; Mesh, Control law]
Load-Balancer Selection Model the simulation to formulate metrics that depend on the current state –e.g., communication/computation cost Characterize the dynamic load-balancers with simplified metrics –e.g., communication time, data migration effort, grid shape, runtime Develop rules to pair simulation state with appropriate partitioner –Implement a “meta-partitioner” to select a load-balancing partitioner using the rules Essentially, the code adapts to the problem!
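The rule-based pairing of simulation state with a partitioner can be sketched as a small if-then-else function. Everything specific here is an assumption made for illustration: the state fields, the threshold values, and the partitioner names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class SimulationState:
    comm_time: float      # seconds spent communicating per step
    compute_time: float   # seconds spent computing per step
    imbalance_pct: float  # current percent load imbalance


def select_partitioner(state, ratio_threshold=1.0, imbalance_threshold=20.0):
    """Pick a partitioner name from simple if-then-else rules.

    Thresholds and partitioner names are illustrative assumptions.
    """
    if state.compute_time > 0:
        ratio = state.comm_time / state.compute_time
    else:
        ratio = float("inf")
    if ratio > ratio_threshold:
        # Communication-dominated: prefer a partitioner that cuts communication.
        return "min_communication"
    if state.imbalance_pct > imbalance_threshold:
        # Computation-dominated and badly imbalanced: prioritize even load.
        return "min_imbalance"
    # Otherwise avoid moving data unnecessarily.
    return "min_migration"


choice = select_partitioner(SimulationState(2.0, 1.0, 5.0))
```

Calling the function at every repartitioning step is what lets the choice change as the simulation moves between computation- and communication-dominated phases.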
Control Systems Research Mostly done by J. Steensland & H. Johansson –Johansson H.; Design and Implementation of a Dynamic and Adaptive Meta-Partitioner for Parallel SAMR Grid Hierarchies Have a set of parameterized load-balancers Modeled the relationship between mesh characteristics and the load-balancer inputs that lead to optimal partitions Ran tests to determine whether, given a mesh, the model can predict the best load-balancer –It cannot predict it reliably, but –It provides a set of (~10) candidates that contains the best one –Brute-force solution: test all candidates and select the best (takes ~10 seconds)
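The brute-force step (run every candidate, keep the best) can be sketched as follows. This is an illustration under assumptions, not the method from the cited work: the two toy partitioners, the 1D list of per-cell loads, and the cost (load on the busiest rank) are all hypothetical stand-ins.

```python
def max_rank_load(cell_loads, assignment, n_ranks):
    """Cost of a partition: the load on the busiest rank."""
    totals = [0.0] * n_ranks
    for load, rank in zip(cell_loads, assignment):
        totals[rank] += load
    return max(totals)


def contiguous_blocks(cell_loads, n_ranks):
    """Toy candidate: equal-sized contiguous chunks of cells."""
    n = len(cell_loads)
    return [min(i * n_ranks // n, n_ranks - 1) for i in range(n)]


def greedy_by_load(cell_loads, n_ranks):
    """Toy candidate: heaviest cells first, each to the least-loaded rank."""
    totals = [0.0] * n_ranks
    assignment = [0] * len(cell_loads)
    order = sorted(range(len(cell_loads)), key=lambda i: -cell_loads[i])
    for i in order:
        rank = totals.index(min(totals))
        assignment[i] = rank
        totals[rank] += cell_loads[i]
    return assignment


def pick_best(candidates, cell_loads, n_ranks):
    """Run every candidate and return (name, cost) of the cheapest."""
    results = {
        name: max_rank_load(cell_loads, fn(cell_loads, n_ranks), n_ranks)
        for name, fn in candidates.items()
    }
    best = min(results, key=results.get)
    return best, results[best]


candidates = {"contiguous": contiguous_blocks, "greedy": greedy_by_load}
loads = [8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
best_name, best_cost = pick_best(candidates, loads, n_ranks=2)
```

With a candidate set of about ten, the cost of this loop is ten trial partitionings per repartition, which is the trade-off the slide quantifies at roughly ten seconds.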
Required Components Essentially, things in the CCA toolkit –Simulation components: a mesh, some integrators, some linear solvers, some physics components, etc. –A variety of load-balancers And a control system to choose load-balancers
Mesh Component Status 2D mesh already exists Part of the tutorial and toolkit Parallel capabilities Works in Bocca Used in reaction-diffusion problems, with multiple integration techniques Can accommodate slab-wise and block-wise decomposition No connection to load-balancers yet –Does its own simple domain decomposition Great for tutorials, but too simple for CQoS
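For readers unfamiliar with the two decompositions mentioned above, here is a minimal sketch of slab-wise and block-wise splitting of an nx-by-ny Cartesian mesh. The function names and the (x0, x1, y0, y1) tuple layout are hypothetical and are not the mesh component's actual interface.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pieces of near-equal size."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def slab_decomposition(nx, ny, n_ranks):
    """Cut along x only: each rank owns a full-height slab (x0, x1, y0, y1)."""
    return [(x0, x1, 0, ny) for x0, x1 in split_range(nx, n_ranks)]


def block_decomposition(nx, ny, px, py):
    """Cut along both axes into a px-by-py grid of blocks."""
    return [
        (x0, x1, y0, y1)
        for x0, x1 in split_range(nx, px)
        for y0, y1 in split_range(ny, py)
    ]


slabs = slab_decomposition(8, 8, 4)
blocks = block_decomposition(8, 8, 2, 2)
```

Both layouts give each of the four ranks 16 cells here, but a 4x4 block has shorter shared edges than a 2x8 slab, which is one reason block-wise layouts can reduce communication volume.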
Over the next 6 months... Extend mesh to 3D to tackle harder problems Lemaster, Stone, & Gardiner (2007)
Over the next 6 months... Extend mesh to 3D to tackle harder problems Extend it to incorporate domain decomposition beyond slab- and block-wise –Sub-domains consisting of a disjoint set of abutting rectangles Design ports to load-balancers Identify more interesting applications for use in CQoS testing –Construct any extra components needed –Solve the problem; quantify the degree of difficulty
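One reading of "a disjoint set of abutting rectangles" is a sub-domain made of non-overlapping rectangles that are connected through shared edges. The sketch below encodes that reading; the `Rect` type, the half-open index convention, and the validity check are assumptions for illustration, not the planned mesh design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: int
    x1: int  # exclusive upper bound
    y0: int
    y1: int  # exclusive upper bound


def overlaps(a, b):
    """True if the interiors of two rectangles intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def abuts(a, b):
    """True if two rectangles touch along an edge segment of positive length."""
    share_x = min(a.x1, b.x1) - max(a.x0, b.x0) > 0
    share_y = min(a.y1, b.y1) - max(a.y0, b.y0) > 0
    touch_x = a.x1 == b.x0 or b.x1 == a.x0
    touch_y = a.y1 == b.y0 or b.y1 == a.y0
    return (touch_x and share_y) or (touch_y and share_x)


def is_valid_subdomain(rects):
    """Rectangles must be pairwise disjoint and connected through abutting edges."""
    if not rects:
        return False
    for i, a in enumerate(rects):
        for b in rects[i + 1:]:
            if overlaps(a, b):
                return False
    # Flood fill over the "abuts" relation to check connectivity.
    seen = {0}
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for j, r in enumerate(rects):
            if j not in seen and abuts(rects[i], r):
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(rects)


# An L-shaped sub-domain built from two abutting rectangles.
l_shape = [Rect(0, 4, 0, 2), Rect(0, 2, 2, 4)]
valid = is_valid_subdomain(l_shape)
```

Allowing such composite shapes lets a partitioner hand a rank an irregular region, which slab- and block-wise decompositions cannot express.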
Results to come! Contact info: Nicole Lemaster Sandia National Labs