1
Sim-X: Parallel System Software for Interactive Multi-Experiment Computational Studies. Siu-Man Yau (smyau@cs.nyu.edu), New York University; Eitan Grinspun (eitan@cs.columbia.edu), Columbia University; Vijay Karamcheti (vijayk@cs.nyu.edu), New York University; Denis Zorin (dzorin@cs.nyu.edu), New York University
2
Computational Studies ● Computer simulation has become an integral part of the scientific method ● Widespread use of Computational Studies in science and engineering ● Much work done to speed up individual simulations ● But Computational Studies can involve hundreds, thousands, or tens of thousands of simulations...
3
Computational Studies Examples ● Car Chassis Design ● Defibrillator Design
4
Computational Studies & Systems ● Current Runtime Systems: – Limited Interactivity – Explicit control of individual experiments ● SimX: – Frees users and application developers from micromanaging every simulation process – Interactive, Steerable Computational Studies Notes: In a traditional batch system, each simulation has to be managed individually, or no control can be exerted over them once the job is submitted, e.g., in a parameter sweep.
5
Related Work ● Parameter Sweep schedulers – Condor – Globus – Virtual Instrument – Nimrod/O ● Computational Steering Infrastructures – Falcon – CUMULVS – SCIRun – CSE
6
Example Computational Study ● Bridge design computational study ● Goal: Find the support placements so as to: – Minimize max deflection and construction cost Study details: 1D bridge in 2D space, with end points clamped. Two elastic supports are added to the middle of the bridge. The bridge is clamped at all support points and is subject to bending and stretching forces and a uniform load. Performance measures: maximum deflection of the bridge and the cost of the supports. Parameter space: distance of the two supports from the end of the bridge. Simulation: Newton Iteration w/ Line Search.
7
Example Study: Pareto Frontier ● Pareto Frontier: set of designs that cannot be improved in one performance dimension without getting worse in the other [Figure: Design Space (axes: Support 1, Support 2) and Performance Space (axes: Total Cost, Max. Deflection)]
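A minimal sketch of the dominance test behind this definition, assuming both performance measures (total cost and maximum deflection) are to be minimised. The names `dominates` and `pareto_frontier` and the sample designs are illustrative only and are not taken from SimX.

```python
from typing import List, Tuple

# (total cost, max deflection); both are assumed to be minimised.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both measures and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs, given directly in performance space.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.0)]
frontier = pareto_frontier(designs)
print(frontier)
```

Here (3.0, 4.0) and (2.5, 3.0) drop out because (2.0, 3.0) is at least as good in both measures and cheaper.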
8
Pareto Frontier Discovery ● Pareto Frontier Discovery is a Multi-Experiment Study ● Experiments can be independent, but – Potential to trade independence for performance ● Active Sampling ● Reuse checkpoints ● Trade precision for performance ● Early user feedback ● Need domain knowledge, permeable interface Notes: Trading independence for performance: 1) The result from one experiment can be used to speed up simulations of nearby parameter values (e.g., as the initial guess for Newton Iteration). 2) Portions of the space can be pruned as the study progresses if they are far away from the Pareto frontier. Trading precision for performance: 1) Decisions can be made with imprecise results: if, even allowing for the upper bound on the error at a given error tolerance or mesh size, we can decide that a point is not on the Pareto frontier, then there is no point in running its simulation to completion. 2) A "fuzzy" frontier often provides enough information for the users to make some design decisions. The more domain knowledge is available, the more efficient the system.
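The first note (reusing a neighbouring result as the initial guess for Newton iteration) can be illustrated with a toy problem. This is only a sketch: the scalar equation x^3 + x = p stands in for the bridge simulation, the line search is omitted, and `newton` and `solve` are hypothetical helper names.

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Plain Newton iteration; returns (root, number of update steps)."""
    x = x0
    for it in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, it
        x -= fx / df(x)
    raise RuntimeError("Newton iteration did not converge")


def solve(p, x0):
    """Solve the toy equation x**3 + x = p from the starting guess x0."""
    return newton(lambda x: x**3 + x - p, lambda x: 3 * x**2 + 1, x0)


# Cold start: no prior information about the solution.
x_cold, cold_iters = solve(10.0, x0=0.0)

# Warm start: reuse the solution of a nearby parameter value as the guess.
x_near, _ = solve(9.9, x0=0.0)
x_warm, warm_iters = solve(10.0, x0=x_near)

print(cold_iters, warm_iters)
```

Both runs reach the same root, but the warm start needs fewer Newton steps because its initial guess is already close to the solution.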
9
SimX Architecture
10
Shared Object Layer ● Spatially-Indexed Shared Object Layer (SISOL) ● Used for storing system-wide shared objects – E.g., checkpoints and performance metric names a unique object in the system. Justification: objects are all related to a point in design space, and the spatial relation carries information about them: e.g., nearby checkpoints may be best stored on the same processor, because the manager can start nearby simulations on that processor. Typed objects: SISOL requires pack() and unpack() implementations for each type. Parallel object types also require a function that maps parallel objects onto different decompositions. User-defined types are supported via this interface. Split-phase create, delete, read and write: used to enforce read-modify-write consistency. [Figure: objects A, B, C in the design space (axes: Support 1, Support 2)]
11
Shared Object Layer (cont’d) ● Neighborhood query ● Typed objects ● Split-phase operations ● Current implementation – TCP server – In core – Hash map
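A toy, single-process sketch of the ideas on these two slides: typed objects with per-type pack/unpack functions, an in-core hash map, and a neighborhood query. Everything here is an assumption made for illustration: the class name `SpatialStore`, its methods, and the choice of (type name, coordinate) as the key are not the SISOL API, and the TCP server and split-phase operations are left out.

```python
import json
import math
from typing import Any, Callable, Dict, List, Tuple

Coord = Tuple[float, ...]


class SpatialStore:
    """In-core hash map of packed objects keyed by (type name, coordinate)."""

    def __init__(self) -> None:
        self._codecs: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {}
        self._objects: Dict[Tuple[str, Coord], bytes] = {}

    def register_type(self, name: str, pack: Callable[[Any], bytes],
                      unpack: Callable[[bytes], Any]) -> None:
        """Each type supplies its own pack/unpack pair."""
        self._codecs[name] = (pack, unpack)

    def write(self, type_name: str, coord: Coord, obj: Any) -> None:
        pack, _ = self._codecs[type_name]
        self._objects[(type_name, tuple(coord))] = pack(obj)

    def read(self, type_name: str, coord: Coord) -> Any:
        _, unpack = self._codecs[type_name]
        return unpack(self._objects[(type_name, tuple(coord))])

    def neighbors(self, type_name: str, coord: Coord, radius: float) -> List[Coord]:
        """Coordinates of stored objects of this type within radius, nearest first."""
        hits = [c for (t, c) in self._objects
                if t == type_name and math.dist(c, coord) <= radius]
        return sorted(hits, key=lambda c: math.dist(c, coord))


store = SpatialStore()
store.register_type("checkpoint",
                    pack=lambda o: json.dumps(o).encode(),
                    unpack=lambda b: json.loads(b.decode()))
store.write("checkpoint", (0.2, 0.7), {"deflection": [0.0, 0.1]})
store.write("checkpoint", (0.25, 0.7), {"deflection": [0.0, 0.2]})
store.write("checkpoint", (0.9, 0.1), {"deflection": [0.0, 0.3]})

near = store.neighbors("checkpoint", (0.21, 0.7), radius=0.1)
print(near)
```

A manager could use such a query to find the closest existing checkpoint before starting a simulation at a new design point. The linear scan is only for brevity; a real spatial index would avoid visiting every key.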
12
Active Sampler ● Resolves the Pareto frontier progressively ● Issues experiments on the Design Space ● Maintains a set containing undominated points encountered so far ● The set of undominated points approximates the Pareto frontier, and the approximation approaches the true frontier as more time is spent Notes: Detailed Algorithm: Maintain a task queue and a result set. Task queue = points of interest in parameter space; result set = points discovered so far that are undominated (i.e., current Pareto set candidates). Seed the task queue with points from a lattice on the parameter space. Run the task queue. For each result that comes back, decide whether the point is undominated by all points in the result set. If so, remove all points in the result set that are dominated by it, add it to the result set, and insert its lattice neighbors into the task queue. Continue until the task queue is empty. Refine the lattice, then repeat.
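The algorithm in the notes can be sketched sequentially as follows. Several details are assumptions made here, not statements about SimX: the parameter space is the unit square, refinement halves the lattice spacing, the queue is re-seeded after each refinement with the finer-lattice neighbors of the current result set, and `toy_experiment` is a made-up stand-in for a simulation.

```python
from collections import deque
from typing import Callable, Dict, Tuple

Index = Tuple[int, int]     # lattice point, in units of the finest spacing
Perf = Tuple[float, float]  # two performance measures, both minimised


def dominates(a: Perf, b: Perf) -> bool:
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def active_sample(run: Callable[[float, float], Perf], n0: int, levels: int):
    size = n0 * 2 ** levels  # finest lattice spans indices 0..size per axis
    step = 2 ** levels       # spacing of the initial, coarsest lattice
    results: Dict[Index, Perf] = {}  # result set: undominated points so far
    seen = set()
    experiments = 0

    def neighbors(p: Index, h: int):
        for di in (-h, 0, h):
            for dj in (-h, 0, h):
                q = (p[0] + di, p[1] + dj)
                if q != p and 0 <= q[0] <= size and 0 <= q[1] <= size:
                    yield q

    # Seed the task queue with every point of the coarsest lattice.
    queue = deque((i, j) for i in range(0, size + 1, step)
                  for j in range(0, size + 1, step))
    for _ in range(levels + 1):
        while queue:
            p = queue.popleft()
            if p in seen:
                continue
            seen.add(p)
            perf = run(p[0] / size, p[1] / size)
            experiments += 1
            if any(dominates(r, perf) for r in results.values()):
                continue  # dominated: discard, do not expand
            results = {q: r for q, r in results.items() if not dominates(perf, r)}
            results[p] = perf
            queue.extend(neighbors(p, step))
        step //= 2  # refine the lattice
        if step:
            for p in list(results):
                queue.extend(neighbors(p, step))
    return results, experiments


def toy_experiment(x: float, y: float) -> Perf:
    # Hypothetical trade-off: the first measure grows with x + y,
    # the second shrinks as x and y approach 1.
    return x + y, (1 - x) ** 2 + (1 - y) ** 2


frontier, n_experiments = active_sample(toy_experiment, n0=4, levels=2)
print(len(frontier), n_experiments)
```

Because only the neighbors of undominated points are queued after the initial seeding, the finer lattices are sampled only near the current frontier approximation rather than over the whole parameter space.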
13
Active Sampler (cont’d) [Figure panels: Initial Grid, 1st-level results, First Refinement, 2nd-level results, 2nd Refinement, 3rd-level results]
14
Active Sampler (cont’d) [Figure panels: 1st-level Pareto Frontier, 2nd-level Pareto Frontier, 3rd-level Pareto Frontier] ● Progressive approximation ● Enables early decisions based on partial results
15
Evaluation ● Ease of Integration ● Benefits of Active Sampling ● Benefits of Checkpoint reuse ● Early Resolution of Pareto Frontier ● Scaling behaviour Notes: Simulation implemented with the PETSc SNES solver. Jump-start from checkpoints = use a checkpoint's configuration as the starting guess. Sampler evaluation: the Active Sampler is as described; the grid-based sampler performs a parameter sweep on the grid with increasing refinement. Both run until the 4th refinement. Scaling behaviour: run on 1 to 128 simulation containers (processors) with 4 SISOL servers and 1 manager.
16
Evaluation Testbed ● Hardware: Max cluster – 256 nodes – 2 GB RAM and two 2.2 GHz PowerPC 970 processors per node – Myrinet interconnect ● Software: – 1 to 128 Simulation Processes – 4 SISOL Servers – 1 Manager Process
17
Evaluation: Integration [Diagram labels: Front-end, Manager Process, Simulation Process Pool, Visualisation & Interaction Module, Active Sampler, Resource Allocator, FUEL Interface, SISOL Server Pool, Xform Module, Simulation Module, FUEL Interface, SISOL Server, Simulation Process]
18
Evaluation: Active Sampling Notes: Axes: r0 and r1. Grey dots = sampled points, blue dots = current Pareto frontier approximation. After 1st refinement. Panels: Active Sampler, Grid Sampler. # Experiments: 1735 # Experiments: 1727
19
Evaluation: Active Sampling After 2nd refinement. Panels: Active Sampler, Grid Sampler. # Experiments: 4950 # Experiments: 2584
20
Evaluation: Active Sampling After 3rd refinement. Panels: Active Sampler, Grid Sampler. # Experiments: 18632 # Experiments: 4243
21
Evaluation: Active Sampling After 4th refinement. Panels: Active Sampler, Grid Sampler. # Experiments: 75351 # Experiments: 4526
22
Evaluation: Checkpoint Reuse Time-to-level = wall clock time required by this configuration to refine the Pareto frontier 4 times. All measurements taken from 128-processor runs. ● Wall clock time cut by an order of magnitude
23
Evaluation: Early Resolution Evolution of the partial frontier, measured as the Hausdorff distance to the final result. Hausdorff distance = the greatest distance from a point in one set to the nearest point in the other set
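A small sketch of this metric for finite point sets, using the common symmetric form (the larger of the two directed distances) and Euclidean distance. Whether the study used the symmetric or a one-sided form is not stated on the slide, and the sample frontiers below are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two finite, non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical partial and final frontiers in performance space.
partial = [(0.0, 0.0), (1.0, 0.0)]
final = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff(partial, final))
```

Every point of the partial frontier also lies on the final one, so the distance is set by the final point (1.0, 2.0), which is 2.0 away from its nearest partial point.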
24
Evaluation: Scaling Behaviour Conclusion: Reasonably scalable up to 128 processing elements ● Reasonable scalability up to 128 simulators ● Algorithmic overhead ● Non-uniform problem sizes ● Communication overhead
25
Conclusions ● Multiple simulation experiments managed automatically in a Computational Study ● Exploit Domain Knowledge via a more permeable system interface: – Re-use of checkpoints – Active sampling – Partial results ● Interactive speed (was 5678s, now 13s or lower)
26
Future Work ● Componentise SimX (SCIRun & SCIRun2 framework) ● Interactivity ● Support for larger-scale studies – Distributed cache-based implementation of SISOL – Support for parallel simulations – MPI-based communication substrate ● Other Computational Studies (Defibrillator)
27
SimX – Internal View [Diagram labels: Shared Object Layer, Front-end, Manager, Simulation Processes Pool, Visualisation & Interaction Module, Active Sampler, Resource Allocator, Simulation Module (×4)]
28
SimX – Shared Object Layer Shared Object Layer Front-end Manager Simulation Processes Pool Visualisation & Interaction Module Active Sampler Resource Allocator Simulation Module Simulation Module Simulation Module Simulation Module
29
SimX – Explicit Communication Shared Object Layer Front-end Manager Simulation Processes Pool Visualisation & Interaction Module Active Sampler Resource Allocator Simulation Module Simulation Module Simulation Module Simulation Module
30
SimX – Explicit Communication ● While a simulator runs the simulation for one parameter point, the manager processes the previous result(s) and sends it the next point. [Timeline: Simulator Process: calculate point i, then calculate point i+1. Manager Process: while point i is calculated, register point i-1 with the Active Sampler and get point i+1 from the Resource Allocator; while point i+1 is calculated, register point i with the Active Sampler and get point i+2 from the Resource Allocator.]
31
SimX – Active Sampler Shared Object Layer Front-end Manager Simulation Processes Pool Visualisation & Interaction Module Active Sampler Resource Allocator Simulation Module Simulation Module Simulation Module Simulation Module