Published by Clement May. Modified over 9 years ago.
1
Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)
2
Simecs – Problem Description ● Multi-Experiment Computational Studies: – Computational studies involving multiple experiments, each corresponding to an individual execution of simulation software ● Example: Design Space Exploration – Goal: Given a set of possible parameter values (a parameter space) and an experiment that maps a parameter value to a performance metric, find the subset of the parameter space whose performance metrics fit certain criteria.
3
Simecs – Problem Description ● Model Application: Pareto Frontier Discovery. ● The Pareto frontier is the set of points in the parameter space that are not completely dominated by any other point in the parameter space. – p “completely dominates” q iff every component of p's performance metric is better than the corresponding component of q's.
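The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, not the pSIMECS code: it assumes that "better" means "smaller" for every metric component (as with deflection and cost), and the names `completely_dominates` and `pareto_frontier` are hypothetical. For brevity it filters performance metrics directly; in the study each metric would be tied to its parameter point.

```python
from typing import List, Sequence, Tuple

Metric = Tuple[float, ...]  # one performance metric; smaller is better (assumption)


def completely_dominates(p: Metric, q: Metric) -> bool:
    """True if every component of p is strictly better (smaller) than q's."""
    return all(a < b for a, b in zip(p, q))


def pareto_frontier(metrics: Sequence[Metric]) -> List[Metric]:
    """Keep the metrics that no other metric completely dominates."""
    return [q for q in metrics
            if not any(completely_dominates(p, q) for p in metrics)]


# Toy data: (maximum deflection, support cost) pairs, values invented for illustration.
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
frontier = pareto_frontier(points)
```

Here `(3.0, 4.0)` and `(5.0, 5.0)` drop out because `(2.0, 3.0)` is strictly better in both components.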
4
Simecs – Pareto Frontier Insights ● Simulations are independent – embarrassingly parallel ● An experiment corresponds to one execution of simulation software, which can itself be parallel or sequential ● The result of one simulation can be used to speed up simulations at nearby parameter values (e.g., as the initial guess for a Newton iteration).
5
Simecs – Pareto Frontier Insights ● Decisions can be made with imprecise results: precision can be traded off against resources ● If the parameter space is large, sweeps are inefficient. ● Need to prune portions of the space as the study progresses, either automatically or interactively. ● The Active Sampler can automatically pick "interesting" simulations (e.g., close to the boundary)
6
Simecs – Example Problem ● Bridge design computational study: 1D bridge in 2D space, with end points clamped. Two elastic supports are added to the middle of the bridge. ● Parameter space: distances of the two supports from the end of the bridge. ● Performance measures: maximum deflection of the bridge, and the cost of the supports ● Bridge is clamped at all support points, with bending and stretching forces, and uniform load.
7
Simecs – Example Problem Test Problem. Parameter: Performance metric:. Cost function: c(r)
8
Simecs – Goal ● Simecs: Software on parallel systems that manages simulation processes in a Multi-Experiment Computational Study. ● Frees users and application developers from micromanaging every simulation process ● Goal: Interactive, Steerable Design Space Exploration
9
Simecs – User View ● Two types of parameters – technique parameters (e.g., discretisation of nodes, convergence tolerance) – model parameters (e.g., Young's modulus of a material, viscosity of a fluid). ● Goal: As the Pareto frontier obtained from one set of parameters is forming, the user can switch to another setup and continue the study. – e.g., limit the exploration space but increase the resolution.
10
Simecs – Developer View ● Application Developer provides 3 modules: – Simulation: Maps a parameter space point to a performance space point – Visualisation & interaction: Displays the relevant information to the user; collects information from the user and maps it into the Simulation module – Transformation: Transforms the state of a simulation under one set of technique parameters into the state under another. ● e.g., interpolate checkpoints from different resolutions
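As an illustration of what a Transformation module might do, the sketch below linearly interpolates a 1D checkpoint onto a grid with a different number of nodes. It is only an assumed example: the slides do not say how checkpoints are interpolated, and `resample_checkpoint` is a hypothetical name.

```python
from typing import List, Sequence


def resample_checkpoint(values: Sequence[float], n_new: int) -> List[float]:
    """Linearly interpolate nodal values onto a uniform grid with n_new nodes.

    Both grids are assumed to be uniform over the same interval and to have
    at least two nodes.
    """
    n_old = len(values)
    if n_old < 2 or n_new < 2:
        raise ValueError("need at least two nodes on each grid")
    out = []
    for i in range(n_new):
        t = i * (n_old - 1) / (n_new - 1)   # position in old-grid index units
        lo = min(int(t), n_old - 2)         # left neighbour on the old grid
        frac = t - lo                       # weight of the right neighbour
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


coarse = [0.0, 2.0, 4.0]                    # checkpoint on 3 nodes
fine = resample_checkpoint(coarse, 5)       # same state on 5 nodes
```

The interpolated state is not a converged solution at the new resolution; it only serves as a starting point for the next run.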
11
Simecs – System View ● Shared object layer, Active sampler, Resource Allocator
12
Simecs – System View ● Shared object space layer: System-wide repository of shared objects (e.g., checkpoints, error estimates, results) ● Sampler: Based on the user's specifications, issues sample points where simulations will be run ● Resource Allocator / Manager: Maps simulations onto computing elements and decides whether to use a checkpoint.
13
Simecs – SISOL ● Spatially-Indexed Shared Object Layer (SISOL) ● Used for storing system-wide shared objects. ● For the model problem, these are checkpoints and results (the performance metric at each parameter point). ● names a unique object in the system.
14
Simecs – SISOL ● Objects are typed: SISOL requires pack() and unpack() implementations for each type. For parallel object types, also requires a function to map parallel objects into different decompositions. ● Supports split-phase create, delete, read and write: to enforce read-modify-write consistency ● Supports neighborhood query
15
Simecs – SISOL Implementation ● Ideal implementation: directory-based cache, where each node participates in storing of objects. ● Current implementation: – Single TCP Server – In core – Hash-map based lookup – Linear lookup for nearest neighbor – Supports only sequential objects
16
Simecs – SISOL Implementation – Object sets created on server – Nearest neighbor query retrieves coordinates only – Supports the sequential PETSc vector object type by default. ● Sufficient for small sets, small objects
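The "current implementation" described above (in-core, hash-map lookup, linear nearest-neighbor search that returns coordinates only) can be sketched as follows. This is a single-process stand-in, not the SISOL API: the class and method names are hypothetical, `pickle` stands in for the per-type pack() and unpack() functions, and the TCP server and split-phase operations are left out.

```python
import math
import pickle
from typing import Any, Dict, Optional, Tuple

Coord = Tuple[float, ...]  # spatial index of an object, e.g. a parameter point


class ObjectSet:
    """In-core object set keyed by coordinates (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._store: Dict[Coord, bytes] = {}    # hash-map based lookup

    def write(self, coord: Coord, obj: Any) -> None:
        self._store[coord] = pickle.dumps(obj)  # stand-in for pack()

    def read(self, coord: Coord) -> Any:
        return pickle.loads(self._store[coord])  # stand-in for unpack()

    def delete(self, coord: Coord) -> None:
        self._store.pop(coord, None)

    def nearest(self, coord: Coord) -> Optional[Coord]:
        """Linear scan; returns only the coordinates of the closest object."""
        if not self._store:
            return None
        return min(self._store, key=lambda c: math.dist(c, coord))


checkpoints = ObjectSet()
checkpoints.write((0.0, 0.0), [1.0, 2.0])
checkpoints.write((1.0, 1.0), [3.0])
closest = checkpoints.nearest((0.9, 0.8))
payload = checkpoints.read(closest)
```

Returning coordinates from `nearest` keeps the query cheap; the caller fetches the (possibly large) object with a separate `read` only if it decides to use it.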
17
Simecs – SISOL Use ● Current Pareto Frontier problem uses two object sets: – Result set (parameter point => performance metric) – Checkpoint set (parameter point => sequential PETSc vectors) ● In the test problem, a parameter point is a 2D vector, so the result set and checkpoint set have 2D indices.
18
Simecs – FUEL ● Frame/Update Exchange Layer: Control layer between the manager and simulation processes ● Codes that represent a functional aspect of a steerable application are grouped together (called a Satellite). ● Event-based on manager process; poll-based on simulation processes ● Dynamic model: Satellites can be activated and decommissioned as a simulation is running
19
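Simecs – FUEL Interaction ● As the simulator runs the simulation for one parameter point, the manager is processing the previous one(s). ● Timeline (figure): – Simulator process: calculate point X, then point Y, then point Z. – Manager process: while X is being calculated, query the sampler and get point Y; when the X result arrives, register it, query the sampler and get point Z; when the Y result arrives, register it, query the sampler and get point A. – Messages: the simulator sends the X, Y and Z results; the manager replies with points Y, Z and A.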
20
Simecs – Active Sampler ● Resolves the Pareto frontier progressively – Maintains a task queue and a result set – Task queue = points of interest in the parameter space; result set = points discovered so far that are undominated (i.e., current Pareto set candidates) – Seeds the task queue with points from a lattice on the parameter space. – Runs the task queue.
21
Simecs – Active Sampler – For each result that comes back, decide whether the point is undominated by all points in the result set. If so, remove all points in the result set that are dominated by it, add it to the result set, and insert its lattice neighbors into the task queue. – Continue until the task queue is empty. – Refine the lattice, then repeat ● Effect: the result set contains Pareto point candidates that originated from a lattice. The lattice becomes finer as more time is spent.
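The procedure above can be sketched sequentially as follows. This is a rough illustration under several assumptions that the slides do not state: metrics are minimised, the lattice is a square integer grid whose spacing is halved at each refinement, and after a refinement the queue is reseeded with the finer-lattice neighbors of the current candidates. All names, including the toy simulation, are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterator, Set, Tuple

Point = Tuple[int, int]      # lattice indices on the finest grid
Metric = Tuple[float, ...]   # performance metric; smaller is better (assumption)


def dominates(p: Metric, q: Metric) -> bool:
    return all(a < b for a, b in zip(p, q))


def lattice_neighbors(pt: Point, step: int, size: int) -> Iterator[Point]:
    x, y = pt
    for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
        nx, ny = x + dx, y + dy
        if 0 <= nx <= size and 0 <= ny <= size:
            yield (nx, ny)


def active_sample(simulate: Callable[[Point], Metric],
                  size: int = 16, seed_step: int = 8, levels: int = 4):
    results: Dict[Point, Metric] = {}   # current Pareto candidates
    evaluated: Set[Point] = set()       # every point simulated so far
    step = seed_step
    # Seed the task queue with a coarse lattice.
    queue = deque((x, y) for x in range(0, size + 1, step)
                  for y in range(0, size + 1, step))
    for level in range(levels):
        if level > 0:
            # Refine the lattice and reseed around the candidates (assumption).
            step = max(1, step // 2)
            for pt in list(results):
                queue.extend(lattice_neighbors(pt, step, size))
        while queue:
            pt = queue.popleft()
            if pt in evaluated:
                continue
            evaluated.add(pt)
            metric = simulate(pt)
            if any(dominates(r, metric) for r in results.values()):
                continue                # dominated: discard, do not expand
            for old in [k for k, r in results.items() if dominates(metric, r)]:
                del results[old]        # drop candidates the new point dominates
            results[pt] = metric
            queue.extend(lattice_neighbors(pt, step, size))
    return results, evaluated


def toy_simulate(pt: Point) -> Metric:
    """Invented stand-in for a simulation: the frontier is the row y == 8."""
    x, y = pt
    off = abs(y - 8)
    return (float(x + off), float(16 - x + off))


candidates, evaluated = active_sample(toy_simulate)
```

On this toy problem the sampler ends with the whole row `y == 8` as candidates while simulating only a fraction of the 17 × 17 grid, which is the pruning effect the slides describe. In the real system the queue would be drained by many simulation processes at once rather than by a single loop.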
22
Simecs – Active Sampler Initial Grid
23
Simecs – Active Sampler 1st-level results
24
Simecs – Active Sampler First Level Pareto Frontier
25
Simecs – Active Sampler First Refinement
26
Simecs – Active Sampler 2nd-level results
27
Simecs – Active Sampler Second level Pareto Frontier
28
Simecs – Active Sampler 2nd Refinement
29
Simecs – Active Sampler 3rd-level results
30
Simecs – Active Sampler 3rd-level Pareto Frontier
31
Simecs – Manager ● Spawns off simulation processes ● When the result of a simulation comes back (via a FUEL callback): – Registers the result – Asks the active sampler for the next point to run – Looks up the SISOL for a checkpoint to jump-start the next point – Sends the parameters of the next simulation, the coordinates of the checkpoint, and the error tolerances to the simulation process.
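The four callback steps can be sketched as a single method. This is a hypothetical stand-in rather than the FUEL or SISOL interface: a plain queue replaces the active sampler, a set of coordinates replaces the checkpoint set, it assumes every finished simulation leaves a checkpoint at its parameter point, and the returned dictionary stands for the message sent to the simulation process.

```python
import math
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

Point = Tuple[float, ...]


class Manager:
    """Hypothetical manager-side callback logic."""

    def __init__(self, tasks, tolerance: float) -> None:
        self.tasks: Deque[Point] = deque(tasks)   # stand-in for the sampler
        self.results: Dict[Point, tuple] = {}     # stand-in for the result set
        self.checkpoints: Set[Point] = set()      # coordinates with a checkpoint
        self.tolerance = tolerance

    def on_result(self, point: Point, metric: tuple) -> Optional[dict]:
        """Called when a simulation process reports a result."""
        self.results[point] = metric               # 1. register the result
        self.checkpoints.add(point)                # assumed: run left a checkpoint
        if not self.tasks:
            return None                            # nothing left to run
        nxt = self.tasks.popleft()                 # 2. next point from the sampler
        nearest = min(self.checkpoints,            # 3. closest stored checkpoint
                      key=lambda c: math.dist(c, nxt))
        return {"point": nxt,                      # 4. message to the simulator
                "checkpoint": nearest,
                "tolerance": self.tolerance}


manager = Manager([(0.5, 0.5), (0.9, 0.9)], tolerance=1e-6)
first = manager.on_result((0.0, 0.0), (1.0, 2.0))
second = manager.on_result((0.5, 0.5), (0.8, 1.5))
third = manager.on_result((0.9, 0.9), (0.7, 1.9))
```

Only checkpoint coordinates travel in the message; the simulation process would fetch the checkpoint itself from the shared object layer.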
32
Simecs – Test System ● Single-server implementation of SISOL to store the checkpoint set ● 3 sampler versions: Active, Random, and Sweep ● TCP-based FUEL ● Simulation implemented with the PETSc SNES solver. ● Jump-start from checkpoints = use the checkpoint's configuration as the starting guess
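Why a nearby checkpoint helps can be seen on a scalar toy problem. The sketch below uses a hand-written Newton iteration, not PETSc SNES, and an invented equation x³ + x = r; it compares the iteration count from a cold start with the count when starting from the solution at a neighboring parameter value.

```python
from typing import Callable, Tuple


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, int]:
    """Plain Newton iteration; returns the root and the number of updates used."""
    x = x0
    for iteration in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, iteration
        x -= fx / df(x)
    raise RuntimeError("Newton iteration did not converge")


def residual(r: float) -> Callable[[float], float]:
    return lambda x: x ** 3 + x - r        # toy model: solve x^3 + x = r


def jacobian(x: float) -> float:
    return 3.0 * x ** 2 + 1.0


# "Checkpoint": the solution at parameter r = 10.0 is x = 2.0.
checkpoint = 2.0

# New simulation at the nearby parameter r = 10.1.
_, cold_iterations = newton(residual(10.1), jacobian, x0=0.0)
root, warm_iterations = newton(residual(10.1), jacobian, x0=checkpoint)
```

The warm start needs fewer updates because the initial guess already lies close to the new root, where Newton's method converges quadratically.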
33
Simecs – Test System ● Heterogeneous cluster: – One 1.5 GHz Athlon node (manager, SISOL server) – 22 1.2 GHz Duron nodes (simulation processes) – 10 3 GHz Pentium 4 nodes (simulation processes) – 100 Mbps switched Ethernet network between Athlon and Duron nodes, 10 Mbps Ethernet between Pentium 4 nodes.
34
Simecs – Test Result (Sampler) ● Active Sampler compared against: 1) Grid-based sampler, which performs a parameter sweep on the grid with increasing refinement, 2) Random sampler ● Each sampler is run for 1500 simulations, and the partial frontiers are dumped at periodic intervals. The Hausdorff distance is measured, using the final Active Sampler-based frontier with 1500 simulations as the ground truth.
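For reference, the Hausdorff distance between two finite point sets can be computed as below. This is a brute-force sketch with Euclidean distance; the slides do not say which space (parameter or performance) or which point metric was used, so both are assumptions, and the sample frontiers are invented.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty finite sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


partial_frontier = [(0.0, 0.0), (1.0, 0.0)]
reference_frontier = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
distance = hausdorff(partial_frontier, reference_frontier)
```

The distance is driven by the reference point `(1.0, 2.0)` that the partial frontier has not yet reached, so it shrinks toward zero as a sampler fills in the frontier.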
35
Simecs – Test Result (Sampler)
46
Simecs – Test Results (Sampler)
52
Simecs – Test Result (Checkpoints) ● Reusing checkpoints cuts down the number of iterations per simulation.
53
Simecs – Test Result (Scaling) Duron nodes added (Slower speed, faster communication)
54
Simecs – Test Result (Scaling)
55
Simecs – Conclusions ● Multiple experiments can be managed automatically ● Interactive speed can be achieved via re-use of checkpoints, active sampling, and partial results – run time goes from 3088 seconds down to 17, and lower if partial frontiers can be used
56
Simecs – Conclusions ● The TCP-based communication framework gives the system portability: it can be used on heterogeneous clusters ● Spatially-indexed object sets are a useful communication substrate
57
Simecs – Future work ● Distributed implementation of SISOL ● Parallelise individual simulations (SISOL Support for Parallel Objects) ● MPI-based communication for SISOL and FUEL ● Interactivity