Published by Beatrice Oliver. Modified over 9 years ago.
1
Towards large-scale parallel simulated packings of ellipsoids with OpenMP and HyperFlow
Monika Bargieł (1), Łukasz Szczygłowski (1), Radosław Trzcionkowski (1), Maciej Malawski (1,2)
(1) Department of Computer Science, (2) Academic Computer Centre Cyfronet
AGH University of Science and Technology
CGW 15, Kraków, 28 October 2015
2
Outline
- Packing of ellipsoids – problem description
- Parallelization requirements
- Thread-level parallelization with OpenMP
- Task-level parallelism using HyperFlow
- Workflow execution using pilot jobs
- Experiments on Zeus cluster and results
3
Packing of ellipsoids – problem description
Our goal is to obtain the highest packing fraction of ellipsoids of different shapes (axes ratio) while still preserving the randomness of the bed (in the sense of position and spatial orientation). For this purpose, the force-biased algorithm was adapted.
4
Algorithm description
Time step of the calculation:
- calculate the ‘forces’ between pairs of overlapping particles, proportional to the size of the overlap,
- move (and possibly rotate) particles according to the resultant ‘forces’,
- reduce the particles’ diameter slightly (a small reduction rate increases the density AND the execution time).
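The three steps above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the authors' C++ code: it uses equal spheres instead of ellipsoids (so there is no rotation), has no periodic boundaries, and the names `timeStep`, `forceScale` and `shrinkRate` are hypothetical.

```javascript
// Sketch of one force-biased time step for equal spheres.
// Assumptions (not from the slides): spheres, no rotation, no periodic box,
// hypothetical parameter names.
function timeStep(particles, diameter, forceScale, shrinkRate) {
  const forces = particles.map(() => [0, 0, 0]);

  // Step 1: 'forces' between overlapping pairs, proportional to the overlap.
  for (let i = 0; i < particles.length; i++) {
    for (let j = i + 1; j < particles.length; j++) {
      const d = particles[j].map((x, k) => x - particles[i][k]);
      const dist = Math.hypot(d[0], d[1], d[2]);
      const overlap = diameter - dist;
      if (overlap > 0 && dist > 0) {
        for (let k = 0; k < 3; k++) {
          const f = (forceScale * overlap * d[k]) / dist;
          forces[i][k] -= f; // push i away from j
          forces[j][k] += f; // push j away from i
        }
      }
    }
  }

  // Step 2: move every particle along its resultant 'force'.
  const moved = particles.map((p, i) => p.map((x, k) => x + forces[i][k]));

  // Step 3: reduce the diameter slightly.
  return { particles: moved, diameter: diameter * (1 - shrinkRate) };
}

// Usage: two overlapping spheres are pushed apart along the x axis.
const next = timeStep([[0, 0, 0], [0.5, 0, 0]], 1.0, 0.1, 0.01);
console.log(next.particles, next.diameter);
```

The double loop over pairs is the expensive part of each step, which is where thread-level parallelization pays off.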
5
Requirements for parallelization
Single simulation
- Computing overlapping regions between ellipsoids
- C++ code, in-house developed
- Multiple-nested loops
Task-level parallelism (parameter study)
- Need to execute multiple simulations
- Vary particle shape, rotation factor, etc.
- Repeated runs to gather better statistics
6
Parallelization with OpenMP
Sequential version: 8 minutes
Using parallel for:

    #pragma omp parallel for private(ipart) schedule(static)
    for (ipart = 0; ipart < No_parts; ipart++) {
        // loop body not shown on the slide
    }

Run time after parallelization:
- forces() method: down to 3 min 38 sec
- motion() method: down to 3 min 24 sec
- force_all() method: down to 3 min 23 sec
7
OpenMP speedup
Parallel speedup on a single node for a system of 10000 molecules
Zeus cluster node: 2 x 6-core Intel Xeon L5640 processors (12 cores total)
Intel compiler
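As a worked example of what speedup means here, the run times quoted on the OpenMP slide can be turned into ratios. The sketch assumes that "8 minutes" is an exact sequential baseline; the slides do not say how many threads produced those timings, so parallel efficiency is not computed. The names `speedup` and `parallelStages` are illustrative.

```javascript
// Wall-clock times taken from the OpenMP slide, converted to seconds.
const sequentialSeconds = 8 * 60; // "Sequential version: 8 minutes"
const parallelStages = [
  { stage: "forces()", seconds: 3 * 60 + 38 },
  { stage: "motion()", seconds: 3 * 60 + 24 },
  { stage: "force_all()", seconds: 3 * 60 + 23 },
];

// Speedup = sequential run time / parallel run time.
function speedup(tSequential, tParallel) {
  return tSequential / tParallel;
}

for (const { stage, seconds } of parallelStages) {
  const s = speedup(sequentialSeconds, seconds).toFixed(2);
  console.log(`${stage}: speedup ${s}`);
}
// forces(): speedup 2.20
// motion(): speedup 2.35
// force_all(): speedup 2.36
```

Most of the gain comes from parallelizing forces(); the later two methods add comparatively little.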
8
HyperFlow - introduction
- Simple high-level workflow description + low-level programming capabilities for advanced developers
- Skilled programmers can be as productive as in any mainstream programming language
- Lightweight, non-invasive workflow deployment model that can be applied to various cloud platforms / infrastructures
- Processes = workflow activities
- Connected through signals (ins and outs)
- Can be mapped to commands OR JavaScript functions (Node.js)
- Simple JSON format, easy to generate
- Supports large-scale workflows

    {
      "processes": [
        {
          "name" : "ComputeStats",
          "ins" : [ "Data" ],
          "outs" : [ "Statistics" ],
          "config" : {
            "command" : {
              "executable" : "cstats.sh",
              "args" : "$Data_filename"
            }
          }
        },
        {
          "name" : "PlotChart",
          "ins" : [ "Statistics" ],
          "outs" : [ "Charts" ],
          "function" : "plotCharts.js"
        }
      ],
      "signals": [
        { "name" : "Data" },
        ...
      ]
    }
9
Parameter-study workflow
- Command tasks: call external executables
- Function tasks: evaluated in the workflow engine
10
Workflow generator using template
- Generation of Cartesian product of parameters
- Configurable repetition for averaging

    var ifCoord = [ "1" ];
    var numberOfParticles = [ "1000" ];
    var numberOfSpecies = [ "1" ];
    var forceScalingFactor = [ "0.1" ];
    var rotationScalingFactor = [ "3.00" ];
    var diameterIncreasingFactor = [ "0.01" ];
    var cellsX = [ "20", "40" ];
    …
    var diameterOfParts3 = [ "1.0" ];
    var numberOfLinesPerPage = [ "56" ];
    var numberOfStepsBetweenPrintouts = [ "100" ];
    var numberOfStepsBetweenCoord = [ "1000000" ];
    var numberOfStepsBetweenRotations = [ "1" ];
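A minimal sketch of how such a generator could expand the parameter lists into individual runs. It is not the authors' generator: the helper names `cartesianProduct` and `generateRuns`, the `repetition` field, and the repetition count of 3 are assumptions made for illustration, and only three of the parameters from the slide are used.

```javascript
// Expand { name: [values...] } into every combination of one value per name.
function cartesianProduct(params) {
  return Object.entries(params).reduce(
    (combos, [name, values]) =>
      combos.flatMap((combo) => values.map((v) => ({ ...combo, [name]: v }))),
    [{}]
  );
}

// Repeat every combination `repetitions` times so results can be averaged.
function generateRuns(params, repetitions) {
  return cartesianProduct(params).flatMap((combo) =>
    Array.from({ length: repetitions }, (_, rep) => ({ ...combo, repetition: rep }))
  );
}

// Subset of the parameters shown on the slide; cellsX has two values.
const params = {
  numberOfParticles: ["1000"],
  forceScalingFactor: ["0.1"],
  cellsX: ["20", "40"],
};

const runs = generateRuns(params, 3); // 3 repetitions is an assumed value
console.log(runs.length); // 6
```

Each element of `runs` would then be turned into one command task of the workflow, so the number of tasks is the product of the list lengths times the repetition count.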
11
Setup on Zeus using pilot jobs
- Master node executed as interactive job
- Worker nodes submitted as batch jobs
- Using parallel file system for data exchange
- PBS scripts for submission
TODO: diagram
12
Sample workflow execution
- Experiment with 40 computing tasks
- 40 pilot jobs submitted, 12 cores, 3 hours each
- Max 10 jobs running concurrently
- Total time < 2 hours
- Mean execution time 18 minutes
- Total 144 core-hours
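The figures on this slide are mutually consistent, which a short calculation shows. The sketch assumes that every task runs for exactly the mean time and occupies all 12 cores of its job; the function names are illustrative.

```javascript
// Figures from the slide.
const tasks = 40;
const meanTaskMinutes = 18;
const coresPerJob = 12;
const maxConcurrentJobs = 10;

// Core-hours consumed if every task runs for the mean time on all cores.
function coreHours(nTasks, minutesPerTask, cores) {
  return ((nTasks * minutesPerTask) / 60) * cores;
}

// Ideal wall time if tasks run in full waves of `concurrency` jobs.
function idealWallMinutes(nTasks, minutesPerTask, concurrency) {
  return Math.ceil(nTasks / concurrency) * minutesPerTask;
}

console.log(coreHours(tasks, meanTaskMinutes, coresPerJob)); // 144
console.log(idealWallMinutes(tasks, meanTaskMinutes, maxConcurrentJobs)); // 72
```

The 144 core-hours match the slide, and the ideal 72 minutes for four waves of ten jobs is compatible with the reported total time of under 2 hours once queueing delays are included.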
13
Density packing results
Semiaxes: 1 : a^b : a, with b = 0 (prolate) and b = 1 (oblate)
Particles with a given axes ratio have a unique random packing density.
Ellipsoids with an axes ratio of ≈ 1.7 and ≈ 0.5 can be packed the densest.
14
Conclusions
- Packing of ellipsoids proved to be a good application to parallelize
- Hybrid parallel model used
  - OpenMP within a single node
  - HyperFlow for large-scale workflow
- New deployment model of HyperFlow with pilot jobs tested
- 5000 CPU hours on Zeus consumed so far
15
Future work
- More large-scale runs
- Better automation of pilot-job management
- Generalization of parameter-study workflow
  - Support for sensitivity analysis
- Deployment and tests on other infrastructures
  - Clouds
  - Containers
- Other parallelization options
  - GPU, CUDA
16
References
1. Bartosz Balis, "HyperFlow: A model of computation, programming approach and enactment engine for complex distributed workflows", Future Generation Computer Systems, Volume 55, February 2016, Pages 147-162, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2015.08.015
2. Mościński, J., Bargieł, M., "C-language program for simulation of irregular close packing of hard spheres", Computer Physics Communications 64 (1991) 183-192.