Automatic launch and tracking the computational simulations with LiFlow and Sumatra Evgeniy Kuklin
Living system simulation Significant computational resources. High degree of qualification in CS. Numerous computational experiments on the same model. Routine and time-consuming data preparation. Low reproducibility of numerical experiments. Living system simulation is an important part of modern science. However, scientists who work in this field face some challenges. Simulation of living systems requires significant computational resources. Such investigations are rather time-consuming and thus hard to conduct without parallel computing systems and supercomputers. However, using parallel computing systems requires a high degree of qualification in computer science, which many researchers involved in living systems modeling do not possess or do not wish to acquire. Simulations often require numerous computational experiments on the same model with different parameter values. Thus, data preparation for computational experiments is routine and time-consuming. Another important problem that researchers often face is the non-reproducibility of computational experiments. Scientists devote little time to recording experimental details, especially when there are hundreds of experiments. In addition, many other factors can affect computational results.
Personalized heart models project IIP, IMM UrB RAS; UrFU. Biophysicists, mathematicians, software developers. Heart simulation software for parallel computing systems. Only professional computer scientists can use it. Almost zero reproducibility of experiments. An example that confirms these simulation problems is the Personalized heart models project, a joint effort of the Institute of Immunology and Physiology, the Krasovskii Institute of Mathematics and Mechanics, and Ural Federal University. The project employs biophysicists, mathematicians, software developers, and many other people with different specializations. We have heart simulation software for parallel computing systems, but it is rather complex and can only be used by professional computer scientists. Dozens of different parameters in hundreds of experiments reduce the reproducibility to almost zero.
Personalized heart models project Requirements for biophysicists & mathematicians: Simple graphical user interface. Ability to execute a series of experiments with various parameter values. Reproducibility requirements: Archive data storage. Metadata capturing for easy replication. Therefore, there was a need for an automated launch system with the following requirements: Storing all obtained data in the archive
LiFlow LiFlow – LIving system simulation workFLOW: Graphical interface for the users. Executing a series of experiments on parallel computing systems. Easy to learn and use. To address the first two requirements, we developed LiFlow, a LIving system simulation workFLOW. The tool must be easy to learn and use; otherwise, busy scientists will not spend their time exploring its capabilities.
Related work Scientific workflow systems: Taverna, Kepler, Triana. Systems for reproducibility of computational experiments: CDE, Madagascar, Sumatra. To bridge the gap between researchers and software engineers, scientific computation workflow systems are being developed. The most frequently used among them are Taverna, Kepler, and Triana. However, such systems are very complicated and difficult to install, maintain, and use. Their main disadvantage is that creating a new component can require considerable effort and detailed knowledge of the workflow system architecture. Such systems are not suitable for our simplified workflow. There are also special software tools for improving the reproducibility of computational experiments. To address the reproducibility requirements of our system, we decided to integrate LiFlow with the Sumatra tool, because it is focused on numerical computations.
Sumatra Capture the information required to recreate the computational experiment environment instead of capturing the experimental context itself. Maintain an experiment catalog with the ability to search by tags or data. Store obtained data in the archive. (-) Lacks a convenient desktop user interface. Sumatra is an open-source tool to support reproducible computational research. Unfortunately, Sumatra lacks a convenient desktop user interface.
Workflow in LiFlow The LiFlow system is designed for the simplified workflow shown on the slide. During the first stage, researchers prepare the description of the experiment series, which is a set of experiments with the same model and varying parameter values.
Computational Package The information required to execute a series of experiments: Source code (git repository). Initial data and parameters. Generator of experiment series. Supported generators: Parameter: initial value, final value, and increment. Explicit parameter values. LiFlow uses the concept of a computational package that contains all the information required to execute a series of experiments. The LiFlow computational package consists of the following components: the source code (git repository), the initial data and parameters, and the generator of the experiment series. A distinctive feature of the LiFlow computational package is that it describes not one experiment but a whole series. The parameter values for every experiment are produced by the generators.
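The two supported generator kinds can be illustrated with a minimal Python sketch. It is not LiFlow's actual generator code: the function names (`range_generator`, `explicit_generator`, `expand_series`) are hypothetical, and combining several parameters as a Cartesian product is an assumption made only for this illustration.

```python
from itertools import product
from typing import Dict, List, Sequence


def range_generator(initial: float, final: float, increment: float) -> List[float]:
    """Values from initial to final (inclusive) in steps of increment."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    if final < initial:
        raise ValueError("final must not be smaller than initial")
    # Compute the number of values first so that floating-point error
    # does not accumulate through repeated addition.
    count = int((final - initial) / increment + 1e-9) + 1
    return [round(initial + i * increment, 10) for i in range(count)]


def explicit_generator(values: Sequence[float]) -> List[float]:
    """Explicitly listed parameter values, returned unchanged."""
    return list(values)


def expand_series(generators: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """One parameter set per experiment (assumption: Cartesian product)."""
    names = list(generators)
    return [dict(zip(names, combo)) for combo in product(*generators.values())]


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    series = expand_series({
        "stiffness": range_generator(1.0, 2.0, 0.5),
        "pacing_rate": explicit_generator([60, 75]),
    })
    for params in series:
        print(params)
```

Each dictionary in the resulting list would correspond to the input data of one experiment in the series.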
How LiFlow works The user provides the computational package. The computational package is transferred to the supercomputer using SSH. The source code is built on the supercomputer. The experiment series generator (Python script) is executed and produces the input data with different parameter values. A Sumatra project is set up to store the environment. The jobs are put into the supercomputer resource manager (SLURM) queue. After job completion, Sumatra compresses the obtained data and places it in the archive using the NFS protocol. The user receives an e-mail notification. A few words on how LiFlow works. 2) ...with the Paramiko library 3) ...If the build process fails, LiFlow warns the user and sends the build log file back to the user. In the case of a successful compilation, the system runs the generator of the experiment series to produce the input data. 5) After that, LiFlow calls the Sumatra module, which does the rest of the work. First of all, a Sumatra project is set up to store the environment. 6) Next, the jobs... 8) At the end of all computations, the user receives an e-mail notification.
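The build-then-generate part of these steps can be sketched as follows. This is an illustration under stated assumptions, not LiFlow's implementation: `prepare_series`, the `run_remote` and `notify` callables, and the commands `make` and `python generate_series.py` are all hypothetical names chosen for the example.

```python
from typing import Callable, Tuple

# Hypothetical interface: run a shell command on the supercomputer and
# return (exit_code, combined_output).
RemoteRunner = Callable[[str], Tuple[int, str]]


def prepare_series(run_remote: RemoteRunner, notify: Callable[[str], None]) -> bool:
    """Build the source code and, if that succeeds, generate the input data.

    Returns True when the series is ready to be handed to the Sumatra step.
    """
    # Step 3: build on the supercomputer; on failure, warn the user and
    # send the build log back.
    exit_code, build_log = run_remote("make")  # hypothetical build command
    if exit_code != 0:
        notify("Build failed. Build log:\n" + build_log)
        return False

    # Step 4: run the experiment series generator to produce the input data.
    exit_code, _ = run_remote("python generate_series.py")  # hypothetical script name
    return exit_code == 0


if __name__ == "__main__":
    # A fake runner that simulates a failing build, for demonstration.
    def failing_build(command: str) -> Tuple[int, str]:
        return 1, "error: missing header"

    prepare_series(failing_build, print)
```

Passing the remote runner in as a callable keeps the control flow testable without a cluster; in a real deployment it could wrap Paramiko's `SSHClient.exec_command`.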
LiFlow architecture Here you can see the LiFlow system architecture. The LiFlow system is responsible for the experiment preparation, and the Sumatra module is responsible for capturing metadata and results and for launching the jobs on the supercomputer.
Sumatra technical details Default project vs. LiFlow. Project information: stored in the project local directory by default; LiFlow uses a shared database file to create a single database of all experiments of all users (--store option). Job launch: manual by default; in LiFlow, Sumatra interacts directly with the SLURM Workload Manager (--launch_mode=slurm-mpi option). Output data: stored in the project local directory by default; LiFlow uses a shared folder via NFS to create a single archive for all system users (--archive option). This slide shows the adaptation of the default Sumatra project to a single system integrated with LiFlow. - By default, Sumatra stores project information in the project local directory. In order to create a single database of all experiments of all users, I use the available --store option to set a shared database file. - Sumatra can directly interact with the SLURM Workload Manager, which manages the supercomputer. So I abandoned my previous implementation of job submission and gave the control to Sumatra. - Sumatra has a built-in option for compressing the obtained data and placing it in the archive. With the --archive option we can set a shared folder to create a single archive for all system users. Although the original data is removed from the user's home directory by default, it is possible to disable this feature and only make copies, leaving users free to do whatever they want with their data.
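As an illustration of how the three options fit together, the following Python sketch assembles a Sumatra command line. Only the options `--store`, `--archive`, and `--launch_mode=slurm-mpi` come from the slide; passing them to `smt init`, the helper name `sumatra_init_command`, the project name, and the shared paths are assumptions and should be checked against the help output of the installed Sumatra version.

```python
import shlex
from typing import List


def sumatra_init_command(project: str, store: str, archive: str) -> List[str]:
    """Assemble an `smt init` call with a shared store and a shared archive.

    Assumption: all three options are accepted by `smt init`; verify with
    `smt init --help` before relying on this.
    """
    return [
        "smt", "init",
        f"--store={store}",          # shared database file for all users
        f"--archive={archive}",      # shared folder (e.g. mounted via NFS)
        "--launch_mode=slurm-mpi",   # let Sumatra submit jobs to SLURM
        project,
    ]


if __name__ == "__main__":
    # Hypothetical project name and paths, for illustration only.
    command = sumatra_init_command(
        "heart_model",
        "/shared/liflow/records.db",
        "/shared/liflow/archive",
    )
    print(shlex.join(command))
```

Building the command as a list rather than a string avoids quoting problems when it is later passed to `subprocess.run` or sent over SSH.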
Conclusion LiFlow: Convenient graphical user interface for executing the series of experiments. LiFlow + Sumatra: Archiving experimental results and metadata for reproducibility. Deployment: Supercomputer “Uran” of the Krasovskii Institute of Mathematics and Mechanics. Computational cluster of the Ural Federal University. Let's draw a conclusion. Used in tandem, LiFlow and Sumatra combine the advantages of both systems in one solution. While LiFlow allows one to execute a series of computational experiments on parallel computing systems using a convenient interface, Sumatra automatically captures and stores the experimental environment in order to improve the experiments' reproducibility. Currently, the LiFlow system is integrated with the URAN supercomputer at the Krasovskii Institute of Mathematics and Mechanics and the computational cluster of the Ural Federal University. Unfortunately, for now we are unable to join the two supercomputers into a single system. However, Sumatra has built-in options that make this possible.
Conclusion The system is intended to be used by researchers in mathematical biology and biophysics without extensive knowledge of parallel computing. Suitable for other research projects with their own simulation code. The system is intended to be used by researchers in mathematical biology and biophysics without extensive knowledge of parallel computing. In contrast with existing workflow systems, LiFlow is more suitable for projects in which researchers write the simulation code themselves. So, with a little improvement, it can be used in other fields of computational science.
Thank you for your attention