Productivity Tools for Scientific Computing

Slides:



Advertisements
Similar presentations
Why python? Automate processes Batch programming Faster Open source Easy recognition of errors Good for data management What is python? Scripting programming.
Advertisements

Setting up of condor scheduler on computing cluster Raman Sehgal NPD-BARC.
ISG We build general capability Job Submission on the Olympus Cluster J. DePasse; S. Brown, PhD; T. Maiden Pittsburgh Supercomputing Center Public Health.
© 2010 IBM Corporation IBM Experience Modeler - Theme Editor Installing Python Image Library Presenter’s Name - Presenter’s Title DD Month Year.
Eclipse IDE. 2 IDE Overview An IDE is an Interactive Development Environment Different IDEs meet different needs BlueJ and DrJava are designed as teaching.
Introduction to UNIX/Linux Exercises Dan Stanzione.
WORK ON CLUSTER HYBRILIT E. Aleksandrov 1, D. Belyakov 1, M. Matveev 1, M. Vala 1,2 1 Joint Institute for nuclear research, LIT, Russia 2 Institute for.
The Pipeline Processing Framework LSST Applications Meeting IPAC Feb. 19, 2008 Raymond Plante National Center for Supercomputing Applications.
Python – Part 1 Python Programming Language 1. What is Python? High-level language Interpreted – easy to test and use interactively Object-oriented Open-source.
Systems Software Operating Systems. What is software? Software is the term that we use for all the programs and data that we use with a computer system.
A (very brief) intro to Eclipse Boyana Norris June 4, 2009.
N from what language did C++ originate? n what’s input, output device? n what’s main memory, memory location, memory address? n what’s a program, data?
INFSO-RI Enabling Grids for E-sciencE SCDB C. Loomis / Michel Jouvin (LAL-Orsay) Quattor Tutorial LCG T2 Workshop June 16, 2006.
Eclipse 24-Apr-17.
Introduction to programming Carl Smith National Certificate Year 2 – Unit 4.
Running Parallel Jobs Cray XE6 Workshop February 7, 2011 David Turner NERSC User Services Group.
Using the ARCS Grid and Compute Cloud Jim McGovern.
Selenium server By, Kartikeya Rastogi Mayur Sapre Mosheca. R
Execution ways of program References: www. en.wikipedia.org/wiki/Integrated_development_environment  You can execute or run a simple java program with.
Debugging Lab Antonio Gómez-Iglesias Texas Advanced Computing Center.
Ansible and Ansible Tower 1 A simple IT automation platform November 2015 Leandro Fernandez and Blaž Zupanc.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
© 2011 Pittsburgh Supercomputing Center XSEDE 2012 Intro To Our Environment John Urbanic Pittsburgh Supercomputing Center July, 2012.
Introduction to Algorithm. What is Algorithm? an algorithm is any well-defined computational procedure that takes some value, or set of values, as input.
Review Why do we use protection levels? Why do we use constructors?
Advanced Computing Facility Introduction
PHP Basics and Syntax Lesson 3 ITBS2203 E-Commerce for IT.
Overview of Scientific Workflows: Why Use Them?
The Distributed Application Debugger (DAD)
GRID COMPUTING.
Build and Test system for FairRoot
Welcome to Indiana University Clusters
Agenda:- DevOps Tools Chef Jenkins Puppet Apache Ant Apache Maven Logstash Docker New Relic Gradle Git.
Development Environment
4/24/ :07 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Features of Authoring Tools
HPC usage and software packages
Development with Eclipse
Introduction to Eclipse
Welcome to Indiana University Clusters
ITCS-3190.
ATS Application Programming: Java Programming
A Short DOS Presentation
A (very brief) intro to Eclipse
Spark Presentation.
PYTHON: AN INTRODUCTION
A451 Theory – 7 Programming 7A, B - Algorithms.
Introduction to Programming the WWW I
CRESCO Project: Salvatore Raia
Java programming lecture one
Practice #0: Introduction
Concurrent Version Control
PHP / MySQL Introduction
Drupal VM and Docker4Drupal For Drupal Development Platform
Drupal VM and Docker4Drupal as Consistent Drupal Development Platform
HP C/C++ Remote developer plug-in for Eclipse
Prepared by Kimberly Sayre and Jinbo Bi
TRANSLATORS AND IDEs Key Revision Points.
Intel® Parallel Studio and Advisor
Compiling and Job Submission
Module 01 ETICS Overview ETICS Online Tutorials
Distributed Algorithms
Introduction to High Performance Computing Using Sapelo2 at GACRC
Overview of Workflows: Why Use Them?
By Rajanikanth B Eclipse IDE Overview By Rajanikanth B
Quick Tutorial on MPICH for NIC-Cluster
Review of Previous Lesson
Working in The IITJ HPC System
A DevOps process for deploying R to production
Question 1 How are you going to provide language and/or library (or other?) support in Fortran, C/C++, or another language for massively parallel programming.
Presentation transcript:

Productivity Tools for Scientific Computing Python & ECLIPSE akhilap@iisc.ac.in

HPC workflow Data preparation Data decomposition Write a job script with appropriate parameters (nodes, cores per node) Submit the job to the queue Verify job run status Analyze output – comparative analysis of multiple outputs or analysis of a single output. Repeat!! Automated => Easy to reproduce, reduction in errors, reduce code duplication, speed up research.

Why Python? Python is probably the most versatile language for workflow management. One of its most useful features is its ability to tie things together and automate the execution of other programs. Free, open source with a huge set of libraries/ packages. Python with Snakemake ( a workflow management tool) is a simple and preferred tool for automating large-scale HPC workflows. Snakemake works cross-platform (Windows, MacOS, Linux) and is compatible with all HPC schedulers. More importantly, the same workflow will work and scale appropriately regardless of whether it’s on a laptop or cluster without modification.

Automating qsub using Python template.pbs #!/bin/sh #This job should be redirected to idqueue #PBS -N jobname #PBS -l select=2:ncpus=24 #PBS -l walltime=00:02:00 #PBS -l place=scatter #PBS -l accelerator_type="None" #PBS -S /bin/sh@sdb -V . /opt/modules/default/init/sh #cd /home/proj/16/secpraba cp /home/proj/16/secpraba/${SOURCE}.out /home/proj/16/secpraba/${WEEK}_${SOURCE}.out aprun -j 1 -n 48 -N 24 ./${WEEK}_${SOURCE}.out ${DATA} job_params.csv week42,source1,data1 week43,source2,data2 week44,source3,data3

Automating qsub using Python job_params.csv week42,source1,data1 week43,source2,data2 week44,source3,data3 subprocess.check_output(qsub_command)

Automating HPC Workflow for Openfoam Input Data File for “flow in a lid driven cavity”- {input_data_file} Multiple runs for to test scalability for execution parameters – 50TF, 100TF, 200TF, 400TF (performance) Preprocessing steps for each execution parameter: Extract files using tar -zxvf {input_data_file} change directory to {dir_name} in extracted data run command ./{preprocess_script} compute number of processors for each execution parameter (TF to cores on a given system) compute coefficients for data decomposition which depends upon number of cores above edit {decomposition_config_file} with appropriate cores and coefficients execute {data_decomposition} Submit multiple jobs -one for each execution parameter (TF here)

Automating HPC Workflow for Openfoam compute number of cores YAML file with information about each execution parameter. Read YAML file Interpret each document Compute data decomposition coefficients Extract files change directory For each document run preprocess.sh edit {decomposition_config_file} Decompose Data Submit a job

Automating HPC Workflow for Openfoam

Automating HPC Workflow for Openfoam Jobscript50TF.YAML Jobscript100TF.YAML Create a YAML file with job parameters for each execution Jobscript5200TF.YAML Jobscript400TF.YAML

Automating HPC Workflow for Openfoam compute number of cores Compute data decomposition coefficients Extract files change directory run preprocess.sh edit {decomposition_config_file} Decompose Data Submit a job

Automating HPC Workflow for Openfoam Jobscript50TF.YAML Jobscript100TF.YAML Jobscript5200TF.YAML Jobscript400TF.YAML Python script Jobscript50TF Jobscript100TF Jobscript5200TF Automating HPC Workflow for Openfoam Jobscript400TF

Available Python tools & packages atomic-hpc snakemake

Available Python tools & packages FABRIC/FABRIC 3.0 A tool that enables execution of arbitrary Python functions via the command line A library of subroutines (built on top of a lower-level library) to make executing shell commands over SSH easy and Pythonic.

Eclipse IDE and the Parallel Tools Platform Eclipse is an IDE (Integrated Development Environment) It contains a base workspace and an extensible plug-in system It can be used for app Development in Java, Ada, ABAP, C, C++, C#, Fortran, Julia, Python, PHP R, Scala, Scheme, Groovy, Erlang…… It can also be used to develop documents with Latex and packages for the software Mathematica. Development environments include the Eclipse Java development tools (JDT) for Java and Scala, Eclipse CDT for C/C++, and Eclipse PDT for PHP, among others.

Eclipse IDE and the Parallel Tools Platform Full development lifecycle support Revision control integration (CVS, SVN, Git) Project dependency management Incremental building Content assistance Context sensitive help Language sensitive searching Multi-language support Debugging

Type an incomplete function name e. g Type an incomplete function name e.g. “get” into the editor, and hit ctrl-space Select desired completion value with cursor or mouse

PTP’s Parallel Language Development Tools (PLDT) Show MPI Artifacts Code completion / Content Assist Context Sensitive Help for MPI Hover Help MPI Templates in the editor View artifacts in the View-window MPI Barrier Analysis PLDT has similar features for OpenMP, UPC, OpenSHMEM, OpenACC

MPI_BARRIER Analysis Barrier statements, in which all tasks must synchronize, are not matched properly, may cause deadlocks. If one MPI task in a communicator does not pass through the same number of MPI_Barrier statements as the other tasks in the same communicator, the other tasks will deadlock while waiting. Barrier analysis checks an MPI program and determines whether there are deadlocks due to the misplacement of barriers. It will report errors if any, and show "matching sets" for each barrier if error-free. The matching set of a barrier contains all barriers that synchronize with it at run time.

MPI Barrier Analysis Verify barrier synchronization in C/MPI programs For verified programs, lists barrier statements that synchronize together (match) For synchronization errors, reports counter example that illustrates and explains the error

Automating HPC workflows Organize Data and HPC Code into folders that align with workflow pipelines. Versioning code using git or similar tools will increase productivity. Use YAML that maps to the folder structure to define these workflows. Add python scripts to translate YAML to actionable code that could range from code commits to git, preprocessing, job submission and post processing analytics. Pre and Post processing activities are performed on DELLVS machines (different from SahasraT)- python could help tie all loose ends and enable research reproducibility. Use of Eclipse for Scientific workflow and development. https://www.eclipse.org/downloads/ https://www.eclipse.org/ptp/downloads.php