Python: Swiss-Army Glue


Josh Karpel <karpel@wisc.edu>
Graduate Student, Yavuz Group
UW-Madison Physics Department

My Research: Matrix Multiplication

Python: Swiss-Army Glue - HTCondor Week 2018

My Research: Computational Quantum Mechanics

Why HTC? Huge parameter scans.
How HTC? Manage jobs without big infrastructure.

https://doi.org/10.1364/OL.43.002583

Using Python for Cluster Tooling

Create
- Create jobs programmatically: “questionnaires”

Run
- Computation: numpy, scipy, cython, ...
- Store rich data: pickle

Analyze
- Automate file transfer: paramiko
- Processing: pandas, sqlite, matplotlib, ...

Python should already be familiar to sysadmins, who might write their own quick Python scripts.

Using compiled C/Fortran code

    >>> import numpy as np
    >>> x = np.array(list(range(10)))
    >>> x
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> x * 2
    array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
    >>> x ** 2
    array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
    >>> np.dot(x, x)
    285

Also see: Cython, Numba, F2PY, etc.

Scientific Python stack is full-featured

Other Things People Like                            Python Equivalents
Mathematica’s symbolic mathematics                  sympy
Mathematica Notebooks / MATLAB’s command window     IPython Notebooks
MATLAB’s multidimensional arrays                    numpy
MATLAB’s plotting tools                             matplotlib
Pre-implemented numerical routines                  scipy

Plus all the power of Python as a general-purpose language!

Generate jobs programmatically via “questionnaires”

    $ ./create_job__tdse_scan.py demo --dry
    Mesh Type [cyl | sph | harm] [Default: harm] >
    R Bound (Bohr radii) [Default: 200] > 100
    R Points per Bohr Radii [Default: 10] > 20
    l points [Default: 500] >
    [WARNING] ~ Predicted memory usage per Simulation is >15.3 MB
    Mask Inner Radius (in Bohr radii)? [Default: 80.0] >
    Mask Outer Radius (in Bohr radii)? [Default: 100.0] >
    <MORE QUESTIONS>
    Generated 75 Specifications
    Job batch name? [Default: demo] >
    Flock and Glide? [Default: y] > n
    Memory (in GB)? [Default: 1] > .8
    Disk (in GB)? [Default: 5] > 3
    Creating job directory and subdirectories...
    Saving Specifications...
    Writing Specification info to file...
    Writing submit file...
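The prompts in the session above all follow one pattern: a question, a default, and an optional set of allowed answers. A minimal sketch of such a prompt helper is below. The names `ask` and `ask_choice` and the `input_fn` parameter are hypothetical and are not taken from the author's script; `input_fn` exists only so that the helpers can be exercised without a terminal.

```python
def ask(question, default=None, cast=str, input_fn=input):
    """Prompt for a value; an empty answer returns the default unchanged."""
    prompt = question
    if default is not None:
        prompt += f' [Default: {default}]'
    raw = input_fn(prompt + ' > ').strip()
    if raw == '':
        return default
    # Convert the typed text to the requested type (int, float, str, ...).
    return cast(raw)


def ask_choice(question, choices, default, input_fn=input):
    """Keep prompting until the answer is one of the allowed choices."""
    options = ' | '.join(choices)
    while True:
        answer = ask(f'{question} [{options}]', default=default, input_fn=input_fn)
        if answer in choices:
            return answer
        print(f'Please pick one of: {options}')
```

A call such as `ask('R Bound (Bohr radii)', default=200, cast=float)` would print a prompt in the same style as the session above and return a number that later questions can use for feedback, such as a memory estimate.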

Use input and eval (carefully!)

    import numpy as np

    choices = {
        'a': 'hello',
        'b': 'goodbye',
    }
    choice = choices[input('Choice? ')]  # Choice? <a>
    print(choice)  # hello

    array = eval(input('Enter your array!'))

    # For example, if the user typed np.linspace(0, 10, 11):
    array = eval('np.linspace(0, 10, 11)')
    print(type(array))  # <class 'numpy.ndarray'>
    print(array)  # [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
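Because `eval` runs whatever the user types, one way to be careful is to use the standard library's `ast.literal_eval` wherever a plain literal is enough. This is a swapped-in alternative, not something the slide shows, and the wrapper name `parse_literal` is hypothetical. A minimal sketch:

```python
import ast


def parse_literal(text):
    """Parse a Python literal (number, string, list, tuple, dict, ...).

    Unlike eval, this refuses function calls, attribute access, and names,
    so typed input cannot execute arbitrary code.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f'not a plain literal: {text!r}') from exc
```

The trade-off is flexibility: `parse_literal('[0, 5, 10]')` works, but an expression such as `np.linspace(0, 10, 11)` is rejected, so `eval` is still needed when the questionnaire should accept arbitrary array expressions.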

Generate jobs programmatically via questionnaires

    [09:18 PM | karpel@submit-5 | ~/jobs/demo]
    $ ls -lh
    total 236K
    -rw-rw-r-- 1 karpel karpel  257 Apr  9 21:09 info.pkl
    drwxrwxr-x 2 karpel karpel 4.0K Apr  9 21:09 inputs
    drwxrwxr-x 2 karpel karpel 4.0K Apr  9 21:09 logs
    drwxrwxr-x 2 karpel karpel 4.0K Apr  9 21:09 outputs
    -rw-rw-r-- 1 karpel karpel  22K Apr  9 21:09 parameters.txt
    -rw-rw-r-- 1 karpel karpel 186K Apr  9 21:09 specifications.txt
    -rw-rw-r-- 1 karpel karpel  972 Apr  9 21:09 submit_job.sub

Advantages
- Avoids copy-paste issues, like forgetting to change something or changing something you shouldn’t have
- Provides feedback during job creation to catch errors early
- Flexible enough to define new “types” of jobs without writing entirely new scripts
- Easy to generate metadata about the job, because you have access to the inputs while they’re still in Python
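A rough sketch of how a script might produce a job directory like the one listed above: expand a parameter scan into one specification per job, write the inputs, and write a submit file. This is an illustration under stated assumptions, not the author's script. The function name `create_job`, the executable name `run_sim.sh`, and the per-job JSON input files are all hypothetical (the talk stores specifications with pickle; JSON is used here only so the inputs are easy to inspect).

```python
import itertools
import json
from pathlib import Path


def create_job(job_dir, parameters, memory_gb=1, disk_gb=5):
    """Write one input file per parameter combination, plus a submit file."""
    job_dir = Path(job_dir)
    for sub in ('inputs', 'logs', 'outputs'):
        (job_dir / sub).mkdir(parents=True, exist_ok=True)

    # Cartesian product of all parameter values: one specification per job.
    names = sorted(parameters)
    specs = [
        dict(zip(names, values))
        for values in itertools.product(*(parameters[name] for name in names))
    ]
    for index, spec in enumerate(specs):
        (job_dir / 'inputs' / f'{index}.json').write_text(json.dumps(spec))

    submit_lines = [
        'executable = run_sim.sh',  # hypothetical wrapper script
        'arguments = $(Process)',
        'transfer_input_files = inputs/$(Process).json',
        'log = logs/cluster_$(Cluster).log',
        'output = logs/$(Process).out',
        'error = logs/$(Process).err',
        f'request_memory = {int(memory_gb * 1024)}MB',
        f'request_disk = {int(disk_gb * 1024)}MB',
        f'queue {len(specs)}',
    ]
    (job_dir / 'submit_job.sub').write_text('\n'.join(submit_lines) + '\n')
    return specs
```

Since the specifications are still ordinary Python objects when the files are written, the same function is a natural place to print a summary or save metadata about the scan.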

Store rich data using pickle

Advantages
- Works straight out of the box
- Avoids transforming to/from other data formats (CSV, JSON, HDF5, etc.)
- Makes self-checkpointing jobs easy to implement

Gotchas
- Certain types of objects can’t be serialized
- Not as compressed as dedicated formats
- Can accidentally break backwards-compatibility

    import pickle

    class Greeting:
        def __init__(self, words):
            self.words = words

        def yell(self):
            print(self.words.upper())

    greeting = Greeting('hi!')

    with open('foo.pkl', mode = 'wb') as file:
        pickle.dump(greeting, file)

    with open('foo.pkl', mode = 'rb') as file:
        from_file = pickle.load(file)

    print(from_file.words)  # hi!
    from_file.yell()  # HI!
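The self-checkpointing advantage listed above could look roughly like the sketch below. The function names, the shape of the `state` dictionary, and the stand-in "work" are hypothetical. The checkpoint is written to a temporary file and then renamed with `os.replace`, so a job that is evicted in the middle of a write does not leave a corrupted checkpoint behind.

```python
import os
import pickle


def save_checkpoint(state, path):
    """Pickle the state to a temporary file, then swap it into place."""
    tmp_path = f'{path}.tmp'
    with open(tmp_path, mode='wb') as file:
        pickle.dump(state, file)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or the default if no checkpoint exists yet."""
    try:
        with open(path, mode='rb') as file:
            return pickle.load(file)
    except FileNotFoundError:
        return default


def run(total_steps, path, checkpoint_every=100):
    """Resume from the last checkpoint (if any) and run to total_steps."""
    state = load_checkpoint(path, default={'step': 0, 'total': 0})
    while state['step'] < total_steps:
        state['total'] += state['step']  # stand-in for real simulation work
        state['step'] += 1
        if state['step'] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

If the job is restarted, calling `run` again with the same path picks up from the last saved step instead of starting over.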

Automate file transfer using paramiko

    import paramiko

    remote_host = 'submit-5.chtc.wisc.edu'
    username = 'karpel'
    key_path = 'wouldnt/you/like/to/know'

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(remote_host, username = username, key_filename = key_path)
    ftp = ssh.open_sftp()

    ssh.exec_command('ls -l')  # returns stdin, stdout, stderr

    ftp.put('local/path', 'my/big/fat/input/data')
    ftp.get('path/to/completed/simulation', 'local/path')

Advantages
- Runs on a schedule
- Easy to control which files get downloaded
- Can hook directly into data processing

Gotchas
- Slow
- Occasional strange interactions with Dropbox/Box/Google Drive?
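One way to control which files get downloaded is to compare a remote listing against what is already on disk and keep only names matching chosen patterns. The helper below is a hypothetical sketch that uses only the standard library; the function name and the default `*.pkl` pattern are assumptions, not part of the talk.

```python
from fnmatch import fnmatch


def files_to_download(remote_names, local_names, patterns=('*.pkl',)):
    """Return remote file names that match a pattern and are not yet local."""
    already_have = set(local_names)
    return sorted(
        name
        for name in remote_names
        if name not in already_have
        and any(fnmatch(name, pattern) for pattern in patterns)
    )
```

With the `ftp` object from the slide, the remote names could come from `ftp.listdir(...)`, and each selected name could then be passed to `ftp.get(...)`.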

Process data using pandas

    df = pd.read_excel(...)
    df = pd.read_csv(...)
    df = pd.read_hdf(...)
    df = pd.read_json(...)
    df = pd.read_pickle(...)
    df = pd.read_sql(...)

    df.to_excel(...)
    df.to_csv(...)
    df.to_hdf(...)
    df.to_json(...)
    df.to_pickle(...)
    df.to_sql(...)
    df.to_html(...)
    # and more!

    import numpy as np
    import pandas as pd

    dates = pd.date_range('2018-01-01', periods = 6)
    df = pd.DataFrame(
        np.random.randn(6, 4),
        index = dates,
        columns = list('ABCD'),
    )
    print(df)
    #####################
                       A         B         C         D
    2018-01-01 -0.165014  0.721058  1.113825  1.778694
    2018-01-02  1.774170  0.130640  1.089180 -0.812315
    2018-01-03  1.167511  0.121111 -0.766156  1.816411
    2018-01-04  0.103793  0.438878 -0.040532  0.238539
    2018-01-05 -0.492766  1.466809 -0.384373  2.209309
    2018-01-06 -1.304448  0.593538  0.055233  1.930035

Visualize data using matplotlib

Python is Swiss-Army Glue

Generate Jobs: input/eval, pickle
Run Jobs: numpy, scipy, cython, pickle
Retrieve Jobs: paramiko
Process Jobs: pickle, pandas, sqlite, matplotlib

Where to go from here?
- My (extremely unstable) framework, Simulacra
- The HTCondor Python Bindings
- James Bourbeau’s PyCondor
- Your own ideas!