Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.

Slides:



Advertisements
Similar presentations
Buffers & Spoolers J L Martin Think about it… All I/O is relatively slow. For most of us, input by typing is painfully slow. From the CPUs point.
Advertisements

Workflows: from Development to Automated Production Thursday morning, 10:00 am Lauren Michael Research Computing Facilitator University of Wisconsin -
CHAPTER 2 ALGORITHM ANALYSIS 【 Definition 】 An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition,
Pune Institute of Computer Technology Vijay Gabale Santosh Patil Ashish Deshpande -students of Computer department presents:
Programming Types of Testing.
Intermediate HTCondor: More Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
Introduction to C++ Programming CS 117 Section 2 and KNET Sections Spring 2001 MWF 1:40-2:30.
Intermediate HTCondor: Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
Communicating with Users about HTCondor and High Throughput Computing Lauren Michael, Research Computing Facilitator HTCondor Week 2015.
When and How to Use Large-Scale Computing: CHTC and HTCondor Lauren Michael, Research Computing Facilitator Center for High Throughput Computing STAT 692,
SIDDHARTH MEHTA PURSUING MASTERS IN COMPUTER SCIENCE (FALL 2008) INTERESTS: SYSTEMS, WEB.
Google MapReduce Simplified Data Processing on Large Clusters Jeff Dean, Sanjay Ghemawat Google, Inc. Presented by Conroy Whitney 4 th year CS – Web Development.
© Janice Regan, CMPT 128, Jan CMPT 128 Introduction to Computing Science for Engineering Students Creating a program.
High Throughput Computing with Condor at Purdue XSEDE ECSS Monthly Symposium Condor.
1 -Defined Functions 1. Goals of this Chapter 2. General Concept 3. Advantages 4. How it works Programmer.
HTCondor workflows at Utility Supercomputing Scale: How? Ian D. Alderman Cycle Computing.
Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
 Introduction to Operating System Introduction to Operating System  Types Of An Operating System Types Of An Operating System  Single User Single User.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.
Compiled Matlab on Condor: a recipe 30 th October 2007 Clare Giacomantonio.
Workflows: from Development to Production Thursday morning, 10:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
Chapter 12 Recursion, Complexity, and Searching and Sorting
DEBUGGING. BUG A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected.
Robert Crawford, MBA West Middle School.  Explain how the binary system is used by computers.  Describe how software is written and translated  Summarize.
Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.
Workflows: from development to Production Thursday morning, 10:00 am Greg Thain University of Wisconsin - Madison.
Turning science problems into HTC jobs Wednesday, July 29, 2011 Zach Miller Condor Team University of Wisconsin-Madison.
Parallel Algorithms & Distributed Computing Matt Stimmel Matt White.
Intermediate Condor: Workflows Rob Quick Open Science Grid Indiana University.
Intermediate HTCondor: More Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
The Software Development Process
Intermediate Condor: Workflows Monday, 1:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Sep 13, 2006 Scientific Computing 1 Managing Scientific Computing Projects Erik Deumens QTP and HPC Center.
Chapter 1 Computers, Compilers, & Unix. Overview u Computer hardware u Unix u Computer Languages u Compilers.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
Experiences Running Seismic Hazard Workflows Scott Callaghan Southern California Earthquake Center University of Southern California SC13 Workflow BoF.
Condor Project Computer Sciences Department University of Wisconsin-Madison Running Interpreted Jobs.
13-1 ANSYS, Inc. Proprietary © 2009 ANSYS, Inc. All rights reserved. April 28, 2009 Inventory # Chapter 13 Solver.out File and CCL Introduction to.
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
Sponsored by the National Science Foundation Systematic Experimentation Sarah Edwards GENI Project Office.
Software Engineering Algorithms, Compilers, & Lifecycle.
Getting the Most out of HTC with Workflows Friday Christina Koch Research Computing Facilitator University of Wisconsin.
Turning science problems into HTC jobs Tuesday, Dec 7 th 2pm Zach Miller Condor Team University of Wisconsin-Madison.
Christina Koch Research Computing Facilitators
Intermediate HTCondor: More Workflows Monday pm
Large Output and Shared File Systems
Licenses and Interpreted Languages for DHTC Thursday morning, 10:45 am
About Hadoop Hadoop was one of the first popular open source big data technologies. It is a scalable fault-tolerant system for processing large datasets.
Examples Example: UW-Madison CHTC Example: Global CMS Pool
Chapter 13 Solver .out File and CCL
Advanced QlikView Performance Tuning Techniques
Thursday AM, Lecture 2 Lauren Michael CHTC, UW-Madison
OSG Connect and Connect Client
Troubleshooting Your Jobs
Data Structures and Algorithms
湖南大学-信息科学与工程学院-计算机与科学系
Haiyan Meng and Douglas Thain
Objective of This Course
Final Project Details Note: To print these slides in grayscale (e.g., on a laser printer), first change the Theme Background to “Style 1” (i.e., dark.
What’s Different About Overlay Systems?
Introduction to High Throughput Computing and HTCondor
CS246 Search Engine Scale.
CS246: Search-Engine Scale
Overview of Workflows: Why Use Them?
Mats Rynge USC Information Sciences Institute
Troubleshooting Your Jobs
PU. Setting up parallel universe in your pool and when (not
Thursday AM, Lecture 1 Lauren Michael
Presentation transcript:

Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison

2013 OSG User School Workflows Should Make Life Science Easier non-computing workflows are all around you … especially in science  grading exams  instrument setup  experimental procedures when planned/documented, workflows help with:  organizing and managing processes  saving time  objectivity, reliability, and reproducibility (THE TENENTS OF GOOD SCIENCE!) 2

2013 OSG User School Workflows are like Computing Algorithms 3 Steps Connections (Metadata)

2013 OSG User School ‘Engineering’ a Good Workflow 1.Draw out the general workflow 2.Define details (test ‘pieces’ with HTCondor) 3.Define the computing workflow  divide or consolidate processes  off-load file transfers and consider file transfer times  identify steps to be automated or checked 4.Build it piece-by-piece; test and optimize 5.Scale-up: data and computing resources 6.What more can you automate or error-check? (And document, document, document) 4

2013 OSG User School From schematics… 5

2013 OSG User School … to the real world 6

2013 OSG User School Start with This 7 file preparation (minutes) processing (days) transform results (minutes)

2013 OSG User School process ‘99’ (filter output) End Up with This 8 (special transfer) file prep and split (POST-RETRY) process ‘0’ (filter output) combine, transform results (POST-RETRY)... 1 GB RAM 2 GB Disk 1.5 hours 100 MB RAM 500 MB Disk 40 min (each) 300 MB RAM 1 GB Disk 15 min (PRE) (POST-RETRY)

2013 OSG User School Key Workflow Principles 9 1.Increase Throughput 2.Be Kind to Your Submit Node 3.Bring it With You 4.‘Scriptify’ As Much As Possible 5.“Testing, testing, 1, 2, 3 …”

2013 OSG User School Always focus on Throughput 10 What is High Throughput  many ‘smaller’ jobs  constant job pressure  automation  long total workflow times What is not?  job runtimes less than 5 min  jobs you expect to start up quickly  micro-optimizations

2013 OSG User School ‘Engineering’ a Good Workflow 1.Draw out the general workflow 2.Define details (test ‘pieces’ with HTCondor) 3.Define the computing workflow  divide or consolidate processes  off-load file transfers and consider file transfer times  identify steps to be automated or checked 4.Build it piece-by-piece; test and optimize 5.Scale-up: data and computing resources 6.What more can you automate or error check? (And document, document, document) 11

2013 OSG User School Resources Jobs Need CPU  #CPUs and time RAM Disk  Working (execute side)  Total (submit side)  Bandwidth (file transfer) Network bandwidth  Usually for file transfer only 12

2013 OSG User School First run locally: To measure usage Did it run correctly?  Are you sure? Run once remotely  NOT ON SUBMIT MACHINE!  Might be surprised Once working, run a couple of times If big variance, should you take the…  Average? Median? Worst case? 13

2013 OSG User School User Log shows all 005 ( ) 06/07 14:12:55 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 5 - Run Bytes Sent By Job Run Bytes Received By Job 5 - Total Bytes Sent By Job Total Bytes Received By Job Partitionable Resources : Usage Request Cpus : 1 Disk (KB) : Memory (MB) :

2013 OSG User School Golden Rules for DAGs 15 Beware of the shish kebab! Use PRE and POST script generously RETRY is your friend DAGs of DAGs are good  SPLICE, SUB_DAG_EXTERNAL

2013 OSG User School It’s not all about the DAG 16 PRE and POST scripts run on the submit node  avoid combining/splitting large files  avoid compiling Remember: DAGs don’t do iterative loops well Solution: move more tasks into a ‘job’

2013 OSG User School Wrapper Scripts are Essential 17 Before execution (bring it with you!)  transfer/prepare files and directories  setup/configure environment and other dependencies including run-time libraries (Matlab, R, Python, etc.) Execution  prepare complex command-line arguments  batch together many ‘small’ tasks After execution  filter, divide, consolidate, and/or compress files  check for errors

2013 OSG User School Automate All The Things well, not really, but kind of … Really: What is the minimal number of manual steps necessary? even 1 might be too many; zero is perfect! Consider what you get out of automation  time savings (including less ‘babysitting’ time)  reliability and reproducibility 18

2013 OSG User School Is It Worth the Time? 19

2013 OSG User School Batching (Merging) is easy Scripting  Avoids transfer of intermediate files  Careful with error conditions  Debugging can be a bit tricky 20

2013 OSG User School Breaking up is hard to do… Ideally into parallel (separate) jobs  reduced job requirements = more matches  not always possible Often need checkpoints  standard universe can help  user-defined check-pointing  checkpoint images can be hard to manage 21

2013 OSG User School Automation: Parameter Sweeps Command arguments can become complicated and messy. Wrapper scripts could:  Hardcode “extra” arguments  Compute arguments  Look up arguments from a table Use $(Process) !! 22

2013 OSG User School ‘Engineering’ a Good Workflow 1.Draw out the general workflow 2.Define details (test ‘pieces’ with HTCondor) 3.Define the computing workflow  divide or consolidate processes  off-load file transfers and consider file transfer times  identify steps to be automated or checked 4.Build it piece-by-piece; test and optimize 5.Scale-up: data and computing resources 6.What more can you automate or error check? (And document, document, document) 23

2013 OSG User School process ‘99’ (filter output) End Up with This 24 (special transfer) file prep and split (POST-RETRY) process ‘0’ (filter output) combine, transform results (POST-RETRY)... 1 GB RAM 2 GB Disk 1.5 hours 100 MB RAM 500 MB Disk 40 min (each) 300 MB RAM 1 GB Disk 15 min (PRE) (POST-RETRY)

2013 OSG User School Questions? Feel free to contact me:  Now: “Joe’s Workflow” Exercise 6.1  9:30-10am, in groups Later:  10-10:30am: From Workflow to Production  10:30-10:45am: Break  10:45am-12:15: Exercises 6.2,