High-Throughput Computing in Atomic Physics

High-Throughput Computing in Atomic Physics
Josh Karpel ⟨karpel@wisc.edu⟩
Graduate Student, Yavuz Group, UW-Madison Physics Department

My Research: Matrix Multiplication
HTC in Atomic Physics - OSG User School 2018

My Research: Computational Quantum Mechanics
Why HTC? Huge parameter scans.
https://doi.org/10.1364/OL.43.002583
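
A huge parameter scan usually starts as the Cartesian product of a few parameter lists. Below is a minimal sketch using `itertools.product`. The parameter names `pulse_width` and `fluence`, their values, and the file name `params.txt` are all hypothetical. The sketch writes one comma-separated line per job, which produces a file that an HTCondor submit description can iterate over.

```python
import itertools


def parameter_grid(**axes):
    """Return every combination of the given parameter lists as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, values)) for values in itertools.product(*axes.values())]


def write_queue_file(grid, path):
    """Write one comma-separated line per parameter point; return the column order."""
    names = list(grid[0])
    with open(path, "w") as f:
        for point in grid:
            f.write(",".join(str(point[name]) for name in names) + "\n")
    return names


# Hypothetical scan: 3 pulse widths x 4 fluences = 12 jobs
grid = parameter_grid(pulse_width=[50, 100, 200], fluence=[0.1, 0.5, 1.0, 5.0])
```

In the submit file, a line such as `queue pulse_width,fluence from params.txt` would then create one job per line, with `$(pulse_width)` and `$(fluence)` available as macros. This is one possible way to wire it up, not necessarily the author's setup.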

Workflows in Atomic/Molecular/Optical Physics
AMO theory: Develop Theory → Simulate Specific Examples → Write Paper
What I do: Simulate Tons of Examples → Develop Theory to Explain Results → Write Paper
Chelkowski, S., Bandrauk, A. D., & Corkum, P. B. (2017). https://doi.org/10.1103/PhysRevA.95.053402
They're working in a regime where the lines are straight; I'm not! I need very high resolution to make sure I'm not missing anything.

The Curse of Ambition
I started out running a few dozen hours of simulations on my desktop. Then I wanted to run a few hundred hours, and needed HTC… and now I'm here.
Started out wanting to run a few hundred hours; ended up running 10 million hours, about 1150 years of computing, in just the last year!

OSG is not a pristine environment
Your computer: you set up the whole system, and you can run for as long as you want without interruption.
Someone else's computer: you have no idea what software is installed, and no idea how long you'll be able to run.
I want to talk about two challenges I faced, each an example of one of those problems.

Automatic Retries

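Automatic Retries
I use Cython, and Cython needs GCC. Sometimes GCC isn't available on the execute machine.
Without retries: my jobs explode and clog things up, and I get yelled at.
With retries: my jobs wait patiently to try again, and they finish (eventually).

    # Hold the job (instead of letting it leave the queue) if it exits with a nonzero code
    on_exit_hold = (ExitCode =!= 0)
    # Release jobs held by that policy after 5 minutes on hold, for at most 10 completions
    periodic_release = (JobStatus == 5) && (HoldReasonCode == 3) && (CurrentTime - EnteredCurrentStatus >= 300) && (NumJobCompletions <= 10)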
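
The Python sketch below mirrors the logic of the two policy expressions so they are easier to read. It is only an illustration, because HTCondor evaluates the real ClassAd expressions itself. It assumes the standard codes JobStatus 5 (Held) and HoldReasonCode 3 (held by a job policy expression such as on_exit_hold). The function names are hypothetical.

```python
JOB_STATUS_HELD = 5         # JobStatus value for a held job
HOLD_REASON_JOB_POLICY = 3  # HoldReasonCode: held by a policy expression (e.g. on_exit_hold)


def should_hold_on_exit(exit_code):
    # Mirrors: on_exit_hold = (ExitCode =!= 0)
    # ClassAd "=!=" also treats an undefined ExitCode as "not 0"; None plays that role here.
    return exit_code != 0


def should_release(job_status, hold_reason_code, current_time,
                   entered_current_status, num_job_completions):
    # Mirrors: periodic_release = (JobStatus == 5) && (HoldReasonCode == 3)
    #   && (CurrentTime - EnteredCurrentStatus >= 300) && (NumJobCompletions <= 10)
    return (
        job_status == JOB_STATUS_HELD
        and hold_reason_code == HOLD_REASON_JOB_POLICY
        and current_time - entered_current_status >= 300  # on hold for 5+ minutes
        and num_job_completions <= 10                     # cap on retries
    )
```

The 300-second wait gives a misbehaving machine time to drop out of the pool, and the completion cap keeps a job that can never succeed from retrying forever.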

Your jobs will fail sometimes, for reasons that you can't solve.
- Make sure your jobs fail politely (don't retry forever).
- Don't give up on your jobs (max_retries, etc.).
- Tell people about your problems!
- (Nuclear option: Docker/Singularity)
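
One way to make a job fail politely is to check for the tools it needs before doing any real work. If something is missing, the job exits right away with a nonzero code so that a hold/release policy like the one above can take over. Below is a minimal sketch, assuming the only requirement is a `gcc` executable on the PATH. The function names and the exit code 1 are hypothetical choices.

```python
import shutil
import sys


def missing_tools(required):
    """Return the required executables that cannot be found on the PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def preflight(required=("gcc",)):
    """Return 0 if every required tool is available; otherwise report and return 1."""
    missing = missing_tools(required)
    if missing:
        print(f"missing required tools: {', '.join(missing)}", file=sys.stderr)
        return 1
    return 0


def main():
    status = preflight()
    if status != 0:
        return status  # nonzero exit code: on_exit_hold will put the job on hold
    # ... compile Cython extensions and run the simulation here (not shown) ...
    return 0
```

A job's entry point would then end with `sys.exit(main())`. The job fails in seconds with a clear message, instead of crashing partway through setup.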

Self-Checkpointing Jobs

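Self-Checkpointing Jobs

    # Python-ish pseudocode (a method on the Simulation object)
    import time

    def run_simulation(self):
        last_checkpoint = time.monotonic()
        while not self.done:
            self.advance_simulation()  # sets self.done = True after the final step
            if time.monotonic() - last_checkpoint > self.time_between_checkpoints:
                self.do_checkpoint()  # save the current state to disk
                last_checkpoint = time.monotonic()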
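
Below is a minimal runnable sketch of that loop. It assumes a toy simulation whose entire state is a step counter. The class name `ToySimulation`, the JSON checkpoint format, and the default interval are hypothetical. Each checkpoint is written to a temporary file and then swapped into place with `os.replace`. If the job is evicted in the middle of a write, the previous checkpoint stays intact.

```python
import json
import os
import time


class ToySimulation:
    """Hypothetical stand-in for a real simulation: its whole state is one counter."""

    def __init__(self, total_steps, checkpoint_path, time_between_checkpoints=60.0):
        self.step = 0
        self.total_steps = total_steps
        self.checkpoint_path = checkpoint_path
        self.time_between_checkpoints = time_between_checkpoints

    @property
    def done(self):
        return self.step >= self.total_steps

    def advance_simulation(self):
        self.step += 1  # a real simulation would do the physics here

    def do_checkpoint(self):
        # Write to a temporary file first, then atomically replace the old
        # checkpoint, so an interrupted write never leaves a half-written file.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"step": self.step, "total_steps": self.total_steps}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def run_simulation(self):
        last_checkpoint = time.monotonic()
        while not self.done:
            self.advance_simulation()
            if time.monotonic() - last_checkpoint > self.time_between_checkpoints:
                self.do_checkpoint()
                last_checkpoint = time.monotonic()
        self.do_checkpoint()  # always save the final state as well
```

Checkpointing on a timer rather than every N steps keeps the overhead roughly constant, regardless of how expensive each step is.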

Self-Checkpointing Jobs

    # Python-ish pseudocode
    def execute_node():
        try:
            # Resume from a checkpoint left behind by an earlier, evicted run
            simulation = find_existing_simulation()
        except FileNotFoundError:
            # No checkpoint yet: start a fresh simulation from the inputs
            inputs = load_inputs()
            simulation = Simulation(inputs)
        simulation.run_simulation()

If you represent your job as an object, it (usually) becomes easy to save it to disk. I use pickle, part of the Python standard library; the thing to look up is serialization.
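
Below is a minimal sketch of the resume-or-start logic using `pickle`. The names `execute_node`, `find_existing_simulation`, `load_inputs`, and `Simulation` come from the pseudocode above, but their bodies here are hypothetical illustrations. So are the checkpoint file name and the `total_steps` input. Opening a checkpoint file that does not exist yet raises `FileNotFoundError`, and that exception is what lets the `try`/`except` choose between resuming and starting fresh.

```python
import os
import pickle

CHECKPOINT_PATH = "simulation.pickle"  # hypothetical file name


class Simulation:
    """Hypothetical simulation object: all of its state lives in attributes."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.step = 0
        self.total_steps = inputs["total_steps"]

    def save(self, path=CHECKPOINT_PATH):
        # Pickle the whole object; write-then-rename keeps the old checkpoint safe.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(self, f)
        os.replace(tmp_path, path)

    def run_simulation(self, path=CHECKPOINT_PATH):
        while self.step < self.total_steps:
            self.step += 1  # the physics goes here
            if self.step % 100 == 0:
                self.save(path)  # periodic checkpoint (a real job might use a timer)
        self.save(path)


def find_existing_simulation(path=CHECKPOINT_PATH):
    # Raises FileNotFoundError if no checkpoint has been written yet.
    with open(path, "rb") as f:
        return pickle.load(f)


def load_inputs():
    return {"total_steps": 1000}  # hypothetical stand-in for reading an input file


def execute_node(path=CHECKPOINT_PATH):
    try:
        simulation = find_existing_simulation(path)
    except FileNotFoundError:
        simulation = Simulation(load_inputs())
    simulation.run_simulation(path)
    return simulation
```

Pickle can only load objects whose classes are importable in the resuming process. The code that defines `Simulation` therefore has to travel with the job, along with the checkpoint file.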

My Workflow
Generate input parameters → Submit job: the smoother you can make this part work, the happier you'll be.
Wait… read a book… er, paper… Jobs are running; failed jobs are re-running automatically; evicted jobs aren't failing. This is the part you can't control, but have to interact with.
Check results → Do science to results
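
Checking results often begins by asking which jobs are actually done. Below is a minimal sketch that assumes, hypothetically, that each job writes one output file named after its parameter point (for example `fluence=0.1_pulse_width=50.pickle`). The naming scheme and helper names are illustrations, not the author's workflow.

```python
import os


def output_name(point):
    """Hypothetical naming scheme: one output file per parameter point."""
    return "_".join(f"{key}={value}" for key, value in sorted(point.items())) + ".pickle"


def split_by_status(grid, output_dir):
    """Split parameter points into (finished, unfinished) by checking for output files."""
    finished, unfinished = [], []
    for point in grid:
        path = os.path.join(output_dir, output_name(point))
        (finished if os.path.exists(path) else unfinished).append(point)
    return finished, unfinished
```

The unfinished points can be inspected or written back out for resubmission. The finished ones go on to the "do science" step.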

- Leverage HTCondor built-ins to solve your problems (late materialization is coming soon!).
- Don't be afraid to write your own solution! (I gave a talk about my workflow at HTCondor Week 2018.)
- HTC involves a different mindset, with new problems and new tools.