Managing Scientific Computing Projects
Erik Deumens, QTP and HPC Center
Sep 13, 2006

Overview
- What is a scientific computing project?
- Procedures to manage scientific computing projects

Commodity computing
- Web access
- Writing: papers, letters, thesis, presentations, web content
- Drawing: graphs, figures, plots
- Calculating: spreadsheets, Mathematica, Maple, SAS, Matlab

Science and Engineering
- Computing with software
  - Physics: VASP, WIEN
  - Chemistry: Gaussian, Q-Chem
  - Engineering: ANSYS
- Developing software
  - Programming
  - Prototyping
  - Debugging
  - Performance analysis

Scientific Computing Project
- Significant human effort
- Many steps with dependencies
- Takes a long time on one computer or many computers to complete
- Involves a lot of data
  - Input given to be processed
  - Intermediate data for the computation
  - Output produced to be analyzed

Example SCP
- Test a set of model parameters
  - Given basic parameters B_n
  - Compute dependent values D_j
  - Compare to test values T_j
- If the number of dependent and test value sets is large, say 1,000
- And each computation takes time, say 1 h
- Then this is a project

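To make the scale of the example concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that cases are independent and run in equal-length batches are not from the slides.

```python
import math


def project_wall_time_hours(n_cases: int, hours_per_case: float, n_workers: int = 1) -> float:
    """Estimate wall-clock hours for independent cases run in batches of n_workers.

    Assumes every case takes the same time and nothing fails (an idealization).
    """
    if n_cases < 0 or hours_per_case < 0 or n_workers < 1:
        raise ValueError("n_cases and hours_per_case must be >= 0, n_workers >= 1")
    batches = math.ceil(n_cases / n_workers)
    return batches * hours_per_case


# The slide's numbers: 1,000 cases at 1 h each.
serial = project_wall_time_hours(1000, 1.0)
print(f"one computer: {serial:.0f} h (about {serial / 24:.1f} days)")

# Hypothetical cluster share of 50 processors.
parallel = project_wall_time_hours(1000, 1.0, n_workers=50)
print(f"50 computers: {parallel:.0f} h")
```

On one computer the 1,000 one-hour cases occupy roughly six weeks of continuous running, which is why the example counts as a project rather than a single calculation.
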
Recognizing SCP
- Act from the early stages as if it is an SCP
- Then procedures are tested and reliable by the time
  - the science of the project becomes harder
  - and requires all attention

Reliability of modern computers
- Computers, networks and software are
  - Very stable
  - Very powerful
- Leads to widespread belief that they are
  - Infinitely stable
  - Infinitely powerful
- Probability of failure
  - Small chance times lots of work = big chance

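The "small chance times lots of work" point can be quantified with a short sketch. If each run fails independently with probability p, the chance that at least one of n runs fails is 1 - (1 - p)^n. Independence and the per-run failure rate of 0.001 used below are illustrative assumptions, not figures from the slides.

```python
def prob_at_least_one_failure(p_fail_per_run: float, n_runs: int) -> float:
    """Probability that at least one of n_runs independent runs fails."""
    if not 0.0 <= p_fail_per_run <= 1.0:
        raise ValueError("p_fail_per_run must be between 0 and 1")
    if n_runs < 0:
        raise ValueError("n_runs must be >= 0")
    return 1.0 - (1.0 - p_fail_per_run) ** n_runs


# Assumed failure rate of 1 in 1,000 per run, for growing project sizes.
for n in (1, 100, 1000, 10000):
    print(f"{n:>6} runs: {prob_at_least_one_failure(0.001, n):.3f}")
```

With these assumed numbers, a single run almost never fails, yet a 1,000-run project has roughly a 63% chance of seeing at least one failure.
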
Overview
- What is a scientific computing project?
- Procedures to manage scientific computing projects

Manage a SCP
- Project analysis
  - Data
  - Computation
- Develop strategy
  - Organize the computation
  - Manage the data
- Automation
  - Avoid human errors
  - Protect against disasters

Project analysis
- Often a project starts small
- Once you decide the project is worthwhile, perform a project analysis
  - Data: before, during, after
  - Computation: how many, how long
  - Precautions: minimize effect of disasters

Develop strategy
- Organize the computation
  - Choose computer system
  - Study scheduling system
    - Match the project computation flow onto the scheduling policies
- Manage the data
  - Input files generated by hand? By machine?
  - Space for large intermediate files
  - Space for output files

Automation
- Extra tools needed to manage the project?
  - Generate input files from a database?
    - Write scripts? Use a tool already developed?
  - Generate scheduler command files?
    - Does a tool exist? Some tools are very complex. Is it easier to write scripts than to learn the tool?
  - Collect data from output files into a database?
    - Write scripts? Write a compiled program?

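As one possible answer to the "write scripts?" question, generating input files from a database might look like the sketch below. Everything specific here is hypothetical: the `cases` table, its `alpha` and `beta` columns, the file naming, and the input file layout are invented for illustration, and a real application code has its own input format.

```python
import sqlite3
import tempfile
from pathlib import Path
from string import Template

# Hypothetical input format; replace with the format your application expects.
INPUT_TEMPLATE = Template("case_id = $case_id\nalpha = $alpha\nbeta = $beta\n")


def generate_inputs(conn: sqlite3.Connection, out_dir: Path) -> list:
    """Write one input file per row of the (hypothetical) 'cases' table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT case_id, alpha, beta FROM cases ORDER BY case_id")
    for case_id, alpha, beta in rows:
        path = out_dir / f"case_{case_id:04d}.in"
        path.write_text(INPUT_TEMPLATE.substitute(case_id=case_id, alpha=alpha, beta=beta))
        written.append(path)
    return written


# Usage: a tiny in-memory database stands in for the project's parameter store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER PRIMARY KEY, alpha REAL, beta REAL)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(1, 0.5, 2.0), (2, 0.7, 2.5), (3, 0.9, 3.0)],
)
demo_dir = Path(tempfile.mkdtemp())
files = generate_inputs(conn, demo_dir)
print([f.name for f in files])
```

Keeping the parameters in one table and deriving every input file from it means a lost or corrupted input file can be regenerated rather than retyped.
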
Automation
- Computation and data monitoring
  - Check status of each run
    - Submit the job again if it failed
  - Check correctness and integrity of output data
    - Even if the job finished, it may have generated an error message
    - There may be no result, or the result may be invalid or incorrect

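The monitoring checks above can be sketched as a small script. The output conventions it relies on are assumptions made up for this example: each run writes `case_NNNN.out`, failures contain the word `ERROR`, and a successful run contains a line `RESULT = <number>`. The actual resubmission step depends on the scheduler in use and is not shown.

```python
import math
import tempfile
from pathlib import Path


def classify_run(output_file: Path) -> str:
    """Return 'missing', 'error', 'invalid', or 'ok' for one run's output file."""
    if not output_file.exists():
        return "missing"
    text = output_file.read_text(errors="replace")
    if "ERROR" in text:  # hypothetical error marker
        return "error"
    for line in text.splitlines():
        if line.startswith("RESULT"):  # hypothetical result line
            try:
                value = float(line.split("=", 1)[1])
            except (IndexError, ValueError):
                return "invalid"
            return "ok" if math.isfinite(value) else "invalid"
    return "invalid"  # the job finished but produced no result


def runs_to_resubmit(run_dir: Path, case_ids) -> list:
    """List the case ids whose output is missing, failed, or invalid."""
    return [cid for cid in case_ids
            if classify_run(run_dir / f"case_{cid:04d}.out") != "ok"]


# Usage: one good run, one error, one invalid result, one run with no output.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "case_0001.out").write_text("RESULT = 1.25\n")
(demo_dir / "case_0002.out").write_text("ERROR: disk full\n")
(demo_dir / "case_0003.out").write_text("RESULT = nan\n")
print(runs_to_resubmit(demo_dir, [1, 2, 3, 4]))
```

Note that case 3 finished normally by the scheduler's standards; only the content check reveals that its result is unusable.
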
Precautions
- Prepare for some disasters
  - Some or all computed data is lost or corrupted?
    - Make sure all files created manually are on disks that are backed up
    - Then, at least, you can run the computations again
  - Some output has been processed
    - Make sure partial results are on disks that are backed up

Growing projects
- Often projects start small
  - Procedures are developed and used
  - They work well for 1,000 cases
- Then the scope is increased
  - After partial success
  - Procedures are used unchanged
  - They do not work for 1,000,000 cases!
- Must perform new analysis when scope changes

Tool choices
- Small operations
  - Scripts are easy to write and change
  - Run fast for small numbers
- Large operations
  - Running a script 10,000,000 times may be very slow and cause unexpected side effects
  - Investigate better tools
    - Program in compiled language
    - Use database instead of simple files

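As a rough illustration of "use database instead of simple files", the sketch below stores many results in one SQLite table with a single transaction instead of writing one small file per case. The `results` table, its columns, and the synthetic values are hypothetical; a real project would use a file path on backed-up disk rather than the in-memory database used here.

```python
import sqlite3


def store_results(conn: sqlite3.Connection, results) -> None:
    """Insert (case_id, value) pairs into the 'results' table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (case_id INTEGER PRIMARY KEY, value REAL)"
    )
    with conn:  # commits once on success, rolls back if an insert fails
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", results)


# Usage: 100,000 synthetic results stand in for values parsed from output files.
conn = sqlite3.connect(":memory:")
store_results(conn, ((i, i * 0.5) for i in range(100_000)))

count, mean = conn.execute("SELECT COUNT(*), AVG(value) FROM results").fetchone()
print(count, mean)
```

One query then summarizes all cases, where the file-per-case approach would need a script to open and parse every file each time.
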
Conclusion
- A little bit of thought can save you from a lot of trouble and extra work
- Every scientific computation project that is worth doing is worth a little bit of thought about how to do it.