Presented by: Marlon Bright 14 July 2008 Advisor: Masoud Sadjadi, Ph.D. REU – Florida International University.

1 Presented by: Marlon Bright 14 July 2008 Advisor: Masoud Sadjadi, Ph.D. REU – Florida International University

2 Outline
• Grid Enablement of Weather Research and Forecasting Code (WRF)
• Profiling and Prediction Tools
• Research Goals
• Project Timeline
• Current Progress
• Challenges
• Remaining Work

3 Motivation – Weather Research and Forecasting Code (WRF)
• Goal – Improved Weather Prediction
  - Accurate and Timely Results
  - Precise Location Information
• WRF Status
  - Over 160,000 lines (mostly FORTRAN and C)
  - Single Machine/Cluster compatible
  - Single Domain
  - Fine Resolution -> Resource Requirements
• How to Overcome this? Through Grid Enablement
  - Expected Benefits to WRF
    ○ More available resources – Different Domains
    ○ Faster results
    ○ Improved Accuracy

4 System Overview
• Web-Based Portal
• Grid Middleware (Plumbing)
  - Job-Flow Management
  - Meta-Scheduling
    ○ Performance Prediction
  - Profiling and Benchmarking
• Development Tools and Environments
  - Transparent Grid Enablement (TGE)
    ○ TRAP: Static and Dynamic adaptation of programs
    ○ TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  - GRID superscalar: Programming Paradigm for parallelizing a sequential application dynamically in a Computational Grid

5 Performance Prediction
An IMPORTANT part of Meta-Scheduling. Allows for:
• Optimal usage of grid resources through “smarter” meta-scheduling
  - Many users overestimate job requirements
  - Reduced idle time for compute resources
  - Could save costs and energy
• Optimal resource selection for most expedient job return time


7 Amon / Aprof
• Amon – monitoring program that runs on each compute node, recording new processes
• Aprof – regression analysis program that runs on the head node; it receives input from Amon to make execution time predictions (within a cluster and between clusters)

8 Amon / Aprof Monitoring and Prediction

9 Amon / Aprof Approach to Modeling Resource Usage (figure; modeled application: WRF)
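
The slides do not spell out the regression model Aprof uses. As a minimal sketch, assuming elapsed time is modeled as linear in the inverse of the processor count (T = a + b/p), an ordinary least-squares fit could look like the following. The model, the function names `fit_inverse_model` and `predict`, and the synthetic timings are all assumptions made for illustration; none of them come from Aprof itself.

```python
def fit_inverse_model(samples):
    """Least-squares fit of T = a + b / p.

    samples: list of (processors, elapsed_time) pairs.
    Returns (a, b). The model is an illustrative assumption,
    not Aprof's documented model.
    """
    xs = [1.0 / p for p, _ in samples]
    ts = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct processor counts")
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


def predict(a, b, processors):
    """Predicted elapsed time for a given processor count."""
    return a + b / processors


if __name__ == "__main__":
    # Synthetic timings generated from T = 500 + 4000 / p (not measured data).
    data = [(p, 500.0 + 4000.0 / p) for p in (1, 2, 4, 8)]
    a, b = fit_inverse_model(data)
    print(round(a, 3), round(b, 3), round(predict(a, b, 16), 3))
```

Fitting against 1/p rather than p keeps the problem linear, so a closed-form least-squares solution is enough and no numerical library is needed.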

10 Sample Amon Output
Process --- (464) ---
name: wrf.exe
cpus: 8
inv clock: 1/2297.700 [MHz]
inv cache size: 1/1024 [KB]
elapsed time: 1234232 [msec]
utime: 1233890 [msec] 1236360 [msec]
stime: 560 [msec] 1420 [msec]
intr: 44959
ctxt switch: 84394
fork: 89
storage R: 0 [blocks] 0 [blocks]
storage W: 0 [blocks]
network Rx: 4188840 [bytes]
network Tx: 2106854 [bytes]
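
A record like the sample above could be read into a dictionary with a few lines of Python. This is a minimal sketch that assumes one `key: value [unit]` field per line, as in the sample; the function name `parse_amon_record` is hypothetical and is not part of Amon. Because the slide does not explain why some fields carry two values, every field is kept as a list of raw strings.

```python
import re


def parse_amon_record(text):
    """Parse 'key: value [unit] ...' lines into a dict of key -> list of value strings."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the 'Process --- (464) ---' header and blank lines
        key, _, rest = line.partition(":")
        # Drop bracketed units such as [msec]; keep the raw value tokens.
        values = re.sub(r"\[[^\]]*\]", " ", rest).split()
        record[key.strip()] = values
    return record


if __name__ == "__main__":
    sample = """Process --- (464) ---
name: wrf.exe
cpus: 8
inv clock: 1/2297.700 [MHz]
utime: 1233890 [msec] 1236360 [msec]
"""
    for key, values in parse_amon_record(sample).items():
        print(key, values)
```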

11 Sample Aprof Output
name: wrf_arw_DM.exe
elapsed time: 5.783787e+06
===========================================================
explanatory:       value         parameter
-----------------  ------------- ------------- -------------
:                  1.000000e+00  5.783787e+06  1.982074e+05
===========================================================
predicted:         value         residue       rms
-----------------  ------------- ------------- -------------
elapsed time:      5.783787e+06  4.246451e+06  1.982074e+05
===========================================================

12 Sample Query Automation Script Output
adj. cpu speed, processors, actual, predicted, rms, std. dev, actual difference,
3591.363, 1, 5222, 5924.82, 1592.459, 415.3491, 13.4588280352
3591.363, 2, 2881, 3246.283, 1592.459, 181.5382, 12.6790350573
3591.363, 3, 2281, 2353.438, 1592.459, 105.334, 3.17571240684
3591.363, 4, 1860, 1907.015, 1592.459, 69.19778, 2.52768817204
3591.363, 5, 1681, 1639.161, 1592.459, 49.83672, 2.48893515764
3591.363, 6, 1440, 1460.592, 1592.459, 39.5442, 1.43
3591.363, 7, 1380, 1333.043, 1592.459, 34.76459, 3.40268115942
3591.363, 8, 1200, 1237.381, 1592.459, 33.27651, 3.11508333333
3591.363, 9, 1200, 1162.977, 1592.459, 33.56231, 3.08525
3591.363, 10, 1080, 1103.454, 1592.459, 34.68943, 2.17166666667
3591.363, 11, 1200, 1054.753, 1592.459, 36.15324, 12.1039166667
3591.363, 12, 1080, 1014.169, 1592.459, 37.70271, 6.09546296296
3591.363, 13, 1200, 979.8292, 1592.459, 39.22018, 18.3475666667
3591.363, 14, 1021, 950.3947, 1592.459, 40.65455, 6.91530852106
3591.363, 15, 1020, 924.8848, 1592.459, 41.9872, 9.32501960784
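
The values in the "actual difference" column are consistent with the absolute gap between the actual and predicted times, expressed as a percentage of the actual time. A minimal sketch of that calculation follows; the function name `percent_difference` is hypothetical and is not taken from the author's script, and the three rows are copied from the sample output.

```python
def percent_difference(actual, predicted):
    """Absolute prediction error as a percentage of the actual value."""
    return abs(actual - predicted) / actual * 100.0


if __name__ == "__main__":
    # (processors, actual, predicted) taken from the sample output.
    rows = [(1, 5222, 5924.82), (6, 1440, 1460.592), (9, 1200, 1162.977)]
    for processors, actual, predicted in rows:
        print(processors, round(percent_difference(actual, predicted), 4))
```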

13 Previous Findings for Amon / Aprof
• Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes)
• Experiments were run to predict for different numbers of nodes and CPU loads (i.e. 2, 3, …, 14, 15 nodes and 20%, 30%, …, 90%, 100% load)
• Aprof predictions were within 10% error of actual recorded runtimes, both within Mind and GCB and between Mind and GCB
• Conclusion: the first-step assumption was valid -> move on to extending the research to a higher number of nodes

14 Paraver / Dimemas
• Dimemas – simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform
• Paraver – tool that allows for performance visualization and analysis of trace files generated from actual executions and by Dimemas
  - Tracefiles are generated by MPItrace, which is linked into the execution code

15 Dimemas Simulation Process Overview
1. Link MPItrace into the application source code; it dynamically generates a tracefile (.mpit) for each node the application runs on
2. Use the CEPBA tool ‘mpi2prv’ to convert the .mpit files into one .prv file
3. Load the file into Paraver using the XML filtering file (provided by CEPBA) to reduce the tracefile, eliminating ‘perturbed regions’ (i.e. much of the initialization)
4. Open the tracefile in Paraver using the ‘useful_duration’ configuration file and adjust scales to fit events
5. Identify the computation iterations and compose a smaller tracefile by selecting a few iterations, preserving communications and eliminating initialization phases

16 Paraver tracefile with iterations selected, cut, and ready for Dimemas conversion.

17 Simulation Process (cont’d)
6. Convert the new tracefile to Dimemas format (.trf) using the CEPBA-provided ‘prv2trf’ tool
7. Load the tracefile into the Dimemas simulator, configure the target machine, and use that information to generate a Dimemas configuration file
8. Call the simulator with or without the option of generating a Paraver (.prv) tracefile for viewing
Great News: You only have to go through this process once if it is done for the maximum number of nodes you will simulate! Once the configuration file is generated, different numbers of nodes can be simulated by altering the file.

18 Dimemas Simulator Results

19 Goals
1. Extend Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
2. Compare/contrast Aprof predictions to Dimemas predictions in terms of accuracy and prediction computation time.
3. Analyze if/how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling.

20 Timeline
• End of June:
  - Get MPItrace linking properly with WRF Version Compiled on GCB, then Mind (COMPLETE)
  a) Install Amon and Aprof on MareNostrum and ensure proper functioning (AMON COMPLETE; APROF FINAL STAGES)
  b) Run Amon benchmarks on MareNostrum (COMPLETE)
• Early/Mid July:
  - Use and analyze Aprof predictions within MareNostrum (and possibly between MareNostrum, GCB, and Mind) (IN PROGRESS)
  - Use generated MPI/OpenMP tracefiles (Paraver/Dimemas) to predict within (and possibly between) Mind, GCB, and MareNostrum (IN PROGRESS)
• Late July/Early August:
  - Experiment with how well Amon and Aprof relate to, and could possibly be combined with, Dimemas
  - Analyze how findings relate to the bigger picture. Make optimizations on grid-enablement of WRF.
  - Compose paper presenting significant findings.

22 General
• Completed reading of related works papers
• Well advanced in Linux studies
• Established effective collaboration/working relationship with developers of Dimemas and Paraver

23 Amon
• Installed on MareNostrum
• Adjusted source code to properly read node information from MareNostrum (will document this on Wiki to be considered when configuring on new architectures)

24 Amon (cont’d)
• Automated benchmarking shell script developed
  - Starts Amon on each compute node returned by system scheduler
  - Executes WRF with one process per node for:
    ○ Node counts of: 8, 16, 32, 64, 96, and 128
    ○ CPU percentage (%) loads of: 25, 50, 75, & 100 (done through implementation of the CPULimit program)
  - Writes results (to be used as Aprof input) to organized results directory of …/ / / /
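
The benchmark matrix described on this slide (six node counts by four CPU loads) can be enumerated as below. This is only a sketch of the case list; the constant and function names are hypothetical, and the commands that start Amon, apply CPULimit, and submit WRF through the scheduler are not shown in the slides, so they are deliberately left out.

```python
from itertools import product

# Values taken from the slide.
NODE_COUNTS = (8, 16, 32, 64, 96, 128)
CPU_LOADS = (25, 50, 75, 100)


def benchmark_cases():
    """Return every (node_count, cpu_load_percent) pair to benchmark."""
    return list(product(NODE_COUNTS, CPU_LOADS))


if __name__ == "__main__":
    for nodes, load in benchmark_cases():
        # Placeholder for the real work: start Amon, run WRF, collect results.
        print(f"nodes={nodes} load={load}%")
```

Generating the full list up front makes it easy to check that all 24 combinations are covered before any jobs are submitted.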

25 Aprof
• Installed on MareNostrum
• Adjusted source code to change the way Aprof reads in information
  - Before: Input files had to specify the number of bytes in the process listing in the process header (this was very complicated and error prone; Aprof was inconsistent in loading MareNostrum data)
  - Now: Input files simply need to separate process entries with one or more blank lines
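
The new input convention, process entries separated by one or more blank lines, is simple to parse. Aprof's own reader is not shown in the slides; the following is a minimal Python sketch of the same idea, with the hypothetical function name `split_process_entries`.

```python
import re


def split_process_entries(text):
    """Split input into process entries separated by one or more blank lines."""
    # A separator is a newline, optional whitespace-only lines, then a newline.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


if __name__ == "__main__":
    sample = "name: wrf.exe\ncpus: 8\n\n\nname: wrf.exe\ncpus: 16\n"
    for entry in split_process_entries(sample):
        print(repr(entry))
```

Compared with a byte count in each header, a blank-line delimiter does not have to be kept in sync with the entry body, which matches the slide's point about the old format being error prone.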

26 Aprof (cont’d)
• Script developed that combines Amon output from all nodes and edits it into the necessary read-in format for Aprof
• Aprof query automation script adjusted/developed for MareNostrum
  - Queries Aprof for prediction information for different cases (number of nodes; CPU percentage loads)
  - Compares predicted values to actual values returned by run

27 Dimemas / Paraver
• Paraver tracefile successfully generated and visualized with GUI on MareNostrum
• Dimemas tracefile successfully generated from Paraver on MareNostrum
• Configuration file for MareNostrum developed
• Prediction simulations will begin shortly

28 Significant Challenges Overcome
• Amon:
  - Adjustment of source code for proper functioning on MareNostrum
  - Development of benchmarking script to conform to system architecture of MareNostrum (i.e. going through its scheduler; one process per node; etc.)
• Aprof:
  - Adjustment of source code for less complex, more consistent data input
  - Development of prediction and comparison scripts for MareNostrum

29 Significant Challenges Overcome (cont’d)
• Dimemas/Paraver
  - MPItrace properly linked in with WRF on GCB and Mind
  - Paraver and Dimemas tracefiles successfully generated, and configuration file set up for MareNostrum
• WRF
  - Version 2.2 installed and compiled on Mind

30 Remaining Work
• Scripting Dimemas prediction simulations for the same scenarios as those of Amon and Aprof
• Finalizing Aprof prediction/comparison script so that Aprof’s performance on the new architecture of MareNostrum can be analyzed
• Deciding if and how to compare results from MareNostrum, GCB, and Mind (i.e. the same versions of WRF would have to be running in all three locations)
• Experimenting with how well Amon and Aprof relate to, and could possibly be combined with, Dimemas

32 Acknowledgements
• REU
• PIRE
• BSC
• Masoud Sadjadi, Ph.D. – FIU
• Rosa Badia, Ph.D. – BSC
• Javier Delgado – FIU
• Javier Figueroa – UM
