Presented by: Marlon Bright 14 July 2008 Advisor: Masoud Sadjadi, Ph.D. REU – Florida International University.

Outline
- Grid Enablement of Weather Research and Forecasting Code (WRF)
- Profiling and Prediction Tools
- Research Goals
- Project Timeline
- Current Progress
- Challenges
- Remaining Work

Motivation – Weather Research and Forecasting Code (WRF)
- Goal – improved weather prediction
  - Accurate and timely results
  - Precise location information
- WRF status
  - Over 160,000 lines of code (mostly Fortran and C)
  - Compatible with a single machine or cluster
  - Single domain; finer resolution raises resource requirements
- How to overcome this? Through grid enablement
- Expected benefits to WRF
  - More available resources across different domains
  - Faster results
  - Improved accuracy

System Overview
- Web-Based Portal
- Grid Middleware (Plumbing)
  - Job-Flow Management
  - Meta-Scheduling
    - Performance Prediction
    - Profiling and Benchmarking
- Development Tools and Environments
  - Transparent Grid Enablement (TGE)
    - TRAP: static and dynamic adaptation of programs
    - TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  - GRID superscalar: programming paradigm for parallelizing a sequential application dynamically in a computational grid

Performance Prediction
An important part of meta-scheduling. It allows for:
- Optimal usage of grid resources through "smarter" meta-scheduling
  - Many users overestimate job requirements
  - Reduced idle time for compute resources
  - Could save costs and energy
- Optimal resource selection for the most expedient job return time

Amon / Aprof  Amon – monitoring program that runs on each compute node recording new processes  Aprof – regression analysis program running on head node; receives input from Amon to make execution time predictions (within cluster & between clusters) 7REU - Florida International University

Amon / Aprof Monitoring and Prediction (diagram slide)

Amon / Aprof Approach to Modeling Resource Usage (diagram slide showing WRF)

Sample Amon Output
Process --- (464) ---
  name:           wrf.exe
  cpus:           8
  inv clock:      1/ [MHz]
  inv cache size: 1/1024 [KB]
  elapsed time:   [msec]
  utime:          [msec]      [msec]
  stime:          560 [msec]  1420 [msec]
  intr:
  ctxt switch:
  fork:           89
  storage R:      0 [blocks]  0 [blocks]
  storage W:      0 [blocks]
  network Rx:     [bytes]
  network Tx:     [bytes]
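Several of these fields mirror what Linux exposes under /proc. The sketch below illustrates how such per-process metrics could be collected; it is not Amon's source code, and it assumes an x86-style /proc layout.

```python
# Illustration only (not Amon's source): collect per-process CPU metrics
# from /proc on Linux, similar in spirit to the fields Amon reports above.
import os

def process_times(pid):
    """Return (utime_ms, stime_ms) for a process, read from /proc/<pid>/stat."""
    ticks = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Skip past the "(comm)" field, which may itself contain spaces.
    fields = stat[stat.rindex(")") + 2:].split()
    # utime and stime are fields 14 and 15 of /proc/<pid>/stat, in clock ticks.
    utime_ms = int(fields[11]) * 1000 // ticks
    stime_ms = int(fields[12]) * 1000 // ticks
    return utime_ms, stime_ms

def cpu_info():
    """Return (clock_MHz, cache_KB) for the first CPU listed in /proc/cpuinfo."""
    mhz = cache_kb = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("cpu MHz") and mhz is None:
                mhz = float(line.split(":")[1])
            elif line.startswith("cache size") and cache_kb is None:
                cache_kb = int(line.split(":")[1].split()[0])
    return mhz, cache_kb

print(process_times(os.getpid()), cpu_info())
```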

Sample Aprof Output
  name: wrf_arw_DM.exe
  elapsed time: e+06
  ===========================================================
  explanatory:    value        parameter      std.dev
                  e            e              e+05
  ===========================================================
  predicted:      value        residue        rms          std.dev
  elapsed time:   e            e              e+05
  ===========================================================

Sample Query Automation Script Output
adj. cpu speed, processors, actual, predicted, rms, std. dev, actual difference
, 1, 5222, , , ,
, 2, 2881, , , ,
, 3, 2281, , , ,
, 4, 1860, , , ,
, 5, 1681, , , ,
, 6, 1440, , , ,
, 7, 1380, , , ,
, 8, 1200, , , ,
, 9, 1200, , , ,
, 10, 1080, , , ,
, 11, 1200, , , ,
, 12, 1080, , , ,
, 13, 1200, , , ,
, 14, 1021, , , ,
, 15, 1020, , , ,
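A small sketch of the predicted-versus-actual comparison such a query script performs, assuming a CSV file in the column layout above with all fields populated; the file name and the interpretation of "actual difference" as relative prediction error are assumptions.

```python
# Sketch of the predicted-vs-actual comparison performed by the query script.
# "results.csv" is a hypothetical file name; treating "actual difference" as
# the relative prediction error is an assumption of this sketch.
import csv

with open("results.csv") as f:
    for row in csv.DictReader(f, skipinitialspace=True):
        actual = float(row["actual"])
        predicted = float(row["predicted"])
        rel_error = abs(predicted - actual) / actual
        print(f"{row['processors']:>3} procs: actual {actual:8.1f} msec, "
              f"predicted {predicted:8.1f} msec, error {rel_error:6.1%}")
```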

Previous Findings for Amon / Aprof
- Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes)
- Experiments predicted runtimes for different numbers of nodes and CPU loads (i.e., 2, 3, ..., 14, 15 nodes and 20%, 30%, ..., 90%, 100% load)
- Aprof predictions were within 10% error of the actual recorded runtimes, both within Mind and GCB and between Mind and GCB
- Conclusion: the first-step assumption was valid, so the research moves on to larger numbers of nodes

Paraver / Dimemas
- Dimemas – a simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform
- Paraver – a tool for performance visualization and analysis of trace files generated from actual executions and by Dimemas
  - Trace files are generated by MPItrace, which is linked into the executed code

Dimemas Simulation Process Overview
1. Link MPItrace into the application source code; it dynamically generates a trace file (.mpit) for each node the application runs on
2. Use the CEPBA tool 'mpi2prv' to convert the .mpit files into a single .prv file
3. Load the file into Paraver using the XML filtering file (provided by CEPBA) to reduce the trace file, eliminating 'perturbed regions' (i.e., much of the initialization)
4. Open the trace file in Paraver using the 'useful_duration' configuration file and adjust the scales to fit the events
5. Identify the computation iterations and compose a smaller trace file by selecting a few iterations, preserving communications and eliminating the initialization phases

(Screenshot) Paraver trace file with iterations selected, cut, and ready for Dimemas conversion.

Simulation Process (cont'd)
6. Convert the new trace file to Dimemas format (.trf) using the CEPBA-provided 'prv2trf' tool
7. Load the trace file into the Dimemas simulator, configure the target machine, and generate the Dimemas configuration file from that information
8. Call the simulator, with or without the option of generating a Paraver (.prv) trace file for viewing
Great news: you only have to go through this process once, provided it is done for the maximum number of nodes you will simulate. Once the configuration file is generated, different numbers of nodes can be simulated through alterations to the file. A scripted sketch of this pipeline follows.
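Steps 2, 6, and 8 of this process could be scripted roughly as follows. The tool names (mpi2prv, prv2trf, Dimemas) come from the slides, but the exact command-line flags and file names below are assumptions and may differ between CEPBA tool versions.

```python
# Sketch of scripting the trace-conversion pipeline described above.
# Tool names come from the slides; the flags and file names are assumptions.
import glob
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 2: merge the per-process .mpit traces into a single Paraver trace.
mpit_files = sorted(glob.glob("TRACE*.mpit"))          # assumed naming
run(["mpi2prv", "-o", "wrf_full.prv"] + mpit_files)    # assumed flags

# Steps 3-5 (filtering and cutting iterations) are done interactively in
# Paraver; assume the cut trace was saved as wrf_cut.prv.

# Step 6: convert the cut Paraver trace to Dimemas format.
run(["prv2trf", "wrf_cut.prv", "wrf_cut.trf"])         # assumed usage

# Step 8: run the simulator against a target-machine configuration file,
# optionally producing a Paraver trace of the simulated run for inspection.
run(["Dimemas", "-pa", "wrf_sim.prv", "wrf_target.cfg"])  # assumed flags
```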

Dimemas Simulator Results (chart slide)

Goals
1. Extend the Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
2. Compare/contrast Aprof predictions to Dimemas predictions in terms of accuracy and prediction computation time.
3. Analyze if/how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling.

Timeline
- End of June:
  - Get MPItrace linking properly with WRF; compiled on GCB, then Mind (COMPLETE)
  a) Install Amon and Aprof on MareNostrum and ensure proper functioning (AMON COMPLETE; APROF IN FINAL STAGES)
  b) Run Amon benchmarks on MareNostrum (COMPLETE)
- Early/Mid July:
  - Use and analyze Aprof predictions within MareNostrum (and possibly between MareNostrum, GCB, and Mind) (IN PROGRESS)
  - Use the generated MPI/OpenMP trace files (Paraver/Dimemas) to predict within (and possibly between) Mind, GCB, and MareNostrum (IN PROGRESS)
- Late July/Early August:
  - Experiment with how well Amon and Aprof relate to, or could be combined with, Dimemas
  - Analyze how the findings relate to the bigger picture
  - Make optimizations to the grid enablement of WRF
  - Compose a paper presenting significant findings

General
- Completed reading of related-works papers
- Well advanced in Linux studies
- Established an effective collaboration/working relationship with the developers of Dimemas and Paraver

Amon  Installed on MareNostrum  Adjusted source code to properly read node information from MareNostrum (will document this on Wiki to be considered when configuring on new architectures) 23REU - Florida International University

Amon (cont'd)
- Automated benchmarking shell script developed (sketched below); it:
  - Starts Amon on each compute node returned by the system scheduler
  - Executes WRF with one process per node for:
    - Node counts of 8, 16, 32, 64, 96, and 128
    - CPU percentage loads of 25, 50, 75, and 100 (enforced with the CPULimit program)
  - Writes results (to be used as Aprof input) to an organized results directory of …/ / / /
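A simplified illustration of that driver loop is sketched below. The real tool is a shell script; this Python version only mirrors its structure, and the scheduler query, the amon and cpulimit invocations, and the results layout are all assumptions.

```python
# Simplified illustration of the benchmarking driver described above.
# The real tool is a shell script; the command lines and paths here are
# assumptions made for the sketch, not the actual MareNostrum setup.
import os
import subprocess

NODE_COUNTS = [8, 16, 32, 64, 96, 128]
CPU_LOADS = [25, 50, 75, 100]          # percent, enforced via cpulimit

def allocated_nodes():
    """Nodes handed out by the system scheduler (SLURM-style query assumed)."""
    out = subprocess.run(["scontrol", "show", "hostnames"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def run_case(nodes, load):
    # Start the Amon monitor on every allocated compute node (assumed CLI).
    monitors = [subprocess.Popen(["ssh", node, "amon"]) for node in nodes]
    results_dir = os.path.join("results", f"{len(nodes)}nodes", f"{load}pct")
    os.makedirs(results_dir, exist_ok=True)
    with open(os.path.join(results_dir, "wrf.log"), "w") as log:
        # One WRF process per node, throttled to the requested CPU percentage.
        subprocess.run(["cpulimit", "-l", str(load), "--",
                        "mpirun", "-np", str(len(nodes)), "./wrf.exe"],
                       stdout=log, stderr=subprocess.STDOUT, check=True)
    for m in monitors:
        m.terminate()

for n in NODE_COUNTS:
    for load in CPU_LOADS:
        run_case(allocated_nodes()[:n], load)
```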

Aprof  Installed on MareNostrum  Adjusted source code to change the way Aprof reads in information Before: Input files had to specify number of bytes in process listing in process header (This was very complicated and error prone. Aprof was inconsistent in loading MareNostrum data). Now: Input files simply need to separate process entries with one or more blank lines. 25REU - Florida International University

Aprof (cont'd)
- Developed a script that combines the Amon output from all nodes and edits it into the read-in format Aprof requires
- Adjusted/developed the Aprof query automation script for MareNostrum
  - Queries Aprof for prediction information for different cases (number of nodes; CPU percentage loads)
  - Compares the predicted values to the actual values returned by each run

Dimemas / Paraver
- Paraver trace file successfully generated and visualized with the GUI on MareNostrum
- Dimemas trace file successfully generated from Paraver on MareNostrum
- Configuration file for MareNostrum developed
- Prediction simulations will begin shortly

Significant Challenges Overcome
- Amon:
  - Adjusted the source code to function properly on MareNostrum
  - Developed a benchmarking script that conforms to MareNostrum's system architecture (i.e., going through its scheduler; one process per node; etc.)
- Aprof:
  - Adjusted the source code for less complex, more consistent data input
  - Developed the prediction and comparison scripts for MareNostrum

Significant Challenges Overcome (cont'd)
- Dimemas/Paraver:
  - MPItrace properly linked in with WRF on GCB and Mind
  - Paraver and Dimemas trace files successfully generated, and the configuration file set up for MareNostrum
- WRF:
  - Version 2.2 installed and compiled on Mind

Remaining Work
- Scripting the Dimemas prediction simulations for the same scenarios as those of Amon and Aprof
- Finalizing the Aprof prediction/comparison script so that Aprof's performance on MareNostrum's different architecture can be analyzed
- Deciding if and how to compare results from MareNostrum, GCB, and Mind (i.e., the same version of WRF would have to be running in all three locations)
- Experimenting with how well Amon and Aprof relate to, or could be combined with, Dimemas

References
- S. Masoud Sadjadi, Liana Fong, Rosa M. Badia, Javier Figueroa, Javier Delgado, Xabriel J. Collazo-Mojica, Khalid Saleem, Raju Rangaswami, Shu Shimizu, Hector A. Duran Limon, Pat Welsh, Sandeep Pattnaik, Anthony Praino, David Villegas, Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye, Juan Carlos Martinez, Ivan Rodero, Shuyi Chen, Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh Willoughby, Michael McFail, Christine Lisetti, and Malek Adjouadi. "Transparent grid enablement of weather research and forecasting." In Proceedings of the Mardi Gras Conference Workshop on Grid-Enabling Applications, Baton Rouge, Louisiana, USA, January 2008.
- S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. "A modeling approach for estimating execution time of long-running scientific applications." In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS-2008), Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008.
- "Performance/Profiling." Presented by Javier Figueroa in the Special Topics in Grid Enablement of Scientific Applications class, 13 May 2008.

Acknowledgements
- REU
- PIRE
- BSC
- Masoud Sadjadi, Ph.D. – FIU
- Rosa Badia, Ph.D. – BSC
- Javier Delgado – FIU
- Javier Figueroa – UM