GPU Performance Prediction GreenLight Education & Outreach Summer Workshop UCSD. La Jolla, California. July 1 – 2, 2009. Javier Delgado Gabriel Gazolla.

Presentation transcript:

GPU Performance Prediction
GreenLight Education & Outreach Summer Workshop, UCSD, La Jolla, California, July 1–2, 2009
Javier Delgado, Gabriel Gazolla, Constantinos Menelaou, Lixi Wang, Mark Joselli

Outline
- Motivation
- Role in Energy Efficiency
- Performance Modeling
- GPU Programming for Weather Modeling
- GPU Programming for BLAST
- Model Testing
- Conclusion

Benefits

GPU Performance Improvement Over Time
Source: nVidia.com

Sample Speedups
Source: nVidia.com

Role in Energy Efficiency
- Idle GPU = wasted energy
- Maximally loaded GPU = high power consumption. For example:
  - Nvidia 8800 GTX consumption at max load: …
  - Intel Xeon LS5400 consumption at max load: …
Source: (which is derived from data from
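
The energy argument on this slide comes down to energy = power times time, counting the time a device sits idle as well as the time it works. Below is a minimal Python sketch. The wattages are hypothetical because the slide's measured values are not preserved, and the helper names `energy_joules` and `window_energy` are illustrative only:

```python
def energy_joules(power_w: float, seconds: float) -> float:
    """Energy in joules: average power (W) multiplied by time (s)."""
    return power_w * seconds


def window_energy(load_w: float, idle_w: float, busy_s: float, window_s: float):
    """Split a device's energy over a time window into busy and idle parts.

    The device draws load_w while busy for busy_s seconds and idle_w for the
    rest of the window. Returns (busy_joules, idle_joules).
    """
    if not 0 <= busy_s <= window_s:
        raise ValueError("busy_s must lie within [0, window_s]")
    busy = energy_joules(load_w, busy_s)
    idle = energy_joules(idle_w, window_s - busy_s)
    return busy, idle


# Hypothetical numbers: a GPU that draws 250 W under load and 60 W when idle,
# running a 10-minute job inside a 1-hour window.
busy, idle = window_energy(load_w=250, idle_w=60, busy_s=600, window_s=3600)
print(f"busy: {busy} J, idle: {idle} J, idle share: {idle / (busy + idle):.1%}")
```

With these made-up numbers, idle time accounts for more than half of the energy used in the window. That is why keeping GPUs busy, through better scheduling, matters as much as their peak power.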

Power Consumption

GPU Role in Energy Efficiency
But...
Source: John Michalakes and Manish Vachharajani

And...

Outline
- Motivation
- Role in Energy Efficiency
- Hurricane Mitigation Overview
- Performance Modeling
- GPU Programming for BLAST
- Model Testing
- Conclusion

Motivation
- Hurricanes cause financial and personal damage in coastal regions
- Damage can be mitigated, but:
  - impact-area prediction is inaccurate
  - simulation on commodity computers is not precise
- Alarming statistics:
  - 40% of small and medium-sized companies shut down within 36 months if forced to close for 3 or more days after a hurricane
  - Local communities lose jobs and hundreds of millions of dollars from their economy
  - If 5% of businesses in South Florida recover one week earlier, $219,300,000 in non-property economic losses can be prevented
Pictured: Hurricane Andrew, Florida, 1992; Katrina, New Orleans, 2005; Ike, Cuba, 2008

Motivation for Application Profiling and Performance Prediction
- Optimal use of grid resources through "smarter" meta-scheduling
- Many users overestimate job requirements
- Reduced idle time for compute resources
- Savings in utility and energy costs
- Optimal resource selection for the fastest job return time
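
A meta-scheduler that has runtime predictions can rank resources by expected return time, meaning the estimated queue wait plus the predicted execution time. Below is a minimal Python sketch under that assumption. The `Candidate` type, the resource names, and all the numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One place a job could run (all fields are hypothetical inputs)."""
    name: str
    expected_wait_s: float   # estimated time spent in the queue
    predicted_run_s: float   # predicted execution time on this resource

    @property
    def return_time_s(self) -> float:
        """Time until results come back: queue wait plus execution."""
        return self.expected_wait_s + self.predicted_run_s


def pick_resource(candidates: list[Candidate]) -> Candidate:
    """Choose the candidate with the earliest expected job return time."""
    if not candidates:
        raise ValueError("no candidate resources")
    return min(candidates, key=lambda c: c.return_time_s)


options = [
    Candidate("cluster-a", expected_wait_s=1800, predicted_run_s=3600),
    Candidate("cluster-b", expected_wait_s=300, predicted_run_s=6000),
    Candidate("gpu-node", expected_wait_s=2400, predicted_run_s=900),
]
best = pick_resource(options)
print(best.name, best.return_time_s)
```

In this example, the resource with the longest queue still wins because its predicted runtime is much shorter. The choice is only as good as the runtime prediction behind it.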

Predicting Execution Time of Weather Research and Forecasting (WRF) Software
- The paradox of submitting computationally intensive jobs:
  - Underestimated run time = killed job
  - Overestimated run time = long queue times
- In hurricane simulations, results are usually needed very quickly
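
One common way to handle this trade-off is to pad a runtime prediction with a safety margin before requesting walltime. Below is a minimal Python sketch. The 10% margin, the one-minute rounding, and the function names are assumptions, not values from the project:

```python
def requested_walltime(predicted_s: int, margin_pct: int = 10, granularity_s: int = 60) -> int:
    """Pad a predicted runtime by margin_pct percent, then round up to granularity_s.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    if predicted_s < 0 or margin_pct < 0 or granularity_s <= 0:
        raise ValueError("invalid arguments")
    padded = -(-predicted_s * (100 + margin_pct) // 100)  # ceiling division
    return -(-padded // granularity_s) * granularity_s    # round up to the grid


def hhmmss(seconds: int) -> str:
    """Format seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


# A hypothetical WRF run predicted to take 90 minutes.
print(hhmmss(requested_walltime(5400)))  # 01:39:00
```

Batch systems such as PBS and Slurm accept walltime limits in HH:MM:SS form. A larger margin lowers the risk of a killed job, but it lengthens the expected queue wait, so a more accurate prediction allows a smaller margin.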

Process

Typical Results on Large Clusters
- Input: MareNostrum, 8, 16, and 32 nodes, 1 process per node
- Output: MareNostrum, 8, 16, 32, 64, 96, and 128 nodes
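
The slide does not say which model produced these predictions. As a stand-in, the Python sketch below fits a simple Amdahl-style model, T(p) = a + b/p, to timings at 8, 16, and 32 nodes and then extrapolates to larger node counts. The model choice, the function names, and the timings are assumptions for illustration. They are not the project's data or method:

```python
def fit_inverse_scaling(nodes: list[int], times: list[float]) -> tuple[float, float]:
    """Least-squares fit of T(p) = a + b / p, i.e. a straight line in x = 1/p.

    a approximates the non-scaling (serial/overhead) time, b the perfectly
    parallel work.
    """
    if len(nodes) != len(times) or len(nodes) < 2:
        raise ValueError("need at least two (nodes, time) pairs")
    xs = [1.0 / p for p in nodes]
    mx = sum(xs) / len(xs)
    my = sum(times) / len(times)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, times))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict(a: float, b: float, p: int) -> float:
    """Predicted runtime on p nodes under the fitted model."""
    return a + b / p


# Synthetic timings (seconds) at the node counts used as input on the slide.
measured_nodes = [8, 16, 32]
measured_times = [900.0, 500.0, 300.0]
a, b = fit_inverse_scaling(measured_nodes, measured_times)
for p in [8, 16, 32, 64, 96, 128]:
    print(p, round(predict(a, b, p), 1))
```

A two-parameter model like this ignores communication costs that grow with node count. Extrapolating from 32 to 128 nodes should therefore be checked against real runs.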

Future Modeling Plans
- Model execution time with different GPU configurations
- Current GPU project objective: learn how to model GPU performance by porting WRF kernels to CUDA
- Test with different cards
- Test with different processor configurations
- Test with different numbers of nodes

Overview of GPU Benchmarking Project
- Understand the source code of the existing CUDA-ported code
- Understand the old source code (Fortran)
- Learn CUDA
- Port another module
- Benchmark
- Learn WRF, CUDA, and Fortran
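
Once a module is ported, the benchmark step needs two things: a correctness check against the original and a timing comparison. Below is a minimal Python sketch of such a harness. Here `loop_sum` and the built-in `sum` merely stand in for the Fortran original and the CUDA port. The median-of-5 timing and the tolerance are assumptions:

```python
import math
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[..., float], *args, repeats: int = 5) -> float:
    """Median wall-clock time of fn(*args) over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline: Callable[..., float], port: Callable[..., float], *args,
            repeats: int = 5, rel_tol: float = 1e-9) -> float:
    """Check that port reproduces baseline's scalar result, then return
    baseline time / port time (values above 1 mean the port is faster)."""
    expected, actual = baseline(*args), port(*args)
    if not math.isclose(expected, actual, rel_tol=rel_tol):
        raise AssertionError(f"port result {actual} != baseline {expected}")
    return median_runtime(baseline, *args, repeats=repeats) / median_runtime(
        port, *args, repeats=repeats)


def loop_sum(values):
    """Hypothetical 'original' kernel: an explicit loop."""
    total = 0.0
    for v in values:
        total += v
    return total


data = [float(i) for i in range(100_000)]
print(f"speedup of built-in sum over loop_sum: {speedup(loop_sum, sum, data):.1f}x")
```

The harness checks correctness before timing, so a fast but wrong port is reported as an error rather than as a speedup. The median is used because it is less sensitive than the mean to one-off timing outliers.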

Status
- Code has been compiled and executed
- Regions of similarity between the two versions are being identified:
  - Fortran program: 1729 lines
  - CUDA (C) program: 1329 lines (including initialization)
- Currently working out the code logic required by the existing ported kernel
- Preliminary documentation/report of findings

Purpose
- BLAST is used extensively for sequence analysis
- Provides a different kind of application for testing GPU performance improvements
- Further improves our GPU programming and performance modeling knowledge

Status
- Literature review of other GPU-based sequence analysis work
- Learning how BLAST works
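
BLAST uses a seed-and-extend strategy. It first looks for short word matches between the query and a database sequence, and only then extends promising hits into alignments. Below is a minimal, simplified Python sketch of exact-match seeding for nucleotides; BLASTN traditionally uses 11-letter words. The function name is hypothetical, and the scoring and extension stages are omitted:

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the two sequences share an
    exact word of length w. This is only the seeding idea behind BLAST; real
    BLAST also scores, extends, and filters these hits."""
    if w <= 0:
        raise ValueError("word size must be positive")
    # Index every length-w word of the query by its starting positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    # Slide a window over the subject and look each word up in the index.
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTACGT", "TTACGTAA", w=4))  # [(3, 1), (0, 2), (4, 2), (1, 3)]
```

Each subject window is looked up independently of the others. That makes this stage a natural candidate for data-parallel hardware such as GPUs.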

Long-running, Fault-tolerant Weather Prediction
- Slight inaccuracies in the domain's initial conditions can cause significant inaccuracies later
- Third component of this project: account for this using perturbation analysis
- The effects of perturbation on runtime must also be modeled
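
Two ideas on this slide can each be illustrated in a few lines:
- **Sensitivity to initial conditions.** This is shown with the logistic map, a standard toy chaotic system standing in for a weather model.
- **Cost of perturbation analysis.** If the analysis is done by running an ensemble of perturbed members, its cost follows directly from the per-run runtime prediction.

The logistic map, the ensemble formulation, and the function names are assumptions, not the project's method:

```python
import math


def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n); r = 4 is the chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def ensemble_walltime(members: int, per_run_s: float, concurrent: int) -> float:
    """Wall-clock time to run `members` perturbed runs, `concurrent` at a time,
    assuming every member takes the same predicted time per_run_s."""
    if members < 0 or concurrent < 1:
        raise ValueError("invalid ensemble size or concurrency")
    return math.ceil(members / concurrent) * per_run_s


# Two runs whose initial conditions differ by 1e-10.
a = logistic_trajectory(0.3, 100)
b = logistic_trajectory(0.3 + 1e-10, 100)
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap after 5 steps: {gaps[5]:.1e}")
print(f"largest gap:       {max(gaps):.2f}")

# Hypothetical ensemble: 20 perturbed members, 1-hour predicted runs, 8 at a time.
print(ensemble_walltime(20, 3600, 8))
```

The gap stays tiny for the first few steps and then grows to the size of the values themselves. In the ensemble sketch, the runtime cost scales with the number of perturbed members divided by how many can run at once, which is why perturbation belongs in the runtime model.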

Conclusion
- GPUs promise much faster job execution for a range of applications
- To maximize resource utilization, application execution time should be predictable
- This matters especially for time-critical applications that take a long time to execute

Thank You
Questions?