Evaluating the Parallel Performance of a Heterogeneous System
Elizabeth Post and Hendrik Goosen
Lincoln University, Canterbury, New Zealand
Formerly of the Department of Computer Science, University of Cape Town, South Africa

Introduction
Why measure and evaluate parallel performance?
What to measure
How to evaluate
Why speedup and efficiency are not appropriate
Alternative methods of performance evaluation
–Power weight
–Linear speed
–Linear efficiency
Conclusions

Why measure parallel performance?
Increasing use of parallel processing
–Clusters, such as Beowulfs
–Networks of Workstations (NOWs)
Scalability
–Does performance improve as more processors are added?
–Will performance continue to improve as more processors are added?
Efficiency
–Is the best possible performance being achieved?
–Where and how can performance be improved?
–Is the best algorithm being used?

What makes systems heterogeneous?
Different architectures
Different memory and cache capabilities
Different operating systems
Communications
Non-dedicated systems

Why use a real application to measure performance?
Real users want to know how long real applications take.
Measures such as MIPS, MFLOPS, kernels, and vendor-tuned benchmarks do not take into account factors such as:
–input/output
–communication time
–memory needs and usage
–idle time, etc.

Background
Climatology application
–Cloud radiation simulation to model and measure the reflectivity, transmissivity, and absorptivity of a heterogeneous strato-cumulus cloud deck
Environment
–Network of Unix workstations: five different models of Silicon Graphics and Sun workstations with varying CPU performance and memory capacity
–Connected by a 10 Mbit/s Ethernet network

Serial times for processors (chart: CPU, system, and elapsed times)

Parallel performance – what to measure?
Elapsed time
CPU time
System time
Communication time
Idle time
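The following is a minimal sketch of how these components might be collected on each worker; the function name and the work/communicate callables are illustrative placeholders, not the measurement code used in the study.

```python
import time

def timed_run(work, communicate):
    """Measure elapsed, CPU, communication, and waiting time for one worker.

    `work` and `communicate` are placeholder callables standing in for the
    application's computation and message-passing phases.
    """
    wall_start = time.monotonic()
    cpu_start = time.process_time()        # user + system CPU time of this process

    work()
    comm_start = time.monotonic()
    communicate()
    comm_time = time.monotonic() - comm_start

    elapsed = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start

    # Elapsed time not accounted for by CPU work is spent waiting:
    # communication, I/O, or idling while other processors finish.
    waiting = elapsed - cpu
    return {"elapsed": elapsed, "cpu": cpu,
            "communication": comm_time, "waiting": waiting}
```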

Parallel performance – how to evaluate?
Elapsed time graphs
Speedup
Efficiency
Power weight
Linear speed
Linear efficiency

Heterogeneous parallel group (chart: CPU, system, and elapsed times)

Speedup
Speedup is the ratio of the serial time taken on one processor to the parallel time on all processors:
Speedup = Elapsed time on 1 processor / Elapsed time on n processors

Speedup – how to calculate it?
Which single-processor elapsed time should be used?
–Elapsed time of the fastest processor?
–Elapsed time of the slowest processor?
–Mean elapsed time of all processors used?
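A small sketch of how the choice of baseline affects the result; the times below are made-up numbers, not measurements from the paper.

```python
def speedup(serial_times, parallel_elapsed, baseline="mean"):
    """Speedup = single-processor elapsed time / parallel elapsed time.

    `serial_times` holds each machine's single-processor elapsed time; the
    baseline choice (fastest, slowest, or mean) changes the answer, which is
    part of why speedup is ambiguous on a heterogeneous system.
    """
    if baseline == "fastest":
        serial = min(serial_times)
    elif baseline == "slowest":
        serial = max(serial_times)
    else:                                  # mean of all processors used
        serial = sum(serial_times) / len(serial_times)
    return serial / parallel_elapsed

# Hypothetical single-processor elapsed times (seconds) for a five-machine group:
serial = [120.0, 150.0, 200.0, 260.0, 300.0]
for b in ("fastest", "slowest", "mean"):
    print(b, speedup(serial, 55.0, baseline=b))
```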

Speedup vs perfect speedup (chart; calculated with means)

Efficiency
Efficiency is the speedup divided by the number of processors used:
Efficiency = Speedup for n processors / n
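Continuing the same hypothetical numbers, a one-line sketch of the calculation:

```python
def efficiency(speedup_n, n):
    """Efficiency = speedup on n processors / n (1.0 would be ideal)."""
    return speedup_n / n

# Hypothetical: a speedup of 3.4 achieved on 5 machines.
print(efficiency(3.4, 5))                  # 0.68
```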

Efficiency (chart; calculated with means)

Speedup and efficiency are not appropriate
Speedup and efficiency are not appropriate for evaluating parallel performance on a heterogeneous system, and they have limitations even on a homogeneous system.

Power weight (Zhang et al.)
Power weight is the ratio of each processor's performance to the performance of the fastest processor:
Power weight = Elapsed time for fastest processor / Elapsed time for nth processor
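A minimal sketch of the calculation, again with made-up serial times:

```python
def power_weights(elapsed_times):
    """Power weight of each processor: fastest elapsed time divided by this
    processor's elapsed time, so the fastest machine gets weight 1.0."""
    fastest = min(elapsed_times)
    return [fastest / t for t in elapsed_times]

# Hypothetical single-processor elapsed times (seconds) for five machines:
print(power_weights([120.0, 150.0, 200.0, 260.0, 300.0]))
```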

Linear Speed (Crowl)
Linear speed is the amount of work done in unit time:
Linear speed = 1 / Total elapsed time
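A sketch under the convention that the whole job counts as one unit of work; the times are illustrative, not from the paper.

```python
def linear_speed(elapsed_time, work=1.0):
    """Linear speed: work completed per unit of elapsed time. Taking the
    whole job as one unit of work, this is simply 1 / elapsed time."""
    return work / elapsed_time

# Hypothetical: the job takes 120 s on the fastest machine alone
# and 55 s on the parallel group.
print(linear_speed(120.0))                 # serial linear speed
print(linear_speed(55.0))                  # parallel linear speed
```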

Linear Speed (chart)

Linear Efficiency (Post and Goosen)
Linear efficiency is the ratio of the work done by the parallel application to the potential amount of work that could be done by all the processors:
Linear efficiency = Linear speed of the parallel application on n processors / Sum of the single-processor linear speeds of the n processors
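A short sketch of the same ratio, reusing the hypothetical times from the earlier examples:

```python
def linear_efficiency(parallel_elapsed, serial_elapsed_times):
    """Linear efficiency: linear speed of the parallel run divided by the
    sum of the single-processor linear speeds of the machines used."""
    parallel_speed = 1.0 / parallel_elapsed
    potential_speed = sum(1.0 / t for t in serial_elapsed_times)
    return parallel_speed / potential_speed

# Hypothetical: five machines with the serial times used earlier,
# and a 55 s parallel run across all of them.
print(linear_efficiency(55.0, [120.0, 150.0, 200.0, 260.0, 300.0]))
```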

Linear Efficiency (chart)

Advantages of linear speed and linear efficiency
Linear speed for each machine is independent of all other machines.
Linear speeds can be recalculated dynamically as workloads vary, and used to determine each machine's current performance compared to its serial capacity.
Dynamic calculation of linear speeds and linear efficiency can be used for dynamic load-balancing algorithms.
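One way such a load-balancing step might look in practice (a minimal sketch assuming the work is divisible into independent chunks; the proportional-share policy below is illustrative and is not the authors' algorithm):

```python
def partition_work(total_units, measured_speeds):
    """Split `total_units` of work in proportion to each machine's most
    recently measured linear speed, so faster (or less loaded) machines
    receive more work on the next round."""
    total_speed = sum(measured_speeds)
    shares = [round(total_units * s / total_speed) for s in measured_speeds]
    # Give any rounding remainder to the fastest machine.
    shares[measured_speeds.index(max(measured_speeds))] += total_units - sum(shares)
    return shares

# Hypothetical linear speeds measured during the previous round:
print(partition_work(1000, [1 / 120.0, 1 / 150.0, 1 / 200.0, 1 / 260.0, 1 / 300.0]))
```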

Conclusions
It is important to measure overall elapsed time, as well as the components of CPU, communication, and idle time, etc.
Speedup and efficiency are not appropriate for evaluating parallel performance, especially for heterogeneous systems.
Linear speed and linear efficiency provide useful ways of evaluating parallel performance for both heterogeneous and homogeneous systems.
Linear speed and linear efficiency can be calculated dynamically and used in dynamic load-balancing algorithms.
