DCS/1 CENG 532 Distributed Computing Systems: Measures of Performance

DCS/2 Grosch's Law; 1960s "To sell a computer for twice as much, it must be four times as fast." This held at the time, but it soon became meaningless: after 1970 it became possible to make computers faster and sell them even cheaper… Ultimately, switching speeds reach a limit, the speed of light on an integrated circuit…

DCS/3 Von Neumann's Bottleneck Serial single-processor computers based on John von Neumann's architecture have one processor, a single control unit, and a single memory. This is no longer the only viable model: low-cost parallel computers can easily deliver the performance of the fastest single-processor computer…

DCS/4 Amdahl's Law; 1967 Amdahl's law is still valid! Let the speedup S be the ratio of serial time (one processor) to parallel time (N processors): S = T_1 / T_N < 1/f, where f is the serial fraction of the problem, 1-f is the parallel fraction, T_1 is the one-processor (sequential) time, and T_N is the N-processor (parallel) time. Proof of Amdahl's law: T_N = T_1*f + T_1*(1-f)/N, so S = T_1/T_N = 1/(f + (1-f)/N), and thus S < 1/f.
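
A minimal sketch of the formula in Python (the function name and the sample values are illustrative, not from the slides):

```python
def amdahl_speedup(f, n):
    """Speedup on n processors for a program with serial fraction f:
    S = 1 / (f + (1 - f) / n), so S < 1/f for any n."""
    return 1.0 / (f + (1.0 - f) / n)

# With f = 0.10 the speedup approaches, but never reaches, 1/f = 10.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.10, n), 2))
```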

DCS/5 Amdahl's Law; 1967 At f = 0.10, Amdahl's Law predicts at best a tenfold speedup, which is very pessimistic. This prediction was soon beaten in practice, encouraged by the Gordon Bell Prize*! * Gordon Bell is a computer scientist who contributed to parallel computing while at DEC.

DCS/6 Gustafson-Barsis Law; 1988 A team of researchers at Sandia Labs (John Gustafson and Ed Barsis), using a 1024-processor nCube/10, overthrew Amdahl's Law by achieving roughly 1000-fold speedups with f = 0.004 to 0.008; according to Amdahl's Law, the speedup should have been only 125 to 250. The key point was that 1-f is not independent of N: the relationship between N and 1-f may not be linear… Parallel algorithms may perform better than their sequential counterparts.

DCS/7 Gustafson-Barsis Law; 1988 They reinterpreted the speedup formula by scaling the problem up to fit the parallel machine: T_1 = f + (1-f)N. Redefining T_N as T_N = f + (1-f) = 1, the speedup becomes S = T_1/T_N = (f + (1-f)N)/1 = f + N - Nf = N - (N-1)f.
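
A matching sketch of the scaled speedup, plugging in the Sandia-style figures from the previous slide (values illustrative):

```python
def gustafson_speedup(f, n):
    """Scaled speedup S = N - (N - 1) * f for serial fraction f on N processors."""
    return n - (n - 1) * f

# f = 0.004 on 1024 processors gives a roughly 1020-fold speedup,
# versus the 1/f = 250 ceiling that Amdahl's Law would predict.
print(round(gustafson_speedup(0.004, 1024), 1))
```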

DCS/8 Extreme case analysis Assuming Amdahl's Law, an upper and a lower bound can be given for the speedup: N/log2 N <= S <= N, where the log2 N factor comes from divide-and-conquer algorithms (see the sketch below).
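
In code, the two bounds for a given N (a direct transcription of the inequality):

```python
import math

def speedup_bounds(n):
    """Bounds N/log2(N) <= S <= N; the lower bound reflects the
    log2(N) combining steps of a divide-and-conquer algorithm."""
    return n / math.log2(n), n

print(speedup_bounds(1024))  # (102.4, 1024)
```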

DCS/9 Inclusion of the communication time Some researchers (e.g., Gelenbe) suggest that the speedup be approximated by S = 1/C(N), where C(N) is some function of N. For example, C(N) can be estimated as C(N) = A + B*log2 N, where A and B are constants determined by the communication mechanisms.
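
A sketch of this communication-limited model; the constants A and B below are made-up placeholders, since real values depend on the system's communication mechanisms:

```python
import math

def comm_limited_speedup(n, a=0.01, b=0.02):
    """S = 1 / C(N) with C(N) = A + B * log2(N): as N grows, the
    communication term grows and the attainable speedup falls."""
    return 1.0 / (a + b * math.log2(n))

for n in (2, 16, 256, 1024):
    print(n, round(comm_limited_speedup(n), 1))
```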

DCS/10 Benchmark Performance A benchmark is a program whose purpose is to measure a performance characteristic of a computer system, such as floating-point speed or I/O speed, or its performance on a restricted class of problems. Benchmarks are arranged to be either kernels of real applications, such as Linpack and the Livermore Loops, or synthetic, approximating the behavior of real problems, such as Whetstone (due to Curnow and Wichmann). The synthetic benchmarks consist of artificial kernels intended to represent the computationally intensive part of certain scientific codes; they have been in use since 1972…

DCS/11 Scalability example
 A system and its applications should experience performance proportional to the scale of a change in the system.
 For example, if more computers are added to the system, the performance experienced should increase in proportion to the addition.
 A change in a system or an application should not require other changes in the system or the applications.
 Scalability is generally achieved through replication of hardware, software, and data.

DCS/12 Scalability example
 The speedup S of an application on a system of P processors is S = Ts/Tp = P (ideally).
 Efficiency: E = S/P = 1 (ideally).
 The scalability of a system is a measure of its capability to increase speedup in proportion to the number of processors (and/or other resources).
 Example: scalability of adding n numbers on a hypercube of p processors.
 Let p = 8 (p = 2^d, where d is the degree of a processor), and let n = func(p log p).

DCS/13 Scalability example
 Assume that it takes one time unit both to add two numbers and to communicate a number between two adjacent processors. The parallel time is then Tp = n/p + 2 log2 p,
 where n/p is the serial addition time per node and 2 log2 p covers the addition of two numbers plus the communication of one number at each of the log2 p (= d) combining steps.

DCS/14 Scalability example
 Note that the sequential time is Ts = n. Thus S and E are: S = Ts/Tp = np / (n + 2p log2 p), E = S/p = n / (n + 2p log2 p).
 E = 0.8 for n = 64, p = 4; n = 192, p = 8; n = 512, p = 16 (note that n = 8p log2 p in each case).
 Hence the system is scalable: an increase in the system size is reflected in the application size, yet the efficiency stays fixed. A short numeric check follows below.
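
The whole example fits in a few lines of Python; with the unit costs assumed on the slide, growing n as 8*p*log2 p keeps the efficiency pinned at 0.8:

```python
import math

def parallel_time(n, p):
    """Tp = n/p local additions plus 2*log2(p) units for the
    log2(p) combine steps (one addition + one communication each)."""
    return n / p + 2 * math.log2(p)

def efficiency(n, p):
    """E = S/p = (n / Tp) / p = n / (n + 2*p*log2(p))."""
    return (n / parallel_time(n, p)) / p

# Scaling the problem with the machine keeps E fixed at 0.8.
for p in (4, 8, 16):
    n = 8 * p * int(math.log2(p))
    print(p, n, round(efficiency(n, p), 2))
```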

DCS/15 Fault Tolerance
 Fault-tolerant systems survive faults, securing the correct operation of the applications and the servers and thus improving the availability of the system.
 Fault tolerance can be achieved by: hardware and software redundancy, and software recovery.
 Distributed systems provide a high degree of availability in the face of hardware faults.

DCS/16 Transparency
 Transparency means presenting the system to the user as a whole rather than as a collection of components.
 This is an important issue in the characterization of distributed systems.
 Transparency allows a distributed system to be distinguished from network systems: the more transparency, the more distributed the system!
 There are eight forms of transparency accepted by international standardization work in this field: access, location, concurrency, replication, failure, migration, performance, and scaling.
