Computer Architecture

Computer Architecture Part I-C: Performance

What does faster mean?
Response time: the time spent to complete an event. Also referred to as execution time or latency.
Throughput: the amount of work done in a given time. Also referred to as bandwidth.
In general, faster response time means an improvement in throughput.
One of the more important questions a company asks when buying a machine from a vendor is: "If I buy this machine, how fast will it perform my tasks?" Example: PILTEL, a printer; PLDT, 1 TB of data access. But how do we quantify the term "faster"? Use slide.
Response time is the most commonly used measure. Why? It is the most visible: it is the time from when I press the Return key to when I see the output.
Throughput should also be considered. Even if the response time is good, if the system does not produce the necessary output, management will think that the system is slow. Example: Globe, activations of cellphones.
The slowest machine would cause a bottleneck. When does a bottleneck occur? When demand for a resource increases while the amount of that resource stays limited. The solution to a bottleneck is either to decrease the demand or to increase the resource.

Execution Time and Performance
Quantitatively, execution time is inversely proportional to performance.
improve performance = increase performance
improve execution time = decrease execution time
"X is n times faster than Y" means
$$n = \frac{t_Y}{t_X} = \frac{P_X}{P_Y}$$
where P indicates performance and t indicates execution time.
But besides visual representation, we need something quantitative, and that is where more equations come in. (Ughhhh!!!!!) When we say that we want to improve the performance of a machine, we mean decreasing the execution time of a program. The faster the machine executes the program, the better the performance of the system. Therefore, as we decrease the execution time, we increase the performance. For example, the execution time of machine X is 30 seconds, while that of machine Y is 900 seconds. Since 900 / 30 = 30, we can say that X is 30 times faster than Y.
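The relation between execution time and "n times faster" can be checked with a few lines of Python. This is a minimal sketch: the function name `times_faster` is an illustrative choice, not something from the slides, and the inputs are the 30-second and 900-second figures from the example.

```python
def times_faster(t_x: float, t_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Performance is taken as 1 / execution time, so
    n = P_x / P_y = t_y / t_x.
    """
    if t_x <= 0 or t_y <= 0:
        raise ValueError("execution times must be positive")
    return t_y / t_x


# Figures from the example: X takes 30 s, Y takes 900 s.
n = times_faster(t_x=30.0, t_y=900.0)
print(f"X is {n:g} times faster than Y")  # prints: X is 30 times faster than Y
```

Note that the slower machine's time goes in the numerator, which is why a smaller execution time for X yields a larger n.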

Make the Common Case Fast
A rule of thumb in computer design is to make the event that occurs more frequently faster.
In making a design trade-off, favor the frequent case over the infrequent case.
In general, this move should increase overall performance.
A rule of thumb in the industry right now is the idea that…. Use slide. Let's say this will enable engineers to "cheat" on how well their machine runs. How? Example: Games…..

Amdahl's Law
The performance improvement to be gained from using some faster mode of operation is limited by the fraction of time that the faster mode can be used.
Speedup due to enhancement E, for an entire task:
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
To determine quantitatively how much we gain when we add a component to our system, we use Amdahl's Law. In simple terms, look at it this way: when we improve the speed of the CD drive, is there any immediate effect on the system? No. When do we see the effect? Only when we use the CD, for installation, for games, for video, etc. Amdahl's Law states that the performance we gain is limited by the amount of time that we use that component. The idea here is that we want to SPEED UP the system by enhancing one component of the system. Use slide for the formula.

Factors Affecting the Speedup
1. The fraction of computation time in the original machine that can be converted to take advantage of the enhancement.
2. The improvement gained by the enhanced execution mode, i.e. how much faster the task would run if the enhanced mode were used for the entire program.
So, what are the factors that affect the speedup of a system?
1. The part of the program's execution time that will use the enhancement. If 30 seconds out of the 900 seconds will use the enhancement, then the fraction that we enhance is 30/900, or 1/30.
$$\text{Fraction}_{\text{enhanced}} = \frac{\text{time of the code which will use the enhancement}}{\text{total time of the code's execution}}$$
2. The new speed of that part of the code. If the part of the code that took 30 seconds now takes just 3 seconds, then the speedup is 30/3, or 10.
$$\text{Speedup}_{\text{enhanced}} = \frac{\text{old time needed to complete the code}}{\text{new time needed to complete the code}}$$
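The two factors can be computed directly from measured times. The sketch below uses the numbers from the notes (30 of 900 seconds enhanced, and that part dropping from 30 to 3 seconds); the function names are illustrative assumptions, not names used in the slides.

```python
def fraction_enhanced(enhanced_time: float, total_time: float) -> float:
    """Share of the original execution time that can use the enhancement."""
    return enhanced_time / total_time


def speedup_enhanced(old_time: float, new_time: float) -> float:
    """How much faster the enhanced part alone runs."""
    return old_time / new_time


f = fraction_enhanced(enhanced_time=30.0, total_time=900.0)  # 1/30
s = speedup_enhanced(old_time=30.0, new_time=3.0)            # 10

# The other 870 s are untouched, so the whole run drops only from 900 s to 873 s.
new_total = (900.0 - 30.0) + 3.0
print(f"fraction enhanced = {f:.4f}, speedup of that part = {s:g}")
print(f"overall: 900 s -> {new_total:g} s, speedup = {900.0 / new_total:.3f}")
```

Even a tenfold improvement barely moves the total here, because only 1/30 of the run can benefit from it.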

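Applying Amdahl's Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
The new execution time is computed as: old execution time x [fraction of time when the improvement is not being used + fraction of time spent in the enhanced mode, divided by its speedup]. In the speedup formula, the old execution time cancels out when the new execution time is substituted in.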
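The cancellation of the old execution time can be written out step by step. In this sketch, F and S are shorthand introduced here for Fraction_enhanced and Speedup_enhanced; they are not symbols used on the slide.

```latex
\begin{align*}
\text{ExTime}_{\text{new}}
  &= \text{ExTime}_{\text{old}} \times \left[(1 - F) + \frac{F}{S}\right] \\[4pt]
\text{Speedup}_{\text{overall}}
  &= \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}}
   = \frac{\text{ExTime}_{\text{old}}}
          {\text{ExTime}_{\text{old}} \times \left[(1 - F) + \frac{F}{S}\right]}
   = \frac{1}{(1 - F) + \frac{F}{S}}
\end{align*}
```

Because the old execution time appears in both numerator and denominator, the overall speedup depends only on F and S, not on how long the program originally ran.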

Using Amdahl's Law: An Example
Suppose that we are considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?
Answer:
$\text{Fraction}_{\text{enhanced}} = 0.4$
$\text{Speedup}_{\text{enhanced}} = 10$
$\text{Speedup}_{\text{overall}} = \dfrac{1}{0.6 + (0.4 / 10)} = \dfrac{1}{0.64} \approx 1.56$
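The worked example can be reproduced with a short Python sketch. The function name `amdahl_speedup` and the extra speedup values in the loop are assumptions added for illustration; only the 0.4 and 10 come from the slide.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


# Slide example: usable 40% of the time, 10 times faster when usable.
overall = amdahl_speedup(0.4, 10.0)
print(f"overall speedup = {overall:.4f}")  # prints: overall speedup = 1.5625

# Making the enhancement faster and faster gives diminishing returns:
# the result approaches 1 / (1 - 0.4) but never exceeds it.
for s in (10.0, 100.0, 1000.0):
    print(f"enhanced speedup {s:6g} -> overall {amdahl_speedup(0.4, s):.3f}")
print(f"upper bound = {1.0 / (1.0 - 0.4):.3f}")
```

The bound in the last line is the practical message of the law: with 60% of the time untouched, no enhancement can make the whole task more than about 1.67 times faster.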

Measuring CPU Processing Speed: The Clock
A circuit which generates a signal that defines regular time intervals, or cycles, during which basic CPU steps are performed.
Provides control as to when each step of the instruction cycle takes place.

Clock Cycles
[Figure: clock waveform with one pulse and one cycle labeled]
One clock pulse is the burst of current when the clock output is equal to 1.
A clock cycle is the interval from the beginning of one pulse to the beginning of the next.
Clock rate is measured in hertz (Hz), a unit of frequency: 1 Hz = 1 cycle/second.
Basic unit of CPU speed = 1 million Hz, or 1 MHz.
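Since 1 Hz is one cycle per second, the length of a clock cycle is simply the reciprocal of the clock rate. A minimal sketch of that conversion follows; the function name and the sample clock rates are illustrative assumptions.

```python
def clock_period_ns(clock_rate_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds for a rate given in MHz."""
    if clock_rate_mhz <= 0:
        raise ValueError("clock rate must be positive")
    cycles_per_second = clock_rate_mhz * 1e6   # 1 MHz = 1 million Hz
    return 1e9 / cycles_per_second             # 1 s = 1e9 ns


for mhz in (1.0, 100.0, 500.0):
    print(f"{mhz:g} MHz -> {clock_period_ns(mhz):g} ns per cycle")
```

A 1 MHz clock therefore ticks once every 1000 ns, and a 100 MHz clock once every 10 ns.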

Locality of Reference
Programs tend to reuse data and instructions they have used recently.
A program may spend 90% of its execution time in only 10% of the code.
Based on a program's recent past, one can predict with reasonable accuracy what instructions and data it will use in the near future.
Some weird things about data. What do you usually do when you notice that the system is taking forever to open a file from your disk? You defrag the disk. Why? Because the data will then be located next to each other. So? The system can access it faster. Why? Because of the principle of locality of reference.

Two Types of Locality
Temporal Locality: recently accessed items are likely to be accessed in the near future.
Spatial Locality: items whose addresses (or locations) are near one another tend to be referenced close together in time.
There are two types of locality. Temporal locality: examples are loops in programming and games (Doom). Spatial locality: disk access.
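One way to see both kinds of locality is to replay address sequences through a toy cache and count hits. The sketch below is an assumption-laden illustration, not something from the slides: the direct-mapped cache model, its size (8 slots of 4 addresses each), and the three access patterns are all made up for the demonstration.

```python
def hit_rate(addresses, num_slots=8, block_size=4):
    """Replay addresses through a tiny direct-mapped cache; return hit fraction."""
    slots = [None] * num_slots           # which memory block each slot holds
    hits = 0
    for addr in addresses:
        block = addr // block_size       # neighbouring addresses share a block
        slot = block % num_slots         # each block maps to exactly one slot
        if slots[slot] == block:
            hits += 1                    # block already present: a hit
        else:
            slots[slot] = block          # miss: load the block into its slot
    return hits / len(addresses)


sequential = list(range(64))               # spatial: neighbours visited in order
loop = [0, 8, 16, 24] * 16                 # temporal: same four items reused
scattered = [i * 32 for i in range(64)]    # neither: far apart, never reused

print(f"sequential scan: {hit_rate(sequential):.4f}")  # prints 0.7500
print(f"tight loop:      {hit_rate(loop):.4f}")        # prints 0.9375
print(f"scattered:       {hit_rate(scattered):.4f}")   # prints 0.0000
```

The sequential scan hits because each loaded block brings in three neighbours (spatial locality), the loop hits because the same four blocks are revisited (temporal locality), and the scattered pattern gets no benefit from either.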

Metrics of Performance
[Figure: levels of a computer system, top to bottom, with the metrics used at different levels]
Levels: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins.
Metrics shown: answers per month; operations per second; (millions of) instructions per second, MIPS; (millions of) floating-point (FP) operations per second, MFLOP/s; megabytes per second; cycles per second (clock rate).
In the industry right now, there are many ways to measure the performance of a system. Each measure addresses a particular component or system level. The slide shows the most commonly used metrics.
Clock rate: each computer has a clock running at a constant rate. These discrete time events are called ticks, clock ticks, cycles, or clock cycles. A clock period can be expressed as a length of time or as a rate (MHz).

MIPS Benchmark
Millions of Instructions Per Second
$$\text{MIPS} = \frac{\text{clock rate}}{\text{CPI} \times 10^6}$$
Advantage: easy to understand and straightforward.
Disadvantages:
Dependent on the instruction set.
Varies between programs on the same computer.
MIPS can vary inversely with performance!
Note: on the slide, a check marks an advantage and an x marks a disadvantage.
Question: Do I discuss the formula or not? Do I reserve the formula until next week?
For the third disadvantage, assume that we have a UNIX workstation (WS) with separate hardware for floating-point computation. Since the floating-point computations must be done, and an FP operation needs more instructions than an integer operation (IP) in the hardware component, the MIPS rating will be lower.
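The formula and the "varies inversely with performance" warning can be illustrated together. In this sketch every number is hypothetical, and the execution-time relation (instruction count x CPI / clock rate) is the standard CPU performance equation, which this slide does not state; it is brought in here only to compare the two runs.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS rating: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def exec_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time: instruction count x CPI / clock rate (assumed relation)."""
    return instruction_count * cpi / clock_rate_hz


CLOCK_HZ = 100e6  # hypothetical 100 MHz clock for both runs

# Hypothetical runs of the same program on the same clock:
# with FP hardware: few, slow instructions; without: many, fast instructions.
runs = {
    "FP in hardware": {"instructions": 10e6, "cpi": 4.0},
    "FP in software": {"instructions": 60e6, "cpi": 1.5},
}

for name, run in runs.items():
    rating = mips(CLOCK_HZ, run["cpi"])
    seconds = exec_time_s(run["instructions"], run["cpi"], CLOCK_HZ)
    print(f"{name}: {rating:.1f} MIPS, {seconds:.2f} s")
```

With these made-up numbers the software version scores the higher MIPS rating yet takes more than twice as long, which is the sense in which MIPS can move in the opposite direction to real performance.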

MFLOPS Benchmark
Millions of Floating-point Operations Per Second (MegaFLOPS)
Intended to measure floating-point operations, but some programs don't use any.
Floating-point operations are not consistent across machines.
MFLOPS ratings for the same machine may differ depending on the instruction mix.
Another way of comparing performance involves MFLOPS, but it has more disadvantages than advantages.
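To see why the rating depends on the instruction mix, the sketch below applies the usual definition of MFLOPS (floating-point operations divided by execution time x 10^6), which the slides do not spell out. The two workloads and all of their numbers are hypothetical.

```python
def mflops(fp_operations: float, exec_time_s: float) -> float:
    """Millions of floating-point operations completed per second."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return fp_operations / (exec_time_s * 1e6)


# Hypothetical: the same machine runs two programs for 2 seconds each.
fp_heavy = mflops(fp_operations=80e6, exec_time_s=2.0)  # mostly FP work
fp_light = mflops(fp_operations=4e6, exec_time_s=2.0)   # mostly integer work

print(f"FP-heavy program: {fp_heavy:g} MFLOPS")  # prints: FP-heavy program: 40 MFLOPS
print(f"FP-light program: {fp_light:g} MFLOPS")  # prints: FP-light program: 2 MFLOPS
```

The machine is identical in both cases; only the share of floating-point work changed, so a single MFLOPS figure says little without knowing the program that produced it.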

Programs as Evaluators
Four types (in decreasing order of accuracy):
1. Real programs
2. Kernels
3. Toy benchmarks
4. Synthetic benchmarks
So, how else can we determine whether one system is better than another? The most popular way (the industry way) is to come up with a group of programs that tries to determine the performance of a system. This group of programs is collectively called benchmarks. There are currently many benchmarks in the industry; the most familiar examples are the ones you see in computer magazines that advise you which notebook or desktop to buy and why. We can group all of these programs into the four general types listed above.

Synthetic Benchmarks
Programs which try to match the average number and frequency of operations of a typical workload, e.g. Dhrystone, Whetstone, etc.
Not real programs; may not reflect program behavior for factors not measured.
Compiler and hardware optimizations can artificially inflate results.
The first type is what you call synthetic benchmarks. Use slide. These tasks are concise and abbreviated, and contain well-known representations of the functions and operations of an application or workload. These programs were gathered based on statistics; in other words, they reflect what the majority of users execute on average.

Toy Benchmarks
Small, simple programs.
Produce a result the user already knows.
Examples: quicksort, Sieve of Eratosthenes, etc.
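A toy benchmark of this kind can be sketched in a few lines: run the Sieve of Eratosthenes, check the answer against a value known in advance, and time it. The limit of 100,000 and the use of Python's `time.perf_counter` are choices made for this illustration; the measured time will differ from machine to machine.

```python
import time


def sieve(limit: int) -> list[int]:
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


start = time.perf_counter()
primes = sieve(100_000)
elapsed = time.perf_counter() - start

# The result is known beforehand: there are 9592 primes below 100,000.
assert len(primes) == 9592
print(f"found {len(primes)} primes in {elapsed:.4f} s")
```

The known answer is what makes this a toy benchmark in the slide's sense: the output only confirms the run was correct, and the timing is the actual measurement.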

Kernel Benchmarks
Small, key pieces from real programs put together to evaluate machine performance.
Examples: Linpack, Livermore Loops, etc.
No user would run kernel programs, because they exist solely for performance evaluation.
Best used to isolate the performance of individual features of machines, to explain the reasons for differences in real programs.

Real Programs
Common programs like compilers (e.g. C), word processors (e.g. TeX, MS Word), computer-aided design tools (e.g. Spice), etc.
Real programs have the input, output, and options that a user can select.

When Benchmarks Disagree
What is MMX's real speed?
So, if you read in one trade magazine that MMX runs at this speed and another trade magazine says it runs at that speed, what are you going to answer? It depends on the benchmarks being used. So, what should an MIS manager do? Ask the following questions and keep these cautions in mind:
1. When were the benchmarks run? By whom? Using what hardware and software configuration?
2. Response time is cited as a critical metric, but response time to what?
3. It is relatively easy to create a benchmark that shows any given machine outperforming another.
4. Beware of the single figure of merit.
5. If you use benchmark data from others, caveat emptor.
The best benchmarks are the ones created by your own company to determine how fast the system would perform.
Source: adapted from Byte, April 1998.

Popular Benchmarks
Bapco SYSmark: application, tests system
BYTEmark: synthetic, tests processor
Intel Media: synthetic, tests processor (multimedia, uses MMX instructions)
CaffeineMark: synthetic, tests JVM
SPEC CPU95: synthetic, tests processor (two suites: integer and floating-point)
SPEC Glperf: synthetic, tests 3-D graphics
SPEC Viewperf: application, tests 3-D graphics
Norton Multimedia: synthetic, tests system (multimedia, uses MMX instructions)

Popular Benchmarks
TPC-C (Transaction Processing Performance Council): database application, tests transaction-processing performance
TPC-D: database application, tests decision support and data-warehousing performance
ZDBOp (Ziff-Davis Benchmark Operation):
  BrowserComp: application, tests browsers
  CPUmark32: synthetic, tests processor
  NetBench: application, tests network performance
  ServerBench: application, tests server performance
  WebBench: application, tests web server
  WinBench: application, tests component subsystems
  Winstone: application, tests system

Programs as Evaluators
Companies may design features that make their machines run faster on the benchmarks than on real programs.
A standard set of programs is hard to obtain, because each program runs differently on each machine and companies want to use programs that run fast on their own machines.