Defining Performance Section 2.1 11/14/2018 9:52 PM.

Defining Performance What does it mean when we say that one computer is faster than another?

Defining Performance A computer user may say a computer is faster when a program runs in less time. Response time (also called execution time or latency) is the time between the start and the completion of an event.

Defining Performance A computer center manager may say a computer is faster when it completes more jobs in an hour. Throughput (also called bandwidth) is the total amount of work done in a given time.

Defining Performance So we want to decrease the response time and increase the throughput. To avoid potential confusion, we’ll say “improve the response time” or “improve the throughput”.

Example Do the following changes to a computer system increase throughput, decrease response time, or both?
1) Replacing the processor in a computer with a faster version. Decreasing response time almost always improves throughput. Hence, both response time and throughput are improved.
2) Adding additional processors to a system that uses multiple processors for separate tasks. In this case, no one task gets work done faster, so only throughput increases.

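Defining Performance To maximize performance, we want to minimize response time or execution time for some task. So for machine X:
$\text{Performance}_X = \dfrac{1}{\text{Execution time}_X}$

Defining Performance For two machines X and Y, if the performance of X is greater than the performance of Y, then:
$\text{Performance}_X > \text{Performance}_Y$
$\dfrac{1}{\text{Execution time}_X} > \dfrac{1}{\text{Execution time}_Y}$
$\text{Execution time}_Y > \text{Execution time}_X$

Defining Performance If X is n times faster than Y, then the execution time on Y is n times longer than it is on X:
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = n$
or
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$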

Example If machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds, how much faster is A than B?

Answer We know that A is n times faster than B if
$\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution time}_B}{\text{Execution time}_A} = n$
Thus the performance ratio is 15/10 = 1.5, and A is therefore 1.5 times faster than B.
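
The ratio in this answer can be checked with a few lines of Python. This is only an illustrative sketch: the function name `relative_performance` is made up for this example and does not come from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Performance is the reciprocal of execution time, so the performance
    ratio X/Y equals the execution-time ratio Y/X.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Machine A takes 10 s and machine B takes 15 s for the same program.
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")
```

Swapping the arguments gives the reciprocal (about 0.67), which is a quick way to notice when the two machines have been mixed up.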

Measuring Performance Section 2.2

Measuring Performance Time is the measure of computer performance. The computer that performs the same amount of work in the least time is the fastest. Execution time is measured in seconds per program.

Time Time can be defined in different ways.

Elapsed Time 1. Elapsed time, wall-clock time, or response time: the total time to complete a task, including:
Disk access
Memory access
I/O activities
Operating system overhead

CPU Time 2. CPU execution time, or simply CPU time: the time the CPU spends computing a task. It does not include time spent waiting for I/O or running other programs. It can be divided into two parts:
User CPU time: time spent in your program.
System CPU time: time spent in the OS performing tasks on behalf of your program.

Clock Computer users care about time. Computer designers want to think about how fast the hardware can perform basic functions. Almost all computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware.

Clock Cycles These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods, clocks, cycles). The length of a clock cycle can be stated in two ways:
1. the clock cycle time: the time for one clock cycle (e.g., 10 ns)
2. the clock rate: the number of clock cycles in one second (e.g., 100 MHz)

Clock Cycles The two are reciprocals of each other:
$\text{Clock rate} = \dfrac{1}{\text{Clock cycle time}}$, e.g. $\dfrac{1}{10\ \text{ns}} = 100\ \text{MHz}$

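CPU Time CPU time for a program can be expressed in two ways:
$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$
$\text{CPU time} = \dfrac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$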

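Example A program runs in 10 seconds on computer A, which has a 400 MHz clock. We want to build a machine B that will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for this program. What clock rate should we tell the designer to target?

Solution First find the number of clock cycles required for the program on machine A:
$\text{CPU clock cycles}_A = \text{CPU time}_A \times \text{Clock rate}_A = 10\ \text{s} \times 400 \times 10^6\ \text{cycles/s} = 4000 \times 10^6\ \text{cycles}$

Solution Find the clock rate for B using the CPU time equation:
$\text{Clock rate}_B = \dfrac{1.2 \times \text{CPU clock cycles}_A}{\text{CPU time}_B} = \dfrac{1.2 \times 4000 \times 10^6\ \text{cycles}}{6\ \text{s}} = 800\ \text{MHz}$
Machine B must therefore run at twice the clock rate of machine A.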
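
The same calculation can be written as a small Python sketch, assuming the relation CPU time = clock cycles / clock rate. The function names `clock_cycles` and `required_clock_rate` are hypothetical helpers introduced only for this illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles consumed = CPU time (s) * clock rate (cycles/s)."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate (Hz) needed to finish `cycles` in `target_time_s` seconds."""
    return cycles / target_time_s


# Machine A: 10 s at 400 MHz.
cycles_a = clock_cycles(10.0, 400e6)

# Machine B needs 1.2 times as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Target clock rate for B: {rate_b / 1e6:.0f} MHz")
```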

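Number of Instructions The previous examples do not include any reference to the number of instructions in the program. The compiler clearly generated instructions for the machine to execute, so the execution time must depend on the number of instructions.

Execution Time Execution time equals the number of instructions executed multiplied by the average time per instruction. Therefore, the number of clock cycles required for a program is:
$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$

Clock Cycles Per Instruction (CPI) The average number of clock cycles per instruction (CPI) is:
$\text{CPI} = \dfrac{\text{CPU clock cycles}}{\text{Instruction count}}$
or
$\text{CPU clock cycles} = \text{Instruction count} \times \text{CPI}$

CPU Time Substituting into the CPU time equation, we get:
$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$
or
$\text{CPU time} = \dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$

CPU Time Expanding into the units of measure:
$\dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}} = \dfrac{\text{Seconds}}{\text{Program}}$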

Example Suppose we have two implementations of the same instruction set architecture. Machine A has a clock cycle time of 1 ns and a CPI of 2.0 for some program, and machine B has a clock cycle time of 2 ns and a CPI of 1.2 for the same program. Which machine is faster for this program, and by how much?

Solution We know that each machine executes the same number of instructions; let’s call this number I. Hence, the CPU time for each machine is:
$\text{CPU time}_A = I \times 2.0 \times 1\ \text{ns} = 2.0 \times I\ \text{ns}$
$\text{CPU time}_B = I \times 1.2 \times 2\ \text{ns} = 2.4 \times I\ \text{ns}$
Clearly, machine A is faster.

Solution The amount faster is given by the ratio of the execution (CPU) times:
$\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{CPU time}_B}{\text{CPU time}_A} = \dfrac{2.4 \times I\ \text{ns}}{2.0 \times I\ \text{ns}} = 1.2$
We conclude that machine A is 1.2 times faster than machine B for this program.
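
A short Python sketch of this comparison, assuming CPU time = instruction count × CPI × clock cycle time. The function name `cpu_time` and the instruction count of one million are arbitrary choices for illustration; the count cancels in the ratio.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Both machines run the same program, so the instruction count is the same.
# Its actual value does not matter for the ratio; 1e6 is an arbitrary choice.
instructions = 1e6

time_a = cpu_time(instructions, cpi=2.0, cycle_time_s=1e-9)  # machine A
time_b = cpu_time(instructions, cpi=1.2, cycle_time_s=2e-9)  # machine B

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```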

CPU Performance The previous equation shows that CPU performance is dependent on three characteristics:
Instruction count (IC): dependent on instruction set architecture and compiler technology.
Clock cycles per instruction (CPI): dependent on organization and instruction set architecture.
Clock cycle time (CT): dependent on hardware technology and organization of hardware.

CPU clock cycles Sometimes it is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts:
$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$
where $C_i$ is the number of instructions of class i executed, $\text{CPI}_i$ is the average number of cycles per instruction for that instruction class, and n is the number of instruction classes.

Overall CPI The overall CPI is:
$\text{CPI} = \dfrac{\sum_{i=1}^{n} (\text{CPI}_i \times C_i)}{\text{Instruction count}}$

Example A compiler designer is trying to decide between two code sequences for a particular machine. The hardware designer has supplied the following facts:
Instruction class | CPI for this instruction class
A | 1
B | 2
C | 3

Example For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts:
Code sequence | Count for class A | Count for class B | Count for class C
1 | 2 | 1 | 2
2 | 4 | 1 | 1

Example Which code sequence executes the most instructions? Which will be faster? What is the overall CPI for each sequence?

Solution Sequence 1 executes 2 + 1 + 2 = 5 instructions. Sequence 2 executes 4 + 1 + 1 = 6 instructions. So sequence 1 executes fewer instructions.

Solution Use the equation for CPU clock cycles to find which sequence is faster:
$\text{CPU clock cycles}_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 10\ \text{cycles}$
$\text{CPU clock cycles}_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 9\ \text{cycles}$
Therefore, code sequence 2 takes fewer clock cycles.

Solution So code sequence 2 is faster, even though it actually executes one extra instruction. This shows the danger of using only instruction count to assess performance.

Solution Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI. The overall CPI for each sequence is:
$\text{CPI}_1 = \dfrac{10}{5} = 2.0$
$\text{CPI}_2 = \dfrac{9}{6} = 1.5$
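
The per-class calculation can be sketched in Python as follows. The counts for sequence 1 (2, 1, 2) come from the solution text; the counts for sequence 2 (4, 1, 1) are an assumption taken from the textbook version of this example, chosen to match the stated total of one extra instruction. The names `total_cycles` and `overall_cpi` are illustrative only.

```python
# CPI for each instruction class, as supplied by the hardware designer.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of (CPI_i * C_i)."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def overall_cpi(counts: dict) -> float:
    """Overall CPI = total clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}  # assumed counts, see the note above

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, sum(seq.values()), "instructions,",
          total_cycles(seq), "cycles, CPI =", overall_cpi(seq))
```

Under these counts, the sequence with more instructions finishes in fewer cycles, which is the point of the example.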

Example Suppose we are considering two alternatives for a conditional branch instruction:
CPU A uses 2 instructions to do a branch: a condition code is set by a compare instruction, followed by a branch that tests the condition.
CPU B uses 1 instruction to do a branch: the compare is included in the branch.

Example On CPU A, 20% of all instructions executed are branches. On both CPUs, a branch instruction takes 2 clock cycles and all other instructions take 1 clock cycle. Because CPU A does not have the compare included in the branch, its clock cycle time is 25% faster than CPU B’s cycle time. Why is this possible? Which CPU is faster?

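Solution Working in percentages, we can assume that CPU A executes a total of 100 instructions. Thus we have:
CPU A: 20 compare instructions, 20 branch instructions, 60 others; 100 total instruction count.
CPU B: 20 compare-and-branch instructions, 60 others; 80 total instruction count.

Solution Using the overall CPI equation, we have:
$\text{CPI}_A = \dfrac{(20 \times 2) + (80 \times 1)}{100} = 1.2$
$\text{CPI}_B = \dfrac{(20 \times 2) + (60 \times 1)}{80} = 1.25$

Solution Since $CT_A$ is 25% faster than $CT_B$, $CT_B = 1.25 \times CT_A$. Hence, the CPU execution times are:
$\text{CPU time}_A = 100 \times 1.2 \times CT_A = 120 \times CT_A$
$\text{CPU time}_B = 80 \times 1.25 \times 1.25 \times CT_A = 125 \times CT_A$
Therefore CPU A, with the shorter clock cycle time, is faster.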

Example By reworking the organization of CPU B, we can improve its clock cycle time so that the clock cycle time for CPU A is now only 10% faster than CPU B’s cycle time. Which CPU is faster now?

Solution The CPU time of CPU A is unchanged at $120 \times CT_A$. The only change from the answer above is that $CT_B$ is now $1.10 \times CT_A$, since A is just 10% faster.

Solution The CPU time of CPU B is now
$\text{CPU time}_B = 80 \times 1.25 \times 1.10 \times CT_A = 110 \times CT_A$
With this improvement, CPU B is now faster.
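
Both versions of the branch comparison can be checked with a small Python sketch. It uses the instruction mix from the solution (100 instructions on CPU A, 80 on CPU B) and measures time in units of CPU A's clock cycle time. The helper `cpu_time` and the `(count, cycles)` representation are assumptions made for this illustration.

```python
def cpu_time(mix, cycle_time):
    """CPU time = sum of (instruction count * cycles each) * clock cycle time.

    `mix` is a list of (count, cycles_per_instruction) pairs.
    """
    return sum(count * cycles for count, cycles in mix) * cycle_time


# CPU A: 20 compares (1 cycle), 20 branches (2 cycles), 60 others (1 cycle).
mix_a = [(20, 1), (20, 2), (60, 1)]
# CPU B: 20 compare-and-branch (2 cycles), 60 others (1 cycle).
mix_b = [(20, 2), (60, 1)]

ct_a = 1.0  # all times are expressed in units of CPU A's cycle time

time_a = cpu_time(mix_a, ct_a)
time_b_original = cpu_time(mix_b, 1.25 * ct_a)  # A's clock 25% faster
time_b_reworked = cpu_time(mix_b, 1.10 * ct_a)  # A's clock only 10% faster

print("CPU A:", time_a)
print("CPU B (original):", time_b_original)
print("CPU B (reworked):", time_b_reworked)
```

The crossover happens because CPU B executes fewer cycles in total (100 versus 120), so it wins as soon as its cycle time is less than 1.2 times that of CPU A.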

Quantitative Principles of Computer Design Make the common case fast. In making a design tradeoff, favor the frequent case over the infrequent case; that is, improve the frequent event rather than the rare event. The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl’s Law.

Amdahl’s Law Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Amdahl’s Law The speedup depends on two factors:
Fraction_enhanced: the fraction of the computation time in the original machine that can be converted to take advantage of the enhancement.
Speedup_enhanced: the improvement gained by the enhanced execution mode, i.e., how much faster the task would run if only the enhanced mode were used. This value is the time of the original mode divided by the time of the enhanced mode.

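Execution Time The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced portion of the machine plus the time spent using the enhancement:
$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right)$

Speedup Amdahl’s Law defines the speedup that can be gained by using a particular feature:
$\text{Speedup} = \dfrac{\text{Execution time for the entire task without the enhancement}}{\text{Execution time for the entire task using the enhancement when possible}}$

Amdahl’s Law
$\text{Speedup}_{overall} = \dfrac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \dfrac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$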

Example Suppose a cache is five times faster than main memory, and suppose that the cache can be used 95% of the time. How much speedup do we gain by using the cache?

Solution Using Amdahl’s Law, Speedup_overall is:
$\text{Speedup}_{overall} = \dfrac{1}{(1 - 0.95) + \dfrac{0.95}{5}} = \dfrac{1}{0.05 + 0.19} = \dfrac{1}{0.24} \approx 4.2$
Hence, we obtain a speedup from the cache of about 4.2 times.
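
Amdahl's Law is easy to express as a function. The following Python sketch assumes the standard form, overall speedup = 1 / ((1 − fraction) + fraction / speedup); the name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time can use a mode that is `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced
    enhanced = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced + enhanced)


# Cache example: 5 times faster than main memory, usable 95% of the time.
print(f"{amdahl_speedup(0.95, 5.0):.2f}")
```

Even if the cache were made arbitrarily fast, the remaining 5% would cap the overall speedup at 1 / 0.05 = 20.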

Example Consider the problem of going from Nevada to California over the Sierra Nevada mountains and through the desert to Los Angeles. You have several types of vehicles available, but unfortunately your route goes through ecologically sensitive areas in the mountains where you must walk. Your walk over the mountains will take 20 hours. The last 200 miles, however, can be done by high-speed vehicle. There are five ways to complete the second portion of your journey:

Example
1. Walk at an average rate of 4 miles per hour.
2. Ride a bike at an average rate of 10 miles per hour.
3. Drive a Hyundai Excel in which you average 50 miles per hour.
4. Drive a Ferrari Testarossa in which you average 120 miles per hour.
5. Drive a rocket car in which you average 600 miles per hour.

Example How long will it take for the entire trip using these vehicles, and what is the speedup versus walking the entire distance?

Solution We can find the answer by determining how long the second part of the trip will take and adding that time to the 20 hours needed to cross the mountains.

Solution
Vehicle for second portion of trip | Hours for second portion of trip | Speedup in the desert (Speedup_enhanced) | Hours for entire trip (Execution time) | Speedup for entire trip (Speedup_overall)
Walk (original machine) | 200/4 = 50 | 50/50 = 1.0 | 20 + 50 = 70 (old) | 70/70 = 1.0
Bike | 200/10 = 20 | 50/20 = 2.5 | 20 + 20 = 40 (new) | 70/40 = 1.8
Hyundai | 200/50 = 4 | 50/4 = 12.5 | 20 + 4 = 24 (new) | 70/24 = 2.9
Ferrari | 200/120 = 1.67 | 50/1.67 = 30 | 20 + 1.67 = 21.67 (new) | 70/21.67 = 3.2
Rocket car | 200/600 = 0.33 | 50/0.33 = 150 | 20 + 0.33 = 20.33 (new) | 70/20.33 = 3.4

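Solution Using Amdahl’s Law to find the speedup for the Hyundai, with Fraction_enhanced = 50/70 ≈ 0.71 and Speedup_enhanced = 12.5:
$\text{Speedup}_{overall} = \dfrac{1}{(1 - 0.71) + \dfrac{0.71}{12.5}} = \dfrac{1}{0.29 + 0.057} \approx 2.9$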
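
The whole trip table can be regenerated with a short Python sketch, which also cross-checks the direct time ratio against Amdahl's Law. The speeds, the 20-hour mountain walk, and the 200-mile desert leg come from the example; the names `trip_summary` and `amdahl_speedup` are illustrative.

```python
MOUNTAIN_HOURS = 20.0  # walking over the mountains, the same for every option
DESERT_MILES = 200.0   # length of the second portion of the trip

SPEEDS_MPH = {
    "Walk": 4.0,
    "Bike": 10.0,
    "Hyundai": 50.0,
    "Ferrari": 120.0,
    "Rocket car": 600.0,
}


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - fraction) + fraction / speedup)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def trip_summary(speed_mph):
    """Return the table row for a vehicle averaging `speed_mph` in the desert."""
    walk_desert_hours = DESERT_MILES / SPEEDS_MPH["Walk"]  # 50 hours on foot
    desert_hours = DESERT_MILES / speed_mph
    old_total = MOUNTAIN_HOURS + walk_desert_hours         # 70 hours walking
    new_total = MOUNTAIN_HOURS + desert_hours
    return {
        "desert_hours": desert_hours,
        "speedup_enhanced": walk_desert_hours / desert_hours,
        "total_hours": new_total,
        "speedup_overall": old_total / new_total,
    }


for vehicle, speed in SPEEDS_MPH.items():
    row = trip_summary(speed)
    # The desert leg is 50 of the original 70 hours, so the fraction is 50/70.
    via_amdahl = amdahl_speedup(50.0 / 70.0, row["speedup_enhanced"])
    print(f"{vehicle:10s} total {row['total_hours']:6.2f} h, "
          f"speedup {row['speedup_overall']:.2f} (Amdahl: {via_amdahl:.2f})")
```

The two speedup columns agree for every vehicle, and no vehicle can push the overall speedup past 70/20 = 3.5, since the 20-hour walk is never enhanced.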

Locality of Reference Programs tend to reuse data and instructions they have used recently. A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of the code. An implication of locality is that, based on the program’s recent past, one can predict with reasonable accuracy what instructions and data a program will use in the near future.

Locality of Reference Two different types of locality:
Temporal locality states that recently accessed items are likely to be accessed in the near future.
Spatial locality states that items whose addresses are near one another tend to be referenced close together in time.

MIPS An alternative metric to measure performance instead of time is MIPS (million instructions per second). By this metric, faster machines have a higher MIPS rating.

MIPS for Comparison The problem with using MIPS as a measure for comparison is threefold:
1. MIPS considers the instruction execution rate but does not take into account the capabilities of the instructions, so it cannot be used to compare computers with different instruction sets.
2. MIPS varies between programs even on the same computer.
3. MIPS can vary inversely to performance.

Anomaly due to Instruction Set Architecture Consider the MIPS rating of a machine with floating-point (FP) hardware. FP instructions generally take more clock cycles to execute than integer instructions. FP programs using the FP hardware instead of software FP routines take less time but have a lower MIPS rating. Why? Software FP executes simpler instructions, resulting in a higher MIPS rating, but it executes so many more of them that overall execution time is longer.

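Anomaly due to Compiler Consider a machine with the following three instruction classes and CPI:
Instruction class | CPI for this instruction class
A | 1
B | 2
C | 3

Example We measured the code for the same program from two different compilers and obtained the following data:
Code from | Class A count (billions) | Class B count (billions) | Class C count (billions)
Compiler 1 | 5 | 1 | 1
Compiler 2 | 10 | 1 | 1

Example Assume that the machine’s clock rate is 500 MHz. Which code sequence will execute faster according to MIPS? Which code sequence will execute faster according to execution time?

Solution Calculate the execution time for the two compilers using the two equations:
$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$
$\text{Execution time} = \dfrac{\text{CPU clock cycles}}{\text{Clock rate}}$

Solution
$\text{CPU clock cycles}_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9$
$\text{CPU clock cycles}_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9$
$\text{Execution time}_1 = \dfrac{10 \times 10^9}{500 \times 10^6} = 20\ \text{s}$
$\text{Execution time}_2 = \dfrac{15 \times 10^9}{500 \times 10^6} = 30\ \text{s}$
Therefore compiler 1 generates the faster program.

Solution To compute the MIPS:
$\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6}$
$\text{MIPS}_1 = \dfrac{(5 + 1 + 1) \times 10^9}{20 \times 10^6} = 350$
$\text{MIPS}_2 = \dfrac{(10 + 1 + 1) \times 10^9}{30 \times 10^6} = 400$
Compiler 2 has a higher MIPS rating, even though compiler 1 generates the faster program.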
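
The anomaly can be reproduced with a short Python sketch. The instruction counts used here (5, 1, 1 billion for compiler 1 and 10, 1, 1 billion for compiler 2) are an assumption taken from the textbook version of this example, since the table in the transcript is incomplete; they are consistent with the stated conclusions. The function names are illustrative.

```python
CLOCK_RATE_HZ = 500e6                    # 500 MHz clock, from the example
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction, by class

# Instruction counts per class. These values are assumed (see note above).
compiler_1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler_2 = {"A": 10e9, "B": 1e9, "C": 1e9}


def execution_time(counts):
    """Seconds = sum(CPI_i * C_i) / clock rate."""
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time(counts) * 1e6)


for name, counts in (("Compiler 1", compiler_1), ("Compiler 2", compiler_2)):
    print(f"{name}: {execution_time(counts):.0f} s, {mips(counts):.0f} MIPS")
```

Under these counts, the code from compiler 2 has the higher MIPS rating yet takes longer to run, because its extra instructions are cheap class A instructions that raise the instruction rate without reducing the work.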
