4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors


0 4. Assessing and Understanding Performance

1 4. Performance
4.1 Introduction
4.2 CPU Performance and Its Factors
4.3 Evaluating Performance
4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors
4.5 Fallacies and Pitfalls
4.6 Concluding Remarks
4.7 Historical Perspective and Further Reading
4.8 Exercises

2 4.1 Introduction Defining Performance
How to measure, report, and summarize performance
Defining performance: an analogy

Figure 4.1
Airplane            Passenger   Cruising range   Cruising speed   Passenger throughput
                    capacity    (miles)          (m.p.h.)         (passengers x m.p.h.)
Boeing 777          375         4630             610              228,750
Boeing 747          470         4150             610              286,700
BAC/Sud Concorde    132         4000             1350             178,200
Douglas DC-8-50     146         8720             544              79,424
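The throughput column in Figure 4.1 is passenger capacity multiplied by cruising speed. The sketch below recomputes it and shows that "fastest" depends on the metric chosen. The function and variable names are illustrative, and the Boeing 747 speed of 610 is inferred from its throughput divided by its capacity (286,700 / 470).

```python
# Figure 4.1 data: name -> (passenger capacity, cruising speed in m.p.h.).
# The Boeing 747 speed (610) is inferred from 286,700 / 470.
airplanes = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
}


def passenger_throughput(capacity, speed_mph):
    """Passengers x m.p.h., the throughput metric used in the figure."""
    return capacity * speed_mph


throughputs = {
    name: passenger_throughput(capacity, speed)
    for name, (capacity, speed) in airplanes.items()
}

# Best plane by cruising speed versus best plane by passenger throughput.
fastest = max(airplanes, key=lambda name: airplanes[name][1])
highest_throughput = max(throughputs, key=throughputs.get)

print(fastest, highest_throughput)
```

The Concorde wins on speed for a single passenger, while the Boeing 747 wins on throughput, which mirrors the response time versus throughput distinction for computers.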

3 Performance of a Computer
Response time (= execution time)
  The time between the start and completion of a task
Throughput
  The total amount of work done in a given time
Performance and execution time
  Performance_X = 1 / Execution time_X
  "X is n times faster than Y" means
  Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
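A minimal sketch of the relative-performance definition, assuming two execution times measured in the same units. The 10-second and 15-second figures are illustrative values, not taken from the slides.

```python
def performance(execution_time):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y."""
    return performance(time_x) / performance(time_y)


# Illustrative values: X finishes in 10 s, Y finishes in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is a reciprocal, the ratio of performances equals the inverse ratio of execution times, so `times_faster(10.0, 15.0)` is the same quantity as 15 / 10.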

4 Measuring Performance
Definitions of time
  Wall-clock time = Response time = Elapsed time
    Total time to complete a task
    Includes disk accesses, memory accesses, I/O activities, OS overhead, etc.
  CPU execution time = CPU time
    The time the CPU spends computing for this task
    CPU time = User CPU time + System CPU time
  UNIX time command
    90.7u 12.9s 2: %
    (90.7 seconds of user CPU time, 12.9 seconds of system CPU time)
Definitions of performance
  System performance: based on elapsed time
  CPU performance: based on user CPU time
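The distinction between elapsed time and CPU time can be observed from inside a program. The following is a sketch using Python's standard `time.perf_counter` and `os.times`; the `measure` helper and the `busy_work` task are hypothetical names chosen for illustration.

```python
import os
import time


def measure(task):
    """Run task() and report elapsed, user CPU, and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()

    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return {
        "elapsed": wall_end - wall_start,  # wall-clock (response) time
        "user": user,                      # time spent in the program itself
        "system": system,                  # time the OS spent on its behalf
        "cpu": user + system,              # CPU time = user + system
    }


def busy_work():
    # Hypothetical CPU-bound task used only to have something to time.
    sum(i * i for i in range(200_000))


result = measure(busy_work)
print(result)
```

For a task that waits on disk or network I/O, the elapsed time would be noticeably larger than the CPU time, which is why system performance and CPU performance are defined on different clocks.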

5 4.2 CPU Performance and Its Factors
CPU execution time = CPU clock cycles x clock cycle time
                   = CPU clock cycles / clock rate
Example: Improving Performance
  Computers A and B have the same instruction set.
  Computer A: 4 GHz clock, runs the program in 10 seconds
  Computer B: ? GHz clock, target of 6 seconds for the same program
  B requires 1.2 times as many clock cycles as A.
  What clock rate does B need?

6 [Answer]
CPU time_A = CPU clock cycles_A / clock rate_A
10 seconds = CPU clock cycles_A / (4 x 10^9 cycles/sec)
CPU clock cycles_A = 10 sec x 4 x 10^9 cycles/sec = 40 x 10^9 cycles
CPU time_B = CPU clock cycles_B / clock rate_B
           = 1.2 x CPU clock cycles_A / clock rate_B
6 seconds = 1.2 x 40 x 10^9 cycles / clock rate_B
clock rate_B = 1.2 x 40 x 10^9 cycles / 6 seconds = 8 x 10^9 cycles/sec = 8 GHz
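The same arithmetic can be checked in a few lines. This is a sketch with illustrative function names; it uses only the figures given in the example (4 GHz, 10 seconds, 6 seconds, and the factor of 1.2).

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """CPU clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate = CPU clock cycles / CPU time."""
    return cycles / target_time_s


cycles_a = clock_cycles(10, 4e9)           # cycles A needs for the program
cycles_b = 1.2 * cycles_a                  # B needs 1.2 times as many cycles
rate_b = required_clock_rate(cycles_b, 6)  # rate that lets B finish in 6 s

print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```

To cut the run time from 10 to 6 seconds while executing 20% more cycles, B must run at twice A's clock rate.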

7 Hardware Software Interface
CPU clock cycles = IC x CPI
  IC (Instruction Count)
    Dependent on compilers and architectures
  CPI (Cycles Per Instruction)
    Dependent on implementations
Performance equation
  Execution time = IC x CPI x clock cycle time
                 = (IC x CPI) / clock rate

8 Example: Using the Performance Equation
Same instruction set architecture, same program
  Clock cycle time_A = 250 ps, CPI_A = 2.0
  Clock cycle time_B = 500 ps, CPI_B = 1.2
Which is faster, and by how much?
[Answer]
Let I = instruction count for the program.
CPU time_A = IC_A x CPI_A x clock cycle time_A = I x 2.0 x 250 ps = 500 x I ps
CPU time_B = I x 1.2 x 500 ps = 600 x I ps
Then
CPU performance_A / CPU performance_B = CPU time_B / CPU time_A = (600 x I ps) / (500 x I ps) = 1.2
Thus, A is 1.2 times faster than B for this program.
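The performance equation translates directly into code. In this sketch the instruction count of one million is an arbitrary assumed value; it cancels in the ratio, so any positive count gives the same answer.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """Execution time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


PICOSECOND = 1e-12
instruction_count = 1_000_000  # assumed value; cancels out of the ratio

time_a = cpu_time(instruction_count, 2.0, 250 * PICOSECOND)
time_b = cpu_time(instruction_count, 1.2, 500 * PICOSECOND)

# Performance ratio A/B equals the inverse ratio of execution times.
ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```

A has the higher CPI but the shorter clock cycle, and the product of the two is what decides the comparison.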

9 The Big Picture

10 Example: Comparing Code Segments
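Three instruction classes have CPIs of 1, 2, and 3.
Code sequence 1 uses 2, 1, and 2 instructions of these classes; code sequence 2 uses 4, 1, and 1.
Which sequence executes fewer instructions? Which will be faster? What is the CPI for each sequence?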

11 [Answer]
instruction count_1 = 2 + 1 + 2 = 5 and instruction count_2 = 4 + 1 + 1 = 6
Thus (1) executes fewer instructions.
CPU clock cycles_1 = 2x1 + 1x2 + 2x3 = 10 and CPU clock cycles_2 = 4x1 + 1x2 + 1x3 = 9
Thus (2) is faster.
CPI_1 = CPU clock cycles_1 / instruction count_1 = 10 / 5 = 2
CPI_2 = CPU clock cycles_2 / instruction count_2 = 9 / 6 = 1.5
Thus (2) has the lower CPI.
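A short sketch of the same calculation, assuming (as the cycle sums above imply) three instruction classes with CPIs of 1, 2, and 3 and per-class instruction counts of (2, 1, 2) and (4, 1, 1). The names are illustrative.

```python
# CPI of each instruction class, in the order used by the cycle sums.
CPI_PER_CLASS = (1, 2, 3)


def total_cycles(counts):
    """Sum of (instructions in class i) x (CPI of class i)."""
    return sum(n * cpi for n, cpi in zip(counts, CPI_PER_CLASS))


def average_cpi(counts):
    """Average CPI = CPU clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts)


sequence_1 = (2, 1, 2)
sequence_2 = (4, 1, 1)

for label, seq in (("1", sequence_1), ("2", sequence_2)):
    print(label, sum(seq), total_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions yet needs fewer cycles, so instruction count alone is not a reliable predictor of execution time.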

12 4.3 Evaluating Performance
Benchmarking
  The process of comparing the performance of two or more systems by measurement
Benchmark
  Programs specifically chosen to measure performance
  A workload that the user hopes will predict the performance of the actual workload
Compiler tricks
  Optimizations in either the architecture or the compiler

13 Compiler Tricks by IBM

14 Comparing and Summarizing Performance
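Difficulties with summarizing performance
  A is 10 times faster than B for program 1.
  B is 10 times faster than A for program 2.
Total execution time: a consistent summary measure
  AM (arithmetic mean) = (1/n) x (Time_1 + Time_2 + ... + Time_n)
  Weighted arithmetic mean = Weight_1 x Time_1 + Weight_2 x Time_2 + ... + Weight_n x Time_n, where the weights sum to 1
Figure 4.4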
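The sketch below shows why total execution time (or its arithmetic mean) gives a consistent answer where per-program ratios conflict. The execution times are illustrative values chosen only to match the 10x ratios stated above; they are not read from the transcript, and the function names are hypothetical.

```python
def arithmetic_mean(times):
    """AM = (1/n) x sum of execution times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum of weight_i x time_i; weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Illustrative times in seconds for (program 1, program 2).
times_a = [1.0, 1000.0]   # A is 10x faster than B on program 1
times_b = [10.0, 100.0]   # B is 10x faster than A on program 2

total_a, total_b = sum(times_a), sum(times_b)
print(total_a, total_b)
print(arithmetic_mean(times_a), arithmetic_mean(times_b))

# Equal weights reduce the weighted mean to the plain arithmetic mean.
print(weighted_arithmetic_mean(times_a, [0.5, 0.5]))
```

With these values B has the smaller total, so B is faster for a workload that runs each program once; different weights model a workload that runs the programs in different proportions.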

15 4.6 Concluding Remarks
Three design criteria
  High-performance design: supercomputers and high-end servers
  Low-cost design: embedded systems
  Cost/performance design: desktop computers
Execution time of real programs as the metric

