SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS
Presented by Ankit Patel
Authors: Stijn Eyerman, Lieven Eeckhout


Summary of this paper
Creates a theoretical foundation for measuring the performance of a given system, from a mathematical standpoint.
From whose perspective should we measure the performance of a given system? The user’s, the system’s, or a combination of both.

Current performance measurement
Researchers have reached the consensus that the performance metric of choice for assessing a single program’s performance is its execution time.
For single-threaded programs, execution time is proportional to CPI (cycles per instruction) or inversely proportional to IPC (instructions per cycle).
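The proportionality follows from the standard execution-time equation. As a sketch, with N the dynamic instruction count and f the clock frequency (symbols introduced here for illustration, not taken from the slides):

```latex
T \;=\; \frac{N \cdot \mathrm{CPI}}{f} \;=\; \frac{N}{\mathrm{IPC} \cdot f}
```

For a fixed program (fixed N) on a fixed clock (fixed f), T scales linearly with CPI, which is why CPI can stand in for execution time in the single-threaded case.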

Performance for multithreaded programs
CPI alone is a poor performance metric.
Total execution time should be used when measuring performance.

How should I measure system performance?

System-level performance criteria
The criteria for evaluating multiprogram computer systems are based on the user’s perspective and the system’s perspective.
What is the user’s perspective? How fast a single program is executed.
What is the system’s perspective? Throughput.

It’s Time for Some Terminology

Terminology
Turnaround time: Quantifies the time between submitting a job and its completion.
Response time: Measures the time between submitting a job and receiving its first response; this metric is important for interactive applications.
Throughput: Quantifies the number of programs completed per unit of time.
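The three definitions can be made concrete with a small sketch. The `Job` record, its field names, and the timestamps are hypothetical, chosen only to illustrate the arithmetic:

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float          # time the job was submitted (seconds)
    first_response: float  # time the job produced its first response
    finish: float          # time the job completed


def turnaround_time(job: Job) -> float:
    """Time between submitting a job and its completion."""
    return job.finish - job.submit


def response_time(job: Job) -> float:
    """Time between submitting a job and receiving its first response."""
    return job.first_response - job.submit


def throughput(jobs: list[Job]) -> float:
    """Programs completed per unit of time over the observed window."""
    start = min(j.submit for j in jobs)
    end = max(j.finish for j in jobs)
    return len(jobs) / (end - start)


# Hypothetical timestamps, for illustration only.
jobs = [Job(0.0, 0.5, 4.0), Job(1.0, 1.2, 6.0), Job(2.0, 3.0, 10.0)]

for j in jobs:
    print(f"turnaround={turnaround_time(j):.1f}s response={response_time(j):.1f}s")
print(f"throughput={throughput(jobs):.2f} jobs/s")
```

Turnaround and response time are per-job (user-oriented) quantities, while throughput is a property of the whole system.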

Terminology, continued
Single-program mode: A single program has exclusive access to the computer system. It has all system resources at its disposal and is never interrupted or preempted during its execution.
Multiprogram mode: Multiple programs are co-executing on the computer system.

It’s Time for Some Mathematics and a Little More Terminology

Turnaround time
Normalized turnaround time (NTT):
Average NTT (ANTT):
Max NTT:
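A sketch of the definitions these names refer to, assuming the usual formulation in which T_i^{SP} and T_i^{MP} are program i's turnaround times in single-program and multiprogram mode, and n is the number of co-executing programs:

```latex
\mathrm{NTT}_i = \frac{T_i^{MP}}{T_i^{SP}}, \qquad
\mathrm{ANTT} = \frac{1}{n} \sum_{i=1}^{n} \frac{T_i^{MP}}{T_i^{SP}}, \qquad
\mathrm{MaxNTT} = \max_{1 \le i \le n} \frac{T_i^{MP}}{T_i^{SP}}
```

NTT is a slowdown factor: it is at least 1 when co-execution never speeds a program up, and lower is better.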

System throughput
Normalized progress (NP):
System throughput (STP):
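A sketch of the corresponding definitions, under the same assumed notation (T_i^{SP} and T_i^{MP} for single-program and multiprogram turnaround time, n co-executing programs):

```latex
\mathrm{NP}_i = \frac{T_i^{SP}}{T_i^{MP}}, \qquad
\mathrm{STP} = \sum_{i=1}^{n} \mathrm{NP}_i = \sum_{i=1}^{n} \frac{T_i^{SP}}{T_i^{MP}}
```

NP_i is the reciprocal of program i's normalized turnaround time, so STP is a higher-is-better metric: it accumulates the single-program work the system completes per unit of time.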

Practical (why do I say practical?)
Adjusted ANTT:
Adjusted STP:
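One plausible reading of the adjusted forms is that they replace turnaround times with per-program CPI measured over the same instruction stream, which is what a simulator can report directly. The sketch below assumes that reading; the function names and CPI values are hypothetical:

```python
def antt(cpi_sp: list[float], cpi_mp: list[float]) -> float:
    """Average normalized turnaround time from CPI values (lower is better).

    Assumes each program executes the same instructions in both modes, so
    the ratio of turnaround times equals the ratio of CPIs.
    """
    return sum(mp / sp for sp, mp in zip(cpi_sp, cpi_mp)) / len(cpi_sp)


def stp(cpi_sp: list[float], cpi_mp: list[float]) -> float:
    """System throughput from CPI values (higher is better)."""
    return sum(sp / mp for sp, mp in zip(cpi_sp, cpi_mp))


# Hypothetical CPIs for a two-program workload.
cpi_single = [1.0, 2.0]  # each program running alone
cpi_multi = [2.0, 2.5]   # the same programs co-executing

print(f"ANTT = {antt(cpi_single, cpi_multi):.3f}")
print(f"STP  = {stp(cpi_single, cpi_multi):.3f}")
```

Both metrics need the single-program CPI of every program as a baseline, so each benchmark must also be simulated alone.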

IPC throughput (…keep this in mind…):
Weighted speedup:
Harmonic average (Hmean):
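These three are the metrics commonly reported in earlier SMT studies. A minimal sketch of their usual definitions follows; the function names and IPC values are hypothetical:

```python
def ipc_throughput(ipc_mp: list[float]) -> float:
    """Sum of the co-executing programs' raw IPCs."""
    return sum(ipc_mp)


def weighted_speedup(ipc_sp: list[float], ipc_mp: list[float]) -> float:
    """Sum of per-program IPC relative to single-program mode."""
    return sum(mp / sp for sp, mp in zip(ipc_sp, ipc_mp))


def hmean(ipc_sp: list[float], ipc_mp: list[float]) -> float:
    """Harmonic mean of the per-program relative IPCs."""
    return len(ipc_sp) / sum(sp / mp for sp, mp in zip(ipc_sp, ipc_mp))


# Hypothetical IPCs for a two-program workload.
ipc_single = [2.0, 1.0]
ipc_multi = [1.0, 0.8]

print(f"IPC throughput   = {ipc_throughput(ipc_multi):.3f}")
print(f"weighted speedup = {weighted_speedup(ipc_single, ipc_multi):.3f}")
print(f"hmean            = {hmean(ipc_single, ipc_multi):.3f}")
```

Because IPC is the reciprocal of CPI, weighted speedup computed this way equals a CPI-based STP, and Hmean equals the reciprocal of a CPI-based ANTT. IPC throughput has no such counterpart because it ignores each program's single-program baseline.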

Fairness: Co-executing programs in multiprogram mode experience equal relative progress with respect to single-program mode.
Proportional progress (for different priorities):
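The "equal relative progress" idea can be turned into a number by comparing the smallest and largest normalized progress. The sketch below assumes fairness is the minimum pairwise ratio of normalized progress, and assumes priorities enter by dividing each program's progress by its weight; the `weights` parameter and all values are hypothetical:

```python
from typing import Optional


def fairness(np_values: list[float], weights: Optional[list[float]] = None) -> float:
    """Minimum pairwise ratio of (priority-scaled) normalized progress.

    Returns 1.0 for a perfectly fair system and approaches 0.0 as one
    program is starved. `weights` are assumed relative priorities;
    omitting them treats all programs as equal.
    """
    if weights is None:
        weights = [1.0] * len(np_values)
    scaled = [np_i / w for np_i, w in zip(np_values, weights)]
    # The minimum over all pairs (i, j) of scaled[i] / scaled[j]
    # is simply the smallest value divided by the largest.
    return min(scaled) / max(scaled)


# Equal priorities: the second program makes more relative progress.
print(fairness([0.5, 0.8]))

# Program 0 has twice the priority and makes twice the progress: fair.
print(fairness([0.6, 0.3], weights=[2.0, 1.0]))
```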

So….fairness becomes…

Enough theory… How can I apply this to real-world performance measurements?

OK… Then let’s do a case study…

Case study: Evaluating SMT fetch policies
What should be used in performance measurements?
Researchers should use multiple metrics for characterizing multiprogram system performance.
The combination of ANTT and STP provides a clear picture of overall system performance as a balance between user-oriented program turnaround time and system-oriented throughput.
The case study involves user-level single-threaded workloads, but this does not affect the general applicability of the multiprogram performance metrics.
The ANTT-STP characterization is also applicable to multithreaded and full-system workloads.
The case study uses the ANTT and STP metrics to evaluate performance; for multithreaded full-system workloads, the cycle-count-based equations are used.

Oops… I have to introduce a little more terminology!

Six SMT fetch policies
Icount: Strives to have an equal number of instructions from all co-executing programs.
Stall fetch: Stalls the fetch of a program that experiences a long-latency load until the data returns from memory.
Predictive stall fetch: Extends the stall fetch policy by predicting long-latency loads in the front-end pipeline.
MLP-aware stall fetch: Predicts long-latency loads and their associated memory-level parallelism.
Flush: Flushes on long-latency loads.
MLP-aware flush: Extends the MLP-aware stall fetch policy by flushing instructions if more than m instructions have been fetched since the first burst of long-latency loads.
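The first two policies can be sketched as a single selection function. This is a deliberately simplified model, not the simulated hardware: the function name, the per-thread in-flight counts, and the `stalled` set are all hypothetical:

```python
from typing import Iterable, Optional


def select_fetch_thread(inflight_counts: list[int],
                        stalled: Iterable[int] = ()) -> Optional[int]:
    """Pick the hardware thread to fetch from in the next cycle.

    Icount-style: prefer the thread with the fewest in-flight instructions,
    which pushes the threads toward equal occupancy of the pipeline.
    Stall-fetch-style: threads listed in `stalled` (waiting on a
    long-latency load) are skipped until their data returns.
    """
    blocked = set(stalled)
    candidates = [t for t in range(len(inflight_counts)) if t not in blocked]
    if not candidates:
        return None  # every thread is waiting on memory
    return min(candidates, key=lambda t: inflight_counts[t])


# Thread 1 has the fewest in-flight instructions, so it is chosen.
print(select_fetch_thread([12, 5, 9]))

# With thread 1 stalled on a long-latency load, thread 2 is next.
print(select_fetch_thread([12, 5, 9], stalled={1}))
```

The predictive and MLP-aware variants differ in when a thread is marked as stalled and in how many further instructions it may fetch first. The flush variants additionally squash already-fetched instructions. None of that is modeled here.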

…And that was the last of the theory… I promise!

Simulation environment
Software used: SimPoint
36 two-program workloads
30 four-program workloads
Simulation points are chosen for the SPEC 2000 benchmarks (200 million instructions each).
Four-wide superscalar, out-of-order SMT processor with an aggressive hardware data prefetcher with eight stream buffers.

The MLP-aware flush policy outperforms Icount for both the two- and four-program workloads.
That is, it achieves a higher system throughput and a lower average normalized turnaround time, while achieving a comparable fairness level.

The same is true when we compare MLP-aware flush against flush for the two-program workloads; for the four-program workloads, MLP-aware flush achieves a much lower normalized turnaround time than flush at a comparable system throughput.
MLP-aware stall fetch achieves a smaller ANTT, whereas predictive stall fetch achieves a higher STP.

Interesting… So what can we conclude from this?

What does this show?
There is a delicate balance between user-oriented and system-oriented views of performance.
If user-perceived performance is the primary objective, MLP-aware stall fetch is the better fetch policy.
If system-perceived performance is the primary objective, predictive stall fetch is the policy of choice.

When I introduced IPC throughput, I said “…keep this in mind…” Remember?

IPC throughput as a performance measurement is misleading
Using IPC throughput as a performance metric, you would conclude that the MLP-aware flush policy is comparable to the flush policy. However, MLP-aware flush achieves a significantly higher system throughput (STP). Thus, IPC throughput is a potentially misleading performance metric.
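A small numerical sketch shows how this can happen. The IPC values below are invented for illustration and are not results from the paper; they describe two hypothetical policies that reach the same IPC throughput by treating a high-IPC and a low-IPC program very differently:

```python
def ipc_throughput(ipc_mp: list[float]) -> float:
    """Sum of raw IPCs; ignores how fast each program runs alone."""
    return sum(ipc_mp)


def stp(ipc_sp: list[float], ipc_mp: list[float]) -> float:
    """System throughput: sum of per-program progress relative to running alone."""
    return sum(mp / sp for sp, mp in zip(ipc_sp, ipc_mp))


# Hypothetical single-program IPCs: one fast program, one memory-bound program.
ipc_single = [2.0, 0.5]

# Policy A favors the fast program and nearly starves the slow one.
policy_a = [1.6, 0.1]
# Policy B gives up some of the fast program's IPC to keep the slow one moving.
policy_b = [1.3, 0.4]

for name, ipc_multi in (("A", policy_a), ("B", policy_b)):
    print(f"policy {name}: "
          f"IPC throughput = {ipc_throughput(ipc_multi):.2f}, "
          f"STP = {stp(ipc_single, ipc_multi):.2f}")
```

Both policies report an IPC throughput of about 1.7, yet policy B completes considerably more single-program work per unit of time. IPC throughput rewards giving resources to whichever program has the highest raw IPC, regardless of how much each program is slowed down.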

Summary
The paper gives a theoretical foundation for measuring system performance.
Don’t judge the performance of multicore systems merely on IPC throughput or CPI.
Use a quantitative approach to performance measurement for multicore systems; a few such metrics are presented in this paper.

Questions, Comments, Concerns?
