1
Performance Analysis in Out-of-Order Cores
Lecturers: Lihu Rappoport and Adi Yoaz
Based on foils by Ahmad Yasin
2
Bottleneck Analysis
Goal: analyze the HW causes of performance bottlenecks for a given SW running on a processor.
Method: classify the uop slots at each allocation/rename cycle. Allocation is the boundary between the core's frontend (which supplies instructions) and the core's backend (which consumes instructions).
[Figure: core pipeline with a 4-uop allocation stage. Blocks: I$, ID, Uop cache, IDQ, RS, EUs, LB, SB, ROB, DCU. Labels: "IDQ Empty", "Alloc Stall".]
3
Allocation Stall
Allocation is done all-or-none: if there is an allocation stall in a given cycle (e.g., because the RS is full), allocation is stalled completely.
If the frontend supplies n ≤ 4 uops in a given cycle and the backend does not have room for all n uops (e.g., the RS has fewer than n free entries), allocation is stalled and no uop is allocated.
[Figure: core pipeline with a 4-uop allocation stage. Blocks: I$, ID, Uop cache, IDQ, RS, EUs, LB, SB, ROB, DCU. Label: "Alloc Stall".]
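The all-or-none rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide, not a model of any real allocator; the names `ALLOC_WIDTH`, `allocate`, `supplied`, and `free_entries` are hypothetical and do not come from the slides.

```python
ALLOC_WIDTH = 4  # allocation slots per cycle, as on the slides


def allocate(supplied: int, free_entries: int) -> int:
    """Return how many uops are allocated this cycle (all-or-none).

    supplied:     uops offered by the frontend this cycle (0..ALLOC_WIDTH)
    free_entries: free entries in the limiting backend resource,
                  e.g. the RS (hypothetical single-resource simplification)
    """
    if not 0 <= supplied <= ALLOC_WIDTH:
        raise ValueError("the frontend supplies between 0 and 4 uops per cycle")
    if free_entries < supplied:
        # Allocation stall: not even the uops that would fit are allocated.
        return 0
    return supplied


if __name__ == "__main__":
    print(allocate(supplied=3, free_entries=2))  # stall, nothing allocated
    print(allocate(supplied=3, free_entries=3))  # all 3 uops allocated
```

The point of the sketch is the first case: with 3 uops supplied and only 2 free entries, the result is 0, not 2.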
4
Allocation Slot Classification
Backend Bound: in a cycle with a back-end stall (allocation stall), all 4 allocation slots are backend bound.
Frontend Bound: in a cycle without a back-end stall, if the frontend provides n < 4 uops, the cycle is bound by frontend supply: it has (4 - n) frontend bound allocation slots.
Retiring: the number (≤ 4) of uops allocated in the cycle that eventually retire.
Bad Speculation: the number (≤ 4) of uops allocated in the cycle that never retire (due to bad speculation, e.g., a jump misprediction).
Decision flow for each allocation slot:
Uop allocated?
  Yes: Uop ever retired?
    Yes: Retiring
    No: Bad Speculation
  No: Back-end stall?
    Yes: Backend Bound
    No: Frontend Bound
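The four definitions above can be written as a small per-cycle classifier. This is a minimal sketch under the slide's assumptions (4 allocation slots per cycle, all-or-none allocation); the function name `classify_cycle` and its parameters are hypothetical, introduced only for illustration.

```python
WIDTH = 4  # allocation slots per cycle


def classify_cycle(backend_stall: bool, supplied: int = 0, retired: int = 0) -> dict:
    """Classify the 4 allocation slots of a single cycle.

    backend_stall: True if there is an allocation stall in this cycle
    supplied:      uops supplied by the frontend (ignored when stalled)
    retired:       how many of the allocated uops eventually retire
    """
    slots = {"frontend_bound": 0, "backend_bound": 0,
             "retiring": 0, "bad_speculation": 0}
    if backend_stall:
        # All-or-none allocation: every slot is charged to the backend.
        slots["backend_bound"] = WIDTH
        return slots
    if not 0 <= retired <= supplied <= WIDTH:
        raise ValueError("need 0 <= retired <= supplied <= 4")
    slots["frontend_bound"] = WIDTH - supplied      # slots the frontend left empty
    slots["retiring"] = retired                     # allocated and later retired
    slots["bad_speculation"] = supplied - retired   # allocated but never retired
    return slots


if __name__ == "__main__":
    # 3 uops supplied, 2 of them eventually retire
    print(classify_cycle(backend_stall=False, supplied=3, retired=2))
```

Whatever the inputs, the four counts always add up to 4, which is what makes the per-slot percentages in the later summary sum to 100%.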
5
Per Cycle Classification (1)
Each column represents an allocation cycle.
No allocation stall. No uop is supplied by the frontend. All 4 slots count as frontend bound.
[Table: one column per cycle (1, 2, 3, 4, ...); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]
6
Per Cycle Classification (2)
Each column represents an allocation cycle.
No allocation stall. 2 uops are supplied by the frontend. 2 slots are frontend bound, and 2 slots are retiring.
[Table: one column per cycle (1, 2, 3, 4, ...); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]
7
Per Cycle Classification (3)
Each column represents an allocation cycle.
Allocation stall. All 4 slots are backend bound.
[Table: one column per cycle (1, 2, 3, 4, ...); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]
8
Per Cycle Classification (4)
Each column represents an allocation cycle.
No allocation stall. 4 uops are supplied by the frontend. 1 slot retires, and the other 3 slots are flushed due to bad speculation.
[Table: one column per cycle (1, 2, 3, 4, ...); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]
9
Per Cycle Classification (5)
Each column represents an allocation cycle.
No allocation stall. 3 uops are supplied by the frontend. 1 slot is frontend bound, 2 slots retire, and 1 slot is bad speculation.
[Table: one column per cycle (1 to 5); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]
10
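Bottleneck Summary
Each column represents an allocation cycle.
IPC = 5 uops / 5 cycles = 1
Totals over 5 cycles × 4 slots = 20 allocation slots:
Frontend Bound: 7 / 20 = 35%
Backend Bound: 4 / 20 = 20%
Retiring: 5 / 20 = 25%
Bad Speculation: the remaining 4 / 20 = 20%
[Table: one column per cycle (1 to 5); rows: Back-end Stall, Alloc Slot 0 to Alloc Slot 3, and the per-cycle counts of Frontend Bound, Backend Bound, Retiring, and Bad Speculation slots.]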
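The summary arithmetic can be reproduced with a short script. The per-cycle counts below are taken from the five "Per Cycle Classification" slides in order, with one assumption: the fourth cycle is entered as 1 retiring and 3 bad speculation slots so that all 4 of its slots are accounted for. The names `CYCLES` and `summarize` are hypothetical.

```python
CATEGORIES = ("frontend_bound", "backend_bound", "retiring", "bad_speculation")

# Slot counts per cycle, in the order of the per-cycle slides.
# Assumption: the fourth cycle has 1 retiring + 3 bad speculation slots.
CYCLES = [
    {"frontend_bound": 4},
    {"frontend_bound": 2, "retiring": 2},
    {"backend_bound": 4},
    {"retiring": 1, "bad_speculation": 3},
    {"frontend_bound": 1, "retiring": 2, "bad_speculation": 1},
]


def summarize(cycles, width=4):
    """Return {category: (slot_count, fraction_of_all_slots)}."""
    totals = {name: 0 for name in CATEGORIES}
    for cycle in cycles:
        if sum(cycle.values()) != width:
            raise ValueError("each cycle must account for exactly 4 slots")
        for name, count in cycle.items():
            totals[name] += count
    total_slots = width * len(cycles)
    return {name: (count, count / total_slots) for name, count in totals.items()}


if __name__ == "__main__":
    summary = summarize(CYCLES)
    for name, (count, fraction) in summary.items():
        print(f"{name:16s} {count:2d} / 20 = {fraction:.0%}")
    # Retired uops per cycle, the quantity the slide labels IPC.
    print("retired uops per cycle:", summary["retiring"][0] / len(CYCLES))
```

Summing per slot rather than per cycle is what lets a single cycle be split between categories, as in the last cycle above.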
11
The Hierarchy
Top level: Frontend Bound, Bad Speculation, Retiring, Backend Bound
CPU Bound Analyze
Frontend Bound
  Frontend Latency
    iTLB Miss
    iCache Miss
    Branch Resteers
    DSB Switches
    MS Switches
    LCP
  Bandwidth
    MITE
    DSB
    LSD
Bad Speculation
  Branch Mispred
  Machine Clears
Retiring
  BASE
    FP-arith
      X87
      Scalar
      Vector
    Other
  Microcode Sequencer
Backend Bound
  Core Bound
    Divider
    Ports Utilization
      3+ ports
      2 ports
      1 port
      0 ports
  Memory Bound
    Stores Bound
      Store Miss
      False Sharing
      dTLB Store
    L1 Bound
      Store fwd blk
      dTLB Load
    L2 Bound
    L3 Bound
      Contested Access
      Data Sharing
      L3 Latency
    Ext. Memory Bound
      MEM Bandwidth
      MEM Latency