
1 Prefetch-Aware Shared-Resource Management for Multi-Core Systems
Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu, Yale N. Patt*
*HPS Research Group, The University of Texas at Austin
Computer Architecture Laboratory, Carnegie Mellon University
+Intel Corporation, Austin

2 Background and Problem
[Diagram: Cores 0 through N, each with its own prefetcher, share an on-chip cache and a memory controller; DRAM banks 0 through K sit off-chip, beyond the chip boundary. These are the shared memory resources.]

3 Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques

4 Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources

5 Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
  - Network Fair Queuing (Nesbit et al., MICRO 2006)
  - Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA 2008)
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources
  - Fairness via Source Throttling (Ebrahimi et al., ASPLOS 2010)

6 Background and Problem
Fair memory scheduling technique: Network Fair Queuing (NFQ)
- Improves fairness and performance with no prefetching
- Significant degradation of performance and fairness in the presence of prefetching
[Chart comparing the No Prefetching and Aggressive Stream Prefetching configurations]

7 Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources
Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques

8 Background and Problem
Prior work addresses inter-application interference caused by prefetches:
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO 2009) dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
Even with controlled prefetching, fairness techniques should be made prefetch-aware
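The throttling idea summarized above can be sketched as a small feedback loop. This is an illustrative sketch only, not the paper's mechanism: the class name, aggressiveness levels, and update rule are assumptions made for clarity.

```python
# Hypothetical sketch of interference-driven prefetcher throttling
# (in the spirit of the aggressiveness control described above).
# Level values and the update policy are illustrative assumptions.

class ThrottledPrefetcher:
    # Ordered aggressiveness levels: prefetch degree (requests per trigger)
    LEVELS = [1, 2, 4, 8, 16]

    def __init__(self):
        self.level = 2  # start mid-range: degree 4

    @property
    def degree(self):
        return self.LEVELS[self.level]

    def update(self, caused_interference: bool):
        """Throttle down when this core's prefetches are detected to
        interfere with other cores; cautiously throttle back up otherwise."""
        if caused_interference and self.level > 0:
            self.level -= 1
        elif not caused_interference and self.level < len(self.LEVELS) - 1:
            self.level += 1

pf = ThrottledPrefetcher()
pf.update(caused_interference=True)
print(pf.degree)  # 2
```

The point of the sketch is the asymmetry: a prefetcher that keeps causing interference is driven toward degree 1 rather than being switched off outright, preserving some prefetch benefit.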

9 Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

10 Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA 2008]
Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread's bank-level parallelism
Principle 2: Request batching
- Marks a fixed number of the oldest requests from each thread to form a batch
- Eliminates starvation and provides fairness
[Diagram: requests from threads T0-T3 queued at Banks 0 and 1, with the oldest requests marked as the current batch]
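The batching principle above can be sketched in a few lines. This is a minimal illustration of batch formation only, assuming a simple request representation; the marking cap value and data layout are hypothetical, and the full scheduler's ranking and parallelism-aware ordering are omitted.

```python
# Sketch of PAR-BS request batching (Principle 2 above): mark up to
# `marking_cap` oldest requests per thread; marked requests form the
# batch and are serviced before unmarked ones.

from collections import defaultdict

def form_batch(requests, marking_cap=5):
    """requests: list of (thread_id, arrival_time) tuples in the
    memory request buffer. Returns the set of request indices that
    belong to the current batch."""
    per_thread = defaultdict(list)
    for idx, (tid, arrival) in enumerate(requests):
        per_thread[tid].append((arrival, idx))
    batch = set()
    for tid, reqs in per_thread.items():
        reqs.sort()  # oldest requests first
        for _, idx in reqs[:marking_cap]:
            batch.add(idx)
    return batch

# Thread 0 has four outstanding requests, thread 1 has one.
reqs = [(0, 1), (0, 2), (0, 3), (1, 4), (0, 5)]
print(sorted(form_batch(reqs, marking_cap=2)))  # [0, 1, 3]
```

Because the cap applies per thread, a memory-intensive thread cannot flood the batch and starve others, which is where the fairness guarantee comes from.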

11 Impact of Prefetching on Parallelism-Aware Batch Scheduling
Policy (a): Include prefetches alongside demands when generating a batch
Policy (b): Do not include prefetches alongside demands when generating a batch

12 Impact of Prefetching on Parallelism-Aware Batch Scheduling
[Timelines comparing the two policies for demands (D) and prefetches (P) from Cores 1 and 2 across two DRAM banks. Policy (a), mark prefetches in PAR-BS: accurate prefetches arrive in time and compute resumes on a hit, but an inaccurate prefetch delays other cores' requests. Policy (b), don't mark prefetches in PAR-BS: inaccurate prefetches no longer delay demands (saved cycles), but accurate prefetches arrive too late and compute stalls on a miss.]

13 Impact of Prefetching on Parallelism-Aware Batch Scheduling
Policy (a): Include prefetches alongside demands when generating a batch
- Pros: Accurate prefetches will be more timely
- Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others
Policy (b): Do not include prefetches alongside demands when generating a batch
- Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
- Cons: Accurate prefetches will be less timely, yielding less performance benefit from prefetching

14 Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

15 Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications
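Distinguishing prefetches by accuracy, as the first key idea above requires, presupposes a per-core accuracy estimate. The sketch below shows one plausible way to track it; the class, threshold value, and event hooks are assumptions for illustration, not the paper's exact hardware mechanism.

```python
# Sketch of a per-core prefetch-accuracy estimate: the fraction of
# issued prefetches that were actually used by a demand access.
# The 0.7 threshold is an illustrative assumption.

class AccuracyTracker:
    def __init__(self, threshold=0.7):
        self.sent = 0       # prefetches issued
        self.used = 0       # prefetches later hit by a demand
        self.threshold = threshold

    def on_prefetch_sent(self):
        self.sent += 1

    def on_prefetch_used(self):
        self.used += 1

    @property
    def accuracy(self):
        return self.used / self.sent if self.sent else 0.0

    def is_accurate(self):
        """Classify this core's prefetcher as accurate when the
        measured accuracy meets the threshold."""
        return self.accuracy >= self.threshold

t = AccuracyTracker()
for _ in range(10):
    t.on_prefetch_sent()
for _ in range(8):
    t.on_prefetch_used()
print(t.accuracy, t.is_accurate())  # 0.8 True
```

In hardware this would be a pair of counters reset periodically so the classification tracks phase behavior, but the interval logic is omitted here.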

16 Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

17 Prefetch-Aware PAR-BS (P-PARBS)
[Timeline recap of Policy (a), mark prefetches in PAR-BS: demands (D) and prefetches (P) from Cores 1 and 2 are batched together across two DRAM banks; the accurate prefetch is timely and compute resumes on a hit, but the inaccurate prefetch causes stalls for other requests.]

18 Prefetch-Aware PAR-BS (P-PARBS)
[Timelines comparing Policy (b), don't mark prefetches in PAR-BS, with our policy, mark accurate prefetches only. Under Policy (b), not marking inaccurate prefetches saves cycles but accurate prefetches arrive too late and compute stalls on a miss. Under our policy, accurate prefetches stay in the batch and remain timely (compute resumes on a hit), while inaccurate prefetches cannot delay others.]
Underlying prioritization policies need to distinguish between prefetches based on accuracy
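The P-PARBS batching rule described above reduces to a one-line predicate per request. This is a minimal sketch assuming a simple request representation and a per-core accuracy flag supplied by some accuracy estimator; field names are hypothetical.

```python
# Sketch of the P-PARBS marking rule: mark demands and *accurate*
# prefetches for batch inclusion; leave inaccurate prefetches
# unmarked so they cannot delay other cores' marked requests.

def should_mark(req, core_pref_accurate):
    """req: dict with 'core' and 'is_prefetch' fields (assumed layout).
    core_pref_accurate: per-core flag from an accuracy estimator."""
    if not req["is_prefetch"]:
        return True                         # demands are always marked
    return core_pref_accurate[req["core"]]  # prefetch: mark only if accurate

acc = {1: True, 2: False}  # core 1 prefetches accurately, core 2 does not
print(should_mark({"core": 1, "is_prefetch": True}, acc))   # True
print(should_mark({"core": 2, "is_prefetch": True}, acc))   # False
print(should_mark({"core": 2, "is_prefetch": False}, acc))  # True
```

This combines the pros of both earlier policies: accurate prefetches get the timeliness of Policy (a), and inaccurate prefetches get the isolation of Policy (b).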

19 Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

20 [Service-order diagrams for two DRAM banks, without and with demand boosting. Legend: Core 1 demands, Core 2 demands, Core 2 prefetches; Core 1 is memory non-intensive, Core 2 is memory intensive. Without boosting, Core 1's demand is serviced last, behind Core 2's demands and prefetches; with boosting, it is serviced first.]
Demand boosting eliminates starvation of memory non-intensive applications
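The boosting rule illustrated above can be sketched as a two-level sort key. This is an illustrative sketch under assumed request fields; how cores are classified as memory non-intensive, and the underlying rank, are placeholders for whatever the base scheduler provides.

```python
# Sketch of demand boosting: demands from memory non-intensive cores
# are boosted above all other requests; everything else keeps its
# underlying (e.g. batch-based) priority order.

def service_order(requests, non_intensive_cores):
    """requests: list of dicts with 'core', 'is_prefetch', and 'rank'
    (lower rank = higher underlying priority). Returns the requests
    sorted into service order, boosted demands first."""
    def key(req):
        boosted = (req["core"] in non_intensive_cores
                   and not req["is_prefetch"])
        return (0 if boosted else 1, req["rank"])
    return sorted(requests, key=key)

reqs = [
    {"core": 2, "is_prefetch": True,  "rank": 0},
    {"core": 2, "is_prefetch": False, "rank": 1},
    {"core": 1, "is_prefetch": False, "rank": 2},  # non-intensive demand
]
order = service_order(reqs, non_intensive_cores={1})
print([r["core"] for r in order])  # [1, 2, 2]
```

With an empty non-intensive set the function degenerates to the base priority order, so boosting composes cleanly with the underlying scheduler.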

21 Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

22 Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

23 Evaluation Methodology
x86 cycle-accurate simulator; baseline processor configuration:
- Per core: 4-wide issue, out-of-order, 256-entry ROB
- Shared (4-core system): 128 MSHRs; 2MB, 16-way L2 cache
- Main memory: DDR3-1333; latency of 15 ns per command (tRP, tRCD, CL); 8B-wide core-to-memory bus

24 System Performance Results
[Chart: system performance improvements of 11%, 10.9%, and 11.3%]

25 Max Slowdown Results
[Chart: maximum slowdown improvements of 9.9%, 18.4%, and 14.5%]

26 Conclusion
- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by more than 10%

27 Prefetch-Aware Shared-Resource Management for Multi-Core Systems
Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu, Yale N. Patt*
*HPS Research Group, The University of Texas at Austin
Computer Architecture Laboratory, Carnegie Mellon University
+Intel Corporation, Austin

