Fetch Directed Prefetching - a Study
CS 752 Project
Gokul Nadathur, Nitin Bahadur, Sambavi Muthukrishnan
Motivation
- Execution engine limited by fetch bandwidth
  - effect of memory latency on fetch
  - correlation between i-cache stalls and branch predictor
  - rate at which branch predictor and BTB can be cycled
- With increase in ILP, there is a need to increase fetch performance
Fetch Directed Architecture
[Block diagram. Components: Branch Predictor, Fetch Target Buffer, Fetch Target Queue, Instruction Fetch, Prefetch Filtration Mechanism, Prefetch Instruction Queue, Prefetch Buffer, L2 Cache]
Decoupled Branch Predictor
- has its own PC
- runs independently of the fetch pipeline stage
- makes a prediction each cycle
- unaffected by i-cache stalls
- Problem: may not have updated branch history
Fetch Target Buffer and Fetch Target Queue
Fetch Target Buffer (FTB)
- Stores fall-through and target addresses for taken branches
- Accessed with a prediction from the branch predictor each cycle
- Fills single or multiple cache-line blocks into the FTQ
Fetch Target Queue (FTQ)
- Contains blocks of instruction addresses to be executed next
- FTQ entries are dequeued by the fetch engine
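The decoupling between predictor and fetch engine can be sketched as a bounded producer/consumer queue. This is a minimal illustrative model, not the simulator used in the project: the class `FetchTargetQueue`, the function `simulate`, the queue capacity, and the stall pattern are all assumptions made for the example.

```python
from collections import deque


class FetchTargetQueue:
    """Bounded FIFO of fetch-block start addresses (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def full(self):
        return len(self.entries) >= self.capacity

    def enqueue(self, block_addr):
        # The predictor side must wait when the queue is full.
        if self.full():
            return False
        self.entries.append(block_addr)
        return True

    def dequeue(self):
        # The fetch side gets None when nothing has been predicted yet.
        return self.entries.popleft() if self.entries else None


def simulate(predicted_blocks, stall_cycles, capacity=4):
    """Return (fetch order, maximum FTQ occupancy) for a given stall pattern.

    predicted_blocks: addresses the predictor/FTB would produce, in order.
    stall_cycles: set of cycle numbers in which the fetch engine is stalled
                  on the i-cache and cannot dequeue (assumed input).
    """
    ftq = FetchTargetQueue(capacity)
    pending = deque(predicted_blocks)
    fetched = []
    max_occupancy = 0
    cycle = 0
    while pending or ftq.entries:
        # Predictor side: one prediction per cycle, regardless of fetch stalls.
        if pending and not ftq.full():
            ftq.enqueue(pending.popleft())
        max_occupancy = max(max_occupancy, len(ftq.entries))
        # Fetch side: dequeues one entry per cycle unless stalled.
        if cycle not in stall_cycles:
            addr = ftq.dequeue()
            if addr is not None:
                fetched.append(addr)
        cycle += 1
    return fetched, max_occupancy
```

With a three-cycle i-cache stall at the start, the predictor keeps running and the queue fills; without stalls the occupancy stays at one entry. The addresses queued ahead of the fetch engine are what a prefetcher can work on.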
Prefetch Filter and Prefetch Instruction Queue
Prefetch Instruction Queue (PIQ)
- Contains a queue of cache blocks to be prefetched
- Prefetch mechanism dequeues the PIQ and performs the prefetching
Prefetch Filter
- Takes entries from the FTQ, filters them and inserts them into the PIQ
- Enables intelligent prefetching!
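The slides do not state which filtering criterion was used, so the following is only a sketch under assumed rules: an FTQ address is dropped if its cache block is already present in the i-cache or already waiting in the PIQ. The names `block_of` and `filter_into_piq`, the 32-byte block size, and the PIQ capacity are hypothetical.

```python
from collections import deque

BLOCK_SIZE = 32  # bytes per cache block; an assumed value


def block_of(addr):
    """Align an instruction address down to its cache-block address."""
    return addr - (addr % BLOCK_SIZE)


def filter_into_piq(ftq_addrs, icache_blocks, piq, piq_capacity=8):
    """Move useful prefetch candidates from FTQ addresses into the PIQ.

    Assumed filter rules (not taken from the slides):
      1. skip blocks already resident in the i-cache,
      2. skip blocks already queued in the PIQ,
      3. stop when the PIQ is full.
    """
    for addr in ftq_addrs:
        block = block_of(addr)
        if block in icache_blocks:
            continue  # already cached, a prefetch would be wasted
        if block in piq:
            continue  # duplicate request
        if len(piq) >= piq_capacity:
            break
        piq.append(block)
    return piq


# Example: 0x104 and 0x144 fall in blocks already queued, 0x120 is cached.
piq = filter_into_piq([0x100, 0x104, 0x120, 0x140, 0x144],
                      icache_blocks={0x120}, piq=deque())
```

Here only blocks 0x100 and 0x140 end up in the PIQ, so the prefetch mechanism issues two requests instead of five.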
Stream Buffers
[Diagram. A stream buffer sits between the L1 I-cache and the L2 I-cache. It is a FIFO of cache blocks with tags, with a head and a tail, and a tag comparator]
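The behaviour of a single sequential stream buffer can be sketched as follows. This follows the commonly described scheme (compare the missing address with the tag at the FIFO head; on a match supply the block and prefetch the next sequential one, otherwise flush and restart), which is an assumption about what the diagram depicts. The class `StreamBuffer`, its depth, and the block size are hypothetical, and the model tracks block addresses only, ignoring prefetch latency.

```python
from collections import deque


class StreamBuffer:
    """Single sequential stream buffer, modelled as a FIFO of block tags."""

    def __init__(self, depth=4, block_size=32):
        self.depth = depth
        self.block_size = block_size
        self.fifo = deque()      # head is fifo[0], tail is fifo[-1]
        self.next_addr = None    # next sequential block to prefetch

    def _refill(self):
        # Prefetch sequential blocks at the tail until the FIFO is full.
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_addr)
            self.next_addr += self.block_size

    def on_l1_miss(self, block_addr):
        """Return True if the stream buffer supplies the missing block."""
        if self.fifo and self.fifo[0] == block_addr:
            # Head tag matches: block moves to L1, FIFO shifts and refills.
            self.fifo.popleft()
            self._refill()
            return True
        # No match: flush and start a new stream after the missing block.
        self.fifo.clear()
        self.next_addr = block_addr + self.block_size
        self._refill()
        return False


sb = StreamBuffer(depth=4, block_size=32)
first = sb.on_l1_miss(0x100)   # cold miss, starts a stream at 0x120
second = sb.on_l1_miss(0x120)  # sequential miss, served from the head
```

A non-sequential miss (for example after a taken branch) flushes the buffer, which is the weakness that prefetching along predicted addresses is meant to address.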
Prefetching in the Fetch Directed Architecture
- Similar to stream buffers
- Addresses given by PIQ
Simulation Results
[Six slides of result charts; the figures are not recoverable from this text]
Conclusions
- Prefetching definitely helps
- Fetch directed architecture aids prefetching
- Optimal results require sophisticated memory hierarchy