Hiding Synchronization Delays in a GALS Processor Microarchitecture
Greg Semeraro, David H. Albonesi, Grigorios Magklis, Michael L. Scott, Steven G. Dropsho, Sandhya Dwarkadas

ASYNC 2004 - University of Rochester

Why GALS (Globally Asynchronous, Locally Synchronous)?
- Simplified clock distribution network
- Reduced clock power dissipation
- Modular processor design
- Each domain can run at its optimal frequency
- Conventional design and testing methods can still be used
- Fine-grained dynamic voltage and frequency scaling (DVS/DFS)

But there is a cost…
- Inter-domain synchronization can hurt performance
- Synchronization circuits cost area and power
- We have to be careful how we divide the processor into domains

The MCD (Multiple Clock Domain) Microprocessor
[Figure: block diagram of the CPU divided into four clock domains, connected to Main Memory]
- Frontend: L1 instr. cache, fetch, IFQ, branch predict, rename, dispatch, ROB
- Integer: IIQ, int. register file, int. FUs
- Floating Pt: FIQ, fp. register file, fp. FUs
- Memory: LSQ, L1 data cache, L2 unified cache
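As an illustration of why the split matters, the sketch below maps a subset of the structures in the diagram to their domains and counts how many domain boundaries an instruction's path crosses; each crossing is a point where a synchronization penalty may be charged. The `DOMAIN` table, the `domain_crossings` helper, and both instruction paths are hypothetical simplifications, not the simulator's actual model.

```python
# Hypothetical table: a subset of the structures from the MCD block diagram,
# each mapped to the clock domain it sits in.
DOMAIN = {
    "L1 instr. cache": "Frontend",
    "fetch": "Frontend",
    "branch predict": "Frontend",
    "rename": "Frontend",
    "dispatch": "Frontend",
    "ROB": "Frontend",
    "IIQ": "Integer",
    "int. register file": "Integer",
    "int. FUs": "Integer",
    "FIQ": "Floating Pt",
    "fp. register file": "Floating Pt",
    "fp. FUs": "Floating Pt",
    "LSQ": "Memory",
    "L1 data cache": "Memory",
    "L2 unified cache": "Memory",
}


def domain_crossings(path):
    """Count the clock-domain boundaries crossed by a sequence of structures.

    Each crossing is a place where a synchronization penalty may be charged.
    """
    return sum(1 for a, b in zip(path, path[1:]) if DOMAIN[a] != DOMAIN[b])


# Assumed, simplified path of an integer ALU instruction.
alu_path = ["fetch", "rename", "dispatch", "IIQ", "int. FUs", "ROB"]
# Assumed, simplified path of a load (address computed in the integer FUs).
load_path = ["fetch", "rename", "dispatch", "IIQ", "int. FUs",
             "LSQ", "L1 data cache", "ROB"]
print(domain_crossings(alu_path), domain_crossings(load_path))  # 2 3
```

Every extra domain adds boundaries like these to some instruction paths, which is one way to read the later advice not to create too many domains.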

Inter-domain Synchronization
- Queue design based on Chelcea and Nowick (WVLSI ’00)
  - Modified for the issue queue configuration
- Synchronization circuit based on Nyström and Martin (WCED ’02)
  - Converted to single-rail logic
- Timing analysis based on Sjogren and Myers (ARVLSI ’97)
  - Skip a cycle rather than pause the clock

Synchronization via Queues
[Figure: FIFO queue and issue queue]
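A toy behavioral model of a queue shared by two clock domains may help here. In this sketch (hypothetical class `MixedClockFifo`, assumed parameter `t_sync`), each entry records the source time at which it was written, and the destination may consume it only on an edge at least `t_sync` later. The FIFO read takes only the head entry, while `visible()` models the issue-queue case, where any synchronized entry can be selected. It ignores the synchronization of the full/empty flags themselves and is not the Chelcea and Nowick circuit.

```python
from collections import deque


class MixedClockFifo:
    """Toy model of a queue written by one clock domain and read by another.

    Hypothetical model: an entry written by the source at time t may be
    consumed by the destination only on an edge at time >= t + t_sync.
    Otherwise the destination skips that cycle and tries again on its
    next edge; it never pauses its clock.
    """

    def __init__(self, capacity, t_sync):
        self.capacity = capacity
        self.t_sync = t_sync
        self.entries = deque()  # (write_time, payload), oldest first

    def full(self):
        return len(self.entries) >= self.capacity

    def write(self, now, payload):
        """Source-side write on a source clock edge; False if the queue is full."""
        if self.full():
            return False
        self.entries.append((now, payload))
        return True

    def read(self, now):
        """FIFO read on a destination edge: only the head entry, once synchronized."""
        if self.entries and now - self.entries[0][0] >= self.t_sync:
            return self.entries.popleft()[1]
        return None  # empty, or head not yet safe to use: skip this cycle

    def visible(self, now):
        """Issue-queue view: every synchronized entry, selectable in any order."""
        return [p for (t, p) in self.entries if now - t >= self.t_sync]


fifo = MixedClockFifo(capacity=4, t_sync=0.3)
fifo.write(0.0, "a")
fifo.write(1.0, "b")
print(fifo.visible(1.1))  # ['a']
print(fifo.read(1.1))     # a
print(fifo.read(1.2))     # None ("b" was written only 0.2 earlier)
print(fifo.read(2.0))     # b
```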

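Timing Analysis
- The source runs on CLK 1; the destination runs on CLK 2
- The source writes at edge 1 (CLK 1)
- Let T be the time from edge 1 to the next CLK 2 edge (edge 2):
  - If T > T_S, the data can be used at edge 2
  - If T < T_S, the data can be used only at edge 3, one destination cycle later
- T_S is between 25% and 35% of the clock period
[Figure: CLK 1 and CLK 2 waveforms with edges numbered 1 to 4]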
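The edge-selection rule above can be written as a small function. This is a sketch under assumed conventions: destination edges fall at `dst_phase + k * dst_period`, all times share one unit, and the boundary case T = T_S (which the slide leaves open) is treated as too close. The function name and its parameters are hypothetical.

```python
import math


def first_usable_edge(t_write, dst_period, dst_phase, t_s):
    """Time of the first destination edge at which the written data may be used.

    Assumes destination edges at dst_phase + k * dst_period for integer k.
    Edge 2 is the first destination edge strictly after the write; gap is
    the time T between the write (edge 1) and edge 2.
    """
    k = math.floor((t_write - dst_phase) / dst_period) + 1
    edge2 = dst_phase + k * dst_period
    gap = edge2 - t_write
    if gap > t_s:
        return edge2               # enough margin: use edge 2
    return edge2 + dst_period      # too close: skip a cycle and use edge 3


period = 1.0
t_s = 0.3 * period  # assumed: 30% of the period, inside the 25%-35% range
print(first_usable_edge(0.0, period, 0.5, t_s))  # 0.5
print(first_usable_edge(0.0, period, 0.2, t_s))  # 1.2
```

In the second call the next destination edge arrives only 0.2 after the write, less than T_S, so the destination skips one cycle instead of stretching its clock.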

Simulation Methodology
- Two processor pipelines:
  - Alpha 21264
  - StrongARM SA-1110
- Synchronization penalty measured against an identical synchronous design
- 30 benchmarks from MediaBench, Olden, and SPEC 2000

Simulation Methodology
- SimpleScalar + Wattch + MCD
- Independent clock for each domain
  - Independent jitter for each domain
  - Next edge computed from the period, the last edge, and the jitter
- When the source and destination clock edges are too close, a one-cycle penalty is assessed
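The clocking rules above can be sketched as follows. These are assumptions, not details taken from the simulator: jitter is uniform in [-jitter, +jitter], every source edge carries a write, and a penalty is charged when the next destination edge falls less than `t_s` after the write. `clock_edges` and `count_penalties` are hypothetical helper names.

```python
import bisect
import random


def clock_edges(period, jitter, n, start=0.0, seed=None):
    """Generate n clock-edge times for one domain.

    Each edge is placed one period after the previous edge, plus jitter drawn
    uniformly from [-jitter, +jitter] (an assumed distribution). Each domain
    uses its own random generator, so its jitter is independent of the others.
    """
    rng = random.Random(seed)
    edges = []
    t = start
    for _ in range(n):
        t += period + rng.uniform(-jitter, jitter)
        edges.append(t)
    return edges


def count_penalties(src_edges, dst_edges, t_s):
    """Count source writes whose next destination edge is closer than t_s.

    Each such write is charged a one-cycle penalty (the data is used one
    destination edge later). dst_edges must be sorted, which holds when
    jitter < period.
    """
    penalties = 0
    for t_write in src_edges:
        i = bisect.bisect_right(dst_edges, t_write)  # first destination edge after the write
        if i < len(dst_edges) and dst_edges[i] - t_write < t_s:
            penalties += 1
    return penalties


# Example: a 1.0-period source feeding a 1.3-period destination.
src = clock_edges(period=1.0, jitter=0.02, n=10_000, seed=1)
dst = clock_edges(period=1.3, jitter=0.02, n=8_000, seed=2)
print(count_penalties(src, dst, t_s=0.3 * 1.3))  # t_s assumed as 30% of the destination period
```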

Synchronization Analysis
Out-of-order (OoO) and superscalar capabilities removed from the Alpha

Synchronization Analysis
Out-of-order (OoO) and superscalar capabilities added to the StrongARM

What we have learned
- A synchronization penalty does not necessarily mean a performance loss
  - Out-of-order execution allows useful work to be performed while delayed instructions wait
  - In a superscalar design, synchronization penalties can be “shared” across multiple instructions
  - For the Alpha, 95% of the penalty is hidden
  - For StrongARM++, 63% of the penalty is hidden
- We have to be careful
  - We cannot have too many domains
  - Be careful where you split!
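The “penalty hidden” percentages compare the synchronization penalty charged with the slowdown actually observed. The slides do not give the formula; one plausible definition, assumed here, is 1 - (T_GALS - T_sync) / P, where P is the total penalty charged and T_GALS and T_sync are the run times of the GALS design and of the identical synchronous design. The function name is hypothetical and the numbers in the example are made up purely to show the arithmetic.

```python
def hidden_fraction(penalty_time, time_sync, time_gals):
    """Assumed metric: share of the charged synchronization penalty that did
    not show up as extra run time.

    penalty_time: total synchronization penalty charged in the GALS run
    time_sync:    run time of the identical fully synchronous design
    time_gals:    run time of the GALS design
    (all in the same time unit)
    """
    exposed = time_gals - time_sync  # penalty that actually lengthened execution
    return 1.0 - exposed / penalty_time


# Made-up numbers: 2000 units of penalty charged, but the run grew by only 400.
print(f"{hidden_fraction(2000, 50_000, 50_400):.0%} hidden")  # 80% hidden
```

Under this definition, out-of-order overlap and superscalar sharing both shrink the exposed term without changing how much penalty is charged.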

Conclusions
GALS is a good idea for real processors:
- Small IPC loss
- Clock network simplification
- Reduction in power dissipation
- Higher frequency
- Independent domain tuning
