Techniques for Reducing Read Latency of Core Bus Wrappers


1 Techniques for Reducing Read Latency of Core Bus Wrappers
Roman L. Lysecky, Frank Vahid, and Tony D. Givargis
Department of Computer Science
University of California, Riverside, CA 92521
{rlysecky, vahid, ...}
This work was supported in part by the NSF and a DAC scholarship.

2 Introduction
Core-based designs are becoming common, available as both soft and hard cores
Problem: how can interfacing be simplified to ease integration?
[Figure: core library containing MIPS, MEM, Cache, DSP, DMA, Core X, and Core Y]

3 Introduction
One solution: a single standard on-chip bus
- All cores would have the same interface
- Appears to be unlikely (VSIA)
Another solution: divide the core into a bus wrapper and internal parts
- Rowson and Sangiovanni-Vincentelli '97: interface-based design
- VSIA is developing a standard for the interface between the wrapper and the internals
- Far simpler than a standard on-chip bus
- We refer to the bus wrapper as an interface module (IM)

4 Previous Work - Pre-fetching
Analogous to caching: store local copies of registers inside the interface module
- Enables quick response time: eliminates extra cycles for register reads
- Transparent to the system bus and core internals, so it integrates easily with different busses
- No performance overhead; acceptable increases in size and power
Pre-fetching was manually added to each core

5 Previous Work - Architecture of IM
Controller: interfaces to the system bus and the pre-fetch registers
Pre-fetch Unit (PFU): implements the pre-fetching heuristic
- Goal: maximize the number of hits
How can we automate the design of the PFU?

6 Outline
"Real-time" pre-fetching
Mapping to real-time scheduling
Update dependency model
General register attributes
Petri net model construction
Petri net model refinement
Pre-fetch scheduling
Experiments
Conclusions

7 Real-time Pre-fetching
Age constraint: the number of cycles old the data may be when read
Access-time constraint: the maximum number of cycles a read access may take
[Figure: naïve schedule vs. more efficient schedule for register A (age constraint = 4) and register B (age constraint = 6), each with an access-time constraint of 2]
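As a hedged, illustrative calculation using this slide's numbers (and the 2-cycle pre-fetch time assumed on the next slide), the age constraints fix the minimum pre-fetch rates and hence the fraction of internal-bus cycles spent pre-fetching:

    % A must be re-fetched at least every 4 cycles, B at least every 6,
    % and each pre-fetch is assumed to take 2 cycles
    U = \frac{2}{4} + \frac{2}{6} = \frac{5}{6} \approx 0.83

A schedule that pre-fetches more often than the age constraints require wastes internal-bus cycles, which is presumably what distinguishes the more efficient schedule from the naïve one.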

8 Real-time Pre-fetching
Mapping to real-time scheduling:
- Register -> process
- Internal bus -> processor
- Pre-fetch -> process execution
- Register age constraint -> process period
- Register access-time constraint -> process deadline
- Pre-fetch time -> process computation time
Assume a pre-fetch requires 2 cycles
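A minimal sketch of this mapping in C; the type and field names are illustrative (not from the paper), and the two example registers reuse the constraints from the previous slide:

    #include <stdio.h>

    /* One pre-fetchable register viewed as a periodic real-time process.
     * Field names are illustrative, not taken from the paper. */
    typedef struct {
        const char *name;
        unsigned age_constraint;    /* -> process period (cycles)           */
        unsigned access_constraint; /* -> process deadline (cycles)         */
        unsigned prefetch_time;     /* -> process computation time (cycles) */
    } reg_task_t;

    int main(void) {
        /* Example register set reusing the constraints from slide 7,
         * with the 2-cycle pre-fetch time assumed on this slide. */
        reg_task_t regs[] = {
            { "A", 4, 2, 2 },
            { "B", 6, 2, 2 },
        };
        for (unsigned i = 0; i < sizeof regs / sizeof regs[0]; i++)
            printf("%s: period=%u deadline=%u C=%u cycles\n",
                   regs[i].name, regs[i].age_constraint,
                   regs[i].access_constraint, regs[i].prefetch_time);
        return 0;
    }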

9 Real-time Pre-fetching
Cyclic executive
- Major cycle = time required to pre-fetch all registers
- Minor cycle = rate at which the highest priority process will be executed
Problems
- Sporadic writes
- All process periods must be multiples of the minor cycle
- Computationally infeasible for large register sets

10 Real-time Pre-fetching
Rate monotonic priority assignment
The register with the smallest age constraint has the highest priority
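A brief sketch of rate-monotonic priority assignment over such a register set, assuming a simple array representation and illustrative names and values:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        const char *name;
        unsigned age_constraint; /* acts as the process period */
    } reg_t;

    /* Rate monotonic: the smaller the age constraint (period), the higher
     * the priority, so sort registers by ascending age constraint. */
    static int by_age(const void *a, const void *b) {
        const reg_t *ra = a, *rb = b;
        return (int)ra->age_constraint - (int)rb->age_constraint;
    }

    int main(void) {
        reg_t regs[] = { { "B", 6 }, { "A", 4 }, { "C", 12 } }; /* illustrative */
        size_t n = sizeof regs / sizeof regs[0];
        qsort(regs, n, sizeof regs[0], by_age);
        for (size_t i = 0; i < n; i++) /* index 0 = highest priority */
            printf("priority %zu: %s (age constraint %u)\n",
                   i, regs[i].name, regs[i].age_constraint);
        return 0;
    }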

11 Real-time Pre-fetching
Utilization-based schedulability test
Ci = computation (pre-fetch) time for register i
Ai = age constraint for register i
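The test itself appears on the slide as a figure; a hedged reconstruction, assuming it is the classic Liu and Layland utilization bound applied under the slide-8 mapping (pre-fetch time Ci as computation time, age constraint Ai as period) for N registers:

    U = \sum_{i=1}^{N} \frac{C_i}{A_i} \;\le\; N\left(2^{1/N} - 1\right)

If the total internal-bus utilization stays under this bound, rate-monotonic pre-fetching keeps every local copy within its age constraint; the test is sufficient but not necessary.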

12 Real-time Pre-fetching
Response Time Analysis
The response time of register i is defined as follows
The register set is schedulable if, for each register, the response time is less than or equal to its age constraint
Ri = response time for register i
Ci = computation time for register i
Ii = maximum interference in the interval [t, t + Ri)
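The defining equations also appear as a figure; a hedged reconstruction, assuming standard fixed-point response-time analysis, where hp(i) is the set of registers with higher priority than i and the age constraints Aj act as periods:

    R_i = C_i + I_i, \qquad
    I_i = \sum_{j \in hp(i)} \left\lceil \frac{R_i}{A_j} \right\rceil C_j

Because Ri appears on both sides, the recurrence is solved iteratively, starting from Ri = Ci and repeating until the value converges (or exceeds the constraint, in which case register i is deemed unschedulable).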

13 Real-time Pre-fetching
Sporadic register writes
- Writes to registers are sporadic
- They take control of the internal bus, delaying the pre-fetching of registers
Deadline monotonic priority
- The register with the smallest access-time constraint has the highest priority
Add a write register WR to the register set
- Access-time constraint = deadline
- Age constraint = maximum rate at which writes will occur (i.e., the minimum time between writes)
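A minimal sketch of deadline-monotonic assignment with the write register added to the set; the WR parameters below are placeholders, not values from the paper:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        const char *name;
        unsigned age_constraint;    /* period / minimum time between writes */
        unsigned access_constraint; /* deadline */
    } reg_t;

    /* Deadline monotonic: the smaller the access-time constraint (deadline),
     * the higher the priority. */
    static int by_deadline(const void *a, const void *b) {
        const reg_t *ra = a, *rb = b;
        return (int)ra->access_constraint - (int)rb->access_constraint;
    }

    int main(void) {
        reg_t regs[] = {
            { "A",  4, 2 },
            { "B",  6, 2 },
            { "WR", 8, 1 },  /* sporadic write modeled as one more "register";
                                8 and 1 are placeholder values */
        };
        size_t n = sizeof regs / sizeof regs[0];
        qsort(regs, n, sizeof regs[0], by_deadline);
        for (size_t i = 0; i < n; i++)
            printf("priority %zu: %s (deadline %u)\n",
                   i, regs[i].name, regs[i].access_constraint);
        return 0;
    }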

14 Experiments - Area (gates)
Average area increase of the IM w/ RTPF over the IM w/ BW: 1.4K gates
Note: to better evaluate the effects of IMs, our cores were kept simple, resulting in smaller than normal sizes.

15 Experiments - Performance (ns)
Average performance improvement of the IM w/ RTPF over the IM w/ BW: 11%

16 Experiments - Energy (nJ)
Average energy increase of the IM w/ RTPF over the IM w/ BW: 10%

17 Register Attributes
Register attributes: update type, access type, notification type, and structure type
Update dependencies
- Internal dependencies: dependencies between registers
- External dependencies: updates to a register via reads and writes from the on-chip bus, and updates from external ports to internal core registers
Petri nets
- We determined that we could use Petri nets to model our update dependencies

18 Petri Net Based Dependency Model
[Figure: Petri net dependency model showing the bus place, register places, update dependency arcs, and a random transition]

19 Refined Petri Net Model
[Figure: refined Petri net model showing transitions and a data dependency]
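To make the modeling idea concrete, here is a minimal, generic place/transition Petri net sketch in C. It is not the authors' model; the places, arcs, and marking are purely illustrative, and in the dependency model sketched on these slides the tokens would roughly indicate which register copies have become stale and need re-fetching.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_PLACES 8
    #define MAX_IN     4
    #define MAX_OUT    4

    /* A transition consumes one token from each input place and deposits one
     * token in each output place once every input place is marked. */
    typedef struct {
        int n_in, n_out;
        int in[MAX_IN], out[MAX_OUT];
    } transition_t;

    static bool enabled(const int *tokens, const transition_t *t) {
        for (int i = 0; i < t->n_in; i++)
            if (tokens[t->in[i]] == 0) return false;
        return true;
    }

    static void fire(int *tokens, const transition_t *t) {
        for (int i = 0; i < t->n_in; i++)  tokens[t->in[i]]--;
        for (int i = 0; i < t->n_out; i++) tokens[t->out[i]]++;
    }

    int main(void) {
        /* Illustrative marking: place 0 stands in for the "bus" place,
         * places 1 and 2 for two register places. */
        int tokens[MAX_PLACES] = { 1, 0, 0 };

        /* One transition standing in for an update dependency: when the bus
         * place is marked (e.g., a write occurred), both dependent register
         * places become marked, flagging their copies for re-fetching. */
        transition_t t = { .n_in = 1, .n_out = 2, .in = { 0 }, .out = { 1, 2 } };

        if (enabled(tokens, &t))
            fire(tokens, &t);

        for (int p = 0; p < 3; p++)
            printf("place %d: %d token(s)\n", p, tokens[p]);
        return 0;
    }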

20 Pre-fetch Schedule
Create a heap of registers to be pre-fetched
Create a list for update arcs
repeat
    if request detected then
        add outgoing arcs to heap
        set write register access-time to 0 and add to heap
    if read request detected then
        add outgoing arcs to update arc list
    for register at top of heap do
        if access-time = 0 then pre-fetch register, remove from heap
        if current age = 0 then pre-fetch register, reset current age, add register to heap
    while update arc list is not empty do
        if transition fires then set register's access-time to 0 and add to heap
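A much-simplified C sketch of the per-cycle bookkeeping behind such a schedule, assuming a linear scan instead of a heap and ignoring bus requests, access-time constraints, and update arcs; all names and the 12-cycle simulation are illustrative:

    #include <stdio.h>

    enum { N_REGS = 2, PREFETCH_CYCLES = 2, SIM_CYCLES = 12 };

    typedef struct {
        const char *name;
        int age_constraint; /* how stale the local copy may become (cycles) */
        int current_age;    /* cycles since the copy was last refreshed     */
    } reg_t;

    /* Stand-in for reading the register's value over the internal bus. */
    static void prefetch(reg_t *r, int cycle) {
        printf("cycle %2d: pre-fetch %s\n", cycle, r->name);
        r->current_age = 0;
    }

    int main(void) {
        reg_t regs[N_REGS] = { { "A", 4, 0 }, { "B", 6, 0 } };

        /* Each cycle: age every local copy, then refresh the copy with the
         * least slack before its age constraint would be violated.  The full
         * heuristic on this slide additionally tracks access-time constraints,
         * bus requests, and the update arcs fired by the Petri net model. */
        for (int cycle = 0; cycle < SIM_CYCLES; cycle++) {
            int most_urgent = 0;
            for (int i = 0; i < N_REGS; i++) {
                regs[i].current_age++;
                if (regs[i].age_constraint - regs[i].current_age <
                    regs[most_urgent].age_constraint - regs[most_urgent].current_age)
                    most_urgent = i;
            }
            if (regs[most_urgent].age_constraint - regs[most_urgent].current_age
                    < PREFETCH_CYCLES)
                prefetch(&regs[most_urgent], cycle);
        }
        return 0;
    }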

21 Experiments - Area (gates)
Average area increase of the IM w/ PF over the IM w/ BW: 1.5K gates
Average area increase of the IM w/ PF over the IM w/ RTPF: 0.1K gates
Note: to better evaluate the effects of IMs, our cores were kept simple, resulting in smaller than normal sizes.

22 Experiments - Performance (ns)
Average performance improvement of the IM w/ PF over the IM w/ BW: 26%
Average performance improvement of the IM w/ RTPF over the IM w/ BW: 16%

23 Experiments - Energy (nJ)
Average energy decrease of the IM w/ PF over the IM w/ BW: 11%
Average energy decrease of the IM w/ PF over the IM w/ RTPF: 20%

24 Conclusions
Real-time pre-fetching and update dependency pre-fetching both produce good results
The update dependency model is more efficient at pre-fetching registers
The two approaches are complementary
Both enable the automatic generation of the pre-fetch unit

