Presentation is loading. Please wait.

Presentation is loading. Please wait.

Caltech CS184 Winter2005 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.

Similar presentations


Presentation on theme: "Caltech CS184 Winter2005 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements."— Presentation transcript:

1 Caltech CS184 Winter2005 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements

2 Caltech CS184 Winter2005 -- DeHon 2 Last Time Saw various compute blocks To exploit structure in typical designs we need programmable interconnect All reasonable, scalable structures: –small to moderate sized logic blocks –connected via programmable interconnect been saying delay across programmable interconnect is a big factor

3 Caltech CS184 Winter2005 -- DeHon 3 Today Interconnect Design Space Dominance of Interconnect Interconnect Delay Simple things –and why they don’t work

4 Caltech CS184 Winter2005 -- DeHon 4 Dominant Area

5 Caltech CS184 Winter2005 -- DeHon 5 Dominant Time

6 Caltech CS184 Winter2005 -- DeHon 6 Dominant Time

7 Caltech CS184 Winter2005 -- DeHon 7 Dominant Power XC4003A data from Eric Kusse (UCB MS 1997)

8 Caltech CS184 Winter2005 -- DeHon 8 For Spatial Architectures Interconnect dominant –area –power –time …so need to understand in order to optimize architectures

9 Caltech CS184 Winter2005 -- DeHon 9 Interconnect Problem –Thousands of independent (bit) operators producing results true of FPGAs today …true for *LIW, multi-uP, etc. in future –Each taking as inputs the results of other (bit) processing elements –Interconnect is late bound don’t know until after fabrication

10 Caltech CS184 Winter2005 -- DeHon 10 Design Issues Flexibility -- route “anything” –(w/in reason?) Area -- wires, switches Delay -- switches in path, stubs, wire length Power -- switch, wire capacitance Routability -- computational difficulty finding routes

11 Caltech CS184 Winter2005 -- DeHon 11 Delay

12 Caltech CS184 Winter2005 -- DeHon 12 Wiring Delay Delay on wire of length L seg : T seg = T gate + 0.4 RC C = L seg  C sq R = L seg  R sq T seg = T gate + 0.4 C sq  R sq  L seg 2

13 Caltech CS184 Winter2005 -- DeHon 13 Wire Numbers R sq = 0.17  /sq. –from ITRS:Interconnect Conductor effective resistance A/R (aspect ratio) C sq = 7  10 -18 F/sq. R sq  C sq  10 -18 s T gate = 30 ps Chip: 7mm side, 70nm sq. (45nm process) –10 5 squares across chip

14 Caltech CS184 Winter2005 -- DeHon 14 Wiring Delay Wire Delay T seg = T gate + 0.4 C sq  R sq  L seg 2 T seg = 30ps + 0.4 10 -18 s  10 10 T seg = 30ps + 4ns  4ns

15 Caltech CS184 Winter2005 -- DeHon 15 Buffer Wire Buffer every L seg T cross = (L cross /L seg ) T seg T cross = (L cross /L seg ) ( T gate + 0.4 C sq  R sq  L seg 2 ) = (L cross ) ( T gate /L seg + 0.4 C sq  R sq  L seg )

16 Caltech CS184 Winter2005 -- DeHon 16 Opt. Buffer Wire T cross = (L cross ) ( T gate /L seg + 0.4 C sq  R sq  L seg ) Minimize:  Take d(T cross )/d(L seg ) = 0  0 = (L cross ) (- T gate /L seg 2 + 0.4 C sq  R sq )  T gate = 0.4 C sq  R sq L seg 2

17 Caltech CS184 Winter2005 -- DeHon 17 Optimization Point Optimized: T cross = (L cross /L seg ) ( T gate + 0.4 C sq  R sq  L seg 2 ) T gate = 0.4 C sq  R sq L seg 2  Says: equalize gate and wire delay

18 Caltech CS184 Winter2005 -- DeHon 18 Optimal Segment Length T gate = 0.4 C sq  R sq L seg 2 L seg = Sqrt( T gate / 0.4 C sq  R sq ) L seg = Sqrt(30 10 -12 s/0.4 10 -18 s) L seg  Sqrt(10 8 )  10 4 sq.

19 Caltech CS184 Winter2005 -- DeHon 19 Buffered Delay Chip: 7mm side, 70nm sq. (45nm process) –10 5 squares across chip L seg  10 4 sq. 10 segments: –Each of delay 2 T gate –T cross = 20  30ps = 600ps – Compare: 4ns

20 Caltech CS184 Winter2005 -- DeHon 20 Unbuffered Switch R~600  (width ~20) –About 3600 squares? [0.17  /sq. ] C~5  10 -16 F –About 100 squares? Not lumped ~2x worse Together contribute roughly 1200 squares Maybe 8 per rebuffer? –…assumes large switch and no wire…

21 Caltech CS184 Winter2005 -- DeHon 21 Buffered Switch Pay T gate at each switch Slows down relative to –Optimally buffered wire –Unbuffered switch …when placed too often

22 Caltech CS184 Winter2005 -- DeHon 22 Stub Capacitance Every untaken switch touching line C~2.5  10 -16 F –About 50 squares …and lumped so 2 

23 Caltech CS184 Winter2005 -- DeHon 23 Delay through Switching 0.6  m CMOS http://www.cs.caltech.edu/~andre/courses/CS294S97/notes/day14/day14.html How far in GHz clock cycle?

24 Caltech CS184 Winter2005 -- DeHon 24 First Attempts

25 Caltech CS184 Winter2005 -- DeHon 25 (1) Shared Bus Familiar case Use single interconnect resource Reuse in Time Consequence?

26 Caltech CS184 Winter2005 -- DeHon 26 Shared Bus Consider operation: y=Ax 2 +Bx +C –3 mpys –2 adds –~5 values need to be routed from producer to consumer Performance lower bound if have design w/: –m multipliers –u madd units –a adders –i simultaneous interconnection busses

27 Caltech CS184 Winter2005 -- DeHon 27 Resource Bounded Scheduling Scheduling in general NP-hard –(find optimum) –can approximate in O(E) time

28 Caltech CS184 Winter2005 -- DeHon 28 Lower Bound: Critical Path ASAP schedule ignoring resource constraints –(look at length of remaining critical path) Certainly cannot finish any faster than that

29 Caltech CS184 Winter2005 -- DeHon 29 Lower Bound: Resource Capacity Sum up all capacity required per resource Divide by total resource (for type) Lower bound on remaining schedule time –(best can do is pack all use densely)

30 Caltech CS184 Winter2005 -- DeHon 30 Example Critical Path Resource Bound (2 resources) Resource Bound (4 resources)

31 Caltech CS184 Winter2005 -- DeHon 31 Example 2 RB = 8/2=4 LB = 5 best delay= 6

32 Caltech CS184 Winter2005 -- DeHon 32 Shared Bus Consider operation: y=Ax 2 +Bx +C –3 mpys –2 adds –~5 values need to be routed from producer to consumer Performance lower bound if have design w/: –m multipliers –u madd units –a adders –i simultaneous interconnection busses

33 Caltech CS184 Winter2005 -- DeHon 33 Viewpoint Interconnect is a resource Bottleneck for design can be in availability of any resource Lower Bound on Delay: Logical Resource / Physical Resources May be worse –Dependencies (critical path bound) –ability to use resource

34 Caltech CS184 Winter2005 -- DeHon 34 Shared Bus Flexibility (+) –routes everything (given enough time) –can be trick to schedule use optimally Delay (Power) (--) –wire length O(kn) –parasitic stubs: kn+n –series switch: 1 –O(kn) –sequentialize I/B Area (++) –kn switches –O(n)

35 Caltech CS184 Winter2005 -- DeHon 35 Term: Bisection Bandwidth Partition design into two equal size halves Minimize wires (nets) with ends in both halves Number of wires crossing is bisection bandwidth

36 Caltech CS184 Winter2005 -- DeHon 36 (2) Crossbar Avoid bottleneck Every output gets its own interconnect channel

37 Caltech CS184 Winter2005 -- DeHon 37 Crossbar

38 Caltech CS184 Winter2005 -- DeHon 38 Crossbar

39 Caltech CS184 Winter2005 -- DeHon 39 Crossbar Flexibility (++) –routes everything (guaranteed) Delay (Power) (-) –wire length O(kn) –parasitic stubs: kn+n –series switch: 1 –O(kn) Area (-) –Bisection bandwidth n –kn 2 switches –O(n 2 )

40 Caltech CS184 Winter2005 -- DeHon 40 Crossbar Better than exponential Too expensive –Switch Area = k*n 2 *2.5K 2 –Switch Area/LUT = k*n* 2.5K 2 –n=1024, k=4  10M 2 What can we do?

41 Caltech CS184 Winter2005 -- DeHon 41 Avoiding Crossbar Costs Typical architecture trick: –exploit expected problem structure We have freedom in operator placement Designs have spatial locality  place connected components “close” together –don’t need full interconnect?

42 Caltech CS184 Winter2005 -- DeHon 42 Exploit Locality Wires expensive Local interconnect cheap 1D versions What does this do to –Switches? –Delay? (quantify on hmwrk)

43 Caltech CS184 Winter2005 -- DeHon 43 Exploit Locality Wires expensive Local interconnect cheap Use 2D to make more things closer Mesh?

44 Caltech CS184 Winter2005 -- DeHon 44 Mesh Analysis Can we place everything close?

45 Caltech CS184 Winter2005 -- DeHon 45 Mesh “Closeness” Try placing “everything” close

46 Caltech CS184 Winter2005 -- DeHon 46 Mesh Analysis Flexibility - ? –Ok w/ large w Delay (Power) –Series switches 1--  n –Wire length w--w  n –Stubs O(w)--O(w  n) Area –Bisection BW -- w  n –Switches -- O(nw) –O(w 2 n) –larger on homework

47 Caltech CS184 Winter2005 -- DeHon 47 Mesh Plausible …but What’s w …and how does it grow?

48 Caltech CS184 Winter2005 -- DeHon 48 Big Ideas [MSB Ideas] Interconnect Dominant –power, delay, area Can be bottleneck for designs Can’t afford full crossbar Need to exploit locality Can’t have everything close


Download ppt "Caltech CS184 Winter2005 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements."

Similar presentations


Ads by Google