1. Exploring Design Space for 3D Clustered Architectures
Manu Awasthi, Rajeev Balasubramonian
University of Utah
2. 3D Technologies
– Multiple layers of active devices
– Vertical interconnects between layers: very small (~10 µm)
(Figure: a 2D chip with a single device layer on silicon vs. a 3D chip with device layers 1 and 2 joined by vertical interconnect. Courtesy: K. Bernstein, IBM.)
3. Benefits of 3D
– Reduction of global interconnect length
– Delay/power reduction
– Bandwidth
– Mixed-technology integration
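As a rough illustration of the interconnect-length benefit, assume (hypothetically) that the logic of a square 2D die of area A is split evenly across two stacked square dies. Each die then has footprint A/2, so its side shrinks by a factor of 1/√2 and the worst-case corner-to-corner Manhattan wire distance drops by about 29%, while moving between dies costs only a short vertical interconnect (~10 µm). The function names and the 100 mm² area are illustrative assumptions, not values from the study.

```python
import math


def max_manhattan_mm(area_mm2: float, num_dies: int = 1) -> float:
    """Worst-case corner-to-corner Manhattan distance on one square die.

    Hypothetical simplification: the total logic area is split evenly
    across `num_dies` stacked dies, each of which is a square.
    """
    side = math.sqrt(area_mm2 / num_dies)
    return 2 * side  # width + height of the square


def wire_reduction(area_mm2: float, num_dies: int) -> float:
    """Fractional reduction in worst-case horizontal wire length vs. 2D."""
    return 1 - max_manhattan_mm(area_mm2, num_dies) / max_manhattan_mm(area_mm2, 1)


if __name__ == "__main__":
    area = 100.0  # hypothetical 100 mm^2 of logic
    print(f"2D:          {max_manhattan_mm(area):.2f} mm")
    print(f"3D (2 dies): {max_manhattan_mm(area, 2):.2f} mm")
    print(f"reduction:   {wire_reduction(area, 2):.1%}")
```

The reduction is independent of A: it is always 1 − 1/√n for n equal dies under this model.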
4. Previous Proposals
Previously in 3D:
– Break and stack (folding) [Puttaswamy et al.]: vertical stacking of active devices, e.g., a register file broken up and stacked across layers
– Benefit: reduced intra-block latency
– Problem: all layers are active, so heat becomes a serious concern
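The heat problem with folding follows from simple power-density arithmetic. The sketch below assumes, hypothetically, that folding leaves a block's total power unchanged (ignoring any wire-power savings) and divides its footprint by the number of stacked active layers; the function name and the 2 W / 1 mm² register-file numbers are illustrative only, not measured values.

```python
def power_density(power_w: float, area_mm2: float, layers: int = 1) -> float:
    """Power per unit footprint (W/mm^2) of a block folded into `layers` active layers.

    Hypothetical simplification: total power is unchanged by folding and
    the footprint shrinks to area_mm2 / layers.
    """
    if layers < 1:
        raise ValueError("layers must be >= 1")
    return power_w / (area_mm2 / layers)


if __name__ == "__main__":
    # Hypothetical register file: 2 W spread over 1 mm^2 in 2D.
    flat = power_density(2.0, 1.0)
    folded = power_density(2.0, 1.0, layers=2)
    print(f"2D: {flat:.1f} W/mm^2, folded: {folded:.1f} W/mm^2 ({folded / flat:.1f}x)")
```

Under these assumptions, folding a block onto two active layers doubles the power that must escape through each unit of footprint.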
5. An Alternative Approach?
(Figure: a 2D chip vs. a 3D chip built from Die 0 and Die 1.)
Prudent stacking can:
– Improve performance
– Result in a better thermal profile
6. Wire Delays and Performance
7. Clustered Architectures
– Centralized front-end:
  – I-cache and D-cache
  – LSQ, rename, decode
  – Branch predictor
– Clustered back-end:
  – Issue queue
  – Register file, FUs
(Figure: front-end and L1 D-cache connected to the clusters through a crossbar/router.)
– Benefits: higher clock frequency, high ILP
8. Decentralized Cache Banks
(Figure: the L1 D-cache split into multiple banks.)
– Possibly better performance
9. Decentralized Cache Banks: Replicated Cache Banks
(Figure: multiple L1 D-cache banks, each holding a replica of the cache contents.)
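One way to picture replicated banks, under assumptions the slide does not spell out: each cluster has its own full copy of the L1 D-cache, so a load is served by the local replica, while a store must update every replica to keep them consistent. The toy model below (the class `ReplicatedCache` and its methods are hypothetical) only counts the bank accesses this implies; it ignores capacity, misses, and timing.

```python
class ReplicatedCache:
    """Toy model of replicated L1 D-cache banks, one full copy per cluster.

    Hypothetical sketch: loads read only the requesting cluster's replica;
    stores are broadcast so that all replicas stay identical.
    """

    def __init__(self, num_clusters: int) -> None:
        self.replicas = [dict() for _ in range(num_clusters)]
        self.bank_accesses = 0

    def load(self, cluster: int, addr: int) -> int:
        self.bank_accesses += 1  # only the local bank is touched
        return self.replicas[cluster].get(addr, 0)  # unwritten data reads as 0

    def store(self, addr: int, value: int) -> None:
        for replica in self.replicas:  # every bank must be updated
            replica[addr] = value
            self.bank_accesses += 1


if __name__ == "__main__":
    cache = ReplicatedCache(num_clusters=8)
    cache.store(0x100, 42)
    print(cache.load(3, 0x100), cache.bank_accesses)
```

The trade-off this exposes: loads stay local, but with 8 clusters each store costs 8 bank writes.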
10. Decentralized Cache Banks: Word-Interleaved Cache Banks
(Figure: two L1 D-cache banks, one holding the even words and the other the odd words.)
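With word interleaving, consecutive words of the address space map to different banks; in the two-bank case, even words go to one bank and odd words to the other. A minimal sketch of that mapping, assuming a hypothetical 8-byte word (the word size is not given on the slide); `bank_of` and `WORD_BYTES` are illustrative names.

```python
WORD_BYTES = 8  # hypothetical word size; not stated on the slide


def bank_of(addr: int, num_banks: int = 2) -> int:
    """Return the cache bank that holds the word at byte address `addr`."""
    word_index = addr // WORD_BYTES
    return word_index % num_banks


if __name__ == "__main__":
    for addr in range(0, 40, WORD_BYTES):
        word = addr // WORD_BYTES
        parity = "even" if word % 2 == 0 else "odd"
        print(f"addr {addr:3d}: word {word} ({parity}) -> bank {bank_of(addr)}")
```

In this sketch each word has exactly one home bank, so unlike replication there are no copies to keep consistent.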
11. Outline
– Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
– Proposals
– Results
– Conclusions
12. Architecture 1: Cache-on-Cluster
(Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect.)
13. Architecture 2: Cluster-on-Cluster
(Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect.)
14. Architecture 3: Staggered
(Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect.)
16. Experimental Setup
– Framework:
  – SimpleScalar, Wattch, and HotSpot 3.0
  – Wire model: 8X global metal plane
– Benchmarks: SPEC2K, single-threaded
– Processor configuration:
  – 8 clusters
  – 64 KB L1 I- and D-caches, 2-way set-associative
  – L1 data cache: word-interleaved or replicated
– Base case: 2D centralized cache
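The configuration can be captured as a small record, which also makes the cache geometry explicit: sets = capacity / (associativity × line size). The line size is not given on the slide, so the 32-byte value below is a hypothetical assumption, and the sketch treats 64 KB as the size of one cache (the slide does not say whether it is the total or per bank in the decentralized organizations). Class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    size_bytes: int
    assoc: int
    line_bytes: int  # hypothetical: line size is not given on the slide

    @property
    def num_sets(self) -> int:
        """Number of sets = capacity / (associativity * line size)."""
        sets, rem = divmod(self.size_bytes, self.assoc * self.line_bytes)
        if rem:
            raise ValueError("size must be a multiple of assoc * line_bytes")
        return sets


@dataclass(frozen=True)
class ProcessorConfig:
    num_clusters: int
    l1i: CacheConfig
    l1d: CacheConfig
    # "centralized" (2D base case), "word-interleaved", or "replicated"
    l1d_organization: str


if __name__ == "__main__":
    l1 = CacheConfig(size_bytes=64 * 1024, assoc=2, line_bytes=32)
    cfg = ProcessorConfig(num_clusters=8, l1i=l1, l1d=l1,
                          l1d_organization="word-interleaved")
    print(cfg.l1d.num_sets)
```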
17. Base-Case Performance
(Figure: base-case performance results, with the best-case 2D configuration marked.)
18. The 3D Effect: 3D Replicated vs. 2D Centralized
19. The 3D Effect: 3D Word-Interleaved (WI) vs. 2D Centralized
20. Comparisons
(Figure: 2D case, best-case 2D, 3D replicated, 3D WI, best-case 3D-Rep, and best-case 3D-WI.)
– 12% improvement for the best-case 3D configuration over the best-case 2D configuration
21. Thermal Analysis
– Wattch for power numbers
– HotSpot 3.0 for the thermal model (grid): 500x500 grid resolution
– Interconnect power modeling:
  – Attributed to functional units
  – 8X plane wires
  – Router + crossbar modeled as a separate entity
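A grid thermal model works on a uniform grid of cells (500x500 here), so block-level power numbers such as Wattch's must be mapped onto grid cells. The sketch below shows one simple way to do that mapping: each block's power is spread uniformly over its area and assigned to cells in proportion to overlap. It is a hypothetical illustration, not HotSpot's implementation; `Block`, `power_grid`, the tiny grid, and the floorplan are all made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    name: str
    x: float      # left edge (m)
    y: float      # bottom edge (m)
    w: float      # width (m)
    h: float      # height (m)
    power: float  # W


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def power_grid(blocks: list[Block], chip_w: float, chip_h: float, n: int) -> list[list[float]]:
    """Map block power onto an n x n grid; cell power is proportional to area overlap."""
    cw, ch = chip_w / n, chip_h / n
    grid = [[0.0] * n for _ in range(n)]
    for b in blocks:
        density = b.power / (b.w * b.h)  # W/m^2, uniform within the block
        for i in range(n):  # rows (y direction)
            oy = overlap(i * ch, (i + 1) * ch, b.y, b.y + b.h)
            if oy == 0.0:
                continue
            for j in range(n):  # columns (x direction)
                ox = overlap(j * cw, (j + 1) * cw, b.x, b.x + b.w)
                grid[i][j] += density * ox * oy
    return grid


if __name__ == "__main__":
    # Hypothetical 1 cm x 1 cm die: a hot cluster next to a cooler cache bank.
    blocks = [Block("cluster", 0.0, 0.0, 0.005, 0.01, 6.0),
              Block("cache_bank", 0.005, 0.0, 0.005, 0.01, 1.0)]
    grid = power_grid(blocks, 0.01, 0.01, n=4)
    print(f"total: {sum(map(sum, grid)):.3f} W, peak cell: {max(map(max, grid)):.3f} W")
```

Because the mapping is area-proportional, total power is conserved (up to rounding), and hot blocks show up as high-power cells for the thermal solver.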
22. Thermal Profiles
(Figure: peak temperature, in °C, of the hottest on-chip unit.)
24. Conclusions
– Wire delays are critical to performance
  – Some wire delays matter more than others
– Prudent block stacking:
  – Performance improvement of up to 12% over 2D (WI banks + Arch 3, 3D)
  – Better thermal profiles than folding
25. Backup Slides
26. Cluster Arrangements
(Figure: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered). Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire; Die 0 and Die 1.)