Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)


1 Design and Management of 3D CMPs using Network-in-Memory. Feihui Li et al., Penn State University (ISCA 2006)

2 News..

3 Moral of the story…
3D technology helps in reducing wire delays
– Exploit it in as many ways as you can!
– They chose L2 caches
Also, 3D leads to on-chip hotspots.
– Arrange units intelligently, reduce localized hotspots.

4 Major Results/Contributions
First 3D CMP design space exploration
Proposal of 3D NUCA L2 caches for CMPs
– Comparison with the existing 2D counterparts
– 3D works better even without data migration
Proposal of NoCs as a method of communication between L2 banks
– “Efficiently exploit fast vertical interconnects”

5 Basics…
[Figures: typical Network-on-Chip architecture; major types of integration]

6 Proposed: 3D Network-in-Memory
[Figure: each node is a processing element (L2 cache bank or CPU) attached through a NIC to a router (R) over b-bit links. A pillar node adds a NoC/bus interface to its single-stage router (input buffer, output buffer), connecting it to a b-bit dTDMA bus. The dTDMA bus is the communication pillar and runs orthogonal to the slide.]
dTDMA bus = Dynamic Time-Division Multiple Access bus

7 The dTDMA Bus as the Communication Pillar
[Figure: layers, router, dTDMA bus, arbiter; dimension labels 1500 um and 10~100 um]
Use dTDMA bus (VLSID 2006)
– Efficient/fast bus
– Small area/power overhead
Do not use multi-hop for vertical communication
– Vertical distance is so small
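The slide does not spell out the arbiter's policy. As a rough sketch only, assuming (from the name "dynamic TDMA") that each bus frame holds exactly one slot per layer that currently has data to send, slot allocation could look like the following. The function name `dtdma_schedule` and the `pending` dictionary are hypothetical, introduced here for illustration.

```python
def dtdma_schedule(pending):
    """Return the order of bus slots as a list of layer ids.

    pending maps a layer id to the number of flits it wants to send.
    Assumption: every frame contains one slot per *active* layer, so
    idle layers never consume bus time.
    """
    pending = dict(pending)  # work on a copy; do not mutate the caller's dict
    schedule = []
    while any(count > 0 for count in pending.values()):
        # The frame is rebuilt each round from the layers still active.
        active = sorted(layer for layer, count in pending.items() if count > 0)
        for layer in active:
            schedule.append(layer)
            pending[layer] -= 1
    return schedule
```

Under this assumption the number of slots used equals the number of flits sent, which is one way to read the "efficient/fast bus" claim.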

8 Proposals (1)
Inter-die “communication pillars”
Integration of dTDMA buses and NoC routers for a fast communication interface
– A typical NoC fails due to:
  increased complexity
  contention issues
  increased power/area overhead
  multi-hop vertical communication

9 3D Benefit: Increased Locality
[Figure: 2D vicinity vs. 3D vicinity of a CPU, showing nodes within 1 hop, 2 hops, and 3 hops, and the dTDMA pillar]
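The locality argument can be checked with a small count. The sketch below is illustrative only and rests on assumptions not stated on the slide: nodes sit on a regular mesh, in-plane routing costs one hop per mesh link, and crossing to any other layer through the pillar costs a single hop. The helper names `hops` and `nodes_within` are hypothetical.

```python
def hops(src, dst, pillar):
    """Hop count between two nodes given as (x, y, layer) tuples.

    Assumption: moving between layers means walking to the pillar at
    (px, py), taking one bus hop, then walking to the destination.
    """
    sx, sy, s_layer = src
    dx, dy, d_layer = dst
    if s_layer == d_layer:
        return abs(sx - dx) + abs(sy - dy)
    px, py = pillar
    to_pillar = abs(sx - px) + abs(sy - py)
    from_pillar = abs(px - dx) + abs(py - dy)
    return to_pillar + 1 + from_pillar


def nodes_within(src, pillar, width, height, layers, max_hops):
    """Count the nodes (including src) reachable in at most max_hops."""
    return sum(
        1
        for x in range(width)
        for y in range(height)
        for layer in range(layers)
        if hops(src, (x, y, layer), pillar) <= max_hops
    )
```

For a CPU sitting on a pillar, comparing `nodes_within(..., layers=1, ...)` with `nodes_within(..., layers=2, ...)` for the same hop budget shows how the second layer enlarges the vicinity.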

10 Proposals (2)
Cannot increase the number of pillars arbitrarily
– Depends on via density
– Router complexity
So, CPUs share pillars
– Stacking of CPUs also has to be considered
CPU placement algorithm
– Stack CPUs across dies so as to:
  maintain a decent access hop count
  manage the thermal profile

11 CPU placement example
Not stacking CPUs directly on top of one another helps to solve the localized hotspot problem.
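The transcript does not include the placement algorithm itself. The following is only a minimal greedy sketch of the constraint stated above (no two CPUs in the same vertical column), not the authors' method. The function `place_cpus` and its `candidates_per_layer` input, a list of candidate (x, y) positions per layer, are hypothetical.

```python
def place_cpus(num_cpus, candidates_per_layer):
    """Assign CPUs to (x, y, layer) slots, round-robin across layers.

    A candidate (x, y) is skipped if a CPU on another layer already
    occupies that column, so CPUs are never stacked vertically.
    """
    layers = len(candidates_per_layer)
    used_columns = set()
    cursors = [0] * layers      # next untried candidate per layer
    placement = []
    layer = 0
    stalled = 0                 # consecutive layers with nothing left to offer
    while len(placement) < num_cpus and stalled < layers:
        candidates = candidates_per_layer[layer]
        i = cursors[layer]
        while i < len(candidates) and candidates[i] in used_columns:
            i += 1
        if i < len(candidates):
            x, y = candidates[i]
            used_columns.add((x, y))
            placement.append((x, y, layer))
            cursors[layer] = i + 1
            stalled = 0
        else:
            cursors[layer] = i
            stalled += 1
        layer = (layer + 1) % layers
    if len(placement) < num_cpus:
        raise ValueError("not enough non-overlapping candidate positions")
    return placement
```

A real placement would also weigh hop count to the shared pillars and a thermal model; this sketch only enforces the no-stacking rule.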

12

13 3D L2 Caches
Clusters: cache banks + tag array
– Some clusters have CPUs, others don’t.
Cache Management
– Search
– Placement & Replacement
– Cache Line Migration

14 L2 Cache Management

15 Simulation Environment
Simics + in-house NoC simulator
All CPUs issue in-order
– 8 CPUs, SPARC ISA
– Directory-based protocol for coherence between the L1s and the L2
HS3d for temperature modeling
64 MB and 32 MB L2 caches

16 Performance

17 Important Results

18 Important Results (2)
Impact of the number of “pillars” on access latency

19 Important Results (3)

20 Final Word
3D is feasible & scalable… and has arrived.
Localized hotspots can be solved by placing hotter units apart.
Power savings + performance gain even without data migration
– No numbers to support the claim(!)
– Would that help the temperature issue as well?

21 Potential HPCA Submission
An evaluation of temperature and IPC for a single-core 3D processor
Leverage clustered architectures for “temperature aware” processor designs.
– Basic premise: stacking cooler units (caches) on top of hotter units gives a better thermal profile for the processor

22 Proposals
[Figure: Arch 1, Arch 2, Arch 3; labels: cache bank, cluster]

23 Proposals (2)
Cache banks (both data and instruction) are
– 2-way word-interleaved, or
– Replicated
Present study done for an 8-cluster architecture
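As a small illustration of what 2-way word interleaving means, the sketch below maps a byte address to one of two banks so that consecutive words alternate between them. The 8-byte word size and the name `bank_of` are assumptions for the example; the slides do not give the word size.

```python
WORD_BYTES = 8   # assumed word size, not stated on the slide
NUM_BANKS = 2    # 2-way interleaving


def bank_of(byte_address):
    """Return the bank holding the word that contains byte_address."""
    word_index = byte_address // WORD_BYTES
    return word_index % NUM_BANKS
```

With replicated caches, by contrast, every bank holds a copy, so any bank can serve the access and no such mapping is needed.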

24 Results (Performance): 2-way word-interleaved caches

25 Results (Performance): Replicated caches

26 Traffic Analysis

27 Traffic Analysis (2)

28 Results (Thermal)

