1 Billion Transistor Architectures
- Interconnect design for low power – Naveen & Karthik
- Computational unit design for low temperature – Karthik
- Increased reliability and power-efficiency – Niti
- Hardware for raytracing and OS co-processing – led by Pete and Erik/Dave
- Interconnect design for high performance
2 Partitioned Architectures
[Figure labels: Instr Fetch, L1 D Cache]
3 Interconnect Design
[Figure labels: Delay Optimized, Bandwidth Optimized, Power Optimized, Power and B/W Optimized]
4 Tuning Wire Properties
- Wire delay $\propto \sqrt{RC}$ (Ho, Mai, Horowitz, Proc. of IEEE, 2001)
- $R_{wire} = \dfrac{\rho}{(thickness - barrier)\,(width - 2 \cdot barrier)}$
- $C_{wire} = \epsilon_0 \left( 2K\,\epsilon_{horiz}\,\dfrac{thickness}{spacing} + 2\,\epsilon_{vert}\,\dfrac{width}{layerspacing} \right) + fringe(\epsilon_{horiz}, \epsilon_{vert})$
- Wide wires: reduced resistance, slightly higher capacitance
- Wide spacing: reduced capacitance
- Example (Banerjee et al., IEEE Trans on Electronic Devices, Feb 2004): a factor of 8 increase in width and spacing gives R_L = R_B, C_L = 0.74 C_B, Delay_L = 0.43 Delay_B
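The two wire formulas on this slide can be sketched in a few lines of Python to see how width and spacing pull R and C in different directions. This is only an illustrative sketch: the function names, the constant `fringe` term (the slide treats it as a function of the dielectrics), and every numeric value in the usage lines are assumptions, not figures from the slides or the cited papers.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def wire_resistance(rho, thickness, width, barrier):
    """Resistance per unit length: rho / ((thickness - barrier)(width - 2*barrier))."""
    return rho / ((thickness - barrier) * (width - 2.0 * barrier))


def wire_capacitance(k, eps_horiz, eps_vert, thickness, width,
                     spacing, layer_spacing, fringe=0.0):
    """Capacitance per unit length: sidewall term + vertical term + fringe.

    `fringe` is passed in as a plain number here (an assumption made to
    keep the sketch short).
    """
    plates = (2.0 * k * eps_horiz * thickness / spacing
              + 2.0 * eps_vert * width / layer_spacing)
    return EPS0 * plates + fringe


def relative_delay(r_ratio, c_ratio):
    """Delay ratio of two wires, assuming delay scales as sqrt(R * C)."""
    return math.sqrt(r_ratio * c_ratio)


# Placeholder geometry in metres and placeholder material constants;
# chosen only to exercise the functions, NOT taken from any source.
geom = dict(thickness=400e-9, width=200e-9, spacing=200e-9, layer_spacing=400e-9)
diel = dict(k=1.5, eps_horiz=2.7, eps_vert=2.7)

c_base = wire_capacitance(**diel, **geom)
c_wider = wire_capacitance(**diel, **dict(geom, width=2 * geom["width"]))
c_spaced = wire_capacitance(**diel, **dict(geom, spacing=2 * geom["spacing"]))
r_base = wire_resistance(2.2e-8, geom["thickness"], geom["width"], 10e-9)
r_wider = wire_resistance(2.2e-8, geom["thickness"], 2 * geom["width"], 10e-9)
```

In this model, widening a wire lowers `r_wider` relative to `r_base` and raises `c_wider` through the vertical term, while extra spacing lowers `c_spaced` through the sidewall term, which matches the two qualitative bullets on the slide.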
5 Transmission Lines
- Test chips have demonstrated the potential of transmission lines: three-fourths the latency of an equally wide RC wire (at 0.18 µm)
- High associated costs: transmitter/receiver circuits; large width, thickness, and vertical and horizontal spacing; power and ground reference planes; and shielding lines
6 Latency-Bandwidth Trade-Off
- Bottom line: low-latency wires are possible, but the area and associated costs are high
- High area cost means few wires can be accommodated, so they are useful only for low-bandwidth communication
- Problem: microarchitectural applications of 3 sets of wires
  - B-Wires: high-bandwidth, high-latency, 64-wide
  - L-Wires: low-bandwidth, low-latency, 8-wide
  - PW-Wires: high-bandwidth, high-latency, low-power, 128-wide
7 Interconnect Design
[Figure labels: Delay Optimized, Bandwidth Optimized, Power Optimized, Power and B/W Optimized]
8 Hybrid Interconnects
- Each link on the network consists of a combination of B-, L-, and PW-Wires
[Figure labels: Instr Fetch, L1 D Cache]
9 L1 Cache Pipeline
[Figure labels: L1 D Cache, LSQ]
- Eff. Address Transfer: 10c
- Mem. Dep Resolution: 5c
- Cache Access: 5c
- Data return at 20c
10 Exploiting L-Wires
[Figure labels: L1 D Cache, LSQ]
- Eff. Address Transfer: 10c
- 8-bit Transfer: 5c
- Partial Mem. Dep Resolution: 3c
- Cache Access: 5c
- Data return at 14c
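The cycle counts on the two pipeline slides can be checked with a small calculation. The sketch below assumes "c" means cycles and that the baseline stages run back to back, which is consistent with 10 + 5 + 5 = 20. The slide does not spell out how the L-Wire stages overlap, so the 14-cycle figure is used as reported rather than derived; all names are illustrative.

```python
# Baseline stage latencies in cycles, as listed on the L1 Cache Pipeline slide.
BASELINE_STAGES = [
    ("effective address transfer", 10),
    ("memory dependence resolution", 5),
    ("cache access", 5),
]

# Data-return time with L-Wires, taken directly from the slide (not derived here).
LWIRE_DATA_RETURN = 14


def serial_latency(stages):
    """Total latency when every stage waits for the previous one to finish."""
    return sum(cycles for _, cycles in stages)


def latency_reduction(before, after):
    """Fractional reduction in latency going from `before` to `after` cycles."""
    return (before - after) / before


baseline = serial_latency(BASELINE_STAGES)
saving = latency_reduction(baseline, LWIRE_DATA_RETURN)
print(f"baseline: {baseline}c, with L-Wires: {LWIRE_DATA_RETURN}c, saving: {saving:.0%}")
```

Under these assumptions, sending 8 address bits ahead on L-Wires cuts the data-return time from 20 to 14 cycles, a 30% reduction for this access path.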
11 Exploiting Choice
- Narrow bit-width operands (integers < 256) and narrow control signals (branch mispredicts) can also use L-Wires
- High-bandwidth, power-efficient PW-Wires can transmit non-critical or bursty traffic
- L-Wires can improve performance by 10%
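One way to picture the choice among the three wire sets is as a per-message steering rule. The sketch below is a hypothetical policy written for illustration, not the mechanism used in the work: the `Message` fields, the `choose_wires` function, and the order of the checks are all assumptions. Only the wire widths and the "integers < 256" threshold come from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WireClass(Enum):
    B = "B-Wires"    # high-bandwidth, high-latency
    L = "L-Wires"    # low-bandwidth, low-latency
    PW = "PW-Wires"  # high-bandwidth, high-latency, low-power


# Wires per link for each class, as listed on the trade-off slide.
LINK_WIDTH = {WireClass.B: 64, WireClass.L: 8, WireClass.PW: 128}


@dataclass
class Message:
    """Hypothetical description of one transfer on the network."""
    value: Optional[int] = None   # operand value, if the message carries one
    is_control: bool = False      # e.g. a branch-mispredict signal
    is_critical: bool = True      # False for non-critical or bursty traffic


def choose_wires(msg: Message) -> WireClass:
    """Pick a wire class for a message (illustrative ordering of the checks)."""
    if msg.is_control:
        return WireClass.L            # narrow control signals
    if not msg.is_critical:
        return WireClass.PW           # save power when latency does not matter
    if msg.value is not None and 0 <= msg.value < 256:
        return WireClass.L            # narrow operand fits in 8 bits
    return WireClass.B                # default: full-width, latency-sensitive data


print(choose_wires(Message(value=42)).value)
print(choose_wires(Message(value=1000, is_critical=False)).value)
```

Checking criticality before operand width is a design choice made here so that non-critical narrow values do not compete for the scarce 8-wide L-Wires; a real design could order these checks differently.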
12 Results Summary
Configuration | Metal Area | IPC | Relative Dyn Energy | Relative Leakage Energy | Relative Energy-Delay | Comments
64 B Hi-perf 128 PW PW, 8 L Low EDP 128 B B, 128 PW PW, 8 L Low EDP 64 B, 8 L Hi-Perf 192 B B, 8 L Hi-Perf 64 B, 128 PW, 8L Low EDP