Billion Transistor Architectures

Presentation transcript:

1 Billion Transistor Architectures
- Interconnect design for low power – Naveen & Karthik
- Computational unit design for low temperature – Karthik
- Increased reliability and power-efficiency – Niti
- Hardware for raytracing and OS co-processing – led by Pete and Erik/Dave
- Interconnect design for high performance

2 Partitioned Architectures
[Figure: partitioned architecture; labels: Instr Fetch, L1 D Cache]

3 Interconnect Design
- Delay Optimized
- Bandwidth Optimized
- Power Optimized
- Power and B/W Optimized

4 Tuning Wire Properties
- Wire delay ∝ RC (Ho, Mai, Horowitz, Proc. of IEEE, 2001)
- $R_{wire} = \dfrac{\rho}{(thickness - barrier)\,(width - 2\,barrier)}$
- $C_{wire} = 2K\,\epsilon_{horiz}\,\dfrac{thickness}{spacing} + 2\,\epsilon_{vert}\,\dfrac{width}{layerspacing} + fringe(\epsilon_{horiz}, \epsilon_{vert})$
- Wide wires → reduced resistance, slightly higher capacitance
- Wide spacing → reduced capacitance
- Example (Banerjee et al., IEEE Trans. on Electron Devices, Feb 2004): a factor of 8 increase in width and spacing → $R_L = [\ldots]\,R_B$, $C_L = 0.74\,C_B$, $Delay_L = 0.43\,Delay_B$
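The resistance and capacitance expressions on this slide can be explored numerically. The sketch below is only an illustration: every geometry and material value is a made-up placeholder, the fringe term is treated as a constant, and the permittivities are assumed to be absolute values in F/m. It is not expected to reproduce the Banerjee et al. figures; it only shows the direction in which R and C move when width or spacing changes.

```python
from dataclasses import dataclass, replace

EPS0 = 8.854e-12  # vacuum permittivity in F/m


@dataclass(frozen=True)
class WireGeometry:
    # All defaults are hypothetical placeholders, not data from the slides.
    width: float = 0.10e-6          # metres
    spacing: float = 0.10e-6        # metres
    thickness: float = 0.30e-6      # metres
    barrier: float = 0.01e-6        # metres
    layer_spacing: float = 0.30e-6  # metres
    rho: float = 2.2e-8             # resistivity in ohm*metre
    k_miller: float = 1.5           # the factor K in the capacitance formula
    eps_horiz: float = 2.7 * EPS0   # assumed absolute permittivity
    eps_vert: float = 2.7 * EPS0    # assumed absolute permittivity
    fringe: float = 4.0e-11         # F/m, assumed constant for simplicity


def resistance_per_m(g: WireGeometry) -> float:
    """R_wire = rho / ((thickness - barrier) * (width - 2 * barrier))."""
    return g.rho / ((g.thickness - g.barrier) * (g.width - 2 * g.barrier))


def capacitance_per_m(g: WireGeometry) -> float:
    """C_wire = 2K*eps_h*thickness/spacing + 2*eps_v*width/layerspacing + fringe."""
    sidewall = 2 * g.k_miller * g.eps_horiz * g.thickness / g.spacing
    plate = 2 * g.eps_vert * g.width / g.layer_spacing
    return sidewall + plate + g.fringe


base = WireGeometry()
wide = replace(base, width=8 * base.width)
spaced = replace(base, spacing=8 * base.spacing)
wide_and_spaced = replace(base, width=8 * base.width, spacing=8 * base.spacing)

for name, g in [("wide", wide), ("spaced", spaced), ("wide+spaced", wide_and_spaced)]:
    r_ratio = resistance_per_m(g) / resistance_per_m(base)
    c_ratio = capacitance_per_m(g) / capacitance_per_m(base)
    print(f"{name:>12}: R x{r_ratio:.2f}, C x{c_ratio:.2f}, RC x{r_ratio * c_ratio:.2f}")
```

Widening the wire lowers R and raises the plate term of C, while widening the spacing lowers only the sidewall term of C; the exact ratios depend entirely on the placeholder values chosen above.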

5 Transmission Lines
- Test chips have demonstrated the potential of transmission lines: 3/4 the latency of an equally wide RC wire (at 0.18 µm)
- High associated costs: transmitter/receiver circuits; high width, thickness, and vertical and horizontal spacing; power and ground reference planes; and shielding lines

6 Latency-Bandwidth Trade-Off
- Bottom line: low-latency wires are possible, but the area and associated costs are high
- High area cost → few wires can be accommodated → useful only for low-bandwidth communication
- Problem: microarchitectural applications of 3 sets of wires
  - B-Wires: high-bandwidth, high-latency, 64-wide
  - L-Wires: low-bandwidth, low-latency, 8-wide
  - PW-Wires: high-bandwidth, high-latency, low-power, 128-wide
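A small sketch can make the bandwidth side of this trade-off concrete. It records the three wire sets with the widths given on the slide and counts how many back-to-back transfers a payload needs on each. The `WireClass` type and `transfers_needed` helper are hypothetical names introduced here for illustration, and the model ignores per-transfer latency entirely.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WireClass:
    name: str
    width_bits: int     # wires per link, as listed on the slide
    low_latency: bool
    low_power: bool


# Widths and properties taken from the slide; the structure is illustrative.
B_WIRES = WireClass("B-Wires", 64, low_latency=False, low_power=False)
L_WIRES = WireClass("L-Wires", 8, low_latency=True, low_power=False)
PW_WIRES = WireClass("PW-Wires", 128, low_latency=False, low_power=True)


def transfers_needed(payload_bits: int, wires: WireClass) -> int:
    """Number of transfers required to move payload_bits over the given wire set."""
    return math.ceil(payload_bits / wires.width_bits)


for wires in (B_WIRES, L_WIRES, PW_WIRES):
    print(f"{wires.name}: 64-bit payload needs {transfers_needed(64, wires)} transfer(s)")
```

A 64-bit value fits in one transfer on B-Wires or PW-Wires but needs eight on the 8-wide L-Wires, which is why the low-latency wires suit only narrow messages.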

7 Interconnect Design
- Delay Optimized
- Bandwidth Optimized
- Power Optimized
- Power and B/W Optimized

8 Hybrid Interconnects
- Each link on the network consists of a combination of B-, L-, and PW-Wires
[Figure: partitioned architecture with hybrid links; labels: Instr Fetch, L1 D Cache]

9 L1 Cache Pipeline
[Figure: pipeline between the LSQ and the L1 D Cache]
- Eff. address transfer: 10c
- Mem. dep. resolution: 5c
- Cache access: 5c
- Data return at 20c

10 Exploiting L-Wires
[Figure: pipeline between the LSQ and the L1 D Cache, with an additional L-Wire transfer]
- Eff. address transfer: 10c
- 8-bit transfer: 5c
- Partial mem. dep. resolution: 3c
- Cache access: 5c
- Data return at 14c
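The cycle counts on slides 9 and 10 can be checked with simple arithmetic. The sketch below sums the baseline stages serially, which matches the 20-cycle data return on slide 9, and compares that with the 14-cycle figure reported on slide 10. It does not model how the stages overlap when L-Wires are used; the 14-cycle value is taken directly from the slide, and the variable names are illustrative.

```python
# Baseline L1 pipeline stages from slide 9, in cycles.
baseline_stages = {
    "eff_address_transfer": 10,
    "mem_dep_resolution": 5,
    "cache_access": 5,
}

# Stages run back to back in the baseline, so latencies add up.
baseline_latency = sum(baseline_stages.values())

# Reported on slide 10; the overlap that produces it is not modelled here.
l_wire_latency = 14

cycles_saved = baseline_latency - l_wire_latency
relative_reduction = cycles_saved / baseline_latency

print(f"baseline: {baseline_latency}c, with L-Wires: {l_wire_latency}c")
print(f"saved: {cycles_saved}c ({relative_reduction:.0%} of baseline load latency)")
```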

11 Exploiting Choice
- Narrow bit-width operands (integers < 256) and narrow control signals (branch mispredicts) can also use L-Wires
- High-bandwidth, power-efficient PW-Wires can transmit non-critical or bursty traffic
- L-Wires can improve performance by 10%
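One way to picture the choices on this slide is as a steering function that picks a wire set per message. The sketch below is a hypothetical heuristic, not the authors' mechanism: the `steer` function, the message kinds, the `critical` flag, and the priority order of the rules are all assumptions made for illustration. Only the three rules themselves (narrow operands and mispredict signals on L-Wires, non-critical traffic on PW-Wires) come from the slide.

```python
from enum import Enum
from typing import Optional


class Wires(Enum):
    B = "B-Wires"
    L = "L-Wires"
    PW = "PW-Wires"


def steer(kind: str, value: Optional[int] = None, critical: bool = True) -> Wires:
    """Pick a wire set for one message (illustrative policy only)."""
    # Narrow control signals such as branch mispredicts fit on the 8-wide L-Wires.
    if kind == "branch_mispredict":
        return Wires.L
    # Narrow operands: integers below 256 fit in 8 bits.
    if kind == "operand" and value is not None and 0 <= value < 256:
        return Wires.L
    # Non-critical or bursty traffic tolerates the power-efficient PW-Wires.
    if not critical:
        return Wires.PW
    # Everything else uses the default B-Wires.
    return Wires.B


print(steer("operand", value=42))
print(steer("operand", value=100_000))
print(steer("store_data", critical=False))
```

How criticality is determined is left open here; the sketch simply takes it as an input.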

12 Results Summary
Table columns: Configuration | Metal Area | IPC | Relative Dyn Energy | Relative Leakage Energy | Relative Energy-Delay | Comments
Table rows: 64 B Hi-perf 128 PW PW, 8 L Low EDP 128 B B, 128 PW PW, 8 L Low EDP 64 B, 8 L Hi-Perf 192 B B, 8 L Hi-Perf 64 B, 128 PW, 8L Low EDP
