University of Rostock
Institute of Applied Microelectronics and Computer Engineering

Monitoring and Control of Temperature in Networks-on-Chip
Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann
MEMICS 2010, Mikulov, Czech Republic, October 22-24
Tim Wegner - 23 October 2010 MEMICS 2010, Mikulov, Czech Republic, October Monitoring and Control of Temperature in NoCs

Outline
1. Introduction
2. Networks-on-Chip (NoCs)
3. Impact of Temperature on Reliability
4. Monitoring & Control of Temperature in NoCs
5. Summary
1. Introduction
Impacts of technological development
- Increasing integration density → rising complexity, shrinking device sizes
- NoCs able to deal with arising requirements (e.g. for communication)
- But: Reliability becomes a dominant factor for chip design
- Goal: Increase reliability in NoC-based systems
2. Networks-on-Chip
Properties
- Infrastructure for on-chip interconnection
- Point-to-point links replace long global busses
- Parallel packet-based communication
- Separation of communication & computation
- Globally asynchronous locally synchronous (GALS)
- Modularity of IP cores (not part of actual NoC) → reusability, high abstraction level
NoCs are able to satisfy requirements of modern VLSI systems
3. Impact of Temperature on Reliability
Impacts of technological progress
- Increasing integration densities, progress of nanotechnology
- Growing number of transistors per chip = raised probability of failure
- Decreasing structural size of ICs = higher susceptibility to environmental influences & deterioration
Intel 8086 (1978): ≈879 transistors/mm²
Intel Bloomfield (2008): ≈2.78 million transistors/mm²
3. Impact of Temperature on Reliability
Why is thermal awareness important?
- Particular physical effects (e.g. Time Dependent Dielectric Breakdown (TDDB), Electromigration (EM)) contribute to deterioration
- Abetted by high temperatures
- Correlation between temperature & failure mechanisms established by Arrhenius model
- Exponential decrease of IC lifetime with temperature
Growing influence of on-chip temperature distribution on lifetime, operability, performance etc.
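A rough numerical sketch of the exponential lifetime dependence, assuming the standard Arrhenius acceleration factor AF = exp[(E_a / k) * (1/T_ref - 1/T_hot)] with temperatures in kelvin. The activation energy of 0.7 eV and the temperatures are illustrative assumptions, not values from the talk, and the function names are hypothetical.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def acceleration_factor(t_ref_c, t_hot_c, activation_energy_ev):
    """How many times faster a thermally activated failure mechanism
    proceeds at t_hot_c than at t_ref_c (both in degrees Celsius)."""
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


def relative_lifetime(t_ref_c, t_hot_c, activation_energy_ev):
    """Expected lifetime at t_hot_c as a fraction of the lifetime at t_ref_c."""
    return 1.0 / acceleration_factor(t_ref_c, t_hot_c, activation_energy_ev)


if __name__ == "__main__":
    ASSUMED_EA_EV = 0.7  # illustrative activation energy, not from the talk
    for hot in (60, 80, 100):
        fraction = relative_lifetime(60, hot, ASSUMED_EA_EV)
        print(f"{hot} degC: lifetime is {fraction:.2f}x the lifetime at 60 degC")
```

Under these assumptions, a 20 K rise already shortens the expected lifetime to roughly a quarter, which is why the temperature distribution on the chip matters for reliability.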
4. Monitoring and Control of Temperature for NoCs
Objective:
- Mitigate effects contributing to deterioration & delay occurrence of failures
- Control of on-chip temperature distribution
Requirements:
- Effective mechanisms to monitor & control on-chip temperature
- Integration into existing NoC
- Preservation of modularity & reusability
- Minimum costs (area, frequency)
- Maximum performance of monitoring and control
- Minimum impact on system performance
4.1 Mechanisms for monitoring
Concept: attach physical monitoring probes to every IP core
Event-driven:
- Temperature variation ∆T
- Continuous checking of T_IPC
- |T_IPC,old - T_IPC,new| ≥ ∆T? → Report T_IPC,new
- Area: 66 LUT/FF pairs
- Frequency: 227 MHz
Time-driven:
- Period of time ∆t
- Report T_IPC,new every ∆t
- Area: 80 LUT/FF pairs
- Frequency: 338 MHz
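The two reporting policies can be sketched in software as follows. This is a behavioural model only, not the hardware probes from the talk. It assumes that T_IPC,old is the last reported value, that the first sample is always reported, and that ∆t is counted in samples. All class and function names are hypothetical.

```python
class EventDrivenProbe:
    """Reports a temperature only if it differs from the last reported
    value by at least delta_temp (assumption: first sample is reported)."""

    def __init__(self, delta_temp):
        self.delta_temp = delta_temp
        self.last_reported = None

    def sample(self, temp):
        if (self.last_reported is None
                or abs(self.last_reported - temp) >= self.delta_temp):
            self.last_reported = temp
            return temp
        return None  # change too small, no packet sent


class TimeDrivenProbe:
    """Reports the current temperature every `period` samples,
    regardless of how much it changed."""

    def __init__(self, period):
        self.period = period
        self.ticks = 0

    def sample(self, temp):
        self.ticks += 1
        if self.ticks >= self.period:
            self.ticks = 0
            return temp
        return None


def collect_reports(probe, temperatures):
    """Feed a temperature trace to a probe and return the reported values."""
    reports = []
    for temp in temperatures:
        reported = probe.sample(temp)
        if reported is not None:
            reports.append(reported)
    return reports


if __name__ == "__main__":
    trace = [50.0, 50.5, 51.0, 52.5, 52.6, 55.0, 55.1, 55.2]
    print(collect_reports(EventDrivenProbe(delta_temp=2.0), trace))
    print(collect_reports(TimeDrivenProbe(period=2), trace))
```

On a slowly changing trace the event-driven probe stays silent while the time-driven probe keeps sending packets, which illustrates the redundant traffic that the time-driven approach can cause.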
4.2 Mechanisms for control
Central Control Unit (CCU):
- Reception & interpretation of probe packets
- Instructions for Dynamic Frequency Scaling to probes (if necessary)
- Area: 507 LUT/FF pairs
- Frequency: 165 MHz
!!! Not the smartest approach, but suffices to test functionality !!!
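A minimal behavioural sketch of such a control loop, assuming a simple two-threshold policy. The packet fields, the thresholds and the frequency levels are hypothetical and not taken from the talk, which only states that the CCU interprets probe packets and sends Dynamic Frequency Scaling instructions when necessary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbePacket:
    core_id: int
    temperature_c: float


@dataclass(frozen=True)
class DfsInstruction:
    core_id: int
    frequency_mhz: int


class CentralControlUnit:
    """Keeps one frequency level per IP core and steps it down when a
    core reports a high temperature, or back up when it has cooled."""

    def __init__(self, levels_mhz, t_high_c, t_low_c):
        self.levels = sorted(levels_mhz)
        self.t_high_c = t_high_c
        self.t_low_c = t_low_c
        self.level_index = {}  # core_id -> index into self.levels

    def receive(self, packet: ProbePacket) -> Optional[DfsInstruction]:
        top = len(self.levels) - 1
        # Assumption: every core starts at the highest frequency level.
        current = self.level_index.get(packet.core_id, top)
        if packet.temperature_c >= self.t_high_c:
            target = max(current - 1, 0)
        elif packet.temperature_c <= self.t_low_c:
            target = min(current + 1, top)
        else:
            target = current
        if target == current:
            return None  # no instruction necessary
        self.level_index[packet.core_id] = target
        return DfsInstruction(packet.core_id, self.levels[target])


if __name__ == "__main__":
    ccu = CentralControlUnit(levels_mhz=[50, 100, 200], t_high_c=85.0, t_low_c=60.0)
    for temp in (90.0, 70.0, 55.0):
        print(temp, ccu.receive(ProbePacket(core_id=0, temperature_c=temp)))
```

The gap between the two thresholds acts as a hysteresis band, so a core hovering around one temperature does not trigger a stream of alternating instructions.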
4.3 Integration of monitoring
3 approaches with different impact on performance & costs
Into IP core:
- Area penalty: /
- Freq. penalty: /
Router port of IP core:
- Area penalty: 7.3%
- Freq. penalty: / (but Mux/Demux)
Extra router port:
- Area penalty: 30.5%
- Freq. penalty: 8.2%
4.4 Impact on system performance
4.5 Performance of monitoring & control
5. Summary
- Implementation of 2 approaches for monitoring on-chip temperature + 3 methods for integration into NoC
- Investigation of:
  - Costs (area, frequency)
  - Impact on system performance
  - Performance of monitoring & control
Conclusion
- Event-driven approach preferable (situational monitoring, better performance, no redundant traffic, lower area costs)
- Integration into NoC using router port of IP core: best trade-off between costs & preservation of modularity/non-intrusiveness
Thanks for your attention! Any questions?
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
Arrhenius Model
- Establishes relationship between temperature and failure mechanisms
- Describes dependence of chemical reactions on temperature changes
- Assumption: all other parameters constant
Lifetime of ICs decreases exponentially with temperature
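For reference, the model is commonly written in the following form. The symbols are assumptions introduced here, since the slide gives no formula: t_f is the time to failure, A a mechanism-specific constant, E_a the activation energy, k_B the Boltzmann constant and T the absolute temperature.

```latex
% Arrhenius lifetime model (all parameters other than T held constant)
t_f(T) = A \, \exp\!\left(\frac{E_a}{k_B T}\right)

% Acceleration factor between a reference temperature T_1 and a higher temperature T_2
AF = \frac{t_f(T_1)}{t_f(T_2)}
   = \exp\!\left[\frac{E_a}{k_B}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right]
```

Since E_a > 0, the exponent shrinks as T grows, so t_f falls as the temperature rises and AF > 1 whenever T_2 > T_1.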
Time Dependent Dielectric Breakdown (TDDB)
Inoperability of transistor through gate oxide breakdown (long-term)
Formation of charge traps → Current flow → !!! HEAT !!! → More charge traps → Conducting path through gate oxide
Electromigration (EM)
Transport of material in conductors (i.e. wires)
Cause: ion movement induced by current flow (ions' mobility increases with temperature)
Effects:
- Hillocks → short circuits
- Voids → interruption of current paths
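The temperature dependence of electromigration is often described with Black's equation, an empirical model that the slide does not mention and that is given here only as background. The symbols are assumptions: MTTF is the mean time to failure of a wire, A a constant depending on material and geometry, J the current density, n an empirically fitted exponent, E_a the activation energy, k_B the Boltzmann constant and T the absolute temperature.

```latex
% Black's equation for electromigration lifetime
\mathrm{MTTF} = \frac{A}{J^{\,n}} \, \exp\!\left(\frac{E_a}{k_B T}\right)
```

The exponential term has the same Arrhenius form as the general lifetime model, so a higher wire temperature shortens the expected time until hillocks or voids form.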
Intel Processors
Intel 8086: Year: 1978, 33 mm², ≈879 transistors/mm²
Intel Bloomfield: Year: 2008, 263 mm², ≈2.78 million transistors/mm²
Impact on system performance
Performance of monitoring & control
Synthesis results for monitoring & control
Component: Event-driven probe | Time-driven probe | Central Control Unit
Integration method: Into IP core | Using IP core port | Extra port
Rows: Frequency [MHz], Area [LUT/FF pairs]
Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz