Concurrent Autonomous Self-Test for Uncore Components in SoCs

1 Concurrent Autonomous Self-Test for Uncore Components in SoCs
Yanjing Li, Stanford University; Onur Mutlu, Carnegie Mellon University; Donald S. Gardner, Intel Corporation; Subhasish Mitra, Stanford University

2 Overcoming CMOS Reliability Challenges
[Figure: failure rate vs. time over the product lifetime, marking early-life failures and circuit aging]
Early-life failures: burn-in is difficult. Circuit aging: guardbands are expensive. Both call for on-line self-test and diagnostics.
Soft errors: Built-In Soft Error Resilience (BISER).

3 Uncore Components Significant in SoCs
[Images: Cisco Network Processing Engine, NVIDIA Tegra, and IBM Power 7, with their uncore components marked; © techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org]
Uncore examples: controllers for caches and DRAM, crossbars, I/O interfaces.

4 Robust Uncore Essential
Area breakdown of the 8-core, 64-thread OpenSPARC T2 SoC (© opensparc.net): uncore 12%, processor cores 12%, memories 76%.
Memories are protected by ECC, memory BIST, and repair; processor cores by CASP [Li DATE 08, ICCAD 09]. The uncore, which occupies as much area as all the cores combined, needs a new on-line self-test.

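5 Challenge 1: High Test Coverage
Comparison of CASP, Logic BIST, and roving emulation:
Coverage: CASP high; Logic BIST uncertain (?); roving emulation depends.
Cost: CASP low; the alternatives high.
Design effort: CASP moderate; the alternatives high.
CASP (Concurrent, Autonomous, Stored Patterns) stores high-coverage test patterns in off-chip FLASH and provides system-level on-line test access. This is practical because FLASH is cheap and test compression is pervasive.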

6 Challenge 2: Power, Performance, Area Costs
Stall-and-test is inadequate, as results on a 4-core Intel® Core™ i7 system show: while the DRAM controller undergoes on-line self-test, requests from multiple cores stall behind it, causing unresponsiveness or a system hang.
[Figure: cores connected through caches and interconnects to the DRAM controller; © intel.com]

7 Naïve Approaches Inadequate for Uncore
Stall-and-test causes unresponsiveness or a complete hang.
A spare unit for each uncore type costs 12% area overhead (OpenSPARC T2 design).
Uncore CASP needs both small area cost and small performance impact, so new techniques are required.

8 New Uncore On-line Self-Test Principles
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
Applied to the OpenSPARC T2 SoC (© opensparc.net): < 1% area impact, < 3% performance impact.

9 I. Resource Reallocation and Sharing (RRS)
SoCs contain components with "similar" functionality. Temporarily reallocating work to, and sharing, such components costs only a small performance hit and needs no replicated hardware.
Example on OpenSPARC T2 (© opensparc.net; cores, crossbar blocks, L2 banks, CASP controller), when an L2 bank undergoes on-line self-test:
1. Stall and drain requests.
2. Transfer dirty lines.
3. Invalidate.
4. Reroute requests through the crossbar blocks to a helper bank.
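Step 4 (rerouting) can be sketched as a small remapping table in the crossbar. This is a minimal Python model, not the OpenSPARC T2 logic: the class `CrossbarRouter`, its methods, and the one-helper-per-bank pairing are hypothetical, and the model assumes steps 1 to 3 have already completed before `start_test` is called.

```python
from __future__ import annotations


class CrossbarRouter:
    """Hypothetical model of crossbar destination remapping used by RRS.

    While an L2 bank is under test, packets addressed to it are steered to a
    helper bank. Bank numbering and helper choice are illustrative only.
    """

    def __init__(self, num_banks: int) -> None:
        self.num_banks = num_banks
        self.redirect: dict[int, int] = {}  # bank under test -> helper bank

    def _check(self, bank: int) -> None:
        if not 0 <= bank < self.num_banks:
            raise ValueError(f"no such bank: {bank}")

    def start_test(self, bank: int, helper: int) -> None:
        # Called only after stall/drain, dirty-line transfer and invalidation.
        self._check(bank)
        self._check(helper)
        if bank == helper:
            raise ValueError("a bank cannot be its own helper")
        if bank in self.redirect:
            raise ValueError("bank is already under test")
        if bank in self.redirect.values():
            raise ValueError("bank is currently serving as a helper")
        if helper in self.redirect:
            raise ValueError("helper bank is itself under test")
        self.redirect[bank] = helper

    def end_test(self, bank: int) -> None:
        self.redirect.pop(bank, None)  # bank resumes serving its own addresses

    def route(self, dest_bank: int) -> int:
        """Return the bank that actually services a packet addressed to dest_bank."""
        self._check(dest_bank)
        return self.redirect.get(dest_bank, dest_bank)
```

For example, after `router = CrossbarRouter(8)` and `router.start_test(0, 1)`, packets for bank 0 and bank 1 both reach bank 1, while other banks are unaffected; `router.end_test(0)` restores the original mapping.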

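10 II. No-Performance-Impact Testing
Implication relations among SoC components: some components are necessarily idle while another component is being tested, so they can be tested at the same time at no extra performance cost.
Example on OpenSPARC T2 (© opensparc.net): while RRS takes an L2 bank offline for test under the CASP controller, the crossbar blocks serving that bank are idle and are tested concurrently.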
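One way to exploit implication relations when scheduling tests, sketched in Python under stated assumptions: the function `build_test_sessions`, the component names, and the `IDLE_WHEN_TESTED` table are hypothetical, and the table simply encodes "these components are idle whenever that one is under test".

```python
from __future__ import annotations


def build_test_sessions(
    components: list[str], idle_when_tested: dict[str, set[str]]
) -> list[list[str]]:
    """Group components into test sessions using implication relations.

    idle_when_tested[c] lists components that are necessarily idle while c is
    under test; they join c's session, so testing them costs no extra
    performance. List primary components (e.g., L2 banks) first.
    """
    sessions: list[list[str]] = []
    covered: set[str] = set()
    for comp in components:
        if comp in covered:
            continue  # already tested for free alongside another component
        riders = sorted(idle_when_tested.get(comp, set()) - covered)
        session = [comp, *riders]
        covered.update(session)
        sessions.append(session)
    return sessions


# Hypothetical relations: while L2 bank k is reallocated to its helper,
# the crossbar multiplexer/arbitration block dedicated to bank k is idle.
IDLE_WHEN_TESTED = {
    "l2_bank0": {"ccx_arb0"},
    "l2_bank1": {"ccx_arb1"},
}
```

With the component list `["l2_bank0", "l2_bank1", "ccx_arb0", "ccx_arb1", "dram_ctrl0"]`, the sketch yields three sessions: each L2 bank paired with its crossbar block, and the DRAM controller on its own.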

11 III. Smart Backup
A component's operations have different requirements. A backup unit with absolutely minimal additional hardware handles only the performance-critical operations; the rest are stalled or handled slowly.
Example, OpenSPARC T2 I/O interface (DMA for network, DMA for disks, programmed I/O): some operations are supported in the smart backup, while others are stalled or handled slowly via programmed I/O.
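The smart-backup decision amounts to routing each operation by its requirements. A minimal sketch, assuming a hypothetical split in which network DMA is the only performance-critical operation; the names `Path`, `PERFORMANCE_CRITICAL`, and `dispatch` are illustrative, not taken from the OpenSPARC T2 design.

```python
from enum import Enum


class Path(Enum):
    ORIGINAL = "original unit"
    BACKUP = "smart backup unit"
    SLOW = "stalled, or handled slowly (e.g., via programmed I/O)"


# Hypothetical classification of I/O-interface operations. Which operations
# count as performance-critical is a design decision; this split is illustrative.
PERFORMANCE_CRITICAL = frozenset({"dma_network"})


def dispatch(op: str, unit_under_test: bool) -> Path:
    """Choose who services an operation while the original unit may be under test."""
    if not unit_under_test:
        return Path.ORIGINAL
    if op in PERFORMANCE_CRITICAL:
        return Path.BACKUP  # small backup keeps critical traffic flowing
    return Path.SLOW  # non-critical work tolerates delay, so no extra hardware
```

Keeping the critical set small is what keeps the backup cheap: every operation moved to the slow path is logic the backup does not have to replicate.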

12 Application Performance Impact
Uncore CASP was emulated on a 4-core Intel® Core™ i7 system (© intel.com), measuring execution-time impact.
Memory-centric workloads (PARSEC benchmarks): 1.5% performance impact, with no visible unresponsiveness.
I/O-centric workloads (disk access): 3% impact.

13 Area and Power Impact
With the uncore on-line self-test principles applied (© opensparc.net), the added hardware is a CASP controller (< 0.01% area) and an 8 KB on-chip buffer, backed by a 200 MB off-chip FLASH.
Minimal area impact: < 1%. Minimal power impact: < 1%.

14 Test Results for Uncore Components
200 MB off-chip FLASH with 10X test compression; 7 ms to 300 ms test time per component.
Fault model | Total pattern count | Test coverage
Stuck-at | 5,577 | 99.2% – 99.9%
Transition | 11,049 | 92.8% – 97.8%
Inexpensive FLASH enables thorough on-line self-test.

15 Uncore CASP vs. Existing Techniques
Criterion | Logic BIST | Concurrent BIST [Saluja IEEE TCAD 88] | Uncore CASP [this work]
Coverage | High, with high costs | Depends | High
Area cost | High | High costs possible | Low
Design complexity: moderate. Performance impact: low, with our uncore principles.

16 CASP Applicable for Other SoCs
Cisco Network Processing Engine, NVIDIA Tegra, IBM Power 7 (© ciscosistemas.org, © techvishal.wordpress.com, © news.cnet.com): I. RRS; II. No-performance-impact testing; III. Smart backup; IV. Core CASP.

17 Conclusions
CASP enables adaptive on-line self-test and diagnostics.
Three new principles for uncore CASP: I. Resource reallocation and sharing (RRS); II. No-performance-impact testing; III. Smart backup.
Effective and practical: high test coverage with < 1% power, < 3% performance, and < 1% area impact.

18 Backup Slides

19 CASP on Actual Intel® Core™ i7 System
Intel Research collaboration: quad-core Intel® Core™ i7 (3.2 GHz) with a thermoelectric temperature controller and a debug tool.
A unique real-life experiment, used to develop adaptive self-diagnostics.
[Photo labels: debug tool adapter, temperature controller]

20 CASP Flow
1. Select an uncore or core component.
2. Isolate it from the rest of the SoC with the CASP controller (made practical by the proliferation of multi-core SoCs).
3. Apply and analyze high-quality test patterns through the scan chains (test compression, at-speed test…), fetched from inexpensive off-chip FLASH (non-volatile storage technology).
4. Resume operation.
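The four-step flow can be modeled in a few lines of Python. This is a toy sketch, not the CASP controller: a component's scan test is stood in for by a function from stimulus to response, stored patterns are (stimulus, expected response) pairs, and `Component`, `casp_test`, and `casp_round` are hypothetical names. It also assumes, as a design choice, that a failing component stays isolated instead of resuming.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    name: str
    # Toy stand-in for scan test: apply a stimulus, return the captured response.
    respond: Callable[[int], int]
    online: bool = True


def casp_test(comp: Component, patterns: list[tuple[int, int]]) -> bool:
    comp.online = False  # step 2: isolate (RRS / smart backup cover its work)
    # Step 3: apply stored (stimulus, expected response) pairs and compare.
    passed = all(comp.respond(stim) == expected for stim, expected in patterns)
    comp.online = passed  # step 4: resume only if it passed; a failing unit stays isolated
    return passed


def casp_round(
    components: list[Component], store: dict[str, list[tuple[int, int]]]
) -> list[str]:
    """Step 1: select each component in turn; return the names of failing ones."""
    return [c.name for c in components if not casp_test(c, store[c.name])]
```

As a usage example, with patterns `[(0x00, 0xFF), (0x0F, 0xF0)]`, a fault-free inverter `lambda x: x ^ 0xFF` passes, while a copy whose output bit 0 is stuck at 1 fails the second pattern and is left offline.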

21 RRS Example: L2 Cache Banks
Bank 0 is under test and bank 1 is its helper; both connect to the crossbar and the DRAM controller, and each has a controller, data, tags, etc.
1. Stall the cache controller.
2. Drain outstanding requests.
3a. Invalidate clean blocks; invalidate the directory; invalidate L1.
3b. Transfer necessary state (dirty blocks); write back to main memory if necessary.
4. Route packets with destination bank 0 or bank 1 to bank 1.
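Steps 3a and 3b decide what happens to each cached block. A minimal Python sketch, assuming a toy cache in which each address maps to exactly one bank and the helper has a fixed free capacity; `Line`, `Bank`, and `evacuate` are hypothetical names, and directory and L1 invalidation are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Line:
    data: int
    dirty: bool


@dataclass
class Bank:
    capacity: int
    lines: dict[int, Line] = field(default_factory=dict)  # address -> line


def evacuate(under_test: Bank, helper: Bank, memory: dict[int, int]) -> None:
    """Steps 3a/3b for a bank that is already stalled and drained (steps 1-2).

    Assumes every address maps to exactly one bank, so the helper never holds
    another copy. Directory and L1 invalidation are not modeled.
    """
    for addr, line in under_test.lines.items():
        if not line.dirty:
            continue  # 3a: clean block, memory is up to date, just drop it
        if len(helper.lines) < helper.capacity:
            helper.lines[addr] = Line(line.data, dirty=True)  # 3b: transfer dirty state
        else:
            memory[addr] = line.data  # 3b: helper full, write back to main memory
    under_test.lines.clear()  # the bank now holds no live state and can be tested
```

Transferring dirty blocks to the helper avoids a burst of DRAM writes; the write-back path only covers what the helper cannot absorb.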

22 No-Performance-Impact Testing Example: CCX (Crossbar)
OpenSPARC T2 (8 cores, 64 threads): each L2 bank (bank 0 … bank 7) has its own CCX multiplexers and arbitration logic (0 … 7), placed on separate scan chains. While an L2 bank's packets are reallocated to its helper, its CCX logic is idle, so both are tested at the same time.

23 Smart Backup Example: Non-Cacheable Unit
The original unit (under test) contains PIO, interrupt processing, an interrupt status table, a boot ROM interface, and a configuration status register interface; the backup contains only PIO and interrupt processing.
1. Stall.
2. Drain outstanding requests.
3. Turn on / reset.
4. Transfer state.
5. Select outputs from the backup through a MUX.
Goal: minimize area costs at an acceptable performance impact.
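The five-step switchover can be sketched as a toy Python model. Only the order of the steps comes from the slide; the class `NcuWithSmartBackup`, the function labels, and the choice to defer functions the backup lacks (boot ROM and configuration accesses, for example) until the test finishes are assumptions.

```python
from __future__ import annotations


class NcuWithSmartBackup:
    """Toy model of the smart-backup switchover for a non-cacheable unit."""

    # Hypothetical labels for the subset replicated in the backup (PIO, interrupts).
    BACKUP_FUNCS = frozenset({"pio", "interrupt"})

    def __init__(self) -> None:
        self.accepting = True                    # False while stalled
        self.in_flight: list[str] = []           # outstanding requests
        self.completed: list[str] = []
        self.state: dict[str, int] = {}          # e.g., interrupt status entries
        self.backup_state: dict[str, int] = {}
        self.select_backup = False               # output MUX select

    def switch_to_backup(self) -> None:
        self.accepting = False                   # 1. stall new requests
        while self.in_flight:                    # 2. drain outstanding requests
            self.completed.append(self.in_flight.pop(0))
        self.backup_state = {}                   # 3. turn on / reset the backup
        self.backup_state.update(self.state)     # 4. transfer state
        self.select_backup = True                # 5. select outputs from the backup
        self.accepting = True

    def service(self, func: str) -> str:
        if not self.accepting:
            return "stalled"
        if not self.select_backup:
            return "original"
        # Assumption: functions missing from the backup wait until the test ends.
        return "backup" if func in self.BACKUP_FUNCS else "deferred"
```

Ordering matters here: the MUX is switched only after the backup holds the transferred state, so no request observes a half-initialized backup.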

24 Naïve Approaches Inadequate for Uncore
Simple stall-and-test, demonstrated on an actual 4-core Intel® Core™ i7 system: while the DRAM controller is under test, requests to DRAM stall, including those from the OS timer interrupt handler on each core (core 1 … core i).
Infrequent test: noticeable unresponsiveness. Frequent test: system hang.
Identical backup units instead cost 12% area overhead.

25 Performance Impact
Simulated latency overhead on the PARSEC benchmark suite.
Tool: GEMS simulator (modified for RRS). Workload: PARSEC, 4 threads on 4 cores; CASP runs for 1 s every 10 s.
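As a first-order illustration of how this test schedule bounds the overhead (an assumed model, not the GEMS methodology): suppose a fraction f of the baseline execution time T overlaps test windows, approximated by the 1-in-10 duty cycle, and that this portion runs (1 + s) times slower.

```latex
T' = (1-f)\,T + f\,(1+s)\,T = T\,(1 + f s)
\qquad\Longrightarrow\qquad
\frac{T' - T}{T} = f\,s, \qquad f \approx \tfrac{1}{10}
```

Under this model, keeping the overall overhead below 3% would require the slowdown during test windows to stay under 30%.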

26 III. Smart Backup
Same I/O interface example as slide 11, plus a network interface example (OpenSPARC T2): of the Ethernet port interface, layer 2 packet processing, and layers 3 and 4 acceleration, part of the functionality is supported in the smart backup and the rest falls back to OS orchestration.

