Concurrent Autonomous Self-Test for Uncore Components in SoCs
Yanjing Li, Stanford University; Onur Mutlu, Carnegie Mellon University; Donald S. Gardner, Intel Corporation; Subhasish Mitra, Stanford University

Presentation transcript:

Concurrent Autonomous Self-Test for Uncore Components in SoCs
Yanjing Li, Stanford University
Onur Mutlu, Carnegie Mellon University
Donald S. Gardner, Intel Corporation
Subhasish Mitra, Stanford University

Overcoming CMOS Reliability Challenges
[Figure: bathtub curve of failure rate over lifetime, spanning early-life failures and circuit aging]
 Circuit aging and early-life failures: burn-in difficult, guardbands expensive  on-line self-test and diagnostics
 Soft errors  Built-In Soft Error Resilience (BISER)

Uncore Components Significant in SoCs
[Die photos with uncore components highlighted: Cisco Network Processing Engine, NVIDIA Tegra, IBM Power 7]
Uncore examples:
 Controllers for cache & DRAM
 Crossbar
 I/O interfaces

Robust Uncore Essential
8-core, 64-thread OpenSPARC T2 SoC area breakdown: uncore 12%, processor cores 12%, memories 76%
 Processor cores: CASP [Li DATE 08, ICCAD 09]
 Memories: ECC, Memory BIST & repair
 Uncore: new on-line self-test (this work)

Challenge 1: High Test Coverage

                 CASP      Logic BIST   Roving Emulation
  Coverage       High      ?            Depends
  Cost           Low       High
  Design effort  Moderate  High

CASP: Concurrent, Autonomous, Stored Patterns
 High-coverage patterns stored in off-chip FLASH
 System-level on-line test access
 FLASH cheap, test compression pervasive

Challenge 2: Power, Performance, Area Costs
Stall-and-test inadequate: 4-core Intel® Core™ i7 system results
[Diagram: requests from multiple cores flow through caches and interconnects to a DRAM controller under on-line self-test]
 Multiple cores stall  unresponsiveness or system hang

Naïve Approaches Inadequate for Uncore
 Stall-and-test  unresponsiveness or complete hang
 Spare unit for each uncore type  12% area overhead (OpenSPARC T2 design)
Goal: small area cost and small performance impact  uncore CASP requires new techniques

New Uncore On-line Self-Test Principles
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
Results on OpenSPARC T2 SoC: < 1% area impact, < 3% performance impact

I. Resource Reallocation and Sharing (RRS)
Components with “similar” functionality in SoCs: temporary reallocation and sharing  small performance hit without replication
OpenSPARC T2 example (2 × 4 cores, crossbar blocks, L2 banks, CASP controller):
1. Stall and drain requests
2. Transfer dirty lines
3. Invalidate
4. Reroute

II. No-Performance-Impact Testing
Implication relations among SoC components: a component is tested when it is idle, e.g., during the test of another component
OpenSPARC T2 example: while one crossbar block is under test via RRS, the other crossbar block is idle and can be tested at the same time

III. Smart Backup
Operations have different requirements; provide a backup unit only for performance-critical operations  absolute minimal additional hardware
OpenSPARC T2 example:

  I/O interface operation   Support in smart backup
  DMA for network           Stall or handle slowly via Programmed I/O
  DMA for disks             Programmed I/O

Application Performance Impact
Uncore CASP emulated on a 4-core Intel® Core™ i7 system
 Memory-centric workloads (PARSEC benchmarks): 1.5% execution-time impact, no visible unresponsiveness
 I/O-centric workloads (disk access): 3% impact

Area and Power Impact
Uncore on-line self-test principles applied to OpenSPARC T2:
 CASP controller: < 0.01% area
 On-chip buffer: 8 KB
 Off-chip FLASH: 200 MB
Minimal area impact (< 1%) and minimal power impact (< 1%)

Test Results for Uncore Components
200 MB off-chip FLASH  10X test compression; 7 ms – 300 ms test time per component

  Fault model   Total pattern count   Test coverage
  Stuck-at      5,…                   …%
  Transition    11,…                  …%

Inexpensive FLASH  thorough on-line self-test
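The storage figures above imply a simple capacity calculation (a back-of-the-envelope sketch; only the 200 MB FLASH capacity and the 10X compression ratio come from the slide):

```python
# Effective pattern capacity of the off-chip test store: compressed
# patterns in FLASH expand by the compression ratio when decompressed
# on-chip, so cheap FLASH holds a large effective pattern set.
FLASH_MB = 200      # off-chip FLASH capacity (from the slide)
COMPRESSION = 10    # test compression ratio (from the slide)

effective_mb = FLASH_MB * COMPRESSION
print(effective_mb, "MB of effective (uncompressed) patterns")  # 2000 MB
```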

Uncore CASP vs. Existing Techniques

                      Logic BIST             Concurrent BIST [Saluja IEEE TCAD 88]   Uncore CASP [this work]
  Coverage            High, with high costs  Depends                                 High
  Area cost           High                   High costs possible                     Low
  Design complexity                                                                  Moderate
  Performance impact                         Low                                     Low with our uncore principles

CASP Applicable for Other SoCs
[Die photos: Cisco Network Processing Engine, NVIDIA Tegra, IBM Power 7]
I. RRS
II. No-performance-impact testing
III. Smart backup
IV. Core CASP

CASP  adaptive on-line self-test & diagnostics 3 new principles for uncore CASP I. Resource reallocation and sharing (RRS) II. No-performance-impact testing III. Smart backup Effective and practical  High test coverage  1% power, 3% performance, 1% area Conclusions 17

Backup Slides

CASP on Actual Intel® Core™ i7 System
Intel Research collaboration: quad-core Intel® Core™ i7 (3.2 GHz)
 Thermoelectric temperature controller
 Debug tool adapter
Unique real-life experiment; development of adaptive self-diagnostics

CASP Flow
1. Select uncore or core component (multi-core SoC proliferation)
2. Isolate the component with the CASP controller
3. Apply / analyze high-quality test patterns via scan chains (test compression, at-speed test, …), stored in inexpensive off-chip FLASH (non-volatile storage technology)
4. Resume operation
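As a software analogy, the four-step flow above could be sketched as follows (purely illustrative; `Pattern`, `Component`, and `casp_test` are hypothetical stand-ins, not the actual CASP controller interface):

```python
# Toy model of one CASP self-test session: select, isolate, apply
# stored patterns, resume. All names are illustrative.

class Pattern:
    """A stored test pattern: scan-in stimulus plus expected response."""
    def __init__(self, stimulus, expected):
        self.stimulus = stimulus
        self.expected = expected

class Component:
    """Toy SoC component; a fault-free scan response equals the stimulus."""
    def __init__(self):
        self.isolated = False
    def scan(self, pattern):
        return pattern.stimulus  # a faulty unit would corrupt this value

def casp_test(component, flash_patterns):
    # 1. Select the uncore or core component (here: the argument itself).
    # 2. Isolate it from the rest of the SoC via the CASP controller.
    component.isolated = True
    # 3. Apply / analyze the stored patterns through the scan chains.
    passed = all(component.scan(p) == p.expected for p in flash_patterns)
    # 4. Resume normal operation.
    component.isolated = False
    return passed

# Usage: four patterns, as if decompressed from FLASH.
uncore = Component()
patterns = [Pattern(v, v) for v in (0b1010, 0b0101, 0b1111, 0b0000)]
print(casp_test(uncore, patterns))  # True for this fault-free toy unit
```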

RRS Example: L2 Cache Banks
[Diagram: crossbar and DRAM connect bank 0 (under test) and bank 1 (helper), each with its own controller, data, and tag arrays]
1. Stall cache controller
2. Drain outstanding requests
3a. Invalidate clean blocks; invalidate directory; invalidate L1
3b. Transfer necessary state (dirty blocks); write back to main memory if necessary
4. Route packets with destination {bank 0, bank 1} to bank 1
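A toy software model of these bank-reallocation steps might look like this (illustrative only; dictionaries stand in for the cache arrays and the crossbar routing table):

```python
# Sketch of RRS for L2 cache banks: stall, drain, invalidate clean
# blocks, write back dirty blocks, then reroute traffic to the helper.

def reallocate_bank(banks, routing, under_test, helper, memory):
    bank = banks[under_test]
    # 1. Stall the cache controller; 2. drain outstanding requests.
    bank["stalled"] = True
    bank["pending"].clear()
    for addr, (data, dirty) in bank["lines"].items():
        if dirty:
            # 3b. Transfer necessary state: write dirty blocks back.
            memory[addr] = data
        # 3a. Clean blocks are simply invalidated (as are directory/L1).
    bank["lines"].clear()
    # 4. Route packets destined for the bank under test to the helper.
    routing[under_test] = helper

# Usage: bank 0 goes under test, bank 1 becomes the helper.
banks = {
    0: {"stalled": False, "pending": [("rd", 0x40)],
        "lines": {0x40: ("AA", True), 0x80: ("BB", False)}},
    1: {"stalled": False, "pending": [], "lines": {}},
}
routing = {0: 0, 1: 1}
memory = {}
reallocate_bank(banks, routing, under_test=0, helper=1, memory=memory)
print(memory)   # only the dirty line 0x40 is written back
print(routing)  # bank-0 traffic now routed to bank 1
```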

No-Performance-Impact Testing Example: CCX (Crossbar)
8 cores, 64 threads; L2 banks 0–7 connect through CCX multiplexers and arbitration logic 0–7, with separate scan chains
 Packets reallocated to the helper  both crossbar blocks can be tested at the same time
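The implication relation can be sketched as a tiny scheduling helper (hypothetical names; it only captures the idea that testing one block implies its partner sees no traffic and can be scanned concurrently):

```python
# Implication-based test scheduling: when RRS reroutes packets away
# from a block under test, the components listed as implied-idle for
# that block carry no traffic, so they are tested at the same time
# at no extra performance cost.

def concurrent_test_set(under_test, implied_idle):
    """Components that may safely be tested together with `under_test`."""
    return {under_test} | set(implied_idle.get(under_test, ()))

# Usage: testing CCX block 0 leaves its paired block 7 idle.
implied_idle = {"ccx0": ["ccx7"]}
print(sorted(concurrent_test_set("ccx0", implied_idle)))  # ['ccx0', 'ccx7']
```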

Smart Backup Example: Non-Cachable Unit
Original (under test): PIO, interrupt processing, interrupt status table, config. status register interface, Boot ROM interface
Backup: PIO, interrupt processing only
1. Stall
2. Drain outstanding requests
3. Turn on reset
4. Transfer states
5. Select outputs from backup (via MUX)
Minimize area costs at acceptable performance impact
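In software terms, the switchover above might be sketched like this (a toy model with hypothetical names; the MUX and state transfer become plain assignments):

```python
# Smart-backup switchover for the non-cachable unit: only the
# performance-critical interrupt path has a backup, so interrupts keep
# being serviced while the original unit is under test.

class NCU:
    """Non-cachable unit with an interrupt status table."""
    def __init__(self):
        self.interrupt_table = {}
        self.active = self       # output MUX: which unit drives outputs
        self.under_test = False

    def start_test(self, backup):
        # 1. Stall; 2. drain outstanding requests (implicit in this toy).
        # 3. Take the backup out of reset; 4. transfer critical state.
        backup.interrupt_table = dict(self.interrupt_table)
        # 5. Select outputs from the backup (the MUX on the slide).
        self.active = backup
        self.under_test = True

    def handle_interrupt(self, irq):
        # The performance-critical path keeps working during the test.
        self.active.interrupt_table[irq] = "handled"

# Usage: interrupts arriving during the test land in the backup.
ncu, backup = NCU(), NCU()
ncu.handle_interrupt(3)
ncu.start_test(backup)
ncu.handle_interrupt(5)
print(sorted(backup.interrupt_table))  # [3, 5]
```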

Naïve Approaches Inadequate for Uncore
Simple stall-and-test, demonstrated on an actual 4-core Intel® Core™ i7 system: while the DRAM controller is under test, requests to DRAM block and the OS timer interrupt handler stalls on every core
 Infrequent test  noticeable unresponsiveness
 Frequent test  system hang
Identical backup units: 12% area overhead

Performance Impact
Simulated latency overhead (PARSEC benchmark suite)
 Tool: GEMS simulator (modified for RRS)
 Workload: PARSEC benchmark suite, 4 threads on 4 cores
 CASP runs 1 sec. every 10 sec.

III. Smart Backup
OpenSPARC T2 operations with different requirements; backup unit only for performance-critical operations  absolute minimal additional hardware

  I/O interface operation   Support in smart backup
  DMA for network           Stall or handle slowly via Programmed I/O
  DMA for disks             Programmed I/O

  Network interface operation   Support in smart backup
  Ethernet port interface       OS orchestration
  Layers 3 and 4 acceleration   Layer 2 packet process