Presenter: Jyun-Yan Li A Software-Based Self-Test Methodology for On-Line Testing of Processor Caches G. Theodorou, N. Kranitis, A. Paschalis, D. Gizopoulos.


Presenter: Jyun-Yan Li A Software-Based Self-Test Methodology for On-Line Testing of Processor Caches G. Theodorou, N. Kranitis, A. Paschalis, D. Gizopoulos Department of Informatics and Telecommunications, University of Athens, Athens, Greece Test Conference (ITC), 2011 IEEE International Cite count: 3

Nowadays, on-line testing is essential for modern high-density microprocessors to detect either latent hardware defects or new defects appearing during lifetime, both in logic and memory modules. For cache arrays, the flexibility to apply different March tests on-line is a critical requirement. For small memory arrays that may lack programmable Memory Built-In Self-Test (MBIST) circuitry, such as L1 cache arrays, Software-Based Self-Test (SBST) can be a flexible and low-cost solution for on-line March test application. In this paper, an SBST program development methodology is proposed for on-line periodic testing of L1 data and instruction caches, both for tag and data arrays.

The proposed SBST methodology utilizes existing special purpose instructions that modern Instruction Set Architectures (ISAs) implement to access caches for debug-diagnostic and performance purposes, termed hereafter Direct Cache Access (DCA) instructions, as well as performance monitoring mechanisms to overcome testability challenges. The methodology has been applied to two processor benchmarks, OpenRISC and LEON3, to demonstrate its high adaptability, and experimental comparison results against previous contributions show that the utilization of DCA instructions significantly improves test code size (83%) and test duration (72%) when applied to the same benchmark (LEON3).

Why it is essential to test caches
 Caches occupy almost 90% of the chip area of a multiprocessor
 Hardware defects may cause
。 Erroneous cache misses (defects in the tag array)
。 Unpredictable system behavior (defects in the data array)
Memory built-in self-test (MBIST) can be used for on-line testing
 The cost of MBIST is high relative to the cache itself
。 Chip area, performance
 Software-based self-test (SBST) is used to make up for the lack of MBIST

[Diagram: SBST work leading to this paper. Only the labels are recoverable.]
 Boxes: SBST; Embedded processor [7,8]; Cache memory [12, 13]; SBST classifying [11]; Direct mapped data cache [15]; Set-associative cache [19]; RAMSES memory fault simulator [22]; This paper: A Software-Based Self-Test Methodology for On-Line Testing of Processor Caches
 Annotations: Classified according to ISA or RTL; Aims at some components to generate test patterns; Tests the cache with the March C- algorithm & cache controller; Tests data & instruction caches; Tests the cache when the ISA has no special instructions

Cache arrays cannot be accessed directly through the generic ISA
D-Data
 March write/read can be implemented with generic ISA instructions
D-Tag & I-Tag
 March write is done by a load instruction (which causes a cache miss)
 March read cannot be implemented
。 Faults are observed through unexpected cache misses
 The I-Tag may be occupied by the SBST routine itself, and descending order cannot be implemented
I-Data
 The test pattern must be composed of valid instructions
 May be occupied by the SBST routine itself, and descending order cannot be implemented

Using Direct Cache Access (DCA) instructions
 Special instructions for debug-diagnostic and performance purposes
。 Direct controllability and observability of cache arrays
 Alternate space identifier (ASI) store/load instructions in SPARC
 System control coprocessor (CP15) debug operations in ARM
 Prefetch instruction
。 Acts as a DCA instruction for cache controllability
。 Combined with generic instructions for cache observability

Applying March tests to verify cache arrays
 Ex: March SS
Main features
 High controllability of DCA instructions for March write
 High observability of DCA instructions for March read
 If there are no DCA instructions, March read for tag arrays is implemented with the performance monitor
 Compact test response
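
As a rough illustration of what applying a March test means, the sketch below runs March SS (in its commonly cited 22N form) over a toy one-bit-per-cell memory. The `BitMemory` class, its stuck-at fault model and the `run_march` helper are hypothetical names introduced only for this example; they are not the paper's SBST routines, which operate on real cache arrays through DCA or generic instructions.

```python
from typing import List, Sequence, Tuple

# A March element is (address order, operations). "w0" writes 0,
# "r1" reads and expects 1. "any" means the order is irrelevant.
MarchElement = Tuple[str, Sequence[str]]

# March SS in its commonly cited 22N form.
MARCH_SS: List[MarchElement] = [
    ("any",  ["w0"]),
    ("up",   ["r0", "r0", "w0", "r0", "w1"]),
    ("up",   ["r1", "r1", "w1", "r1", "w0"]),
    ("down", ["r0", "r0", "w0", "r0", "w1"]),
    ("down", ["r1", "r1", "w1", "r1", "w0"]),
    ("any",  ["r0"]),
]


class BitMemory:
    """Toy memory, one bit per cell, with an optional stuck-at fault (assumed model)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, value) or None

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # A stuck-at cell always returns the stuck value, whatever was written.
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            return self.stuck_at[1]
        return self.cells[addr]


def run_march(mem, size, test):
    """Apply a March test and return the number of read mismatches (0 = pass)."""
    mismatches = 0
    for order, ops in test:
        addrs = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addrs:
            for op in ops:
                value = int(op[1])
                if op[0] == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    mismatches += 1
    return mismatches
```

A fault-free `BitMemory(16)` yields a mismatch count of zero, while `BitMemory(16, stuck_at=(5, 1))` yields a non-zero count. Reducing the whole test to a single number is one simple way to picture a compact test response; the paper's actual compaction scheme is not described in these slides.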

With DCA instructions
 Initialize the D-Data and D-Tag arrays
Without DCA instructions
 Use the prefetch instruction to fill cache lines and tags
With neither DCA nor prefetch instructions
 Cause a cache load miss and refill with generic load instructions

With DCA instructions
 Access both the D-Tag and D-Data arrays and compare them
Without DCA instructions
 Use the performance monitor to determine whether an access hit the cache
With neither DCA instructions nor a performance monitor
 Tag arrays are verified indirectly by comparing data, as in [13] & [15]
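
The performance-monitor variant can be pictured with a small model: a load that misses refills the line (a tag "write"), and a second load to the same address should hit, so any change in the miss counter exposes a tag fault. The sketch below is a toy direct-mapped cache that tracks tags only; `DirectMappedCache`, `tag_fault` and `check_tags` are hypothetical names for this illustration, and the fault model (tag bits forced to 1 on read) is an assumption.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that models only the tag array and a miss counter."""

    def __init__(self, num_lines, line_bytes, tag_fault=None):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.miss_count = 0         # stands in for a hardware performance counter
        self.tag_fault = tag_fault  # (line index, bit mask forced to 1) or None

    def _split(self, addr):
        line = (addr // self.line_bytes) % self.num_lines
        tag = addr // (self.line_bytes * self.num_lines)
        return line, tag

    def _stored_tag(self, line):
        tag = self.tags[line]
        if tag is not None and self.tag_fault and self.tag_fault[0] == line:
            tag |= self.tag_fault[1]  # faulty bits always read back as 1
        return tag

    def load(self, addr):
        line, tag = self._split(addr)
        if self._stored_tag(line) != tag:
            self.miss_count += 1
            self.tags[line] = tag  # refill writes the new tag


def check_tags(cache, tag_pattern):
    """Write tag_pattern into every line via a miss, then expect a hit on re-read."""
    failures = []
    way_span = cache.line_bytes * cache.num_lines
    for line in range(cache.num_lines):
        addr = tag_pattern * way_span + line * cache.line_bytes
        cache.load(addr)            # tag write: miss and refill
        before = cache.miss_count
        cache.load(addr)            # tag read: should hit
        if cache.miss_count != before:
            failures.append(line)   # unexpected miss reveals a tag fault
    return failures
```

With a fault on line 3 forcing tag bit 2 to 1, the pattern `0b011` exposes the fault while `0b111` masks it, which is why a March test applies complementary patterns.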

Ascending or descending address order is implemented with a loop
With DCA instructions
 DD_write(): initialize a cache line/word
 DD_read(): read a cache line/word
 The way cannot be selected
。 Repeat k times for a k-way cache, with k different DB sets
Without DCA instructions
 Prefetch(A) + m load instructions
。 m: number of words per line
 Performance monitor
。 Determines whether an access hit the cache
 Data backgrounds (DB) are defined in a memory data segment
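
One way to read the "repeat k times" and "Prefetch(A) + m loads" steps is as address arithmetic: k line-aligned addresses that index the same set but carry different tags fill all k ways, and m word addresses cover one line. The sketch below shows only that arithmetic. The helper names are hypothetical, the 16-byte line size is an assumption (the slides give only the 4 KB, 2-way geometry), and reading "k different DB sets" as k data backgrounds placed at such addresses is an interpretation.

```python
def same_set_addresses(set_index, ways, num_sets, line_bytes, base=0):
    """Return one line-aligned address per way, all indexing the same set."""
    way_span = num_sets * line_bytes  # address range covered by one way
    return [base + w * way_span + set_index * line_bytes for w in range(ways)]


def word_addresses(line_addr, words_per_line, word_bytes=4):
    """Return the m word addresses inside one cache line."""
    return [line_addr + i * word_bytes for i in range(words_per_line)]


# Assumed geometry: 4 KB, 2-way, 16-byte lines (line size is NOT from the slides).
CACHE_BYTES, WAYS, LINE_BYTES = 4096, 2, 16
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 128 sets

# Two addresses that both map to set 3, one per way.
SET3_ADDRS = same_set_addresses(3, WAYS, NUM_SETS, LINE_BYTES)
```

Under these assumptions, `SET3_ADDRS` is `[48, 2096]`, and `word_addresses(48, 4)` gives the four word addresses a Prefetch(A) + m loads sequence would touch for the first of them.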

The SBST routine affects instruction cache testing
 It resides in the same cache during the test
。 Fetching the routine's instructions replaces the test data (pattern)
 The SBST routine should be placed in a non-cacheable area
。 If there is no non-cacheable area
 Use the cache enable/disable mechanism
 Increases code size & test duration
 Descending order

With DCA instructions
 Initialize the I-Data and I-Tag arrays
 No restriction on the content of the data array
Without DCA instructions
 Use the prefetch instruction to fill cache lines and tags
。 Contents should be valid instructions
。 One return instruction in every cache line
With neither DCA nor prefetch instructions
 Cause a cache load miss and refill with generic load instructions

With DCA instructions
 Access both the I-Tag & I-Data arrays and compare them
Without DCA instructions
 Use a call instruction whose target is the desired cache line
 I-Data is verified by the result of executing the target
 I-Tag is verified by using the performance monitor to determine whether the access hit the cache
With neither DCA instructions nor a performance monitor
 Insert control instructions at different words in a cache line

With DCA instructions
 The SBST routine limits access to cache lines in descending address order (AO)
Without DCA instructions
 Prefetch(A) + performance monitor for the descending AO
。 Implemented with a software loop

How many
 Instructions in the test program
 Clock cycles for the simulation
 Faults in the hardware
How much
 Fault coverage
。 For which kinds of faults
Compared with what?

RAMSES memory fault simulator
 Consists of a simulation engine & fault descriptors
LEON3
 4KB 2-way L1 data & instruction caches
OpenRISC 1200
 4KB direct mapped L1 data & instruction caches

LEON3
 DCA instructions are used for the March read/write
 Expected response is zero
 Fault coverage: 100%
OpenRISC 1200
 The prefetch instruction is used for the March write
 Cache enable/disable is used for the instruction cache
 Cache misses are monitored with a performance counter
 Data cache fault coverage: 100%
 Instruction cache fault coverage:
。 Single cell faults: 100%
。 Coupling faults: 99%
 Corrupt a single test

March test: March C-
Cache array is similar to that of LEON3
Data cache verification
Result:

March test: March SS
Both use the LEON3 processor
Instruction cache verification
Result:
 Code size (I-Tag & I-Data) improves by 83%
 Test duration improves by 72%

The methodology overcomes the testability challenges of L1 cache arrays by using
 Special purpose instructions for debug-diagnostic and performance purposes
 The performance monitor
Experimental results on LEON3 show a 72% improvement in test duration and an 83% improvement in code size
My comment
 These mechanisms reduce code size and execution time, giving a low-cost test pattern
 It is hard to implement I-Data and I-Tag test patterns when there are no DCA instructions