Presenter : Ching-Hua Huang 2014/4/14 A Configurable Bus-Tracer for Error Reproduction in Post-Silicon Validation Shing-Yu Chen ; Ming-Yi Hsiao ; Wen-Ben.

Presentation transcript:

Presenter: Ching-Hua Huang, 2014/4/14
A Configurable Bus-Tracer for Error Reproduction in Post-Silicon Validation
Shing-Yu Chen; Ming-Yi Hsiao; Wen-Ben Jone; Tien-Fu Chen
Nat. Chiao-Tung Univ., Hsinchu, Taiwan
2013 International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)
National Sun Yat-sen University Embedded System Laboratory

2 In today's system-on-chips (SoCs), several intellectual property (IP) cores are integrated on the system to provide different functionality. However, the more complex the communications on an SoC are, the harder it is for the programmer to discover all errors during verification, before first silicon. Therefore, we provide a reconfigurable unit for recording the transactions between IPs and adopt the logical vector clock [1] as the timestamp of each trace. The programmable trigger unit (PTU) in each debugging node (DN) can be configured by the validator to cache the transaction sequences of interest. Because each transaction trace carries its own timestamp, during post-silicon validation we can reproduce the errors in faulty transactions between IPs and obtain more information for bypassing or fixing the problems.

3 Furthermore, because the many trace entries would shrink the observation window very quickly, we also implement a compressor that compresses traces before they are stored in the trace buffer. Finally, our experiments demonstrate that the proposed debugging architecture is capable of recording the critical transactions, and that the proposed reconfigurable debugging unit can reduce the debugging execution time by more than 80%.

4
• SoCs have several indispensable benefits.
  ◦ Reusable IP blocks reduce design complexity.
  ◦ Flexibility for specific applications.
  ◦ Lower power consumption.
• SoCs still have some challenges that need to be overcome: validation and debugging.
  ◦ SoC validation requires identifying errors in individual IP blocks, in their interactions, and in the whole system by running test programs.
    ▪ The test program does not stop until a system failure occurs.
  ◦ SoC debugging uses debugging software to localize the failure to a small region and find the root cause.
    ▪ In the end, the failure is fixed or bypassed.

5 What's the problem
• With the rapid progress of process technology, dozens of IPs and communication fabrics are integrated into a chip.
  ◦ More complex communication behavior raises the probability of interaction errors, which are caused by unexpected communications between IPs.
• The main challenges in SoC validation:
  ◦ Limited length of the on-chip trace buffer
    ▪ Design time increases because of hard-to-detect bugs.
  ◦ Re-generating the faulty sequences is difficult because of non-deterministic execution.
    ▪ A methodology called "cyclic debugging" is often adopted in today's SoC development to remove a hardware bug.

6 Related work (diagram)
• [This paper]
• [1] Lamport's logical vector clock
• [5] Distributed Hardware Matcher Framework
• [6] A reconfigurable debugging instrument
• [7] About Post-Silicon Validation
• [8] Scalable DFD architecture with distributed ELA
• [9] SigRace
• [10] IMITATOR
• [11,12] Some other works based on transaction debugging
• [13] AMBA Open Specifications - ARM
Annotations in the diagram:
  ◦ To order the transaction sequences and identify the relation between each transaction among IPs.
  ◦ To better utilize the available on-chip storage for distributed trace buffers.
  ◦ This method is based on AHB buses.
  ◦ To replay the error interactions and fetch crucial information in transactions.
  ◦ Post-silicon validation has four crucial steps: (1) detecting a problem, (2) localizing the problem, (3) identifying the root cause of the problem, (4) fixing or bypassing the problem.
  ◦ It monitors particular sequences of signals about potential errors for each IP.
  ◦ Dynamically create new hardware structures in existing silicon for debugging purposes.

7
• A hardware debugging solution is proposed to address the above challenges.
  ◦ The key idea is caching transactions of interest together with their timestamps.
    ▪ This is done via a hardware monitor piggybacked on the IP's interface.
  ◦ The goal is to store the compressed timestamps in the on-chip trace buffer.
• This work emphasizes debugging communications among IPs:
  ◦ It proposes a debugging architecture, the debugging node, with a programmable unit for monitoring the interactions between IPs.
  ◦ It proposes a timestamp recording mechanism for ordering non-deterministic interactions between IPs, with a compression technique to widen the observation window of the traces.

8
• This work is based on trace-based debugging.
  ◦ Each master interface and the arbiter is equipped with a DN.
  ◦ A Global Timing Vector (GTV) maintains the latest timing vector.
• To watch erroneous transaction sequences, a DN is attached to the IP's wrapper interface.
  ◦ The master wrapper provides the DN with sufficient information to watch any type of transfer the master requests.
  ◦ The DN observes the wrapper signals and compares them with the trigger conditions in the PTU.

9
• First, the trigger conditions in the Programmable Trigger Unit (PTU) can be reconfigured by the Control Unit.
• While the test program runs, each DN records the timing information of each triggered communication in a timestamp unit called the Local Timing Vector (LTV).
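The trigger check described on this slide can be sketched in software as follows. This is only an illustrative model, not the authors' hardware: the `Transfer` fields, the `TriggerCondition` rule format (address range plus optional read/write filter), and the class names are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    """One bus transfer seen on a master's wrapper interface (hypothetical fields)."""
    master: int
    address: int
    is_write: bool


@dataclass
class TriggerCondition:
    """A reconfigurable match rule (assumed format); is_write=None means don't care."""
    addr_lo: int = 0
    addr_hi: int = 0xFFFFFFFF
    is_write: Optional[bool] = None

    def matches(self, t: Transfer) -> bool:
        in_range = self.addr_lo <= t.address <= self.addr_hi
        kind_ok = self.is_write is None or self.is_write == t.is_write
        return in_range and kind_ok


class ProgrammableTriggerUnit:
    def __init__(self):
        self.conditions = []

    def configure(self, conditions):
        """Models the Control Unit loading a new set of trigger conditions."""
        self.conditions = list(conditions)

    def triggered(self, t: Transfer) -> bool:
        """True if the observed transfer matches any configured condition."""
        return any(c.matches(t) for c in self.conditions)


ptu = ProgrammableTriggerUnit()
# Watch only writes into a hypothetical address window.
ptu.configure([TriggerCondition(addr_lo=0x1000, addr_hi=0x1FFF, is_write=True)])
hit = ptu.triggered(Transfer(master=0, address=0x1004, is_write=True))
miss = ptu.triggered(Transfer(master=0, address=0x1004, is_write=False))
```

Narrowing the configured conditions is what keeps uninteresting transfers out of the trace buffer in this model.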

10
• When the PTU is triggered, the LTV requests the latest Global Timing Vector (GTV), increments the number in its own field of the vector, and updates the GTV.
• Subsequently, the detected communication event and its new LTV are compressed and stored in the trace buffer.
• When the test program finishes, the traces can be used for localizing or reproducing error sequences.
  ◦ The post-processing not only uses the data to locate the bug but also analyzes the timing information for bug reproduction.
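The timestamp update described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the GTV is modelled as a plain shared list, and `happened_before` is the standard vector-clock comparison rather than anything the slides specify.

```python
NUM_NODES = 5  # four masters plus one arbiter, as in the slides


class GlobalTimingVector:
    def __init__(self, n=NUM_NODES):
        self.fields = [0] * n


class DebuggingNode:
    def __init__(self, index, gtv):
        self.index = index          # which field of the vector this DN owns
        self.gtv = gtv
        self.ltv = [0] * len(gtv.fields)

    def on_trigger(self):
        """Called when the PTU fires: fetch GTV, bump own field, write back."""
        self.ltv = list(self.gtv.fields)
        self.ltv[self.index] += 1
        self.gtv.fields = list(self.ltv)
        return tuple(self.ltv)


def happened_before(a, b):
    """Standard vector-clock order: a precedes b if a <= b fieldwise and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


gtv = GlobalTimingVector()
dn0, dn2 = DebuggingNode(0, gtv), DebuggingNode(2, gtv)
t1 = dn0.on_trigger()  # (1, 0, 0, 0, 0)
t2 = dn2.on_trigger()  # (1, 0, 1, 0, 0)
t3 = dn0.on_trigger()  # (2, 0, 1, 0, 0)
```

Sorting the recovered vectors with this comparison gives the order in which the triggered transactions occurred, which is the information needed to replay an error sequence.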

11
• The SoC in this work has four masters and an arbiter. Each field in a timing vector is 32 bits and each vector has five fields, so each LTV is 160 bits in total.
• An LTV of this size is too large for the trace buffer.
  ◦ LTVs would occupy a large share of the trace buffer and shrink its observation window.
• To overcome this problem:
  ◦ Use a compressor in front of the trace buffer.
    ▪ This widens the observation window.
  ◦ Record only the difference between LTVs in the trace buffer.
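The size argument can be checked with a short calculation. This is a rough sketch only: it assumes a single 1-Kbyte trace buffer (the size quoted in the experimental setup) that holds nothing but uncompressed timestamps, ignoring the event data stored alongside each LTV.

```python
FIELD_BITS = 32                        # width of one field in a timing vector
NUM_FIELDS = 5                         # four masters plus one arbiter
LTV_BITS = FIELD_BITS * NUM_FIELDS     # 160 bits per uncompressed LTV

BUFFER_BITS = 1024 * 8                 # assumed 1-Kbyte trace buffer
max_uncompressed_entries = BUFFER_BITS // LTV_BITS  # 51 entries at most
```

Under these assumptions only about fifty timestamps fit before the buffer wraps, which is why the uncompressed observation window is so short.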

12
• LTV compression proceeds as follows:
  ◦ (1) When the compressor gets the latest LTV, it computes the difference between the incoming LTV and the previously stored LTV and writes the difference to the trace buffer.
  ◦ (2) The compressor keeps a copy of the incoming LTV as the reference for the next one.
  ◦ (3) After the run, the differences are processed to recover the ordering information.
• Finally, the recovered timing vectors of every master and arbiter in the system are obtained.
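The three steps above can be sketched as a difference encoder and its inverse. The slides do not give the encoding, so this example assumes a sparse format of `(field index, delta)` pairs and an all-zero initial reference vector; the function names are hypothetical.

```python
def compress(ltvs, width):
    """Store each LTV as the list of (field, delta) pairs that changed."""
    prev = [0] * width                 # assumed all-zero starting reference
    out = []
    for ltv in ltvs:
        diff = [(i, cur - old)
                for i, (cur, old) in enumerate(zip(ltv, prev))
                if cur != old]
        out.append(diff)               # step (1): write the difference
        prev = list(ltv)               # step (2): keep a copy as reference
    return out


def recover(diffs, width):
    """Step (3): rebuild the full timing vectors by accumulating differences."""
    prev = [0] * width
    out = []
    for diff in diffs:
        for i, delta in diff:
            prev[i] += delta
        out.append(tuple(prev))
    return out


trace = [(1, 0, 0, 0, 0), (2, 0, 1, 0, 0), (3, 0, 1, 0, 2)]
packed = compress(trace, width=5)
restored = recover(packed, width=5)
```

Because consecutive LTVs from one DN usually differ in only a few fields, each stored difference is much smaller than a full 160-bit vector in this model.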

13
• Limited length of the on-chip trace buffer
  ◦ Reduce design time?
  ◦ Observation window length?
• Re-generating the faulty sequences is difficult because of non-deterministic execution.
  ◦ Vector timestamp?

14
• The debugging system is implemented in C++ and embedded into a 4-processor SoC system [14].
  ◦ The SoC has 4 ARM cores, each with its own 32-Kbyte L1 cache, and a 64-Kbyte shared L2 cache, connected by a system bus.
  ◦ The debugging system has 5 DNs, including a 1-Kbyte trace buffer.
• To capture the WSR and RWDR errors, the DNs are configured to watch the write and read requests of these two masters.
  ◦ WSR occurs when one master's write transfers continuously occupy access to a slave.
    ▪ System performance may slow down severely if WSR occurs frequently.
  ◦ RWDR is another error, caused by a data race between two masters.
    ▪ It may result in a system crash due to operations on wrong data.

15
• The figures below show the normalized debugging time and the observation window length for different DN configurations.
• The debugging execution time is significantly reduced if the DN is configured precisely for detecting the specified errors.
• For communication errors, the experimental results show that the observation window is significantly improved by using appropriate DN configurations.

16
• The figure below shows the normalized log size overhead of the vector timestamp in 20000, and transactions processed.
• The proposed compression technique improves the log size and the trace buffer observation window length.

17
• Conclusion
  ◦ SoC validation and debugging solutions are increasingly important for future SoC designs. This work presents a debugging system for recording and reproducing system interaction errors.
    ▪ The execution history is recorded in the trace buffer of each DN.
    ▪ After the test program has executed, special post-analysis algorithms are used to reproduce errors between IPs.
  ◦ The experimental results demonstrate that the debugging system:
    ▪ Is capable of dealing with a variety of system-level errors.
    ▪ Improves the debugging execution time by more than 80%.
    ▪ Reduces storage overhead with the compression technique.
• My comments
  ◦ I think the bug reproduction method is difficult to implement.
    ▪ This is because I cannot forecast when an error will happen.
  ◦ The experimental results are simple to comprehend.