A Configurable Bus-Tracer for Error Reproduction in Post-Silicon Validation
Shing-Yu Chen; Ming-Yi Hsiao; Wen-Ben Jone; Tien-Fu Chen
Nat. Chiao-Tung Univ., Hsinchu, Taiwan
2013 International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)
Presenter: Ching-Hua Huang, 2014/4/14
National Sun Yat-sen University Embedded System Laboratory
2 In today’s modern system-on-chips (SoCs), several intellectual properties (IPs) are integrated on the system to provide different functionality. However, the more complex the communications on an SoC are, the harder it is for designers to discover all errors during verification, before first silicon. Therefore, we provide a reconfigurable unit for recording the transactions between IPs and adopt the logical vector clock [1] as the timestamp of each trace. The programmable trigger unit (PTU) in each debugging node (DN) can be configured by the validator to cache the transaction sequences of interest. Because each transaction trace carries its own timestamp, during post-silicon validation we can reproduce the errors in faulty transactions between IPs and obtain more information for bypassing or fixing the problems.
3 Furthermore, because the many trace entries would shrink the observation window very quickly, we also implement a compressor that compresses traces before they are stored in the trace buffer. Finally, our experiments demonstrate that the proposed debugging architecture is capable of recording the critical transactions, and that the proposed reconfigurable debugging unit can reduce the debugging execution time by more than 80%.
4 SoCs have several indispensable benefits.
◦ Reusable IP blocks reduce design complexity.
◦ Flexibility for specific applications.
◦ Lower power consumption.
SoCs still have challenges that need to be overcome: validation and debugging.
◦ SoC validation requires identifying errors in individual IP blocks, in their interactions, and in the whole system by running test programs. A test program does not stop until a system failure occurs.
◦ SoC debugging uses debugging software to localize the failure to a small region and find the root cause, and finally to fix or bypass the failure.
5 What’s the problem?
With the rapid progress of process technology, dozens of IPs and communication fabrics are integrated into a single chip.
◦ More complex communication behavior raises the probability of interaction errors, which are caused by unexpected communications between IPs.
The main challenges in SoC validation:
◦ Limited length of the on-chip trace buffer. Design time increases because of hard-to-detect bugs.
◦ Regenerating the faulty sequences is hard because of non-deterministic execution.
A methodology called “cyclic debugging” is often adopted in today’s SoC development to remove hardware bugs.
6 [This paper]
[1] Lamport’s logical vector clock
[5] Distributed Hardware Matcher Framework
[6] A reconfigurable debugging instrument
[7] About Post-Silicon Validation
[8] Scalable DFD architecture with distributed ELA
[9] SigRace
[10] IMITATOR
[11,12] Some other works based on transaction debugging
[13] AMBA Open Specifications - ARM
Annotations from the diagram:
◦ To order the transaction sequences and identify the relation between transactions among IPs.
◦ To better utilize the available on-chip storage for distributed trace buffers.
◦ This method is based on AHB buses.
◦ To replay the error interactions and fetch crucial information in transactions.
◦ Post-silicon validation has four crucial steps: (1) detecting a problem, (2) localizing the problem, (3) identifying the root cause of the problem, (4) fixing or bypassing the problem.
◦ It monitors particular sequences of signals about potential errors for each IP.
◦ Dynamically create new hardware structures in existing silicon for debugging purposes.
7 A hardware debugging solution is proposed to solve the above challenges.
◦ The key idea is caching transactions of interest, together with their timestamps, via a hardware monitor piggybacked on the IP’s interface.
◦ The compressed timestamps are stored in the on-chip trace buffer.
This work emphasizes debugging communications among IPs:
◦ It proposes a debugging architecture, the debugging node, with a programmable unit for monitoring the interactions between IPs.
◦ It proposes a timestamp recording mechanism for ordering non-deterministic interactions between IPs, with a compression technique to widen the observation window of the traces.
8 This work is based on trace-based debugging.
◦ Each master interface and the arbiter has its own DN.
◦ A Global Timing Vector (GTV) maintains the latest timing vector.
To watch for erroneous transaction sequences, a DN is attached to each IP’s wrapper interface.
◦ The master wrapper gives the DN sufficient information to watch any type of transfer the master requests.
◦ The DN observes the wrapper signals and compares them with the trigger conditions in the PTU.
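The comparison between wrapper signals and PTU trigger conditions can be pictured with a small software model. This is only a sketch under assumptions: the slides do not list which signals the PTU compares, so the `Transfer` and `TriggerCondition` types and all of their fields (master ID, read/write direction, address range) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One observed bus transfer (hypothetical fields, not from the slides)."""
    master_id: int
    is_write: bool
    address: int


@dataclass(frozen=True)
class TriggerCondition:
    """A hypothetical PTU configuration written by the validator."""
    master_id: int
    watch_writes: bool
    watch_reads: bool
    addr_low: int
    addr_high: int

    def matches(self, transfer: Transfer) -> bool:
        """Return True when the observed transfer should be traced."""
        if transfer.master_id != self.master_id:
            return False
        if transfer.is_write and not self.watch_writes:
            return False
        if not transfer.is_write and not self.watch_reads:
            return False
        # Inclusive address window of interest.
        return self.addr_low <= transfer.address <= self.addr_high
```

For example, a condition that watches only writes of master 0 in the range 0x1000 to 0x1FFF ignores that master's reads, so those reads would not consume trace buffer entries.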
9 First, the trigger condition in the Programmable Trigger Unit (PTU) can be reconfigured by the Control Unit. Once a test program is running, each DN records the timing information of the triggered communication in a timestamp unit called the Local Timing Vector (LTV).
10 When the PTU is triggered, the LTV requests the latest Global Timing Vector (GTV), increments the number in its field of the vector, and updates the GTV. Subsequently, the detected communication event and its new LTV are compressed and stored in the trace buffer. When the test program finishes, the traces can be used for localizing or reproducing error sequences.
◦ The system not only processes data for locating bugs but also analyzes the timing information for bug reproduction.
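The timestamping step described on this slide can be sketched in software as follows. This is a minimal model, not the authors' hardware: the class and method names are hypothetical, compression is left out, and the ordering rule in `merge_traces` (sorting by the sum of the vector fields) is an assumption that follows from every trigger incrementing exactly one field of the shared GTV.

```python
class GlobalTimingVector:
    """Shared vector holding the latest timing vector (one field per DN)."""

    def __init__(self, num_nodes):
        self.fields = [0] * num_nodes


class DebuggingNode:
    """Hypothetical software model of one DN with its LTV and trace buffer."""

    def __init__(self, node_id, gtv):
        self.node_id = node_id
        self.gtv = gtv
        self.ltv = [0] * len(gtv.fields)
        self.trace = []  # uncompressed (LTV, event) pairs

    def on_trigger(self, event):
        # 1. Fetch the latest GTV.
        self.ltv = list(self.gtv.fields)
        # 2. Increment this DN's own field.
        self.ltv[self.node_id] += 1
        # 3. Write the new vector back as the GTV.
        self.gtv.fields = list(self.ltv)
        # 4. Store the event with its timestamp.
        self.trace.append((tuple(self.ltv), event))


def merge_traces(nodes):
    """Rebuild one global event order from the per-DN traces."""
    entries = [entry for node in nodes for entry in node.trace]
    return sorted(entries, key=lambda entry: sum(entry[0]))
```

With five nodes, if DN 0 triggers, then DN 2, then DN 0 again, the stored vectors are (1,0,0,0,0), (1,0,1,0,0) and (2,0,1,0,0), and merging the per-DN traces recovers that order after the test program finishes.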
11 The SoC in this work has four masters and an arbiter. Each timing vector has five fields, one per DN, and each field is 32 bits, so each LTV is 160 bits in total. Clearly, an LTV is too large for the trace buffer.
◦ LTVs would occupy a lot of trace buffer space and shrink the trace buffer observation window.
To overcome this problem:
◦ Use a compressor in front of the trace buffer to widen the observation window.
◦ Record only the differences between LTVs in the trace buffer.
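To make the 160-bit figure concrete, the arithmetic below estimates how many uncompressed LTVs fit in a trace buffer. It assumes a single 1-Kbyte buffer (the size given in the experimental setup) that holds nothing but raw LTVs, which ignores the event data stored with each timestamp, so the real window would be shorter.

```python
FIELD_BITS = 32           # width of one timing-vector field (from the slide)
NUM_FIELDS = 5            # four masters plus one arbiter
TRACE_BUFFER_BYTES = 1024 # 1-Kbyte buffer, assumed to hold only raw LTVs

ltv_bits = FIELD_BITS * NUM_FIELDS
max_uncompressed_ltvs = (TRACE_BUFFER_BYTES * 8) // ltv_bits

print(ltv_bits)               # 160
print(max_uncompressed_ltvs)  # 51
```

Under these assumptions the observation window is about 51 timestamps, which illustrates why storing only LTV differences is attractive.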
12 The process of LTV compression is as follows:
◦ (1) When the compressor gets the latest LTV, it computes the difference between the incoming LTV and the previously stored LTV and writes the difference to the trace buffer.
◦ (2) The compressor keeps a copy of the incoming LTV as the reference for the next difference.
◦ (3) The differences are later processed to recover the ordering information.
Finally, the recovered timing vectors from each master and the arbiter in the system are obtained.
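The three compression steps can be sketched as a simple delta encoder and decoder. This is an illustration under assumptions: the slides do not give the bit-level encoding of a difference, so the sketch keeps each difference as a list of small integers, starts from an all-zero reference vector, and uses the hypothetical function names `compress_ltvs` and `recover_ltvs`.

```python
def compress_ltvs(ltvs, num_fields=5):
    """Steps (1) and (2): store field-wise differences between successive LTVs."""
    reference = [0] * num_fields  # assumed initial reference vector
    differences = []
    for ltv in ltvs:
        differences.append([new - old for new, old in zip(ltv, reference)])
        reference = list(ltv)  # copy the incoming LTV for the next round
    return differences


def recover_ltvs(differences, num_fields=5):
    """Step (3): accumulate the differences to rebuild the timing vectors."""
    current = [0] * num_fields
    recovered = []
    for diff in differences:
        current = [value + delta for value, delta in zip(current, diff)]
        recovered.append(list(current))
    return recovered
```

Because consecutive LTVs usually differ by small amounts in only a few fields, the differences are mostly zeros and ones, which a hardware encoder could pack into far fewer than 160 bits per entry.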
13 Limited length of the on-chip trace buffer
◦ Reduce design time?
◦ Observation window length?
Regenerating the faulty sequences under non-deterministic execution
◦ Vector timestamp?
14 The debugging system is implemented in C++ and embedded into a 4-processor SoC system [14].
◦ The SoC has 4 ARM cores, each with an individual 32-Kbyte L1 cache, and a 64-Kbyte shared L2 cache, all connected by a system bus.
◦ The debugging system has 5 DNs, including a 1-Kbyte trace buffer.
To capture the WSR and RWDR errors, DNs are configured to watch the write and read requests of these two masters.
◦ WSR occurs when one master’s write transfers continuously occupy access to a slave. System performance may slow down severely if WSR occurs frequently.
◦ RWDR is another error, caused by a data race between two masters. It may result in a system crash due to operations on wrong data.
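One possible reading of the WSR condition is a long run of consecutive writes by the same master to the same slave. The sketch below scans an ordered access list for such runs. It is an assumption-laden illustration, not the detection logic from the paper: the function name `find_wsr_runs`, the tuple layout, and the `threshold` parameter are all hypothetical.

```python
def find_wsr_runs(accesses, threshold):
    """Flag (master, slave) pairs whose consecutive write run reaches threshold.

    accesses: iterable of (master_id, slave_id, is_write) in recovered order.
    """
    flagged = []
    streak = {}  # slave_id -> (owning master, consecutive write count)
    for master_id, slave_id, is_write in accesses:
        owner, count = streak.get(slave_id, (None, 0))
        if is_write and master_id == owner:
            count += 1                     # same master keeps writing
        elif is_write:
            owner, count = master_id, 1    # a different master starts a run
        else:
            owner, count = None, 0         # a read breaks the run
        streak[slave_id] = (owner, count)
        if count == threshold:             # report each run only once
            flagged.append((master_id, slave_id))
    return flagged
```

Running it on the globally ordered trace, rather than on one DN's local trace, matters here, because a run is only broken by accesses that other masters make to the same slave.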
15 The charts on this slide show the normalized debugging time and the observation window length for different DN configurations. The debugging execution time is significantly reduced if the DN is configured precisely for detecting the specified errors. For communication errors, the experimental results show that the observation window is significantly improved by using appropriate DN configurations.
16 The chart on this slide shows the normalized log size overhead of the vector timestamps for 20000 transactions processed. The proposed compression technique reduces the log size and lengthens the trace buffer observation window.
17 Conclusion
◦ SoC validation and debugging solutions are increasingly important for future SoC designs. This work presents a debugging system for recording and reproducing system interaction errors. The execution history is recorded in the trace buffers on each DN. After the test program has executed, special post-analysis algorithms are used to reproduce errors between IPs.
◦ The experimental results demonstrate that the debugging system is capable of dealing with a variety of system-level errors, improves the debugging execution time by more than 80%, and reduces storage overhead with the compression technique.
My comments
◦ I think this approach to bug reproduction is difficult to implement, because I cannot forecast when an error will happen.
◦ The experimental results are simple to comprehend.