Presenter : Shau-Jay Hou Tsung-Cheng Lin Kuan-Fu Kuo 2015/6/12 EICE team TraceDo: An On-Chip Trace System for Real-Time Debug and Optimization in Multiprocessor.

Presentation transcript:

TraceDo: An On-Chip Trace System for Real-Time Debug and Optimization in Multiprocessor SoC
Xiao Hu, Pengyong Ma, Shuming Chen, Yang Guo, and Xing Fang
School of Computer, National University of Defense Technology, Changsha, Hunan, P.R. of China
M. Guo et al. (Eds.): ISPA 2006, LNCS 4330, pp. 806-817, © Springer-Verlag Berlin Heidelberg 2006
Presenter : Shau-Jay Hou Tsung-Cheng Lin Kuan-Fu Kuo 2015/6/12 EICE team

Slide 2: Abstract
Traditional debug techniques based on breakpoints and single stepping struggle to meet the requirements of debug and optimization problems related to the temporal behavior of real-time programs in multiprocessors. This paper introduces TraceDo (Trace for Debug and Optimization), an on-chip trace system for a multiprocessor SoC (YHFT-QDSP), to overcome this debug challenge. TraceDo presents several novel methods, including the LS encoder, branch configuration bits, and configuration instructions, to efficiently trace the program paths, data accesses, and events, with timestamps, from the four Digital Signal Processor (DSP) cores of YHFT-QDSP. Benchmark results show that TraceDo with the LS encoder improves the compression ratio of trace information by 27% on average over the best reference result. With branch configuration bits, the improvement rises to 64%.

Slide 3: What's the problem?
Debug and optimization in multiprocessor SoC
- Traditional methods
  - Breakpoints and single stepping
    - Change program behavior
    - Can easily damage mechanical parts
  - Logic analyzer
    - Not useful in highly integrated SoCs
  - Software instrumentation and profiling
    - Intrusive
    - Consumes excessive system resources
- Method proposed in this paper
  - TraceDo (Trace for Debug and Optimization)
  - Runs on the YHFT-QDSP system (Yinhe Feiteng, a 32-bit microprocessor)
  - Controlled through configuration
  - Low chip area

Slide 4: Related work
- An embedded debugging architecture for SoCs is introduced in [9]
- Triggers, filters and timestamps are mentioned to support debug of multiple processors [12]
- An on-chip events monitor at system level is introduced in [17]
- The CoreSight framework for ARM cores defines a multi-core debug and trace solution [2][3][4]
- The IEEE-ISTO NEXUS 5001 standard defines basic multiple-core debug support for embedded processors and external tools [15][13]
- Framework for on-chip real-time trace of multiprocessors [8] (graph)

Slide 5: TraceDo overview
- F bit (follow bit): indicates whether another byte follows in this message
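The follow-bit idea can be illustrated with a short sketch. The slide does not give the actual TraceDo byte layout, so the layout here is an assumption: the F bit is taken to be the most significant bit of each byte, the remaining 7 bits carry payload, and the least significant group is sent first. The function names are hypothetical.

```python
def encode_with_follow_bit(value):
    """Encode a non-negative integer as bytes, 7 payload bits per byte.

    Assumed layout (not taken from the slides): bit 7 is the F bit and is
    set when another byte follows; low-order payload groups come first.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        payload = value & 0x7F
        value >>= 7
        if value:
            out.append(0x80 | payload)  # F = 1: another byte follows
        else:
            out.append(payload)         # F = 0: last byte of the message
            return bytes(out)


def decode_with_follow_bit(data):
    """Decode one message; return (value, number_of_bytes_consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:             # F = 0 marks the end of the message
            return value, i + 1
    raise ValueError("message truncated: last byte still has F set")
```

With this layout, small values such as short branch offsets fit in one byte, while larger values grow one byte at a time, which is the usual motivation for a follow bit in variable-length trace messages.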

Slide 6: Trace Hardware

Slide 7: Path Trace
- Program Flow Change Model
- Branches and interrupts
  - Direct branches to constant address (BC)
  - Conditional direct branches to constant address (IBC)
  - Indirect branches to target address in register (BR)
  - Conditional indirect branches to target address in register (IBR)
  - Interrupt to interrupt service routine
- Configuration bits
  - Enable Bits
  - Force Bits
  - Degrade Bits

Slide 8: Messages
- Long Chart Message
- Short Chart Message
- Indirect Branch Message
- Synchronization Message
- Interrupt Message

Slide 9: Data Trace and Event Trace
- Data Trace
  - Data transfers by load/store
  - XOR compression
  - Synchronization message
- Event Trace
  - Pipeline stalls
  - Cache misses and DMA busy
  - Records counts of occurrences or of valid stall cycles at a configurable interval
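The slide names XOR compression without detailing it. A minimal sketch of one common form follows, under stated assumptions: each traced value is XORed with the previously traced value, and only the non-zero low-order bytes of the difference are kept, together with a byte count. The record format and function names are hypothetical and not taken from TraceDo.

```python
def xor_compress(values):
    """Compress a sequence of non-negative integers by XOR with the previous one.

    Returns a list of (byte_count, payload) records. Values that differ
    from their predecessor only in low-order bits need few payload bytes.
    """
    prev = 0
    records = []
    for v in values:
        diff = v ^ prev
        n = (diff.bit_length() + 7) // 8      # bytes needed for the difference
        records.append((n, diff.to_bytes(n, "little")))
        prev = v
    return records


def xor_decompress(records):
    """Rebuild the original values from the records of xor_compress."""
    prev = 0
    values = []
    for _, payload in records:
        prev ^= int.from_bytes(payload, "little")
        values.append(prev)
    return values


# Hypothetical load/store addresses that share their upper bits.
addresses = [0x10000000, 0x10000004, 0x10000004, 0x10000104]
records = xor_compress(addresses)
payload_bytes = sum(n for n, _ in records)
```

Because the decoder depends on every earlier value, a lost message would corrupt everything after it. A periodic message carrying a full value, such as the synchronization message listed on the slide, is one way to let the decoder restart.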

Slide 10: Trace Messages Combination and Timestamps
- Two-level combination
  - First level: sort trace messages from the three trace units into the trace FIFO
  - Second level: sort trace messages from the FIFOs into the trace port
- Arbitrator
  - Controls which message should be written into the FIFO
- FIFO
  - Eight input ports
  - One output port
  - Eight two-port registers
  - Overflow synchronization message
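The two-level combination can be modelled in software as two rounds of timestamp-ordered merging. This is only a behavioural sketch: the slide does not state the arbitration policy, so ordering by timestamp is an assumption, as are the `(timestamp, source)` message tuples and the `combine` helper. FIFO depth and overflow handling are not modelled.

```python
import heapq


def combine(streams):
    """Merge message streams, each already sorted by timestamp, into one list."""
    return list(heapq.merge(*streams, key=lambda message: message[0]))


# Hypothetical messages: (timestamp, source).
path0 = [(1, "core0.path"), (7, "core0.path")]
data0 = [(3, "core0.data")]
event0 = [(5, "core0.event")]
path1 = [(2, "core1.path"), (6, "core1.path")]

# First level: the trace units of each core feed that core's trace FIFO.
fifo0 = combine([path0, data0, event0])
fifo1 = combine([path1])

# Second level: the per-core FIFOs feed the single trace port.
port = combine([fifo0, fifo1])
```

Merging in two stages keeps each stage small: the first only has to order the trace units of one core, and the second only has to order one stream per core.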

Slide 11: Trace Port
- Transfers message packets to the emulator

Slide 12
The configurable trace functions are programmed through configuration registers.
- Accessed by JTAG instructions
- Can also be accessed by CPU instructions
NOP_config
- A no-operation instruction
- High instruction ratio
- Used to communicate with the on-chip trace hardware

Slide 13
- Uses Verilog-HDL models
- Benchmark

Slide 14: Compression of path trace

Slide 15: Compression of path trace using degrade bits

Slide 16
TraceDo is modular and scalable, and it records:
- Program path
- Data access
- Pipeline stalls
With several novel methods, TraceDo can effectively reduce the quantity of trace messages and supports a configurable tradeoff between precision and bandwidth.
Future work
- Degrade bits, which were selected manually in the benchmarks, will be selected automatically by tools