1
Presenter: Shao-Jay Hou
2
In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.
3
MPSoCs run multi-threaded programs.
Traditional debug methods cannot be used.
A non-invasive method (on-chip emulation) is a good approach.
It produces an immense amount of data that must be either stored on-chip or transferred off-chip in real time.
- Trace of a 32-bit processor at 1 clock per instruction and 100 MHz: 400 MB/s of data
The data needs to be compressed.
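The 400 MB/s figure follows from one 4-byte address per instruction at 100 million instructions per second. A minimal sketch of that arithmetic in Python (the function name and parameters are illustrative, not from the paper):

```python
def raw_trace_bandwidth(address_bits: int, clock_hz: float,
                        cycles_per_instruction: float = 1.0) -> float:
    """Return the uncompressed address-trace rate in bytes per second."""
    bytes_per_address = address_bits / 8
    instructions_per_second = clock_hz / cycles_per_instruction
    return bytes_per_address * instructions_per_second


# 32-bit addresses, 100 MHz clock, 1 clock per instruction
rate = raw_trace_bandwidth(32, 100e6)
print(f"{rate / 1e6:.0f} MB/s")  # prints "400 MB/s"
```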
4
Trace compression schemes
Compression methods:
- Compression algorithms [5]
- Lempel-Ziv (LZ) [18]
- Multi-stage compression [11]
- DMTF [17]
- Combined MTF and LZ [1]
- This paper
Some example tools:
- MCDS [12]
- ARM ETM [2]
7
Why?
- Instructions execute consecutively until a branch is reached
- Branch target address
How?
- Divide each run into two parts
  - address
  - length
Example:
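The address/length split can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's hardware: the function names are hypothetical, and the 4-byte instruction size is assumed from the 32-bit fixed-instruction-length processor used in the experiments.

```python
def eliminate_consecutive(addresses, instr_size=4):
    """Collapse runs of sequential instruction addresses into
    (start_address, length) records. A new record starts whenever
    the next address is not the sequential successor (a taken branch)."""
    runs = []
    for addr in addresses:
        if runs and addr == runs[-1][0] + runs[-1][1] * instr_size:
            runs[-1][1] += 1          # still sequential: extend the run
        else:
            runs.append([addr, 1])    # branch target: start a new run
    return [tuple(run) for run in runs]


def expand_runs(runs, instr_size=4):
    """Inverse operation: rebuild the full address trace."""
    return [start + i * instr_size for start, length in runs
            for i in range(length)]


trace = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100]
records = eliminate_consecutive(trace)
# Six addresses become three records:
# (0x100, 3), (0x200, 2), (0x100, 1)
```

Only the branch targets and run lengths survive, so long straight-line code sequences shrink to a single record.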
9
Why?
- A branch is either taken or not taken
- Sequential locality
How?
- Similar to a cache
  - Miss the first time a set of instructions is encountered
  - Hit on every subsequent encounter that matches the prediction
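A minimal Python sketch of this cache-like behaviour is shown below. It assumes a first-order predictor that is keyed on the previous record and stored in an unbounded dictionary; the real table organisation, its size, and the symbol format are not given on the slide, so `predict_encode`, `predict_decode`, and the `"hit"`/`"miss"` symbols are all hypothetical.

```python
def predict_encode(records):
    """Replace each correctly predicted record with a short 'hit' symbol.
    The table maps the previous record to the record that followed it
    last time (an assumed, first-order prediction scheme)."""
    table, prev, out = {}, None, []
    for rec in records:
        if table.get(prev) == rec:
            out.append(("hit",))          # prediction matched
        else:
            out.append(("miss", rec))     # first encounter or misprediction
            table[prev] = rec             # learn the new successor
        prev = rec
    return out


def predict_decode(symbols):
    """Rebuild the records by maintaining the same table as the encoder."""
    table, prev, records = {}, None, []
    for sym in symbols:
        if sym[0] == "hit":
            rec = table[prev]
        else:
            rec = sym[1]
            table[prev] = rec
        records.append(rec)
        prev = rec
    return records


loop = [(0x100, 3), (0x200, 2)] * 3   # a loop body executed three times
encoded = predict_encode(loop)
# The first pass misses; once the loop pattern is learned, every record hits.
```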
11
Why?
- MTF
  - Increases the relevance of the data: recently seen values move to the front
- Prefix
  - Assists the differential compression
How?
- Take the input address and the predicted address
- Apply differential compression
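One way to picture the combination of a move-to-front list, a prefix, and differential encoding is the Python sketch below. Everything specific in it is an assumption made for illustration: the list depth `DEPTH`, the use of XOR as the difference, the use of the front of the list as the predicted address, and the prefix numbering (values below `DEPTH` are list hits, higher values give the number of difference bytes that follow). The slide does not specify the actual encoding.

```python
DEPTH = 4  # hypothetical depth of the move-to-front list


def encode_address(addr, recent):
    """Return (prefix, payload) for one 32-bit address and update `recent`."""
    if addr in recent:
        idx = recent.index(addr)
        recent.insert(0, recent.pop(idx))      # move to front
        return idx, b""                        # hit: the prefix alone suffices
    base = recent[0] if recent else 0          # assumed predicted address
    diff = addr ^ base                         # assumed difference operator
    nbytes = max(1, (diff.bit_length() + 7) // 8)
    recent.insert(0, addr)
    del recent[DEPTH:]                         # keep the list bounded
    return DEPTH + nbytes - 1, diff.to_bytes(nbytes, "little")


def decode_address(prefix, payload, recent):
    """Inverse of encode_address; `recent` must mirror the encoder's list."""
    if prefix < DEPTH:
        addr = recent.pop(prefix)
        recent.insert(0, addr)
        return addr
    base = recent[0] if recent else 0
    addr = base ^ int.from_bytes(payload, "little")
    recent.insert(0, addr)
    del recent[DEPTH:]
    return addr


addresses = [0x1000, 0x1040, 0x1000, 0x80001000, 0x1040]
enc_list, dec_list = [], []
encoded = [encode_address(a, enc_list) for a in addresses]
decoded = [decode_address(p, d, dec_list) for p, d in encoded]
```

In this sketch a record is at most one prefix byte plus four difference bytes, and addresses close to the predicted one need only a single difference byte.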
13
Why?
- Prefix byte compression
- Prefix values occur with unequal probability
How?
- Huffman encoding
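Huffman encoding gives the most frequent prefix values the shortest codes. The Python sketch below builds a code table from observed prefix frequencies; the sample prefix values are made up for illustration, and a hardware implementation would more likely use a fixed, precomputed table.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                      # degenerate case: one symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table)
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical prefix stream: value 0 dominates, value 7 is rare.
prefixes = [0] * 8 + [1] * 4 + [4] * 2 + [7]
code = huffman_code(prefixes)
total_bits = sum(len(code[p]) for p in prefixes)
# 15 prefixes cost 25 bits here instead of 15 * 8 = 120 bits.
```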
15
Why?
- The data coming from the MTF/AE stage is 5 bytes wide
- But the output to the LZ stage is 1 byte wide
How?
- Use a small buffer to hold the data
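The width mismatch can be modelled as a small FIFO that accepts a multi-byte record and releases one byte per cycle. The following Python sketch is a behavioural model only; the class name, the capacity, and the stall-on-full policy are assumptions rather than details from the paper.

```python
from collections import deque


class ByteSerializer:
    """Small FIFO between a wide producer and a one-byte-wide consumer."""

    def __init__(self, capacity=16):
        self.fifo = deque()
        self.capacity = capacity            # hypothetical buffer size in bytes

    def can_accept(self, nbytes):
        return len(self.fifo) + nbytes <= self.capacity

    def push(self, record: bytes):
        """Store one record (up to 5 bytes) from the upstream stage."""
        if not self.can_accept(len(record)):
            raise OverflowError("buffer full; upstream stage must stall")
        self.fifo.extend(record)

    def pop(self):
        """Release one byte per call, or None when the buffer is empty."""
        return self.fifo.popleft() if self.fifo else None


serializer = ByteSerializer(capacity=8)
serializer.push(b"\x05\x00\x10")            # prefix plus two payload bytes
serializer.push(b"\x01")                    # prefix only
drained = [serializer.pop() for _ in range(5)]
```

Because many records are shorter than 5 bytes, the one-byte output can keep up on average as long as the buffer absorbs short bursts.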
17
Why?
- The input data is highly repetitive
How?
- Use LZ compression
  - Create a dictionary to store the repeated parts
  - But do not output the dictionary
  - During decompression, create an identical dictionary
- Do not output every cycle
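The slide does not say which LZ variant is used. As an illustration of the properties listed (a dictionary that is never transmitted, rebuilt identically by the decompressor, with output produced only when a match ends), here is a sketch using LZW, a well-known member of the LZ family. It is a stand-in and should not be read as the paper's exact algorithm.

```python
def lzw_compress(data: bytes):
    """Emit one dictionary code each time the current match cannot be extended."""
    dictionary = {bytes([i]): i for i in range(256)}
    current, codes = b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep matching, no output
        else:
            codes.append(dictionary[current])        # output only on a mismatch
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the same dictionary from the code stream alone."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):                # phrase defined by this step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid code")
        out += entry
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return bytes(out)


data = b"\x04\x40\x01" * 20          # a repetitive byte stream
codes = lzw_compress(data)
restored = lzw_decompress(codes)
```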
18
Benchmark: MiBench
CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455, a 32-bit fixed-instruction-length processor, running Linux SMP kernel 2.6.32-24
Simulation software: ModelSim SE-64 v6.5c
19
[Figures: logic utilization; usage scenario (JTAG, software fault, 10^-3)]
20
This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs.
It achieves a better compression ratio than the other schemes.
21
The paper uses a dictionary-based, multi-stage compression method that can be used to improve our tracer.
The paper gives inspiration for future work on our tracer.
[Diagram labels: CPU, GPU, Bus, B.T., P.T., T.M.]