Presenter: Shao-Jay Hou
In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.
MPSoCs run multi-threaded programs
Traditional debug methods cannot be used
A non-invasive method (on-chip emulation) is a good approach
Tracing produces an immense amount of data that must be either stored on-chip or transferred off-chip in real time
  - Trace of a 32-bit processor, 1 clock per instruction, 100 MHz: 400 MB/s of data
The data needs to be compressed
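The 400 MB/s figure follows from simple arithmetic. A minimal sketch, assuming one 32-bit address is traced per instruction (the function name and parameters are illustrative, not taken from the paper):

```python
def trace_bandwidth_bytes_per_s(address_bits: int,
                                clock_hz: float,
                                cycles_per_instruction: float = 1.0) -> float:
    """Raw trace rate if one address is emitted per executed instruction."""
    bytes_per_address = address_bits / 8
    instructions_per_s = clock_hz / cycles_per_instruction
    return bytes_per_address * instructions_per_s


# 32-bit addresses, 100 MHz clock, 1 clock per instruction
rate = trace_bandwidth_bytes_per_s(32, 100e6)
print(f"{rate / 1e6:.0f} MB/s")  # 4 bytes * 100e6 instructions/s = 400 MB/s
```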
[Diagram: trace compression schemes, with the position of this paper marked]
Compression methods: compression algorithms [5]; combined MTF and LZ [1]; DMTF [17]; multi-stage compression [11]; Lempel-Ziv (LZ) [18]
Some example tools: MCDS [12]; ARM ETM [2]
Why?
  Instructions execute consecutively until a branch is reached
  A new run of instructions starts at the branch target address
How?
  Each trace entry is divided into two parts
    - address
    - length
Example: [figure not reproduced]
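The address/length split can be sketched as below. This is an illustration only: the function names and the fixed 4-byte instruction step are assumptions (the step matches a 32-bit fixed instruction-length processor), not the paper's hardware design.

```python
from typing import Iterable, List, Tuple

INSTRUCTION_BYTES = 4  # assumption: fixed 32-bit instructions


def eliminate_consecutive(addresses: Iterable[int],
                          step: int = INSTRUCTION_BYTES) -> List[Tuple[int, int]]:
    """Collapse each run of consecutive addresses into (start address, length)."""
    runs: List[Tuple[int, int]] = []
    start = None
    length = 0
    for addr in addresses:
        if start is not None and addr == start + length * step:
            length += 1                      # still executing sequentially
        else:
            if start is not None:
                runs.append((start, length))  # a branch ended the previous run
            start, length = addr, 1           # new run begins at the branch target
    if start is not None:
        runs.append((start, length))
    return runs


def expand_runs(runs: Iterable[Tuple[int, int]],
                step: int = INSTRUCTION_BYTES) -> List[int]:
    """Inverse operation: rebuild the full address trace."""
    return [start + i * step for start, length in runs for i in range(length)]


trace = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004, 0x1000]
runs = eliminate_consecutive(trace)
print(runs)  # [(4096, 3), (8192, 2), (4096, 1)]
```

Six addresses become three (address, length) pairs, and `expand_runs(runs)` restores the original trace.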
Why?
  A branch is either taken or not taken
  Sequential locality
How?
  Similar to a cache
    - Miss the first time a set of instructions is encountered
    - Hit for every subsequent encounter that matches the prediction
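The cache-like hit/miss behaviour can be sketched with a small prediction table. This is a hedged illustration: the table organisation, the `order` parameter (must be at least 1) and the token format are assumptions, not the paper's microarchitecture.

```python
from typing import Hashable, List, Optional, Tuple

Token = Tuple[str, Optional[Hashable]]


def predict_encode(symbols: List[Hashable], order: int = 1) -> List[Token]:
    """Emit ('hit', None) when the table predicts the symbol, else ('miss', symbol)."""
    table = {}     # context (last `order` symbols) -> symbol that followed it last time
    history = ()
    out: List[Token] = []
    for sym in symbols:
        if history in table and table[history] == sym:
            out.append(("hit", None))        # prediction matched: nothing to send
        else:
            out.append(("miss", sym))        # first encounter or wrong prediction
            table[history] = sym
        history = (history + (sym,))[-order:]
    return out


def predict_decode(tokens: List[Token], order: int = 1) -> List[Hashable]:
    """Rebuild the symbols by maintaining the same table as the encoder."""
    table = {}
    history = ()
    out = []
    for kind, value in tokens:
        if kind == "hit":
            sym = table[history]
        else:
            sym = value
            table[history] = sym
        out.append(sym)
        history = (history + (sym,))[-order:]
    return out


# A loop body made of two instruction runs, executed three times
runs = [(0x1000, 3), (0x2000, 2)] * 3
tokens = predict_encode(runs)
print([kind for kind, _ in tokens])
# ['miss', 'miss', 'miss', 'hit', 'hit', 'hit']
```

The first pass through the loop misses; later passes hit as long as the same sequence repeats.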
Why?
  MTF (move-to-front)
    - Increases the relevance
  Prefix
    - Assists differential compression
How?
  Input address and predicted address
  Differential compression
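A rough sketch of the two ideas on this slide: a move-to-front list, and a differential encoding of the input address against a predicted address with a prefix byte. The byte layout is an assumption made for illustration (XOR as the difference, prefix = number of bytes that follow); the paper's actual format may differ.

```python
from typing import List, Tuple


def mtf_encode(values: List[int]) -> List[Tuple[str, int]]:
    """Move-to-front: recently seen values get small indices."""
    recent: List[int] = []
    out: List[Tuple[str, int]] = []
    for v in values:
        if v in recent:
            idx = recent.index(v)
            out.append(("index", idx))   # position in the recency list
            recent.pop(idx)
        else:
            out.append(("new", v))       # never seen: send the value itself
        recent.insert(0, v)              # move (or add) to the front
    return out


def diff_encode(address: int, predicted: int) -> bytes:
    """Prefix byte (count of significant bytes) + XOR difference, little-endian."""
    delta = (address ^ predicted) & 0xFFFFFFFF
    n = (delta.bit_length() + 7) // 8    # 0..4 bytes actually needed
    return bytes([n]) + delta.to_bytes(n, "little")


def diff_decode(encoded: bytes, predicted: int) -> int:
    n = encoded[0]
    delta = int.from_bytes(encoded[1:1 + n], "little")
    return predicted ^ delta


print(mtf_encode([0x1000, 0x2000, 0x1000, 0x1000]))
# [('new', 4096), ('new', 8192), ('index', 1), ('index', 0)]
print(diff_encode(0x00401008, 0x00401000))  # b'\x01\x08'
```

With this layout a 32-bit address never needs more than 5 bytes (1 prefix byte plus 4 difference bytes), and a good prediction shrinks it to 1 or 2 bytes.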
Why?
  Prefix byte compression
  Probability of each prefix value
How?
  Huffman encoding
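Huffman encoding assigns shorter codes to more probable prefix values. A minimal software sketch, assuming a made-up prefix distribution (the sample counts below are hypothetical, not measured data):

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free code; frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap items: (count, unique tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols: Iterable[Hashable], code: Dict[Hashable, str]) -> str:
    return "".join(code[s] for s in symbols)


def huffman_decode(bits: str, code: Dict[Hashable, str]) -> List[Hashable]:
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:                # prefix-free, so first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical prefix bytes: small prefixes are far more common
prefixes = [0] * 8 + [1] * 4 + [2] * 2 + [4] * 1
code = huffman_code(prefixes)
bits = huffman_encode(prefixes, code)
print(len(bits), "bits instead of", 8 * len(prefixes))  # 25 bits instead of 120
```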
Why?
  The data coming from the MTF/AE stage is 5 bytes wide
  But the output to the LZ stage is 1 byte wide
How?
  Use a small buffer to hold the data
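The width conversion can be modelled in software as a small FIFO that accepts up to 5 bytes at once and releases 1 byte at a time. The class name, the capacity of 16 bytes and the back-pressure behaviour are assumptions for illustration only.

```python
from collections import deque
from typing import Optional

MAX_WORD_BYTES = 5  # width of the data arriving from the previous stage


class Serializer:
    """Small FIFO: accepts words of up to 5 bytes, emits one byte per call."""

    def __init__(self, capacity: int = 16):  # capacity is a hypothetical value
        self.capacity = capacity
        self.fifo = deque()

    def push(self, word: bytes) -> bool:
        """Store a word; return False (stall the producer) if it does not fit."""
        if len(word) > MAX_WORD_BYTES:
            raise ValueError("word wider than the input port")
        if len(self.fifo) + len(word) > self.capacity:
            return False
        self.fifo.extend(word)
        return True

    def pop_byte(self) -> Optional[int]:
        """Return the next byte for the following stage, or None if empty."""
        return self.fifo.popleft() if self.fifo else None


s = Serializer()
s.push(b"\x01\x08")                  # a short word: only 2 of the 5 bytes are used
s.push(b"\x04\xef\xbe\xad\xde")      # a full 5-byte word
stream = []
while (b := s.pop_byte()) is not None:
    stream.append(b)
print(stream)  # [1, 8, 4, 239, 190, 173, 222]
```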
Why?
  The input data is highly repetitive
How?
  Use LZ compression
    - Create a dictionary to store the repeated parts
    - But don't output the dictionary
    - During decompression, build an identical dictionary
  Output is not produced every cycle
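The dictionary idea can be illustrated with LZW, a well-known member of the LZ family. It is used here as a stand-in and is not necessarily the variant implemented in the paper. The compressor never transmits its dictionary; the decompressor rebuilds the same one from the codes it receives.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Emit a code only when the current match can no longer be extended."""
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes: List[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending: no output yet
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the identical dictionary while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):                # phrase defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return bytes(out)


data = b"\x01\x08" * 4                 # a repetitive byte stream
codes = lzw_compress(data)
print(codes)                            # [1, 8, 256, 258, 8]
```

Eight input bytes produce five codes here, and longer repetitive streams compress further as the dictionary fills with longer phrases.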
Benchmark: MiBench
CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455, 32-bit fixed instruction-length processor, Linux SMP kernel
Simulation software: ModelSim SE-64 v6.5c
Logic utilization
Usage scenario: JTAG, software fault, 10 -3
This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs.
It achieves a better compression ratio than other schemes.
The paper uses a dictionary-based, multi-stage compression method that can be used to improve our tracer.
The paper gives inspiration for future work on our tracer.
[Diagram labels: CPU, GPU, Bus, B.T., P.T., T.M.]