Simulation of Decode Filter Cache using SimpleScalar simulator Presented by Fei Hong.


Motivation & Goals
Instruction fetches and decodes are the major on-chip power consumers.
Reduce power consumption by reducing instruction fetches and decodes.
Simulate the DFC architecture using SimpleScalar.
Test the performance of the DFC.

Prediction Mechanism
Each sector in the DFC has the following fields: (tag, sector_valid, next_address).
If A is not equal to C, a different control path will be taken: tag(A) != tag(C)   (1)
A and B are consecutively accessed. If they belong to a small loop: tag(A) == tag(B)   (2)
Based on (1) and (2), the prediction for the next fetch is: tag(C) == tag(B)   (3)
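A minimal sketch of how the three sector fields could support a direct-mapped lookup. It is an illustration under assumptions, not the code added to SimpleScalar: the geometry (32 sectors, 4 instructions per sector, 4-byte instructions) is taken from the memory hierarchy parameters slide, the names Sector, split and DecodeFilterCache are hypothetical, a whole sector is treated as valid after a fill, and the next_address-based prediction rule itself is not modeled.

```python
from dataclasses import dataclass

INSTR_BYTES = 4            # instruction size (parameters slide)
INSTRS_PER_SECTOR = 4      # decoded instructions per sector
NUM_SECTORS = 32           # direct-mapped DFC with 32 sectors
SECTOR_SPAN = INSTR_BYTES * INSTRS_PER_SECTOR  # bytes of code covered by one sector


@dataclass
class Sector:
    tag: int = 0
    sector_valid: bool = False
    next_address: int = 0  # kept for the prediction step, unused in this sketch


def split(addr):
    """Return (tag, index) of the sector covering instruction address addr."""
    sector_number = addr // SECTOR_SPAN
    return sector_number // NUM_SECTORS, sector_number % NUM_SECTORS


class DecodeFilterCache:
    def __init__(self):
        self.sectors = [Sector() for _ in range(NUM_SECTORS)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up addr; on a miss, refill the sector. Returns True on a hit."""
        tag, index = split(addr)
        sector = self.sectors[index]
        if sector.sector_valid and sector.tag == tag:
            self.hits += 1
            return True
        # Assumption: the fill validates the whole sector at once.
        self.misses += 1
        sector.tag, sector.sector_valid = tag, True
        return False
```

Fetching four consecutive instructions from one sector gives one miss followed by three hits, which is the small-loop behavior the DFC is meant to exploit.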

Working Process

The Platform
Host computer: ACPI x86-based PC
Host computer operating system: Microsoft Windows Vista Ultimate
Virtual machine: VMware Workstation version 6.03
Linux operating system: Fedora Core 6
Simulator: SimpleScalar version 3.0

Work done so far
Set up the platform.
Read the source code of SimpleScalar.
Applied my DFC structure and working process to SimpleScalar.
Found benchmarks and compiled them on the platform.
Ran simulations using the given memory hierarchy parameters.

MiBench
dijkstra: constructs a large graph in an adjacency matrix representation and then calculates the shortest path between every pair of nodes using repeated applications of Dijkstra's algorithm.
stringsearch: searches for given words in phrases using a case-insensitive comparison algorithm.
rijndael encrypt/decrypt: Rijndael was selected as the National Institute of Standards and Technology Advanced Encryption Standard (AES).
CRC32: performs a 32-bit Cyclic Redundancy Check (CRC) on a file. CRC checks are often used to detect errors in data transmission.

Memory hierarchy parameters
Parameter     Value
Instr. size   4B
DFC           direct-mapped, 32 sectors, 4 decoded instr. per sector, 8B per decoded instr.
L1 I-cache    16KB, 2-way, 32B line, 1-cycle hit latency
L1 D-cache    8KB, 2-way, 32B line, 1-cycle hit latency
Memory        30-cycle latency
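The sizes implied by these parameters can be worked out with a few lines of arithmetic. The sketch below only multiplies the numbers listed on the slide; the function names are hypothetical, and the set counts assume the usual relation sets = size / (ways x line size).

```python
INSTR_BYTES = 4             # instruction size
DFC_SECTORS = 32            # direct-mapped DFC sectors
DFC_INSTRS_PER_SECTOR = 4   # decoded instructions per sector
DECODED_INSTR_BYTES = 8     # storage per decoded instruction


def dfc_geometry():
    """Derive DFC capacity figures from the listed parameters."""
    instructions = DFC_SECTORS * DFC_INSTRS_PER_SECTOR
    return {
        "instructions": instructions,                         # decoded instr. held
        "decoded_bytes": instructions * DECODED_INSTR_BYTES,  # DFC data storage
        "code_bytes_covered": instructions * INSTR_BYTES,     # code footprint that fits
    }


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


if __name__ == "__main__":
    print(dfc_geometry())
    print("L1 I-cache sets:", cache_sets(16 * 1024, 2, 32))
    print("L1 D-cache sets:", cache_sets(8 * 1024, 2, 32))
```

With these parameters the DFC holds 128 decoded instructions in 1024 bytes of data storage and covers 512 bytes of code, compared with a 16KB L1 I-cache of 256 sets.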

Simulation results: % reduction in instruction fetches and decodes

Simulation results: prediction hit rate

Simulation results
                dijkstra    stringsearch  rijndael    CRC32
sim_num_insn    255620304   4437612       391487315   533385529
il1.accesses    43508918    1605417       236160209   972328
il1.hits        43399500    1568976       228694324   971600
il1.misses      109418      36441         7465885     728
il1.miss_rate   0.0025      0.0227        0.0316      0.0007
dfc.accesses    215740165   3269067       232531480   532674172
dfc.hits        212111386   2832195       155327106   532413201
dfc.misses      3628779     436872        77204374    260971
dfc.miss_rate   0.0168      0.1336        0.3320      0.0005
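The counters on this slide can be cross-checked and turned into a per-benchmark summary. The sketch below is an illustration, not part of the original simulation: the per-benchmark values come from separating the digits of the results slide so that hits + misses = accesses in every column, and the "fraction served by the DFC" (dfc.hits / sim_num_insn) is only one plausible reading of the reduction in fetches and decodes, assuming each DFC hit avoids one L1 I-cache access and one decode.

```python
# (sim_num_insn, il1_accesses, il1_hits, il1_misses,
#  dfc_accesses, dfc_hits, dfc_misses) per benchmark.
# Assumption: values separated from the slide so that every column is
# internally consistent (hits + misses == accesses).
FIELDS = ("sim_num_insn", "il1_accesses", "il1_hits", "il1_misses",
          "dfc_accesses", "dfc_hits", "dfc_misses")
RAW = {
    "dijkstra": (255620304, 43508918, 43399500, 109418,
                 215740165, 212111386, 3628779),
    "stringsearch": (4437612, 1605417, 1568976, 36441,
                     3269067, 2832195, 436872),
    "rijndael": (391487315, 236160209, 228694324, 7465885,
                 232531480, 155327106, 77204374),
    "CRC32": (533385529, 972328, 971600, 728,
              532674172, 532413201, 260971),
}
RESULTS = {name: dict(zip(FIELDS, values)) for name, values in RAW.items()}


def is_consistent(c):
    """Check the three identities the counters satisfy in this data set."""
    return (c["il1_hits"] + c["il1_misses"] == c["il1_accesses"]
            and c["dfc_hits"] + c["dfc_misses"] == c["dfc_accesses"]
            and c["sim_num_insn"] - c["dfc_hits"] == c["il1_accesses"])


def dfc_served_fraction(c):
    """Fraction of executed instructions supplied by the DFC."""
    return c["dfc_hits"] / c["sim_num_insn"]


if __name__ == "__main__":
    for name, c in RESULTS.items():
        print(f"{name:13s} consistent={is_consistent(c)} "
              f"served_by_dfc={dfc_served_fraction(c):.1%}")
```

Under that reading, the DFC supplies about 83% of instructions for dijkstra, 64% for stringsearch, 40% for rijndael and 99.8% for CRC32; the third identity (instructions minus DFC hits equals L1 I-cache accesses) holds exactly for all four benchmarks.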

Conclusion
The DFC stores decoded instructions and can be very small and energy-efficient.
Using the DFC eliminates both the access to a much larger instruction cache and the entire decoding step.
The simulation results show that most instruction fetches and decodes can be eliminated by using the DFC.
It is therefore a very efficient way to reduce the power consumption of embedded processors.

Thank you!
