Simulation of Decode Filter Cache using SimpleScalar simulator Presented by Fei Hong
Motivation & Goals
Instruction fetches and decodes are among the major on-chip power consumers.
Reduce power consumption by reducing the number of instruction fetches and decodes.
Simulate the DFC (Decode Filter Cache) architecture in SimpleScalar to test the performance of the DFC.
Prediction Mechanism
Each sector in the DFC has the fields (tag, sector_valid, next_address).
If A and C are not equal, a different control path will be taken: tag(A) != tag(C). (1)
A and B are accessed consecutively; if they belong to a small loop, tag(A) == tag(B). (2)
Based on (1) and (2), the prediction for the next fetch is: tag(C) == tag(B). (3)
Working Process
The Platform
Host computer: ACPI x86-based PC
Host operating system: Microsoft Windows Vista Ultimate
Virtual machine: VMware Workstation 6.03
Guest Linux OS: Fedora Core 6
Simulator: SimpleScalar 3.0
Work done so far…
Set up the platform
Read the SimpleScalar source code
Applied my DFC structure and working process to SimpleScalar
Found benchmarks and compiled them on the platform
Ran simulations with the given memory hierarchy parameters
MiBench
dijkstra: constructs a large graph in an adjacency-matrix representation, then calculates the shortest path between every pair of nodes using repeated applications of Dijkstra's algorithm.
stringsearch: searches for given words in phrases using a case-insensitive comparison algorithm.
rijndael encrypt/decrypt: Rijndael was selected as the National Institute of Standards and Technology's Advanced Encryption Standard (AES).
CRC32: performs a 32-bit Cyclic Redundancy Check (CRC) on a file; CRC checks are often used to detect errors in data transmission.
Memory hierarchy parameters

Parameter     Value
Instr. size   4B
DFC           direct-mapped, 32 sectors, 4 decoded instr. per sector, 8B per decoded instr.
L1 I-cache    16KB, 2-way, 32B line, 1-cycle hit latency
L1 D-cache    8KB, 2-way, 32B line, 1-cycle hit latency
Memory        30-cycle latency
Simulation results: % reduction in instruction fetches and decodes
Simulation results: prediction hit rate
Simulation results

Metric         dijkstra     stringsearch  rijndael     CRC32
sim_num_insn   255620304    4437612       391487315    533385529
il1.accesses   43508918     1605417       23616020     9972328
il1.hits       43399500     1568976       22869432     4971600
il1.misses     109418       36441         746588       5728
il1.miss_rate  0.0025       0.0227        0.0316       0.0007
dfc.accesses   215740165    3269067       232531480    532674172
dfc.hits       212111386    2832195       155327106    532413201
dfc.misses     3628779      436872        77204374     260971
dfc.miss_rate  0.0168       0.1336        0.3320       0.0005
Conclusion
The DFC stores decoded instructions and can be very small and energy-efficient.
A DFC hit eliminates both the access to the much larger instruction cache and the entire decode step.
The simulation results show that most instruction fetches and decodes can be eliminated by using the DFC, making it an effective way to reduce the power consumption of embedded processors.
Thank you!