1
Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores
Tony Givargis, Frank Vahid*
Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
Joerg Henkel
NEC C&C Research, Princeton, New Jersey
This work was supported by the National Science Foundation under grant # CCR-9876006, and by a Design Automation Conference graduate scholarship.
2
System-on-a-chip (SOC)
Want to explore alternative cores, parameter settings, and applications
Gate/RT-level simulation too slow
[Figure: SOC with microprocessor, cache, memory, bridge, Peripheral1, and Peripheral2, running Application1 or Application2; core database offering Peripheral1, Peripheral2_a, Peripheral2_b, ….]
3
SOC System-level Power Estimation
Microprocessor:
  Tiwari/Malik/Wolfe 94: Instruction set simulator
  Marculescu/Pedram 96: Instruction trace reduction
Plus cache, memory & bus:
  Simunic/Benini/DeMicheli 99: Extended instruct. simulator
  Givargis/Vahid/Henkel 99: Trace reductions
Still need system-level method for peripherals:
  3-step method
[Figure: SOC system-level model and SOC gate-level model, each with microprocessor, cache, memory, bridge, peripheral, and application.]
4
Core Provider’s Step 1: Instruction-based System-Level Model Creation
System simulation model already commonly used, and required in VSIA standard
Executes ~1000x faster than gate-level model
UART system-level model:
  Reset() …
  Enable_tx() …
  Enable_rx() …
  Send() …
  Receive() …
[Figure: core database containing UART, JPEG decode, ….]
5
Core Provider’s Step 2: Low-level Per-instruction Power Evaluation
Measure power of gate/layout model, per instruction
Use unique testbench per instruction, may take hours/days
Low-level model differentiates cores from other SOC modules, enabling accurate power estimation
UART energy per instruction:
  Reset      13 µJ
  Enable_tx  23 µJ
  Enable_rx  18 µJ
  Send       76 µJ
  Receive    44 µJ
Must account for core parameters
UART energy per instruction, by buffer size (columns: 2 bytes, 4 bytes, 8 bytes, 16 bytes):
  Reset      13 µJ, 14 µJ
  Enable_tx  23 µJ, 25 µJ, 24 µJ
  Enable_rx  18 µJ, 19 µJ
  Send       76 µJ, 77 µJ, 89 µJ, 115 µJ
  Receive    44 µJ, 49 µJ, 55 µJ, 64 µJ
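Accounting for a core parameter amounts to indexing the measured energy by both the instruction and the parameter value. The following is a minimal sketch, not the authors' implementation: the names `UART_ENERGY_UJ` and `instruction_energy_uj` are hypothetical, and only the Send and Receive rows are used because they are the only rows in the table that show a value for all four buffer sizes.

```python
# Hypothetical lookup table: energy in microjoules, keyed by instruction name
# and then by buffer size in bytes. Values are the Send and Receive rows of
# the slide's table; the other rows are left out because the transcript does
# not show which buffer sizes their values belong to.
UART_ENERGY_UJ = {
    "Send":    {2: 76, 4: 77, 8: 89, 16: 115},
    "Receive": {2: 44, 4: 49, 8: 55, 16: 64},
}


def instruction_energy_uj(instruction: str, buffer_bytes: int) -> int:
    """Return the measured energy of one instruction at one buffer size."""
    try:
        return UART_ENERGY_UJ[instruction][buffer_bytes]
    except KeyError:
        # Unknown instruction or a buffer size that was never characterized.
        raise ValueError(
            f"no measurement for {instruction!r} at {buffer_bytes} bytes"
        ) from None


print(instruction_energy_uj("Send", 16))  # 115
```

Raising on an uncharacterized configuration, rather than interpolating, keeps the system model from reporting energy the core provider never measured.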
6
Core Provider’s Step 3: Back Annotation of System Model
UART energy per instruction:
  Reset      13 µJ
  Enable_tx  23 µJ
  Enable_rx  18 µJ
  Send       76 µJ
  Receive    44 µJ
Back-annotated UART system-level model:
  Reset() … uJtot += 13
  Enable_tx() … uJtot += 23
  Enable_rx() … uJtot += 18
  Send() … uJtot += 76
  Receive() … uJtot += 44
[Figure: core database containing UART, JPEG decode, ….]
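Back annotation can be pictured as each instruction of the system-level model adding its measured energy to a running total. The sketch below is illustrative only: the class `UartModel` and its attribute `uj_tot` are hypothetical names, and the functional behavior of each instruction is left out, as it is on the slide.

```python
class UartModel:
    """Hypothetical system-level UART model with back-annotated energy."""

    # Per-instruction energy in microjoules, taken from the slide's table.
    ENERGY_UJ = {
        "reset": 13,
        "enable_tx": 23,
        "enable_rx": 18,
        "send": 76,
        "receive": 44,
    }

    def __init__(self) -> None:
        self.uj_tot = 0  # accumulated energy in microjoules

    def _charge(self, instruction: str) -> None:
        # The back-annotation step: add the measured energy to the total.
        self.uj_tot += self.ENERGY_UJ[instruction]

    # Each method would also perform the instruction's functional behavior,
    # which is omitted here.
    def reset(self) -> None:
        self._charge("reset")

    def enable_tx(self) -> None:
        self._charge("enable_tx")

    def enable_rx(self) -> None:
        self._charge("enable_rx")

    def send(self) -> None:
        self._charge("send")

    def receive(self) -> None:
        self._charge("receive")


uart = UartModel()
uart.reset()
uart.enable_tx()
uart.send()
print(uart.uj_tot)  # 13 + 23 + 76 = 112
```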
7
Core “Power Modes” Require Extra Effort by Core Provider
Unlike a microprocessor, certain peripheral core instructions can greatly modify the power consumption of other instructions
Must create power mode transition function, and measure power per instruction per mode
Mode transitions:
  Mode 1 (Idle) to Mode 2 (Enabled): Enable_tx or Enable_rx
  Mode 2 (Enabled) to Mode 1 (Idle): Reset
UART energy per instruction per mode, by buffer size (columns: 2 bytes, 4 bytes, 8 bytes, 16 bytes):
Mode 1: Idle
  Reset      11 µJ, 13 µJ, 14 µJ
  Enable_tx  27 µJ, 32 µJ, 31 µJ
  Enable_rx  17 µJ, 18 µJ, 19 µJ, 18 µJ
  Send       17 µJ, 19 µJ, 20 µJ
  Receive    14 µJ, 15 µJ, 17 µJ, 18 µJ
Mode 2: Enabled
  Reset      13 µJ, 14 µJ
  Enable_tx  23 µJ, 25 µJ, 24 µJ
  Enable_rx  18 µJ, 19 µJ
  Send       76 µJ, 77 µJ, 89 µJ, 115 µJ
  Receive    44 µJ, 49 µJ, 55 µJ, 64 µJ
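A power mode transition function combined with a per-mode energy table can be sketched as a small state machine. This is a hedged illustration rather than the authors' code. It rests on two assumptions: the first value in each table row is taken to be the 2-byte buffer figure, and each instruction is charged at the energy of the mode the core is in before the instruction executes. The names `next_mode` and `trace_energy_uj` are hypothetical.

```python
IDLE, ENABLED = "idle", "enabled"

# Energy in microjoules per (mode, instruction) for a 2-byte buffer.
# Assumption: the first value in each row of the slide's table is the
# 2-byte figure.
ENERGY_UJ = {
    IDLE:    {"Reset": 11, "Enable_tx": 27, "Enable_rx": 17, "Send": 17, "Receive": 14},
    ENABLED: {"Reset": 13, "Enable_tx": 23, "Enable_rx": 18, "Send": 76, "Receive": 44},
}


def next_mode(mode: str, instruction: str) -> str:
    """Power mode transition function from the slide's two-mode diagram."""
    if instruction in ("Enable_tx", "Enable_rx"):
        return ENABLED
    if instruction == "Reset":
        return IDLE
    return mode  # Send and Receive leave the mode unchanged


def trace_energy_uj(trace, mode: str = IDLE) -> int:
    """Sum the energy of an instruction trace, tracking the power mode."""
    total = 0
    for instruction in trace:
        # Assumption: charge the energy of the mode in effect before the
        # instruction runs, then apply the transition.
        total += ENERGY_UJ[mode][instruction]
        mode = next_mode(mode, instruction)
    return total


trace = ["Reset", "Enable_tx", "Send", "Send", "Reset", "Send"]
print(trace_energy_uj(trace))  # 11 + 27 + 76 + 76 + 13 + 17 = 220
```

In this trace the final Send is charged 17 µJ instead of 76 µJ because the preceding Reset returned the core to Idle, which is the effect a single-mode table cannot capture.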
8
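User Performs System Simulation, Which Yields Power Data
Simulation takes only seconds or minutes
[Figure: cores such as UART are taken from the core database (UART, JPEG decode, ….) and simulated in the SOC with microprocessor, cache, memory, bridge, peripheral, and application; per-component energies are summed into the total energy.]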
9
Results: Image-decode Accelerator
Examined 3 peripheral cores: UART, DMA, JPEG
Compared our instruction-based system-level method with:
  Gate-level simulation: slow but accurate
  “Databook” RT-level: cycle-accurate simulation, used databook average-power values
Energy (mJ) per core, with error relative to gate-level simulation:
  Method                      Simulation time   UART        DMA         JPEG
  Gate-level                  40,980 sec        113         519         1573
  “Databook” RT-level         2,700 sec         155 (37%)   717 (38%)   1793 (14%)
  Instr.-based system-level   14 sec            115 (2%)    493 (5%)    1550 (1%)
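The percentages on this slide are consistent with the absolute error of each estimate relative to the gate-level energy, rounded to the nearest percent. A short check, using only the numbers on the slide (the function name `percent_error` is a hypothetical helper):

```python
# Energies in millijoules and simulation times in seconds, from the slide.
GATE_MJ = {"UART": 113, "DMA": 519, "JPEG": 1573}
DATABOOK_MJ = {"UART": 155, "DMA": 717, "JPEG": 1793}
INSTR_MJ = {"UART": 115, "DMA": 493, "JPEG": 1550}
GATE_SEC, DATABOOK_SEC, INSTR_SEC = 40_980, 2_700, 14


def percent_error(estimate: float, reference: float) -> float:
    """Absolute error of an estimate relative to the reference, in percent."""
    return abs(estimate - reference) / reference * 100


for core, gate in GATE_MJ.items():
    databook = round(percent_error(DATABOOK_MJ[core], gate))
    instr = round(percent_error(INSTR_MJ[core], gate))
    print(f"{core}: databook {databook}%, instruction-based {instr}%")

# Ratio of simulation times against the gate-level run.
print(f"instruction-based vs gate-level: {GATE_SEC / INSTR_SEC:.0f}x faster")
print(f"databook RT-level vs gate-level: {GATE_SEC / DATABOOK_SEC:.1f}x faster")
```

On these figures the instruction-based simulation runs roughly 2,900 times faster than the gate-level simulation, and the databook RT-level simulation about 15 times faster.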
10
Results: Importance of Power Modes
Proper power-mode selection is critical for peripheral cores
Too few modes or wrong modes can lead to large error
UART example (gate-level energy: 113 mJ):
  Model         System-level energy (mJ)   Error
  Single-mode   86                         23.0%
  Two-modes     104                        8.6%
  Four-modes    115                        1.7%
11
Conclusions
The introduced instruction-based method is:
  Accurate (less than 5% error)
  Fast (1000x speedup over gate-level)
  Compatible with current core-based methodology
Concept of power modes is necessary for accuracy
Future work includes:
  Trace-simulator-based approach (10x speedup)
  Trace-analysis-based approach (100x speedup)