Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores Tony Givargis, Frank Vahid* Dept. of Computer Science & Engineering.

Presentation transcript:

Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores
Tony Givargis, Frank Vahid*
Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
Joerg Henkel
NEC C&C Research, Princeton, New Jersey
This work was supported by the National Science Foundation under grant # CCR , and by a Design Automation Conference graduate scholarship.

System-on-a-chip (SOC)
We want to explore alternative cores, parameter settings, and applications.
Gate/RT-level simulation is too slow for this exploration.
[Figure: SOC with microprocessor, cache, memory, bridge, Peripheral1, Peripheral2, …, and Application1 and Application2; a core database holds Peripheral1, Peripheral2_a, and Peripheral2_b.]

SOC System-level Power Estimation
Microprocessor: Tiwari/Malik/Wolfe 94 (instruction set simulator); Marculescu/Pedram 96 (instruction trace reduction).
Plus cache, memory & bus: Simunic/Benini/DeMicheli 99 (extended instruction simulator); Givargis/Vahid/Henkel 99 (trace reductions).
Still need a system-level method for peripherals: a 3-step method.
[Figure: SOC block diagrams (microprocessor, cache, memory, bridge, application, peripheral) labeled "SOC: System-level model" and "SOC: Gate-level model".]

Core Provider’s Step 1: Instruction-based System-Level Model Creation
A system simulation model is already commonly used, and is required in the VSIA standard.
It executes ~1000x faster than a gate-level model.
UART model instructions: Reset() …, Enable_tx() …, Enable_rx() …, Send() …, Receive() …
[Figure: core database containing UART, JPEG decode, ….]

Core Provider’s Step 2: Low-level Per-instruction Power Evaluation
Measure power of the gate/layout model, per instruction.
Use a unique testbench per instruction; this may take hours/days.
The low-level model differentiates cores from other SOC modules, enabling accurate power estimation.
UART energy per instruction:
  UART instruction   Energy
  Reset              13 μJ
  Enable_tx          23 μJ
  Enable_rx          18 μJ
  Send               76 μJ
  Receive            44 μJ
Must account for core parameters. UART energy per instruction by buffer size (columns: 2 bytes, 4 bytes, 8 bytes, 16 bytes), values in the order listed on the slide:
  Reset       13 μJ, 14 μJ
  Enable_tx   23 μJ, 25 μJ, 24 μJ
  Enable_rx   18 μJ, 19 μJ
  Send        76 μJ, 77 μJ, 89 μJ, 115 μJ
  Receive     44 μJ, 49 μJ, 55 μJ, 64 μJ
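The point about core parameters can be illustrated with a small lookup keyed by instruction and buffer size. This is only a sketch: the names `ENERGY_UJ`, `instruction_energy_uj`, and `trace_energy_uj` are hypothetical, and only the Send and Receive rows are used because they are the only rows on the slide that list a value for all four buffer sizes.

```python
# Energy per UART instruction in microjoules, keyed by buffer size in bytes.
# Values are the Send and Receive rows from the slide; the other rows are
# omitted because their column alignment is not recoverable.
ENERGY_UJ = {
    "Send":    {2: 76, 4: 77, 8: 89, 16: 115},
    "Receive": {2: 44, 4: 49, 8: 55, 16: 64},
}


def instruction_energy_uj(instruction: str, buffer_bytes: int) -> int:
    """Return the measured energy of one instruction at one buffer size."""
    return ENERGY_UJ[instruction][buffer_bytes]


def trace_energy_uj(trace, buffer_bytes: int) -> int:
    """Sum the energy of a sequence of instructions for a fixed buffer size."""
    return sum(instruction_energy_uj(i, buffer_bytes) for i in trace)


if __name__ == "__main__":
    # Hypothetical trace: two sends and one receive with an 8-byte buffer.
    print(trace_energy_uj(["Send", "Send", "Receive"], 8))
```

A table lookup keeps the measured data separate from the simulation logic, so a core user could change the buffer-size parameter without rerunning any low-level measurement.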

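Core Provider’s Step 3: Back Annotation of System Model
Each instruction in the UART system model is annotated with its measured energy, which is added to a running total:
  Reset() … uJtot += 13
  Enable_tx() … uJtot += 23
  Enable_rx() … uJtot += 18
  Send() … uJtot += 76
  Receive() … uJtot += 44
[Figure: core database (UART, JPEG decode, ….) with the UART energy table: Reset 13 μJ, Enable_tx 23 μJ, Enable_rx 18 μJ, Send 76 μJ, Receive 44 μJ.]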
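A minimal sketch of what a back-annotated system-level model might look like is given here. The class name `UartModel`, its attributes, and its functional behavior are assumptions made for illustration; only the instruction names and the per-instruction energies (13, 23, 18, 76, 44 μJ) come from the slide.

```python
class UartModel:
    """Hypothetical system-level UART model with back-annotated energy."""

    def __init__(self):
        self.uj_tot = 0          # accumulated energy in microjoules
        self.tx_enabled = False  # illustrative functional state
        self.rx_enabled = False
        self.sent = []           # bytes "transmitted" by the model

    def reset(self):
        self.tx_enabled = False
        self.rx_enabled = False
        self.uj_tot += 13        # energy annotation from the slide

    def enable_tx(self):
        self.tx_enabled = True
        self.uj_tot += 23

    def enable_rx(self):
        self.rx_enabled = True
        self.uj_tot += 18

    def send(self, byte: int):
        self.sent.append(byte)
        self.uj_tot += 76

    def receive(self) -> int:
        self.uj_tot += 44
        return 0                 # placeholder data; no real channel is modeled


if __name__ == "__main__":
    uart = UartModel()
    uart.reset()
    uart.enable_tx()
    for b in (0x4A, 0x50, 0x47):
        uart.send(b)
    print(uart.uj_tot)           # total energy of this instruction sequence
```

Because the annotation is a single addition per instruction, the energy bookkeeping adds almost nothing to the cost of running the system-level model.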

Core “Power Modes” Requires Extra Effort by Core Provider
Unlike a microprocessor, certain peripheral core instructions can greatly modify the power consumption of other instructions.
The core provider must create a power mode transition function, and measure power per instruction per mode.
Mode transitions: Mode 1 (Idle) to Mode 2 (Enabled) on Enable_tx or Enable_rx; Mode 2 (Enabled) to Mode 1 (Idle) on Reset.
Energy per instruction by buffer size (columns: 2 bytes, 4 bytes, 8 bytes, 16 bytes), values in the order listed on the slide:
Mode 1: Idle
  Reset       11 μJ, 13 μJ, 14 μJ
  Enable_tx   27 μJ, 32 μJ, 31 μJ
  Enable_rx   17 μJ, 18 μJ, 19 μJ, 18 μJ
  Send        17 μJ, 19 μJ, 20 μJ
  Receive     14 μJ, 15 μJ, 17 μJ, 18 μJ
Mode 2: Enabled
  Reset       13 μJ, 14 μJ
  Enable_tx   23 μJ, 25 μJ, 24 μJ
  Enable_rx   18 μJ, 19 μJ
  Send        76 μJ, 77 μJ, 89 μJ, 115 μJ
  Receive     44 μJ, 49 μJ, 55 μJ, 64 μJ
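The power-mode idea can be sketched as a small state machine in which each instruction is charged according to the current mode and may then change the mode. Several things here are assumptions: the names `ENERGY_UJ`, `next_mode`, and `simulate` are hypothetical, the energy values are the first value listed for each instruction in each mode on the slide, the model starts in Idle, and an instruction is charged at the mode in effect when it is issued.

```python
# Per-mode, per-instruction energy in microjoules (first value listed per
# instruction on the slide; treating these as one configuration is an assumption).
ENERGY_UJ = {
    "Idle":    {"Reset": 11, "Enable_tx": 27, "Enable_rx": 17,
                "Send": 17, "Receive": 14},
    "Enabled": {"Reset": 13, "Enable_tx": 23, "Enable_rx": 18,
                "Send": 76, "Receive": 44},
}


def next_mode(mode: str, instruction: str) -> str:
    """Power mode transition function from the slide's two-mode diagram."""
    if instruction in ("Enable_tx", "Enable_rx"):
        return "Enabled"
    if instruction == "Reset":
        return "Idle"
    return mode  # Send and Receive leave the mode unchanged


def simulate(trace, start_mode: str = "Idle"):
    """Return (total energy in uJ, final mode) for a list of instructions."""
    mode, total = start_mode, 0
    for instruction in trace:
        total += ENERGY_UJ[mode][instruction]  # charge at the current mode
        mode = next_mode(mode, instruction)
    return total, mode


if __name__ == "__main__":
    print(simulate(["Enable_tx", "Send", "Send", "Reset", "Send"]))
```

In this trace the last Send is charged at the Idle rate because the preceding Reset returned the core to Mode 1, which is the kind of dependence a single-mode table cannot express.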

User Performs System Simulation, Which Yields Power Data
Simulation takes only seconds or minutes.
[Figure: the user's SOC (microprocessor, cache, memory, bridge, application, peripheral) is assembled with the UART model from the core database (UART, JPEG decode, ….); the simulation outputs total energy.]

Results: Image-decode Accelerator
Examined 3 peripheral cores: UART, DMA, JPEG.
Compared our instruction-based system-level method with:
  Gate-level simulation: slow but accurate.
  “Databook” RT-level: cycle-accurate simulation that uses databook average-power values.
Simulation time and percentages reported per core:
  Method                       Time          UART   DMA   JPEG
  Gate-level                   40,980 sec
  “Databook” RT-level          2,700 sec     %      38%   14%
  Instr.-based system-level    14 sec        2%     5%    1%
[Figure: bar chart of energy (mJ) per core for the three methods.]
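As a quick arithmetic check, the speedups implied by the three reported simulation times can be computed directly. The variable names are hypothetical; the times are the ones reported on the slide.

```python
# Reported simulation times in seconds.
gate_level_sec = 40_980
databook_rt_sec = 2_700
instr_based_sec = 14

# Speedup of the instruction-based system-level method over each alternative.
speedup_vs_gate = gate_level_sec / instr_based_sec
speedup_vs_rt = databook_rt_sec / instr_based_sec

if __name__ == "__main__":
    print(f"vs gate-level: {speedup_vs_gate:.0f}x")
    print(f"vs databook RT-level: {speedup_vs_rt:.0f}x")
```

For this example the ratio to gate-level simulation is on the order of thousands, and the ratio to the RT-level simulation is on the order of hundreds.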

Results: Importance of Power Modes
Proper power-mode selection is critical for peripheral cores.
Too few modes or the wrong modes can lead to large error.
UART example:
  Configuration   Gate-level energy (mJ)   System-level energy (mJ)   Error
  Single-mode                                                         %
  Two-modes                                                           %
  Four-modes                                                          %

Conclusions
The introduced instruction-based method is:
  Accurate (less than 5% error)
  Fast (1000x speedup over gate-level)
  A fit with current core-based methodology
The concept of power modes is necessary for accuracy.
Future work includes:
  Trace-simulator-based approach (10x speedup)
  Trace-analysis-based approach (100x speedup)