System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.


Power Estimation & Analysis
- Power calculation needs three models: architecture, component, and activity.
- [Figure labels: clock & power network; lower-level specification; architecture (component allocation, scheduling operations)]

Estimation vs. Analysis
- Analysis: performed on a given structure, i.e., a netlist of components.
- Estimation (= design prediction followed by analysis):
  - Applies when the information on the structure of the design is incomplete.
  - Used to explore different design alternatives and find the best one.
- Example: to estimate the interconnect power, one needs a floorplan prediction that includes the clock and power networks.
- When exploring alternatives, it is often enough that the predictions rank the alternatives in the same relative order as their actual implementations would.
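
The point about relative order can be checked mechanically. The sketch below is only an illustration: the design names and all power figures are made up, and `ranking` and `preserves_order` are hypothetical helper names, not part of any tool mentioned in the slides.

```python
def ranking(power_by_design):
    """Return design names sorted from lowest to highest power."""
    return sorted(power_by_design, key=power_by_design.get)


def preserves_order(predicted, actual):
    """True if prediction and implementation rank the designs identically."""
    return ranking(predicted) == ranking(actual)


# Hypothetical numbers: the predictions are all too low in absolute terms,
# yet they still rank the three alternatives correctly.
predicted_mw = {"A": 120.0, "B": 95.0, "C": 150.0}
actual_mw = {"A": 160.0, "B": 130.0, "C": 210.0}

print(ranking(predicted_mw))
print(preserves_order(predicted_mw, actual_mw))
```

An estimator with a large absolute error can therefore still pick the best alternative, as long as the error does not reorder the candidates.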

System-level power analysis
System-level design process:
- 1) Allocation of components
- 2) Partitioning the system's tasks onto these components (or sub-systems)
- 3) Organizing cooperation among the bound components
System-level design inputs:
- Specification, e.g., a control/data flow graph (CDFG), …
- Environmental constraints, e.g., performance, power, cost, form factor, time to market (TTM), number and load of I/Os
- Design space restrictions, e.g., mandated use of certain cores, available chip area, bus structure

Implementation model
- Should be used when an execution model is not available; typically built in a spreadsheet.
- Usually starts from a platform: HW and SW platform.
- Basically three components:
  - COTS (commercial off-the-shelf) components: often only a single figure is available from the vendor, e.g., for a processor; otherwise, guess based on experience and know-how.
  - Customer-specific module: needs an estimate based on the predicted number of gates, activity factor, and technology scaling factor. The power consumption of this module may be insignificant, but its use can replace the power-hungry processor.
  - Communication power: data transfer between blocks, clock power, cross-coupling.
- What is ignored: software structure, data.
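
A spreadsheet-style budget along these lines might look like the sketch below. It assumes the standard dynamic-power relation P = α·C·V²·f for the customer-specific module and treats the technology scaling factor as a plain multiplier. Both are assumptions for illustration, as are all numbers, the component names, and the function `custom_module_power_w`.

```python
def custom_module_power_w(num_gates, cap_per_gate_f, activity, vdd_v, freq_hz,
                          tech_scale=1.0):
    """Estimate dynamic power (W) of a custom module as alpha * C * V^2 * f.

    tech_scale is a hypothetical multiplier standing in for the
    technology scaling factor mentioned on the slide.
    """
    total_cap_f = num_gates * cap_per_gate_f
    return tech_scale * activity * total_cap_f * vdd_v ** 2 * freq_hz


# One row per component, as in a spreadsheet. All values are invented.
budget_w = {
    "processor (COTS, single vendor figure)": 0.250,
    "custom accelerator": custom_module_power_w(
        num_gates=200_000, cap_per_gate_f=2e-15, activity=0.15,
        vdd_v=1.2, freq_hz=100e6),
    "communication (bus, clock)": 0.040,
}

total_w = sum(budget_w.values())
for name, watts in budget_w.items():
    print(f"{name:40s} {watts * 1e3:8.2f} mW")
print(f"{'total':40s} {total_w * 1e3:8.2f} mW")
```

With these made-up inputs the custom module is a small share of the total, which matches the slide's remark that the module itself may be insignificant while the processor dominates.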

Execution model
- Typically given as a program in C, HDL, SystemC, or some heterogeneous combination of these.
- Allows more detailed power analysis because the dynamic system behavior is simulated. Needed inputs:
  - component power model,
  - system architecture, and
  - component activation pattern.
- For example, a BFM (bus functional model) and the activity information for each processor component (issue queue, branch prediction unit, execution unit, cache, register file) are needed.
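
One simple way to combine a component power model with an activation pattern is to multiply per-access energies by simulated access counts. The sketch below assumes such a linear model. The energy values, the access counts, and the names `ENERGY_PER_ACCESS_NJ`, `total_energy_nj`, and `average_power_mw` are hypothetical.

```python
# Hypothetical component power model: energy per activation in nanojoules.
ENERGY_PER_ACCESS_NJ = {
    "issue_queue": 0.10,
    "branch_predictor": 0.05,
    "execution_unit": 0.30,
    "cache": 0.50,
    "register_file": 0.08,
}


def total_energy_nj(activity_counts, energy_per_access=ENERGY_PER_ACCESS_NJ):
    """Sum energy over components: activations times energy per activation."""
    return sum(energy_per_access[name] * count
               for name, count in activity_counts.items())


def average_power_mw(energy_nj, runtime_us):
    """Average power in mW; nJ divided by microseconds gives mW directly."""
    return energy_nj / runtime_us


# Hypothetical activation pattern collected from a simulation run.
activity = {
    "issue_queue": 1000,
    "branch_predictor": 200,
    "execution_unit": 800,
    "cache": 300,
    "register_file": 1500,
}

energy = total_energy_nj(activity)
print(energy, average_power_mw(energy, runtime_us=10.0))
```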

Memory model
DTSE (data transfer and storage exploration) work by Catthoor
- Assumes that memory is the dominant power consumer in signal processing applications.
- Memory optimization in terms of power should therefore be done first.
- Objective: increase data locality
  - Suppress memory accesses
  - Optimize the memory hierarchy
- Steps:
  - Global loop and control flow transformations
  - Data reuse analysis
  - Storage cycle distribution
  - Memory allocation and assignment
  - In-place optimization
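
As a small illustration of how a loop transformation can suppress memory accesses, the sketch below fuses two loops so that an intermediate array never has to be stored. This is a generic loop-fusion example, not the DTSE methodology itself. `CountingMemory`, `two_loops`, and `fused_loop` are hypothetical names.

```python
class CountingMemory:
    """Toy background memory that counts every read and write."""

    def __init__(self, size):
        self.data = [0] * size
        self.accesses = 0

    def read(self, index):
        self.accesses += 1
        return self.data[index]

    def write(self, index, value):
        self.accesses += 1
        self.data[index] = value


def two_loops(src):
    """Producer and consumer loops communicate through a memory array."""
    tmp = CountingMemory(len(src))
    for i, x in enumerate(src):
        tmp.write(i, x * 2)
    out = [tmp.read(i) + 1 for i in range(len(src))]
    return out, tmp.accesses


def fused_loop(src):
    """After fusion the intermediate value stays in a local variable."""
    out = []
    for x in src:
        t = x * 2          # no background-memory access needed
        out.append(t + 1)
    return out, 0


src = list(range(8))
print(two_loops(src)[1], fused_loop(src)[1])
```

Both versions compute the same result; only the number of accesses to the intermediate storage differs.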

Memory model
- Memory chip: the power model is available in the data sheet.
- Compiled memory core:
  - The power model should be parameterized, at least in terms of size. That requires a simulation model, but because a memory has a flat hierarchy, simulating it takes too long.
  - Therefore, an abstraction model is needed. A capacitance model is hard to obtain because it would reveal critical information of the memory vendor. A functional model that does not disclose any internal cell structure is acceptable.
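
A size-parameterized abstraction model can be as simple as a lookup table with interpolation between a few characterized sizes. The sketch below assumes exactly that. The characterization points are invented, the clamping outside the characterized range is an arbitrary choice, and `energy_per_access_pj` is a hypothetical name.

```python
from bisect import bisect_left

# Hypothetical characterization points: (size in words, energy per access in pJ).
CHARACTERIZED = [(1024, 5.0), (4096, 9.0), (16384, 17.0)]


def energy_per_access_pj(size_words, table=CHARACTERIZED):
    """Linearly interpolate energy per access between characterized sizes.

    Sizes outside the characterized range are clamped to the nearest
    end point (an assumption made only to keep the sketch short).
    """
    sizes = [size for size, _ in table]
    if size_words <= sizes[0]:
        return table[0][1]
    if size_words >= sizes[-1]:
        return table[-1][1]
    upper = bisect_left(sizes, size_words)
    (s0, e0), (s1, e1) = table[upper - 1], table[upper]
    return e0 + (e1 - e0) * (size_words - s0) / (s1 - s0)


print(energy_per_access_pj(2560))
```

Such a table exposes only size and energy, so it reveals nothing about the internal cell structure.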

Other things to include in the execution model
- Interconnect power model
  - Input: physical layout and material properties
  - Built from measurement and simulation
  - However, on-chip interconnect is difficult to model, especially when complex bus encoding is used.
- Models for the power management policy
  - Hardware for DPM (dynamic power management)
  - Software
  - RTOS

Algorithm-Level Power Estimation in Orinoco
- Activity estimation:
  - Code instrumentation: inserts protocol (logging) statements to capture the activity during execution
- Architecture estimation:
  - High-level synthesis: scheduling, allocation, binding
  - Physical planning: floorplanning, clock tree generation
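
The idea of code instrumentation can be sketched by hand: each operation is wrapped so that executing the algorithm also records how often the operation ran. This is only an illustration of the principle and does not reproduce how Orinoco instruments code. The counter `activity` and the wrappers `add` and `mul` are hypothetical.

```python
from collections import Counter

# Activity log filled in while the instrumented algorithm runs.
activity = Counter()


def add(a, b):
    activity["add"] += 1   # inserted logging statement
    return a + b


def mul(a, b):
    activity["mul"] += 1   # inserted logging statement
    return a * b


def dot_product(xs, ys):
    """Algorithm under analysis, written in terms of instrumented operators."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = add(acc, mul(x, y))
    return acc


result = dot_product([1, 2, 3], [4, 5, 6])
print(result, dict(activity))
```

The recorded counts can then feed a component power model for the adder and multiplier that a later architecture estimation step allocates.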

Algorithmic-level power estimation and analysis
Algorithmic-level design
- Objective: optimize in terms of performance, cost, and power
- Means:
  - Selection of an algorithm that performs the requested function
  - Optimization of the algorithm
  - Partitioning the algorithm into HW and SW

- Algorithm selection: selecting the most power-efficient one
  - Comparison is based on the most power-efficient realization of each candidate, without actual implementation.
- Optimization:
  - Reducing the number of control statements, e.g., by loop unrolling, local statement reordering, memory access reordering
  - Floating-point arithmetic for SW vs. fixed-point arithmetic for HW
- Partitioning:
  - Trade-off analysis between HW and SW implementations
  - SW-to-HW transformation: moving the computational kernels of the algorithm to power-optimized application-specific hardware
    - No need for consecutive control steps to perform a single instruction
    - No need for memory access to find out what to do next
    - Minimal datapath just for performing the given task
    - Maximal concurrency exploitable compared to a processor core
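
To make the effect of loop unrolling on control statements concrete, the sketch below counts how many loop-condition checks a summation performs with and without unrolling by four. The explicit `checks` counter and the function names are hypothetical and exist only to expose the control overhead. A real compiler or synthesis tool would perform the unrolling itself.

```python
def sum_rolled(values):
    """Plain loop: one condition check per element plus the final failing check."""
    total, i, checks = 0, 0, 0
    n = len(values)
    while True:
        checks += 1
        if not i < n:
            break
        total += values[i]
        i += 1
    return total, checks


def sum_unrolled_by_4(values):
    """Loop unrolled by four, followed by a remainder loop."""
    total, i, checks = 0, 0, 0
    n = len(values)
    while True:
        checks += 1
        if not i + 4 <= n:
            break
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while True:            # handles the last n % 4 elements
        checks += 1
        if not i < n:
            break
        total += values[i]
        i += 1
    return total, checks


data = list(range(16))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

Both functions return the same sum; the unrolled version simply evaluates far fewer loop conditions for the same work.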