L6: Lower Power Architecture Design


L6: Lower Power Architecture Design 1999.8.2 Prof. Jun-Dong Cho, SungKyunKwan Univ. http://vada.skku.ac.kr

Through WAVE PIPELINING

Wave-pipelining on FPGA
Problems with conventional pipelining: balanced partitioning; delay element overhead; Tclk > Tmax - Tmin + clock skew + setup/hold time; increased area, power, and overall latency; clock distribution problem.
Wave pipelining = high throughput without such overhead = ideal pipelining.
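
The clock-period constraint on this slide can be made concrete with a small calculation. The sketch below evaluates the right-hand side of Tclk > Tmax - Tmin + clock skew + setup/hold time for two sets of path delays. The function name and all delay values (in nanoseconds) are hypothetical and chosen only for illustration; they are not measurements from the slides.

```python
def min_clock_period(t_max, t_min, skew, setup_hold):
    """Right-hand side of Tclk > Tmax - Tmin + clock skew + setup/hold time.

    The real clock period must be strictly larger than the returned value.
    """
    return (t_max - t_min) + skew + setup_hold


# Hypothetical delays in nanoseconds (assumed values, not from the slides).
# Poorly balanced paths: longest path 20 ns, shortest path 8 ns.
unbalanced = min_clock_period(t_max=20.0, t_min=8.0, skew=0.5, setup_hold=1.5)

# Well balanced paths: longest path 20 ns, shortest path 18 ns.
balanced = min_clock_period(t_max=20.0, t_min=18.0, skew=0.5, setup_hold=1.5)

print(f"bound with unbalanced paths: {unbalanced} ns")
print(f"bound with balanced paths:   {balanced} ns")
```

With the longest path unchanged, narrowing the gap between Tmax and Tmin lowers the bound, which is why delay balancing matters so much for wave pipelining.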

FPGA on WavePipeline
LUT delay is similar across different logic functions. FPGA element delay (wire, LUT, interconnection). Powerful layout editor. Fast design cycle.

WP advantages
Area efficient: no registers, clock distribution network, or clock buffers are needed. Low power dissipation. Higher throughput. Low latency.

Disadvantages
Degraded performance in certain cases. Difficult to achieve sharp rise and fall times in synchronous design. Layout is critical for balancing the delay. Parameter variation: power supply and temperature dependence.

Experimental Results
By Jae-Hyung Lee (이재형), SKKU

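Observation
The WP multiplier needed many additional LUTs to adjust delay, so it showed no large gain in power consumption. On an FPGA, adjusting delay with dedicated delay elements instead of LUTs or net delay would be more effective. In addition, designing a multiplier whose paths all have the same number of levels would make WP implementation easier and should yield large gains in power consumption and area over the pipelined structure.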

VON NEUMANN VERSUS HARVARD

Power vs Area of Micro-coded Microprocessor
At 1.5 V and a 10 MHz clock rate, instruction and data memory accesses account for 47% of the total power consumption.

Memory Architecture

Exploiting Locality for Low-Power Design
A spatially local cluster is a group of algorithm operations that are tightly connected to each other in the flow graph representation. Two nodes are tightly connected if the shortest distance between them, in number of edges traversed, is low.
Figure: power consumption (mW) in the maximally time-shared and fully parallel versions of the QMF sub-band coder filter. An improvement by a factor of 10.5 is obtained at the expense of a 20% increase in area. The interconnect elements (buses, multiplexers, and buffers) consume 43% and 28% of the total power in the time-shared and parallel versions, respectively.
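
The "tightly connected" criterion above is a shortest-path measure in edge count, so it can be sketched with a breadth-first search. The example below is only an illustration: the function name `edge_distance`, the tiny flow graph, and the choice to ignore edge direction when measuring distance are all assumptions, not details taken from the slides or from Hyper-LP.

```python
from collections import deque


def edge_distance(graph, src, dst):
    """Shortest number of edges between src and dst in a flow graph.

    Edge direction is ignored (an assumption made for this sketch).
    Returns None if the two nodes are not connected.
    """
    # Build an undirected adjacency map from the directed edge lists.
    adj = {}
    for u, successors in graph.items():
        for v in successors:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical flow graph: node -> list of successor operations.
flow_graph = {
    "in": ["mul1", "mul2"],
    "mul1": ["add1"],
    "mul2": ["add1"],
    "add1": ["add2"],
    "mul3": ["add2"],
    "add2": ["out"],
}

print(edge_distance(flow_graph, "mul1", "add1"))  # adjacent operations
print(edge_distance(flow_graph, "in", "out"))     # far-apart operations
```

Operations with a small distance are candidates for the same spatially local cluster; how small counts as "low" is a design parameter the slide does not specify.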

Cascade filter layouts
(a) Non-local implementation from Hyper (b) Local implementation from Hyper-LP

Frequency Multipliers and Dividers

Low Power DSP
Techniques: instruction buffer (or cache) and decoded instruction buffer. Both exploit locality to reduce program memory accesses.
Decoded instruction buffer: the decoding results of a loop's first iteration are stored in RAM and then reused, which eliminates the fetch and decode steps. 30~40% power saving.

Stage-Skip Pipeline
Power savings are achieved by stopping the instruction fetch and decode stages of the processor during loop execution, except in the first iteration. DIB = Decoded Instruction Buffer. 40% power savings on a DSP or RISC processor.

Stage-Skip Pipeline
Selector: selects the output from either the instruction decoder or the DIB. The decoded instruction signals for a loop are temporarily stored in the DIB and are reused in each iteration of the loop. The power wasted in the conventional pipeline is saved in the stage-skip pipeline by stopping instruction fetching and decoding during each loop execution.
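
The DIB mechanism described above can be sketched as a simple activity count: fetch and decode happen only in the first iteration of a loop, and later iterations read the stored decoded signals. This is a behavioral toy model under stated assumptions, not the authors' hardware; the function name, the loop length, and the iteration count are hypothetical, and the model counts front-end operations rather than power.

```python
def count_fetch_decode(loop_body_len, iterations, use_dib):
    """Count instruction fetch+decode operations while executing one loop.

    With use_dib=True, decoded signals from the first iteration are stored
    in a decoded instruction buffer (DIB) and reused afterwards.
    """
    fetch_decode_ops = 0
    dib = []  # decoded instruction signals captured in the first iteration

    for iteration in range(iterations):
        for pc in range(loop_body_len):
            if use_dib and iteration > 0:
                # Selector picks the DIB output; fetch and decode are stopped.
                _decoded = dib[pc]
            else:
                # Selector picks the instruction decoder output.
                _decoded = ("decoded", pc)  # stand-in for real control signals
                fetch_decode_ops += 1
                if use_dib:
                    dib.append(_decoded)
    return fetch_decode_ops


# Hypothetical loop: 8 instructions executed for 100 iterations.
conventional = count_fetch_decode(loop_body_len=8, iterations=100, use_dib=False)
stage_skip = count_fetch_decode(loop_body_len=8, iterations=100, use_dib=True)
print(conventional, stage_skip)
```

The count only shows how much front-end activity is avoided; the power figures quoted on the slides depend on circuit details this model does not capture.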

Stage-Skip Pipeline
The majority of execution cycles in signal processing programs are spent in loop execution: 40% reduction in power with a 2% increase in area.

Two’s complement implementation of an accumulator

Sign magnitude implementation of an accumulator

Number representation trade-off for arithmetic
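
One side of this trade-off can be illustrated by counting bit transitions between consecutive data words, since switching activity drives dynamic power. The sketch below encodes a short sequence of small values that alternate in sign, once in two's complement and once in sign magnitude, and counts the toggled bits. The 16-bit word width, the sample sequence, and the helper names are assumptions made for illustration; the model looks only at the data words, not at the full accumulator datapath shown on the slides.

```python
def twos_complement(value, bits):
    """Encode a signed integer as a two's complement bit pattern."""
    return value & ((1 << bits) - 1)


def sign_magnitude(value, bits):
    """Encode a signed integer as sign bit plus magnitude.

    Assumes abs(value) fits in bits - 1 bits.
    """
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def toggles(words):
    """Total number of bit positions that change between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


BITS = 16                         # assumed word width
samples = [1, -1, 2, -2, 1, -1]   # hypothetical small signal crossing zero

tc_toggles = toggles([twos_complement(v, BITS) for v in samples])
sm_toggles = toggles([sign_magnitude(v, BITS) for v in samples])
print("two's complement toggles:", tc_toggles)
print("sign magnitude toggles:  ", sm_toggles)
```

For small values that cross zero, two's complement flips all the sign-extension bits on every sign change, while sign magnitude flips only the sign bit and a few low-order bits. The cost on the other side of the trade-off is that sign-magnitude arithmetic needs more complex adder logic.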

Signal statistics for Sign Magnitude implementation of the accumulator datapath assuming random inputs.