A Programmable Coprocessor Architecture for Wireless Applications Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, Trevor Mudge Advanced Computer Architecture.

Presentation transcript:

A Programmable Coprocessor Architecture for Wireless Applications
Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, Trevor Mudge
Advanced Computer Architecture Lab, University of Michigan
Sept. 2004

Introduction
- Growing need to support multiple wireless protocols
- Software defined radio: implementing DSP algorithms in software rather than hardware
- ASIC: high performance, low flexibility
- Processor: high flexibility, low performance
- Objective: achieve real-time performance with processor flexibility and programmability

Performance Requirements
- 802.11b: 11 Mbps
- Hiperlan2: 36 Mbps
- UWB: 200 Mbps

DSP Algorithm Characteristics
- Streaming data
  - Short variable liveness
  - High data throughput
- High data-level parallelism
- Low control flow overhead
  - Counted loops
  - Few data-dependent branches

Proposed Coprocessor Architecture: MAPP
- Stream data
  - Macro pipeline architecture
  - No cache structure
- High data-level parallelism
  - Vector architecture
- Low control flow overhead
  - No branch predictors
- Programmability to support multiple protocols

MAPP Architectural Diagram
[Figure: block diagram. Labeled blocks: ARM Core, Instruction Cache, Data Cache, VPP Controller, Vector Processing Pipeline, PPU]

PPU (Pipeline Processing Unit) Architectural Diagram
[Figure: block diagram. Labeled blocks: Data In, In Queue, Vector Register File, Vector ALU, Internal Instruction Buffer, Out Queue, Data Out, VPP Controller]

Mapping DSP Algorithms: Viterbi ACS
    vadd v0, s0, bm0
    vadd v1, s1, bm1
    cmp v0, v1
    move{le} s', v1
    move{g} s', v2
[Figure: ACS dataflow. Inputs S0, S1, bm0, bm1; sums v0 and v1; a mux selecting S'; and a per-element le/g comparison mask]
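The add-compare-select (ACS) step on this slide can be sketched in plain Python to make the data flow concrete. This is an illustrative sketch, not the MAPP instruction semantics: the function name `acs` is hypothetical, and it assumes the smaller path metric survives, which the transcript does not confirm.

```python
def acs(s0, s1, bm0, bm1):
    """Element-wise add-compare-select over equal-length lists.

    Assumption (not stated on the slide): the smaller candidate metric wins.
    """
    v0 = [s + b for s, b in zip(s0, bm0)]      # corresponds to: vadd v0, s0, bm0
    v1 = [s + b for s, b in zip(s1, bm1)]      # corresponds to: vadd v1, s1, bm1
    mask = [a <= b for a, b in zip(v0, v1)]    # corresponds to: cmp v0, v1
    # Predicated select: take v0 where the mask is set, otherwise v1.
    survivors = [a if m else b for a, b, m in zip(v0, v1, mask)]
    return survivors, mask


if __name__ == "__main__":
    survivors, mask = acs([1, 5, 3, 0], [2, 1, 3, 4], [1, 1, 1, 1], [0, 2, 1, 0])
    print(survivors, mask)
```

Every element is processed independently and the only control decision is a per-element select, which is why the step maps onto vector adds, a vector compare, and predicated moves without any branches.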

Increase Area/Power Efficiency
- Data slice architecture
  - Most DSP algorithms do not need 32-bit precision
  - Viterbi decoding operates on 8-bit data
  - Filters may need 16-bit precision
- Partial processor execution
  - Statically determined code
  - Turn off architecture units that are not in use
  - Energy saving, no area saving
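To illustrate why narrow data slices suit 8-bit workloads, the sketch below packs four 8-bit values into one 32-bit word and adds them lane by lane. It is a software analogy only: the helper names (`pack`, `unpack`, `lane_add`) and the wrap-around overflow behavior are assumptions, not a description of the MAPP hardware.

```python
LANES = 4                          # four slices per word (assumed for illustration)
LANE_BITS = 8                      # 8-bit precision, as used by Viterbi decoding
LANE_MASK = (1 << LANE_BITS) - 1   # 0xFF


def pack(values):
    """Pack LANES small integers into one word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into its LANES 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def lane_add(a, b):
    """Add two packed words lane by lane; each lane wraps modulo 256."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


if __name__ == "__main__":
    total = lane_add(pack([1, 2, 3, 250]), pack([10, 20, 30, 10]))
    print(unpack(total))
```

Four independent 8-bit results fit in the storage a single 32-bit operand would occupy, which is the intuition behind slicing the datapath to the precision the algorithm actually needs.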

Vector Cluster Diagram (4x8 bit data slice)
[Figure: four identical slices, each with a Register File, In Queue, Out Queue, and ALU, connected by a 4x4 Local Interconnect Network]

Performance Results

Simplistic Power Analysis
- Based on ARM9 data in 0.13 µm
- Viterbi decoder (K=7): 0.75 W ~ 1 W
  - 64x4 8-bit ALU: ~240 mW
  - 12 KB memory: ~310 mW
  - Clock: ~200 mW
  - Others: ~250 mW
- ASIC implementations: 7.65 mW ~ 0.7 W (with different throughputs)
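The component figures on this slide can be checked with a few lines of arithmetic. The sketch below only sums the quoted estimates and compares the total with the quoted ASIC range; the variable names are illustrative, and the ratios are rough because the ASIC figures correspond to different throughputs.

```python
# Component estimates for the programmable Viterbi decoder, in mW (from the slide).
components_mw = {
    "64x4 8-bit ALU": 240,
    "12 KB memory": 310,
    "clock": 200,
    "others": 250,
}

total_mw = sum(components_mw.values())

# Quoted ASIC range in mW; the two ends correspond to different throughputs.
asic_low_mw, asic_high_mw = 7.65, 700.0

ratio_vs_asic_low = total_mw / asic_low_mw
ratio_vs_asic_high = total_mw / asic_high_mw

if __name__ == "__main__":
    print(f"total: {total_mw} mW")
    print(f"vs. lowest-power ASIC:  {ratio_vs_asic_low:.1f}x")
    print(f"vs. highest-power ASIC: {ratio_vs_asic_high:.2f}x")
```

The components add up to 1000 mW, which matches the upper end of the quoted 0.75 W ~ 1 W range.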

Conclusion & Future Work
- Programmable coprocessor architecture
  - Can support multiple protocols
  - Achieves real-time computational requirements
  - Reasonable power consumption
- Future work
  - Realistic power model simulation
  - Implement complete protocols
  - Algorithm behavior studies
  - Shrink processor area