Design and Implementation of Turbo Decoders for Software Defined Radio (www.eecs.umich.edu/~sdrg)

Presentation transcript:

1 Design and Implementation of Turbo Decoders for Software Defined Radio
Yuan Lin (1), Scott Mahlke (1), Trevor Mudge (1), Chaitali Chakrabarti (2), Alastair Reid (3), Krisztian Flautner (3)
(1) Advanced Computer Architecture Lab, University of Michigan
(2) Department of Electrical Engineering, Arizona State University
(3) ARM, Ltd.

2 Advantages of Software Defined Radio
Multi-mode operations
Lower costs
– Faster time to market
– Prototyping and bug fixes
– Chip volumes
– Longevity of platforms
Protocol complexity favors software-dominated solutions
Enables future wireless communication innovations
– Cognitive radio

3 SDR Design Objectives for W-CDMA
Programmable processor
– The same hardware should support the Turbo decoder as well as other DSP algorithms
Throughput requirements
– 2 Mbps
Power constraints
– 100 mW to 500 mW

4 SODA: DSP Processor for SDR

5 SODA PE SIMD Pipeline

6 SODA PE SIMD Shuffle Network

7 SODA PE Scalar Pipeline

8 Turbo Decoder on SODA
Most computationally intensive algorithm in W-CDMA
Hardest algorithm to parallelize
Implementation outline
– Max-Log-MAP trellis computation with SIMD operations
– Parallelizing trellis computations through sliding windows
– Interleaver implementation
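The Max-Log-MAP trellis computation named in the outline can be sketched in scalar form before any SIMD mapping. This is a minimal Python illustration, not the SODA implementation: it assumes the 3GPP constituent encoder (feedback g0 = 1 + D^2 + D^3, feedforward g1 = 1 + D + D^3), a state numbering chosen for illustration, LLRs defined as log P(0)/P(1), and normalization by subtracting the best metric at each step. The helper names (`transition`, `encode`, `forward_recursion`) are hypothetical.

```python
from typing import List, Tuple

NUM_STATES = 8  # K = 4 -> 2**(K - 1) = 8 trellis states
NEG_INF = float("-inf")


def transition(state: int, u: int) -> Tuple[int, int]:
    """Return (next_state, parity_bit) for input bit u.

    Assumption: 3GPP constituent encoder, feedback g0 = 1 + D^2 + D^3,
    feedforward g1 = 1 + D + D^3; state bits are (d1, d2, d3), d1 newest.
    """
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3       # feedback taps at D^2 and D^3
    parity = a ^ d1 ^ d3  # feedforward taps at 1, D and D^3
    return (a << 2) | (state >> 1), parity


def encode(bits: List[int]) -> Tuple[List[int], int]:
    """Run the encoder from state 0; return parity bits and final state."""
    state, parity = 0, []
    for u in bits:
        state, p = transition(state, u)
        parity.append(p)
    return parity, state


def branch_metric(bit: int, llr: float) -> float:
    # Max-log contribution of one coded bit, with llr = log P(0)/P(1).
    return (1 - 2 * bit) * llr / 2.0


def forward_recursion(sys_llr: List[float], par_llr: List[float]) -> List[List[float]]:
    """Max-Log-MAP alpha recursion; the trellis starts in state 0."""
    alpha = [0.0] + [NEG_INF] * (NUM_STATES - 1)
    history = [alpha]
    for ls, lp in zip(sys_llr, par_llr):
        new = [NEG_INF] * NUM_STATES
        for s in range(NUM_STATES):
            if alpha[s] == NEG_INF:
                continue  # state not yet reachable
            for u in (0, 1):
                nxt, p = transition(s, u)
                gamma = branch_metric(u, ls) + branch_metric(p, lp)
                new[nxt] = max(new[nxt], alpha[s] + gamma)  # add-compare-select
        top = max(new)
        alpha = [m - top for m in new]  # normalize: best state metric becomes 0
        history.append(alpha)
    return history


# Usage: noiseless channel, so the true final state should win.
bits = [1, 0, 1, 1, 0, 0, 1]
parity, final_state = encode(bits)
to_llr = lambda b: 4.0 if b == 0 else -4.0
alphas = forward_recursion([to_llr(b) for b in bits], [to_llr(p) for p in parity])
print(max(range(NUM_STATES), key=lambda s: alphas[-1][s]), final_state)
```

The backward (beta) recursion has the same add-compare-select shape run in reverse, and the LLR output combines alpha, gamma and beta; both are omitted to keep the sketch short.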

9 Trellis Computation on SODA
Two types of trellis diagram configurations
– Blue edges: 0-branch; red edges: 1-branch
Mapping a trellis of size S onto SODA with SIMD width T

10 Forward Trellis on SODA (S = T)
Misaligned SIMD operation

11 Handling SIMD Misalignment
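One way to picture the misalignment and its fix: under the state numbering assumed here, the two predecessors of state n sit in lanes 2(n mod 4) and 2(n mod 4) + 1, so a lane-wise add-compare-select cannot consume the alpha vector as stored. The sketch below realigns the metrics with two permutations, standing in for SODA's shuffle network, and then does the ACS lane by lane. The encoder polynomials, state numbering, and function names are assumptions for illustration, not the exact scheme on the slides.

```python
from typing import List, Tuple

NUM_STATES = 8  # S = T = 8: one trellis state per SIMD lane


def transition(state: int, u: int) -> Tuple[int, int]:
    # Assumed 3GPP constituent encoder (g0 = 1 + D^2 + D^3, g1 = 1 + D + D^3).
    d1, d2, d3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ d2 ^ d3
    return (a << 2) | (state >> 1), a ^ d1 ^ d3


def predecessor_lanes() -> Tuple[List[int], List[int]]:
    """For each target lane n, the two source lanes whose metrics feed it."""
    preds: List[List[int]] = [[] for _ in range(NUM_STATES)]
    for s in range(NUM_STATES):
        for u in (0, 1):
            preds[transition(s, u)[0]].append(s)
    return [min(p) for p in preds], [max(p) for p in preds]


def edge_metric(src: int, dst: int, ls: float, lp: float) -> float:
    """Max-log branch metric of the edge src -> dst (LLR = log P(0)/P(1))."""
    for u in (0, 1):
        nxt, p = transition(src, u)
        if nxt == dst:
            return ((1 - 2 * u) * ls + (1 - 2 * p) * lp) / 2.0
    raise ValueError("no edge between these states")


def shuffle(vec: List[float], idx: List[int]) -> List[float]:
    # Stand-in for one pass through the SIMD permutation (shuffle) network.
    return [vec[i] for i in idx]


def simd_forward_step(alpha: List[float], ls: float, lp: float) -> List[float]:
    idx0, idx1 = predecessor_lanes()
    a0, a1 = shuffle(alpha, idx0), shuffle(alpha, idx1)  # realign metrics
    g0 = [edge_metric(s, n, ls, lp) for n, s in enumerate(idx0)]
    g1 = [edge_metric(s, n, ls, lp) for n, s in enumerate(idx1)]
    # Lane-wise add-compare-select: every operand now sits in its target lane.
    return [max(x + gx, y + gy) for x, gx, y, gy in zip(a0, g0, a1, g1)]


print(predecessor_lanes())
# -> ([0, 2, 4, 6, 0, 2, 4, 6], [1, 3, 5, 7, 1, 3, 5, 7])
```

Because the two permutation patterns are fixed for a given trellis, they can be set up once and reused on every trellis step.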

12 Sliding Window on SODA
Problem:
– W-CDMA uses a K = 4 code, giving an 8-wide trellis (2^(K-1) = 8 states)
– SODA has 32-wide SIMD
Solution:
– Parallelize the trellis computation with sliding windows, processing several windows at once to fully utilize the SIMD width and achieve higher throughput
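The arithmetic behind the sliding-window mapping is simple: a K = 4 code has 2^(K-1) = 8 states, so a 32-wide SIMD vector can hold 32 / 8 = 4 windows. The sketch below partitions a block into windows and groups them into SIMD batches. The window length, the dummy (warm-up) length, and the choice to warm up both the alpha and beta recursions of every window are assumptions, and all names are hypothetical.

```python
from typing import List, NamedTuple


def windows_in_parallel(constraint_length: int, simd_width: int) -> int:
    """SIMD lanes divided by trellis states (2**(K-1)) gives windows per vector."""
    states = 2 ** (constraint_length - 1)
    return simd_width // states


class Window(NamedTuple):
    start: int       # first trellis step this window decodes
    end: int         # one past the last decoded step
    alpha_from: int  # forward warm-up (dummy) recursion starts here
    beta_from: int   # backward warm-up (dummy) recursion starts here


def partition(block_len: int, window_len: int, dummy_len: int) -> List[Window]:
    """Split a block into windows, each with up to dummy_len warm-up steps per side."""
    windows = []
    for start in range(0, block_len, window_len):
        end = min(start + window_len, block_len)
        windows.append(Window(start, end,
                              max(start - dummy_len, 0),
                              min(end + dummy_len, block_len)))
    return windows


def batches(windows: List[Window], per_batch: int) -> List[List[Window]]:
    """Group windows that are decoded together in one SIMD pass."""
    return [windows[i:i + per_batch] for i in range(0, len(windows), per_batch)]


per_vector = windows_in_parallel(4, 32)  # K = 4, 32-wide SIMD -> 4 windows
plan = batches(partition(block_len=40, window_len=10, dummy_len=4), per_vector)
```

The warm-up steps are the price of parallelism: each window recomputes a few trellis steps of its neighbors so that its alpha and beta recursions do not have to start from unknown metrics.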

13 Sliding Window Parallelization

14 Sliding Window on SODA (S < T)
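When S < T, the per-window permutation from the S = T case can be replicated across the packed windows, giving a block-diagonal shuffle that never moves a metric from one window into another. A minimal sketch, assuming window w occupies lanes w·S through w·S + S - 1 (a layout chosen for illustration) and using as an example the 8-lane pattern [0, 2, 4, 6, 0, 2, 4, 6], which a shift-register trellis yields under one common state numbering (also an assumption). Function names are hypothetical.

```python
from typing import List


def widen_permutation(local: List[int], num_windows: int) -> List[int]:
    """Replicate an S-lane shuffle pattern for num_windows windows packed
    side by side (window w in lanes w*S .. w*S + S - 1)."""
    s = len(local)
    return [w * s + i for w in range(num_windows) for i in local]


def pack(windows: List[List[float]]) -> List[float]:
    """Concatenate per-window state-metric vectors into one wide vector."""
    return [m for win in windows for m in win]


def shuffle(vec: List[float], idx: List[int]) -> List[float]:
    # Stand-in for one pass through the SIMD permutation network.
    return [vec[i] for i in idx]


local = [0, 2, 4, 6, 0, 2, 4, 6]    # example 8-lane predecessor pattern (assumed)
wide = widen_permutation(local, 4)  # 4 windows x 8 states = 32 lanes
metrics = [[10.0 * w + s for s in range(8)] for w in range(4)]  # dummy alphas
out = shuffle(pack(metrics), wide)
```

Each 8-lane slice of `out` is exactly what the 8-lane shuffle would produce for that window alone, so one 32-lane operation advances all four windows by one trellis step.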

15 Turbo Decoder System Operations

16 SODA DMA Modifications
Traditional DMA controller
– Designed for block data transfer
– 1 source and 1 destination address per block
Modified DMA controller
– Adds data interleaving functionality to the DMA
– Needs to handle scalar data transfers
– 1 source and 1 destination address per scalar
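The difference between the two DMA modes can be modeled in a few lines: a block transfer carries one address pair for the whole block, while the interleaving DMA carries one address pair per scalar, so the (de)interleaver permutation lives entirely in the descriptors. This is a behavioral sketch with hypothetical names; the example permutation is arbitrary and is not the W-CDMA turbo interleaver.

```python
from typing import List, Sequence, Tuple

Descriptor = Tuple[int, int]  # (source address, destination address)


def block_dma(src: Sequence[int], src_addr: int,
              dst: List[int], dst_addr: int, length: int) -> None:
    """Traditional DMA: one source and one destination address per block."""
    dst[dst_addr:dst_addr + length] = src[src_addr:src_addr + length]


def scalar_dma(src: Sequence[int], dst: List[int],
               descriptors: List[Descriptor]) -> None:
    """Interleaving DMA: one source and one destination address per scalar."""
    for s, d in descriptors:
        dst[d] = src[s]


def interleave_descriptors(perm: List[int]) -> List[Descriptor]:
    # Output position i reads input position perm[i].
    return [(p, i) for i, p in enumerate(perm)]


def deinterleave_descriptors(perm: List[int]) -> List[Descriptor]:
    # Inverse mapping: input position i is written back to position perm[i].
    return [(i, p) for i, p in enumerate(perm)]


buf = [0] * 6
block_dma([1, 2, 3, 4], 1, buf, 2, 3)  # copy 3 words in a single transfer

perm = [2, 0, 4, 1, 3]                 # arbitrary example permutation
data = [10, 20, 30, 40, 50]
interleaved = [0] * 5
scalar_dma(data, interleaved, interleave_descriptors(perm))
restored = [0] * 5
scalar_dma(interleaved, restored, deinterleave_descriptors(perm))
```

Moving the permutation into the DMA means the processor cores never spend cycles on irregular address generation for the interleaver; they only see data that is already in the order the next trellis pass expects.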

17 Achieved Performance on SODA
SODA operates at 400 MHz
Can achieve 2.08 Mbps with I = 5 Turbo iterations
Throughput model terms:
– Overall Turbo decoder throughput
– SODA operation frequency
– Number of Turbo iterations, I
– Number of sliding windows processed in parallel
– Cycles for 1-bit trellis computation = T_block / L
– T_block: average number of cycles for one trellis block, annotated with the dummy calculation, 1 bit of Alpha, Beta and LLC computation, data memory access, and extrinsic scaling
– L: size of one trellis block
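A back-of-envelope check on these figures: at 400 MHz and 2.08 Mbps, the decoder has 400 / 2.08 ≈ 192 cycles per decoded bit across all five iterations, or about 38 cycles per bit per iteration. The hypothetical helper below only performs that division; it does not model the window parallelism or the per-block overheads listed on the slide.

```python
from typing import Tuple


def cycle_budget(freq_hz: float, throughput_bps: float, iterations: int) -> Tuple[float, float]:
    """Cycles available per decoded bit, overall and per Turbo iteration."""
    per_bit = freq_hz / throughput_bps
    return per_bit, per_bit / iterations


per_bit, per_iteration = cycle_budget(400e6, 2.08e6, 5)
print(f"{per_bit:.1f} cycles/bit, {per_iteration:.1f} cycles/bit/iteration")
# -> 192.3 cycles/bit, 38.5 cycles/bit/iteration
```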

18 Conclusion & Future Work
Implementation summary
– SODA consumes < 100 mW in 90 nm
– Meets W-CDMA throughput requirements
– Hardware features: wide SIMD execution, SIMD permutation network, smart DMA
Beyond 3G
– Support for higher-throughput 3G+ protocols: multi-processor SODA for Turbo decoder
– LDPC decoding

19 Questions?