Techniques for Low Power Turbo Coding in Software Radio
Joe Antoon, Adam Barnett

Software Defined Radio
A single transmitter supports many protocols
Protocols are completely specified in memory
Implementation:
– Microprocessors
– Field-programmable logic

Why Use Software Radio?
Wireless protocols are constantly reinvented:
– 5 Wi-Fi protocols
– 7 Bluetooth protocols
– Proprietary mouse and keyboard protocols
– A mobile-phone protocol alphabet soup
Custom DSP logic for each protocol is costly

So Why Not Use Software Radio?
Requires high-performance processors
Consumes more power
(Slide figure: a spectrum from inefficient general-purpose hardware, through still-inefficient field-programmable logic, to efficient application-specific hardware.)

Turbo Coding
A channel coding technique
Throughput nears the theoretical limit
Great for bandwidth-limited applications:
– CDMA2000
– WiMAX
– NASA's MESSENGER probe

Turbo Coding Considerations
Presents a design trade-off:
Turbo coding is computationally expensive
But it reduces cost in other areas:
– Bandwidth
– Transmission power

Reducing Power in Turbo Decoders
FPGA turbo decoders
– Use dynamic reconfiguration
General-processor turbo decoders
– Use a logarithmic number system

Generic Turbo Encoder Component Encoder Component Encoder Interleave p1 s p2 Data stream

Generic Turbo Decoder
(Slide figure: received systematic stream r and parity streams q1 and q2 feed the decoder, with an interleaver between decoding stages.)

Decoder Design Options
Multiple algorithms are used to decode:
Maximum A-Posteriori (MAP)
– Most accurate estimate possible
– Complex computations required
Soft-Output Viterbi Algorithm (SOVA)
– Less accurate
– Simpler calculations

FPGA Design Options
Goal: make an adaptive decoder
(Slide figure: the decoder takes received data and parity and outputs the original sequence; a tunable parameter trades low power and lower accuracy against high power and higher accuracy.)

Component Encoder
M blocks are 1-bit registers
Memory provides the encoder state
(Slide figure: a chain of M registers feeding a generator function.)
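A minimal sketch of the register chain and its state, assuming a simple XOR generator function (the slides do not specify the real polynomial):

```python
def encoder_states(bits, M=2):
    # Track the M 1-bit registers that hold the encoder state.
    # Stand-in generator function: XOR of the input with all registers.
    regs = [0] * M
    trace = []
    for b in bits:
        out = b
        for r in regs:
            out ^= r
        trace.append((tuple(regs), out))  # (state before input, output bit)
        regs = [b] + regs[:-1]            # shift the register chain
    return trace
```

The `(state, output)` pairs recorded here are exactly what the trellis diagrams on the following slides plot against time.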

Encoder State
(Slide figure: trellis of encoder states over time, starting from state 00, with outputs given by the generator function GF.)

Viterbi’s Algorithm
Determine the most likely output
Simulate encoder state given received values
(Slide figure: trellis of states s0, s1, s2, … over time, with received pairs (r0, p0), (r1, p1), (r2, p2) and decisions d0, d1, d2.)

Viterbi’s Algorithm
Write: compute branch metric (likelihood)
Traceback: compute path metric, output data
Update: compute distance between paths
Rank paths by path metric and choose the best
For N bits of memory:
– Must calculate 2^(N−1) paths for each state
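One add-compare-select step of the path-metric update can be sketched as below. The state set, metric values, and trellis connectivity in the usage are invented for illustration, not taken from the slides; metrics are treated as distances, so smaller is better.

```python
def viterbi_acs(path_metrics, branch_metrics, predecessors):
    # One add-compare-select step: for each next state, add the branch
    # metric to each predecessor's path metric and keep the best
    # (smallest) candidate, remembering the surviving predecessor.
    new_metrics = {}
    survivors = {}
    for state, preds in predecessors.items():
        candidates = [(path_metrics[p] + branch_metrics[(p, state)], p)
                      for p in preds]
        new_metrics[state], survivors[state] = min(candidates)
    return new_metrics, survivors
```

Repeating this step for every received symbol, then following the `survivors` links backwards, is the traceback the slide describes.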

Adaptive SOVA
SOVA: the inflexible path system scales poorly
Adaptive SOVA: a heuristic
– Limit to at most M paths
– Discard a path if its path metric falls below a threshold T
– Discard all but the top M paths when there are too many
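The pruning heuristic can be sketched directly. `M` and `T` mirror the slide's terms; treating larger metrics as more likely is an assumption of this sketch.

```python
def prune_paths(paths, M, T):
    # paths: {path_id: metric}, larger metric = more likely (assumed).
    # Heuristic from the slide: drop paths whose metric falls below the
    # threshold T, then keep at most the best M of what remains.
    survivors = {p: m for p, m in paths.items() if m >= T}
    best = sorted(survivors, key=survivors.get, reverse=True)[:M]
    return {p: survivors[p] for p in best}
```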

Implementing in Hardware
(Slide figure: block diagram with inputs q and r feeding a Branch Metric Unit, then an Add-Compare-Select unit, survivor memory, and control logic.)

Implementing in Hardware
Controller
– Control memory; select paths
Branch Metric Unit
– Compute likelihood
– Consider all possible “next” states
Add, Compare, Select
– Append path metric
– Discard paths
Survivor Memory
– Store / discard path bits

Implementing in Hardware
Add, Compare, Select Unit
(Slide figure: present-state path values and branch values enter a compute/compare stage that produces next-state path values, comparing path distances against the threshold T.)

Dynamic Reconfiguration
Bit Error Rate (BER)
– Changes with signal strength
– Changes with number of paths used
Change hardware at runtime
– Weak signal: use many paths, preserve accuracy
– Strong signal: use few paths, save power
– Sample SNR every 250k bits, then reconfigure
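The runtime policy might look like the sketch below. The SNR breakpoints and path counts are invented for illustration; the slides specify only that SNR is sampled every 250k bits.

```python
def choose_num_paths(snr_db, levels=((10, 2), (5, 4), (0, 8))):
    # Pick a path budget from the measured SNR: strong signal -> few
    # paths (save power), weak signal -> many paths (preserve accuracy).
    # The (threshold_dB, paths) pairs are made-up example values.
    for threshold, paths in levels:
        if snr_db >= threshold:
            return paths
    return levels[-1][1]  # weaker than every threshold: use the most paths
```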

Dynamic Reconfiguration

Experimental Results
Average speed and power grow in proportion to K (the number of encoder bits)

Experimental Results
FPGA decoding has a much higher throughput, due to parallelism

Experimental Results
ASOVA performs worse than commercial cores
However, it is much better in other metrics:
– Power
– Memory usage
– Complexity

Future Work
Use the present reconfiguration mechanism to explore:
– Partial reconfiguration
– Dynamic voltage scaling
Compare against power-efficient software methods

Power-Efficient Implementation of a Turbo Decoder in an SDR System
Turbo coding systems are built on one of three general processor types:
– Fixed Point (FXP): cheapest, simplest to implement, fastest
– Floating Point (FLP): more precision than fixed point
– Logarithmic Numbering System (LNS): simplifies complex operations, but complicates simple add/subtract operations

Logarithmic Numbering System
X = {s, x}, where s is a sign bit and x = log_b(|X|); the remaining bits hold the number value
Example:
– Let b = 2; the decimal number 8 is represented as log_2(8) = 3
– Values are stored in computer memory in two’s-complement form (here 3, with sign bit 0)
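A minimal sketch of the representation, using a Python float for the log magnitude rather than a fixed-width two's-complement word:

```python
import math

def to_lns(value, b=2):
    # Represent value as (sign bit, log_b|value|), as on the slide.
    sign = 0 if value >= 0 else 1
    return sign, math.log(abs(value), b)

def from_lns(x, b=2):
    # Recover the ordinary value from an LNS pair.
    sign, mag = x
    return (-1) ** sign * b ** mag
```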

Why use Logarithmic System?
Greatly simplifies multiplication, division, roots, and exponents
– Multiplication simplifies to addition
E.g. 8 * 4 = 32; in LNS, 3 + 2 = 5 (2^5 = 32)
– Division simplifies to subtraction
E.g. 8 / 4 = 2; in LNS, 3 − 2 = 1 (2^1 = 2)

Why use Logarithmic System?
Roots become right shifts
– E.g. sqrt(16) = 4; in LNS, 4 shifted right = 2 (2^2 = 4)
Exponents become left shifts
– E.g. 8^2 = 64; in LNS, 3 shifted left = 6 (2^6 = 64)
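The examples on the last two slides reduce to one-line helpers on the log values. Note the shift-based root and square are only exact here because these toy log values are integers; real LNS hardware shifts a fixed-point log word.

```python
def lns_mul(x, y):  # 8 * 4 = 32  ->  3 + 2 = 5
    return x + y

def lns_div(x, y):  # 8 / 4 = 2   ->  3 - 2 = 1
    return x - y

def lns_sqrt(x):    # sqrt(16) = 4  ->  4 >> 1 = 2
    return x >> 1   # right shift halves the log value

def lns_square(x):  # 8^2 = 64  ->  3 << 1 = 6
    return x << 1   # left shift doubles the log value
```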

So why not use LNS for all processors?
Unfortunately, addition and subtraction are greatly complicated in LNS:
– Addition: log_b(|X| + |Y|) = x + log_b(1 + b^z)
– Subtraction: log_b(|X| − |Y|) = x + log_b(1 − b^z)
where z = y − x
Turbo coding/decoding is computationally intense, requiring more multiplies, divides, roots, and exponents than adds or subtracts
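The two identities can be written out directly; for subtraction, |X| > |Y| is assumed so the argument of the log stays positive.

```python
import math

def lns_add(x, y, b=2):
    # log_b(|X| + |Y|) = x + log_b(1 + b**z), where z = y - x
    z = y - x
    return x + math.log(1 + b ** z, b)

def lns_sub(x, y, b=2):
    # log_b(|X| - |Y|) = x + log_b(1 - b**z); requires |X| > |Y|
    z = y - x
    return x + math.log(1 - b ** z, b)
```

For example, adding 8 and 4 (log values 3 and 2) needs an exponential and a logarithm, which is why real LNS hardware replaces the log_b(1 + b^z) term with a lookup table.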

Turbo Decoder block diagram
Each bit decision requires a subtraction, a table lookup, and an addition

Proposed new block diagram
As the difference between e^a and e^b grows, the error between the value stored in the lookup table and the exact computation becomes negligible
For this simulation, a difference of > 5 was used

How it works
When d > 5, the new mux (on the right) ignores the SRAM input and simply adds 0 to the MAX result
Also when d > 5, the pre-decoder circuitry disables the SRAM to conserve power
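A sketch of the bypassed Max* computation. The coarse integer-indexed table below stands in for the SRAM contents and is an illustration, not the actual table; only the d > 5 cutoff comes from the slides.

```python
import math

# Correction values ln(1 + e^-d) for d = 0..5, as the SRAM lookup
# would hold (illustrative integer-indexed table).
TABLE = {d: math.log(1 + math.exp(-d)) for d in range(6)}

def max_star(a, b):
    # max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|)
    d = abs(a - b)
    base = max(a, b)
    if d > 5:
        return base                # mux adds 0; SRAM stays disabled
    return base + TABLE[round(d)]  # coarse table read stands in for the SRAM
```

At d = 5 the dropped correction term is ln(1 + e^-5) ≈ 0.0067, which is why the error is negligible past the cutoff.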

Comparing the 3 simulations
Comparisons were made between a 16-bit fixed-point microcontroller, a 16-bit floating-point processor, and a 20-bit LNS processor
– 11 bits would be sufficient for FXP and FLP, but 16-bit processors are much more common
– Similarly, 17 bits would suffice for the LNS processor, but 20 bits is a common width

Power Consumption

Latency
Recall: Max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^−|a−b|)

Power savings
The pre-decoder circuitry adds 11.4% power consumption relative to an SRAM read
So when an SRAM read is required, the modified system uses 111.4% of the unmodified system’s power
However, when the SRAM is blocked, it uses only 11.4% of the power used before

Power savings
The CACTI simulations for the system reported that the Max* operation accounted for 40% of all operations in the decoder
The Max* operations in the modified system used 69% less power than in the unmodified system
This yields an overall power savings of 69% × 40% = 27.6%
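The arithmetic, treating the 69% figure as a reduction in Max* power (the only reading under which the slides' product equals the claimed overall saving):

```python
# Reproduce the power-savings arithmetic from the slides.
maxstar_share = 0.40    # Max* operations: 40% of decoder operations
maxstar_saving = 0.69   # modified Max* uses 69% less power
overall_saving = maxstar_share * maxstar_saving
print(f"{overall_saving:.1%}")  # 27.6%
```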

Conclusion
Turbo codes are computationally intense, requiring more complex operations than simple ones
LNS processors simplify complex operations at the expense of making addition and subtraction more difficult

Conclusion
Using an LNS processor with slight modifications can reduce power consumption by 27.6%
Overall latency is also reduced, because complex operations are cheaper in an LNS processor than in FXP or FLP processors