High Throughput LDPC Decoders Using a Multiple Split-Row Method


High Throughput LDPC Decoders Using a Multiple Split-Row Method
  Tinoosh Mohsenin and Bevan M. Baas
  VLSI Computation Lab, ECE Department
  University of California, Davis

Outline
  Introduction to LDPC Codes and Decoders
  Multi-Split-Row Decoding Method
  Implementing Multi-Split-Row Decoders
  Conclusion

Error Correction in Communication Systems
  Error correction is widely used in communication systems

LDPC Codes Applications
  Standards
    Digital Video Broadcasting (DVB-S2): 2005
    10 Gigabit Ethernet (10GBASE-T): 2006
    Next generation of WiMAX
  Challenges with LDPC decoders
    High memory bandwidth requirement
    High interconnect complexity
    Many target applications are power and cost constrained

LDPC Decoding: Message Passing Algorithm
  Performs row and column operations iteratively
  Example (9,5) LDPC code
    Code length (N) = 9
    Information length = 5
    Row weight (Wr) = 3
    Column weight (Wc) = 2
  [Figure: parity check matrix H with row processing and column processing exchanging the messages α and β; the matrix entries are not recoverable from the transcript]
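The iteration between row and column processing can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the MinSum row update, log-likelihood inputs where a positive value means bit 0, and a hypothetical 6 x 9 matrix `H_EXAMPLE` that has the same row weight (3) and column weight (2) as the slide's example but is not the matrix from the slide. The names `H_EXAMPLE` and `minsum_decode` are made up for this sketch.

```python
# Hypothetical parity check matrix (NOT the one on the slide): the 9 bits sit
# in a 3 x 3 grid, with one check per grid row and one per grid column.
H_EXAMPLE = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
]


def minsum_decode(H, lam, iterations=15):
    """Iterative MinSum decoding; lam holds channel LLRs (positive = bit 0)."""
    rows = [[n for n, h in enumerate(row) if h] for row in H]
    # beta[(m, n)]: message from column n to row m, initialised to the channel value.
    beta = {(m, n): lam[n] for m, cols in enumerate(rows) for n in cols}
    alpha = {}
    bits = [1 if v < 0 else 0 for v in lam]
    for _ in range(iterations):
        # Row processing: sign product and minimum magnitude of the other inputs.
        for m, cols in enumerate(rows):
            for n in cols:
                others = [beta[(m, k)] for k in cols if k != n]
                sign = -1 if sum(b < 0 for b in others) % 2 else 1
                alpha[(m, n)] = sign * min(abs(b) for b in others)
        # Column processing: channel value plus the alphas from the other rows.
        total = list(lam)
        for (m, n), a in alpha.items():
            total[n] += a
        for key in list(beta):
            beta[key] = total[key[1]] - alpha[key]
        # Hard decision, then stop early if every parity check is satisfied.
        bits = [1 if t < 0 else 0 for t in total]
        if all(sum(bits[n] for n in cols) % 2 == 0 for cols in rows):
            break
    return bits


# All-zero codeword with one unreliable, flipped position (index 4).
received = [2.0, 2.0, 2.0, 2.0, -1.0, 2.0, 2.0, 2.0, 2.0]
print(minsum_decode(H_EXAMPLE, received))
```

In this toy case the two checks that include bit 4 each contribute a positive message, which outweighs the negative channel value, so the flipped bit is corrected in the first iteration.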

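Message Passing (Row Processing)
  [Figure: parity check matrix H; the matrix entries are not recoverable from the transcript]
  SPA: [equation not recoverable from the transcript]
  MinSum: [equation not recoverable from the transcript]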
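For reference, the SPA and MinSum row updates are commonly written as below. This is the standard textbook form, not a recovery of the slide's own equations. The notation is an assumption: α_ij is the row-processor output for row i and column j, β_ij is the corresponding column-processor output, and V(i)\j is the set of columns with a 1 in row i, excluding column j.

```latex
% SPA row processing
\alpha_{ij} = \prod_{j' \in V(i) \setminus j} \operatorname{sign}(\beta_{ij'})
              \times \phi\!\left( \sum_{j' \in V(i) \setminus j} \phi\bigl(|\beta_{ij'}|\bigr) \right),
\qquad \phi(x) = -\log\tanh\frac{x}{2}

% MinSum row processing: the \phi sum is approximated by the smallest magnitude
\alpha_{ij} = \prod_{j' \in V(i) \setminus j} \operatorname{sign}(\beta_{ij'})
              \times \min_{j' \in V(i) \setminus j} |\beta_{ij'}|
```

Both updates share the same sign term. They differ only in how the magnitude is computed.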

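Message Passing (Column Processing)
  [Figure: parity check matrix H; the matrix entries are not recoverable from the transcript]
  Column processing: [equation not recoverable from the transcript]
  [Symbol not recoverable] is the received information from the channel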
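The column update is commonly written as below. Again this is the standard form under assumed notation, not the slide's own equation: λ_j stands for the received channel information for bit j, and C(j)\i is the set of rows with a 1 in column j, excluding row i.

```latex
% Column processing: channel value plus the row outputs from all other rows
\beta_{ij} = \lambda_j + \sum_{i' \in C(j) \setminus i} \alpha_{i'j}

% Decision value for bit j after an iteration (all rows included)
z_j = \lambda_j + \sum_{i' \in C(j)} \alpha_{i'j}
```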

Decoder Architectures
  Serial decoders
    Single row processor, column processor, shared memory
    Simple and small area
    Disadvantage: low throughput, 100 Kbps to 10 Mbps
  Semi-parallel decoders
    Multiple row and column processors, multiple memory banks
    Higher throughput
    Example: 2048-bit, rate-1/2, (3,6) programmable decoder [Mansour 2006]
      14.3 mm², 0.18 μm CMOS
      125 MHz, 640 Mbps

Full Parallel Decoders
  Row and column processors are directly mapped according to the parity check matrix
  Highest throughput
  Major challenges
    Routing congestion due to extrinsic information passed between row and column processors
    Large delay, area, and power caused by long wires
  Example: 1024-bit, irregular code, 4 bits per symbol [Blanksby 2002]
    52.5 mm², 0.16 μm CMOS
    64 MHz, 1 Gbit/s
  [Figure: row processors 1 to 384 (M) connected to column processors 1 to 2048 (N); annotations: 5 x 384 x 32 = 61440 and 5 x 2048 x 6]

Outline (current section: Multi-Split-Row Decoding Method)
  Introduction to LDPC Codes
  Split-Row Decoder Algorithm
  Multi-Split-Row Decoding Method
  Implementing Multi-Split-Row Decoders
  Conclusion

Goals
  Very high throughput
  Area efficient (small circuit area), and therefore more energy efficient
  Well suited for long-length LDPC codes
  Well suited for hardware implementations

The Multi-Split-Row Decoder
  Key ideas
    H matrix is split into multiple blocks
    Each block is processed almost independently
    Minimal information is shared between blocks
  Results
    Lower interconnect complexity
    Reduced processor complexity
  Hardware results
    Higher throughput
    Smaller decoder area and higher area utilization
    Slightly increased error rate

Standard vs. Multi-Split-Row Decoder

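Multi-Split-Row Algorithm
  The magnitude portion of the row processor output α is larger for the Multi-Split-Row decoder than for the standard decoder
  Normalizing the α values with a scale factor S < 1 improves the error performance of the Multi-Split-Row decoder
  [Equation: row processor output α with its sign and magnitude terms and the scale factor S; not recoverable from the transcript]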
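A small Python sketch can show why the split magnitudes come out larger and where the scale factor enters. It rests on assumptions: each row is cut into `spn` equal partitions, each partition takes the minimum magnitude over its own inputs only, and partitions exchange nothing but their sign product. The function names `minsum_row` and `split_row_minsum` are hypothetical, and the code illustrates the idea rather than reproducing the authors' hardware algorithm.

```python
def sign_product(values):
    """Return -1 if an odd number of values are negative, else +1."""
    return -1 if sum(v < 0 for v in values) % 2 else 1


def minsum_row(betas):
    """Standard MinSum row update: each output excludes its own input."""
    out = []
    for i in range(len(betas)):
        others = betas[:i] + betas[i + 1:]
        out.append(sign_product(others) * min(abs(b) for b in others))
    return out


def split_row_minsum(betas, spn, scale):
    """Split-row variant: local minimum per partition, global sign, scaled by S."""
    assert len(betas) % spn == 0 and len(betas) // spn >= 2
    width = len(betas) // spn
    blocks = [betas[k * width:(k + 1) * width] for k in range(spn)]
    # The only information crossing partition boundaries: one sign per partition.
    block_signs = [sign_product(block) for block in blocks]
    out = []
    for k, block in enumerate(blocks):
        other_sign = 1
        for j, s in enumerate(block_signs):
            if j != k:
                other_sign *= s
        for i in range(len(block)):
            others = block[:i] + block[i + 1:]
            magnitude = min(abs(b) for b in others)  # local minimum only
            out.append(scale * other_sign * sign_product(others) * magnitude)
    return out


row_inputs = [0.5, -2.0, 3.0, -1.0, 4.0, 2.5, -0.25, 1.5]
print(minsum_row(row_inputs))
print(split_row_minsum(row_inputs, spn=2, scale=1.0))
print(split_row_minsum(row_inputs, spn=2, scale=0.5))
```

With `scale=1.0` the signs match the standard update, but every magnitude is at least as large, because a minimum taken over part of the row cannot be smaller than the minimum over the whole row. A scale factor below 1 pulls the magnitudes back down.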

Optimum Scale Factor
  (2048,1723) RS-based LDPC code used by the 10 Gbit Ethernet standard
    Row weight: 32
    Column weight: 6
    No. of iterations: 15
  [Figure: two panels, Multi-Split-2 and Multi-Split-4, each plotting bit error probability; annotations: Scale Factor = 0.2 and Scale Factor = 0.3]

Bit Error Rate Performance Comparison
  Code length: 2048 bits
  Message length: 1723 bits
  Row weight: 32
  Column weight: 6
  No. of iterations: 15
  Algorithms compared
    SPA: Sum Product Algorithm [MacKay 1999]
    MinSum [Fossorier 2002]
    WBF: Weighted Bit Flipping [Kou, Lin 2001]
    Improved WBF [Fossorier 2004]
    BF: Bit Flipping [Gallager 1963]
  [Figure: bit error rate curves; annotated gaps: 0.35 dB and 0.25 dB]

Bit Error Rate Performance Comparison
  Code length: 5256 bits
  Message length: 4823 bits
  Row weight: 72
  Column weight: 6
  No. of iterations: 15
  [Figure: bit error rate curves; annotated gaps: 0.25 dB and 0.3 dB]

Optimum Scale Factors for Different Codes
  Columns: (N, K); (Wc, Wr); optimum scale factor for SP-2, SP-4, SP-6, SP-8, SP-12
  Rows (values in the order given; the column each value belongs to is not recoverable from the transcript)
    (1536,770)      (3,6)     0.45  -
    (1008,507)      (4,8)     0.35
    (1536,1155)     (4,16)    0.4  0.25
    (8088,6743)     (4,24)    0.27  0.22
    (2048,1723)     (6,32)    0.3  0.2  *  0.15
    (16352,14329)             0.16
    (5248,4842)     (5,64)
    (5256,4823)     (6,72)    0.18  0.14
  Multi-Split-Row works best for
    Regular codes
    High row-weight codes
  The optimum scale factor decreases as the partitioning of the H matrix increases

Outline (current section: Implementing Multi-Split-Row Decoders)
  Introduction to LDPC Codes and Decoder Arch
  Multi-Split-Row Decoding Method
  Implementing Multi-Split-Row Decoders
  Conclusion

Sign-wire implementation

Full-Parallel Decoder Implementations
  (2048,1723) RS-based (6,32) LDPC code
  [Figure: three panels, Standard, Multi-Split-Row-2, and Multi-Split-Row-4]

A Full-Parallel Decoder Implementation
  The number of sign-passing wires is negligible compared to the total number of wires.
  Total number of wires = 2bMWr + 2(Spn - 1)M
  (2048,1723) LDPC code
    N = 2048
    M (number of rows) = 384
    b (bits per symbol) = 5
    Wr = 32

                 Total number of wires   Sign-passing wires   Ratio sign/total wires
    Split-Row-2  123,648                 768                  0.6%
    Split-Row-4  125,184                 2,304                2.0%
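The wire counts in the table follow directly from the formula. The sketch below evaluates it, taking Spn to be the number of row partitions (2 or 4), which is an assumption because the slide does not define it. The function name `total_wires` is hypothetical.

```python
def total_wires(b, m, wr, spn):
    """Return (total wires, sign-passing wires) for a full-parallel decoder."""
    message_wires = 2 * b * m * wr      # 2bMWr: message wires in both directions
    sign_wires = 2 * (spn - 1) * m      # 2(Spn - 1)M: sign bits between partitions
    return message_wires + sign_wires, sign_wires


for spn in (2, 4):
    total, sign = total_wires(b=5, m=384, wr=32, spn=spn)
    print(f"Split-Row-{spn}: total={total}, sign={sign}, ratio={100 * sign / total:.2f}%")
```

This gives 123,648 and 125,184 total wires, matching the table. The computed ratios are 0.62% and 1.84%, so the 2.0% listed for Split-Row-4 appears to be rounded up.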

Full Parallel Decoder Chips
  0.18 µm CMOS technology, 6 metal layers

               No. of input + output registers   No. of column processors   No. of row processors   Individual row processor area (μm²)
  Standard     2 x 2048                          2048                       384                     31,411
  Split-Row-2                                                               768                     2 x 13,014 = 26,028
  Split-Row-4                                                               1536                    4 x 5,897 = 23,588
  [Register and column processor counts appear only once in the transcript, on the Standard row]

Three Full Parallel MinSum Decoders

               Avg. wire length (mm)   Chip size (mm²)   Worst-case speed (MHz)   Decoding throughput (Gbps)
  Standard     0.32                    139.1             10                       1.4
  Split-Row-2  0.20                    75.8              16                       2.2
  Split-Row-4  0.11                    43.9              52                       7.1
  Improvements: 2.9x, 3.2x, 5.1x [column assignment not recoverable from the transcript]

  (6,32) (2048,1723) RS-based LDPC code
  Resolution of 5 bits per message
  Throughputs calculated at 15 decoding iterations
  Results based on 0.18 µm CMOS, 1.8 V at 85 °C
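The listed throughputs can be reproduced from the clock speeds with a short calculation. The sketch assumes that a full-parallel decoder completes one decoding iteration per clock cycle and delivers all 2048 code bits after 15 iterations. The slide does not state this, but the assumption matches all three listed figures. The function name `throughput_gbps` is hypothetical.

```python
def throughput_gbps(code_length_bits, clock_mhz, iterations):
    """Throughput in Gbps, assuming one decoding iteration per clock cycle."""
    blocks_per_second = clock_mhz * 1e6 / iterations
    return code_length_bits * blocks_per_second / 1e9


for name, mhz in (("Standard", 10), ("Split-Row-2", 16), ("Split-Row-4", 52)):
    print(f"{name}: {throughput_gbps(2048, mhz, 15):.1f} Gbps")

# Ratios of Standard to Split-Row-4, for comparison with the listed improvements.
print(f"wire length: {0.32 / 0.11:.1f}x, chip size: {139.1 / 43.9:.1f}x, "
      f"throughput: {7.1 / 1.4:.1f}x")
```

Under this assumption the three improvement factors correspond to average wire length (2.9x), chip size (3.2x), and throughput (5.1x) of Split-Row-4 relative to the standard decoder.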

Conclusion
  The Multi-Split-Row decoding method provides a significant reduction in circuit area
  Results in
    Reduced wire interconnect complexity
    Increased circuit area utilization
    Increased speed
    Simpler implementation
  A good tradeoff between hardware complexity and error performance

Acknowledgments
  Support
    Intel Corporation
    UC MICRO
    NSF Grant No. 0430090
    NSF CAREER Award No. 0546907
    UCD Faculty Research Grant
  Thanks
    Prof. Shu Lin
    Lan Lan
    Eric Work
    Zhiyi Yu