An Efficient FPGA Implementation of IEEE 802.16e LDPC Encoder
Speaker: Chau-Yuan-Yu
Advisor: Mong-Kai Ku


Outline
- Introduction
  - Low-Density Parity-Check Codes
- Related work
  - General encoding for LDPC codes
  - Efficient encoding for Dual-Diagonal matrix
- Better Encoding scheme
- LDPC Encoder Architecture
  - Parallel Encoder
  - Serial Encoder
- Result
- Conclusion


Low-Density Parity-Check Code
Benefits of LDPC codes:
- Performance approaching the Shannon limit
- Low error floor
- Adopted by various standards (e.g. DVB-S2, IEEE 802.11n, IEEE 802.16e)

Low-Density Parity-Check Code
- The parity-check matrix H is sparse: it has very few 1's in each row and column.
- The null space of H is the codeword space: c is a valid codeword if and only if $H c^T = 0$.
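A minimal sketch of the codeword condition $H c^T = 0$ over GF(2). The matrix is the small (7,4) Hamming parity-check matrix, used only as a stand-in (it is neither sparse nor an 802.16e matrix), and the function names are hypothetical.

```python
# Membership test for the code defined by H: c is a codeword iff H c^T = 0 over GF(2).
# H_EXAMPLE is the (7,4) Hamming parity-check matrix, used only as a small stand-in.
H_EXAMPLE = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, c):
    """Return H c^T over GF(2): one bit per parity check (row of H)."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]


def is_codeword(H, c):
    """c lies in the null space of H iff every parity check is satisfied."""
    return not any(syndrome(H, c))


if __name__ == "__main__":
    c = [1, 0, 1, 1, 0, 1, 0]
    print(is_codeword(H_EXAMPLE, c))      # True
    c_err = [1 - c[0]] + c[1:]            # flip the first bit
    print(syndrome(H_EXAMPLE, c_err))     # [1, 1, 0], i.e. column 0 of H
```

A single bit error makes the syndrome equal to the corresponding column of H, which is what decoders exploit.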

Low-Density Parity-Check Code
In an (n, k) block code, k information bits are encoded into an n-bit codeword. In a systematic block code, the information bits appear unchanged in the codeword, which therefore consists of a systematic part followed by a parity part.

Low-Density Parity-Check Code
General encoding of systematic linear block codes:
- Find the generator matrix G from H.
- c = sG = [s | p]
Issues with LDPC codes:
- G is very large.
- G is generally not sparse.
- The encoding complexity is therefore very high.
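A small sketch of why G-based encoding is unattractive, assuming $H = [H_s \mid H_p]$ with a square, invertible $H_p$: then $p^T = H_p^{-1} H_s s^T$ and $G = [I \mid (H_p^{-1} H_s)^T]$. The toy $H_p$ is dual-diagonal, whose GF(2) inverse is a dense lower-triangular all-ones matrix, so the parity part of G is much denser than $H_s$. The matrices and function names are hypothetical illustrations.

```python
# Deriving a systematic generator matrix G = [I_k | P] from H = [H_s | H_p] over GF(2).
# Assumes H_p (the last m columns of H) is square and invertible.


def gf2_inv(M):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("H_p is singular over GF(2)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def gf2_matmul(X, Y):
    """Binary matrix product X Y over GF(2)."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*Y)] for row in X]


def generator_from_h(H, k):
    """Return G = [I_k | P] with P = (H_p^{-1} H_s)^T, so that s G = [s | p]."""
    Hs = [row[:k] for row in H]
    Hp = [row[k:] for row in H]
    PT = gf2_matmul(gf2_inv(Hp), Hs)          # m x k: p^T = H_p^{-1} H_s s^T
    P = [list(col) for col in zip(*PT)]       # k x m
    return [[int(i == j) for j in range(k)] + P[i] for i in range(k)]


def encode(G, s):
    """c = s G over GF(2)."""
    return [sum(si & g for si, g in zip(s, col)) % 2 for col in zip(*G)]


# Toy example: sparse H_s and a dual-diagonal H_p (ones on the diagonal and subdiagonal).
K = 6
HS = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
HP = [[int(j == i or j == i - 1) for j in range(6)] for i in range(6)]
H_TOY = [hs + hp for hs, hp in zip(HS, HP)]

if __name__ == "__main__":
    G = generator_from_h(H_TOY, K)
    print("ones in H_s:", sum(map(sum, HS)))                     # 11
    print("ones in parity part of G:", sum(sum(row[K:]) for row in G))  # 19
```

Even at this toy size the parity part of G carries 19 ones against 11 in $H_s$, and every encoded bit then needs a dense row of G.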

Structured LDPC Codes
Quasi-Cyclic LDPC Codes
- In QC-LDPC codes, H can be partitioned into square sub-blocks of size z × z.
- Each sub-block is either a z × z zero block or a cyclically shifted (permuted) identity matrix.
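A sketch of how a base matrix expands into H, assuming the 802.16e convention that a non-negative entry p denotes the identity cyclically shifted right by p and -1 denotes the zero block. `apply_block` shows that multiplying by such a block is just a cyclic rotation of a z-bit vector, which is why encoders use barrel shifters rather than matrix multipliers. All names are hypothetical.

```python
# Expanding a QC-LDPC base matrix into the binary parity-check matrix H.
# Assumed convention: entry p >= 0 is the z x z identity cyclically shifted right by p;
# entry -1 is the z x z zero block.


def permutation_block(p, z):
    """z x z block for base-matrix entry p (row r has its 1 in column (r + p) mod z)."""
    if p < 0:
        return [[0] * z for _ in range(z)]
    return [[int(c == (r + p) % z) for c in range(z)] for r in range(z)]


def expand(base, z):
    """Replace every base-matrix entry by its z x z block."""
    H = []
    for base_row in base:
        blocks = [permutation_block(p, z) for p in base_row]
        for r in range(z):
            H.append([bit for blk in blocks for bit in blk[r]])
    return H


def apply_block(p, v):
    """Multiply block P^p by a length-z vector without building the matrix:
    (P^p v)[r] = v[(r + p) mod z], i.e. a cyclic rotation of v."""
    if p < 0:
        return [0] * len(v)
    p %= len(v)
    return v[p:] + v[:p]


if __name__ == "__main__":
    base = [[0, -1, 2],
            [1, 0, -1]]
    for row in expand(base, 3):
        print(row)
```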

Structured LDPC Codes
QC Codes With Dual-Diagonal Structure
- The QC-LDPC codes in the IEEE standards have a dual-diagonal parity structure.
- We take the IEEE 802.16e rate-1/2 matrix as an example; in the base matrix, an entry of 0 represents the (unshifted) identity matrix.


General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
- Partition the H matrix into several sub-matrices.
- In H, the part T is a lower triangular matrix.

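General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm (continued)
The parity parts p0 and p1 are computed in turn; the encoding complexity is $O(n + g^2)$, where g is the gap, i.e. the number of rows of H outside the lower-triangular part T.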
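In the RU algorithm, products with $T^{-1}$ are evaluated by substitution instead of forming $T^{-1}$, which keeps those steps linear in n. A minimal GF(2) sketch, assuming T is stored as a list of off-diagonal column indices per row (a hypothetical representation):

```python
# Solving T x = b over GF(2) for a sparse lower-triangular T with a unit diagonal.


def solve_lower_triangular_gf2(t_rows, b):
    """t_rows[i] lists the columns j < i where T[i][j] = 1 (the diagonal 1 is implied).
    Each x[i] is fixed by row i once x[0..i-1] are known, so the cost is one XOR per
    off-diagonal 1 in T: linear in n for a sparse T, and T^{-1} is never formed."""
    x = []
    for i, bi in enumerate(b):
        acc = bi
        for j in t_rows[i]:
            if j >= i:
                raise ValueError("off-diagonal entries of T must lie below the diagonal")
            acc ^= x[j]
        x.append(acc)
    return x


if __name__ == "__main__":
    # A dual-diagonal T: row i has ones in columns i-1 and i.
    dual_diagonal = [[]] + [[i - 1] for i in range(1, 4)]
    print(solve_lower_triangular_gf2(dual_diagonal, [1, 0, 1, 1]))  # [1, 1, 0, 1]
```

A dual-diagonal T is the cheapest case: each row costs a single XOR, which is what the dual-diagonal encoders below exploit.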

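Efficient Encoding for Dual-Diagonal LDPC Codes
A valid codeword $c = [s \mid p]$, with information bits $s$ and parity bits $p$, must satisfy $H c^T = 0$. Writing $H = [H_s \mid H_p]$ and replacing $H_p$ by the dual-diagonal parity structure, this becomes $H_s s^T = H_p p^T$ over GF(2). Define the lambda values as $\lambda = H_s s^T$, where $\lambda_i$ is the z-bit block of block row i. Adding all $m_b$ block rows, every dual-diagonal parity block appears twice and cancels, as do the two equal-shift entries of the weight-3 parity column; from this equation we obtain $P_0 = \sum_{i=0}^{m_b-1} \lambda_i$.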

Related Work (1) Sequential Encoding
Encoding scheme:
Step 1: Compute the lambda values by the matrix operation $\lambda = H_s s^T$.
Step 2: Determine the parity vector P_0 by adding all the lambda values.
Step 3: Obtain the rest of the parity vectors by exploiting the dual-diagonal matrix T.
This is a one-way derivation: each parity vector depends on the one derived before it.
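A block-level sketch of Steps 1 to 3, starting from precomputed lambda values. It assumes the 802.16e-style parity structure spelled out in the comments; `x` (the shift of the weight-3 column at its top and bottom entries), `r` (the row of its identity entry) and all function names are hypothetical parameters. The final loop shows why the derivation is one-way: each block needs the previous one.

```python
# One-way (sequential) encoding of a dual-diagonal QC-LDPC code at block level.
# Assumed parity structure (802.16e-style), with m_b block rows:
#   weight-3 column P_0: shift x in rows 0 and m_b-1, identity (shift 0) in row r;
#   dual-diagonal columns P_1..P_{m_b-1}: column j has identities in rows j-1 and j.
# lam[i] is lambda_i (block row i of H_s s^T) as a list of z bits.


def rot(v, p):
    """P^p v: cyclic rotation with (P^p v)[k] = v[(k + p) mod z]."""
    p %= len(v)
    return v[p:] + v[:p]


def xor(a, b):
    return [u ^ w for u, w in zip(a, b)]


def sequential_encode(lam, x, r):
    """Return [P_0, P_1, ..., P_{m_b-1}] by one-way derivation from the top row."""
    mb, z = len(lam), len(lam[0])
    # Step 2: P_0 = sum of all lambda_i (dual-diagonal and P^x P_0 terms cancel).
    p0 = [0] * z
    for lam_i in lam:
        p0 = xor(p0, lam_i)
    p = [p0, xor(lam[0], rot(p0, x))]      # row 0: lambda_0 + P^x P_0 + P_1 = 0
    # Step 3: row i gives P_{i+1} = lambda_i + P_i (+ P_0 in row r).
    for i in range(1, mb - 1):
        nxt = xor(lam[i], p[i])
        if i == r:
            nxt = xor(nxt, p0)
        p.append(nxt)
    return p


def satisfies_rows(lam, p, x, r):
    """Check lambda_i + (parity terms of block row i) = 0 for every block row i."""
    mb = len(lam)
    for i in range(mb):
        acc = lam[i]
        if i in (0, mb - 1):
            acc = xor(acc, rot(p[0], x))
        if i == r:
            acc = xor(acc, p[0])
        if i >= 1:
            acc = xor(acc, p[i])
        if i <= mb - 2:
            acc = xor(acc, p[i + 1])
        if any(acc):
            return False
    return True


if __name__ == "__main__":
    z, mb = 8, 12
    lam = [[(3 * i + 5 * k + i * k) % 2 for k in range(z)] for i in range(mb)]
    p = sequential_encode(lam, x=3, r=5)
    print(satisfies_rows(lam, p, x=3, r=5))   # True
```

The last block row is never used in the derivation; it is satisfied automatically because P_0 was chosen as the sum of all lambda values.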

Related Work (2) Arbitrary Bit-generation and Correction Encoding
In [1], an alternative encoding for the standard matrix was presented.
- The shift values in the parity portion of the weight-3 column are replaced with a zero cyclic shift, so the matrix is modified.
- H can be sectorized into three sub-matrices:
  - the information bit region A
  - the parity bit region Q, for the bit-flipping operation
  - the parity bit region U, for non-bit-flipping
[1] C. Yoon, E. Choi, M. Cheong, and S.-K. Lee, "Arbitrary bit generation and correction technique for encoding QC-LDPC codes with dual-diagonal parity structure," IEEE Wireless Communications and Networking Conference (WCNC 2007), March 2007.

Related Work (2) Arbitrary Bit-generation and Correction Encoding
Encoding scheme:
Step 1: Compute the lambda values by the matrix operation $\lambda = A s^T$.
Step 2: Set P_0 to arbitrary binary values and solve for the unknown parity bits.
Step 3: Compute the correction vector f from P_0.
Step 4: Add the correction vector to the parity bits in region Q to correct them.
This is still a one-way derivation.

Related Work (2) Arbitrary Bit-generation and Correction Encoding
Advantages:
- Low-complexity encoding
- Fewer additions than the RU scheme
Drawbacks:
- Not directly applicable to the standard codes
- Modifying the matrix degrades code performance


Better Encoding Scheme
Advantages of the encoding scheme proposed in [2]:
- Low-complexity encoding
- Directly applicable to the matrices defined in the IEEE standards without any modification
- Achieves a higher level of parallelism
[2] C.-Y. Lin, C.-C. Wei, and M.-K. Ku, "Efficient Encoding for Dual-Diagonal Structured LDPC Code Based on Parity Bits Prediction and Correction," IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Dec.

Better Encoding Scheme
Two-way derivation, which reduces the encoding delay:
Step 1: Set P_0' to any binary vector.
Step 2: Compute the lambda values by the matrix operation $H_s s^T$.
Step 3 (forward derivation): derive prediction parity vectors P_i' from the top block row toward the middle.
Step 4 (backward derivation): derive prediction parity vectors P_i' from the bottom block row toward the middle.
Step 5: Compute P_0 by adding the two middle prediction parity vectors.
Step 6: Compute the correction vector $f = (P_0)^{(d)}$, i.e. P_0 cyclically shifted by d.
Step 7: Correct the prediction parity vectors by adding f.
Because Steps 3 and 4 start from both ends of the dual-diagonal at the same time, each derivation chain is only about half as long as in one-way derivation.
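A block-level sketch of Steps 1 to 7, assuming P_0' = 0 (any vector works; zero removes the P_0' terms from the predictions) and the same 802.16e-style parity structure as in the sequential case: a weight-3 column with shift `x` in the top and bottom rows and an identity in row `r`. Here `x` plays the role of d. The parameters and function names are hypothetical; the final check confirms every block row of H is satisfied.

```python
# Two-way (prediction and correction) encoding at block level, with P_0' = 0.
# Assumed structure: weight-3 column P_0 with shift x in rows 0 and m_b-1 and identity
# in row r; dual-diagonal columns P_1..P_{m_b-1} (column j covers rows j-1 and j).


def rot(v, p):
    """P^p v: (P^p v)[k] = v[(k + p) mod z]."""
    p %= len(v)
    return v[p:] + v[:p]


def xor(a, b):
    return [u ^ w for u, w in zip(a, b)]


def two_way_encode(lam, x, r):
    mb, z = len(lam), len(lam[0])
    # Step 3, forward derivation over rows 0..r: fwd[i] predicts P_{i+1}.
    fwd, acc = [], [0] * z
    for i in range(r + 1):
        acc = xor(acc, lam[i])
        fwd.append(acc)
    # Step 4, backward derivation over rows m_b-1..r+1: bwd[t] predicts P_{m_b-1-t}.
    # It does not depend on the forward loop, so hardware can run both in parallel.
    bwd, acc = [], [0] * z
    for i in range(mb - 1, r, -1):
        acc = xor(acc, lam[i])
        bwd.append(acc)
    # Step 5: fwd[-1] and bwd[-1] both predict P_{r+1}; they differ by exactly P_0.
    p0 = xor(fwd[-1], bwd[-1])
    # Step 6: correction vector f = P^x P_0 (P_0 cyclically shifted by x).
    f = rot(p0, x)
    # Step 7: every kept prediction is off by f; fwd[-1] duplicates P_{r+1} and is dropped.
    return [p0] + [xor(v, f) for v in fwd[:-1]] + [xor(v, f) for v in reversed(bwd)]


def satisfies_rows(lam, p, x, r):
    """Check lambda_i + (parity terms of block row i) = 0 for every block row i."""
    mb = len(lam)
    for i in range(mb):
        acc = lam[i]
        if i in (0, mb - 1):
            acc = xor(acc, rot(p[0], x))
        if i == r:
            acc = xor(acc, p[0])
        if i >= 1:
            acc = xor(acc, p[i])
        if i <= mb - 2:
            acc = xor(acc, p[i + 1])
        if any(acc):
            return False
    return True


if __name__ == "__main__":
    lam = [[(i * k + 2 * i + k) % 2 for k in range(6)] for i in range(12)]
    p = two_way_encode(lam, x=3, r=5)
    print(satisfies_rows(lam, p, x=3, r=5))   # True
```

With m_b = 12 and r = 5, `fwd[-1]` is the XOR of lambda_0..lambda_5 and `bwd[-1]` the XOR of lambda_6..lambda_11, which matches the P_5 ^ P_6 rule used in the parallel architecture.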


LDPC Encoder Architecture
Based on the encoding scheme described above, we designed both a parallel and a serial architecture.
- Parallel architecture: achieves a higher level of parallelism and high speed
- Serial architecture

Parallel architecture
Block diagram (figure): matrix, divider, input data register, barrel shifters #1 to #6, lambda position, accumulator, prediction, correction, parity memory.

Parallel architecture (Stage 1)
In this stage, the matrix module selects the shift values and multiplies them by a value that depends on the code length.
Benefits:
1. Encoding can start as soon as input data arrive, without waiting for the whole information block.
2. The number of barrel shifters is reduced.

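Shifter Value Computation
Equation for computing the shift value $p(f,i,j)$ for expansion factor $z_f$ from the base-matrix entry $p(i,j)$ (entries $p(i,j) \le 0$ are unchanged):
- Code rate 2/3 A code: $p(f,i,j) = p(i,j) \bmod z_f$
- Normal code rates: $p(f,i,j) = \left\lfloor p(i,j)\, z_f / z_0 \right\rfloor$, with $z_0 = 96$
Two types of matrix implementation supporting multiple rates and lengths:

| Implementation | Slices | FFs | LUTs | CLK (MHz) | Total gate count |
|---|---|---|---|---|---|
| One matrix + calculation IP | 14,179 | 4,071 | 26,… | … | …,076 |
| Matrices storing the shift values | 41,409 | 12,078 | 76,… | … | …,691 |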
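A sketch of the shift-value computation, assuming the IEEE 802.16e scaling rule with base matrices defined for $z_0 = 96$ and $z_f = n/24$. Computing shifts on the fly corresponds to the "one matrix + calculation IP" row above: one base matrix per rate is stored instead of one matrix per code length. Function names are hypothetical.

```python
# Shift value for expansion factor z_f computed from the stored base-matrix entry,
# following the 802.16e scaling rule (base matrices defined for z_0 = 96).
Z0 = 96


def expansion_factor(n):
    """z_f = n / 24 for the 802.16e code lengths 576, 672, ..., 2304."""
    if n % 96 or not 576 <= n <= 2304:
        raise ValueError(f"unsupported code length {n}")
    return n // 24


def scaled_shift(p, z_f, rate_2_3_a=False):
    """p(f,i,j) from base entry p(i,j); entries <= 0 (-1 zero block, 0 identity) are kept."""
    if p <= 0:
        return p
    if rate_2_3_a:
        return p % z_f            # rate 2/3 A code: modulo rule
    return (p * z_f) // Z0        # other codes: floor(p * z_f / z_0)


if __name__ == "__main__":
    z = expansion_factor(576)                      # 24
    print(scaled_shift(94, z))                     # 23
    print(scaled_shift(94, z, rate_2_3_a=True))    # 22
```

In hardware the floor division by 96 is a constant multiply followed by a shift and correction, which is cheaper than storing a separate shift table for each of the 19 code lengths.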

Parallel architecture (Stage 2)
The divider splits the data coming from the matrix module. The input data register stores the input data, which are then used by the barrel shifters.

Parallel architecture (Stage 3)
The barrel shifters cyclically shift the input data by the shift values. The lambda position module records the row position of each shift value (in the example shown, lambda position = 3, 8 and 11).
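A behavioural model of a barrel shifter built from log2 stages, a common hardware structure; the slides do not describe the shifter internals, so the staging is an assumption. Seven stages cover every shift below 96. The model rotates within a z-bit block only; a hardware rotator that serves every z must also handle wrap-around at z rather than at its full width. The rotation direction matches $P^p$: output bit k is input bit (k + p) mod z.

```python
# Log-depth barrel rotator: stage k rotates by 2**k when bit k of the shift is 1.


def barrel_rotate(v, shift):
    """Cyclically rotate the z-bit block v so that output[k] = v[(k + shift) % z]."""
    z = len(v)
    shift %= z
    out = list(v)
    for k in range(shift.bit_length()):      # at most 7 stages for shift < 96
        if (shift >> k) & 1:
            step = 1 << k
            out = out[step:] + out[:step]    # one mux stage: rotate by 2**k
    return out


if __name__ == "__main__":
    print(barrel_rotate([1, 0, 0, 0], 1))    # [0, 0, 0, 1]
```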

Parallel architecture (Stage 4)
The accumulator computes the lambda values by accumulating the shifted data over K_b clock cycles. According to the lambda positions, in the clock cycle shown, λ_1, λ_2, λ_5, λ_8, λ_9 and λ_11 need to be accumulated.
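A behavioural sketch of Stages 1 to 4, assuming one information column of the base matrix is processed per clock cycle and each nonzero entry of that column (a lambda position) uses one of the six barrel shifters. After K_b iterations the accumulators hold $\lambda_0, \dots, \lambda_{m_b-1}$. The function name and the shifter-count check are hypothetical.

```python
# Column-per-cycle lambda accumulation (behavioural model of Stages 1 to 4).
# base_info[i][j]: shift of block (i, j) in the information part, -1 for a zero block.
# s_blocks[j]: the j-th z-bit information block.


def accumulate_lambdas(base_info, s_blocks, z, num_shifters=6):
    mb, kb = len(base_info), len(base_info[0])
    acc = [[0] * z for _ in range(mb)]
    for j in range(kb):                                  # one column per clock cycle
        positions = [i for i in range(mb) if base_info[i][j] >= 0]  # lambda positions
        if len(positions) > num_shifters:
            raise ValueError(f"column {j} needs {len(positions)} barrel shifters")
        for i in positions:                              # one barrel shifter per position
            h = base_info[i][j] % z
            shifted = s_blocks[j][h:] + s_blocks[j][:h]
            acc[i] = [a ^ b for a, b in zip(acc[i], shifted)]
    return acc                                           # lambda_0 .. lambda_{mb-1}


if __name__ == "__main__":
    print(accumulate_lambdas([[0, -1], [1, 2]], [[1, 0, 0], [0, 1, 1]], 3))
```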

Parallel architecture (Stage 5)
Compute the prediction vectors P_i' from the accumulator outputs by the following equations.

Parallel architecture (Stage 5)

    // Forward derivation: cumulative XOR from the top accumulator
    P_0  <= acc_out0;
    P_1  <= acc_out0 ^ acc_out1;
    P_2  <= acc_out0 ^ acc_out1 ^ acc_out2;
    P_3  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
    P_4  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
    P_5  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
    // Backward derivation: cumulative XOR from the bottom accumulator
    P_6  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
    P_7  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
    P_8  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
    P_9  <= acc_out11 ^ acc_out10 ^ acc_out9;
    P_10 <= acc_out11 ^ acc_out10;
    P_11 <= acc_out11;

To save hardware area, we use one architecture to compute the prediction values for all four code rates:
- Code rate 1/2: P_0 to P_11 are the prediction vectors.
- Code rate 2/3: P_0 to P_3 and P_8 to P_11 are the prediction vectors.
- Code rate 3/4: P_0 to P_2 and P_9 to P_11 are the prediction vectors.
- Code rate 5/6: P_0, P_1, P_10 and P_11 are the prediction vectors.
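A sketch of how one 12-accumulator datapath can serve all four rates, assuming the $m_b$ lambda values of a rate are split evenly: the first half fills acc_out0 upward and the second half ends at acc_out11, with the unused middle accumulators held at zero. This reproduces the index sets listed for each rate. $m_b$ = 12, 8, 6, 4 are the block-row counts of the 802.16e base matrices; the function names are hypothetical.

```python
# One shared prediction datapath for rates 1/2, 2/3, 3/4 and 5/6 (behavioural model).
from functools import reduce

NUM_ACC = 12                                      # sized for rate 1/2 (m_b = 12)
M_B = {"1/2": 12, "2/3": 8, "3/4": 6, "5/6": 4}   # block rows of the 802.16e base matrices


def xor_all(blocks, z):
    """XOR of a list of z-bit blocks (all-zero for an empty list)."""
    return reduce(lambda a, b: [u ^ w for u, w in zip(a, b)], blocks, [0] * z)


def predictions(lam):
    """Return (P, valid): P[0..11] follow the Stage 5 equations; valid lists the
    indices that hold prediction vectors for this m_b."""
    mb, z = len(lam), len(lam[0])
    top, bottom = mb // 2, mb - mb // 2
    acc = [[0] * z for _ in range(NUM_ACC)]       # unused middle accumulators stay zero
    for t in range(top):
        acc[t] = lam[t]                           # acc_out0 ...
    for t in range(bottom):
        acc[NUM_ACC - bottom + t] = lam[top + t]  # ... acc_out11
    P = [xor_all(acc[:i + 1], z) if i < NUM_ACC // 2 else xor_all(acc[i:], z)
         for i in range(NUM_ACC)]
    valid = list(range(top)) + list(range(NUM_ACC - bottom, NUM_ACC))
    return P, valid


def parity_p0(lam):
    """P_0 = XOR of the two middle prediction vectors (P_5 ^ P_6 for rate 1/2)."""
    P, valid = predictions(lam)
    top = len(lam) // 2
    a, b = P[valid[top - 1]], P[valid[top]]
    return [u ^ w for u, w in zip(a, b)]


if __name__ == "__main__":
    for rate, mb in M_B.items():
        print(rate, predictions([[0]] * mb)[1])
```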

Parallel architecture (Stage 6)
Step 1: Compute P_0. For code rate 1/2, P_0 = P_5 ^ P_6.
Step 2: Correct the other P_i using P_i = P_i' ^ f, where f is the correction vector obtained from P_0 (Step 6 of the encoding scheme).

Serial architecture (Stage 1)
Block diagram (figure): matrix, divider, input control, input data register, barrel shifters #1 and #2, accumulator & prediction, correction, parity memory.
This stage works as Stage 1 of the parallel architecture. In the first K_b clock cycles, the encoding order is column by column, from the top toward the middle and from the bottom toward the middle.

Serial architecture (Stage 1, continued)
Reasons:
1. Prepare the input data.
2. Reduce the number of slices.
In the last clock cycle, the encoding order is from left to right, row by row.

Serial architecture (Stage 2)
The divider splits the data coming from the matrix module, and the corresponding input values are chosen for the barrel shifters (clock cycle #2 is shown as an example).

Serial architecture (Stage 3)
The barrel shifters shift the input data according to the shift values chosen by the multiplexer.

Serial architecture (Stage 4)
This module has three tasks:
1. Compute λ_i.
2. Compute P_i'.
3. Compute P_0.
Normally, this module accumulates the shifted data to compute λ_i. When the data is the last value in a row, it also computes P_i'.

Serial architecture (Stage 4, continued)
When all P_i' have been computed, P_0 is computed by XORing P_x' and P_{x+1}', the middle prediction vectors in the matrix.

Serial architecture (Stage 5)
Correct the other P_i using P_i = P_i' ^ f, where f is the correction vector obtained from P_0 (Step 6 of the encoding scheme).


Implementation Results
The proposed encoder, based on the IEEE 802.16e LDPC codes, supports code rates 1/2, 2/3, 3/4 and 5/6 and code lengths from 576 to 2304. The hardware was implemented and verified on Xilinx Virtex-4 and Altera Stratix field-programmable gate array (FPGA) devices.

Implementation Results
Parallel architecture
- Information throughput (IT) ranges from … to … Gbps.
- The encoder area is the same for every code rate and code length.
- For a given code rate, a longer code length gives a higher throughput.
Results table for rates 1/2, 2/3, 3/4 and 5/6 (columns: Z, N, Slices, FFs, LUTs, CLK (MHz), IT (Gbps)); area: 14,179 slices, 4,071 FFs, 26,… LUTs.
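Illustrative arithmetic for why IT grows with code length, assuming the parallel encoder spends about K_b cycles per codeword (Stage 4) plus a fixed overhead that does not depend on z. Since k = K_b z, IT = K_b z f_clk / (K_b + overhead) grows linearly with z at a fixed clock and fixed area. `clock_hz` and `overhead_cycles` are hypothetical parameters, not measured results.

```python
# Information throughput of a parallel encoder whose cycle count does not depend on z.
# The K_b values are the information block columns of the 802.16e base matrices (n_b = 24).
K_B = {"1/2": 12, "2/3": 16, "3/4": 18, "5/6": 20}


def info_throughput(rate, z, clock_hz, overhead_cycles):
    """Information bits per second for expansion factor z (code length n = 24 z)."""
    k = K_B[rate] * z                       # information bits per codeword
    cycles = K_B[rate] + overhead_cycles    # assumed independent of z
    return k * clock_hz / cycles
```

Under this assumption, going from n = 576 (z = 24) to n = 2304 (z = 96) multiplies the throughput by four at the same clock, while a design whose cycle count scales with z would see no such gain.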

Implementation Results
Serial architecture
- Information throughput ranges from … to … Gbps.
- For a given code rate, a longer code length gives a higher throughput.

Implementation Results
Area comparison (figure): parallel architecture using row-by-row order.

Implementation Results
Information throughput (IT) comparison and IT/area comparison (figures).

Compare to Related Work
We compare our implementation with [3].
Table 4.5a: synthesis results of [22] at code rate 1/2 (code length, area (LE), clock (MHz), IT (Gbps), IT/total area (Mb per LE)), compared with the proposed encoder at code rate 1/2.
- Better throughput for longer code lengths
- Less area to implement multiple code lengths and code rates
- A shorter clock cycle than [3]
[3] S. Kopparthi and D. M. Gruenbacher, "Implementation of a flexible encoder for structured low-density parity-check codes," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PacRim 2007), Aug. 2007.

Compare to Related Work
Throughput comparison:
- The proposed encoder outperforms the work in [3] in throughput when the code length is longer than 1200.
- The proposed encoder provides better throughput for longer code lengths, while the work in [3] shows no such speed-up.

Compare to Related Work
Throughput/area ratio comparison (figure):
- The proposed encoder outperforms the work in [3] in throughput/area ratio by … to … times.
- The proposed encoder utilizes hardware resources more efficiently.

Compare to Related Work
We compare our implementation with [2].

Compare to Related Work
Throughput comparison:
- The throughput of our proposed encoder is higher than that of [2] for all code rates and code lengths.
- The proposed encoder outperforms the work in [2] in throughput by … to … times.

Compare to Related Work
Throughput/area comparison:
- The proposed encoder outperforms the work in [2] in throughput/area ratio by … to … times.
- The results show that our proposed encoder utilizes hardware resources efficiently.

Compare to Related Work (Serial)
We compare our implementation with [4].

| | Slices | FFs | LUTs | Block RAMs | CLK | IT |
|---|---|---|---|---|---|---|
| [4] | 4,724 | 1,807 | 8,… | … | … | … |
| Proposed | 12,567 | 3,885 | 22,… | … | … | … |

- Our proposed encoder achieves a higher IT at a lower clock frequency.
- The matrix information is built into our encoder, so no additional block RAMs are needed.
- The IT/area of our serial encoder is … Mbps per slice, versus … for [4].
[4] Jeong Ki Kim, Hyunseuk Yoo, and Moon Ho Lee, "Efficient Encoding Architecture for IEEE 802.16e LDPC Codes," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2008.


Conclusion
An efficient encoding architecture for IEEE 802.16e LDPC codes with multiple code lengths and code rates has been implemented. In our design, switching between code rates or code lengths only requires changing the type of the information data. The architecture is also suitable for the IEEE 802.11n standard. Our encoder achieves higher throughput and a better throughput/area ratio than the conventional encoding scheme when the code length is longer than 1200.

Thank you!!