Directorate of Technical and Quality Management Electrical System Department - TEC-E SCCC, LDPC and 4D-8PSK TCM Comparison of complexities Sergio Benedetto,

Presentation transcript:

Directorate of Technical and Quality Management
Electrical System Department - TEC-E
SCCC, LDPC and 4D-8PSK TCM: Comparison of complexities
Sergio Benedetto, Guido Montorsi
Politecnico di Torino
CCSDS meeting: Channel Coding Working Groups
Montreal, May 12th 2004

Università di Pisa Politecnico di Torino 5/05/2004 Guido Montorsi - Politecnico di Torino
Summary
LDPC vs turbo decoders:
  – Common high-speed architecture for iterative decoders
  – Memory collision problems and solutions
  – Comparison of complexity of LDPC and turbo decoders
4D-8PSK TCM + RS:
  – Comparison of complexity

Introduction
We want to summarize, under a single framework, results that are scattered across the literature.
The common framework allows us to compare the complexity and throughput of the two classes of iterative decoders: turbo decoders and LDPC decoders.
We will see that the two kinds of iterative decoder share the same problems and solutions.
Memory and complexity can be managed with the same approaches.

Pipelined architectures for turbo decoders (PCCC and SCCC)
High-speed architectures for turbo decoders
  – The first high-speed architecture is the pipelined architecture (original encoder). It does not require terminating the encoders, permits the use of convolutional interleavers, and yields the best performance:
  – Best performance
  – Latency proportional to the number of iterations
  – Memory proportional to the number of iterations
  – Arithmetic complexity proportional to the number of iterations
  – The maximum throughput is the maximum speed of the SISO processors
  – Not good for packet transmission
[Figure: chain of blocks SW SISO, I, SW SISO, D, repeated for the 1st and 2nd iterations]

Block decoder structure
When the code has a block structure, as LDPC codes do, decoding can be performed block by block.
LDPC codes are block codes by construction.
To give a block structure to PCCC and SCCC it is necessary to terminate the trellises and use block interleavers. This:
  – leads to a performance degradation
  – yields high-speed decoding architectures
  – permits the use of PCCC and SCCC for packet applications

Block decoder structure (LDPC, PCCC, SCCC)
[Figure: processors A and B connected through an internal memory (EXT) of size N_i, with an LLR memory on either side; labels: Processors, K, N, Throughput]
Complexity: C = C_a + C_b
Memory: M = N + N_i

Block decoder data dependency
The two processors must operate sequentially on the internal data; running them in parallel leads to a loss in performance.
[Figure: dependency graph between processors A and B through the LLR and EXT memories, steps 1 and 2]

SCCC
[Figure: inner SISO and outer SISO processors connected through an internal shared memory of size N = K/r_o, with LLR memories; labels: Processors, N, K]

LDPC
[Figure: variable node processor (VNP) and check node processor (CNP) connected through an internal memory whose size is the edge density times N, with an LLR memory; labels: N, K]
Edge density: average variable degree

High-speed architectures
Trivial solution: functional replication
It is often assumed in the complexity evaluation of turbo decoders
  – The memory dominates the overall complexity when the block size is large
It was never considered for the implementation of LDPC decoders
[Figure: L decoders in parallel, each with throughput R]
Throughput: LR
Complexity: LC
Memory: LM
Memory and latency increase linearly with parallelism and throughput!

General parallel decoder architecture
[Figure: processors A_1 ... A_{L_1} and B_1 ... B_{L_2} sharing the LLR and EXT memories]
We assume the same number of processors on both sides
Throughput: LR
Complexity: LC
Memory: M
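
The scaling laws quoted for functional replication and for the general parallel architecture can be put side by side in a few lines. This is only a sketch: it assumes a reference decoder with throughput R, complexity C and memory M, assumes that L replicated decoders need L times the memory, and uses hypothetical numeric values.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    throughput: float
    complexity: float
    memory: float


def replicated(L, R, C, M):
    """L independent decoders: throughput, logic and memory all scale with L."""
    return Cost(L * R, L * C, L * M)


def shared_memory(L, R, C, M):
    """L processors per half-iteration working on one block: memory stays M."""
    return Cost(L * R, L * C, M)


# Hypothetical reference decoder: 100 Mbit/s, unit complexity, 16384 memory cells.
rep = replicated(8, 100e6, 1.0, 16384)
par = shared_memory(8, 100e6, 1.0, 16384)
print(rep.throughput == par.throughput, rep.memory / par.memory)
```

Under these assumptions both options reach the same throughput with the same logic, and the whole difference lies in the memory term.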

LDPC
[Figure: check node processors CNP_1 ... CNP_L and variable node processors VNP_1 ... VNP_L sharing the EXT and LLR memories]

SCCC
[Figure: outer SISO processors OS_1 ... OS_N and inner SISO processors IS_1 ... IS_N sharing the EXT and LLR memories]

Problem for turbo decoders
The second architecture can be realized only if the parallel processors have no data dependency
  – For LDPC this holds for the processors on check nodes and variable nodes, so the parallelism can be increased up to the number of check nodes and variable nodes
  – For turbo decoders this is not possible in principle
[Figure: dependency graph of turbo decoders, with inner SISOs IS_1, IS_2, ..., IS_N and outer SISOs OS_1, OS_2, ..., OS_N between the LLR and EXT memories, linked by the initialization of the forward-backward recursions]

Breaking forward and backward initialization
The weak dependency of one window on the adjacent windows can be broken without affecting the performance, provided that the window is not too small.
[Figure: dependency graphs of inner SISOs IS_1, IS_2, ..., IS_N and outer SISOs OS_1, OS_2, ..., OS_N between the LLR memories]

Simulation results: SCCC decoder with delayed initialization
[Figure: simulation results for an SCCC decoder with 4-state constituent encoders and variable window size]

Memory access problem
[Figure: processors A_1 ... A_N and B_1 ... B_N connected through an internal shared memory, with an LLR memory on either side]
The two accesses to the internal memory follow different orders because of the permutation!

The memory access problem
When a set of processors must access data in a memory, one wants to avoid collisions, i.e., two different processors accessing the same memory bank at the same time.
Collisions can be resolved:
I. With very high-speed memory and/or additional hardware
II. By properly designing the interleaver/code to avoid them
III. With permutation decomposition
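
The definition of a collision can be made concrete with a small counter. The sketch below is illustrative only: it assumes that each of the L processors handles a contiguous window of N/L positions, one position per clock cycle, and it uses a hypothetical bank mapping (bank = window index of the address); `count_collisions` and `bank_of` are names introduced here.

```python
import random


def count_collisions(order, L, bank_of):
    """Count the clock cycles in which at least two of the L processors
    address the same memory bank.

    order[j] is the memory address read at position j; processor p handles
    positions p*W .. p*W + W - 1 (W = N/L), one position per clock cycle.
    """
    N = len(order)
    W = N // L
    collisions = 0
    for t in range(W):
        banks = [bank_of(order[p * W + t]) for p in range(L)]
        if len(set(banks)) < L:
            collisions += 1
    return collisions


N, L = 64, 4
W = N // L


def bank_of(addr):
    # Hypothetical mapping: each bank stores one window of W addresses.
    return addr // W


natural = list(range(N))
permuted = natural[:]
random.Random(0).shuffle(permuted)   # stand-in for an interleaver

print(count_collisions(natural, L, bank_of))    # natural order: every processor stays in its own bank
print(count_collisions(permuted, L, bank_of))   # permuted order: collisions are likely
```

With this mapping the natural-order pass never collides, while a generic permutation almost always does, which is the situation the three solutions address.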

The memory access problem
High-speed memory: when collisions occur, additional hardware can resolve them by serializing the memory accesses
  – It requires memory faster than the processors, or it implies a loss of throughput
  – When the parallelism is very high the solution becomes unfeasible
Proper design of the code/interleaver: by designing the interleaver/code for a specific hardware architecture, i.e. a specific parallelism, it is possible to avoid collisions
  – The code/interleaver must be matched to the architecture
  – Changing the parallelism may require a change of the code/interleaver
  – The constraints imposed on the code may lead to performance losses (e.g., smaller interleaver spread)
  – It does not apply to most of the existing code standards

Permutation decomposition
It has been shown that, for any desired parallelism L, it is possible to find a collision-free decomposition of the permutation as follows
[Figure: the permutation decomposed into M permutations on N/L elements and a sequence of N/L permutations on M elements, between two groups of L lines]

Permutation decomposition
Finding the appropriate decompositions is similar to the problem of "latin squares".
The decomposition makes it possible to realize any permutation in a collision-free way with a set of L memory banks and two L-way multiplexers.
[Figure: L memory banks placed between two L-way multiplexers]
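
One way to see why a collision-free assignment always exists is to treat each address as an edge of a bipartite multigraph (clock cycle in the natural-order pass, clock cycle in the permuted pass). Every node then has degree L, and by König's edge-colouring theorem such a graph can be edge-coloured with L colours; the colour of an edge is the bank of that address. The sketch below follows this idea and is not necessarily the construction used by the authors; the window-based schedule and the names `collision_free_banks` and `has_collision` are assumptions made for illustration.

```python
import random


def collision_free_banks(perm, L):
    """Assign each of the N addresses to one of L banks so that neither pass
    collides (L must divide N).

    Pass A: processor p reads address p*W + t at cycle t.
    Pass B: processor p reads address perm[p*W + t] at cycle t.
    """
    N = len(perm)
    W = N // L
    t_b = [0] * N
    for j, addr in enumerate(perm):
        t_b[addr] = j % W                      # cycle at which pass B reads addr
    edges = [(addr % W, t_b[addr]) for addr in range(N)]
    # at[side][node][colour] = edge currently using that colour at that node
    at = [[[-1] * L for _ in range(W)] for _ in range(2)]
    bank = [-1] * N

    for e, (u, v) in enumerate(edges):
        a = at[0][u].index(-1)                 # a colour still free at u
        b = at[1][v].index(-1)                 # a colour still free at v
        if a != b:
            # Swap colours a and b along the alternating path starting at v,
            # so that colour a becomes free at v as well.
            path, side, node, colour = [], 1, v, a
            while at[side][node][colour] != -1:
                f = at[side][node][colour]
                path.append(f)
                node = edges[f][1 - side]      # move to the other endpoint
                side = 1 - side
                colour = a + b - colour        # alternate between a and b
            for f in path:                     # detach the path edges ...
                fu, fv = edges[f]
                at[0][fu][bank[f]] = -1
                at[1][fv][bank[f]] = -1
            for f in path:                     # ... and re-attach them recoloured
                fu, fv = edges[f]
                bank[f] = a + b - bank[f]
                at[0][fu][bank[f]] = f
                at[1][fv][bank[f]] = f
        bank[e] = a
        at[0][u][a] = e
        at[1][v][a] = e
    return bank


def has_collision(order, L, bank):
    """True if two processors hit the same bank in some clock cycle."""
    W = len(order) // L
    return any(len({bank[order[p * W + t]] for p in range(L)}) < L
               for t in range(W))


N, L = 32, 4
perm = list(range(N))
random.Random(1).shuffle(perm)                 # stand-in for an arbitrary interleaver
bank = collision_free_banks(perm, L)
print(has_collision(list(range(N)), L, bank), has_collision(perm, L, bank))
```

The assignment depends on the permutation but places no constraint on it, which is the point made on the slide. The routing of each processor to its bank in each cycle is what the two L-way multiplexers would carry out.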

Thus LDPC and turbo decoders admit very similar parallel architectures.
A collision-free memory access scheme can be realized without paying any particular attention to the structure of the parity-check matrix or of the interleaver.
This permits the high-speed parallel implementation of existing code standards and/or carefully optimized codes/interleavers
  – This approach, however, requires two L-way multiplexers, where L is the parallelism
Special code constructions or interleaver structures may further simplify the routing problem.

Complexities of processors
Having established the analogy between the architectures of the LDPC and turbo iterative decoders, we now show how to compare the complexity of the constituent processors.

Complexity of SISO processors
The complexity of a SISO processor can be easily evaluated in terms of sums and max* operators.
[Figure: max* operator built around a subtraction and a look-up table (LUT)]
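
For reference, the max* operator is max*(a, b) = log(e^a + e^b) = max(a, b) + log(1 + e^(-|a - b|)), where the correction term is what a look-up table would store. The sketch below shows the exact form and a table-based approximation; the table size and step are hypothetical choices, not values from the slides.

```python
import math


def max_star(a, b):
    """Exact max*: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# Hypothetical 8-entry LUT for log(1 + exp(-d)), sampled with step 0.5.
STEP = 0.5
LUT = [math.log1p(math.exp(-i * STEP)) for i in range(8)]


def max_star_lut(a, b):
    """max* with the correction term read from the LUT."""
    d = abs(a - b)                              # one subtraction
    i = int(d / STEP)                           # LUT address
    corr = LUT[i] if i < len(LUT) else 0.0      # correction is negligible beyond the table
    return max(a, b) + corr                     # one sum


print(max_star(1.0, 2.0), max_star_lut(1.0, 2.0))
```

The table version needs one subtraction, one comparison, one table access and one sum per operation.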

LDPC decoders
The complexity of LDPC decoders can be evaluated using sums and "g" operators.
[Figure: g operator built around subtractions and look-up tables (LUT)]

Unit g implementing the SoftXOR
The quantities are represented in sign-and-modulus notation.
The two LUT2 are equivalent to those used for the max* operator in the SISO algorithm.
The operator thus requires 4 sums, 2 accesses to the LUT2 and other minor logic (MUX and XOR).
The critical path requires three sums.
The unit g has roughly 2 times the complexity of the max* operator and three times its latency.
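
The SoftXOR of two LLRs a and b has the well-known closed form sign(a)·sign(b)·[min(|a|, |b|) + f(|a| + |b|) - f(||a| - |b||)] with f(x) = log(1 + e^(-x)), which equals 2·atanh(tanh(a/2)·tanh(b/2)). The sketch below writes it in sign-and-modulus form so that the four sums and the two table look-ups can be counted; here f is computed exactly rather than read from a table, and the function names are introduced for illustration.

```python
import math


def f(x):
    """Correction term log(1 + exp(-x)) for x >= 0 (what a LUT would store)."""
    return math.log1p(math.exp(-x))


def soft_xor(a, b):
    """LLR of the XOR of two bits with LLRs a and b, in sign-modulus form."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0   # XOR of the two sign bits
    ma, mb = abs(a), abs(b)
    s = ma + mb                                  # sum 1
    d = abs(ma - mb)                             # sum 2
    mag = min(ma, mb) + f(s) - f(d)              # sums 3 and 4, two look-ups
    return sign * mag


def soft_xor_tanh(a, b):
    """Reference form: 2 * atanh(tanh(a/2) * tanh(b/2))."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


print(soft_xor(1.5, -0.7), soft_xor_tanh(1.5, -0.7))
```

The two functions agree up to rounding, and the first one uses only additions, a minimum, a sign XOR and two evaluations of f.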

Serial architecture for CNP
[Figure: serial check node processor, example with 4 edges]
Memory: ~3n memory cells
Logic: 3 units B (~6 max* operators)
Latency: 2(n-1)
Speed: 1 edge/clock cycle
n is the number of edges
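
A check node must send on each edge the g-combination of all the other incoming messages. A common way to do this with one pass per direction is a forward-backward scheme, sketched below; it is one plausible reading of the serial architecture, not a description of the authors' hardware. It keeps about 3n values (inputs, forward terms, backward terms) and calls g 3(n-2) times for n edges.

```python
import math


def g(a, b):
    """SoftXOR of two LLRs (exact correction terms, no LUT)."""
    def f(x):
        return math.log1p(math.exp(-x))
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    ma, mb = abs(a), abs(b)
    return sign * (min(ma, mb) + f(ma + mb) - f(abs(ma - mb)))


def check_node_update(msgs):
    """Extrinsic output on each of the n edges of a check node (n >= 2).

    out[i] combines every input except msgs[i].
    """
    n = len(msgs)
    fwd = [None] * n          # fwd[i] = g-combination of msgs[0 .. i-1]
    bwd = [None] * n          # bwd[i] = g-combination of msgs[i+1 .. n-1]
    for i in range(1, n):
        fwd[i] = msgs[i - 1] if i == 1 else g(fwd[i - 1], msgs[i - 1])
    for i in range(n - 2, -1, -1):
        bwd[i] = msgs[i + 1] if i == n - 2 else g(bwd[i + 1], msgs[i + 1])
    out = []
    for i in range(n):
        if i == 0:
            out.append(bwd[0])
        elif i == n - 1:
            out.append(fwd[n - 1])
        else:
            out.append(g(fwd[i], bwd[i]))
    return out


print(check_node_update([1.2, -0.8, 2.5, 0.4]))
```

The forward recursion, the backward recursion and the output combination each use n-2 applications of g.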

Parallel architecture of CNP
[Figure: parallel check node processor, example with 5 edges a, b, c, d, e]
Memory: 2(n-1)(n-3)+n memory cells
Logic: 3(n-2) units B
Latency: (n-3) clock cycles
Throughput: 1 check node/clock cycle
n is the number of edges

Number of operations for variable node
A variable node simply sums the messages on all its incoming edges and then, for each edge, subtracts the corresponding input.
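
The variable node rule amounts to one total and one subtraction per edge. In the minimal sketch below the channel LLR is included in the total, as in standard belief propagation; this is an assumption, since the slide only mentions the incoming edges. The function name is introduced for illustration.

```python
def variable_node_update(channel_llr, msgs):
    """Return (total, outputs) for a variable node.

    total is the channel LLR plus all incoming check-to-variable messages;
    the output on edge i is total minus the input received on edge i.
    """
    total = channel_llr + sum(msgs)
    return total, [total - m for m in msgs]


total, out = variable_node_update(0.5, [1.0, -2.0, 0.25])
print(total, out)
```

For a node with d edges this takes d sums to form the total and d subtractions for the outputs.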

Summary of processor complexities
[Table: number of sum and max* operations for the SISO units: Forw. (k bits), Back. (k bits), Out. O (k bits), Out. I (k bits), BMC O (k bits), BMC I (k bits)]
[Table: number of sum and max* operations for the LDPC units: CNP (per CN), VNP (per VN)]

Summary of decoder complexities (per iteration)
[Table: number of sum and max*/max operations for SCCC and LDPC]

Complexity comparison between MHOMS SCCC 8PSK and 4D-8PSK-TCM+RS
To compare the encoding and decoding complexity of 4D-8PSK-TCM+RS with that of MHOMS SCCC 8PSK, we have prepared the tables shown in the following slides.
The complexity has been measured as the number of elementary operations performed by the encoder and the decoder.
Although no attempt has been made to refer to particular HW tools to implement the two schemes, we note that the MHOMS SCCC decoder working at 1 Gbps can be implemented using 4 FPGAs of the XILINX Virtex 100 series.
The complexity of the RS co-decoder has not been evaluated.

Comparison
We have performed a complexity comparison for the following codes
  – MHOMS
  – DVB-S2 new standard with rate 5/6
  – LDPC code proposed by Goddard and JPL for the CCSDS standard
  – 4D-8PSK TCM
Encoding and decoding complexity have been evaluated.

LDPC codes

Encoder complexity comparison
Number of operations per information bit and memory requirements
Assuming back-substitution is possible

Decoder complexity comparison
Number of operations per information bit (10 iterations) and memory requirements

Decoder complexity comparison
Number of operations per information bit (6 iterations, enough if a stopping criterion is used together with a slight memory expansion) and memory requirements

Conclusions
We have proposed a way to compare the complexities of LDPC and turbo decoders.
The complexity comparison shows that, in general, the LDPC solutions are more complex by a factor of 3-4.
LDPC solutions offer a superior performance.
  – An agreed weighting function for complexity and performance must therefore be defined in order to take the final decision