Directorate of Technical and Quality Management, Electrical System Department - TEC-E
SCCC, LDPC and 4D-8PSK TCM: Comparison of complexities
Sergio Benedetto, Guido Montorsi, Politecnico di Torino
CCSDS meeting: Channel Coding Working Groups, Montreal, May 12th 2004

Università di Pisa, Politecnico di Torino, 5/05/2004, Guido Montorsi - Politecnico di Torino

Summary
LDPC vs turbo decoders:
– Common high speed architecture for iterative decoders
– Memory collision problems and solutions
– Comparison of complexity of LDPC and turbo decoders
4D-8PSK TCM + RS:
– Comparison of complexity

Introduction
We want to summarize, under a single framework, the results that have been obtained and are scattered across the literature.
The common framework allows a comparison of complexity and throughput between the two classes of iterative decoders: turbo decoders and LDPC decoders.
We will see that the two iterative decoders share the same problems and solutions.
Memory and complexity can be managed with the same approaches.

Pipelined architectures for turbo decoders (PCCC and SCCC)
High speed architectures for turbo decoders
– The first high speed architecture is the pipelined architecture (original encoder). It does not require terminating the encoders, permits the use of convolutional interleavers, and yields the best performance:
– Best performance
– Latency proportional to the number of iterations
– Memory proportional to the number of iterations
– Arithmetic complexity proportional to the number of iterations
– The maximum throughput is the maximum speed of the SISO processors
– Not good for packet transmission
[Figure: chain of blocks labelled SW SISO I and SW SISO D, shown for the 1st and 2nd iteration]

Block decoder structure
When the code possesses a block structure, as LDPC codes do, decoding can be performed block by block.
LDPC codes are block codes by construction.
To give a block structure to PCCC and SCCC it is necessary to terminate the trellises and use block interleavers:
– This leads to a performance degradation
– It yields high-speed decoding architectures
– It permits the use of PCCC and SCCC for packet applications

Block decoder structure (LDPC, PCCC, SCCC)
[Figure: two processors, A and B, placed between LLR memories and exchanging data through an internal memory (EXT) of size N_i; further labels: Processors, K, N, Throughput]
Complexity: C = C_a + C_b
Memory: M = N + N_i

Block decoders data dependency
The two processors must operate sequentially on the internal data; running them in parallel leads to a loss in performance.
[Figure: dependency graph linking LLR, processor A, EXT, and processor B, with steps numbered 1 and 2]

SCCC
[Figure: inner SISO and outer SISO processors with LLR memories and an internal shared memory of size N = K/r_o; further labels: Processors, K, N]

LDPC
[Figure: variable node processor (VNP) and check node processor (CNP) with LLR memory and an internal memory whose size is the edge density times N; further labels: N, K]
The edge density is the average variable degree.

High speed architectures
Trivial solution: functional replication (L decoders, each with throughput R)
It is often assumed in the complexity evaluation of turbo decoders.
– The memory dominates the overall complexity when the block size is large
It was never considered for the implementation of LDPC decoders.
Throughput: LR
Complexity: LC
Memory: LM
Memory and latency increase linearly with parallelism and throughput!

General parallel decoder architecture
[Figure: processors A 1 ... A L_1 and B 1 ... B L_2 sharing the LLR and EXT memories]
We assume the same number of processors of each type.
Throughput: LR
Complexity: LC
Memory: M
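The scaling figures on the two architecture slides can be put side by side in a short sketch. The class and function names below are hypothetical, R, C and M are the throughput, complexity and memory of a single decoder, and the replicated memory is taken as L times M, in line with the remark that memory grows linearly with parallelism.

```python
from dataclasses import dataclass


@dataclass
class DecoderCost:
    throughput: float  # R, e.g. bits per second
    complexity: float  # C, arithmetic units (arbitrary scale)
    memory: float      # M, memory cells


def functional_replication(base: DecoderCost, L: int) -> DecoderCost:
    """L independent decoders: everything, including memory, scales by L."""
    return DecoderCost(L * base.throughput, L * base.complexity, L * base.memory)


def parallel_processors(base: DecoderCost, L: int) -> DecoderCost:
    """L processors working on one block: the memory M is shared, not replicated."""
    return DecoderCost(L * base.throughput, L * base.complexity, base.memory)


if __name__ == "__main__":
    single = DecoderCost(throughput=1.0, complexity=1.0, memory=1.0)
    for L in (1, 4, 16):
        print(L, functional_replication(single, L), parallel_processors(single, L))
```

Both options deliver the same throughput and arithmetic cost, so the comparison reduces to the memory term, which is the point the slides make when the block size is large.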

LDPC
[Figure: check node processors CNP 1 ... CNP L and variable node processors VNP 1 ... VNP L connected through the EXT memory, with the LLR memory]

SCCC
[Figure: outer SISO processors OS 1 ... OS N and inner SISO processors IS 1 ... IS N connected through the EXT memory, with the LLR memory]

Problem for turbo decoders
The second architecture can be realized only if the parallel processors show no data dependency.
– For LDPC this holds for the processors on check nodes and on variable nodes, so the parallelism can be increased up to the number of check nodes and variable nodes
– For turbo decoders this is not possible in principle
[Figure: dependency graph of turbo decoders over LLR, IS 1, IS 2, ..., IS N, OS 1, OS 2, ..., OS N and EXT; adjacent windows are linked by the initialization of the forward-backward recursions]

Breaking forward and backward initialization
The weak dependency of one window on the adjacent windows can be broken without affecting the performance, provided that the window is not too small.
[Figure: dependency graphs over LLR, IS 1, IS 2, ..., IS N and OS 1, OS 2, ..., OS N illustrating the broken initialization links]

Simulation results: SCCC decoder with delayed initialization
[Figure: simulation results for an SCCC decoder with 4-state constituent encoders and variable window size]

Memory access problem
[Figure: processors A 1 ... A N_1 and B 1 ... B N_2 between LLR memories, sharing an internal memory]
The two accesses to the internal memory follow different orders because of the permutation!

The memory access problem
When a set of processors must access data in a memory, one wants to avoid collisions, i.e., two different processors accessing the same memory bank at the same time.
Collisions can be resolved:
I. With very high-speed memory and/or additional hardware
II. By properly designing the interleaver/code to avoid them
III. With permutation decomposition

The memory access problem
High speed memory: when collisions occur, some additional hardware can solve the problem by serializing the memory accesses.
– It requires memory faster than the processors, or it implies a loss of throughput
– When the parallelism is very high the solution becomes unfeasible
Proper design of the code/interleaver: by designing the interleaver/code for a specific hardware architecture, i.e. a specific parallelism, it is possible to avoid collisions.
– The code/interleaver must be matched to the architecture
– Changing the parallelism may require a change of the code/interleaver
– The constraints imposed on the code may lead to performance losses (e.g., smaller interleaver spread)
– It does not apply to most of the existing code standards
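To make the cost of serialization concrete, the following sketch counts the extra clock cycles a set of L processors loses when colliding accesses are served one after another. It is an illustration only: the function name, the layout (processor p reads entry p*W + t of the permutation at cycle t, with W = N/L) and the bank mapping passed as `bank_of` are assumptions, not details taken from the slides.

```python
def extra_cycles(perm, L, bank_of):
    """Extra clock cycles needed if colliding accesses are serialized.

    perm    -- permutation of range(N); perm[j] is the address read at position j
    L       -- number of parallel processors (and of memory banks)
    bank_of -- function mapping an address to a bank index in range(L)
    """
    N = len(perm)
    assert N % L == 0, "block length must be a multiple of the parallelism"
    W = N // L                      # accesses per processor
    extra = 0
    for t in range(W):              # one clock cycle of the ideal schedule
        hits = [0] * L
        for p in range(L):          # processor p reads address perm[p*W + t]
            hits[bank_of(perm[p * W + t])] += 1
        extra += max(hits) - 1      # the busiest bank dictates the stall
    return extra


if __name__ == "__main__":
    W = 4
    naive = lambda a: a // W        # contiguous addresses share a bank
    print(extra_cycles(list(range(8)), 2, naive))          # identity: no stalls
    print(extra_cycles([0, 1, 4, 5, 2, 3, 6, 7], 2, naive))  # every cycle collides
```

With the contiguous mapping the identity permutation runs at full speed, while the second permutation doubles the number of cycles, which is the throughput loss the slide refers to.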

Permutation decomposition
It has been shown that, for any desired parallelism L, it is possible to find a collision-free decomposition of the permutation.
[Figure: the permutation decomposed into M permutations on N/L elements and a sequence of N/L permutations on M elements, with L-wide inputs and outputs]

Permutation decomposition
Finding the appropriate decomposition is similar to the problem of "Latin squares".
The decomposition makes it possible to realize any permutation in a collision-free way with a set of L memory banks and two L-way multiplexers.
[Figure: L memory banks placed between two L-way multiplexers]
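The slides do not say how the decomposition is computed. One standard way to obtain a collision-free bank assignment, used here as a stand-in, is to edge-colour a bipartite multigraph: each address is an edge joining the clock cycle in which it is accessed in natural order to the cycle in which it is accessed in permuted order, and the L colours are the L banks. The sketch below assumes processor p handles positions p*W ... p*W + W - 1 with W = N/L; all names are hypothetical, and it covers only the bank assignment, not the multiplexer control.

```python
import random


def _perfect_matching(adj, W):
    """Kuhn's augmenting-path matching; adj[u] lists (v, edge_id) pairs."""
    match_right = [None] * W                 # right node -> (left node, edge id)

    def augment(u, seen):
        for v, e in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] is None or augment(match_right[v][0], seen):
                match_right[v] = (u, e)
                return True
        return False

    for u in range(W):
        if not augment(u, set()):
            raise ValueError("no perfect matching found")
    return [e for (_, e) in match_right]


def collision_free_banks(perm, L):
    """Return bank[a] in range(L) for every address a of the permutation."""
    N = len(perm)
    assert N % L == 0
    W = N // L
    inv = [0] * N
    for j, a in enumerate(perm):             # address a is read at position j
        inv[a] = j
    # Edge a joins its natural-order cycle to its permuted-order cycle.
    edges = [(a % W, inv[a] % W) for a in range(N)]
    bank = [None] * N
    for colour in range(L):                  # peel off one matching per bank
        adj = [[] for _ in range(W)]
        for a, (u, v) in enumerate(edges):
            if bank[a] is None:
                adj[u].append((v, a))
        for a in _perfect_matching(adj, W):
            bank[a] = colour
    return bank


def is_collision_free(perm, L, bank):
    """Check both access orders: L distinct banks in every clock cycle."""
    W = len(perm) // L
    for t in range(W):
        natural = {bank[p * W + t] for p in range(L)}
        permuted = {bank[perm[p * W + t]] for p in range(L)}
        if len(natural) < L or len(permuted) < L:
            return False
    return True


if __name__ == "__main__":
    random.seed(1)
    perm = list(range(24))
    random.shuffle(perm)
    banks = collision_free_banks(perm, 4)
    print(banks, is_collision_free(perm, 4, banks))
```

Every cycle contributes exactly L edges to each side of the graph, so the graph is L-regular and a perfect matching always exists; this is why the procedure works for any permutation and any parallelism L that divides N.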

Thus LDPC and turbo decoders admit very similar parallel architectures.
A collision-free memory access scheme can be realized without any particular attention to the structure of the parity-check matrix or the interleaver.
This permits the high-speed parallel implementation of existing code standards and/or carefully optimized codes/interleavers.
– This approach, however, requires two L-way multiplexers, where L is the parallelism
Special code constructions or interleaver structures may further simplify the routing problem.

Complexities of processors
Now that we have established the analogy between the architectures of the LDPC and turbo iterative decoders, we show how to compare the complexity of the constituent processors.

Complexity of SISO processors
The complexity of a SISO processor can be easily evaluated in terms of sums and max* operators.
[Figure: max* operator built from a subtraction and a LUT]
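For reference, the max* operator is commonly defined as the Jacobian logarithm, max*(a, b) = log(e^a + e^b) = max(a, b) + log(1 + e^(-|a - b|)), where the correction term is what a small look-up table stores. The sketch below shows the exact form and a table-based approximation; the table step and depth are arbitrary assumptions, not values from the slides.

```python
import math

STEP = 0.5                                   # assumed quantization of |a - b|
TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(8)]  # assumed depth


def max_star(a, b):
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_lut(a, b):
    """Same operator with the correction term read from a small table."""
    index = min(int(abs(a - b) / STEP), len(TABLE) - 1)
    return max(a, b) + TABLE[index]


if __name__ == "__main__":
    for a, b in [(1.0, 2.0), (-0.3, 0.4), (5.0, -5.0)]:
        print(a, b, max_star(a, b), max_star_lut(a, b))
```

In hardware terms this is one comparison/subtraction, one table access and one sum, which is why complexity can be counted in sums and max* operators.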

LDPC decoders
The complexity of LDPC decoders can be evaluated using sums and "g" operators.
[Figure: g operator built from sums and LUTs]

Unit g implementing the SoftXOR
The quantities are represented in sign-and-modulus notation.
The two LUT2 are equivalent to those used for the max* operator in the SISO algorithm.
The operator thus requires 4 sums, 2 accesses to the LUT2, and other minor logic (MUX and XOR).
The critical path requires three sums.
The unit g has roughly 2 times the complexity of the max* operator and three times its latency.
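A common closed form for the soft XOR of two log-likelihood ratios is g(a, b) = sign(a) sign(b) min(|a|, |b|) + f(a + b) - f(a - b), with f(x) = log(1 + e^(-|x|)) the same correction term used by max*. The sketch below assumes this form (the slides show only the block diagram) and checks it against the equivalent tanh expression; the function names are hypothetical.

```python
import math


def correction(x):
    """Correction term f(x) = log(1 + exp(-|x|)); a LUT in hardware."""
    return math.log1p(math.exp(-abs(x)))


def soft_xor(a, b):
    """Soft XOR of two LLRs in sign-and-modulus form."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0      # XOR of the sign bits
    smallest = min(abs(a), abs(b))                  # MUX on the moduli
    return sign * smallest + correction(a + b) - correction(a - b)


def soft_xor_reference(a, b):
    """Equivalent textbook form, used here only as a cross-check."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


if __name__ == "__main__":
    for a, b in [(2.0, 3.0), (-2.0, 3.0), (0.5, -0.1)]:
        print(a, b, soft_xor(a, b), soft_xor_reference(a, b))
```

The two calls to `correction` correspond to the two table accesses mentioned on the slide, and the sums a + b, a - b and the final combination account for the remaining additions.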

Serial architecture for CNP
[Figure: serial check node processor, example with 4 edges]
Memory: ~3n memory cells
Logic: 3 unit B (~6 max* operators)
Latency: 2(n-1)
Speed: 1 edge/clock cycle
n is the number of edges

Parallel architecture of CNP
[Figure: parallel check node processor, example with 5 edges labelled a, b, c, d, e]
Memory: 2(n-1)(n-3)+n memory cells
Logic: 3(n-2) unit B
Latency: (n-3) clock cycles
Throughput: 1 check node/clock cycle
n is the number of edges
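A check node with n edges must output, for each edge, the soft XOR of the other n - 1 incoming messages. A forward-backward schedule does this with 3(n - 2) two-input operations, a count that matches the logic figure quoted for the parallel architecture, although the slides do not spell out the schedule. The sketch below is one such schedule under that assumption; names are hypothetical, and the two-input operation is written in its tanh form for brevity.

```python
import math
from functools import reduce


def soft_xor(a, b):
    """Two-input soft XOR of LLRs (tanh form)."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


def check_node_update(msgs):
    """Return (extrinsic messages, number of two-input operations used)."""
    n = len(msgs)
    assert n >= 3
    ops = 0
    fwd = [msgs[0]] + [0.0] * (n - 2)        # fwd[i] combines msgs[0..i]
    for i in range(1, n - 1):
        fwd[i] = soft_xor(fwd[i - 1], msgs[i])
        ops += 1
    bwd = [0.0] * n                          # bwd[i] combines msgs[i..n-1]
    bwd[n - 1] = msgs[n - 1]
    for i in range(n - 2, 0, -1):
        bwd[i] = soft_xor(bwd[i + 1], msgs[i])
        ops += 1
    out = [0.0] * n
    out[0] = bwd[1]                          # everything except msgs[0]
    out[n - 1] = fwd[n - 2]                  # everything except msgs[n-1]
    for i in range(1, n - 1):
        out[i] = soft_xor(fwd[i - 1], bwd[i + 1])
        ops += 1
    return out, ops


if __name__ == "__main__":
    out, ops = check_node_update([1.2, -0.7, 2.5, 0.4, -3.1])
    print(out, ops)
```

Computing each output independently would instead cost n(n - 2) operations, so the shared forward and backward partial results are what keep the processor linear in the number of edges.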

Number of operations for variable node
A variable node simply sums the messages on all incoming edges and then, for each edge, subtracts the corresponding input from the total.
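The sum-then-subtract rule can be written in a few lines. The sketch assumes, as in standard belief propagation, that the channel LLR of the bit is included in the total; that detail and the function name are not taken from the slides.

```python
def variable_node_update(channel_llr, incoming):
    """Return (outgoing messages, total LLR) for one variable node.

    channel_llr -- LLR of the bit from the channel (assumed part of the sum)
    incoming    -- messages arriving from the connected check nodes
    """
    total = channel_llr + sum(incoming)          # one pass over all edges
    outgoing = [total - m for m in incoming]     # remove each edge's own input
    return outgoing, total


if __name__ == "__main__":
    print(variable_node_update(0.5, [1.0, -2.0, 0.25]))
```

For a node with d edges this costs d sums for the total and d subtractions for the outputs, with no max* or g operators involved.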

Summary of processor complexities
[Table: number of sum and max* operations for the SISO units Forw. (k bits), Back (k bits), Out. O (k bits), Out. I (k bits), BMC O (k bits), BMC I (k bits)]
[Table: number of sum and max* operations for the CNP (per CN) and the VNP (per VN)]

Summary of decoder complexities (per iteration)
[Table: number of sum and max*/max operations for SCCC and LDPC]

Complexity comparison between MHOMS SCCC 8PSK and 4D-8PSK-TCM+RS
In order to compare the encoding and decoding complexity of 4D-8PSK-TCM+RS with that of MHOMS SCCC 8PSK, we have prepared the tables shown in the following slides.
The complexity has been measured in terms of the number of elementary operations performed by the encoder and decoder.
Although no attempt has been made to refer to particular HW tools to implement the two schemes, we mention that the MHOMS SCCC decoder working at 1 Gbps can be implemented using 4 FPGAs of the XILINX Virtex 100 series.
The complexity of the RS co-decoder has not been evaluated.

Comparison
We have performed a complexity comparison for the following codes:
– MHOMS
– DVB-S2 new standard with rate 5/6
– LDPC code proposed by Goddard and JPL for the CCSDS standard
– 4D-8PSK TCM
Both encoding and decoding complexity have been evaluated.

LDPC codes

Encoder complexity comparison
[Table: number of operations per information bit and memory requirements, assuming back-substitution is possible]

Decoder complexity comparison
[Table: number of operations per information bit (10 iterations) and memory requirements]

Decoder complexity comparison
[Table: number of operations per information bit and memory requirements for 6 iterations, which are enough if a stopping criterion is used together with a slight memory expansion]

Conclusions
We have proposed a way to compare the complexities of LDPC and turbo decoders.
The complexity comparison shows that, in general, the LDPC solutions are more complex by a factor of 3-4.
LDPC solutions offer superior performance.
– An agreed weighting function for complexity and performance must be defined in order to take the final decision
