Proposed Fix to CTC Interleaver Parameters to Enable Efficient Parallel Decoder Designs ( ) Document Number: IEEE S802.16m-10/0922 Date Submitted: Source: Brian F. Johnson, Slobodan Jovanovic, Steven Tretter, Cory Modlin, Alek Purkovic, Arnon Friedmann, Eko Onggosanusi Texas Instruments Mark Cudak Motorola Re: Comments and Change Request on m Draft Amendment, IEEE P802.16m/D6 Section: Convolutional Turbo Codes in response to the Sponsor Ballot circulation. Base Contribution: IEEE C802.16m-10/0922. This presentation provides additional design descriptions and references for the base contribution. Purpose: To be discussed and adopted by TGm for the m D7 Draft as a remedy for Disapprove comments in the Sponsor Ballot circulation. Notice: This document does not represent the agreed views of the IEEE Working Group or any of its subgroups. It represents only the views of the participants listed in the “Source(s)” field above. It is offered as a basis for discussion. It is not binding on the contributor(s), who reserve(s) the right to add, amend or withdraw material contained herein. Copyright Policy: The contributor is familiar with the IEEE-SA Copyright Policy. Patent Policy: The contributor is familiar with the IEEE-SA Patent Policy and Procedures: and..html#6sect6.html#6.3 Further information is located at and.
Introduction The m standard specifies 39 CTC block lengths, N FB, ranging from 48 to 4800 bits. The standard also specifies 39 corresponding CTC interleavers with lengths ranging from 24 to Most of 39 interleavers are designed to have parallel memory access with the parallelism degree of 2 and 4. However 10 interleavers for N FB =88, 104, 120, 136, 152, 200, 248, 456, 586, do not have parallelism degree of 2 and 4. TI proposes the change of interleaver parameters for these block lengths so that all block lengths have at least parallelism degree of 2, and all except 2 lengths have parallelism degree of 4. This change enables very high throughput CTC decoder with efficient implementation for all block sizes. –This is important to meet high throughput, low latency decoding –This is also important to enable efficient multi-standard implementations.
Parallelism Order 2 Example This example has two MAP decoders that operate in parallel on two halves of the code block. Memory is divided in two banks, input symbols are split in two halves, first half is stored in one bank, second half in the other bank. In the interleaved pass of the turbo iteration two decoders synchronously read symbols from memory without contention.
Contention and Contention-Free In parallel decoding, codeword is broken into N segments and each segment is processed with a separate decoding engine The data is also stored in N separate memory banks and each decoding engine reads data out of one of the memory banks each cycle Contention occurs when 2 or more decode engines try to read data out of the same memory bank (not the same location) at the same time Interleavers that result in each decode engine accessing a different memory bank each cycle are considered contention-free interleavers
Parallelism Order for m N FB Parallelism Degree N FB Parallelism Degree Yes NoYesNoYes568No 64YesNoYesNo Yes640YesNoYes No Yes 72Yes NoYesNo 720Yes NoYes No 80YesNo YesNo 800YesNoYes No 88No 912Yes No YesNo 96Yes NoYesNoYes1024YesNoYesNo Yes 104No 1152Yes NoYesNoYes 120NoYesNoYesNo 1312YesNoYesNo 136No 1440Yes No 152No 1632Yes NoYesNo 176YesNo 1856YesNoYesNo Yes 200No YesNo 2112Yes NoYesNoYes 216NoYesNo 2368YesNoYesNo Yes 248No 2624YesNoYesNo Yes 288Yes NoYesNoYes2944YesNoYesNo Yes 320YesNoYes No Yes3328YesNoYesNo Yes 352YesNoYesNo 3776YesNoYesNo Yes 400YesNoYes No 4224Yes NoYesNoYes 456NoYesNo 4800Yes NoYes 512YesNoYesNo Yes
Observations: 10 block sizes do not support parallelism order 2 as a minimum Many block sizes do not support any order of parallelism between 2 and 8 including block sizes as large as 568. These are highly concerning for the high throughput turbo decoder design.
Importance of Contention Free Design Base station Turbo Decoding has very high throughput, and very tight latency requirements to complete decoding of all block sizes Frame by Frame. –Need to meet MAC scheduler, and HARQ timing requirements. –Need to meet worst case/corner cases to guarantee deadlines. –Burst decode rate can often be several times higher than “real- time” bit rate of the Uplink Air interface. Turbo Decoding can easily be largest contributor to PHY Receiver processing and latency requirements. There can be a large number of small block sizes on any given UL frame based on Traffic Mix and MAC scheduler behavior. Examples include: –Large # of VOIP Packets with header compression. –TCP ACK packets. –Other emerging applications.
Importance of Contention Free Design (con’t) Lack of parallelism order 2 for all block sizes means: –Lower throughput per turbo decoder, means adding more turbo decoders to meet high throughput which increases silicon area and power consumption. –Have to implement multiple decoding modes, increasing design and verification complexity.
Parallelism order for other Stds LTE supports contention free parallelism of 2,4, 8 for all block sizes and 16 for blocks above 512. –See next page.16e supports contention free parallelism order 2 for all but one block size. In this way.16m does not compare well with.16e or LTE. It will also make multi-standard implementations more difficult/costly. –See [6] for a description of how to use duobinary MAP decoding for both.16e and for LTE.
Finding A Solution: Interleaver Permutation Rule The interleaver permutation rule is defined in Section of the standard by the following pseudo-code: for j = 0,...,N − 1 switch (j mod 4): case 0: P(j) = (P0 · j + 1) mod N case 1: P(j) = (P0 · j N/2 + P1) mod N case 2: P(j) = (P0 · j P2) mod N case 3: P(j) = (P0 · j N/2 + P3) mod N The interleaver parameters P0, P1, P2, and P3 are specified in Table 962 of the m Draft 6 Amendment.
Finding A Solution: Condition for contention free An extensive study was made of the properties of the interleaver to find selection criteria for the interleaver parameters to guarantee contention free order 2 property. The common characteristic with these 10 blocks is that N FB /2 is divisible by 4 but not by 8 as with other block lengths. Conditions for these blocks to have parallelism degree of 2 are: –Parameter P0 is an odd integer and relatively prime to N FB /2, –Parameters P1 and P3 are equal and are even numbers, –Parameter P2 is equal to zero.
Finding A Solution: Search For Candidate Soutions Here an example of the method for finding the candidate solutions is described. First, using the rules outlined on the previous slide, a set of candidate solutions is created. See for example the table of 45 candidate solutions for block size 88bits. Performance simulations are then run on all candidates and comparison with current.16m proposal is made. See next slide. Calculation of the minimum distance properties of each candidate can also be done to reduce the search space. See [2]
Finding A Solution: Search For Candidate Solutions Here we see that for block size 88 bits, there are many good candidate interleaver parameter settings that match the performance of the current proposal while enabling parallelism order 2 contention free property. This search was performed for each block size. Solutions were selected to provide equal or better performance than the current.16m proposal for the 10 block sizes not currently supporting order 2 parallelism.
Proposed Solution: Modified Interleaver Parameters Maintains the Existing Block Sizes and Subblock interleaver. Replace the current interleaver parameters with the new ones shown in Table below. Performance analysis shows nearly identical with current design.
Conclusion TI recommends replacing the current m CTC interleaver parameters of 10 code blocks with the new ones so that all the specified block lengths will have at least parallelism degree of 2. The change will be to block lengths N FB =88,104,120,136,152,200,248,456 and 586 The bit error rate performance tests with the new interleaver parameters showed similar results with the old parameters. See next section for details of performance evaluation.
BER/FER Performance Tests The Max-log-MAP algorithm with 8 iterations is used with the rate 1/3 CTC decoder in the AWGN channel environment. Simulation results are reported as BER and FER vs SNR The conclusions of the simulation results are that the new interleaver parameters show nearly identical performance as the current parameters using the same performance analysis methods as used in [2], and [3]. This means that the CTC interleaver parameters are well-designed and that the proposed design can support parallelism order 2 for all block sizes, and parallelism 4 for all except 2 block sizes without any performance loss.
Block size K=88 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=104 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=120 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=136 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=152 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=200 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=216 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=248 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=456 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Block size K=568 Old Interleaver New Interleaver BER and FER curves (blue – new Interleaver)
Summary It is not good for the evolution of the standard or the evolution of the 4G technology in general to introduce an m amendment with a CTC specification that has poor support for efficient parallel implementation of the turbo decoder. This contribution proposes small changes to the interleaver parameter table maintaining the existing well-designed block sizes and performance while also improving the design so that all block sizes below 1024 have at least parallelism order 2 and all except 2 block sizes have parallelism order 4. This will ensure low cost and complexity to implement high throughput turbo decoding for m and facilitate efficient multi- standard implementations. The proposed text changes follow...
Proposed Text Changes The proposed text changes (repeated here from C80216m-10_0922.doc are confined to a few entries in Table 962 in Section Text Start Text End
References: [1] IEEE P802.16m/D6, “Draft Amendment to IEEE Standard for Local and Metropolitan Area Networks- Part 16: Air Interface for Broadband Wireless Access Systems,” May [2] IEEE C802.16m-09/0313, “IEEE m CTC block sizes and interleaver,” January [3] IEEE C802.16m-09/1726r1, “Proposed Changes to CTC Encoder Input Sizes,” January [4] C. Berrou, Y. Saouter, C. Douillard, S. Kerouédan, and M. Jézéquel, “Designing good permutations for turbo codes: towards a single model,” IEEE International Conference on Communications, pp , June [5] 3GPP TS v9.2.0, “Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding,” June [6] Lee, Seok-Jun et. al., “Area-Efficient High-Throughput MAP Decoder Architectures,” IEEE Trans. On VLSI Systems, August 2005, pp