A Scalable Architecture for LDPC Decoding


1 A Scalable Architecture for LDPC Decoding
Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J. Design, Automation and Test in Europe Conference and Exhibition, Proceedings, Volume: 3, Feb, Pages:

2 Outline
Introduction; Serial approach; UMP algorithm; Dataset in check nodes; Check operation; Computation skill; Memory reduction; Computation for Iteration

3 Introduction
High code rate (= 0.9) LDPC code. K (avg. = 30): row weight.
High code rate, codeword length, and high SNR. Memory reduction (1/10).
MacKay [3] has shown that for high-rate R applications and intermediate or longer codeword lengths this brings no advantage. In fact, the error performance for higher SNR values becomes worse.

4 Serial Approach
Storage media application (optical or magnetic). Relaxed delay requirement. Process from the first bit node to the last bit node. Memory storage for messages.

5 UMP Algorithm
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    FOR EACH INCOMING ARC X
      SUM ALL INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT BIT NODE
  FOR ALL CHECK NODES DO
    FOR EACH INCOMING ARC X
      TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X
      TAKE THE XOR OF THE SIGNS OF THE INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT CHECK NODE
NEXT ITERATION

6 UMP algorithm
No knowledge of the channel SNR is needed; robust performance. No complex mathematical function (tanh x) is needed; area saving.
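The loop on slide 5 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' hardware implementation. The function name `ump_decode`, the use of NumPy, the dense parity-check matrix `H`, and the convention that a positive LLR means bit 0 are all assumptions made for the example. It also assumes every check node connects to at least two bit nodes.

```python
import numpy as np

def ump_decode(H, channel_llr, iterations=40):
    """Min-sum (UMP) decoding sketch; hypothetical helper, not from the slides."""
    H = np.asarray(H, dtype=bool)            # M x N parity-check matrix
    channel_llr = np.asarray(channel_llr, dtype=float)
    c2b = np.zeros(H.shape)                  # check-to-bit messages, one per arc
    for _ in range(iterations):
        # Bit nodes: sum all incoming LLRs except the one that arrived over arc X.
        total = channel_llr + c2b.sum(axis=0)
        b2c = np.where(H, total - c2b, 0.0)
        # Check nodes: abs minimum and XOR of signs, both excluding arc X.
        for m in range(H.shape[0]):
            cols = np.flatnonzero(H[m])
            for x in cols:
                others = b2c[m, cols[cols != x]]
                magnitude = np.abs(others).min()
                parity = np.count_nonzero(others < 0) % 2
                c2b[m, x] = -magnitude if parity else magnitude
    total = channel_llr + c2b.sum(axis=0)
    return (total < 0).astype(int)           # hard decision: negative LLR -> 1

# Toy usage with a (7,4) Hamming parity-check matrix (an assumed example code).
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [3, 3, 3, -1, 3, 3, 3]                 # all-zero codeword, bit 3 received wrong
decoded = ump_decode(H, llr)
```

Note that no channel SNR and no tanh evaluation appear anywhere in the loop, which matches the two properties listed on slide 6.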

7 Dataset in check nodes
Minimum: overall minimum value; one-but-minimum; index.
[Figure: Check Node 4]

8 Check operation
Compute the exclusive OR of all hard bits output by the connected bit nodes, except the jth. Compute the minimum of the K absolute LLR values of the bit nodes to which the check node is connected, except the jth.

9 Computation skill
Minimum: if LLRj is not the overall minimum, the minimum excluding j equals the overall minimum. Otherwise, it equals the second-to-minimum (one-but-minimum).
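The dataset and the selection rule from slides 7 to 9 can be illustrated as follows. This is a sketch under stated assumptions: the names `CheckDataset`, `build_dataset`, and `extrinsic` are hypothetical, a negative LLR is taken to mean hard bit 1, and the parity field (XOR of all hard bits) is included so that the jth bit can be removed again with one more XOR.

```python
from dataclasses import dataclass

@dataclass
class CheckDataset:
    min1: float    # overall minimum |LLR|
    min2: float    # one-but-minimum |LLR|
    index: int     # position of the bit node that supplied min1
    parity: int    # XOR of the hard bits of all connected bit nodes

def build_dataset(llrs):
    """Scan the K incoming LLRs once and keep only the compact dataset."""
    ds = CheckDataset(float("inf"), float("inf"), -1, 0)
    for j, llr in enumerate(llrs):
        mag = abs(llr)
        if mag < ds.min1:
            ds.min2, ds.min1, ds.index = ds.min1, mag, j
        elif mag < ds.min2:
            ds.min2 = mag
        ds.parity ^= int(llr < 0)
    return ds

def extrinsic(ds, j, llr_j):
    """Message back to bit node j: everything except j's own contribution."""
    # If j supplied the overall minimum, fall back to the one-but-minimum.
    mag = ds.min2 if j == ds.index else ds.min1
    # XOR-ing j's hard bit out of the total parity excludes it.
    sign_bit = ds.parity ^ int(llr_j < 0)
    return -mag if sign_bit else mag

llrs = [3.0, -1.0, 2.5, -4.0]
ds = build_dataset(llrs)
messages = [extrinsic(ds, j, v) for j, v in enumerate(llrs)]
```

Only two magnitudes, one index, and one parity bit are kept per check node, regardless of the row weight K.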

10 Memory reduction
Original size; reduced size; address = index.

11 Memory unit inside Check node

12 Computation for Iteration
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES
    DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES
  NEXT BIT NODE
NEXT ITERATION

13 Computation for Iteration
[Figure: NEW | OLD, repeated four times]
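One way to read slides 12 and 13 is that each check node keeps an OLD dataset (read while computing messages) and a NEW dataset (filled by running updates), and the two are swapped after every iteration. The sketch below follows that reading. It is an assumption about the schedule, not a description of the actual hardware. The names `fresh`, `update`, `message`, and `decode` are hypothetical, as is the `sent` table that remembers the hard bit each bit node last sent over each arc.

```python
def fresh():
    # Empty running dataset: [min1, min2, index of min1, parity].
    return [float("inf"), float("inf"), -1, 0]

def update(ds, n, llr):
    """Running check node update with the message from bit node n."""
    mag = abs(llr)
    if mag < ds[0]:
        ds[0], ds[1], ds[2] = mag, ds[0], n
    elif mag < ds[1]:
        ds[1] = mag
    ds[3] ^= int(llr < 0)

def message(ds, n, sent_bit):
    """Check-to-bit message for bit node n, read from an OLD dataset."""
    mag = ds[1] if ds[2] == n else ds[0]
    return -mag if ds[3] ^ sent_bit else mag

def decode(channel_llr, bit_to_checks, num_checks, iterations=40):
    # OLD datasets start out producing all-zero messages.
    old = [[0.0, 0.0, -1, 0] for _ in range(num_checks)]
    sent = {}                      # (bit, check) -> hard bit sent last iteration
    totals = list(channel_llr)
    for _ in range(iterations):
        new = [fresh() for _ in range(num_checks)]
        for n, checks in enumerate(bit_to_checks):   # first to last bit node
            incoming = [message(old[m], n, sent.get((n, m), 0)) for m in checks]
            totals[n] = channel_llr[n] + sum(incoming)
            for m, msg in zip(checks, incoming):
                out = totals[n] - msg                # exclude arc to check m
                update(new[m], n, out)
                sent[(n, m)] = int(out < 0)
        old = new                                    # NEW becomes OLD
    return [int(t < 0) for t in totals]

# Toy usage: (7,4) Hamming code, listed as the check nodes of each bit node.
bit_to_checks = [[0, 1], [0, 2], [1, 2], [0, 1, 2], [0], [1], [2]]
bits = decode([3, 3, 3, -1, 3, 3, 3], bit_to_checks, num_checks=3)
```

The toy code has bit nodes of varying degree; the slides assume exactly 3 connected check nodes per bit node, which the same loop handles without change.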

14 Time folded architecture
[Figure: block diagram. Labels: FSM & PC; μROM; R/W & address Control; Serial input; Serial output; Computational Kernel; Prefetcher; Memory]

15 Prefetch
Every dataset is statically used for 30 consecutive cycles. Every clock cycle, an average of 2 read (R) and 2 write (W) operations is required. Delayed writeback. Dataset caching.

16 Tiled architecture
[Figure: block diagram. Labels: FSM & PC; μROM; Computational Kernel; Prefetcher; Memory]

17 Result and area distribution
N = 1020, R = 0.5, 57 tiles: 36 mm² with 300 Mb/s

18 Conclusion
Speedup and simultaneous multiple access: prefetch. Reduced memory access latency: memory hierarchy. Increased performance: N-tiled architecture. A modified version can be pipelined.

