A Scalable Architecture for LDPC Decoding

Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J. Design, Automation and Test in Europe Conference and Exhibition (DATE), Proceedings, Volume: 3, Feb., Pages:

Outline
- Introduction
- Serial approach
- UMP algorithm
- Dataset in check nodes
- Check operation
- Computation skill
- Memory reduction
- Computation for Iteration

Introduction
- High code rate (R = 0.9) LDPC code with row weight K (average K = 30)
- High code rate, long codewords and high SNR
- Memory reduction (to 1/10)
- MacKay [3] has shown that for high-rate applications with intermediate or longer codeword lengths, this brings no advantage; in fact, the error performance at higher SNR values becomes worse.

Serial Approach
- Storage media applications (optical or magnetic)
- Relaxed delay requirements
- Processing from the first bit node to the last bit node
- Memory storage for messages

UMP Algorithm
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    FOR EACH INCOMING ARC X
      SUM ALL INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT BIT NODE
  FOR ALL CHECK NODES DO
    FOR EACH INCOMING ARC X
      TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X
      TAKE THE XOR OF THE SIGNS (HARD BITS) OF THE INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT CHECK NODE
NEXT ITERATION

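The UMP loop above is the min-sum form of message passing: a bit node adds LLRs, a check node takes a minimum magnitude and an XOR of signs. A minimal Python sketch of the loop, written directly from the pseudocode, is given here. It assumes that LLR >= 0 means bit 0, that H is a small parity-check matrix given as lists of 0/1 (the (7,4) Hamming matrix, used only for testing), and that every check node has at least two arcs; `ump_decode` and `HAMMING_7_4` are hypothetical names. It computes every check message explicitly, without the memory-saving dataset described in the following slides.

```python
from typing import Dict, List, Tuple

# (7,4) Hamming parity-check matrix, used only as a small test code.
HAMMING_7_4 = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def ump_decode(H: List[List[int]], llr: List[float], iterations: int = 40) -> List[int]:
    """Flooding min-sum (UMP) decoding; LLR >= 0 is taken to mean bit 0."""
    m, n = len(H), len(H[0])
    checks = [[v for v in range(n) if H[c][v]] for c in range(m)]  # bit nodes of each check
    bits = [[c for c in range(m) if H[c][v]] for v in range(n)]    # check nodes of each bit
    c2b: Dict[Tuple[int, int], float] = {(c, v): 0.0 for c in range(m) for v in checks[c]}
    b2c: Dict[Tuple[int, int], float] = {}

    for _ in range(iterations):
        # Bit nodes: channel LLR plus all incoming check messages except over arc x.
        for v in range(n):
            total = llr[v] + sum(c2b[(c, v)] for c in bits[v])
            for c in bits[v]:
                b2c[(v, c)] = total - c2b[(c, v)]
        # Check nodes: |message| = minimum |LLR| except over x, sign = XOR of the
        # other hard bits (assumes every check node has at least two arcs).
        for c in range(m):
            for x in checks[c]:
                others = [b2c[(v, c)] for v in checks[c] if v != x]
                parity = 0
                for q in others:
                    parity ^= int(q < 0)
                magnitude = min(abs(q) for q in others)
                c2b[(c, x)] = -magnitude if parity else magnitude

    # Hard decision on the a-posteriori LLR of each bit node.
    return [0 if llr[v] + sum(c2b[(c, v)] for c in bits[v]) >= 0 else 1 for v in range(n)]
```

For example, `ump_decode(HAMMING_7_4, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])` corrects the weakly wrong first bit and returns the all-zero codeword.
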
UMP algorithm
- No knowledge of the channel SNR is needed
- Robust performance
- No complex mathematical functions (tanh x) are needed, which saves area

Dataset in check nodes
- Minimum: overall minimum value
- One-but-minimum value
- Index of the overall minimum
[Figure: check node 4]

Check operation
- Compute the exclusive OR of the hard bits output by all connected bit nodes, except the jth.
- Compute the minimum of the absolute LLR values of all K bit nodes to which the check node is connected, except the jth.
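
The check operation for a single arc can be written as a direct, brute-force Python sketch: exclude the jth input, XOR the remaining hard bits, and take the minimum of the remaining magnitudes. It assumes that a negative LLR has hard bit 1; `check_operation` and `message_llr` are hypothetical names. This version touches all K - 1 other inputs for every output, which is the cost the dataset-based shortcut avoids.

```python
from typing import List, Tuple


def check_operation(llrs: List[float], j: int) -> Tuple[int, float]:
    """Return (XOR of hard bits, minimum |LLR|) over all inputs except the jth.

    Assumes a negative LLR has hard bit 1 and that there are at least two inputs.
    """
    others = [q for i, q in enumerate(llrs) if i != j]
    parity = 0
    for q in others:
        parity ^= int(q < 0)  # hard bit of each remaining input
    return parity, min(abs(q) for q in others)


def message_llr(parity: int, magnitude: float) -> float:
    """Combine the XOR bit and the magnitude into a signed LLR message."""
    return -magnitude if parity else magnitude
```

For inputs `[1.5, -0.5, 3.0, -2.0]`, the message to the second bit node (`j = 1`) is `check_operation(..., 1) == (1, 1.5)`, i.e. an LLR of -1.5.
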

Computation skill
- Minimum: if LLRj is not the minimum, the minimum sent to bit node j is the overall minimum; otherwise, it is the one-but-minimum (second-smallest) value.
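
The dataset and the shortcut can be combined in a short Python sketch. It builds the dataset of one check node in a single pass (overall minimum, one-but-minimum, index of the minimum) and then derives the message to bit node j without revisiting the other inputs. Beyond what the slides state, it assumes that the dataset also keeps the XOR of all K hard bits and that bit node j's own hard bit is available when its message is formed, so that it can be XORed back out. `CheckDataset`, `build_dataset` and `message_to` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckDataset:
    min1: float   # overall minimum |LLR|
    min2: float   # one-but-minimum |LLR|
    index: int    # position of the overall minimum
    parity: int   # XOR of all K hard bits (assumed extra field)


def build_dataset(llrs: List[float]) -> CheckDataset:
    """Single pass over the K inputs of one check node."""
    min1 = min2 = float("inf")
    index, parity = -1, 0
    for i, q in enumerate(llrs):
        parity ^= int(q < 0)
        a = abs(q)
        if a < min1:
            min1, min2, index = a, min1, i  # old minimum becomes one-but-minimum
        elif a < min2:
            min2 = a  # also catches a tie with the overall minimum
    return CheckDataset(min1, min2, index, parity)


def message_to(ds: CheckDataset, j: int, hard_bit_j: int) -> Tuple[int, float]:
    """Message to bit node j: min2 if j holds the minimum, else min1."""
    magnitude = ds.min2 if j == ds.index else ds.min1
    return ds.parity ^ hard_bit_j, magnitude
```

Only four values per check node are stored, yet every one of the K outputs matches the brute-force "all except j" computation.
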

Memory reduction
[Figure: original size vs. reduced size; address = index]
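
The size of the reduction can be estimated with a back-of-the-envelope Python sketch. The word widths here are hypothetical, since the slides do not state them: 5-bit LLR magnitudes plus a sign bit per stored message, an index of ceil(log2 K) bits, and one bit for the overall XOR. Whether the K per-arc sign bits also live in the check node is not stated either, so the flag `store_signs` covers both cases. `storage_bits_per_check` is a hypothetical name.

```python
import math
from typing import Tuple


def storage_bits_per_check(k: int, mag_bits: int, store_signs: bool = False) -> Tuple[int, int]:
    """Return (original, reduced) storage in bits for one check node.

    original: all K incoming messages, each a sign bit plus a magnitude.
    reduced:  overall minimum, one-but-minimum, index of the minimum and the
              overall XOR, optionally plus K per-arc sign bits (assumption).
    """
    original = k * (mag_bits + 1)
    index_bits = math.ceil(math.log2(k))
    reduced = 2 * mag_bits + index_bits + 1
    if store_signs:
        reduced += k
    return original, reduced
```

With K = 30 and 5-bit magnitudes, this gives 180 bits versus 16 bits (about 1/11), or 46 bits (about 1/4) if the per-arc signs are kept in the check node. This is of the same order as the 1/10 reduction quoted in the introduction, but only under these assumed widths.
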

Memory unit inside the check node

Computation for Iteration
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES
    DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES
  NEXT BIT NODE
NEXT ITERATION

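The serial schedule with running check-node updates can be sketched in Python as follows. OLD datasets from the previous iteration supply the check-to-bit messages. Each new bit-to-check message is folded into the NEW datasets immediately, and NEW replaces OLD at the end of the pass. This yields the same messages as the flooding UMP loop, computed in bit-node order. The sketch assumes that each dataset also records the sign each bit node sent on its arc, a hypothetical choice because the slides do not say where those signs are kept. It also uses the (7,4) Hamming matrix for testing, so bit nodes have one to three check nodes rather than exactly 3. All names are hypothetical.

```python
from typing import Dict, List, Optional

HAMMING_7_4 = [  # small test code only
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def new_dataset() -> Dict:
    # min1/min2/index/parity as in the check-node dataset, plus per-arc signs (assumed).
    return {"min1": float("inf"), "min2": float("inf"), "index": -1, "parity": 0, "signs": {}}


def running_update(ds: Dict, v: int, q: float) -> None:
    """Fold the message q from bit node v into a NEW dataset."""
    s = int(q < 0)
    ds["parity"] ^= s
    ds["signs"][v] = s
    a = abs(q)
    if a < ds["min1"]:
        ds["min1"], ds["min2"], ds["index"] = a, ds["min1"], v
    elif a < ds["min2"]:
        ds["min2"] = a


def check_to_bit(ds: Dict, v: int) -> float:
    """Message from an OLD dataset to bit node v (all arcs except v's)."""
    magnitude = ds["min2"] if v == ds["index"] else ds["min1"]
    sign = ds["parity"] ^ ds["signs"][v]
    return -magnitude if sign else magnitude


def serial_decode(H: List[List[int]], llr: List[float], iterations: int = 40) -> List[int]:
    m, n = len(H), len(H[0])
    bits = [[c for c in range(m) if H[c][v]] for v in range(n)]  # check nodes of each bit
    old: Optional[List[Dict]] = None  # no check messages before the first iteration
    for _ in range(iterations):
        new = [new_dataset() for _ in range(m)]
        for v in range(n):  # first bit node to last bit node
            incoming = {c: check_to_bit(old[c], v) if old is not None else 0.0 for c in bits[v]}
            total = llr[v] + sum(incoming.values())
            for c in bits[v]:
                running_update(new[c], v, total - incoming[c])
        old = new  # NEW becomes OLD for the next iteration
    return [0 if llr[v] + sum(check_to_bit(old[c], v) for c in bits[v]) >= 0 else 1
            for v in range(n)]
```

Only one NEW and one OLD dataset per check node are alive at any time, which matches the NEW | OLD pairs in the next slide.
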
Computation for Iteration (continued)
[Figure: four NEW | OLD dataset pairs]

Time-folded architecture
[Block diagram: FSM & PC, μROM, R/W & address control, serial input, serial output, computational kernel, prefetcher, memory]

Prefetch
- Every dataset is statically used for 30 consecutive cycles.
- Every clock cycle, an average of 2 reads (R) and 2 writes (W) is required.
- Delayed write-back
- Dataset caching

Tiled architecture
[Block diagram: FSM & PC, μROM, computational kernel, prefetcher, memory]

Results and area distribution
- N = 1020, R = 0.5, 57 tiles
- 36 mm² at 300 Mb/s

Conclusion
- Speedup and simultaneous multiple memory accesses
- Prefetching reduces memory-access latency
- The memory hierarchy increases performance
- N-tiled architecture
- A modified version can be pipelined
