1
A Scalable Architecture for LDPC Decoding
Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J. Design, Automation and Test in Europe Conference and Exhibition, Proceedings, Volume: 3, Feb, Pages:
2
Outline
Introduction
Serial approach
UMP algorithm
Dataset in check nodes
Check operation
Computation skill
Memory reduction
Computation for Iteration
3
Introduction
High code rate (= 0.9) LDPC code
K (avg. = 30): row weight
High code rate, codeword length and high SNR
Memory reduction (1/10)
MacKay [3] has shown that for high-rate (R) applications and intermediate or longer codeword lengths this brings no advantage. In fact, the error performance for higher SNR values becomes worse.
4
Serial Approach
Storage media applications (optical or magnetic)
Relaxed delay requirement
Process from first bit node to last bit node
Memory storage for messages
5
UMP Algorithm
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    FOR EACH INCOMING ARC X
      SUM ALL INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT BIT NODE
  FOR ALL CHECK NODES DO
    FOR EACH INCOMING ARC X
      TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X
      TAKE THE XOR OF THE SIGNS OF THE INCOMING LLRs EXCEPT OVER X
      SEND THE RESULT BACK OVER X
    NEXT ARC
  NEXT CHECK NODE
NEXT ITERATION
6
UMP algorithm
No knowledge of the channel SNR is needed
  Robust performance
No complex mathematical function (tanh x) is needed
  Area saving
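For comparison, the two check-node update rules can be written side by side. This is a sketch in standard notation, not taken from the slides: N(m) is assumed to denote the set of bit nodes connected to check node m, and L_{i→m} the LLR message sent from bit node i to check node m.

```latex
% Sum-product check-node update (requires tanh and its inverse):
L_{m \to j} = 2 \tanh^{-1}\!\Bigl( \prod_{i \in N(m) \setminus j} \tanh\bigl( L_{i \to m} / 2 \bigr) \Bigr)

% UMP (commonly called min-sum) approximation: signs and a minimum only
L_{m \to j} \approx \Bigl( \prod_{i \in N(m) \setminus j} \operatorname{sign}\bigl( L_{i \to m} \bigr) \Bigr)
                    \cdot \min_{i \in N(m) \setminus j} \bigl| L_{i \to m} \bigr|
```

Since sign and min are unaffected by multiplying every input by the same positive constant, scaling all channel LLRs by a common factor scales every message by that factor and leaves the hard decisions unchanged. This is consistent with the point that the channel SNR need not be known.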
7
Dataset in check nodes
Minimum: overall minimum value
One-but-minimum: second-smallest value
Index: position of the bit node that supplied the overall minimum
[Figure: check node 4]
8
Check operation
Compute the exclusive OR of all hard bits output by the connected bit nodes, except the j-th.
Compute the minimum of the absolute LLR values of the K bit nodes connected to the check node, except the j-th.
9
Computation skill
Minimum (excluding the j-th input):
If |LLRj| is not the overall minimum, minimum = overall minimum.
Otherwise, minimum = one-but-minimum.
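The rule above can be sketched in a few lines of Python. The function names `check_node_dataset` and `message_to_bit` are hypothetical, and storing the parity of the sign bits alongside the two minima and the index is an assumption made so that the exclusive-OR part of the check operation can be recovered as well.

```python
def check_node_dataset(llrs):
    """Reduce the K incoming LLRs of one check node to a small dataset.

    Returns (overall minimum, one-but-minimum, index of the minimum,
    parity of the sign bits). The parity field is an assumption added
    here so the XOR of the hard bits can be recovered per output.
    """
    min1 = min2 = float("inf")
    index = -1
    parity = 0
    for j, llr in enumerate(llrs):
        parity ^= int(llr < 0)          # hard bit = sign bit of the LLR
        mag = abs(llr)
        if mag < min1:                  # new overall minimum
            min1, min2, index = mag, min1, j
        elif mag < min2:                # new one-but-minimum
            min2 = mag
    return min1, min2, index, parity


def message_to_bit(dataset, j, llr_j):
    """Message to bit node j: sign and minimum over all inputs except j."""
    min1, min2, index, parity = dataset
    mag = min2 if j == index else min1  # the rule from the slide
    sign_bit = parity ^ int(llr_j < 0)  # remove bit j's own contribution
    return -mag if sign_bit else mag


llrs = [2.5, -0.5, 1.0, -3.0]
dataset = check_node_dataset(llrs)
messages = [message_to_bit(dataset, j, llr) for j, llr in enumerate(llrs)]
print(dataset)   # (0.5, 1.0, 1, 0)
print(messages)  # [0.5, -1.0, 0.5, -0.5]
```

Only the input at the stored index receives the one-but-minimum; every other output reuses the overall minimum, so the K individual magnitudes never have to be kept.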
10
Memory reduction
[Figure: original size vs. reduced size; address = index]
11
Memory unit inside Check node
12
Computation for Iteration
FOR 40 ITERATIONS DO
  FOR ALL BIT NODES DO
    CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES
    DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES
  NEXT BIT NODE
NEXT ITERATION
13
Computation for Iteration
[Figure: datasets labelled NEW | OLD]
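The serial schedule with NEW and OLD datasets can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Dataset` and `decode` are hypothetical, the per-edge sign store `signs` is an assumed detail, the code accepts any number of check nodes per bit (the slides use 3), and every check node is assumed to connect to at least two bit nodes.

```python
import math


class Dataset:
    """Per-check-node summary: two smallest magnitudes, index, sign parity."""

    def __init__(self, min1=math.inf, min2=math.inf):
        self.min1, self.min2, self.index, self.parity = min1, min2, -1, 0

    def update(self, n, llr):
        """Running update with the message arriving from bit node n."""
        self.parity ^= int(llr < 0)
        mag = abs(llr)
        if mag < self.min1:
            self.min1, self.min2, self.index = mag, self.min1, n
        elif mag < self.min2:
            self.min2 = mag

    def message(self, n, sign_bit):
        """Check-to-bit message for bit n, excluding bit n's own input."""
        mag = self.min2 if n == self.index else self.min1
        return -mag if self.parity ^ sign_bit else mag


def decode(channel_llrs, checks_of_bit, num_checks, iterations=40):
    """Serial decoding pass over the bit nodes, first to last.

    checks_of_bit[n] lists the check nodes connected to bit node n.
    """
    old = [Dataset(0.0, 0.0) for _ in range(num_checks)]  # zero messages at start
    signs = {}  # assumed storage: (check, bit) -> sign bit of the last message sent
    hard = [int(llr < 0) for llr in channel_llrs]
    for _ in range(iterations):
        new = [Dataset() for _ in range(num_checks)]
        for n, llr in enumerate(channel_llrs):
            # Output messages of the connected check nodes, from the OLD datasets.
            incoming = {m: old[m].message(n, signs.get((m, n), 0))
                        for m in checks_of_bit[n]}
            total = llr + sum(incoming.values())
            hard[n] = int(total < 0)
            # Running check node updates on the NEW datasets.
            for m in checks_of_bit[n]:
                out = total - incoming[m]   # sum of all inputs except over this arc
                signs[(m, n)] = int(out < 0)
                new[m].update(n, out)
        old = new                           # NEW becomes OLD for the next iteration
    return hard


# Toy example: 3 bits, 2 checks (bits 0,1 and bits 1,2); bit 1 received unreliably.
print(decode([2.0, -0.5, 1.5], [[0], [0, 1], [1]], num_checks=2))  # [0, 0, 0]
```

In this sketch each check node keeps only two datasets, and a bit node reads the OLD one while folding its outgoing message into the NEW one, so no per-edge magnitude storage is needed.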
14
Time folded architecture
[Block diagram: FSM & PC, μROM, R/W & address control, serial input, serial output, computational kernel, prefetcher, memory]
15
Prefetch
Every dataset is statically used for 30 consecutive cycles.
Every clock cycle, an average of 2 read (R) and 2 write (W) operations is required.
Delayed writeback
Dataset caching
16
Tiled architecture
[Block diagram: FSM & PC, μROM, computational kernel, prefetcher, memory]
17
Result and area distribution
N = 1020, R = 0.5, 57 tiles, 36 mm² with 300 Mb/s
18
Conclusion
Speedup and simultaneous multiple access
Prefetch
  Reduce memory access latency
Memory hierarchy
  Increase performance
N-tiled architecture
Modified version can be pipelined