
1 Massively LDPC Decoding on Multicore Architectures Presented by: fakewen

2 Authors: Gabriel Falcao, Leonel Sousa, Vitor Silva

3 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

4 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

5 Introduction LDPC decoding on multicore architectures: LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.).

6 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

7 BELIEF PROPAGATION Belief propagation, also known as the Sum-Product Algorithm (SPA), is an iterative algorithm for the computation of joint probabilities.

8 LDPC Decoding LDPC decoding exploits the probabilistic relationships between nodes imposed by the parity-check conditions, which allow inferring the most likely transmitted codeword.
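For reference (standard LDPC notation, not taken from the slides): a hard-decision word \hat{c} is accepted as the decoded codeword when it satisfies every parity-check equation of the parity-check matrix H, which is the usual stopping criterion tested after each iteration:

    H \, \hat{c}^{\mathsf T} = 0 \pmod{2}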

9 LDPC Decoding (cont.) White Gaussian noise

10 LDPC Decoding (cont.)

11

12 Complexity

13 Forward and Backward Recursions With the forward and backward recursions, a reduction in the number of memory access operations is registered, which increases the ratio of arithmetic operations per memory access.

14

15 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

16 DATA STRUCTURES AND PARALLEL COMPUTING MODELS Compact data structures are used to represent the H matrix.

17 Data Structures The information about H is coded separately in two independent data streams, one used for the horizontal (check node) processing and the other for the vertical (bit node) processing.
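A minimal sketch of what such a compact, two-stream representation can look like, assuming a compressed edge-oriented layout; the type and field names are illustrative (and reused by the later sketches), not the exact structures from the paper:

    /* Compact, stream-oriented view of a sparse binary H matrix.
       Instead of storing the full M x N matrix, only the positions of the 1s
       (the Tanner-graph edges) are kept: once ordered for check-node
       (row-wise) processing and once for bit-node (column-wise) processing. */
    typedef struct {
        int  num_edges;   /* total number of 1s in H                              */
        int *row_ptr;     /* row_ptr[m]..row_ptr[m+1]-1 index the edges of CN m   */
        int *bn_index;    /* bit node touched by each edge (row-major order)      */
        int *col_ptr;     /* col_ptr[n]..col_ptr[n+1]-1 index the edges of BN n   */
        int *cn_index;    /* check node touched by each edge (column-major order) */
    } CompactH;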

18 Reminder r_mn: the message sent from check node CN_m to bit node BN_n; q_nm: the message sent from bit node BN_n to check node CN_m.
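For reference, these messages follow the standard probability-domain SPA updates (general SPA formulation, not copied from the slides), where N(m) is the set of BNs connected to CN_m, M(n) the set of CNs connected to BN_n, p_n the channel priors, and k_nm a constant normalizing q_nm(0) + q_nm(1) = 1:

    % check node m -> bit node n (horizontal processing)
    r_{mn}(0) = \frac{1}{2} + \frac{1}{2} \prod_{n' \in N(m)\setminus\{n\}} \bigl(1 - 2\,q_{n'm}(1)\bigr),
    \qquad r_{mn}(1) = 1 - r_{mn}(0)

    % bit node n -> check node m (vertical processing)
    q_{nm}(0) = k_{nm}\, p_n(0) \prod_{m' \in M(n)\setminus\{m\}} r_{m'n}(0),
    \qquad q_{nm}(1) = k_{nm}\, p_n(1) \prod_{m' \in M(n)\setminus\{m\}} r_{m'n}(1)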

19 Parallel Computational Models: Parallel Features of the General-Purpose Multicores, Parallel Features of the GPU, Parallel Features of the CELL/B.E.

20 Parallel Features of the General-Purpose Multicores #pragma omp parallel for
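As a rough illustration of how this pragma can share the decoder's node-update loop among the x86 cores, here is a hedged sketch; the function name and the edge layout (reused from the data-structure sketch above) are assumptions, not the authors' code:

    #include <omp.h>

    /* Horizontal (check node) processing: each loop iteration updates the
       r_mn messages of one check node and is independent of the others, so
       OpenMP can distribute the iterations over the cores.
       q and r hold one value per edge of H, in check-node (row) order;
       row_ptr[m]..row_ptr[m+1]-1 are the edges of check node m.            */
    void horizontal_processing(float *r, const float *q,
                               const int *row_ptr, int num_check_nodes)
    {
        #pragma omp parallel for
        for (int m = 0; m < num_check_nodes; m++) {
            for (int e = row_ptr[m]; e < row_ptr[m + 1]; e++) {
                float prod = 1.0f;
                for (int k = row_ptr[m]; k < row_ptr[m + 1]; k++)
                    if (k != e)
                        prod *= 1.0f - 2.0f * q[k];  /* SPA: exclude own edge */
                r[e] = 0.5f + 0.5f * prod;           /* r_mn(0)               */
            }
        }
    }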

21 Parallel Features of the GPU

22 Throughput

23 Parallel Features of the CELL/B.E.

24 Throughput

25 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

26 PARALLELIZING THE KERNELS EXECUTION: The Multicores Using OpenMP, The GPU Using CUDA, The CELL/B.E.

27 The Multicores Using OpenMP

28 The GPU Using CUDA Programming the Grid Using a Thread per Node Approach
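A minimal sketch of the thread-per-node idea for the horizontal kernel, assuming one CUDA thread per check node and the edge-ordered layout from the earlier sketches; illustrative code, not the paper's kernels:

    /* Thread-per-node mapping: one CUDA thread is responsible for one check
       node and updates all r_mn messages of that node.
       q and r hold one value per edge, in check-node order;
       row_ptr[m]..row_ptr[m+1]-1 are the edges of check node m.            */
    __global__ void horizontal_kernel(float *r, const float *q,
                                      const int *row_ptr, int num_check_nodes)
    {
        int m = blockIdx.x * blockDim.x + threadIdx.x;
        if (m >= num_check_nodes)
            return;

        for (int e = row_ptr[m]; e < row_ptr[m + 1]; e++) {
            float prod = 1.0f;
            for (int k = row_ptr[m]; k < row_ptr[m + 1]; k++)
                if (k != e)
                    prod *= 1.0f - 2.0f * q[k];  /* exclude the edge's own BN */
            r[e] = 0.5f + 0.5f * prod;           /* r_mn(0)                   */
        }
    }

    /* Launched with one thread per check node, e.g.:
       horizontal_kernel<<<(M + 255) / 256, 256>>>(d_r, d_q, d_row_ptr, M);  */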

29 The GPU Using CUDA (cont.) Coalesced Memory Accesses
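To illustrate what coalescing means here, a toy kernel under the assumption that the messages touched by one warp are stored contiguously; names are made up:

    /* If neighbouring threads read neighbouring addresses, a warp's loads are
       served by few memory transactions (coalesced). An indirection through an
       arbitrary permutation can scatter the loads and multiply the transactions. */
    __global__ void gather_example(const float *messages, const int *perm,
                                   float *out_direct, float *out_indirect, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;

        out_direct[tid]   = messages[tid];        /* coalesced access          */
        out_indirect[tid] = messages[perm[tid]];  /* potentially uncoalesced   */
    }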

30

31 The CELL/B.E. Small Single-SPE Model (A, B, C), Large Single-SPE Model

32 Why the Single-SPE Model In the single-SPE model, the number of communications between the PPE and the SPEs is kept to a minimum, and the PPE is relieved of the costly task of reorganizing data (the sorting procedure in Algorithm 4) between data transfers to the SPE.
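A hedged sketch of the SPE side of such a model: the SPE pulls a self-contained block of data into its local store with DMA, decodes it locally, and writes it back, so the PPE only distributes work. Buffer names, sizes, and the omitted decoding kernel are assumptions, not the authors' code:

    /* SPE program (single-SPE model, simplified). */
    #include <spu_mfcio.h>

    #define TAG    0
    #define CHUNK  (16 * 1024)   /* bytes per DMA transfer (16 KB max per DMA) */

    static volatile char local_buf[CHUNK] __attribute__((aligned(128)));

    int main(unsigned long long spe_id, unsigned long long argp,
             unsigned long long envp)
    {
        (void)spe_id; (void)envp;
        unsigned long long ea = argp;   /* effective address of this SPE's block */

        /* Pull the block from main memory into the 256 KB local store. */
        mfc_get(local_buf, ea, CHUNK, TAG, 0, 0);
        mfc_write_tag_mask(1 << TAG);
        mfc_read_tag_status_all();

        /* decode_block(local_buf) would run all LDPC decoding iterations here,
           entirely inside the local store (hypothetical kernel).              */

        /* Push the decoded block back and wait for the DMA to complete. */
        mfc_put(local_buf, ea, CHUNK, TAG, 0, 0);
        mfc_read_tag_status_all();
        return 0;
    }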

33

34

35 Outline: Introduction, Belief Propagation, Data Structures and Parallel Computing Models, Parallelizing the Kernels Execution, Experimental Results

36

37 LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP; LDPC Decoding on the CELL/B.E. – Small Single-SPE Model, Large Single-SPE Model; LDPC Decoding on the GPU Using CUDA

38 LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP

39 LDPC Decoding on the CELL/B.E.

40 LDPC Decoding on the CELL/B.E. (cont.)

41

42 LDPC Decoding on the GPU Using CUDA

43 The end Thank you~

44 Forward backward I can do better than that. I can send you an MSc thesis of a former student of ours who graduated 5 years ago. She explains the basic concept in detail. Basically, when you are performing the horizontal processing (the same applies to the vertical one) and you have a CN updating all the BNs connected to it, the F&B optimization exploits the fact that you only have to read all the BNs' information (probabilities, in the case of the SPA) once for each CN, which gives you tremendous gains in computation time since you save many memory accesses, which, as you know, are the main bottleneck in parallel computing. Quite shortly, imagine you have one CN updating 6 BNs, BN0 to BN5 (horizontal processing), and that BN0 holds information A, BN1 = B, BN2 = C, ..., BN5 = F. Then, to update the corresponding r_mn elements, for each BN you have to calculate: BN0 = B×C×D×E×F, BN1 = A×C×D×E×F, BN2 = A×B×D×E×F, ..., BN5 = A×B×C×D×E, where each BN contributes to the update of its neighbors but not to the update of itself. So, the F&B optimization allows you to read A, B, C, D, E, and F only once from memory and produce all the intermediate values needed to update all the BNs connected to that CN. You save memory accesses (very important!) and processing too.
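To make the trick concrete, here is a small C sketch of how all the "product of everything except myself" values can be obtained with one forward and one backward pass; illustrative code, not taken from the paper or the thesis mentioned above:

    #include <stddef.h>

    /* Forward-backward computation of "product of all elements except self".
       in[0..d-1]  : the d values read once from the BNs connected to one CN
       out[0..d-1] : out[i] = product of all in[j] with j != i
       Costs O(d) multiplications instead of the naive O(d^2).               */
    static void exclusive_products(const float *in, float *out, size_t d)
    {
        float fwd = 1.0f;                 /* running product of in[0..i-1]   */
        for (size_t i = 0; i < d; i++) {
            out[i] = fwd;                 /* everything to the left of i     */
            fwd *= in[i];
        }

        float bwd = 1.0f;                 /* running product of in[i+1..d-1] */
        for (size_t i = d; i-- > 0; ) {
            out[i] *= bwd;                /* times everything to the right   */
            bwd *= in[i];
        }
    }

    /* With the 6 BNs from the slide, in = {A, B, C, D, E, F} gives
       out[0] = B*C*D*E*F, out[1] = A*C*D*E*F, ..., out[5] = A*B*C*D*E.      */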

