Sorting on the SRC 6 Reconfigurable Computer (MAPLD 2005, C178). John Harkins, Tarek El-Ghazawi, Esam El-Araby, Miaoqing Huang, The George Washington University.


1 Sorting on the SRC 6 Reconfigurable Computer
John Harkins, Tarek El-Ghazawi, Esam El-Araby, Miaoqing Huang
The George Washington University, Washington, DC
Slide footer: J. Harkins, 1 of 51, MAPLD2005/C178

2 Algorithms
Quick Sort
Heap Sort
Radix Sort
Bitonic Sort
Odd/Even Merge

3 SRC System Architecture
16 Port Crossbar Switch, 1.6 GB/s Peak Port BW
Nodes: Processor Node, FPGA Node, Memory Node
Up to 16 Nodes per Switch
(architecture diagram; remaining label: 64)

4 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]

5 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [ 0][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][13]

6 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [ 0][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][13]
QS1: [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8][ 9][12][15][14][11][10][13]

7 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [ 0][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][13]
QS1: [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8][ 9][12][15][14][11][10][13]

8 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [ 0][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][13]
QS1: [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8][ 9][12][15][14][11][10][13]
mL:  [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8]

9 Example - Quick Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
med: [ 0][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][13]
QS1: [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8][ 9][12][15][14][11][10][13]
mL:  [ 0][ 3][ 5][ 7][ 4][ 2][ 6][ 1][ 8]
PS:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8]

10 Quick Sort - MIMD Architecture
Memory: Bank A, Bank B, Bank C, Bank D, Bank E, Bank F
FPGA 1: QS 1, QS 2, QS 3 (90%)
FPGA 2: QS 4, QS 5, QS 6 (84%)
6 Instances
Median of 3 to select pivot
Pipeline Sort for partitions ≤ 10 vs. Insertion Sort ≤ 20
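
The two notes on this slide (median of 3 for the pivot, a simpler sort for small partitions) can be sketched in plain C as below. This is a minimal processor-side sketch, not the authors' MAP C code: the function names, the `cutoff` parameter and the Hoare-style partition loop are assumptions made for illustration, and insertion sort stands in for the FPGA "Pipeline Sort", which the slides do not describe.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange two keys. */
void swap_keys(uint64_t *a, uint64_t *b)
{
    uint64_t t = *a;
    *a = *b;
    *b = t;
}

/* Order keys[lo], keys[mid], keys[hi] so the median of the three lands at mid. */
void median_of_3(uint64_t *keys, ptrdiff_t lo, ptrdiff_t mid, ptrdiff_t hi)
{
    if (keys[mid] < keys[lo]) swap_keys(&keys[mid], &keys[lo]);
    if (keys[hi] < keys[lo]) swap_keys(&keys[hi], &keys[lo]);
    if (keys[hi] < keys[mid]) swap_keys(&keys[hi], &keys[mid]);
}

/* Straight insertion sort of keys[lo..hi] (inclusive), used for small partitions. */
void insertion_sort(uint64_t *keys, ptrdiff_t lo, ptrdiff_t hi)
{
    for (ptrdiff_t i = lo + 1; i <= hi; i++) {
        uint64_t v = keys[i];
        ptrdiff_t j = i;
        while (j > lo && keys[j - 1] > v) {
            keys[j] = keys[j - 1];
            j--;
        }
        keys[j] = v;
    }
}

/* Quick sort of keys[lo..hi]; partitions of at most `cutoff` keys
 * (hypothetical parameter) are handed to insertion sort. */
void quick_sort_range(uint64_t *keys, ptrdiff_t lo, ptrdiff_t hi, ptrdiff_t cutoff)
{
    if (hi - lo + 1 <= cutoff || hi - lo < 2) {
        insertion_sort(keys, lo, hi);
        return;
    }
    ptrdiff_t mid = lo + (hi - lo) / 2;
    median_of_3(keys, lo, mid, hi);
    uint64_t pivot = keys[mid];
    ptrdiff_t i = lo, j = hi;
    while (i <= j) {                     /* Hoare-style partition around pivot */
        while (keys[i] < pivot) i++;
        while (keys[j] > pivot) j--;
        if (i <= j) {
            swap_keys(&keys[i], &keys[j]);
            i++;
            j--;
        }
    }
    quick_sort_range(keys, lo, j, cutoff);
    quick_sort_range(keys, i, hi, cutoff);
}

/* Convenience wrapper over the whole array. */
void quick_sort(uint64_t *keys, size_t n, size_t cutoff)
{
    if (n > 1) quick_sort_range(keys, 0, (ptrdiff_t)n - 1, (ptrdiff_t)cutoff);
}
```

On the 16-key example from the preceding slides, `median_of_3(keys, 0, 7, 15)` leaves 0, 9 and 13 at indices 0, 7 and 15, which agrees with the "med:" row. The partition order inside `quick_sort_range` is not guaranteed to match the "QS1:" row.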

11 Example - Heap Sort
0:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

12 Example - Heap Sort
8:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

13 Example - Heap Sort
7:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

14 Example - Heap Sort
7:   [13][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][ 0]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

15 Example - Heap Sort
6:   [13][ 3][14][15][10][ 2][ 6][ 9][ 8][ 4][12][ 7][ 5][11][ 1][ 0]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

16 Example - Heap Sort
6:   [13][ 3][14][15][10][ 2][11][ 9][ 8][ 4][12][ 7][ 5][ 6][ 1][ 0]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

17 Example - Heap Sort
max: [15][13][14][ 9][12][ 7][11][ 3][ 8][ 4][10][ 2][ 5][ 6][ 1][ 0]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(tree diagram of the array)

18 Heap Sort - MIMD Architecture
Memory: Bank A, Bank B, Bank C, Bank D, Bank E, Bank F
FPGA 1: HS 1, HS 2, HS 3 (55%)
FPGA 2: HS 4, HS 5, HS 6 (5%)
6 Instances
Almost identical to processor code
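
The heap-building steps in the example slides can be sketched in C as follows. This is a textbook bottom-up max-heap construction with sift-down on a 0-based array (children of node i at 2i+1 and 2i+2); the function names are assumptions for illustration and this is not the authors' code.

```c
#include <stddef.h>
#include <stdint.h>

/* Sift keys[i] down within keys[0..n-1] until neither child is larger. */
void sift_down(uint64_t *keys, size_t i, size_t n)
{
    for (;;) {
        size_t largest = i;
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;
        if (left < n && keys[left] > keys[largest]) largest = left;
        if (right < n && keys[right] > keys[largest]) largest = right;
        if (largest == i) return;
        uint64_t t = keys[i];
        keys[i] = keys[largest];
        keys[largest] = t;
        i = largest;
    }
}

/* Build a max heap bottom-up, starting from the last node that has a child. */
void build_max_heap(uint64_t *keys, size_t n)
{
    for (size_t i = n / 2; i-- > 0; )
        sift_down(keys, i, n);
}

/* Heap sort: repeatedly move the maximum to the end and restore the heap. */
void heap_sort(uint64_t *keys, size_t n)
{
    build_max_heap(keys, n);
    for (size_t end = n; end > 1; end--) {
        uint64_t t = keys[0];
        keys[0] = keys[end - 1];
        keys[end - 1] = t;
        sift_down(keys, 0, end - 1);
    }
}
```

Running `build_max_heap` on the 16 example keys produces the array shown in the "max:" row of the example; `heap_sort` then yields the keys in ascending order.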

19 Example - Radix Sort
1:   [13][ 3][14][15][10][ 2][ 6][ 0][ 8][ 4][12][ 7][ 5][11][ 1][ 9]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass1:
count 1 = 4, count 2 = 4, count 3 = 4, count 4 = 4
index 0 = 0, index 1 = 4, index 2 = 8, index 3 = 12
$index_0 = 0$, $index_n = \sum_{i=1}^{n} count_i$ for $n > 0$

20 Example - Radix Sort
2:   [  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass2:
index 0 = 0, index 1 = 4, index 2 = 8, index 3 = 12
count 0 = 0, count 1 = 0, count 2 = 0, count 3 = 0

21 Example - Radix Sort
2:   [  ][  ][  ][  ][13][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass2: (keys placed so far: 1101)
index 0 = 0, index 1 = 5, index 2 = 8, index 3 = 12
count 0 = 0, count 1 = 0, count 2 = 0, count 3 = 1

22 Example - Radix Sort
2:   [  ][  ][  ][  ][13][  ][  ][  ][  ][  ][  ][  ][ 3][  ][  ][  ]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass2: (keys placed so far: 1101, 0011)
index 0 = 0, index 1 = 5, index 2 = 8, index 3 = 13
count 0 = 1, count 1 = 0, count 2 = 0, count 3 = 1

23 Example - Radix Sort
2:   [  ][  ][  ][  ][13][  ][  ][  ][14][  ][  ][  ][ 3][  ][  ][  ]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass2: (keys placed so far: 1101, 0011, 1110)
index 0 = 0, index 1 = 5, index 2 = 9, index 3 = 13
count 0 = 1, count 1 = 0, count 2 = 0, count 3 = 2

24 Example - Radix Sort
3:   [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Binary: 1101 0011 1110 1111 1010 0010 0110 0000 1000 0100 1100 0111 0101 1011 0001 1001
Pass3:
index 0 = 4, index 1 = 8, index 2 = 12, index 3 = 16
(animation residue: binary keys moving between passes)

25 Radix Sort - MIMD Architecture
Memory: Bank A, Bank B, Bank C, Bank D, Bank E, Bank F
FPGA 1: Radix Sort 1, Radix Sort 2, Radix Sort 3 (33%)
FPGA 2: (5%)
3 Instances
Uses enumeration sort
Radix 13 bits vs. 8 bits
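
The count, index and placement steps of the example can be sketched as a least-significant-digit radix sort in C. This is a minimal sketch under stated assumptions: the function name and the `radix_bits` / `key_bits` parameters are hypothetical, and each pass counts and then scatters the same digit, whereas the example slides appear to count the next digit while placing keys for the current one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sort n keys by their low key_bits bits, radix_bits bits per pass.
 * Returns 0 on success, -1 on bad parameters or allocation failure. */
int radix_sort_u64(uint64_t *keys, size_t n, unsigned radix_bits, unsigned key_bits)
{
    if (radix_bits == 0 || radix_bits > 16 || key_bits == 0 || key_bits > 64)
        return -1;
    if (n == 0)
        return 0;

    size_t buckets = (size_t)1 << radix_bits;
    uint64_t mask = (uint64_t)buckets - 1;
    uint64_t *tmp = malloc(n * sizeof *tmp);
    size_t *index = malloc(buckets * sizeof *index);
    if (tmp == NULL || index == NULL) {
        free(tmp);
        free(index);
        return -1;
    }

    uint64_t *src = keys;
    uint64_t *dst = tmp;
    for (unsigned shift = 0; shift < key_bits; shift += radix_bits) {
        /* Count how many keys fall into each digit value. */
        memset(index, 0, buckets * sizeof *index);
        for (size_t i = 0; i < n; i++)
            index[(src[i] >> shift) & mask]++;

        /* Turn counts into starting indexes: index[d] = sum of counts below d. */
        size_t sum = 0;
        for (size_t d = 0; d < buckets; d++) {
            size_t count = index[d];
            index[d] = sum;
            sum += count;
        }

        /* Place each key at its bucket's next free slot (stable). */
        for (size_t i = 0; i < n; i++)
            dst[index[(src[i] >> shift) & mask]++] = src[i];

        uint64_t *swap = src;
        src = dst;
        dst = swap;
    }

    if (src != keys)
        memcpy(keys, src, n * sizeof *keys);
    free(tmp);
    free(index);
    return 0;
}
```

With `radix_bits = 2` and `key_bits = 4` the first pass starts from the same indexes as the example (0, 4, 8, 12), and key 13 (binary 1101, low digit 01) lands at position 4. For 64-bit keys, `radix_bits = 13` needs 5 passes and `radix_bits = 8` needs 8.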

26 MIMD Code Structure
main.c
    int main( ) {
        int n = 523770*6;
        int64 *buf;
        buf = cacheAlign(n);
        mapSort(buf, n);
        free(buf);
        exit(0);
    }
mapSort.mc
    void mapSort(int64 *buf, int n) {
        OBM_BANK_A (bufA, int64, n/6)
        OBM_BANK_B (bufB, int64, n/6)
        …
        OBM_BANK_F (bufF, int64, n/6)
        DMA_CPU(dir, bufA, stripes, buf, n);
        #pragma src parallel sections
        {
            #pragma src section
            {Xsort(bufA, n/6);}
            #pragma src section
            {Xsort(bufB, n/6);}
            …
            #pragma src section
            {Xsort(bufF, n/6);}
        }
        DMA_CPU(dir, bufA, stripes, buf, n);
        return;
    }

27 Example - Bitonic Sort
Input Keys: [13][ 3][14][15] [10][ 2][ 6][ 0] [ 8][ 4][12][ 7] [ 5][11][ 1][ 9]
In network: 13 3 14 15
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

28 Example - Bitonic Sort
Input Keys: [  ][  ][  ][  ] [10][ 2][ 6][ 0] [ 8][ 4][12][ 7] [ 5][11][ 1][ 9]
In network: 10 2 6 0 3 13 15 14
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

29 Example - Bitonic Sort
Input Keys: [  ][  ][  ][  ] [ 8][ 4][12][ 7] [ 5][11][ 1][ 9]
In network: 5 11 1 9 2 10 6 0 3 15 13 14
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

30 Example - Bitonic Sort
Input Keys: [  ][  ][  ][  ] [ 8][ 4][12][ 7] [  ][  ][  ][  ]
In network: 8 4 12 7 5 11 9 1 3 13 14 15 6 2 10 0
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

31 Example - Bitonic Sort
Input Keys: [ 0][ 2][ 3][ 6] [  ][  ][  ][  ]
In network: 1 12 5 8 7 9 4 11 0 2 3 6 10 13 14 15
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

32 Example - Bitonic Sort
Input Keys: [ 0][ 2][ 3][ 6] [10][13][14][15] [  ][  ][  ][  ]
In network: 1 7 4 5 9 12 8 11 10 13 14 15
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

33 Example - Bitonic Sort
Input Keys: [ 0][ 2][ 3][ 6] [10][13][14][15] [  ][  ][  ][  ] [ 1][ 4][ 5][ 7]
In network: 1 4 5 7 8 9 11 12
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

34 Example - Bitonic Sort
Input Keys: [ 0][ 2][ 3][ 6] [10][13][14][15] [ 8][ 9][11][12] [ 1][ 4][ 5][ 7]
In network: 8 9 11 12
Schedule: (0,1) (3,2) (0,2) (1,3) (0,1) (2,3)
(sorting network diagram: rows 0: to 3: of LH and HL comparators)

35 Bitonic Sort - SIMD Architecture
Memory: Bank A, Bank B, Bank C, Bank D, Bank E, Bank F
FPGA 1: 8 Input Bitonic Sorting Network 1, 4 Input Bitonic Sort 2, SIMD Controller (27%)
FPGA 2: (5%)
2 Instances
Parallel sorting network
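
One way to read the schedule in the example is as a block-level sorting network: each pair (r1, r2) sends two 4-key blocks through an 8-input bitonic network, and the low half returns to block r1 while the high half returns to block r2. The C sketch below follows that reading. It is an assumption-laden illustration, not the authors' VHDL network: the function names, the fixed block size of 4 and the software loop form of the network are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_KEYS 4   /* keys per block, as in the slide example */

/* Compare-exchange: leave the smaller key in *lo and the larger in *hi. */
void compare_exchange(uint64_t *lo, uint64_t *hi)
{
    if (*lo > *hi) {
        uint64_t t = *lo;
        *lo = *hi;
        *hi = t;
    }
}

/* 8-input bitonic sorting network, ascending, written as loops. */
void bitonic_sort8(uint64_t v[8])
{
    for (unsigned k = 2; k <= 8; k <<= 1) {          /* size of sorted runs */
        for (unsigned j = k >> 1; j > 0; j >>= 1) {  /* comparator distance */
            for (unsigned i = 0; i < 8; i++) {
                unsigned p = i ^ j;
                if (p > i) {
                    if ((i & k) == 0)
                        compare_exchange(&v[i], &v[p]);  /* ascending run */
                    else
                        compare_exchange(&v[p], &v[i]);  /* descending run */
                }
            }
        }
    }
}

/* Sort blocks r1 and r2 together; low 4 keys go to r1, high 4 keys to r2. */
void merge_split(uint64_t *keys, unsigned r1, unsigned r2)
{
    uint64_t v[2 * BLOCK_KEYS];
    for (unsigned i = 0; i < BLOCK_KEYS; i++) {
        v[i] = keys[r1 * BLOCK_KEYS + i];
        v[BLOCK_KEYS + i] = keys[r2 * BLOCK_KEYS + i];
    }
    bitonic_sort8(v);
    for (unsigned i = 0; i < BLOCK_KEYS; i++) {
        keys[r1 * BLOCK_KEYS + i] = v[i];
        keys[r2 * BLOCK_KEYS + i] = v[BLOCK_KEYS + i];
    }
}

/* Sort 16 keys (4 blocks) using the block schedule printed on the slides. */
void block_bitonic_sort16(uint64_t keys[16])
{
    static const unsigned schedule[6][2] = {
        {0, 1}, {3, 2}, {0, 2}, {1, 3}, {0, 1}, {2, 3}
    };
    for (size_t s = 0; s < 6; s++)
        merge_split(keys, schedule[s][0], schedule[s][1]);
}
```

After the first two schedule steps, (0,1) and (3,2), the four blocks hold [0 2 3 6], [10 13 14 15], [8 9 11 12] and [1 4 5 7], the state shown on the last example slide; the remaining four steps finish the sort.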

36 Example - Odd/Even Merge
Input Keys:
A: [ 0][ 1][ 2][ 4][ 7][11][12][14]
B: [ 3][ 5][ 6][ 8][ 9][10][13][15]
Merged Keys:
C: [  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, MUX, Z^-1 and Z^-2 delays)

37 Example - Odd/Even Merge
Input Keys:
A: [ 0][ 1][ 2][ 4][ 7][11][12][14]
B: [ 3][ 5][ 6][ 8][ 9][10][13][15]
In datapath: 0 3 1 5
Merged Keys:
C: [  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, Z^-1 and Z^-2 delays)

38 Example - Odd/Even Merge
Input Keys:
A: [  ][  ][ 2][ 4][ 7][11][12][14]
B: [  ][  ][ 6][ 8][ 9][10][13][15]
In datapath: 2 3 4 5; 0 1
Merged Keys:
C: [  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, Z^-1 and Z^-2 delays)

39 Example - Odd/Even Merge
Input Keys:
A: [  ][  ][  ][  ][ 7][11][12][14]
B: [  ][  ][ 6][ 8][ 9][10][13][15]
In datapath: 7 3 11 5; 2 4 1 0
Merged Keys:
C: [  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, Z^-1 and Z^-2 delays)

40 Example - Odd/Even Merge
Input Keys:
A: [  ][  ][  ][  ][  ][  ][12][14]
B: [  ][  ][ 6][ 8][ 9][10][13][15]
In datapath: 7 6 11 8; 3 5 0 4 2 1
Merged Keys:
C: [ 0][ 1][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, Z^-1 and Z^-2 delays)

41 Example - Odd/Even Merge
Input Keys:
A: [  ][  ][  ][  ][  ][  ][12][14]
B: [  ][  ][  ][  ][ 9][10][13][15]
In datapath: 7 9 11 10; 6 8 2 5 4 3
Merged Keys:
C: [ 0][ 1][ 2][ 3][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
(datapath diagram: LH comparators, Z^-1 and Z^-2 delays)

42 Odd/Even Merge - SIMD Architecture
Memory: Bank A, Bank B, Bank C, Bank D, Bank E, Bank F
FPGA 1: Odd Merge Two, Even Merge Two, Merge Out (40%)
FPGA 2: (5%)
1 Instance
Parallel sorting network
A/B = odd ; C/D = even
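
The split into an odd merge, an even merge and a final "Merge Out" stage matches the top level of the textbook (Batcher) odd-even merge, sketched in C below. This is a sequential illustration under stated assumptions: both inputs are sorted and have the same even length n, the function names are hypothetical, and the streaming datapath with delay registers from the example slides is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Merge every second key of sorted a and b, starting at offset start
 * (0 = even positions, 1 = odd positions). Writes n keys to out. */
void merge_strided(const uint64_t *a, const uint64_t *b, size_t n,
                   size_t start, uint64_t *out)
{
    size_t i = start, j = start, k = 0;
    while (i < n && j < n) {
        if (b[j] < a[i]) { out[k++] = b[j]; j += 2; }
        else             { out[k++] = a[i]; i += 2; }
    }
    while (i < n) { out[k++] = a[i]; i += 2; }
    while (j < n) { out[k++] = b[j]; j += 2; }
}

/* Odd-even merge of two sorted n-key arrays into out[0..2n-1].
 * Returns 0 on success, -1 if n is zero or odd, or on allocation failure. */
int odd_even_merge(const uint64_t *a, const uint64_t *b, size_t n, uint64_t *out)
{
    if (n == 0 || n % 2 != 0)
        return -1;
    uint64_t *even = malloc(n * sizeof *even);
    uint64_t *odd = malloc(n * sizeof *odd);
    if (even == NULL || odd == NULL) {
        free(even);
        free(odd);
        return -1;
    }

    merge_strided(a, b, n, 0, even);   /* "Even Merge" of positions 0, 2, 4, ... */
    merge_strided(a, b, n, 1, odd);    /* "Odd Merge" of positions 1, 3, 5, ... */

    /* Final stage: interleave, with one compare-exchange per adjacent pair. */
    out[0] = even[0];
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t lo = odd[i];
        uint64_t hi = even[i + 1];
        if (lo > hi) {
            uint64_t t = lo;
            lo = hi;
            hi = t;
        }
        out[2 * i + 1] = lo;
        out[2 * i + 2] = hi;
    }
    out[2 * n - 1] = odd[n - 1];

    free(even);
    free(odd);
    return 0;
}
```

For the example inputs A and B, the even merge gives 0 2 3 6 7 9 12 13, the odd merge gives 1 4 5 8 10 11 14 15, and the final stage emits the keys 0 to 15 in order.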

43 SIMD Code Structure
main.c
    int main( ) {
        int n = 523770*6;
        int64 *buf;
        buf = cacheAlign(n);
        mapSort(buf, n);
        free(buf);
        exit(0);
    }
mapSort.mc
    void mapSort(int64 *buf, int n) {
        OBM_BANK_A (AA, int64, n/6)
        OBM_BANK_B (BB, int64, n/6)
        …
        OBM_BANK_F (FF, int64, n/6)
        DMA_CPU(dir, AA, stripes, buf, n);
        for (i=0; i<rounds; i++) {
            schedule( &r1, &r2);
            bitonicSort8(AA[r1],BB[r1],CC[r1],DD[r1],
                         AA[r2],BB[r2],CC[r2],DD[r2],
                         &AA[r1],&BB[r1],&CC[r1],&DD[r1],
                         &AA[r2],&BB[r2],&CC[r2],&DD[r2]);
            bitonicSort4(EE[r1],FF[r1],EE[r2],FF[r2], … );
        }
        DMA_CPU(dir, bufA, stripes, buf, n);
        return;
    }

44 Implementation Comparisons
Algorithm     Processor  Complexity  Language  Lines Of Code  Recursion  FPGA Util. % Slices  Upper Bound x10^6 keys/s
Quick Sort    X86        N lg N      C         81
              FPGA       N lg N      MC        97/96          n/a        90, 84               31.58
Heap Sort     X86        N lg N      C         55             -
              FPGA       N lg N      MC        56/54          n/a        55, 0                31.58
Radix Sort    X86        N           C         70             -
              FPGA       N           MC        81/64          n/a        33, 0                60.00
Bitonic Sort  X86        N lg^2 N    C         78
              FPGA       lg^2 N      VHDL      53/478/365     n/a        27, 0                6.32
O/E Merge     X86        N           C         52             -
              FPGA       N           MC        71/120         n/a        40, 0                60.87
(The Compiler, MIMD, SIMD and Refactoring columns used icons that are not in the transcript.)
Compiler legend: icc v8.0 -fast; mcc v1.8; mcc v1.9
Refactoring legend: entirely; major changes; some; very little; almost none
X86 = Dual Xeon 2.8GHz
FPGA = Virtex2 XC6000 @ 100MHz
MC = MAP C

45 Lesson Learned #1
Compiler                    Quick Sort  Heap Sort  Radix Sort  Bitonic Sort  O/E Merge
2.8 GHz Xeon, x10^6 keys/s
  gcc                       1.99        0.50       1.63        -             -
  icc -fast                 5.66        1.06       4.72        -             -
FPGA upper bound estimate, x10^6 keys/s
                            31.58       31.58      60.00       6.32          60.87
Upper bound on speedup
  vs gcc                    15.87       63.16      36.81       -             -
  vs icc                    5.58        29.79      12.71       -             -
Know your tools
Develop accurate assessments early
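
The speedup rows are consistent with dividing the FPGA upper-bound estimate by the measured processor rate, both in x10^6 keys/s. A minimal C sketch of that arithmetic follows; the function name is hypothetical, and the formula is inferred from the numbers on the slide rather than stated by the authors.

```c
/* Upper bound on speedup: estimated FPGA rate divided by the measured
 * processor rate. Both arguments must use the same unit. */
double speedup_bound(double fpga_rate, double cpu_rate)
{
    return fpga_rate / cpu_rate;
}
```

For Quick Sort against gcc, `speedup_bound(31.58, 1.99)` gives about 15.87, and against icc -fast, `speedup_bound(31.58, 5.66)` gives about 5.58, matching the slide. The choice of compiler alone moves the bound by almost a factor of three.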

46 Test Conditions
64 bit unsigned integer keys
Uniformly distributed
Randomly permuted
Scores average of 10 runs
FPGA configuration time ~65ms
DMA time ~18ms
Typical key quantity 3.14M
Processor comparison: Xeon 2.8GHz, 1GB mem

47 Experimental Results - 64 bit keys
(chart: x10^6 keys/s for each of the Sorting Algorithms)

48 mcc Compiler
Attempts to pipeline inner loops
–Maintains sequential behavior of C
–Reports dependencies/penalties
Quick Sort: 1 penalty*
Heap Sort: 12 penalties
Radix Sort: 2 penalties
Bitonic Sort: 5 penalties
Odd/Even Merge: 1 penalty
Easy to build embarrassingly parallel code
Resource usage ~2x HDL

49 Conclusion
FPGAs not best choice for sorting
Sorting is memory bound
–Tight loops, low computation suited to processor
–More parallel memory accesses
–Faster clock rates
Refactoring for better performance
–FPGAs underutilized
–Understand compiler limitations
–Eliminate dependencies

50 Tight Loop Example
Merge
    a[N]=b[N]=infinity;
    j=k=0;
    Loop i = 0 to 2N-1 {
        if (a[j] > b[k]) merged[i] = b[k++];
        else merged[i] = a[j++];
    }
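
A runnable C version of this loop might look like the sketch below. It assumes each input array has room for N+1 keys and that no real key equals `UINT64_MAX`, which stands in for "infinity"; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_INFINITY UINT64_MAX   /* sentinel; assumes no real key has this value */

/* Merge sorted a[0..n-1] and b[0..n-1] into merged[0..2n-1].
 * a and b must each have capacity for n+1 keys: slot n holds the sentinel. */
void merge_with_sentinels(uint64_t *a, uint64_t *b, size_t n, uint64_t *merged)
{
    size_t j = 0, k = 0;
    a[n] = KEY_INFINITY;
    b[n] = KEY_INFINITY;
    for (size_t i = 0; i < 2 * n; i++) {
        if (a[j] > b[k])
            merged[i] = b[k++];
        else
            merged[i] = a[j++];
    }
}
```

The sentinels remove the end-of-array checks from the loop body, leaving one compare, one copy and one index increment per output key.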

51 Future Work
More refactoring
–Greater use of block rams
–HW prediction to reduce penalties
FPGA performance gain = ƒ(computation density/memory access)

