Parallelization of mrFAST on GPGPU Hongyi Xin, Donghyuk Lee Milestone II.

1 Parallelization of mrFAST on GPGPU Hongyi Xin, Donghyuk Lee Milestone II

2 Original Algorithm - mrFAST
Goal: Find the matching coordinates of each fragment on the reference.
Algorithm
- Create the hash table
- Get the coordinate list
- Compare against the reference at each coordinate by edit-distance calculation (expensive!)
Problem
- High cost of edit-distance calculation (high complexity and many memory accesses)
- 1 memory access to the hash table and, on average, 188 reference DNA lookups
- At least 108 character compares and at least 324 adds
- On average, 188 edit-distance calculations for each fragment!
[Figure: a sample fragment sequence is looked up in a hash table of keys (AAAA, AAAC, ..., TTTT), each holding a coordinate list (Coordinate 1, Coordinate 2, Coordinate 3); each coordinate, e.g. address 1225, points into the reference DNA sequence.]
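The lookup steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the mrFAST implementation: the 4-letter key length is taken from the keys drawn in the figure, and the toy reference string and the names `build_hash_table` and `candidate_coordinates` are assumptions made for the example.

```python
from collections import defaultdict

KEY_LEN = 4  # assumed key length, matching the 4-letter keys in the slide figure


def build_hash_table(reference, key_len=KEY_LEN):
    """Map every key_len-letter substring of the reference to its start coordinates."""
    table = defaultdict(list)
    for pos in range(len(reference) - key_len + 1):
        table[reference[pos:pos + key_len]].append(pos)  # appended in increasing order
    return table


def candidate_coordinates(fragment, table, key_len=KEY_LEN):
    """Look up the fragment's first key; every hit is a candidate start coordinate."""
    return table.get(fragment[:key_len], [])


# Toy reference (hypothetical data, not taken from the slides).
reference = "CAGTACCCTTACACTAAAAAGTTTCCAAACC"
table = build_hash_table(reference)
candidates = candidate_coordinates("ACCCTTACACTAAAAA", table)
print(candidates)
```

Every coordinate returned here would then be verified with an edit-distance calculation against the reference, which is the expensive step the slide points at.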

3 Edit-Distance Calculation
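The slide body did not survive in the transcript. As a stand-in, here is the textbook dynamic program for edit distance (Levenshtein distance); mrFAST's own routine may differ, for example by bounding the number of errors. The function name `edit_distance` is chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with the standard dynamic program."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1          # one character compare per cell
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(edit_distance("ACCCTTAC", "ACCGTTAC"))
```

Each cell of this table costs one character compare and three additions, which is at least consistent with the 1:3 ratio between the 108 compares and 324 adds quoted on the Original Algorithm slide.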

4 New Idea: Binary Search Filtering
Insight
- Search for the expected coordinate of each of the fragment's substrings in the hash table.
Pros
+ Avoids accessing the reference sequence.
+ Fewer memory accesses.
[Figure: the individual DNA sequence ACCCTTACACTAAAAA is split into the keys ACCC, TTAC, ACTA and AAAA. Their coordinate lists contain the entries m, m+4, m+8 and m+12 among unrelated entries (d, n, n+4, n+7, n+11, p, t, f), and these line up with positions m, m+4, m+8 and m+12 on the reference DNA sequence …CAGTACCCTTACACTAAAAAGTMTTCCAAACC…]
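A rough sketch of the filter described on this slide, under stated assumptions: keys are non-overlapping 4-letter substrings as in the figure, coordinate lists are sorted, and a candidate passes when at least `min_hits` keys are found at their expected coordinate. The `min_hits` threshold, the toy reference and all function names are hypothetical; the slides do not give the exact pass criterion.

```python
from bisect import bisect_left
from collections import defaultdict

KEY_LEN = 4  # assumed key length, as in the slide figure


def build_hash_table(reference, key_len=KEY_LEN):
    """Map each key to its sorted list of start coordinates in the reference."""
    table = defaultdict(list)
    for pos in range(len(reference) - key_len + 1):
        table[reference[pos:pos + key_len]].append(pos)
    return table


def contains(sorted_coords, value):
    """Binary search for value in a sorted coordinate list."""
    i = bisect_left(sorted_coords, value)
    return i < len(sorted_coords) and sorted_coords[i] == value


def passes_filter(fragment, start, table, min_hits, key_len=KEY_LEN):
    """Count keys found at their expected coordinate (start + offset) without
    touching the reference sequence itself."""
    hits = 0
    for offset in range(0, len(fragment) - key_len + 1, key_len):
        key = fragment[offset:offset + key_len]
        if contains(table.get(key, []), start + offset):
            hits += 1
    return hits >= min_hits


# Toy reference (hypothetical data): the fragment occurs at coordinate m = 4.
table = build_hash_table("CAGTACCCTTACACTAAAAAGTTTCCAAACC")
print(passes_filter("ACCCTTACACTAAAAA", 4, table, min_hits=4))
```

Only candidates that pass this check would go on to the full edit-distance calculation.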

5 Load Imbalance of the Hash Table
[Figure annotation: These keys have really large entries.]

6 New Idea: Prefiltering for Load Balancing
Insight
- In binary search filtering, pick the cheap keys, i.e. those with small coordinate lists.
Pros
+ Reduces the number of binary searches.
+ Balances the computation load of binary search.
[Figure: individual DNA sequence AAAATTACACTAAAAA. Balance the load of the binary search computation by selecting keys based on coordinate list size.]
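One way to read this key selection, sketched under assumptions: the fragment is cut into non-overlapping 4-letter keys and the `num_keys` keys with the shortest coordinate lists are kept. The function name `cheapest_keys`, the `num_keys` parameter and the toy table sizes are all hypothetical.

```python
def cheapest_keys(fragment, table, num_keys, key_len=4):
    """Return (offset, key) pairs for the keys with the smallest coordinate lists."""
    sized = []
    for offset in range(0, len(fragment) - key_len + 1, key_len):
        key = fragment[offset:offset + key_len]
        # A key missing from the table has an empty list, so it sorts first.
        sized.append((len(table.get(key, [])), offset, key))
    sized.sort()  # smallest coordinate list first, ties broken by offset
    return [(offset, key) for _, offset, key in sized[:num_keys]]


# Toy table (hypothetical sizes): AAAA is a heavily repeated key, the others are rare.
toy_table = {"AAAA": list(range(1000)), "TTAC": [8, 90], "ACTA": [12]}
selected = cheapest_keys("AAAATTACACTAAAAA", toy_table, num_keys=2)
print(selected)
```

For the fragment in the figure, this skips both AAAA keys and searches only the two short lists.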

7 Effectiveness of Binary Search Filtering
We want all dots to fall into the left box. As far left as possible!

8 Effectiveness of Binary Search Filtering

9 Future Work
- Implement on GPU
- Analyze the load imbalance problem
  - Coordinates that pass binary search filtering may vary
- Solve the divergence problem
  - Edit distance may diverge
  - Divergence is bad for the GPU SIMT model
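To illustrate why uneven work hurts under SIMT, here is a deliberately simplified model, not a measurement: it assumes all lanes of a warp run in lockstep until the slowest lane finishes, so the warp's cost is set by its largest per-lane workload. The function name `warp_utilization` and the workload numbers are invented for the example.

```python
def warp_utilization(work_per_lane):
    """Fraction of lane-steps doing useful work when all lanes wait for the slowest one."""
    longest = max(work_per_lane)
    if longest == 0:
        return 1.0  # no work at all, so nothing is wasted
    return sum(work_per_lane) / (longest * len(work_per_lane))


# Hypothetical per-lane counts of candidates that survived filtering.
balanced = warp_utilization([4, 4, 4, 4])
skewed = warp_utilization([1, 1, 1, 9])
print(balanced, skewed)
```

In this model the skewed case keeps the lanes busy only a third of the time, which is the kind of loss the load-balancing and divergence items above aim to reduce.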

10 Q&A Thank you!

