Published by Daisy Chase. Modified over 8 years ago.
1
Parallelization of mrFAST on GPGPU
Hongyi Xin, Donghyuk Lee
Milestone II
2
Original Algorithm - mrFAST
Goal: Find the matching coordinates of each fragment on the reference.
Algorithm:
Create the hash table.
Get the coordinate list.
Compare the fragment against the reference at each coordinate by edit-distance calculation. Expensive!
Problem: High cost of edit-distance calculation (high complexity and many memory accesses).
1 memory access to the hash table.
188 reference DNA lookups on average.
At least 108 character compares and at least 324 additions.
An average of 188 edit-distance calculations for each fragment!
[Figure: a sample fragment sequence is looked up in the hash table (keys AAAA, AAAC, ..., TTTT); the matching entry holds a coordinate list (Coordinate 1, Coordinate 2, Coordinate 3) that points into the reference DNA sequence, e.g. address 1225.]
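The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not mrFAST's actual code: the function names, the 4-letter key length (taken from the keys drawn on the slide), the `max_errors` threshold, and the toy reference string are all assumptions made for the example.

```python
from collections import defaultdict

K = 4  # key length; assumed from the 4-letter keys shown on the slide


def build_hash_table(reference, k=K):
    """Step 1: map every k-mer of the reference to its sorted start coordinates."""
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)
    return table


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def map_fragment(fragment, reference, table, max_errors=1, k=K):
    """Steps 2 and 3: fetch the coordinate list, then verify every coordinate."""
    hits = []
    for coord in table.get(fragment[:k], []):
        window = reference[coord:coord + len(fragment)]
        # Every candidate costs one reference read plus one full edit-distance run.
        if len(window) == len(fragment) and edit_distance(fragment, window) <= max_errors:
            hits.append(coord)
    return hits


reference = "CAGTACCCTTACACTAAAAAGTATTCCAAACC"  # toy reference, made up for this example
table = build_hash_table(reference)
print(map_fragment("ACCCTTACACTAAAAA", reference, table))  # prints [4]
```

The loop in `map_fragment` is where the cost described on the slide arises: its body runs once per entry in the coordinate list, and each iteration reads the reference and performs a complete edit-distance calculation.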
3
Edit-Distance Calculation
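The slide's figure is not preserved in this transcript. As a stand-in, here is a hedged sketch of a banded edit-distance check, a common way to verify a candidate when only a small number of errors is allowed. It is a generic textbook technique and not necessarily what mrFAST implements; the function name and the `max_errors` parameter are assumptions.

```python
def banded_edit_distance(a, b, max_errors):
    """Return the edit distance of a and b if it is <= max_errors, else None.

    Only cells with |i - j| <= max_errors are computed, because any alignment
    that leaves this band already needs more than max_errors edits.
    """
    if abs(len(a) - len(b)) > max_errors:
        return None
    too_far = max_errors + 1  # stands for "above the threshold"
    prev = [j if j <= max_errors else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [too_far] * (len(b) + 1)
        if i <= max_errors:
            cur[0] = i
        lo = max(1, i - max_errors)
        hi = min(len(b), i + max_errors)
        for j in range(lo, hi + 1):
            best = min(prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                       prev[j] + 1,                           # deletion
                       cur[j - 1] + 1)                        # insertion
            cur[j] = min(best, too_far)  # cap so out-of-band cells stay "too far"
        prev = cur
    return prev[len(b)] if prev[len(b)] <= max_errors else None


print(banded_edit_distance("ACGT", "AGT", 1))         # prints 1
print(banded_edit_distance("kitten", "sitting", 2))   # prints None
```

Even with the band, every cell still costs one character compare and a few additions, and the work is repeated for each candidate coordinate, which is why reducing the number of candidates matters.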
4
New Idea: Binary Search Filtering
Insight: Search the hash table for the expected coordinate of each of the fragment's substrings.
Pros:
+ Avoids accessing the reference sequence.
+ Fewer memory accesses.
[Figure: the individual DNA sequence ACCCTTACACTAAAAA is split into the keys ACCC, TTAC, ACTA and AAAA. Each key's hash-table entry holds a coordinate list (entries such as m, d, n, p, t, f, m+4, m+8, n+4, n+7, n+11, m+12). The entries m, m+4, m+8 and m+12 line up with consecutive positions in the reference DNA sequence …CAGTACCCTTACACTAAAAAGTMTTCCAAACC….]
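A minimal sketch of the filtering idea, under stated assumptions: keys are non-overlapping 4-mers, coordinate lists are kept sorted, and a candidate start m survives only if enough of the later keys contain m+4, m+8, m+12 in their lists. The function names, the `min_matching_keys` threshold, and the toy reference are hypothetical; the sketch also ignores insertions and deletions, which would shift the expected offsets.

```python
from bisect import bisect_left
from collections import defaultdict


def build_hash_table(reference, k=4):
    """Map every k-mer to its start coordinates (sorted, since we scan left to right)."""
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)
    return table


def contains(sorted_coords, target):
    """Binary search for target in a sorted coordinate list."""
    i = bisect_left(sorted_coords, target)
    return i < len(sorted_coords) and sorted_coords[i] == target


def filter_candidates(fragment, table, k=4, min_matching_keys=None):
    """Keep only the starts whose expected key coordinates appear in the hash table."""
    keys = [(off, fragment[off:off + k]) for off in range(0, len(fragment) - k + 1, k)]
    if min_matching_keys is None:
        min_matching_keys = len(keys)  # strictest setting: every key must agree
    _, first_key = keys[0]             # the first key sits at offset 0
    survivors = []
    for start in table.get(first_key, []):
        # The reference itself is never read here, only the coordinate lists.
        matches = 1 + sum(contains(table.get(key, []), start + off)
                          for off, key in keys[1:])
        if matches >= min_matching_keys:
            survivors.append(start)
    return survivors


# Toy reference, made up for this example: ACCC occurs at 0 and at 12.
reference = "ACCCGGGGCAGTACCCTTACACTAAAAAGTATTCCAAACC"
table = build_hash_table(reference)
print(filter_candidates("ACCCTTACACTAAAAA", table))  # prints [12]
```

In this toy run the candidate at coordinate 0 is discarded by three binary searches, so only coordinate 12 would go on to the edit-distance step.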
5
Load Imbalance of the Hash Table
These keys have really large entries.
6
New Idea: Prefiltering for Load Balancing
Insight: In binary search filtering, pick the cheap keys, i.e. the keys with a small coordinate list.
Pros:
+ Reduces the number of binary searches.
+ Balances the computational load of binary search.
[Figure: individual DNA sequence AAAATTACACTAAAAA.]
Balance the load of the binary search computation by selecting keys based on their coordinate list size.
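One way to read the prefiltering idea is sketched below: among the fragment's non-overlapping keys, start from the one with the shortest coordinate list, so that fewer candidate starts enter binary search filtering. The function names and the toy hash table with its coordinate values are made up for illustration.

```python
def pick_seed_key(fragment, table, k=4):
    """Return (offset, key, list_size) of the key with the shortest coordinate list."""
    candidates = []
    for off in range(0, len(fragment) - k + 1, k):
        key = fragment[off:off + k]
        candidates.append((len(table.get(key, [])), off, key))
    # A key missing from the table has size 0 and yields no candidates; a mapper
    # that tolerates errors would have to fall back to the next-cheapest key.
    size, off, key = min(candidates)
    return off, key, size


def candidate_starts(fragment, table, k=4):
    """Convert the seed key's coordinates into candidate fragment start positions."""
    off, key, _ = pick_seed_key(fragment, table, k)
    return [coord - off for coord in table.get(key, []) if coord - off >= 0]


# Hypothetical hash table: AAAA is a heavily repeated key, TTAC and ACTA are rare.
table = {
    "AAAA": list(range(0, 1000, 5)),  # 200 coordinates
    "TTAC": [104, 520],
    "ACTA": [108, 300, 770],
}
fragment = "AAAATTACACTAAAAA"
print(pick_seed_key(fragment, table))     # prints (4, 'TTAC', 2)
print(candidate_starts(fragment, table))  # prints [100, 516]
```

Seeding from AAAA would produce 200 candidate starts, each needing binary searches in the other lists; seeding from TTAC produces 2.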
7
Effectiveness of Binary Search Filtering
We want all dots to fall into the left box. As far left as possible!
8
Effectiveness of Binary Search Filtering
9
Future Work
Implement on the GPU.
Analyze the load imbalance problem: the number of coordinates that pass binary search filtering may vary.
Solve the divergence problem: the edit-distance calculation may diverge, and divergence is bad for the GPU SIMT model.
10
Q&A
Thank you!