String Searching In Parallel By Sowmya Padmanabhan Final Term Project Presentation for Parallel Processing Dr. Charles Fulton.


1 String Searching In Parallel By Sowmya Padmanabhan Final Term Project Presentation for Parallel Processing Dr. Charles Fulton

2 One way to parallelize is: Consider a huge text document (something like an encyclopedia available electronically) that you want to search for several words, phrases, or sentences at the same time. We call each item being searched for a “search_string”. Rather than having one processor look for all the search_strings in the document, we can take advantage of parallel processing and have 10 different processors look for 10 different search_strings simultaneously, so the whole search finishes much sooner.

3 One way to parallelize is: My first program accomplishes this objective. The document in which I search for the search_strings is an actual document, a collection of William Shakespeare’s works downloaded from an online resource, and consists of approximately 400 million characters. My program is capable of handling up to 450 million characters.
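As a rough sketch of this first approach (not the original program), the C code below models the division of work: one search_string per processor, each searching the whole document. The names `count_occurrences` and `search_one_pattern_per_rank` are hypothetical, and the loop over `rank` stands in for the processors; in an MPI program each rank would run a single iteration on its own copy of the document.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences (overlapping ones included) of pattern in text. */
static int count_occurrences(const char *text, const char *pattern) {
    size_t n = strlen(text), m = strlen(pattern);
    int count = 0;
    if (m == 0 || m > n) return 0;   /* assumption: empty pattern counts as 0 */
    for (size_t i = 0; i + m <= n; i++) {
        if (memcmp(text + i, pattern, m) == 0) count++;
    }
    return count;
}

/*
 * Hypothetical driver: counts[r] receives the result for patterns[r].
 * Each iteration is the work one processor (rank r) would do. The
 * iterations share no state, so they can run simultaneously.
 */
void search_one_pattern_per_rank(const char *document,
                                 const char *const patterns[],
                                 int num_ranks, int counts[]) {
    for (int rank = 0; rank < num_ranks; rank++) {
        counts[rank] = count_occurrences(document, patterns[rank]);
    }
}
```

Because every rank scans the full document, the speedup of this scheme comes from the number of search_strings, not from the document size.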

4 Second Way to Parallelize. Think of this scenario: I have to search the huge electronic document (again, imagine an encyclopedia) for just one word, phrase, or sentence at a time. How do I take advantage of parallel processing? Simple! Divide the whole document into as many equal parts as there are processors. Let’s call these “sub-documents” and allot each sub-document to one processor. Now, what do we do with these sub-documents?

5 Second Way to Parallelize. Yes, you are right! Have each processor search for the search_string only in the sub-document it has been allotted. Sounds great! So, how do I code it? Using MPI_Scatter, of course! Note: This program works with 10 or more processors; with fewer processors, the buffer used by the MPI_Scatter command is exceeded.
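The splitting scheme can be sketched in plain C as follows. This is an illustration under stated assumptions, not the original MPI program: `count_in_span` and `count_by_sub_documents` are hypothetical names, and the loop over `rank` stands in for what MPI_Scatter (distribution) and a reduction (summing the partial counts) would do. MPI_Scatter sends the same number of elements to every rank, so a real program would pad the document or use MPI_Scatterv when the length is not a multiple of the processor count; here the last sub-document is simply allowed to be shorter.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of pattern that lie entirely inside text[0..len). */
static int count_in_span(const char *text, size_t len, const char *pattern) {
    size_t m = strlen(pattern);
    int count = 0;
    if (m == 0 || m > len) return 0;
    for (size_t i = 0; i + m <= len; i++) {
        if (memcmp(text + i, pattern, m) == 0) count++;
    }
    return count;
}

/*
 * Hypothetical helper: split the document into num_ranks sub-documents,
 * search each one independently, and add up the partial counts.
 */
int count_by_sub_documents(const char *document, const char *pattern,
                           int num_ranks) {
    if (num_ranks <= 0) return 0;
    size_t doc_len = strlen(document);
    /* Round up so that num_ranks chunks cover the whole document. */
    size_t chunk = (doc_len + (size_t)num_ranks - 1) / (size_t)num_ranks;
    int total = 0;
    for (int rank = 0; rank < num_ranks; rank++) {
        size_t start = (size_t)rank * chunk;
        if (start >= doc_len) break;
        size_t len = doc_len - start < chunk ? doc_len - start : chunk;
        /* Work of one processor: search only its own sub-document. */
        total += count_in_span(document + start, len, pattern);
    }
    return total;
}
```

For example, `count_by_sub_documents("abcabcabcabc", "abc", 4)` gives each of four ranks the sub-document "abc" and returns 4. With 3 ranks the sub-documents are "abca", "bcab", and "cabc", and only 2 occurrences are found, because this simple split does not see matches that cross a sub-document boundary.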

6 Comparison of Times. See Table of Comparisons.

7 Algorithm for String Searching

    #include <string.h>

    /* Count occurrences (overlapping ones included) of search_string in string. */
    int string_searching_algo (char *string, char *search_string) {
        int i, j, k;
        int count = 0, occurences = 0;
        const int len_search_string = strlen ( search_string );
        const int len_given_string = strlen ( string );

        /* Try every starting position where search_string could still fit. */
        for (i = 0; i <= (len_given_string - len_search_string); i++ ) {
            count = 0;
            /* Compare character by character; stop at the first mismatch. */
            for (j = i, k = 0; k < len_search_string; j++, k++) {
                if ( *(string + j) != *(search_string + k) ) {
                    break;
                } else {
                    count++;
                }
            }
            /* Every character matched: record one occurrence. */
            if ( count == len_search_string ) {
                occurences++;
            }
        }
        return occurences;
    }
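For comparison, the same count can be obtained with the C standard library function `strstr` instead of the hand-written inner loop. This is a swapped-in technique, not the algorithm from the slide; the name `count_with_strstr` is hypothetical, and returning 0 for an empty search_string is a choice made here.

```c
#include <string.h>

/* Count occurrences (overlapping ones included) using strstr. */
int count_with_strstr(const char *string, const char *search_string) {
    int occurrences = 0;
    const char *p = string;
    if (*search_string == '\0') return 0;  /* assumption: empty pattern counts as 0 */
    while ((p = strstr(p, search_string)) != NULL) {
        occurrences++;
        p++;  /* advance one character so overlapping matches are also found */
    }
    return occurrences;
}
```

Advancing by a single character after each hit mirrors the slide's loop, which restarts the comparison at every position: searching "aaaa" for "aa" reports 3 occurrences in both versions.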

8 Conclusion. String searching done in parallel saves a lot of time compared with single-processor searching, especially when the document is extremely large. One way to parallelize is to have several processors search for different strings in one document in parallel; the second way is to have several processors search for the same string in different portions (sub-documents) of the same document in parallel.

9 One Problem however… The second program, which uses MPI_Scatter, has one drawback: when an occurrence of the search_string straddles two sub-documents (one portion lies at the end of one sub-document and the rest at the beginning of the next sub-document, held by another processor), neither processor sees the whole occurrence, so it is not counted and the program gives an incorrect result.
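One common remedy, sketched here as an assumption and not something the original program does, is to let each processor also see the first m - 1 characters of the next sub-document, where m is the length of the search_string. Every occurrence is then found by exactly one processor: the one in whose sub-document it starts. The function name `count_with_overlap` is hypothetical, and the loop over `rank` again stands in for the processors.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split the document into num_ranks sub-documents,
 * but let each rank read m - 1 characters past the end of its own part.
 */
int count_with_overlap(const char *document, const char *pattern,
                       int num_ranks) {
    size_t doc_len = strlen(document), m = strlen(pattern);
    if (num_ranks <= 0 || m == 0 || m > doc_len) return 0;
    /* Round up so that num_ranks chunks cover the whole document. */
    size_t chunk = (doc_len + (size_t)num_ranks - 1) / (size_t)num_ranks;
    int total = 0;
    for (int rank = 0; rank < num_ranks; rank++) {
        size_t start = (size_t)rank * chunk;
        if (start >= doc_len) break;
        /* Own sub-document plus the first m - 1 characters of the next. */
        size_t end = start + chunk + m - 1;
        if (end > doc_len) end = doc_len;
        /* i + m <= end means i < start + chunk (or the match would run
           past the document), so each match starts in this rank's own
           part and is counted once. */
        for (size_t i = start; i + m <= end; i++) {
            if (memcmp(document + i, pattern, m) == 0) total++;
        }
    }
    return total;
}
```

With the document "abcabcabcabc", the search_string "abc", and 3 ranks, a split without overlap finds only 2 occurrences, while this version finds all 4. In an MPI program the extra characters would have to be sent along with each sub-document, for example with overlapping send ranges or a separate exchange between neighbouring ranks.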

