String Searching In Parallel By Sowmya Padmanabhan Final Term Project Presentation for Parallel Processing Dr. Charles Fulton.

One Way to Parallelize
Consider a huge text document (something like an encyclopedia available electronically) that you want to search for several words, phrases, or sentences at the same time.
We call each item being searched for a "search_string".
Rather than having one processor look for all the search_strings in the document, we can take advantage of parallel processing and have, say, 10 processors look for 10 different search_strings simultaneously, so the searches run concurrently instead of one after another.

One Way to Parallelize
My first program accomplishes this objective.
The document being searched is a real one: a collection of William Shakespeare's works, downloaded from an online resource, consisting of approximately 400 million characters.
The program can handle up to 450 million characters.
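The division of work in this first approach can be sketched in plain C. This is a minimal illustration, not the author's program: the names `count_occurrences` and `search_assigned_strings` are hypothetical, and the `rank` and `n_procs` arguments stand in for the values an MPI process would obtain from `MPI_Comm_rank` and `MPI_Comm_size`. It assumes every process holds the whole document and that search strings are dealt out round-robin, so the sketch also covers the case of more search strings than processors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count (possibly overlapping) occurrences of pattern in text.
   Assumption: an empty pattern is treated as having no occurrences. */
int count_occurrences(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    if (pattern[0] == '\0') {
        return 0;
    }
    int occurrences = 0;
    /* After each hit, resume one character further on so that
       overlapping matches are also counted. */
    for (const char *p = strstr(text, pattern); p != NULL;
         p = strstr(p + 1, pattern)) {
        occurrences++;
    }
    return occurrences;
}

/* Work done by one process: search the whole text for every search string
   assigned to this rank (indices rank, rank + n_procs, rank + 2*n_procs, ...).
   Only the entries of counts[] owned by this rank are written. */
void search_assigned_strings(const char *text,
                             const char *const search_strings[], int n_strings,
                             int rank, int n_procs, int counts[])
{
    assert(n_procs > 0 && rank >= 0 && rank < n_procs);
    for (int i = rank; i < n_strings; i += n_procs) {
        counts[i] = count_occurrences(text, search_strings[i]);
    }
}
```

With 10 search strings and 10 processes, each rank handles exactly one string, which matches the scenario described above. In a real MPI program the per-rank counts would still have to be collected on one process, for example with a gather operation.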

Second Way to Parallelize
Think of this scenario: I have to search the same huge electronic document (again, imagine an encyclopedia) for just one word, phrase, or sentence at a time.
How do I take advantage of parallel processing?
Simple! Divide the whole document into as many equal parts as there are processors. Call these parts "sub-documents" and allot one sub-document to each processor.
Now, what do we do with these sub-documents?

Second Way to Parallelize
Yes, you are right! Have each processor search for the search_string only in the sub-document it has been allotted.
Sounds great! So, how do I code it? Using MPI_Scatter, of course!
Note: This program works with 10 or more processors. With fewer processors, the buffer used by the MPI_Scatter call is exceeded.
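MPI_Scatter hands every process a piece of the same size, so a document whose length is not a multiple of the number of processors needs either padding or uneven pieces (which MPI_Scatterv supports). The arithmetic for the uneven case might look like the sketch below. The function name `chunk_bounds` is hypothetical and this is not taken from the author's program; it only shows one common way to compute each sub-document's offset and length.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the half-open range [*start, *start + *len) of the document
   that belongs to the given rank. The first (total_len % n_procs) ranks
   receive one extra character so that all characters are covered. */
void chunk_bounds(size_t total_len, int n_procs, int rank,
                  size_t *start, size_t *len)
{
    assert(n_procs > 0 && rank >= 0 && rank < n_procs);
    assert(start != NULL && len != NULL);

    size_t base  = total_len / (size_t)n_procs;  /* minimum chunk size      */
    size_t extra = total_len % (size_t)n_procs;  /* chunks that get one more */
    size_t r     = (size_t)rank;

    *len   = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}
```

For a 10-character document split over 3 processes this gives sub-documents of 4, 3, and 3 characters starting at offsets 0, 4, and 7. The same lengths and offsets could be used to fill the count and displacement arrays of an MPI_Scatterv call, after converting them to `int`.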

Comparison of Times
See the Table of Comparisons.

Algorithm for String Searching

#include <string.h>

/* Returns the number of (possibly overlapping) occurrences of
   search_string in string, using a brute-force comparison. */
int string_searching_algo (const char *string, const char *search_string) {
    int i, j, k;
    int count = 0, occurrences = 0;
    const int len_search_string = (int) strlen ( search_string );
    const int len_given_string = (int) strlen ( string );

    /* Try every position at which a full match could still fit. */
    for (i = 0; i <= (len_given_string - len_search_string); i++ ) {
        count = 0;
        /* Compare character by character, stopping at the first mismatch. */
        for (j = i, k = 0; k < len_search_string; j++, k++) {
            if ( *(string + j) != *(search_string + k) ) {
                break;
            } else {
                count++;
            }
        }
        /* All characters matched: one more occurrence found. */
        if ( count == len_search_string ) {
            occurrences++;
        }
    }
    return occurrences;
}

Conclusion
String searching done in parallel saves considerable time compared with single-processor searching, especially when the document being searched is very large.
One way to parallelize is to have several processors search for different strings in the same document in parallel. A second way is to have several processors search for the same string in different portions (sub-documents) of the same document in parallel.

One Problem However…
The second program, which uses MPI_Scatter, has one drawback: when an occurrence of the search_string straddles two sub-documents (one part of it lies at the end of one sub-document and the rest at the beginning of the next sub-document, held by another processor), the program does not find that occurrence, so the reported count is wrong.
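A common remedy for this boundary problem, which the presentation itself does not implement, is to let each processor read up to (length of search_string - 1) characters beyond the end of its own sub-document while counting only matches that start inside it. The sketch below simulates this serially in plain C. The names `count_in_chunk` and `count_across_chunks` are hypothetical, and the loop over chunks stands in for the separate MPI processes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count matches of pattern that START inside [start, start + len) of text.
   A match may extend up to strlen(pattern) - 1 characters past the chunk;
   that is the overlap a process would need from its right-hand neighbour. */
int count_in_chunk(const char *text, size_t text_len,
                   const char *pattern, size_t start, size_t len)
{
    assert(text != NULL && pattern != NULL);
    assert(start <= text_len && len <= text_len - start);

    size_t m = strlen(pattern);
    if (m == 0 || m > text_len) {
        return 0;  /* assumption: empty or over-long patterns never match */
    }
    size_t last_start = text_len - m;  /* last index where a match can begin */
    int occurrences = 0;
    for (size_t i = start; i < start + len && i <= last_start; i++) {
        if (memcmp(text + i, pattern, m) == 0) {
            occurrences++;
        }
    }
    return occurrences;
}

/* Serial stand-in for n_chunks processes: split the text into nearly
   equal chunks, count in each chunk, and add up the partial counts. */
int count_across_chunks(const char *text, const char *pattern, int n_chunks)
{
    assert(text != NULL && n_chunks > 0);

    size_t text_len = strlen(text);
    size_t base  = text_len / (size_t)n_chunks;
    size_t extra = text_len % (size_t)n_chunks;
    size_t start = 0;
    int total = 0;

    for (int c = 0; c < n_chunks; c++) {
        size_t len = base + ((size_t)c < extra ? 1 : 0);
        total += count_in_chunk(text, text_len, pattern, start, len);
        start += len;
    }
    return total;
}
```

Because every match is attributed to the single chunk in which it begins, no occurrence is missed or counted twice, and the total equals the single-processor count. In an MPI version each process would have to receive those extra trailing characters along with its sub-document, for instance by scattering overlapping pieces.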