
1 Faster finds from Gallo to Google. Presented to the Niagara University Bioinformatics Seminar by Dr. Laurence Boxer, Department of Computer and Information Sciences. Applications to string search problems from: L. Boxer and R. Miller, Coarse Grained Gather and Scatter Operations with Applications, Journal of Parallel and Distributed Computing, 64 (2004), 1297-1320.

2 The Problem: Given two character strings, a “pattern” P and a “text” T (with the text typically much larger than the pattern), find all matching copies of the pattern in the text. Examples using case-insensitive exact matches (matches are shown in brackets in the output):
Example 1. P: agtacagtac T: actaactagtacagtacagtacaactgtccatccg Output: two overlapping matches, actaact[agtacagtac]agtacaactgtccatccg and actaactagtac[agtacagtac]aactgtccatccg.
Example 2. P: Gallo T: If Professor Gallo serves many gallons of home-brewed wine to students who do dastardly deeds in the hallowed DePaul hallways, how many will go to the gallows? Better they should have a singalong…. He used a lame pickup line: “Is this little gal lonely?” Output: If Professor [Gallo] serves many [gallo]ns of home-brewed wine to students who do dastardly deeds in the hallowed DePaul hallways, how many will go to the [gallo]ws? Better they should have a singalong…. He used a lame pickup line: “Is this little gal lonely?”
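The problem statement above can be sketched in a few lines of Python. This is only an illustration, not the method of the talk: the function name `find_all` is an assumption, and Python's built-in `str.find` stands in for a linear-time sequential matcher.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return the 0-based start of every case-insensitive match, overlaps included."""
    p, t = pattern.lower(), text.lower()
    positions = []
    start = t.find(p)
    while start != -1:
        positions.append(start)
        # Resume one character later (not len(p) later) so overlapping matches are kept.
        start = t.find(p, start + 1)
    return positions


dna_hits = find_all("agtacagtac", "actaactagtacagtacagtacaactgtccatccg")
print(dna_hits)  # [7, 12]: the two matches overlap

# Shortened, hypothetical version of the "Gallo" sentence from the slide.
sentence = "If Professor Gallo serves many gallons of wine, how many will go to the gallows?"
gallo_hits = find_all("Gallo", sentence)
print(len(gallo_hits))  # 3
```

Advancing by one character after each hit is what lets the DNA example report both overlapping copies of the pattern.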

3 Additional “finds” when a small number of errors (mismatch, insert, delete) are permitted. P: Gallo T: If Professor Gallo serves many gallons of home-brewed wine to students who do dastardly deeds in the hallowed DePaul hallways, how many will go to the gallows? Better they should have a singalong…. He used a lame pickup line: “Is this little gal lonely?” Output, in addition to the exact matches:
“hallo” in “hallowed”: 1 character mismatch (“h” for “g”).
“galo” in “singalong”: must insert one “l” for a perfect match.
“gal lo” in “gal lonely”: must delete one space for a perfect match.
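One standard way to find such near-matches is a dynamic program over edit distance in which a match may begin anywhere in the text (often attributed to Sellers). The sketch below is an assumption about how the idea could be coded, not the algorithm used in the talk; the name `approx_match_ends` and the error budget `k` are illustrative.

```python
def approx_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern (case-insensitive)."""
    p, t = pattern.lower(), text.lower()
    m = len(p)
    prev = list(range(m + 1))  # column for the empty text prefix: cost i to build p[:i]
    ends = []
    for j, ch in enumerate(t, start=1):
        cur = [0] * (m + 1)  # row 0 stays 0: a match may start at any text position
        for i in range(1, m + 1):
            mismatch = 0 if p[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + mismatch,  # match or substitute
                prev[i] + 1,             # extra character in the text (delete it)
                cur[i - 1] + 1,          # character missing from the text (insert it)
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


print(approx_match_ends("Gallo", "hallowed", 1))  # [5]: "hallo" with one mismatch
```

With `k = 0` the same routine reports only exact matches, so it covers both slides' notions of a "find" at a cost of Θ(mn) time.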

4 Analysis of algorithms. We seek to estimate, up to a constant factor, the running time T(n) of an algorithm when it is applied to a data set of size n. T(n) = Θ(f(n)) if, for large n, T(n) is approximately proportional to f(n). T(n) = O(f(n)) if, for large n, T(n) ≤ something that’s Θ(f(n)). The emphasis is on large n; for small n, even an inefficient algorithm may finish in acceptable time.
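For reference, the informal descriptions above correspond to the usual textbook definitions, stated here in LaTeX. The constants c, c_1, c_2 and the threshold n_0 are the standard symbols, not names taken from the slides.

```latex
T(n) = O(f(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ T(n) \le c\, f(n) \ \text{for all } n \ge n_0

T(n) = \Theta(f(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 \ \text{such that}\ c_1 f(n) \le T(n) \le c_2 f(n) \ \text{for all } n \ge n_0
```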

5 Example: Sequential Sorting Algorithms

n      Selection Sort   Merge Sort
2      4                2
4      16               8
8      64               24
16     256              64
32     1,024            160
64     4,096            384
128    16,384           896
256    65,536           2,048
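Every row of the table is consistent with n² for selection sort and n·log₂ n for merge sort. Assuming those are the formulas behind it (the slide does not say so explicitly), the table can be regenerated as follows; `growth_table` is a hypothetical helper name.

```python
def growth_table(max_exp: int = 8) -> list[tuple[int, int, int]]:
    """Rows (n, n^2, n*log2(n)) for n = 2, 4, ..., 2**max_exp."""
    rows = []
    for e in range(1, max_exp + 1):
        n = 2 ** e
        # For n = 2**e, log2(n) is exactly e, so no floating point is needed.
        rows.append((n, n * n, n * e))
    return rows


for n, quadratic, linearithmic in growth_table():
    print(f"{n:>4} {quadratic:>7,} {linearithmic:>6,}")
```

At n = 256 the quadratic column is already 32 times the n·log₂ n column, which is the point of the slide.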

6 Previous State of Knowledge for exact string matching (algorithms for sequential computers)
Using absolute value notation for the number of characters in a string, suppose |T| = n and |P| = m, where 1 ≤ m ≤ n (usually, m << n). The input size is therefore Θ(m+n). Since n < m+n ≤ n+n = 2n, the input size is Θ(n).
In the worst case, all of the input must be considered (otherwise, we may miss a match). There exist Θ(n)-time solutions for sequential computers, which are therefore optimal in the worst case.
However, n may be so large that Θ(n) time is unacceptable. Speedup may come from sequential algorithms that are highly probable to run faster than the worst-case time (topic of another talk), or from parallel computers (topic of today’s talk).

7 Parallel vs. sequential computers. Ideally, a parallel computer with q processors should solve a problem in 1/q-th of the time that a sequential computer requires. Thus, if T_seq is the time for a sequential computer to solve a given problem, then we want the parallel computer to use Θ(T_seq / q) time. But achieving this level of speedup may be difficult or impossible, because time is required to exchange data among processors. The time required for standard data exchange operations depends on the configuration of the processors.

8 Examples of parallel architectures with times to broadcast a unit of data
Linear array: q-1 = Θ(q) steps to send a unit of data from the leftmost to the rightmost processor.
Mesh: 1. The source row (a linear array) broadcasts across the row. 2. In parallel, each column (a linear array) broadcasts across its column.

9 Example - tree. In the first step, the root broadcasts to each of its “children”; in subsequent steps, in parallel, the nodes at a given level that have just received the datum broadcast to their children. Thus, the time is proportional to the number of levels, Θ(log q).
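The contrast between the linear array and the tree can be checked with a small step counter. This is a sketch under stated assumptions: the tree is taken to be a complete binary tree with heap-style numbering (children of node i are 2i+1 and 2i+2), and a node is assumed to reach all of its children in a single step, as the slide describes. Both function names are illustrative.

```python
def linear_broadcast_steps(q: int) -> int:
    """Steps to pass one datum from the leftmost to the rightmost of q processors."""
    return q - 1


def tree_broadcast_steps(q: int) -> int:
    """Steps for the root of a complete binary tree on q nodes to reach every node."""
    frontier = [0]  # nodes that received the datum in the previous step
    reached = 1
    steps = 0
    while reached < q:
        # Every node on the frontier sends to its children in parallel.
        frontier = [c for node in frontier
                    for c in (2 * node + 1, 2 * node + 2) if c < q]
        reached += len(frontier)
        steps += 1
    return steps


for q in (7, 8, 1023):
    print(q, linear_broadcast_steps(q), tree_broadcast_steps(q))
```

For q = 1023 the linear array needs 1022 steps while the tree needs 9, one per level below the root.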

10 Communication problems for string matching. Data is distributed (in segments of consecutive characters) among processors, so an occurrence of a match may be split between processors. Hence each processor should share a copy of the first m-1 characters of its segment of T with the processor containing the previous segment of T. It is also useful to have a copy of P in each processor.

11 Suppose we take the following steps:
1. Each processor gets a copy of all of P.
2. Each processor gets the first m-1 characters of T initially stored in the processor with the next segment of T.
Then, in parallel, each processor can run an optimal sequential algorithm on its portion of the data in Θ(n/q + m) time.
[Figure: illustration for the exact matching problem with P: Gallo; text fragments shown: “who”, “lows”.]
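The two steps above can be simulated on one machine to see why the extra m-1 characters are enough. This is a sketch, not the talk's implementation: the loop over `i` models the q processors running in parallel, `str.find` stands in for an optimal sequential matcher, and the names `find_all` and `parallel_find` are assumptions.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """All 0-based match positions, overlaps included (sequential stand-in)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def parallel_find(pattern: str, text: str, q: int) -> list[int]:
    """Simulate q processors, each holding about n/q characters of text."""
    m, n = len(pattern), len(text)
    seg = -(-n // q)  # ceil(n / q) characters per processor
    hits = []
    for i in range(q):  # each iteration models one processor
        lo = i * seg
        if lo >= n:
            break
        hi = min(lo + seg, n)
        # Own segment plus the first m-1 characters that follow it.
        local = text[lo:min(hi + m - 1, n)]
        # A match found locally starts inside [lo, hi), so nothing is reported twice.
        hits.extend(lo + pos for pos in find_all(pattern, local))
    return hits


T = "actaactagtacagtacagtacaactgtccatccg"
print(parallel_find("agtacagtac", T, 4))  # [7, 12]
```

With q = 4 the match starting at position 7 straddles the boundary between the first two segments, and it is still found because the first processor sees the borrowed characters.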

12 So, how do we perform these data movements efficiently? Keys: efficient gather and scatter operations. Gather: given a unit of data in each processor, get a copy of each of these values into one processor.

13 Scatter: return gathered items to their original processors (typically after modification by a sequential algorithm)

14 How to gather/scatter efficiently (q = # of processors)
If not already known, identify a minimal spanning tree (MST) rooted at the processor to which data is to be gathered. This is done as follows:
1. The root sends a message to each neighbor.
2. Each non-root processor waits for a message. The first message to arrive identifies the processor’s parent. Upon receipt, the processor sends a message to each neighbor identifying the sender’s parent.
3. Each processor receives the messages described above. If A receives a message from B identifying A as the parent of B, A knows B is A’s child.
Advanced techniques show this takes O(q) time.
Performing the gather: In parallel, each processor sends data to its parent processor in the MST until each value reaches the root processor. This takes Θ(q) time. Thus, a gather operation takes Θ(q) time.
To scatter efficiently: reverse the direction of data flow for a gather operation: Θ(q) time.
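The tree construction and the gather can be imitated with a sequential simulation. The sketch below makes several assumptions that are not in the slides: every message takes one time unit (so "first message to arrive" coincides with breadth-first order), each processor forwards at most one item to its parent per round, and the root may receive from all of its children in the same round. The names `build_spanning_tree` and `gather` are illustrative.

```python
from collections import deque
from typing import Optional


def build_spanning_tree(adj: dict[int, list[int]], root: int) -> dict[int, Optional[int]]:
    """Parent of each processor; the first message to arrive fixes the parent."""
    parent: dict[int, Optional[int]] = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:  # v has not heard from anyone yet
                parent[v] = u
                queue.append(v)
    return parent


def gather(parent: dict[int, Optional[int]], values: dict[int, str]) -> tuple[dict[int, str], int]:
    """Move every value to the root; return the gathered values and the round count."""
    root = next(p for p, par in parent.items() if par is None)
    held = {p: deque([(p, values[p])]) for p in parent}  # (origin, value) items
    rounds = 0
    while any(held[p] for p in parent if p != root):
        # Collect all sends first so that the round is synchronous.
        moves = [(parent[p], held[p].popleft())
                 for p in parent if p != root and held[p]]
        for dest, item in moves:
            held[dest].append(item)
        rounds += 1
    return dict(held[root]), rounds


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # linear array of q = 4 processors
tree = build_spanning_tree(chain, root=0)
gathered, rounds = gather(tree, {0: "a", 1: "b", 2: "c", 3: "d"})
print(gathered, rounds)
```

On the four-processor chain the gather finishes in q-1 = 3 rounds, matching the Θ(q) bound; a scatter would replay the same moves in the opposite direction.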

15 Getting a complete copy of P to each processor, assuming m < n/q (P small enough to fit in one processor)
1. Gather a dummy record from each processor to one processor: Θ(q) time.
2. Gather P to this processor, pipelining the data flow if more than one character of P is stored in any processor. Time is Θ(m+q) = Θ(max{m,q}).
3. For each character of P, tag each dummy record with the character and scatter, pipelining. One might expect this to require Θ(mq) time (m separate scatters of Θ(q) time apiece), but pipelining lets the m scatters overlap in time, reducing this to Θ(md+q) = Θ(max{md,q}), where d (the degree bound) is the maximum number of neighboring processors of any given processor (1 ≤ d ≤ q-1).
Total time: Θ(md+q) = Θ(max{md,q}). If both md < n/q and q < n/q, the total time is O(n/q).
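The effect of pipelining is easiest to see on a linear array, where the degree bound d is a small constant. The sketch below counts steps for sending m characters from processor 0 to processor q-1 when a new character enters the chain at every step; it illustrates the Θ(m+q) versus Θ(mq) contrast only, not the general Θ(md+q) analysis, and the function names are assumptions.

```python
def pipelined_steps(m: int, q: int) -> int:
    """Steps until all m characters, injected one per step at processor 0,
    have reached processor q-1 of a linear array (requires m >= 1)."""
    position = [0] * m  # position[c] = processor currently holding character c
    steps = 0
    while position[-1] < q - 1:  # the last character has not arrived yet
        steps += 1
        for c in range(m):
            # Character c leaves processor 0 at step c+1, then moves one link per step.
            if steps > c and position[c] < q - 1:
                position[c] += 1
    return steps


def unpipelined_steps(m: int, q: int) -> int:
    """m separate end-to-end transfers, each started after the previous one finishes."""
    return m * (q - 1)


print(pipelined_steps(100, 10), unpipelined_steps(100, 10))  # 108 900
```

The pipelined count is m+q-2, since the first character needs q-1 steps and each later character arrives one step behind its predecessor.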

16 Getting each processor the m-1 characters of T that follow the processor’s last character of T (case 1): Suppose processors holding consecutive segments of T are adjacent (this is possible for linear arrays, meshes using snake-like order for processors, and hypercubes; not for trees, etc.). Then:
1. In parallel, each odd-numbered processor gets the first m-1 characters of T that are stored in the next processor. This takes Θ(m) time via direct communication (since these processors are adjacent).
2. Similarly, in parallel, each even-numbered processor gets the first m-1 characters of T that are stored in the next processor. This takes Θ(m) time via direct communication.
Thus, the total time for this process is Θ(m).

17 Getting each processor the m-1 characters of T that follow the processor’s last character of T (case 2): Suppose processors holding consecutive segments of T are not adjacent. Then:
1. In parallel, each processor copies its first m-1 characters of T, with tags containing the index of the processor with the previous segment. This takes Θ(m) time.
2. Sort these (m-1)q = Θ(mq) data values by their processor index tags so that each ends up in the processor with the previous segment. This takes [sorting-time expression missing in the transcript] time.
Thus, the total time for this task is [expression missing in the transcript].
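The tag-and-sort idea can be illustrated sequentially. In this sketch Python's built-in sort stands in for the parallel sort, whose running time is the quantity that matters on a real machine; the function name `extend_by_sorting` and the record layout (destination, offset, character) are assumptions made for the example.

```python
def extend_by_sorting(segments: list[str], m: int) -> list[str]:
    """Append to each segment the first m-1 characters of the next segment,
    routing them as tagged records sorted by destination processor."""
    records = [
        (i - 1, k, ch)  # tag: index of the processor holding the previous segment
        for i, seg in enumerate(segments) if i > 0
        for k, ch in enumerate(seg[:m - 1])
    ]
    records.sort()  # stand-in for the parallel sort by processor index tag
    extended = list(segments)
    for dest, _offset, ch in records:
        extended[dest] += ch  # offsets keep the characters in their original order
    return extended


parts = ["actaactag", "tacagtaca", "gtacaactg", "tccatccg"]
print(extend_by_sorting(parts, 3))
```

Including the offset in each record keeps the borrowed characters in order within a destination, since the sort key is (destination, offset).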

18 Thus, we have the following algorithm for the exact string pattern matching problem on a coarse grained parallel computer with q processors:
0) T is distributed among the processors in segments of n/q characters apiece.
1) Distribute to each processor a copy of all of P as described above, in Θ(md+q) = Θ(max{md,q}) time. If both q < n/q (coarse grained parallel computer) and md < n/q, the total time is O(n/q).
2) Distribute to each processor a copy of the first m-1 characters of the next segment of T. This takes Θ(m) time if processors with consecutive segments are adjacent; [expression missing in the transcript] time otherwise.
3) Each processor runs an optimal sequential algorithm on its n/q+m-1 characters of T in Θ(n/q + m) time. This reduces to Θ(n/q), since m = O(n/q).

19 Thus, we get optimal worst-case running time Θ(n/q) under the following conditions:
If processors with consecutive segments of T are adjacent: when q < n/q (equivalently, q < √n) and md < n/q; i.e., if max{md, q} < n/q.
If processors with consecutive segments of T are not adjacent: we need the stronger restriction [expression missing in the transcript], which is true, for example, when [expression missing in the transcript] (equivalently, when [expression missing in the transcript]).
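For the adjacent-segments case, the condition max{md, q} < n/q is easy to check for concrete machine parameters. The helper below is a hypothetical convenience, not part of the talk, and the sample values of n, m, q and d are made up for illustration.

```python
def in_optimal_regime(n: int, m: int, q: int, d: int) -> bool:
    """True when max{md, q} < n/q (processors with consecutive segments adjacent)."""
    # Multiplying both sides by q > 0 avoids floating-point division.
    return max(m * d, q) * q < n


# Text of a million characters, pattern of 50, degree bound 4.
print(in_optimal_regime(n=10**6, m=50, q=100, d=4))   # True
print(in_optimal_regime(n=10**6, m=50, q=2000, d=4))  # False
```

The second call fails because q = 2000 exceeds √n = 1000, so the machine is no longer coarse grained relative to this input.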

