Faster finds from Gallo to Google
Presented to the Niagara University Bioinformatics Seminar
Dr. Laurence Boxer, Department of Computer and Information Sciences
Applications to string search problems from: L. Boxer and R. Miller, Coarse Grained Gather and Scatter Operations with Applications, Journal of Parallel and Distributed Computing, 64 (2004).

The Problem: Given two character strings, a “pattern” P and a “text” T (with the text typically much larger than the pattern), find all matching copies of the pattern in the text.

Example (DNA alphabet, exact matching):
P: agtacagtac
T: actaactagtacagtacagtacaactgtccatccg

Example (case-insensitive exact matching):
P: Gallo
T: If Professor Gallo serves many gallons of home-brewed wine to students who do dastardly deeds in the hallowed DePaul hallways, how many will go to the gallows? Better they should have a singalong…. He used a lame pickup line: “Is this little gal lonely?”
Output: the occurrences of the pattern inside “Gallo”, “gallons”, and “gallows” (highlighted on the original slide).
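For concreteness, here is a minimal brute-force sketch of the problem statement (the slides do not give an implementation; the names are illustrative):

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return the start index of every (possibly overlapping),
    case-insensitive exact occurrence of pattern in text."""
    p, t = pattern.lower(), text.lower()
    m, n = len(p), len(t)
    return [i for i in range(n - m + 1) if t[i:i + m] == p]

print(find_all("agtacagtac", "actaactagtacagtacagtacaactgtccatccg"))  # [7, 12]
```

This brute-force version takes O(mn) time in the worst case; the point of the talk is that much better is possible.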

Additional “finds” appear when a small number of errors (mismatches, insertions, deletions) are permitted.
P: Gallo
T: If Professor Gallo serves many gallons of home-brewed wine to students who do dastardly deeds in the hallowed DePaul hallways, how many will go to the gallows? Better they should have a singalong…. He used a lame pickup line: “Is this little gal lonely?”
Output now also includes:
“hallowed”, “hallways”: one character mismatch (“h” for “g”).
“singalong”: one “l” must be inserted for a perfect match.
“gal lonely”: one space must be deleted for a perfect match.
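The slides do not specify an approximate-matching algorithm; the standard dynamic-programming approach (Sellers' algorithm) is sketched below as one way to report positions where the pattern occurs with at most k errors:

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Sellers-style dynamic programming: positions j such that pattern
    matches a substring of text ending at index j (exclusive) with at
    most k edits (mismatch/insert/delete). O(m*n) time, O(m) space."""
    p, t = pattern.lower(), text.lower()
    m = len(p)
    # prev[i] = edit distance between p[:i] and the best-aligned
    # substring of t ending just before the current position
    prev = list(range(m + 1))               # column for the empty text prefix
    hits = []
    for j, c in enumerate(t, start=1):
        curr = [0]                          # a match may start anywhere: row 0 is free
        for i in range(1, m + 1):
            cost = 0 if p[i - 1] == c else 1
            curr.append(min(prev[i - 1] + cost,   # match/mismatch
                            prev[i] + 1,          # delete from text
                            curr[i - 1] + 1))     # insert into text
        if curr[m] <= k:
            hits.append(j)                  # pattern ends here with <= k errors
        prev = curr
    return hits
```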

Analysis of algorithms: We seek to estimate the running time T(n) of an algorithm, up to proportionality, when it is applied to a data set of size n.
T(n) = Θ(f(n)) if, for large n, T(n) is approximately proportional to f(n).
T(n) = O(f(n)) if, for large n, T(n) is at most proportional to f(n).
The emphasis is on large n; for small n, even an inefficient algorithm may finish in acceptable time.
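Formally (a restatement of the slide's informal definitions in standard notation):

```latex
T(n) = \Theta(f(n)) \iff \exists\, c_1, c_2 > 0 \text{ and } n_0 \text{ such that }
  c_1 f(n) \le T(n) \le c_2 f(n) \text{ for all } n \ge n_0;

T(n) = O(f(n)) \iff \exists\, c > 0 \text{ and } n_0 \text{ such that }
  T(n) \le c\, f(n) \text{ for all } n \ge n_0.
```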

Example: Sequential Sorting Algorithms
[Table comparing running times of Selection Sort (Θ(n²)) and Merge Sort (Θ(n log n)) for increasing values of n, among them 2,048 and 65,536; the numeric entries did not survive transcription.]
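The contrast the lost table illustrated can be reproduced from the dominant operation counts; a sketch (operation counts, not measured times):

```python
import math

# Compare the dominant terms behind the table:
# Selection Sort ~ n^2 operations, Merge Sort ~ n log2 n operations.
for n in (2_048, 65_536):
    print(f"n={n:>6}: n^2 = {n * n:>13,}   n log2 n = {int(n * math.log2(n)):>9,}")
# n= 2,048: n^2 =     4,194,304   n log2 n =    22,528
# n=65,536: n^2 = 4,294,967,296   n log2 n = 1,048,576
```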

Previous state of knowledge for exact string matching (algorithms for sequential computers):
In the worst case, the entire input must be examined (otherwise we may miss a match).
There exist Θ(n)-time solutions for sequential computers, which are therefore worst-case optimal.
However, n may be so large that even Θ(n) time is unacceptable.
Speedup may come from sequential algorithms that are highly likely to run faster than worst-case time (the topic of another talk), or from parallel computers (the topic of today's talk).
Using |·| for the number of characters in a string, suppose |T| = n and |P| = m, where 1 < m < n (usually m << n). The input size is therefore Θ(m+n); since n < m+n < n+n = 2n, the input size is Θ(n).
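The slides do not name a particular Θ(n)-time method; Knuth-Morris-Pratt is one classical choice, sketched here under the assumption m ≥ 1:

```python
def kmp_find(pattern: str, text: str) -> list[int]:
    """Knuth-Morris-Pratt: all start indices of pattern in text, Θ(m+n) time."""
    m = len(pattern)
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of pattern[:i+1]
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for j, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:                      # a full match ends at position j
            hits.append(j - m + 1)
            k = fail[k - 1]             # continue, allowing overlaps
    return hits
```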

Parallel vs. sequential computers: Ideally, a parallel computer with q processors should solve a problem in 1/q-th of the time that a sequential computer requires. Thus, if T(n) is the time for a sequential computer to solve a given problem, then we want the parallel computer to use Θ(T(n)/q) time. But achieving this level of speedup may be difficult or impossible, because time is required to exchange data among processors. The time required for standard data-exchange operations depends on the configuration of the processors.
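In symbols, writing T_seq(n) for the sequential running time (notation introduced here, not in the slides):

```latex
T_{par}(n, q) = \Theta\!\left(\frac{T_{seq}(n)}{q}\right) \quad \text{(ideal)},
\qquad
\text{speedup } S(q) = \frac{T_{seq}(n)}{T_{par}(n, q)} \le q \text{ in general}.
```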

Examples of parallel architectures, with times to broadcast a unit of data:
Linear array: q-1 = Θ(q) steps to send a unit of data from the leftmost to the rightmost processor.
Mesh: Θ(√q) steps for a √q × √q mesh:
1. The source's row (a linear array) broadcasts across that row.
2. In parallel, each column (a linear array) broadcasts down its column.

Example - tree: In the 1st step, the root broadcasts to each of its “children”; in subsequent steps, in parallel, the nodes at a given level that have just received the datum broadcast to their children. Thus, the time is proportional to the number of levels, Θ(log q).
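A toy calculation of the broadcast step counts for the three topologies above (assuming, for concreteness, a square mesh and a complete binary tree):

```python
import math

def broadcast_steps(q: int) -> dict[str, int]:
    """Steps to broadcast one datum to q processors on each topology."""
    side = math.isqrt(q)                      # assume q is a perfect square
    return {
        "linear array": q - 1,                # Theta(q)
        "mesh": 2 * (side - 1),               # Theta(sqrt(q)): row, then columns
        "binary tree": q.bit_length() - 1,    # floor(log2 q) levels: Theta(log q)
    }

print(broadcast_steps(64))  # {'linear array': 63, 'mesh': 14, 'binary tree': 6}
```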

Communications problems for string-matching problems: The data is distributed (in segments of consecutive characters) among the processors, so occurrences of matches may be broken across processor boundaries. Hence we want to share a copy of the first m-1 characters of T held in each processor with the processor containing the previous segment of T. It would also be useful to have a copy of P in each processor.

For the exact matching problem, suppose we take the following steps:
1. Each processor gets a copy of all of P.
2. Each processor gets the first m-1 characters of T initially stored in the processor holding the next segment of T.
Then, in parallel, each processor can run an optimal sequential algorithm on its portion of the data in Θ(n/q + m) time.
(The slide's illustration, with P = Gallo, shows an occurrence of the pattern, e.g. within “gallows”, straddling the boundary between two processors' segments of T.)
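The scheme, simulated sequentially (using the kmp_find sketch from earlier; on a real machine the loop body runs on all q processors at once):

```python
def parallel_find(pattern: str, text: str, q: int) -> list[int]:
    """Simulate the q-processor scheme: split text into q segments,
    extend each with the first m-1 characters of the next segment,
    and search each extended segment with a linear-time algorithm."""
    m, n = len(pattern), len(text)
    seg = -(-n // q)                          # ceil(n/q) characters per processor
    hits = []
    for p in range(q):                        # in parallel on a real machine
        lo = p * seg
        hi = min(n, lo + seg + m - 1)         # overlap catches straddling matches
        # the window length seg+m-1 admits match starts only within this
        # processor's own segment, so each match is reported exactly once
        hits.extend(lo + i for i in kmp_find(pattern, text[lo:hi]))
    return hits
```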

So, how do we perform these data movements efficiently? The keys are efficient gather and scatter operations.
Gather: given a unit of data in each processor, collect a copy of each of these values into one processor.

Scatter: return gathered items to their original processors (typically after modification by a sequential algorithm)

How to gather/scatter efficiently (q = # of processors):
If not already known, identify a spanning tree rooted at the processor to which the data is to be gathered. This is done as follows:
The root sends a message to each neighbor.
Each non-root processor waits for a message; the first message to arrive identifies that processor's parent. Upon receipt, the processor sends a message to each neighbor identifying its own parent.
If A receives a message from B identifying A as the parent of B, then A knows B is A's child.
Advanced techniques show this construction takes O(q) time.
Performing the gather: in parallel, each processor sends data to its parent processor in the tree until each value reaches the root processor. This takes Θ(q) time; thus, a gather operation takes Θ(q) time.
To scatter efficiently: reverse the direction of data flow of a gather operation: Θ(q) time.
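A toy simulation of the tree construction and the gather on an arbitrary processor graph (adjacency lists; synchronous breadth-first flooding stands in for real message arrival order):

```python
from collections import deque

def build_tree(adj: dict[int, list[int]], root: int) -> dict[int, int]:
    """Flood from the root; a processor's parent is the neighbor whose
    message arrives first (breadth-first order in this simulation)."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in parent:            # first message to reach v wins
                parent[v] = u
                frontier.append(v)
    return parent

def gather(adj: dict[int, list[int]], root: int, value: dict[int, str]) -> list[str]:
    """Pull every processor's value to the root along tree edges;
    a real machine does this in Theta(q) time with q-1 messages."""
    parent = build_tree(adj, root)
    children: dict[int, list[int]] = {u: [] for u in parent}
    for v, u in parent.items():
        if v != root:
            children[u].append(v)
    def collect(u: int) -> list[str]:      # values from u's subtree
        out = [value[u]]
        for c in children[u]:
            out += collect(c)
        return out
    return collect(root)

# Example: 4 processors in a line 0-1-2-3, gathering to processor 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(gather(adj, 0, {i: f"d{i}" for i in adj}))  # ['d0', 'd1', 'd2', 'd3']
```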

Getting a complete copy of P to each processor, assuming m < n/q (so that P fits in a single processor):
Gather a dummy record from each processor to one processor: Θ(q) time.
Gather P to this processor, pipelining the data flow if more than one character of P is stored in any processor: Θ(m+q) = Θ(max{m,q}) time.
For each character of P, tag each dummy record with that character and scatter, pipelining. Pipelining reduces the time from the Θ(mq) one might expect (m separate scatters of Θ(q) time apiece) to Θ(md+q) = Θ(max{md,q}) (m scatters that overlap in time), where d (the degree bound) is the maximum number of neighbors of any processor (1 < d < q-1).
Total time: Θ(md+q) = Θ(max{md,q}). If both md < n/q and q < n/q, the total time is O(n/q).

Getting each processor the m-1 characters of T that follow its last character of T (case 1): Suppose processors holding consecutive segments of T are adjacent (this is possible for linear arrays, for meshes using a snake-like processor ordering, and for hypercubes; not for trees, etc.). Then:
In parallel, each odd-numbered processor gets the first m-1 characters of T stored in the next (even-numbered) processor. This takes Θ(m) time via direct communication (since these processors are adjacent).
Similarly, in parallel, each even-numbered processor gets the first m-1 characters of T stored in the next (odd-numbered) processor. This takes Θ(m) time via direct communication.
Thus, the total time for this process is Θ(m).

Getting each processor the m-1 characters of T that follow its last character of T (case 2): Suppose processors holding consecutive segments of T are not adjacent. Then:
In parallel, each processor copies its first m-1 characters of T, tagging them with the index of the processor holding the previous segment. This takes Θ(m) time.
Sort these (m-1)q = Θ(mq) data values by their processor-index tags so that each ends up in the processor holding the previous segment. This takes the time required to sort Θ(mq) values on the given q-processor architecture; call it T_sort(mq, q).
Thus, the total time for this task is Θ(m) + T_sort(mq, q).

Thus, we have the following algorithm for the exact string pattern-matching problem on a coarse-grained parallel computer with q processors:
0) T is distributed among the processors in segments of n/q characters apiece.
1) Distribute to each processor a copy of all of P, as described above, in Θ(md+q) = Θ(max{md,q}) time. If both q < n/q (a coarse-grained parallel computer) and md < n/q, this time is O(n/q).
2) Distribute to each processor a copy of the first m-1 characters of the next segment of T. This takes Θ(m) time if processors with consecutive segments are adjacent; Θ(m) + T_sort(mq, q) time otherwise.
3) Each processor runs an optimal sequential algorithm on its n/q + m - 1 characters of T in Θ(n/q + m) time. This reduces to Θ(n/q), since m = O(n/q).
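Putting the pieces together (with T_sort(mq, q) as defined above for the non-adjacent case; this summary is assembled from the slides' individual bounds):

```latex
T(n, m, q)
  = \underbrace{\Theta(\max\{md,\, q\})}_{\text{distribute } P}
  + \underbrace{\Theta(m)\ \big[\,+\, T_{sort}(mq, q)\,\big]}_{\text{boundary exchange}}
  + \underbrace{\Theta\!\left(\frac{n}{q} + m\right)}_{\text{local search}}
```

Each term is O(n/q) exactly under the conditions listed on the next slide.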

Thus, we get optimal worst-case running time Θ(n/q) under the following conditions:
If processors with consecutive segments of T are adjacent: when q < n/q (equivalently, q < √n) and md < n/q; i.e., when max{md, q} < n/q.
If processors with consecutive segments of T are not adjacent, we need the stronger restriction that the sorting step also finishes in O(n/q) time, i.e., T_sort(mq, q) = O(n/q); when this holds depends on the particular architecture.