Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.

Presentation transcript:

Bioinformatics PhD. Course
1. Biological introduction
2. Comparison of short sequences (up to bps): Dot Matrix, Pairwise align., Multiple align., Hash alg.
3. Comparison of large sequences (more than bps): Data structures, Suffix trees, MUMs
4. String matching: Exact, Extended, Approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, …

Comparison of large sequences First part: Alignment of large sequences

Dynamic programming
Short sequences (up to bps) can be aligned using dynamic programming, with quadratic cost in space and time:
acc agt
||| |xx
acc a--
What about genomes? accaccacaccacaacgagcata … acctgagcgatat
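The quadratic cost can be seen in a minimal sketch of the dynamic programming table. The slide gives no scoring scheme, so unit costs for mismatches and gaps are assumed here, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Global alignment cost with unit mismatch and gap penalties (assumed scoring).

    Fills a (len(a)+1) x (len(b)+1) table, hence quadratic time and space.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i            # align a[:i] against the empty string
    for j in range(cols):
        table[0][j] = j            # align the empty string against b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = table[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            gap_in_b = table[i - 1][j] + 1
            gap_in_a = table[i][j - 1] + 1
            table[i][j] = min(substitution, gap_in_b, gap_in_a)
    return table[-1][-1]


print(edit_distance("accagt", "acca"))
```

Every cell of the table is computed once, so two sequences of length n need about n² steps and n² cells, which is what rules the method out for whole genomes.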

Genomic sequences
In which cases can dynamic programming be applied? Genomic sequences have millions of base pairs, so the sequences are 1000 times longer. With quadratic cost, the running time is 1000² = 10⁶ times higher! (1 second becomes 11 days; 1 minute becomes 2 years.)

First assumption
[Figure: Genome A and Genome B]

Realistic assumption? Unrealistic assumption! More realistic assumption:
[Figure: Genome A and Genome B]

Realistic assumptions? But is this now a real case?
[Figure: Genome A and Genome B]

Preview in a real case. Chlamydia muridarum: bps. Chlamydia trachomatis: bps.

Preview in a real case. Pyrococcus abyssi: bps. Pyrococcus horikoshii: bps.

Methodology of an alignment
1st: Make a preview. (Linear cost)
2nd: Identify the portions that can be aligned.
3rd: Make the alignment. (Linear cost)

Methodology of an alignment
1st: Make a preview. (Linear cost)
2nd: Identify the portions that can be aligned.
3rd: Make the alignment. ?

Preview revisited
… a a t g … c t g …
… c g t g … c c c …
MUM: Maximal Unique Matching
Connect to MALGEN

Methodology of an alignment
1st: Make a preview. How can MUMs be found? Linear cost with suffix trees.
2nd: Identify the portions that can be aligned. How can these portions be determined?
3rd: Make the alignment. With CLUSTALW, TCOFFEE, …

Comparison of large sequences M-GCAT Todd Treangen

Homework
1. Javier
2. Dmitry
3. Ana Iris
4. David
5. Patricia
6. Rogeli
7. Atif
8. Aina
9. Isaac
10. Maria Merce
11. Romina
12. Guillem
13. Raul
14. Alexis
15. Ramon

Bioinformatics PhD. Course Second part: Introducing Suffix trees

Suffix trees
Given string ababaas:
Suffixes:
1: ababaas
2: babaas
3: abaas
4: baas
5: aas
6: as
7: s
[Figure: suffix tree of ababaas, with leaves baas,1; baas,2; as,3; as,4; as,5; s,6; s,7]
What kind of queries?
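As a rough illustration of the suffix list, the sketch below enumerates the suffixes of ababaas and stores them in an uncompressed trie (one character per edge) instead of the compact suffix tree drawn on the slide. The function names and the `"leaf"` key are hypothetical, and the final character s is assumed to act as a terminator that occurs nowhere else in the string.

```python
def suffixes(text: str) -> list[tuple[int, str]]:
    """Return (1-based start, suffix) pairs, longest suffix first."""
    return [(i + 1, text[i:]) for i in range(len(text))]


def build_suffix_trie(text: str) -> dict:
    """Insert every suffix character by character into a nested-dict trie."""
    root: dict = {}
    for start, suffix in suffixes(text):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node["leaf"] = start       # marks the end of the suffix starting at `start`
    return root


def count_leaves(node: dict) -> int:
    """Count the suffix ends stored below a node."""
    return sum(1 if key == "leaf" else count_leaves(child)
               for key, child in node.items())


for start, suffix in suffixes("ababaas"):
    print(f"{start}: {suffix}")
print(count_leaves(build_suffix_trie("ababaas")))
```

A suffix tree holds the same information, but it merges every chain of single-child nodes into one labelled edge, which keeps the number of nodes linear in the length of the string.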

Applications of Suffix trees
1. Exact string matching
Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]
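The query on this slide can be answered by walking down from the root, one pattern character at a time. The sketch below assumes an uncompressed trie in place of the compact suffix tree, and records at every node the start positions of the suffixes passing through it. The names `build_index` and `occurrences` are hypothetical.

```python
def build_index(text: str) -> dict:
    """Trie of all suffixes; each node lists the 1-based starts of suffixes through it."""
    root = {"starts": [], "children": {}}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"starts": [], "children": {}})
            node["starts"].append(start + 1)
    return root


def occurrences(index: dict, pattern: str) -> list[int]:
    """Follow the pattern from the root; the node reached knows every occurrence."""
    node = index
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []              # the pattern falls off the tree: no occurrence
    return node["starts"]


index = build_index("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, occurrences(index, pattern))
```

The walk costs time proportional to the pattern length, independent of the text length, which is the point of indexing the text first.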

Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …

Quadratic insertion algorithm
Given the string ababaabbs
[Figure sequence: the suffix tree after inserting each suffix of ababaabbs in turn, from ababaabbs,1 to s,9. An existing edge is split where the new suffix diverges; for example, inserting suffix 3 splits ababaabbs,1 into aba followed by the leaves baabbs,1 and abbs,3. Leaves of the final tree: baabbs,1; baabbs,2; abbs,3; abbs,4; abbs,5; bs,6; bs,7; s,8; s,9]
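The insertion steps can be sketched as follows. This is a minimal version under stated assumptions: the last character of the string (s here) is a terminator that occurs nowhere else, edges store their labels as plain strings, and the names `Node`, `insert_suffix`, `build_suffix_tree` and `leaf_edges` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}   # first character -> (edge label, child Node)
        self.leaf = None     # 1-based start of the suffix that ends here


def insert_suffix(root: Node, suffix: str, start: int) -> None:
    """Walk down from the root and split an edge where the suffix diverges."""
    node = root
    while True:
        entry = node.children.get(suffix[0])
        if entry is None:                      # no edge to follow: hang a new leaf
            leaf = Node()
            leaf.leaf = start
            node.children[suffix[0]] = (suffix, leaf)
            return
        label, child = entry
        k = 0
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                     # mismatch inside the edge: split it
            middle = Node()
            middle.children[label[k]] = (label[k:], child)
            node.children[suffix[0]] = (label[:k], middle)
            child = middle
        node, suffix = child, suffix[k:]       # continue below the matched part


def build_suffix_tree(text: str) -> Node:
    root = Node()
    for i in range(len(text)):                 # n insertions of up to n characters
        insert_suffix(root, text[i:], i + 1)
    return root


def leaf_edges(node: Node) -> list[tuple[int, str]]:
    """Return (suffix start, label of the edge entering the leaf), sorted by start."""
    found = []
    for label, child in node.children.values():
        if child.leaf is not None:
            found.append((child.leaf, label))
        else:
            found.extend(leaf_edges(child))
    return sorted(found)


print(leaf_edges(build_suffix_tree("ababaabbs")))
```

Each insertion may compare up to n characters on its way down, and there are n insertions, which is where the quadratic cost comes from.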

Generalized suffix tree
The suffix tree of many strings is called the generalized suffix tree, and it is the suffix tree of the concatenation of the strings, each followed by its own terminator. For instance, the generalized suffix tree of ababaabb and aabaa is the suffix tree of ababaabbαaabaaβ.

Generalized suffix tree
Construction of the suffix tree of ababaabbαaabaaβ: given the suffix tree of ababaabbα, the suffixes of aabaaβ are inserted one by one.
[Figure sequence: the tree after inserting each suffix of aabaaβ in turn, from aabaaβ,1 to β,6. Leaves of the final generalized suffix tree: baabbα,1; baabbα,2; bbα,3; bbα,4; bα,5; bα,6; bα,7; α,8; α,9; aaβ,1; β,2; β,3; β,4; β,5; β,6]

Applications of Suffix trees
1. Exact string matching
Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]

Applications of Suffix trees
2. The substring problem for a database of strings DB
Does the DB contain any occurrence of the patterns abab, aab, and ab?
[Figure: generalized suffix tree of ababaabbαaabaaβ]

Applications of Suffix trees
3. The longest common substring of two strings
[Figure: generalized suffix tree of ababaabbαaabaaβ]
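One way to picture this application: mark every node with the strings whose suffixes pass through it, then look for the deepest node marked by both. The sketch below assumes an uncompressed trie with owner marks rather than the compact generalized suffix tree of the slide, so it uses quadratic space and is only meant for small examples. The function name is hypothetical.

```python
def longest_common_substring(first: str, second: str) -> str:
    """Deepest trie node reached by suffixes of both strings spells the answer."""
    root = {"children": {}, "owners": set()}
    for owner, text in enumerate((first, second)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "owners": set()})
                node["owners"].add(owner)      # this path occurs in string `owner`
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node["children"].items():
            if len(child["owners"]) == 2:      # only descend into shared paths
                stack.append((child, path + ch))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

In the compact generalized suffix tree the same marking can be done in a single traversal, with the terminators α and β telling which string each leaf belongs to.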

Applications of Suffix trees
4. Finding the maximal repeats.
[Figure: generalized suffix tree of ababaabbαaabaaβ]

Applications of Suffix trees
5. Finding MUMs.
[Figure: generalized suffix tree of ababaabbαaabaaβ]

Bioinformatics PhD. Course Third part: Suffix links

Suffix links
[Figure sequence: suffix tree of ababaabbα, with the suffix links between its internal nodes added one by one]

Traversal using Suffix links
Given S2 = a a b a a b b a
Unique matchings, in the order found:
aa in S2[1], extended to aab in S2[1] = S1[5..6-7] in S2[1]
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
[Figure sequence: suffix tree of ababaabbα, showing the position reached at each step of the traversal]

From UMs to MUMs
Given S1 = a b a b a a b b α and S2 = a a b a a b b a
Array of UMs (unique matchings):
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
MUM: S1[3..6-8] in S2[2]
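The MUM on this slide can be checked with a brute-force sketch. It does not use the suffix tree or the suffix links of the slides: it tries every pair of positions, keeps the matches that cannot be extended to the left or to the right, and then tests uniqueness by counting occurrences. The names `count_occurrences` and `naive_mums` and the `min_length` parameter are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


def naive_mums(s1: str, s2: str, min_length: int = 1) -> list[tuple[int, int, int]]:
    """Return (start in s1, start in s2, length) triples, 1-based, for every MUM."""
    mums = []
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] != s2[j]:
                continue
            if i > 0 and j > 0 and s1[i - 1] == s2[j - 1]:
                continue                       # extendable to the left: not maximal
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1                         # extend to the right as far as possible
            match = s1[i:i + k]
            unique = (count_occurrences(s1, match) == 1
                      and count_occurrences(s2, match) == 1)
            if k >= min_length and unique:
                mums.append((i + 1, j + 1, k))
    return mums


print(naive_mums("ababaabb", "aabaabba", min_length=4))
```

With a minimum length of 4 the only result is the match of length 6 that starts at S1[3] and S2[2], in agreement with the slide. The brute-force cost is far from linear, which is why the slides use suffix trees for this step.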

Bioinformatics PhD. Course Third part: Linear insertion algorithm

Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …

Linear insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted, and the suffix tree …
P2: the string … is the longest string that can be spelt through the tree.

Linear insertion algorithm: example
Given the string ababaababb...
[Figure sequence: the suffix tree of ababaababb... after each step of the linear insertion algorithm, with the leaves ...,6 to ...,9 added in turn. Leaves in the last frame: ababb...,1; aababb...,2; ababb...,3; ababb...,4; ababb...,5; b...,6; b...,7; b...,8; b...,9]

Index Suffix arrays Suffix-arrays: a new method for on-line string searches, G. Myers, U. Manber

Suffix arrays
Given string ababaa#:
Suffixes:
1: ababaa#
2: babaa#
3: abaa#
4: baa#
5: aa#
6: a#
7: #
… but lexicographically sorted (taking # as smaller than any letter):
7: #
6: a#
5: aa#
3: abaa#
1: ababaa#
4: baa#
2: babaa#
What is the cost? O(n log(n))
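A minimal sketch of the definition, assuming that # sorts before every letter (as it does in Python's default character ordering). The function name `suffix_array` is hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """1-based start positions of the suffixes, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


text = "ababaa#"
for start in suffix_array(text):
    print(f"{start}: {text[start - 1:]}")
```

This direct sort makes O(n log n) comparisons, but each comparison may read up to n characters, so it is slower in the worst case than the O(n log(n)) bound quoted on the slide. It is meant only to show what the array contains.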

Applications of suffix arrays
1. Exact string matching
Does the sequence ababaa# contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix array of ababaa#]
Binary search … what is the cost? O(log(n) |P|)
Can it be improved to O(log(n)+|P|)?
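The binary search can be sketched as below. Each of the O(log n) steps compares the pattern with the first |P| characters of one suffix, which gives the O(log(n) |P|) cost. The names `suffix_array` and `find_occurrences` are hypothetical, and the suffix array is built by a plain sort for brevity.

```python
def suffix_array(text: str) -> list[int]:
    """1-based start positions of the suffixes, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All 1-based positions where pattern occurs, found by two binary searches."""
    def prefix(rank: int) -> str:
        start = sa[rank] - 1
        return text[start:start + len(pattern)]

    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                       # first suffix after the block of matches
        mid = (lo + hi) // 2
        if prefix(mid) == pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "ababaa#"
sa = suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, find_occurrences(text, sa, pattern))
```

All suffixes that start with the pattern sit next to each other in the array, so the two searches delimit the whole block of occurrences.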

Fast search with cost O(log(n)+|P|)
Query: …
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
[Figure: suffix array with positions 1, 2, …, n and the bounds α and β]

Fast search with cost O(log(n)+|P|)
Query: …
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
Algorithm (γ between α and β): if suff(γ) < suff(query) then α = γ else β = γ
[Figure: suffix array with positions 1, 2, …, n and the bounds α, γ and β]
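The loop below is a sketch of the search with the invariant α < query ≤ β kept explicit. It also remembers how many characters of the query are already known to match at α and at β, and resumes each comparison from the smaller of the two. This is an assumption about what P2 is for. On its own this shortcut does not guarantee the O(log(n)+|P|) worst case, which needs precomputed information about the common prefixes of neighbouring suffixes that is omitted here. All names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """1-based start positions of the suffixes, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


def lower_bound(text: str, sa: list[int], query: str) -> tuple[int, int]:
    """Return (rank of the first suffix >= query, query characters matched there)."""
    alpha, beta = -1, len(sa)            # sentinels: suff(alpha) < query <= suff(beta)
    matched_alpha = matched_beta = 0     # characters of query matched at each bound
    while beta - alpha > 1:
        gamma = (alpha + beta) // 2
        start = sa[gamma] - 1
        k = min(matched_alpha, matched_beta)   # these characters need no re-check
        while k < len(query) and start + k < len(text) and text[start + k] == query[k]:
            k += 1
        suffix_is_smaller = k < len(query) and (
            start + k == len(text) or text[start + k] < query[k])
        if suffix_is_smaller:
            alpha, matched_alpha = gamma, k
        else:
            beta, matched_beta = gamma, k
    return beta, matched_beta


def contains(text: str, sa: list[int], query: str) -> bool:
    rank, matched = lower_bound(text, sa, query)
    return rank < len(sa) and matched == len(query)


text = "ababaa#"
sa = suffix_array(text)
for query in ("abab", "aab", "ab"):
    print(query, contains(text, sa, query))
```

Skipping the first min(matched_alpha, matched_beta) characters is safe because every suffix ranked between α and β shares at least that many leading characters with the query.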