Presentation is loading. Please wait.

Presentation is loading. Please wait.

Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner.

Similar presentations


Presentation on theme: "Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner."— Presentation transcript:

1 Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

2 String Matching Pattern: CCATT Text: ACTGCCATTCCTTAGGGCCATGTG Brute force & variant Automata-like strategies Suffix trees 2 To do: [CLRS] 32 Supplements #5

3 Some Applications Text & programming languages –Spell-checking –Linguistic analysis –Tokenization –Virus scanning –Spam filtering –Database querying DNA sequence analysis Music identification & analysis 3

4 Brute Force Exact Match Pattern: CCATT Text: ACTGCCATTCCTTAGGGCCATGTG Algorithm? Example of worst case? Running time? 4

5 Rabin-Karp, 1981 5

6 Intuition for Better Algorithms Pattern: CCATT Text: CCAGCCATTCCTTAGGGCCATGTG After failing match at 1 st position, what should we do? 6

7 Quick Overview of Two Algorithms 7

8 Finite Automata Matching Example Pattern: CCATT Text: CCAGCCATTCCTTAGGGCCATGTG 8 CCACCAT CCATT CCC CC A C T C T C C

9 Regular Expressions 9

10 Regular Expression to Finite Automaton 10

11 Regular Expression to Finite Automaton 11

12 12

13 Eliminating Non-Determinism Each state is a set of the old states. 13 0 1 a a b 0 1 0,1 a ba b a b a,ba,b

14 Can Minimize 14 0 1 a a b 0 0,1 a b 1 a b a b a,ba,b

15 Finite Automaton Approach Summary Preprocessing expensive for REs. Matching is flexible and still linear. 15

16 Suffix Tree (Trie) Text: CCAGAT How many leaves, nodes, edges? Total space? How to search for a pattern? Running time? 16 TA TGATAGAT C CAGAT 5 4 321 6 GAT Each edge labeled with two indices.

17 Require Suffixes to End at Leaves Text: CCAGAT Reason: Simplicity. Distinguish nodes & leaves. Example text that would break that? 17 TA TGATAGAT C CAGAT 5 4 321 6 GAT

18 Forcing Suffixes to End at Leaves Text: CCAGAC$ 18 $A C$GAC$AGAC$ C CAGAC$ 5 4 321 7 GAC$ 6 $

19 How Would You Create the Tree? Text: CCAGAC$ 19 $A C$GAC$ C 5 4 321 7 6 $AGAC$CAGAC$

20 Suffix Tree Construction Example Text: nananabanana$ 20 nananabanana$

21 Suffix Tree Construction Example Text: nananabanana$ 21 nananabanana$ananabanana$

22 Suffix Tree Construction Example Text: nananabanana$ Match nanabanana$, nananabanana$ : 4 comparisons 22 nana banana$nabanana$ ananabanana$

23 Suffix Tree Construction Example Text: nananabanana$ Match anabanana$, ananabanana$ : 3 redundant comps Next … Match nabanana$, nanabanana$ : 2 redundant comps Match abanana$, anabanana$ : 1 redundant comp 23 nana banana$nabanana$ banana$ ana

24 First Algorithm Improvement: No Redundant Matching 24

25 Suffix Tree Construction Example Text: nananabanana$ Match nanabanana$, nananabanana$ : 4 comparisons 25 nana banana$nabanana$ ananabanana$

26 Suffix Tree Construction Example Text: nananabanana$ No matching. Just form new node and adjust edge string indices. 26 nana banana$nabanana$ banana$ ana

27 Suffix Tree Construction Example Text: nananabanana$ No matching. Just form new node and adjust edge string indices. 27 na banana$nabanana$ banana$ ana na banana$

28 Suffix Tree Construction Example Text: nananabanana$ No matching. Just form new node and adjust edge string indices. 28 na banana$nabanana$ banana$ na banana$ a

29 Suffix Tree Construction Example Text: nananabanana$ 29 na banana$nabanana$ banana$ na banana$ a

30 Suffix Tree Construction Example Text: nananabanana$ anana is there, but it takes work to match & follow links. 30 na banana$nabanana$banana$ na banana$ a $ na

31 Suffix Tree Construction Example Text: nananabanana$ nana is there, but it takes work to match & follow links. 31 na banana$nabanana$banana$ na banana$ a $ na$

32 Second Algorithm Improvement: Suffix Links 32 banana$a na banana$$na banana$$nabanana$banana$$na banana$$ $

33 Suffix Tree Construction Example Text: nananabanana$ Have to follow links for match on first time. Need to create suffix link for new node. 33 na banana$nabanana$banana$ na banana$ a $ na

34 Suffix Tree Construction Example Text: nananabanana$ Need to create suffix link for new node. Use parent’s suffix link to get close. 34 na banana$nabanana$banana$ na banana$ a $ na

35 Suffix Tree Construction Example Text: nananabanana$ 35 na banana$nabanana$banana$ na banana$ a $ na

36 Suffix Tree Construction Example Text: nananabanana$ Already found node. No searching or matching. 36 na banana$nabanana$banana$ na banana$ a $ na$

37 Suffix Tree Construction Example Text: nananabanana$ Follow suffix link. 37 na banana$nabanana$banana$ na banana$ a $ na$$

38 Suffix Tree Construction Example Text: nananabanana$ Follow suffix link. 38 na banana$nabanana$banana$ na banana$ a $ na$$ $

39 Suffix Tree Construction Example Text: nananabanana$ Follow suffix link. 39 na banana$nabanana$banana$ na banana$ a $ na$$ $$

40 Suffix Tree Construction Example 40 na banana$nabanana$banana$ na banana$ a $ na$$ $$ $

41 Almost Correct Analysis of Construction 41

42 Creating & Following Suffix Links 42

43 Correct Analysis of Construction 43

44 Some Applications of Suffix Trees Search for fixed patterns Search for regular expressions Find longest common substrings, longest repeated substrings Find most commonly repeated substrings Find maximal palindromes Find Lempel-Ziv decomposition (for text compression) As used in Bioinformatics Data compression Data clustering 44

45 Supplementary Resources Exact String Matching Algorithms >30 algorithms, with animations Course on string matching (Biosequencing) Wikipedia: Suffix treesSuffix trees Tutorial on Suffix treesSuffix trees Tutorial on Suffix trees with applet & codeSuffix trees Notes on string matching and building suffix treesstring matching building suffix trees Suffix tree slides adapted from those by Guy Blelloch, CMU 15-853. 45


Download ppt "Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner."

Similar presentations


Ads by Google