Published by Tiffany Morton. Modified over 9 years ago.
1
Pattern Matching
Rhys Price Jones, Anne R. Haake
2
Pattern Matching Algorithms: Review
Goal: find all occurrences of a pattern p in a text t, where p has length m and t has length n.
– The naïve algorithm and Rabin-Karp both have worst-case O(mn) and expected-case O(m+n) behavior.
– The automaton approach preprocesses p to yield an O(n) search.
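The slide names Rabin-Karp without showing it. Below is a minimal Python sketch, assuming a polynomial rolling hash with base 256 and the prime modulus 1,000,000,007 (illustrative choices, not from the slides); the function name `rabin_karp` is hypothetical. Each window's hash is updated in O(1), and every hash match is confirmed by a direct comparison, which is why the worst case is still O(mn).

```python
def rabin_karp(p: str, t: str, base: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return every 0-based index i such that t[i:i+m] == p."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, q)  # weight of the leading character in a window
    hp = ht = 0
    for i in range(m):  # hash of p and of the first window of t
        hp = (hp * base + ord(p[i])) % q
        ht = (ht * base + ord(t[i])) % q
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character.
        if hp == ht and t[i:i + m] == p:
            matches.append(i)
        if i < n - m:  # roll the window: drop t[i], append t[i + m]
            ht = ((ht - ord(t[i]) * high) * base + ord(t[i + m])) % q
    return matches
```

For example, `rabin_karp("ACT", "ACACTACT")` returns `[2, 5]`.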
3
Suffix Tree
Preprocess t instead, to yield an O(m) search for each pattern p.
This is useful if t is fixed and there are many patterns p to search for.
4
Tries
Tries are often used to retrieve keywords and provide efficient indexing. Suppose you want to index:
– PATTERN
– MONKEY
– PATAPAN
– PROBOSCIS
– PATHETIC
5
Build a Trie for PATTERN
root → { PATTERN }
6
Build a Trie for PATTERN, MONKEY
root → { PATTERN, MONKEY }
7
Build a Trie for PATTERN, MONKEY, PATAPAN
PATTERN and PATAPAN share the prefix PAT, so that edge splits:
root → { PAT → { TERN, APAN }, MONKEY }
8
Build a Trie for PATTERN, MONKEY, PATAPAN, PROBOSCIS
PROBOSCIS shares only P with PAT, so the PAT edge splits again:
root → { P → { AT → { TERN, APAN }, ROBOSCIS }, MONKEY }
9
Build a Trie for PATTERN, MONKEY, PATAPAN, PROBOSCIS, PATHETIC
PATHETIC follows P, then AT, and adds a new branch HETIC:
root → { P → { AT → { TERN, APAN, HETIC }, ROBOSCIS }, MONKEY }
Each keyword can be located in O(k) steps, where k is the length of the keyword.
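A minimal Python sketch of the O(k) lookup. It assumes an uncompressed trie (one character per edge, stored as nested dicts) rather than the compressed edges drawn on the slides, plus a hypothetical end-of-word key `"$"`; the names `build_trie` and `contains` are illustrative. Lookup follows one edge per character, so it takes O(k) dictionary steps for a keyword of length k.

```python
END = "$"  # hypothetical end-of-word key; assumes "$" never occurs in a keyword


def build_trie(words):
    """Uncompressed trie: each node is a dict from character to child node."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark that a keyword ends at this node
    return root


def contains(trie, word):
    """Follow one edge per character: O(k) steps for a word of length k."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node
```

With the five keywords above, the root has exactly two children, P and M, as in the drawn trie; `contains(trie, "PAT")` is `False` because PAT is only a prefix of keywords.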
10
Applications of a Trie
– Dictionary
  – Just check whether you reach a leaf to know that the word exists
  – Or store a link to the word's definition at the leaf
– Index for a book
  – Store at the leaf a list of all pages where the keyword appears
– Finding reserved words or filtering unwanted words
– …
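For the book-index application, a Python sketch that keeps a page list at the node where each keyword ends. The names `TrieNode`, `add` and `pages_for` are hypothetical, and the page numbers in the usage note are made up. Storing the list at the end node, rather than strictly at a leaf, also copes with a keyword that is a prefix of another keyword.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.pages = []     # pages for the keyword ending here (empty if none)


def add(root, word, page):
    """Record that `word` appears on `page`, ignoring duplicate entries."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    if page not in node.pages:
        node.pages.append(page)


def pages_for(root, word):
    """Pages recorded for `word`, or [] if it was never indexed."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.pages)
```

After `add(root, "PATTERN", 12)` and `add(root, "PATTERN", 40)`, `pages_for(root, "PATTERN")` returns `[12, 40]`, while `pages_for(root, "PAT")` returns `[]`.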
11
Suffix Tree
For a text t, the suffix tree is a trie for the set of suffixes of t. For t = BIOINFORMATICS:
BIOINFORMATICS, IOINFORMATICS, OINFORMATICS, INFORMATICS, …, ICS, CS, S
Build it on the board.
12
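Suffix Tree for ACACTACT
Suffixes: ACACTACT, CACTACT, ACTACT, CTACT, TACT, ACT, CT, T
Insert suffix 0 (leaf labels are the starting positions of the suffixes):
root → { ACACTACT → 0 }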
13
Suffix Tree for ACACTACT
Insert suffix 1, CACTACT:
root → { ACACTACT → 0, CACTACT → 1 }
14
Suffix Tree for ACACTACT
Insert suffix 2, ACTACT: it shares AC with suffix 0, so that edge splits after AC:
root → { AC → { ACTACT → 0, TACT → 2 }, CACTACT → 1 }
15
Suffix Tree for ACACTACT
Insert suffix 3, CTACT: it shares C with suffix 1, so that edge splits after C:
root → { AC → { ACTACT → 0, TACT → 2 }, C → { ACTACT → 1, TACT → 3 } }
16
Suffix Tree for ACACTACT
Insert suffix 4, TACT: no root edge starts with T, so it gets a new one:
root → { AC → { ACTACT → 0, TACT → 2 }, C → { ACTACT → 1, TACT → 3 }, TACT → 4 }
18
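What's the Problem?
The remaining suffixes ACT, CT and T are already spelled out in the tree: ACT ends partway along AC → TACT, CT partway along C → TACT, and T partway along TACT. Each of these suffixes is a prefix of another suffix, so it ends in the middle of an edge and gets no leaf of its own.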
19
What's the Fix?
Add a new symbol $ to the end of the string, a symbol that does not appear anywhere else. Then no suffix can be a prefix of another suffix, so every suffix ends at its own leaf.
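A quick Python check of why the $ is needed. The helper names are hypothetical; the function lists every pair of distinct suffixes in which the first is a prefix of the second.

```python
def suffixes(t):
    """All non-empty suffixes of t, longest first."""
    return [t[i:] for i in range(len(t))]


def suffix_prefix_pairs(t):
    """Pairs (s, u) of distinct suffixes of t where s is a prefix of u."""
    sufs = suffixes(t)
    return [(s, u) for s in sufs for u in sufs if s != u and u.startswith(s)]
```

For ACACTACT it reports `("ACT", "ACTACT")`, `("CT", "CTACT")` and `("T", "TACT")`, exactly the suffixes that got no leaf; for ACACTACT$ the list is empty.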
20
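Suffix Tree for ACACTACT$
Suffixes: ACACTACT$, CACTACT$, ACTACT$, CTACT$, TACT$, ACT$, CT$, T$, $
Insert suffix 0 (leaf labels are the starting positions of the suffixes):
root → { ACACTACT$ → 0 }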
21
Suffix Tree for ACACTACT$
Insert suffix 1, CACTACT$:
root → { ACACTACT$ → 0, CACTACT$ → 1 }
22
Suffix Tree for ACACTACT$
Insert suffix 2, ACTACT$ (the edge splits after AC):
root → { AC → { ACTACT$ → 0, TACT$ → 2 }, CACTACT$ → 1 }
23
Suffix Tree for ACACTACT$
Insert suffix 3, CTACT$ (the edge splits after C):
root → { AC → { ACTACT$ → 0, TACT$ → 2 }, C → { ACTACT$ → 1, TACT$ → 3 } }
24
Suffix Tree for ACACTACT$
Insert suffix 4, TACT$:
root → { AC → { ACTACT$ → 0, TACT$ → 2 }, C → { ACTACT$ → 1, TACT$ → 3 }, TACT$ → 4 }
25
Suffix Tree for ACACTACT$
Insert suffix 5, ACT$: it now diverges from TACT$ after AC → T, so that edge splits:
root → { AC → { ACTACT$ → 0, T → { ACT$ → 2, $ → 5 } }, C → { ACTACT$ → 1, TACT$ → 3 }, TACT$ → 4 }
The suffix-is-a-prefix problem went away.
26
Suffix Tree for ACACTACT$
Insert suffix 6, CT$ (the edge splits after C → T):
root → { AC → { ACTACT$ → 0, T → { ACT$ → 2, $ → 5 } }, C → { ACTACT$ → 1, T → { ACT$ → 3, $ → 6 } }, TACT$ → 4 }
27
Suffix Tree for ACACTACT$
Insert suffix 7, T$ (the edge splits after T):
root → { AC → { ACTACT$ → 0, T → { ACT$ → 2, $ → 5 } }, C → { ACTACT$ → 1, T → { ACT$ → 3, $ → 6 } }, T → { ACT$ → 4, $ → 7 } }
28
Suffix Tree for ACACTACT$
Insert suffix 8, $. The tree is complete:
root → { AC → { ACTACT$ → 0, T → { ACT$ → 2, $ → 5 } }, C → { ACTACT$ → 1, T → { ACT$ → 3, $ → 6 } }, T → { ACT$ → 4, $ → 7 }, $ → 8 }
29
Use for Testing Substrings
– ACT: follow the links from the root (AC, then T) and collect the leaves below: 2 and 5.
– CAC: follow C, then match AC along the edge ACTACT$; the leaf below is 1.
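A minimal Python sketch of the construction and lookup on these slides. It assumes the naive method from the build-up (insert suffixes one at a time, splitting an edge where a new suffix diverges) and a text that already ends in a unique `$`; the class and function names are hypothetical. `occurrences` follows the edges for p, allowing the match to stop partway along an edge (as CAC does), then collects the leaf labels below.

```python
class Node:
    def __init__(self, leaf=None):
        self.edges = {}   # first character of edge label -> (label, child Node)
        self.leaf = leaf  # starting position of the suffix, for leaves only


def build_suffix_tree(t):
    """Naive O(n^2) build: insert each suffix t[i:] in turn, splitting edges.
    Assumes t ends with a unique terminator such as '$'."""
    root = Node()
    for i in range(len(t)):
        node, s = root, t[i:]
        while True:
            if s[0] not in node.edges:  # no edge starts with s[0]: add a leaf
                node.edges[s[0]] = (s, Node(leaf=i))
                break
            label, child = node.edges[s[0]]
            j = 0  # length of the common prefix of label and s
            while j < len(label) and j < len(s) and label[j] == s[j]:
                j += 1
            if j == len(label):  # whole edge matched: descend and continue
                node, s = child, s[j:]
                continue
            mid = Node()  # mismatch inside the edge: split it after j characters
            mid.edges[label[j]] = (label[j:], child)
            mid.edges[s[j]] = (s[j:], Node(leaf=i))
            node.edges[s[0]] = (label[:j], mid)
            break
    return root


def occurrences(root, p):
    """Starting positions of p in t: match p from the root, then gather leaves."""
    node, k = root, 0
    while k < len(p):
        if p[k] not in node.edges:
            return []
        label, child = node.edges[p[k]]
        seg = p[k:k + len(label)]  # the part of p this edge must match
        if not label.startswith(seg):
            return []
        k += len(seg)
        node = child
    result, stack = [], [node]
    while stack:  # every leaf below the match point is an occurrence
        cur = stack.pop()
        if cur.leaf is not None:
            result.append(cur.leaf)
        stack.extend(c for _, c in cur.edges.values())
    return sorted(result)
```

With `tree = build_suffix_tree("ACACTACT$")`, `occurrences(tree, "ACT")` gives `[2, 5]` and `occurrences(tree, "CAC")` gives `[1]`, matching the slide.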
30
Reprise
To review the procedure, let's build a suffix tree for MISSISSIPPI on the board. Don't forget the $!
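Before drawing the MISSISSIPPI tree by hand, it can help to list the suffixes of MISSISSIPPI$ with their starting positions (the future leaf labels) in sorted order; the `trie` procedure in the code sorts its keywords the same way before grouping them. A small Python sketch with hypothetical function names; Python's default string order puts `$` before the capital letters, as ASCII does.

```python
def sorted_suffixes(t):
    """(start position, suffix) pairs for t, in sorted order of the suffix text."""
    return sorted(((i, t[i:]) for i in range(len(t))), key=lambda pair: pair[1])


def root_branches(t):
    """Distinct first characters of the suffixes: one root edge per character."""
    return sorted({s[0] for _, s in sorted_suffixes(t)})


for start, suffix in sorted_suffixes("MISSISSIPPI$"):
    print(f"{start:2d}  {suffix}")
```

The sorted starting positions are 11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2, and the first characters are $, I, M, P and S, so the root of the MISSISSIPPI$ tree has five children.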
31
Code
(define suffix-tree ; input: string t ending in a unique $; output: suffix tree of t
  (lambda (t)
    (trie (suffixes-of t) (string-length t))))

(define suffixes-of ; input: string; output: list of all its non-empty suffixes
  (lambda (t)
    (cond ((zero? (string-length t)) '())
          (else (cons t (suffixes-of (substring t 1 (string-length t))))))))

(define trie ; input: list of strings l and integer n
             ; output: suffix-tree-style trie with n - (length of keyword) at the leaves
  (lambda (l n) ; l is the list of keywords to put in the trie
    ; note: the argument order of sort differs between Scheme implementations
    (tries (sort (lambda (x y) (string<=? x y)) l) n)))
32
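More code
(define tries ; builds a trie from sorted list l; leaves as above
  ; samestarts, commonprefix, chop, singleton? and the make- constructors are
  ; helpers not shown on these slides; samestarts is taken to return the leading
  ; run of l whose strings share the first character of (car l)
  (lambda (l n)
    (cond ((null? l) (make-empty-trie))
          ((singleton? (samestarts l))
           ; no other keyword starts like (car l): one edge straight to a leaf
           (make-internal-node
            (make-edge (car l) (make-leaf (- n (string-length (car l)))))
            (tries (cdr l) n)))
          (else
           ; several keywords share a first character: factor out their common prefix
           (let* ((childstrings (samestarts l))
                  (label (commonprefix childstrings))
                  (childnode (trie (map (chop (string-length label)) childstrings)
                                   (- n (string-length label)))))
             (make-internal-node
              (make-edge label childnode)
              ; continue with the keywords after this group
              (tries (list-tail l (length childstrings)) n)))))))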
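The Scheme code relies on helpers the slides do not show. Below is a Python port sketch under stated assumptions: a trie is a list of (edge label, child) pairs, a leaf is its integer position, and `samestarts` returns the leading run of the sorted list that shares the first keyword's first character. These representations and helper bodies are guesses for illustration, not the authors' definitions.

```python
def suffixes_of(t):
    """All non-empty suffixes of t, longest first (mirrors suffixes-of)."""
    return [t[i:] for i in range(len(t))]


def samestarts(l):
    """Assumed meaning: leading run of sorted l sharing l[0]'s first character."""
    k = 0
    while k < len(l) and l[k][0] == l[0][0]:
        k += 1
    return l[:k]


def commonprefix(strings):
    """Longest common prefix of a non-empty list of strings."""
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def trie(l, n):
    return tries(sorted(l), n)


def tries(l, n):
    """Sorted keywords -> list of (label, child); a child is a leaf position
    (int) or another list of edges. Leaves hold n - len(keyword)."""
    if not l:
        return []
    group = samestarts(l)
    if len(group) == 1:  # one edge straight to a leaf
        return [(l[0], n - len(l[0]))] + tries(l[1:], n)
    label = commonprefix(group)  # shared prefix becomes one edge
    child = trie([s[len(label):] for s in group], n - len(label))
    return [(label, child)] + tries(l[len(group):], n)


def suffix_tree(t):
    """Assumes t already ends with a unique terminator such as '$'."""
    return trie(suffixes_of(t), len(t))
```

For ACACTACT$ this produces the same branching and leaf labels as the tree built on the slides, with the root edges ordered $, AC, C, T.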
33
Analysis
– Building the suffix tree: O(n²)
– Searching for p: O(m + k), where p appears k times
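A short derivation of both bounds, assuming the naive build that matches each new suffix from the root character by character, and a constant-size alphabet. The search bound uses the fact that every internal node below the root has at least two children, so a subtree with k leaves has at most k − 1 internal nodes.

```latex
\text{Build: } \sum_{i=0}^{n-1} \lvert t[i..n-1] \rvert
  = \sum_{i=0}^{n-1} (n - i) = \frac{n(n+1)}{2} = O(n^2)

\text{Search: } \underbrace{O(m)}_{\text{match } p \text{ from the root}}
  + \underbrace{O(k)}_{\text{visit the subtree with } k \text{ leaves}} = O(m + k)
```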
34
Improvement Possible
A suffix tree for a text of length n can be built in O(n) time. Thereafter every search for p is O(m), plus O(k) to report its k occurrences.
35
Applications in Biology
Suffix Trees in Computational Biology (link doesn't work)