Published by Corey Richards. Modified over 6 years ago.
1
PAT Trees: Index for arbitrary character sequences in text
• Gonnet (1983), based on the Patricia Tree (Morrison 68)
• Used for indexing the OED
• SISTRINGS: Semi-Infinite Strings, each identified by its starting position (pos)
  - A: I rise on a point of order which …
  - B: I rise on a point of objection to …
  - B < A in sistring order
• What if we enumerated all sistrings and sorted them?
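The ordering claim can be checked directly. A minimal sketch, assuming sistring order is plain character-by-character (lexicographic) comparison and using only the visible part of the two example strings; `first_difference` is a hypothetical helper name.

```python
# Only the visible part of each example sistring is used here.
A = "I rise on a point of order which"
B = "I rise on a point of objection to"

def first_difference(x, y):
    """Index of the first position where x and y differ (or the shorter length)."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n

i = first_difference(A, B)
print(repr(A[:i]))   # shared prefix of A and B
print(A[i], B[i])    # the two characters that decide the order
print(B < A)         # Python compares strings lexicographically
```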
2
STEP 1 : SISTRINGS
STRING: .can.a.can.can.cans?
SISTRINGS (in order of increasing OFFSET):
  .can.a.can.can.cans?
  can.a.can.can.cans?
  an.a.can.can.cans?
  n.a.can.can.cans?
  .a.can.can.cans?
  a.can.can.cans?
  .can.can.cans?
  can.can.cans?
  an.can.cans?
  n.can.cans?
  .can.cans?
  can.cans?
  an.cans?
  n.cans?
  .cans?
  cans?
  ans?
  ns?
  s?
  ?
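Step 1 can be sketched in a few lines. This is only an illustration: offsets are 0-based here (an assumption, the slide's own numbering did not survive), and each sistring is cut off at the end of the text rather than being truly semi-infinite.

```python
def sistrings(text):
    """Return (offset, sistring) pairs, one per starting position (0-based offsets assumed)."""
    return [(i, text[i:]) for i in range(len(text))]

text = ".can.a.can.can.cans?"
for offset, s in sistrings(text):
    print(f"{offset:2d}  {s}")
```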
3
STEP 2 : Sort and find minimal distinguishing prefixes
STRING: .can.a.can.can.cans?

SISTRING                 MINIMAL DISTINGUISHING PREFIX
.a.can.can.cans?         .a
.can.a.can.can.cans?     .can.a
.can.can.cans?           .can.can.
.can.cans?               .can.cans
.cans?                   .cans
?                        ?
a.can.can.cans?          a.
an.a.can.can.cans?       an.a
an.can.cans?             an.can.
an.cans?                 an.cans
ans?                     ans
can.a.can.can.cans?      can.a
can.can.cans?            can.can.
can.cans?                can.cans
cans?                    cans
n.a.can.can.cans?        n.a
n.can.cans?              n.can.
n.cans?                  n.cans
ns?                      ns
s?                       s
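A possible way to compute this table: after sorting, a sistring's minimal distinguishing prefix is one character longer than its longest common prefix with either sorted neighbour. The sketch below assumes plain ASCII ordering and a text whose last character is unique (as "?" is here), so no sistring is a prefix of another. The function names are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def minimal_prefixes(text):
    """Return (sistring, minimal distinguishing prefix) pairs in sorted order."""
    suffixes = sorted(text[i:] for i in range(len(text)))
    result = []
    for k, s in enumerate(suffixes):
        left = common_prefix_len(s, suffixes[k - 1]) if k > 0 else 0
        right = common_prefix_len(s, suffixes[k + 1]) if k + 1 < len(suffixes) else 0
        # One character past the longest overlap with a neighbour is enough.
        result.append((s, s[:max(left, right) + 1]))
    return result

for sistring, prefix in minimal_prefixes(".can.a.can.can.cans?"):
    print(f"{sistring:22s} {prefix}")
```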
4
STEP 4 : Create a Digital Trie from Prefixes
[Figure: digital trie built from the minimal distinguishing prefixes. Each edge carries one character (a, c, n, s, ?); each leaf is labeled with the offset at which its substring begins.]
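The trie construction can be sketched with nested dictionaries, one level per character. This is an illustration, not the slide's exact tree: offsets are 0-based (an assumption), the "." sistrings are included, and `build_trie`, `minimal_prefixes` and the `"offset"` leaf key are hypothetical names.

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def minimal_prefixes(text):
    """(minimal distinguishing prefix, offset) for every sistring, in sorted order."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    out = []
    for k, i in enumerate(order):
        left = lcp(text[i:], text[order[k - 1]:]) if k > 0 else 0
        right = lcp(text[i:], text[order[k + 1]:]) if k + 1 < len(order) else 0
        out.append((text[i:i + max(left, right) + 1], i))
    return out

def build_trie(prefixes):
    """Insert each prefix character by character; the leaf stores the offset."""
    root = {}
    for prefix, offset in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node["offset"] = offset  # multi-character key, cannot clash with an edge label
    return root

trie = build_trie(minimal_prefixes(".can.a.can.can.cans?"))
print(sorted(trie))                           # first-level branches
print(trie["c"]["a"]["n"]["s"]["offset"])     # leaf reached by the prefix "cans"
```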
5
STEP 5 : Simplify tree with use of skipped bits
[Figure: the same trie with non-branching chains collapsed. Each internal node records the # of missing (skipped) letters, marked x in the original drawing; leaves keep their offsets.]
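One way to sketch the simplification is to build the collapsed tree directly: at each node, skip forward while all sistrings in the group agree, record how many characters were skipped, then branch. The sketch counts skipped characters rather than bits, uses 0-based offsets, and the node layout (`"skip"`, `"children"`, `"offset"`) is an assumption made for illustration.

```python
def build(text, offsets, depth=0):
    """Path-compressed trie over the sistrings starting at `offsets`."""
    if len(offsets) == 1:
        return {"offset": offsets[0]}

    def char_at(o, k):
        # Slicing returns "" past the end of the text instead of raising.
        return text[o + k:o + k + 1]

    skip = 0
    # Advance while every sistring in the group has the same next character.
    while len({char_at(o, depth + skip) for o in offsets}) == 1:
        skip += 1
    groups = {}
    for o in offsets:
        groups.setdefault(char_at(o, depth + skip), []).append(o)
    return {
        "skip": skip,
        "children": {ch: build(text, grp, depth + skip + 1)
                     for ch, grp in groups.items()},
    }

text = ".can.a.can.can.cans?"
root = build(text, list(range(len(text))))
c_node = root["children"]["c"]
print(c_node["skip"])              # letters skipped after "c" before branching
print(sorted(c_node["children"]))  # branching characters below "can"
```

Searching such a tree has to finish with one comparison against the text at the leaf's offset, because the skipped letters are never checked on the way down.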
6
Patricia Tree
• Binary digital trie
• Convert each character to ASCII bits and make the trie over those bits (the figure shows the branches for a, c, n)
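The character-to-bits step might look like the sketch below. It assumes 8 bits per ASCII character; `first_diff_bit` is a hypothetical helper that returns the position a binary trie node would have to test to separate two keys.

```python
def to_bits(s):
    """Concatenate the 8-bit ASCII code of each character."""
    return "".join(format(ord(ch), "08b") for ch in s)

def first_diff_bit(a, b):
    """0-based index of the first bit where the encodings of a and b differ."""
    x, y = to_bits(a), to_bits(b)
    for i, (p, q) in enumerate(zip(x, y)):
        if p != q:
            return i
    return min(len(x), len(y))

for ch in "acn":
    print(ch, to_bits(ch))
print(first_diff_bit("a", "c"), first_diff_bit("a", "n"))
```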
7
Applications
• Prefix Searching
  - Follow the query characters down from the root (e.g. c, a, n): O(height), about O(log n) when the tree is balanced
  - If no branch for next character, then fail.
  - Enumerate all leaves sharing prefix: O(return set)
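The two phases of prefix searching (descend, then enumerate leaves) can be sketched on a plain, uncompressed suffix trie. This is a simplification for illustration, not the PAT tree itself: it stores every character of every sistring, uses 0-based offsets, and all names are hypothetical.

```python
def build_suffix_trie(text):
    """Uncompressed trie of all sistrings; each leaf stores its starting offset."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node["offset"] = i
    return root

def prefix_search(root, prefix):
    """Offsets of all sistrings that start with `prefix`."""
    node = root
    for ch in prefix:            # phase 1: walk down, O(height)
        if ch not in node:
            return []            # no branch for the next character: fail
        node = node[ch]
    found, stack = [], [node]
    while stack:                 # phase 2: collect leaves, O(return set) on a compressed tree
        cur = stack.pop()
        for key, child in cur.items():
            if key == "offset":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)

root = build_suffix_trie(".can.a.can.can.cans?")
print(prefix_search(root, "can"))
print(prefix_search(root, "cat"))
```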
8
Applications
• Longest Repetition Search
  - The longest repetition (here .can.can) ends at the internal node farthest from the root
  - Simplify on-line calculation by storing a bit to show which direction the longest subtree goes
• Most Frequent N-gram
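Both applications can be sketched without building the tree. The deepest internal node corresponds to the longest common prefix of two neighbouring sistrings in sorted order, so the first function scans sorted neighbours instead of walking a tree (a substitution made here for brevity). The n-gram count uses a plain counter where the tree would compare leaf counts at depth n. Offsets are 0-based and all names are hypothetical.

```python
from collections import Counter

def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def longest_repetition(text):
    """Longest substring occurring at least twice, with two of its offsets."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best_len, best_pair = 0, None
    for a, b in zip(order, order[1:]):
        n = lcp(text[a:], text[b:])
        if n > best_len:
            best_len, best_pair = n, (a, b)
    if best_pair is None:
        return "", None
    return text[best_pair[0]:best_pair[0] + best_len], best_pair

def most_frequent_ngram(text, n):
    """An n-gram with the highest count, and that count."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return counts.most_common(1)[0]

text = ".can.a.can.can.cans?"
print(longest_repetition(text))
print(most_frequent_ngram(text, 3))
```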
9
Problem with PAT Tree
• Enumerating a subtree is costly
10
Create Suffix Arrays
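A minimal sketch of the suffix array alternative: store only the sistring offsets in sorted order, so every prefix query maps to one contiguous slice found by binary search. The construction below simply sorts the suffixes, which is fine for a short example but not how large arrays are built. Offsets are 0-based and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def suffix_array(text):
    """Offsets of all sistrings, sorted by the sistring they start."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_prefix(text, sa, prefix):
    """Offsets whose sistring starts with `prefix`, via binary search on the array."""
    # Truncating sorted suffixes to len(prefix) keeps them sorted.
    # Materialising this list is O(n); it is done here only for clarity.
    keys = [text[i:i + len(prefix)] for i in sa]
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix)
    return sorted(sa[lo:hi])     # the matches are one contiguous slice of the array

text = ".can.a.can.can.cans?"
sa = suffix_array(text)
print(sa)
print(find_prefix(text, sa, "can"))
```

Enumerating the return set is then a walk over adjacent array entries instead of a subtree traversal.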