Published by Valerie Floyd. Modified over 9 years ago.
1 More Specialized Data Structures
- String data structures
- Spatial data structures
2 String Data Structures
3 String Operations
- String indexing
- Pattern matching
  - Find pattern P in text T
  - Find common substrings among a set of strings
- Application domains
  - Bioinformatics
  - Google search!
4 A simplified hash table for strings
0. Build a lookup table of size |Σ|^w for all w-length words in D
- Σ = {A, C, G, T}, w = 2, so the lookup table has 4^2 (= 16) entries
- Lookup table: AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
- S1: C A G T C C T (positions 1 to 7; w-length words S1,1 to S1,6)
- S2: C G T T C G C (positions 1 to 7; w-length words S2,1 to S2,6)
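As a minimal sketch of the lookup table on this slide, the Python below pre-creates all |Σ|^w entries and records where each w-length word occurs in S1 and S2. The function name `build_lookup_table` and the choice to store (string number, 1-based position) pairs are assumptions made for illustration; the slide does not specify what each table entry holds.

```python
from itertools import product


def build_lookup_table(strings, w, alphabet="ACGT"):
    """Map every w-length word over `alphabet` to its occurrences.

    Each occurrence is stored as (string number, start position), both
    1-based. This layout is an assumption for illustration.
    """
    # Pre-create all |alphabet|**w entries, including words that never occur.
    table = {"".join(chars): [] for chars in product(alphabet, repeat=w)}
    for i, s in enumerate(strings, start=1):
        for j in range(len(s) - w + 1):
            table[s[j:j + w]].append((i, j + 1))
    return table


table = build_lookup_table(["CAGTCCT", "CGTTCGC"], w=2)
print(len(table))    # 16 entries for |Σ| = 4, w = 2
print(table["GT"])   # [(1, 3), (2, 2)]
```

A word shared by both strings, such as GT or TC, shows up as an entry with occurrences from both S1 and S2, which is what makes the table useful for finding common substrings of length w.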
5 Lookup Table - Discussion
- Will work best for:
  - Fixed-size word matches
  - Really short word matches
- Space complexity is O(|Σ|^w)
  - If |Σ| = 2 and w = 30, then space is 2^30, or about 10^9
- What to do if the pattern/words are arbitrarily long?
  - Suffix trees, suffix arrays, PATRICIA trees
6 PATRICIA trees
- “Practical Algorithm to Retrieve Information Coded in Alphanumeric”
- Compacted trie of a set of strings
- Dictionary searches made easy
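To make "compacted trie of a set of strings" concrete, here is a small sketch of a character-level compacted trie with dictionary insert and lookup. It is an illustration only: the original PATRICIA structure branches on bits rather than characters, and the names `Node`, `insert`, `contains`, and the sample words are all assumptions, not taken from the slides.

```python
class Node:
    """Node of a character-level compacted trie (not bit-level PATRICIA)."""

    def __init__(self):
        self.edges = {}        # first character -> (edge label, child node)
        self.is_word = False   # True if a stored string ends at this node


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, word):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            # No edge starts with this character: hang the rest on one edge.
            leaf = Node()
            leaf.is_word = True
            node.edges[first] = (word, leaf)
            return
        label, child = node.edges[first]
        common = common_prefix_len(label, word)
        if common < len(label):
            # Split the edge where the label and the word diverge.
            mid = Node()
            mid.edges[label[common]] = (label[common:], child)
            node.edges[first] = (label[:common], mid)
            child = mid
        node, word = child, word[common:]
    node.is_word = True


def contains(root, word):
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None or not word.startswith(entry[0]):
            return False
        label, node = entry
        word = word[len(label):]
    return node.is_word


root = Node()
for w in ["romane", "romanus", "romulus", "rubens", "ruber"]:
    insert(root, w)
print(contains(root, "romulus"), contains(root, "roman"))  # True False
```

Because each edge carries a whole substring, a lookup touches at most one node per branching point rather than one node per character.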
7 Suffix Tree
- Compacted trie of all suffixes of a string
- Example: B A N A N A (positions 1 to 6)
- Find pattern: “ANAN”
- Think: how would you implement Google Search?
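The pattern search on this slide can be sketched with an uncompacted suffix trie, which is simpler to write than a true suffix tree but answers the same query. Note the substitution: this is a plain trie using O(n^2) space, not the compacted, linear-space structure the slide describes. The function names and the use of the key `None` to hold suffix start positions are assumptions for illustration.

```python
def build_suffix_trie(text):
    """Uncompacted trie of all suffixes of `text`.

    Each node is a dict from character to child node. Under the key None,
    a node keeps the 1-based start positions of every suffix passing
    through it.
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
            node.setdefault(None, []).append(start + 1)
    return root


def find(root, pattern):
    """Return the 1-based positions where `pattern` occurs in the text."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return node.get(None, [])


trie = build_suffix_trie("BANANA")
print(find(trie, "ANAN"))  # [2]
print(find(trie, "ANA"))   # [2, 4]
```

The search walks one character of the pattern per step, so its cost depends on the pattern length rather than the text length, which is the property that makes suffix structures attractive for indexing.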
8 Generalized Suffix Tree (GST)
[Figure: generalized suffix tree for string 1 = WINDOW$ and string 2 = INDIGO$ (positions 1 to 7 each). Each leaf is labeled (string number, suffix start position), e.g. (1, 1) for WINDOW$ and (2, 1) for INDIGO$.]
9 Suffix Arrays
- Sorted array of suffixes
- Obtained by enumerating the leaves of a suffix tree in lexicographic order
- Suffix arrays + “LCP” arrays = suffix trees
- Example: sorted suffixes of abracadabra:
  a, abra, abracadabra, acadabra, adabra, bra, bracadabra, cadabra, dabra, ra, racadabra
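A short sketch of the suffix array and LCP array for the abracadabra example follows. The construction here is the naive sort of all suffixes, chosen for brevity rather than efficiency; the LCP array uses Kasai's algorithm. Positions are 0-based in this code, and all function names are assumptions for illustration.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai's algorithm.

    lcp[k] is the length of the longest common prefix of the suffixes at
    sa[k - 1] and sa[k]; lcp[0] is 0.
    """
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def find_occurrences(text, sa, pattern):
    """Binary search for the block of suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix whose prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                      # first suffix whose prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "abracadabra"
sa = suffix_array(text)
print(sa)                                   # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
print(lcp_array(text, sa))                  # [0, 1, 4, 1, 1, 0, 3, 0, 0, 0, 2]
print(find_occurrences(text, sa, "abra"))   # [0, 7]
```

Reading the suffixes in `sa` order reproduces the sorted list on the slide, from `a` through `racadabra`.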
10 Spatial Data Structures
11 Spatial Data Structures
Operation type: spatial queries on high-dimensional data
- Range queries
- Nearest neighbor search
Data structures:
- Quad-trees
- Oct-trees
- k-d trees
- Range trees
- R-trees
[Figure: points in 2-D with a bounding rectangle]
12 Recursive Bisection
- Technique for spatial domain decomposition
- Quad trees (4-way trees)
[Figure: recursive bisection of a 2-D domain and the corresponding quad tree, drawn from the root]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005
13 Compacted Quad-trees (for 2D data)
- Each internal node has exactly 4 children (for 4 quadrants)
- Compact each chain of nodes with a single non-empty child into a single edge
- For 3D data, the corresponding tree is called an oct-tree
[Figure: 2D space with data, and its quad-tree decomposition]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005
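14 Range Queries on Quad-trees
[Figure: quad-tree domain with origin (0, 0) and a query rectangle with corners (a1, b1) and (a2, b2); the range query result is the set of points inside the rectangle]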
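The range query on a quad-tree can be sketched as follows: a simple, uncompacted region quad-tree that splits a square cell into four quadrants whenever it holds more than one point, and a query that skips any cell not overlapping the rectangle from (a1, b1) to (a2, b2). The class layout, the one-point-per-leaf rule, and the `MAX_DEPTH` guard against duplicate points are assumptions for illustration, and all inserted points are assumed to lie inside the root square.

```python
class QuadTree:
    """Region quad-tree over the square with corner (x0, y0) and side `size`."""

    MAX_DEPTH = 32  # stops endless splitting when duplicate points are inserted

    def __init__(self, x0, y0, size, depth=0):
        self.x0, self.y0, self.size, self.depth = x0, y0, size, depth
        self.points = []       # points held here while this node is a leaf
        self.children = None   # four sub-quadrants once the node is split

    def _child_for(self, p):
        half = self.size / 2
        east = p[0] >= self.x0 + half
        north = p[1] >= self.y0 + half
        return self.children[2 * east + north]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > 1 and self.depth < self.MAX_DEPTH:
            half = self.size / 2
            # Child order matches _child_for: index = 2 * dx + dy.
            self.children = [
                QuadTree(self.x0 + dx * half, self.y0 + dy * half,
                         half, self.depth + 1)
                for dx in (0, 1) for dy in (0, 1)
            ]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q)

    def range_query(self, a1, b1, a2, b2):
        """Return all points (x, y) with a1 <= x <= a2 and b1 <= y <= b2."""
        # Prune cells whose square does not intersect the query rectangle.
        if (a2 < self.x0 or a1 > self.x0 + self.size or
                b2 < self.y0 or b1 > self.y0 + self.size):
            return []
        if self.children is None:
            return [(x, y) for (x, y) in self.points
                    if a1 <= x <= a2 and b1 <= y <= b2]
        found = []
        for child in self.children:
            found.extend(child.range_query(a1, b1, a2, b2))
        return found


tree = QuadTree(0, 0, 8)
for p in [(1, 1), (2, 5), (6, 2), (7, 7), (5, 5), (3, 3)]:
    tree.insert(p)
print(sorted(tree.range_query(2, 2, 6, 6)))  # [(2, 5), (3, 3), (5, 5), (6, 2)]
```

The pruning test is what saves work: whole subtrees outside the query rectangle are never visited.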
15 Oct-Trees (for 3D data)
- Issue: what happens if the data is unevenly (i.e., non-uniformly) distributed?
  - Most of the levels in the tree will be empty
- Solution: “Compacted Oct-trees”
16 k-d trees (for k dimensions)
- Maintain a combined binary search tree for all dimensions
- Recursively bisect each dimension, alternating dimensions at each level of the tree
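The alternating bisection described above can be sketched in a few lines. This version splits at the median along the current axis and adds a nearest neighbor search, one of the spatial queries listed earlier in the deck. The dictionary-based node layout, the median split rule, and the function names are assumptions for illustration; the slides do not prescribe a particular construction.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two k-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_kdtree(points, depth=0):
    """Build a k-d tree by median split, cycling through the k axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return the stored point closest to `target` (None for an empty tree)."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the
    # best point found so far.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(tree["point"])           # (7, 2): median along the first axis
print(nearest(tree, (9, 2)))   # (8, 1)
```

The same code works unchanged for any k, since the axis is chosen as the depth modulo the point dimension.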