1 More Specialized Data Structures: String data structures, Spatial data structures


2 String Data Structures

3 String Operations
String indexing
Pattern matching: find pattern P in text T; find common substrings among a set of strings
Application domains: bioinformatics, Google search!

4 A simplified hash table for strings
0. Build a lookup table of size |Σ|^w for all w-length words in D
Σ = {A, C, G, T}, w = 2, so the lookup table has 4^2 (= 16) entries
Lookup table keys: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT
S1: C A G T C C T (positions 1 to 7), words S1,1 ... S1,6
S2: C G T T C G C (positions 1 to 7), words S2,1 ... S2,6
(Si,j is the w-length word starting at position j of string Si)
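The lookup table on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `build_lookup_table` and the choice to store 1-based (string, position) pairs are assumptions made here.

```python
from itertools import product


def build_lookup_table(strings, w, alphabet="ACGT"):
    """Map every w-length word over `alphabet` to the places where it occurs.

    Each occurrence is stored as (i, j): string number i, start position j,
    both 1-based to match the slide's S_i,j notation (an assumption).
    """
    # Pre-create all |alphabet|**w entries, so the table has a fixed size.
    table = {"".join(p): [] for p in product(alphabet, repeat=w)}
    for i, s in enumerate(strings, start=1):
        for j in range(len(s) - w + 1):
            table[s[j:j + w]].append((i, j + 1))
    return table


table = build_lookup_table(["CAGTCCT", "CGTTCGC"], w=2)
print(len(table))    # number of entries: 4**2
print(table["GT"])   # occurrences of the word GT in S1 and S2
```

Looking up a word is then a single dictionary access, which is why the approach suits short, fixed-size words.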

5 Lookup Table - Discussion
Will work best for: fixed-size word matches; really short word matches
Space complexity is O(|Σ|^w)
If |Σ| = 2 and w = 30, then space is 2^30, or about 10^9
What to do if the pattern/words are arbitrarily long? Suffix trees, suffix arrays, PATRICIA trees

6 PATRICIA trees
“Practical Algorithm to Retrieve Information Coded in Alphanumeric”
Compacted trie of a set of strings
Dictionary searches made easy
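A compacted trie can be sketched as follows. Note the assumptions: this is a character-level compacted (radix) trie rather than the bit-level structure of the original PATRICIA paper, and the names `RadixNode`, `insert`, `contains` and the sample words are illustrative only.

```python
class RadixNode:
    def __init__(self):
        self.edges = {}        # first character -> (edge label, child node)
        self.is_word = False   # True if a stored string ends at this node


def insert(root, word):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            # No edge starts with this character: hang the rest on one edge.
            leaf = RadixNode()
            leaf.is_word = True
            node.edges[first] = (word, leaf)
            return
        label, child = node.edges[first]
        # Length of the common prefix of the edge label and the word.
        n = 0
        while n < len(label) and n < len(word) and label[n] == word[n]:
            n += 1
        if n < len(label):
            # Split the edge at the point where label and word diverge.
            mid = RadixNode()
            mid.edges[label[n]] = (label[n:], child)
            node.edges[first] = (label[:n], mid)
            child = mid
        node, word = child, word[n:]
    node.is_word = True


def contains(root, word):
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            return False
        label, child = entry
        if not word.startswith(label):
            return False
        node, word = child, word[len(label):]
    return node.is_word


root = RadixNode()
for w in ["romane", "romanus", "romulus", "rubens"]:
    insert(root, w)
print(contains(root, "romulus"), contains(root, "rom"))
```

Because chains of single-child nodes are merged into one labelled edge, a dictionary search touches at most one node per branching point rather than one per character.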

7 Suffix Tree
Compacted trie of all suffixes of a string
Example string: B A N A N A (positions 1 to 6)
Find pattern: “ANAN”
Think how to implement Google Search?
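The pattern search on this slide can be illustrated with a simplified sketch. For brevity it builds an uncompacted suffix trie (quadratic space), not the compacted suffix tree the slide describes, and it stores at every node the 1-based start positions of the suffixes passing through it. The names `build_suffix_trie` and `find` are assumptions.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a trie of nested dictionaries."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            # Record which suffix (1-based start) passes through this node.
            node["starts"].append(start + 1)
    return root


def find(root, pattern):
    """Return the 1-based positions where `pattern` occurs in the text."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


trie = build_suffix_trie("BANANA")
print(find(trie, "ANAN"))
print(find(trie, "ANA"))
```

The search follows one path of length |P| from the root, independent of the text length, which is the property a real suffix tree provides in linear space.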

8 Generalized Suffix Tree (GST)
String 1: WINDOW$ (positions 1 to 7)
String 2: INDIGO$ (positions 1 to 7)
[Figure: generalized suffix tree of the two strings; each leaf is labeled (i, j) for the suffix of string i starting at position j]
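One use of a generalized suffix structure is finding common substrings among a set of strings. The sketch below is a simplification under stated assumptions: it uses an uncompacted generalized suffix trie, omits the `$` terminators, and marks each node with the set of strings whose suffixes pass through it. The names `Node`, `build_generalized_suffix_trie` and `longest_common_substring` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.owners = set()   # numbers of the strings that reach this node


def build_generalized_suffix_trie(strings):
    root = Node()
    for i, s in enumerate(strings, start=1):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                node.owners.add(i)
    return root


def longest_common_substring(strings):
    root = build_generalized_suffix_trie(strings)
    everyone = set(range(1, len(strings) + 1))
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) > len(best):
            best = prefix
        # Only descend into nodes reached by suffixes of every string.
        for ch, child in node.children.items():
            if child.owners == everyone:
                stack.append((child, prefix + ch))
    return best


print(longest_common_substring(["WINDOW", "INDIGO"]))
```

If several common substrings share the maximum length, which one is returned depends on traversal order.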

9 Suffix Arrays
Sorted array of suffixes
Obtained by enumerating the leaves of a suffix tree in order
Suffix array + “LCP” arrays = suffix tree
Sorted suffixes of abracadabra: a, abra, abracadabra, acadabra, adabra, bra, bracadabra, cadabra, dabra, ra, racadabra
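A suffix array and its LCP array for the slide's example can be sketched as below. This is a naive construction that sorts the suffixes directly, chosen for clarity rather than speed; positions are 0-based and the function names are assumptions.

```python
def suffix_array(text):
    """Start positions of all suffixes of `text`, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the longest common prefix of the suffixes at
    sa[k - 1] and sa[k]; lcp[0] is 0 by convention."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


def find_occurrences(text, sa, pattern):
    """Binary search for the block of suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                       # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "abracadabra"
sa = suffix_array(text)
print([text[i:] for i in sa])
print(lcp_array(text, sa))
print(find_occurrences(text, sa, "bra"))
```

All occurrences of a pattern sit in one contiguous block of the array, so two binary searches are enough to find them.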

10 Spatial Data Structures

11 Spatial Data Structures
Operation type: spatial queries on high-dimensional data
- range queries
- nearest neighbor search
Data structures: quad-trees, oct-trees, k-d trees, range trees, R-trees
[Figure: points in 2-D and their bounding rectangle]

12 Recursive Bisection
Technique for spatial domain decomposition
Quad trees (4-way trees)
[Figure: quad tree drawn from its root]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005

13 Compacted Quad-trees (for 2D data)
Each node has exactly 4 children (for 4 quadrants)
Compact path into single edge
For 3D data, the corresponding tree is called an oct-tree
[Figure: 2D space with data, and its quad-tree decomposition]
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005

14 Range Queries on Quad-trees
[Figure: 2D space with origin (0,0); a range query rectangle with corners (a1, b1) and (a2, b2), and the range query result]
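A range query on a quad-tree can be sketched as follows. This is a minimal, uncompacted point quad-tree under several assumptions made here: each leaf holds at most `capacity` points, all points are distinct, node regions are half-open squares, and the query rectangle with corners (a1, b1) and (a2, b2) is closed. The class and method names are illustrative.

```python
class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=1):
        self.bounds = (x0, y0, x1, y1)   # region [x0, x1) x [y0, y1)
        self.capacity = capacity
        self.points = []
        self.children = None             # four sub-quadrants once split

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._split()
        return any(c.insert(p) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        cap = self.capacity
        self.children = [
            QuadTree(x0, y0, mx, my, cap), QuadTree(mx, y0, x1, my, cap),
            QuadTree(x0, my, mx, y1, cap), QuadTree(mx, my, x1, y1, cap),
        ]
        # Push the stored points down into the new quadrants.
        old, self.points = self.points, []
        for q in old:
            any(c.insert(q) for c in self.children)

    def range_query(self, a1, b1, a2, b2):
        """Points with a1 <= x <= a2 and b1 <= y <= b2."""
        x0, y0, x1, y1 = self.bounds
        if a2 < x0 or a1 >= x1 or b2 < y0 or b1 >= y1:
            return []                    # query does not touch this region
        found = [p for p in self.points
                 if a1 <= p[0] <= a2 and b1 <= p[1] <= b2]
        for c in self.children or []:
            found.extend(c.range_query(a1, b1, a2, b2))
        return found


tree = QuadTree(0, 0, 8, 8)
for pt in [(1, 1), (2, 5), (6, 2), (7, 7), (3, 3), (5, 6)]:
    tree.insert(pt)
print(sorted(tree.range_query(2, 2, 6, 6)))
```

The pruning test at the top of `range_query` is what makes the query cheaper than a full scan: whole quadrants that miss the rectangle are skipped without visiting their points.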

15 Oct-Trees (for 3D data)
Issue: What happens if the data is unevenly (i.e., non-uniformly) distributed?
Most of the levels in the tree will be empty
Solution: “Compacted Oct-trees”

16 k-d trees (for k dimensions)
Maintain a combined binary search tree for all dimensions
Recursively bisect each dimension, alternating dimensions at each level of the tree
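The alternating bisection described here can be sketched as below. The sketch assumes the tree is built once from a static list of points by splitting at the median, and it answers an axis-aligned range query; the names `build_kdtree` and `range_search` and the sample points are illustrative.

```python
def build_kdtree(points, depth=0):
    """Split at the median of one coordinate, cycling through dimensions."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                       # alternate dimension per level
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def range_search(node, lo, hi):
    """Points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node is None:
        return []
    p, axis = node["point"], node["axis"]
    found = [p] if all(l <= c <= h for l, c, h in zip(lo, p, hi)) else []
    if lo[axis] <= p[axis]:                # left side may still overlap
        found += range_search(node["left"], lo, hi)
    if p[axis] <= hi[axis]:                # right side may still overlap
        found += range_search(node["right"], lo, hi)
    return found


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kdtree = build_kdtree(points)
print(kdtree["point"])
print(sorted(range_search(kdtree, (3, 1), (8, 5))))
```

Unlike a quad-tree, each node splits on a single coordinate, so the same code works for any number of dimensions k.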

