1
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer
2
Digital Data
In earlier work with BSTs and various balanced trees, we compared keys for order or equality.
Here, we take advantage of the structure of the key:
  Use it as an index, or
  Decompose a string key into characters, or
  Treat the key as a numerical quantity on which we can perform operations.
3
Assumptions
We will construct and manipulate sets that are drawn from a universe U of size N, U = {u_0, …, u_{N-1}}.
A relatively simple procedure exists by which we can compute, for an element u ∈ U, the index i such that u = u_i.
  Easy if U is a set of integers.
  Also easy if U is a set of characters with character codes in a contiguous interval.
4
Bit Vector
Used to represent a subset S ⊆ U.
A table of N bits, Bits[0..N-1]:
  Bits[i] == 1 if u_i ∈ S
  Bits[i] == 0 if u_i ∉ S
Example: today's attendance (1 = present, 0 = absent)
  Bits:           1 1 0 1 0 1 1
  Student number: 0 1 2 3 4 5 6
5
Bit Vectors
Assume:
  determining an element's index takes constant time
  accessing a position in the table takes constant time
These may actually take several operations, and depend somewhat on N (the size of the universe), but not on the size of the set represented.
Then: Insert, Delete, and Member are constant-time operations.
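The three operations can be sketched in a few lines. This is only an illustration, assuming the universe is the integers 0..N-1 (so the index of an element is the element itself); the class name BitVector and the byte-based packing are choices made for this sketch, not something the slides specify.

```python
class BitVector:
    """Subset of the universe {0, ..., n-1}, stored as one bit per element."""

    def __init__(self, n):
        self.n = n
        # One bit per universe element, packed 8 per byte; all zero = empty set.
        self.bits = bytearray((n + 7) // 8)

    def insert(self, i):
        self.bits[i >> 3] |= 1 << (i & 7)

    def delete(self, i):
        # Clear bit i; the 0xFF mask keeps the value inside one byte.
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def member(self, i):
        return ((self.bits[i >> 3] >> (i & 7)) & 1) == 1


# The attendance example: students 0, 1, 3, 5 and 6 are present.
attendance = BitVector(7)
for student in (0, 1, 3, 5, 6):
    attendance.insert(student)
```

Each operation touches one byte with a shift and a mask, regardless of how many elements the set holds.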
6
Bit Vectors
A subset of a set of size N always takes N bits to represent, independent of the size of the subset.
Makes sense if:
  N is not too large
  we need to represent sets of size comparable to N
7
Storage Efficiency: Bit Vector vs. Binary Trees
Binary tree, set of size n: requires n(2p + K) bits
  K >= lg N, size of the field that represents a key value
  p = number of bits in a pointer
Bit vector: takes N bits
If n ≈ N, then the bit vector is more efficient.
If p = K = 32, then the tree becomes more space efficient when n/N falls below about 1%.
  The exact break-even point is n(2p + K) = N, that is, n/N = 1/96.
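The break-even arithmetic can be checked directly. In this small sketch the function names and the universe size N = 96,000 are made up for illustration; only the formulas n(2p + K) and N come from the slide.

```python
def tree_bits(n, p=32, K=32):
    """Bits for a binary tree holding n keys: two pointers plus one key per node."""
    return n * (2 * p + K)


def bit_vector_bits(N):
    """Bits for a bit vector over a universe of size N, whatever the set size."""
    return N


N = 96_000                        # hypothetical universe size
break_even_n = N // (2 * 32 + 32)  # n at which n(2p + K) = N when p = K = 32
```

With these numbers the break-even set size is 1000 elements, which is 1/96 of the universe; smaller sets favor the tree, larger sets favor the bit vector.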
8
When to use Bit Vectors?
When the universe is relatively small.
When the sets are large in relation to the size of the universe.
9
Advantages of Bit Vectors
O(1) implementation of Insert, Delete, Member.
Union and Intersection are easy:
  Implemented via Boolean AND and OR operations.
  May actually take less than one operation per element, as operations are performed on full machine words.
  If the machine word is 32 bits, then one machine operation handles 32 potential elements of the set.
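A rough sketch of word-at-a-time union and intersection follows. Python integers are not fixed-width machine words, so the 32-bit word size here is simulated by convention; the helper names to_words and to_elements are invented for the example.

```python
WORD = 32  # assumed machine word size, as in the slide


def to_words(elements, n):
    """Pack a set of indices from 0..n-1 into a list of 32-bit words."""
    words = [0] * ((n + WORD - 1) // WORD)
    for i in elements:
        words[i // WORD] |= 1 << (i % WORD)
    return words


def union(a, b):
    # One OR per word covers 32 potential elements.
    return [x | y for x, y in zip(a, b)]


def intersection(a, b):
    # One AND per word covers 32 potential elements.
    return [x & y for x, y in zip(a, b)]


def to_elements(words):
    """Unpack the words back into a sorted list of indices."""
    return [w * WORD + b
            for w, word in enumerate(words)
            for b in range(WORD)
            if (word >> b) & 1]
```

For a universe of 64 elements, each union or intersection is just two word operations, however many elements the sets contain.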
10
Disadvantages of Bit Vectors
On some computers, access to individual bits can require shifting and masking operations (expensive).
  The result is that Member may be much more expensive than Union.
Initialization takes Θ(N) time: zero all the bits in the vector.
  A constant-time initialization algorithm can be used instead,
  but that makes the storage requirement go to 2p + 1 bits per element.
  So, in practice, just use machine operations to set the vector to zero, which are efficient.
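The slide does not say which constant-time initialization algorithm it has in mind. One well-known scheme that matches the 2p + 1 bits per element figure keeps two extra index arrays beside the bit table and trusts an entry only if the two arrays agree about it. The sketch below assumes that scheme; the class and field names are invented, and the random fill merely imitates uninitialized memory (so the constructor itself is not constant time in Python).

```python
import random


class LazyBitVector:
    """Bit vector whose tables are never cleared; only `count` is initialized."""

    def __init__(self, n, garbage_seed=0):
        rng = random.Random(garbage_seed)
        # Stand-ins for uninitialized memory: arbitrary contents.
        self.bit = [rng.randrange(2) for _ in range(n)]
        self.where = [rng.randrange(n) for _ in range(n)]  # p bits per element
        self.which = [rng.randrange(n) for _ in range(n)]  # p bits per element
        self.count = 0  # number of positions touched so far

    def _valid(self, i):
        # Position i has been written only if where[i] points into the used
        # part of `which`, and that slot points back at i.
        j = self.where[i]
        return j < self.count and self.which[j] == i

    def member(self, i):
        return self._valid(i) and self.bit[i] == 1

    def insert(self, i):
        if not self._valid(i):
            self.where[i] = self.count
            self.which[self.count] = i
            self.count += 1
        self.bit[i] = 1

    def delete(self, i):
        if self._valid(i):
            self.bit[i] = 0
```

Every operation stays constant time, but each element now costs two indices plus its bit, which is why plain zeroing is usually preferred.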
11
Tries and Digital Search Trees
If the key can be decomposed into characters, then the characters of the key can be used as indices.
Tries are based on this idea.
"trie" is the middle syllable of "retrieval", a pun on "tree", but pronounced "try".
12
Tries
Assume k possible character values.
A trie is a (k+1)-ary tree; each node is a table of k+1 pointers:
  One pointer for each possible character
  One for the end-of-string character
13
Trie Example
14
Tries
The path for a key of m characters has length m, with a pointer at the end-of-string character.
Don't need to store the key itself: it is the path followed.
An info field might be pointed to by the end-of-string element.
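A minimal sketch of such a trie, assuming the alphabet is the uppercase letters A..Z (so k = 26) and that table slot k holds the end-of-string pointer. The function names and the use of a Python list as the node table are choices made for this illustration.

```python
K = 26     # alphabet size k: the letters A..Z (an assumption)
END = K    # table slot k serves as the end-of-string pointer


def new_node():
    """A node is a table of k+1 pointers, all initially null."""
    return [None] * (K + 1)


def insert(root, key, info=True):
    node = root
    for ch in key:
        i = ord(ch) - ord("A")       # the character itself is the index
        if node[i] is None:
            node[i] = new_node()
        node = node[i]
    node[END] = info                 # the key is never stored, only the path


def lookup(root, key):
    """Return the info stored for key, or None if key is absent."""
    node = root
    for ch in key:
        node = node[ord(ch) - ord("A")]
        if node is None:
            return None
    return node[END]


root = new_node()
for name in ("TURING", "MENDEL", "MENDELEEV"):
    insert(root, name, info=len(name))
```

Note that MEND is a path in this trie but not a key, because its end-of-string slot is still null.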
15
Tries: Analysis
Let:
  n be the number of keys stored in the trie
  l be the length (in characters) of the longest key
  s be the number of nodes in the trie
  k be the size of the alphabet
Pro: access time is O(l), independent of k, n, and s
Con: size, requires (k+1) * s * p bits, where p is the number of bits in a pointer
  Most pointers are null, so lots of wasted space
16
Strategies for reducing storage requirements of tries
1. Implement a k-ary trie with m nodes as a 2-D, m by k table.
[Table: one row per node (rows labeled 0 to 5), one column per character (A B C D E … M … P … T …). Entries are node numbers 1 to 10; a dash marks an empty entry.]
17
Table approach
Number the nodes in the diagram of slide 13 from 1 to m.
The table entry corresponding to the j-th child of the i-th node is the index of the child node.
How does that save space? There are just as many nodes and elements as on slide 13, but each element needs only ceil(lg(m)) bits to represent, which is smaller than a pointer.
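The table layout might look like the sketch below. It assumes the alphabet A..Z plus a "$" column standing in for the end-of-string character, numbers nodes from 1 with node 1 as the root, and uses 0 to mean "no child"; all of these conventions and names are assumptions of the example rather than details given on the slides.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ$"   # "$" = end of string (assumption)
COL = {ch: j for j, ch in enumerate(ALPHABET)}


def build_table(keys):
    """Return a list of rows; row i holds the child node numbers of node i."""
    table = [None, [0] * len(ALPHABET)]    # row 0 unused, node 1 is the root
    for key in keys:
        node = 1
        for ch in key + "$":
            j = COL[ch]
            if table[node][j] == 0:        # no child yet: allocate a new row
                table.append([0] * len(ALPHABET))
                table[node][j] = len(table) - 1
            node = table[node][j]
    return table


def member(table, key):
    node = 1
    for ch in key + "$":
        node = table[node][COL[ch]]
        if node == 0:
            return False
    return True


table = build_table(["TURING", "MENDEL", "MENDELEEV"])
m = len(table) - 1                 # number of nodes
bits_per_entry = m.bit_length()    # enough bits to hold any value in 0..m
```

Here each entry fits in a handful of bits instead of a full pointer. Because this sketch reserves 0 for "no child", it needs room for m + 1 distinct values, which can be one bit more than the slide's ceil(lg(m)) when m is a power of two.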
18
Patricia Tree: Another strategy for reducing space in a trie
Patricia: Practical Algorithm to Retrieve Information Coded in Alphanumeric
Eliminate nodes with only one nonempty child.
  Can now skip right from T to the end-of-string character in TURING in our example.
  Skip from MA … to E or the end-of-string character in the MENDEL, MENDELEEV chain.
But need to store with each node the index of the character on which it discriminates.
And need to store the key itself at the leaf.
19
Patricia tree
20
de la Briandais trees
Another strategy to save space vs. standard tries.
Use a linked list instead of a table at the node level.
  Each pointer is labeled with the character it indexes.
  Longer search time than tries; depends on the size of the character set.
  Saves significant amounts of memory.
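One way to picture the linked-list layout is the sketch below, where each node carries its character label, a pointer to its next sibling, and a pointer to its first child. The class and function names, and the "$" end-of-string marker, are assumptions made for this illustration.

```python
END = "$"   # stand-in for the end-of-string character (assumption)


class Node:
    def __init__(self, ch):
        self.ch = ch          # the character this pointer is labeled with
        self.sibling = None   # next alternative at the same level
        self.child = None     # first node of the next level


def _find_or_add(parent, ch):
    """Scan parent's child list for ch, appending a new node if it is absent."""
    prev, node = None, parent.child
    while node is not None and node.ch != ch:
        prev, node = node, node.sibling
    if node is None:
        node = Node(ch)
        if prev is None:
            parent.child = node
        else:
            prev.sibling = node
    return node


def insert(root, key):
    node = root
    for ch in key + END:
        node = _find_or_add(node, ch)


def member(root, key):
    node = root
    for ch in key + END:
        node = node.child
        while node is not None and node.ch != ch:   # linear scan of siblings
            node = node.sibling
        if node is None:
            return False
    return True


root = Node(None)
for name in ("TURING", "MENDEL", "MENDELEEV"):
    insert(root, name)
```

Only characters that actually occur get a node, which is where the memory saving comes from; the price is the sibling scan at each level, which can cost up to the alphabet size.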
21
de la Briandais
22
Another strategy …
Use tries at the first few levels.
Use ordinary BSTs or de la Briandais trees at the lower levels.
Reasoning:
  speed advantage at the top, without too much extra memory required
  save space at the lower levels
23
Digital Search Trees
Treat keys as bit strings (strings over the alphabet {0,1}).
Binary tree: search is directed left on 0, right on 1.
Each node contains not only two pointers, but also a key that matches the bit-string prefix leading to that node.
Compare for equality before searching left or right.
If frequencies are known, store higher-frequency keys nearer the root.
Can be grown dynamically.
Expected search time: O(log n)
24
Digital Search Tree