Published by Brian West. Modified over 6 years ago.
1
Computer Science 112 Fundamentals of Programming II
Implementation Strategies for Unordered Collections
2
What They Are
Bag - a collection of items in no particular order
Set - a collection of unique items in no particular order
Dictionary - a collection of values associated with unique keys
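For a quick feel for the three definitions, here is a small sketch using Python's built-in types as stand-ins. This is an analogy only: the course builds its own Bag, Set, and Dictionary classes, and using collections.Counter as a bag is an assumption made for illustration.

```python
from collections import Counter

letters = ["a", "b", "a"]

bag = Counter(letters)        # a bag keeps duplicates: two "a"s and one "b"
unique = set(letters)         # a set keeps one copy of each item
table = {"a": 1, "b": 2}      # a dictionary maps unique keys to values

print(sum(bag.values()), len(unique), len(table))
```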
3
Variations
SortedBag - a bag that allows clients to access items in sorted order
SortedSet - a set that allows clients to access items in sorted order
SortedDictionary - a dictionary that allows clients to access keys in sorted order
4
Sorted Set and Dictionary Implementations
Array-based, using a sorted list
Linked, using a linked binary search tree
The tree must be kept balanced; insertions and removals are then logarithmic, as searches are
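The array-based option might look roughly like the sketch below. It assumes Python's standard bisect module for the binary search; the class name SortedListSet is hypothetical and is not part of the course code. Searching is logarithmic, but an insertion still has to shift elements, so it is linear in the worst case.

```python
import bisect

class SortedListSet:
    """Hypothetical sorted set backed by a sorted Python list."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def __contains__(self, item):
        # Binary search: O(log n)
        i = bisect.bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def add(self, item):
        # Finding the position is O(log n); the list insert is O(n)
        i = bisect.bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)

    def __iter__(self):
        # Items come back in sorted order
        return iter(self._items)

    def __len__(self):
        return len(self._items)

s = SortedListSet([5, 1, 3, 1])
print(list(s))
```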
5
Dictionary Interface
    d.isEmpty()
    len(d)
    iter(d)                            # Iterate through the keys
    str(d)
    key in d
    d.get(key, defaultValue = None)
    item = d[key]
    d[key] = item                      # Add or replace
    d.pop(key, defaultValue = None)
    d.entries()                        # A set of entries
    d.keys()                           # An iterator on the keys
    d.values()                         # An iterator on the values
6
Dictionary Implementations
Array-based (like ArraySet and ArraySortedSet)
Linked structure (like LinkedSet and TreeSortedSet)
All use an Entry class to contain the key/value pair
7
Possible Organization I
Class hierarchy (diagram): AbstractCollection, AbstractBag, ArrayBag, LinkedBag, ArraySet, LinkedSet, ArrayDict, LinkedDict
Is a dictionary just a type of set with some additional methods?
8
Possible Organization II
Class hierarchy (diagram): AbstractCollection, AbstractBag, AbstractDict, ArrayBag, LinkedBag, ArrayDict, LinkedDict, ArraySet, LinkedSet
Which methods are implemented in AbstractDict?
9
The Entry Class
    class Entry(object):

        def __init__(self, key, value):
            self.key = key
            self.value = value

        def __eq__(self, other):
            if type(self) != type(other):
                return False
            return self.key == other.key

        def __lt__(self, other):
            return self.key < other.key

        def __le__(self, other):
            return self.key <= other.key
Goes in abstractdict.py, where all dictionaries can see it
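Because Entry compares by key only, an array-based dictionary can search for a key by building a throwaway probe entry. The sketch below illustrates that idea; the ListDict class is hypothetical (it is not the course's ArrayDict), and only the Entry methods it needs are repeated so the example runs on its own.

```python
class Entry(object):
    """Key/value pair; equality looks at the key only, as on the slide."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __eq__(self, other):
        if type(self) != type(other):
            return False
        return self.key == other.key

class ListDict:
    """Hypothetical list-backed dictionary, for illustration only."""

    def __init__(self):
        self._entries = []

    def __setitem__(self, key, value):
        probe = Entry(key, value)
        if probe in self._entries:          # list search uses Entry.__eq__
            self._entries[self._entries.index(probe)].value = value
        else:
            self._entries.append(probe)

    def __getitem__(self, key):
        probe = Entry(key, None)
        if probe not in self._entries:
            raise KeyError(key)
        return self._entries[self._entries.index(probe)].value

    def __len__(self):
        return len(self._entries)

d = ListDict()
d["a"] = 1
d["b"] = 2
d["a"] = 3      # same key: the value is replaced, no new entry is added
```

Every operation here is a linear search, which is the cost that hashing is meant to remove.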
10
The AbstractDict Class
    from abstractcollection import AbstractCollection

    class AbstractDict(AbstractCollection):

        def __init__(self):
            AbstractCollection.__init__(self, None)

        def __str__(self):
            # Format each entry as key:value, separated by commas
            return "{" + ", ".join(map(lambda entry: str(entry.key) +
                                       ":" + str(entry.value),
                                       self.entries())) + "}"
Sample output: {2:3, 6:7}
11
Can We Do Better?
If we could associate each unordered set element or each unordered dictionary key with a unique index position in an array, we could have
Constant-time search
Constant-time insertion
Constant-time removal
12
Hashing
Each data element has a unique hash value, which is an integer
This value can be computed in constant time by a hash function
This computation can be performed on each insertion, access, and removal
13
How Are the Elements Stored?
The hash value is used to locate the element’s index in an array, thus preserving constant-time access
How to compute this: hashValue % capacity of array
Position will be >= 0 and < capacity
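The index computation can be tried directly. This sketch uses small integers as items because, in CPython, a small non-negative integer hashes to itself, which makes the arithmetic easy to follow; the helper name index_for and the capacity of 16 are assumptions made for illustration.

```python
def index_for(item, capacity):
    """Map an item's hash value to a position in an array of this capacity."""
    return abs(hash(item)) % capacity

capacity = 16
for item in (10, 21, 32, 47):
    # 10 % 16 = 10, 21 % 16 = 5, 32 % 16 = 0, 47 % 16 = 15
    print(item, "->", index_for(item, capacity))
```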
14
A Sample Access Method (Set)
    def __contains__(self, item):
        # Assumes each item maps to its own index (no collisions)
        index = abs(hash(item)) % len(self._array)
        return self._array[index] is not None
self._array is an array of items
len(self._array) is the array’s current physical size
hash(item) is a function that returns an item’s hash value
Other access methods have a similar structure
15
A Sample Mutator Method (Set)
    def add(self, item):
        if item not in self:
            index = abs(hash(item)) % len(self._array)
            self._array[index] = item
16
Adding Items
mySet.add("A")    # index = 10
Array (in index order): A
17
Adding Items
mySet.add("B")    # index = 5
Array (in index order): B A
18
Adding Items
mySet.add("C")    # index = 0
Array (in index order): C B A
19
Adding Items
mySet.add("D")    # index = 15
Array (in index order): C B A D
20
Adding Items
Array (in index order): C B A D
Add 12 more items
21
Adding Items
Array: C E M Q B N F K T W A G L Y I D
Array is full
Resize the array and rehash all elements
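Resizing cannot simply copy items across, because each index depends on the array's capacity. A minimal sketch of the rehash step follows; it assumes the new array is twice as large (the slides do not say by how much to grow) and, as the slides do at this point, that no two items collide. The function name rehash is hypothetical.

```python
def rehash(old_array):
    """Return a new array of twice the capacity with every item re-placed."""
    new_array = [None] * (2 * len(old_array))
    for item in old_array:
        if item is not None:
            # The index must be recomputed against the new capacity
            new_array[abs(hash(item)) % len(new_array)] = item
    return new_array

small = [None] * 4
for n in (4, 9, 2, 7):                  # indexes 0, 1, 2, 3: the array is full
    small[abs(hash(n)) % len(small)] = n

bigger = rehash(small)                  # capacity 8: indexes 4, 1, 2, 7
print(small)
print(bigger)
```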
22
Performance
O(1) lookups, insertions, removals - wow!
Cost of resizing the array is amortized over many insertions and removals
Works as long as hashValue % capacity is not the same for two items
23
Problem: Collisions
As more elements fill the array, the likelihood that their hash values map to the same array position increases
A collision then occurs: that is, items compete for the same position in the array
24
A Tester Program
    def testHash(arrayLength = 10, numberOfItems = 5):
        # Print each item's hash code and the array index it maps to
        print("Array length: ", arrayLength)
        print("Number of items: ", numberOfItems)
        for i in range(1, numberOfItems + 1):
            item = "Item" + str(i)
            code = hash(item)
            index = abs(code) % arrayLength
            print("%7s%12d%8d" % (item, code, index))
25
Load Factor
An array’s load factor expresses the ratio of the number of elements to its capacity
Example: 10 elements / array length of 30 = 0.3333
Try to keep the load factor low to minimize collisions
Does waste some memory, though
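The effect of the load factor can be checked with a short experiment. The sketch below counts collisions for the same ten keys at two array lengths; the keys are arbitrary integers chosen for illustration (small integers hash to themselves in CPython, so the run is repeatable), and the function name count_collisions is hypothetical.

```python
def count_collisions(items, array_length):
    """Count items whose index is already taken by an earlier item."""
    used = set()
    collisions = 0
    for item in items:
        index = abs(hash(item)) % array_length
        if index in used:
            collisions += 1
        else:
            used.add(index)
    return collisions

items = [12, 25, 37, 41, 58, 63, 72, 86, 95, 107]    # 10 integer keys
for length in (10, 30):
    load_factor = len(items) / length
    print(length, round(load_factor, 4), count_collisions(items, length))
```

With these keys, a load factor of 1.0 gives three collisions and a load factor of 0.3333 gives one. A lower load factor reduces collisions but does not rule them out.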
26
Collision Processing Strategies
Linear collision processing - search for the next available empty slot in the array, wrapping around if the end is reached
Can lead to clustering, where several elements that have collided now occupy consecutive positions
Several small clusters may coalesce into a large cluster and thus degrade performance
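A minimal sketch of linear collision processing follows, assuming the array always has at least one empty slot (otherwise the loop would not end). The function name linear_probe_add is hypothetical, and integer keys are used so the indexes are easy to trace.

```python
def linear_probe_add(array, item):
    """Insert item with linear collision processing; return the index used."""
    index = abs(hash(item)) % len(array)
    while array[index] is not None:
        if array[index] == item:
            return index                     # already present
        index = (index + 1) % len(array)     # next slot, wrapping around
    array[index] = item
    return index

table = [None] * 8
placed = [linear_probe_add(table, n) for n in (5, 13, 21, 7, 15)]
print(placed)
print(table)
```

The keys 5, 13, and 21 all map to index 5, so they fill positions 5, 6, and 7. Key 7 then finds its own position taken and wraps to 0, and 15 ends up at 1: a single cluster now covers positions 5 through 1.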
27
Collision Processing Strategies
Rehashing - run one or more additional hash functions until a collision does not occur
Works well when the load factor is small
Multiple hash functions may contribute a large constant of proportionality to the running time
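As a rough sketch of rehashing, the function below tries a list of hash functions in turn. The two hash functions shown are made up purely so the example is easy to trace; they are not recommended choices, and the name rehash_add is hypothetical.

```python
def rehash_add(array, item, hash_functions):
    """Place item using the first hash function that avoids a collision."""
    for h in hash_functions:
        index = h(item) % len(array)
        if array[index] is None or array[index] == item:
            array[index] = item
            return index
    raise RuntimeError("every hash function produced a collision")

def primary(x):
    return abs(hash(x))

def secondary(x):
    # Made-up second hash function, for illustration only
    return abs(hash(x)) // 7 + 1

table = [None] * 8
placed = [rehash_add(table, n, [primary, secondary]) for n in (5, 13, 21)]
print(placed)
```

Keys 13 and 21 both collide with 5 at index 5 under the first function; the second function sends them to indexes 2 and 4.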
28
Collision Processing Strategies
Quadratic collision processing - move a considerable distance from the initial collision
Does not require other rehashing functions
When k is the collision position, we enter a loop that repeatedly attempts to locate an empty position
    k + 1²    // The first attempt to locate a position
    k + 2²    // The second attempt to locate a position
    k + r²    // The rth attempt to locate a position
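The probe sequence k + 1², k + 2², ..., k + r² can be sketched as follows. Wrapping the position with the modulus and giving up after len(array) - 1 attempts are assumptions; the slide does not specify either. The function name quadratic_probe_add is hypothetical.

```python
def quadratic_probe_add(array, item):
    """Insert item with quadratic collision processing; return its index."""
    k = abs(hash(item)) % len(array)          # home position
    if array[k] is None or array[k] == item:
        array[k] = item
        return k
    for r in range(1, len(array)):            # the r-th attempt tries k + r**2
        index = (k + r * r) % len(array)      # wrap around (an assumption)
        if array[index] is None or array[index] == item:
            array[index] = item
            return index
    raise RuntimeError("no empty position found")

table = [None] * 11
placed = [quadratic_probe_add(table, n) for n in (3, 14, 25, 36)]
print(placed)
```

All four keys have home position 3. They land at 3, 4 (3 + 1), 7 (3 + 4), and 1 (3 + 9, wrapped), so the colliding items are spread out rather than packed into consecutive slots.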
29
Collision Processing Strategies
Chaining
Each hash value specifies an index or bucket in the array
This bucket is at the head of a linked structure or chain of items with the same hash value
30
Some Buckets and Chains
index   chain
0       D5 -> D2
1       D6 -> D4
2       (empty)
3       D8
4       D7 -> D3 -> D1
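A bucket-and-chain layout like this one can be imitated with a list of Python lists. This is a simplification for illustration: the HashSet in these slides uses singly linked nodes rather than lists, and the name build_buckets and the integer keys are assumptions.

```python
def build_buckets(items, capacity):
    """Distribute items into chains, adding each at the head of its chain."""
    buckets = [[] for _ in range(capacity)]
    for item in items:
        index = abs(hash(item)) % capacity
        buckets[index].insert(0, item)     # newest item goes first
    return buckets

buckets = build_buckets([4, 14, 1, 6, 9, 11, 3, 10], 5)
for index, chain in enumerate(buckets):
    print(index, chain)
```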
31
HashSet Data
    # Instance variables for locating data
    self._foundEntry    # Pointer to item just located;
                        # undefined if not found
    self._priorEntry    # Pointer to item prior to one just located
    self._index         # Index of chain in which item was located

    # Instance variables for data
    self._array         # The array of collision lists
    self._size          # Number of items in the set
Extra instance variables support pointer manipulations during insertions and removals
32
HashSet Initialization
    from node import Node
    from arrays import Array    # Assumed module for the course's Array class
    from abstractset import AbstractSet
    from abstractcollection import AbstractCollection

    class HashSet(AbstractSet, AbstractCollection):

        DEFAULT_CAPACITY = 1000

        def __init__(self, sourceCollection = None):
            self._array = Array(HashSet.DEFAULT_CAPACITY)
            self._foundEntry = self._priorEntry = None
            self._index = -1
            AbstractCollection.__init__(self, sourceCollection)
Uses singly linked nodes for the collision lists
33
HashSet Searching
    def __contains__(self, item):
        self._index = abs(hash(item)) % len(self._array)
        self._priorEntry = None
        self._foundEntry = self._array[self._index]
        while self._foundEntry is not None:
            if self._foundEntry.data == item:
                return True
            else:
                self._priorEntry = self._foundEntry
                self._foundEntry = self._foundEntry.next
        return False
If this method returns True, the instance variables _index, _foundEntry, and _priorEntry allow other methods to locate and manipulate an item in the array’s collision list efficiently
34
HashSet Insertion
    def add(self, item):
        if item not in self:
            newEntry = Node(item, self._array[self._index])
            self._array[self._index] = newEntry
            self._size += 1
Link to head of chain
35
HashSet Removal
    def remove(self, item):
        if item not in self:
            raise KeyError(str(item) + " not in set")
        elif self._priorEntry is None:
            self._array[self._index] = self._foundEntry.next
        else:
            self._priorEntry.next = self._foundEntry.next
        self._size -= 1
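To try the search, insertion, and removal methods together without the course modules, here is a standalone sketch. It swaps in a plain Python list for the Array class, defines its own minimal Node, and omits the AbstractSet and AbstractCollection parents; the class name SimpleHashSet and the small default capacity are assumptions made so that collisions are easy to produce.

```python
class Node:
    """Singly linked node (stand-in for the course's node module)."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next

class SimpleHashSet:
    """Hypothetical standalone version of the chained hash set."""

    DEFAULT_CAPACITY = 8

    def __init__(self, source=None, capacity=DEFAULT_CAPACITY):
        self._array = [None] * capacity
        self._size = 0
        self._foundEntry = self._priorEntry = None
        self._index = -1
        if source is not None:
            for item in source:
                self.add(item)

    def __len__(self):
        return self._size

    def __contains__(self, item):
        # Side effect: records the chain index and the found/prior nodes
        self._index = abs(hash(item)) % len(self._array)
        self._priorEntry = None
        self._foundEntry = self._array[self._index]
        while self._foundEntry is not None:
            if self._foundEntry.data == item:
                return True
            self._priorEntry = self._foundEntry
            self._foundEntry = self._foundEntry.next
        return False

    def add(self, item):
        if item not in self:
            # Link the new node to the head of the chain
            self._array[self._index] = Node(item, self._array[self._index])
            self._size += 1

    def remove(self, item):
        if item not in self:
            raise KeyError(str(item) + " not in set")
        if self._priorEntry is None:
            self._array[self._index] = self._foundEntry.next
        else:
            self._priorEntry.next = self._foundEntry.next
        self._size -= 1

s = SimpleHashSet([1, 9, 17, 2])   # 1, 9, and 17 share chain 1 at capacity 8
s.add(9)                           # duplicate: ignored
s.remove(9)                        # unlinks a node from the middle of a chain
print(len(s), 9 in s, 17 in s)
```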
36
Performance of Chaining
If chains are evenly distributed across the array, close to O(1)
If one or two chains get very long, processing tends to be linear
Can use a large array but wastes memory
On the average and for the most part, close to O(1)
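The difference between evenly distributed and lopsided chains can be seen by counting chain lengths. In this sketch the integer keys are chosen deliberately to produce the two extremes, and the function name chain_lengths is hypothetical.

```python
def chain_lengths(items, capacity):
    """Return the number of items that land in each chain."""
    lengths = [0] * capacity
    for item in items:
        lengths[abs(hash(item)) % capacity] += 1
    return lengths

even = chain_lengths(range(100), 10)              # 10 items in every chain
skewed = chain_lengths(range(0, 1000, 10), 10)    # all 100 items in chain 0
print(max(even), max(skewed))
```

In the first case a search examines at most 10 nodes; in the second it may examine all 100, which is the linear behavior the slide warns about.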
37
For Friday
Introduction to Graphs (Chapter 12)