CS-514 Final Project
How circular arrays behave under successive rounds of uniform insertions and deletions
Diogo Andrade, Gábor Rudolf
Experiments
Consider a circular array with n elements and load factor r = n / (array size). Let I be the number of rounds of insertions and deletions. Initialize the array with n elements, then run I iterations, each consisting of one insertion and one deletion. The insertion position and the element to delete are both selected uniformly at random.
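The experiment loop can be sketched as follows. This is a minimal illustration, not the authors' code: it tracks only which cells are occupied, assumes that elements shift in one fixed direction (so an insertion at an occupied position fills the first free cell after it and costs one shift per element passed), and assumes the insertion comes before the deletion in each round. The names `insert`, `simulate`, `m`, and `seed` are hypothetical.

```python
import random


def insert(occupied, pos):
    """Insert at pos; return the number of elements that had to shift.

    Assumption: elements shift toward higher indices (cyclically), so in
    terms of occupancy the first free cell at or after pos gets filled.
    """
    m = len(occupied)
    shifts = 0
    while occupied[(pos + shifts) % m]:
        shifts += 1
    occupied[(pos + shifts) % m] = True
    return shifts


def simulate(n, m, rounds, seed=0):
    """Run `rounds` iterations of one uniform insertion and one uniform deletion."""
    if not 0 < n < m:
        raise ValueError("need 0 < n < m so that every insertion finds a free cell")
    rng = random.Random(seed)
    occupied = [False] * m
    for i in rng.sample(range(m), n):      # random initialization
        occupied[i] = True
    shifts_per_round = []
    for _ in range(rounds):
        # Insertion at a uniformly chosen position of the array.
        shifts_per_round.append(insert(occupied, rng.randrange(m)))
        # Deletion of a uniformly chosen element.
        filled = [i for i, cell in enumerate(occupied) if cell]
        occupied[rng.choice(filled)] = False
    return occupied, shifts_per_round
```

For example, `simulate(50, 100, 1000)` runs 1000 rounds at load factor r = 0.5 and returns the final occupancy together with the shift count of every round.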
Data generated by experiments
Distribution, average, and standard deviation of the number of shifts per iteration
Number of clusters
Distribution of cluster sizes in the “stationary” state
Measure of clustering (sum of the logarithms of the gaps)
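The cluster statistics listed above could be computed from an occupancy list roughly as follows. This sketch assumes that a cluster is a maximal cyclic run of occupied cells and that a gap is a maximal cyclic run of free cells; the slides do not spell out the gap definition, so the gap measure here is one possible reading. All function names are hypothetical.

```python
import math


def run_lengths(cells, value):
    """Lengths of the maximal cyclic runs of `value` in `cells`."""
    m = len(cells)
    if all(c == value for c in cells):
        return [m] if m else []
    # Start right after a cell that differs from `value`, so that no run
    # is split by the wraparound.
    start = next(i for i in range(m) if cells[i] != value)
    runs, length = [], 0
    for k in range(1, m + 1):
        if cells[(start + k) % m] == value:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    return runs


def cluster_sizes(occupied):
    """Sizes of all clusters (runs of occupied cells)."""
    return run_lengths(occupied, True)


def gap_measure(occupied):
    """Sum of the logarithms of the gap sizes (assumed: gap = run of free cells)."""
    return sum(math.log(g) for g in run_lengths(occupied, False))
```

The number of clusters is then `len(cluster_sizes(occupied))`.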
Results: Convergence
The number of clusters per iteration and the gap measure converge after some iterations, regardless of the array parameters and of how the array is initialized.
Three different initializations were used: random, one big chunk, and successive insertions.
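The three initializations might look like this in code. It is a sketch under assumptions: "random" is read as n distinct cells chosen uniformly, "one big chunk" as n consecutive occupied cells, and "successive insertions" as n uniform insertions into an empty array, each filling the first free cell at or after the chosen position. The function names are hypothetical.

```python
import random


def fill_first_free(occupied, pos):
    """Occupy the first free cell at or after pos (cyclically)."""
    m = len(occupied)
    while occupied[pos % m]:
        pos += 1
    occupied[pos % m] = True


def init_random(n, m, rng):
    """n distinct cells chosen uniformly at random."""
    occupied = [False] * m
    for i in rng.sample(range(m), n):
        occupied[i] = True
    return occupied


def init_one_chunk(n, m):
    """A single cluster occupying cells 0 .. n-1."""
    return [i < n for i in range(m)]


def init_successive(n, m, rng):
    """n uniform insertions into an initially empty array (requires n <= m)."""
    occupied = [False] * m
    for _ in range(n):
        fill_first_free(occupied, rng.randrange(m))
    return occupied
```

Starting the same simulation from each of these arrays and plotting the number of clusters per iteration is one way to check the convergence claim.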
Results: Convergence
Histogram of Shifts and Cluster Sizes
Distribution of Shifts
The probability of having to make k shifts after an insertion is determined by the sizes of the clusters:
P[k shifts] = (# clusters with size >= k) / (array size)
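The formula can be evaluated directly from a list of cluster sizes. The sketch below assumes the formula is meant for k >= 1 (each cluster of size at least k contains exactly one position that costs k shifts) and that an insertion into a free cell costs 0 shifts, so P[0 shifts] is the fraction of free cells. The function name is hypothetical.

```python
from fractions import Fraction


def shift_distribution(cluster_sizes, array_size):
    """Exact distribution of the number of shifts for one uniform insertion."""
    occupied = sum(cluster_sizes)
    # Assumption: inserting into a free cell needs no shifts.
    dist = {0: Fraction(array_size - occupied, array_size)}
    for k in range(1, max(cluster_sizes, default=0) + 1):
        clusters_at_least_k = sum(1 for s in cluster_sizes if s >= k)
        dist[k] = Fraction(clusters_at_least_k, array_size)
    return dist
```

As a sanity check, the probabilities sum to 1, because summing the number of clusters of size at least k over all k >= 1 gives the total number of elements.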
Approximation by Geometric & Modified Geometric Distribution
Results: Dependency on load factor
The number of shifts per iteration and the average number of clusters depend only on the load factor of the array.
The gap measure depends on both the load factor and the array size.
Results: Shifts - Dependency on load factor
Results: Gap measure - Dependency on load factor and array size
Hashing with Linear Probing
The experiment models the behavior of a dynamic hash table with open addressing and linear probing.
The static case has been studied extensively; see, for example, Knuth, 1963.
We compare the long-term behavior of the dynamic table with the static case as described by Knuth’s formulas.
Comparison with static case
Expected number of shifts for inserting the last element in the static case (Knuth’s formula)
Long-term behavior in our experiment
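The formula shown on the original slide is not preserved here. For reference, the sketch below uses the well-known large-table approximation from Knuth's analysis of linear probing: an insertion into a table with load factor α examines about (1 + 1/(1 − α)²) / 2 cells. Reading "shifts" as the number of occupied cells passed (probes minus one) is an assumption, and the function names are hypothetical.

```python
def knuth_insert_probes(alpha):
    """Approximate expected probes for an insertion at load factor alpha (static case)."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha) ** 2)


def knuth_insert_shifts(alpha):
    """Assumed shift count: occupied cells passed, i.e. probes minus one."""
    return knuth_insert_probes(alpha) - 1.0


# At load factor 0.5 the approximation gives 2.5 probes, i.e. 1.5 shifts.
print(knuth_insert_probes(0.5), knuth_insert_shifts(0.5))
```

These values can be set against the long-run average number of shifts per iteration measured in the experiment at the same load factor.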
Future Work
Prove the convergence results
Derive formulas for the distribution, average, and standard deviation of the number of shifts
Compare further with Knuth’s results
Analyze the time it takes to reach the “stationary” state from different initial arrays (most importantly for successive insertions, which correspond to a hash table)