Dynamics of Binary Search Trees under batch insertions and deletions with duplicates

Arun Mahendra, Dept. of Math, Physics & Engineering, Tarleton State University
Mentor: Dr. Mircea Agapie

BACKGROUND
The complexity of many operations on Binary Search Trees (BSTs) is proportional to the height of the tree, so height is a crucial performance parameter. In the worst case, it is possible to obtain “skinny” BSTs, whose height is equal or close to the total number of nodes N. This is no better than using an array as the data structure. If only insertions are performed in the BST, it can be shown analytically that the average height is approximately 3·log₂(N). But if both insertions and deletions are performed (as happens in most real-life applications), the process is not analytically tractable. Empirical evidence indicates that the average height is still proportional to log₂(N).

OBJECTIVE
We conduct a systematic study of insertions and deletions in BSTs of various sizes, and investigate the statistics of the height of the tree: average, standard deviation, and coefficient of variation.

METHODS
Each node is assigned a depth property, which shows how many levels down that node is from the root. The root itself has depth zero. The height of the tree is defined as the maximum depth over all its nodes; for the tree shown in the depth figure, the height is 3.

RESULTS
To simulate real-life dynamic operation, in each cycle we deleted 1/3 of the nodes and then inserted the same number of nodes, and we performed a total of 10,000 cycles for each tree size. When the key to be deleted had duplicates, the first occurrence was deleted.

CONCLUSIONS AND FUTURE WORK
For Binary Search Trees of sizes N between 100 and 12800 nodes, and deletion-insertion cycles as described above, the following behaviors have been observed:
The average height (maximum depth) is logarithmic as a function of size.
The maximum and minimum heights observed are also logarithmic, with the same slope.
In all our experiments, the total range of the height (max - min) was bounded by 8.
The coefficient of variation of the height distribution is always under 0.14 and decreases as tree size increases, as expected from statistics (the STDDEV of the sampling distribution is the STDDEV of the population divided by √n).
The empirical law derived from the data is H = -2.61 + 2.2·log₂(N).

Future work will investigate:
The impact of “deeper” or “shallower” cycles.
The impact of larger numbers of cycles per tree, such that the total number of insertions is of the order of N².
The impact of using average depth instead of maximum depth (height).
The impact of not allowing duplicate keys.
The theoretical grounding of the empirical formula derived.

Assuming that the functional relationship between height and number of nodes is of the form H = a + b·log₂(N) with unknown coefficients a and b, linear regression allows us to estimate a and b. From our data we find a = -2.61 and b = 2.2. The theoretical explanation of these numbers is unknown and may be the object of further study; for now this formula is a purely empirical result.

[Figure: Height of BST subjected to 33% fluctuation cycles]

[Figure: binary tree with root 25 and leaves 20 and 30, labeled Root, Leaves, and Siblings]
This is a simple Binary Tree, having only two leaves (terminal nodes) under the Root. Nodes with the same parent are called siblings. All nodes store integers or other keys (e.g. floating-point numbers, strings of text, etc.).

[Figure: binary tree with the keys 25, 20, 30, 10, 22, 5, 11, 21, 28, 35]
A more complex Binary Tree, having leaves and internal nodes. For each node, the following property holds: all numbers in the left sub-tree are smaller than (or equal to) the number in the node itself, and all numbers in the right sub-tree are larger than it. This is the definition of a BST.

[Figure: the same tree with its levels labeled Depth = 0, Depth = 1, Depth = 2, Depth = 3]

We used the C programming language for the implementation, because of its small overhead, simple syntax, and direct access to pointers.
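The linear regression of height against log₂(N) described above can be sketched in C as an ordinary least-squares fit. This is a minimal illustration, not the authors' code: the function name `fit_line` and its signature are hypothetical, and the caller is assumed to supply x[i] = log₂(N_i) and y[i] = the average height measured at size N_i.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit of y = a + b*x over n points.
   Assumed inputs: x[i] = log2(N_i), y[i] = average height at size N_i.
   The name and signature are illustrative, not taken from the poster. */
void fit_line(const double *x, const double *y, size_t n, double *a, double *b)
{
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = (double)n * sxx - sx * sx;   /* zero only if all x are equal */
    assert(denom != 0.0);
    *b = ((double)n * sxy - sx * sy) / denom;   /* slope */
    *a = (sy - *b * sx) / (double)n;            /* intercept */
}
```

The x values can be obtained with log2() from <math.h>. As a check on the reported law itself, H = -2.61 + 2.2·log₂(N) evaluates to about 12.0 at N = 100 and about 27.4 at N = 12800.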
The height of a tree is found through the function maxDepth(), shown below:

    void maxDepth(node *tree) {
        if (tree) { // tree not empty
            maxDepth(tree->left);
            heightOfTree = (heightOfTree < tree->depth) ?
                           tree->depth : heightOfTree;
            maxDepth(tree->right);
        }
    }

The function modifies the global variable heightOfTree, which has to be set to zero in the program before maxDepth() is called.

Due to the expected logarithmic behavior of the height, we chose exponentially spaced data points: our trees have 100, 200, 400, 800, 1600, 3200, 6400, and 12800 nodes. The trees are subjected to cycles of node deletions followed by the same number of node insertions:
The initial trees are built by inserting random numbers into an initially empty tree.
The numbers to be deleted are chosen at random from among the numbers already in the tree.
The numbers to be inserted are generated at random, using the function rand() from the C standard library. Duplicates are permitted.

[Figure: Coefficient of variation of height of BST subjected to 33% fluctuation cycles]
The coefficient of variation c is a measure of variability, defined as the ratio of the standard deviation to the average. We present it because the averages of our distributions vary with tree size; in this context standard deviations cannot be compared directly, but coefficients of variation can, since they scale the STDDEV by the average.

For additional information, please contact:
Arun Mahendra, Computer Science program, Tarleton State University, st_amahendra@tarleton.edu
Dr. Mircea Agapie, Dept. of Math, Physics & Engineering, Tarleton State University, agapie@tarleton.edu

Note: An earlier version of this work was presented at the 3rd Annual TAMUS Pathways Student Research Symposium, Kingsville, 2005.
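The deletion-insertion cycles described in the methods can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation. All function names and the node layout are hypothetical. Equal keys are stored in the left subtree, a node with two children is replaced by its in-order predecessor (the poster does not say which replacement rule was used), and the height is recomputed recursively instead of being read from a stored depth field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout; the poster does not show the authors' struct. */
typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Insert key; keys equal to a node's key go into its left subtree. */
node *bst_insert(node *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        assert(t != NULL);
        t->key = key;
        t->left = t->right = NULL;
    } else if (key <= t->key) {
        t->left = bst_insert(t->left, key);
    } else {
        t->right = bst_insert(t->right, key);
    }
    return t;
}

/* Delete the first node holding key that is met on the way down from the root. */
node *bst_delete(node *t, int key)
{
    if (t == NULL)
        return NULL;
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
    } else if (key > t->key) {
        t->right = bst_delete(t->right, key);
    } else if (t->left == NULL || t->right == NULL) {
        node *child = t->left ? t->left : t->right;
        free(t);
        return child;
    } else {
        /* Two children: copy the in-order predecessor here (an assumption),
           then remove one copy of that key from the left subtree. */
        node *pred = t->left;
        while (pred->right)
            pred = pred->right;
        t->key = pred->key;
        t->left = bst_delete(t->left, t->key);
    }
    return t;
}

/* Height = largest depth of any node, with the root at depth 0 (-1 if empty). */
int bst_height(const node *t)
{
    if (t == NULL)
        return -1;
    int l = bst_height(t->left), r = bst_height(t->right);
    return 1 + (l > r ? l : r);
}

/* Number of nodes in the tree. */
int bst_count(const node *t)
{
    return t ? 1 + bst_count(t->left) + bst_count(t->right) : 0;
}

/* Build a tree of n random keys; keys[] mirrors the multiset stored in the tree. */
node *build_tree(int *keys, int n)
{
    node *t = NULL;
    for (int i = 0; i < n; i++) {
        keys[i] = rand();
        t = bst_insert(t, keys[i]);
    }
    return t;
}

/* One cycle: delete n/3 keys picked at random from the tree, then insert n/3 new ones. */
node *fluctuation_cycle(node *t, int *keys, int n)
{
    int m = n / 3;
    for (int i = 0; i < m; i++) {
        int live = n - i;              /* keys[0..live-1] are still in the tree */
        int j = rand() % live;
        t = bst_delete(t, keys[j]);
        keys[j] = keys[live - 1];      /* fill the hole with the last live key */
    }
    for (int i = 0; i < m; i++) {
        keys[n - m + i] = rand();
        t = bst_insert(t, keys[n - m + i]);
    }
    return t;
}
```

A driver would call build_tree() once for a given size, then call fluctuation_cycle() repeatedly and record bst_height() after each cycle; the poster reports 10,000 cycles per tree size. The keys[] array exists only so that a key to delete can be chosen uniformly from those currently in the tree.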