Apr 17, 2013 Persistent Data Structures

Definitions
- An immutable data structure is one that, once created, cannot be modified
  - Immutable data structures can (usually) be copied, with modifications, to create a new version
  - The modified version takes up as much memory as the original version
- A persistent data structure is one that, when modified, retains both the old and the new values
  - Persistent data structures are effectively immutable, in that prior references to them do not see any change
  - Modifying a persistent data structure may copy part of the original, but the new version shares memory with the original
- This definition is unrelated to persistent storage, which means keeping a copy of data on disk between program executions

Why persistent data structures?
- Functional programming is based on the idea of immutable data, or persistent data, which is effectively immutable
- The use of immutable data structures greatly simplifies concurrent programming
  - Synchronization is expensive, and immutable data structures don’t need to be synchronized
- Copying large data structures is expensive and wastes space, but persistent data structures can use sophisticated structure sharing to reduce the cost

Lists
- Lists are the original persistent data structures, and are very heavily used in functional programming
- [Figure: an original list with cells x, y, z, alongside an "insert w" version and a "delete x" version that point into the same cells]
- As you can see, persistence is automatic with a list, and requires no additional effort
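
The sharing described on this slide can be sketched in a few lines of Python. This is only an illustration: the names `Cell`, `cons`, and `to_python_list` are made up for the example, and the element order x, y, z is assumed from the figure labels.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """One immutable list cell: a value plus a reference to the rest of the list."""
    head: object
    tail: Optional["Cell"]


def cons(value, lst):
    """Insert at the front: allocates one new cell and reuses the old list as its tail."""
    return Cell(value, lst)


def to_python_list(lst):
    """Walk the cells and collect the values (for display only)."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


original = cons("x", cons("y", cons("z", None)))
inserted = cons("w", original)   # w -> x -> y -> z, sharing every cell of original
deleted = original.tail          # y -> z, the original with x dropped from the front
```

Neither operation touches `original`: `inserted.tail` is the very same object as `original`, and `deleted` is simply a reference into it.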
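
Trees and binary trees
- Trees and binary trees can also be implemented in a persistent fashion, though it takes a bit more work
- [Figure: a tree with nodes A through N; the modified version has new nodes A’, C’, G’ and shares the remaining nodes with the original]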
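
One common way to do the "bit more work" is path copying: copy only the nodes between the root and the change, and share everything else. The sketch below assumes a plain unbalanced binary search tree; `Node` and `insert` are illustrative names, not taken from any particular library.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable binary search tree node."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key):
    """Return a new tree containing key; only nodes on the search path are copied."""
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present: reuse the existing subtree unchanged


old = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    old = insert(old, k)

new = insert(old, 65)  # copies 50, 70 and 60; everything else is shared
```

After the insert, `new.left` is the same object as `old.left`, and the subtree rooted at 80 is shared as well, while `old` still has no 65 in it.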

Arrays and vectors
- It’s more difficult to implement a persistent array
- The programming language Clojure implements persistent vectors, which are like arrays but can be expanded
  - Any location in a vector can be accessed in (almost) O(1) time
  - Vectors are represented as “fat trees,” or more precisely, as 32-tries
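
Tries
- A trie is like a binary search tree, only each node may have many children, and the child to follow is chosen by the next piece of the key rather than by a comparison
- Tries are most often used with strings (and then have up to 26 children per node, one per letter)
- Each node of a 32-trie may have 32 children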
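
To make the shape of a string trie concrete, here is a minimal sketch using nested dictionaries. It is deliberately mutable (not persistent) and only illustrates how the next letter selects the child; the function names and the `"$"` end-of-word marker are assumptions made for this example.

```python
def trie_insert(root, word):
    """Add word to the trie, creating one child per letter as needed."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = True  # marks that a complete word ends at this node


def trie_contains(root, word):
    """Follow one child per letter; the word is present only if the marker is there."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node


words = {}
for w in ["car", "cart", "cat"]:
    trie_insert(words, w)
```

All three words share the nodes for "c" and "ca", and `trie_contains(words, "ca")` is false because no word ends there.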

Vector implementation I
- A persistent vector in Clojure is implemented as an N-level trie (N <= 7), where the root and internal nodes are arrays of 32 references, and the leaves are arrays of 32 values
- The depth of the trie (1 to 7) is also kept as an instance value
- For example, consider accessing location 5000 in a vector
  - 5000 decimal is 00000 00100 11100 01000 binary, written as four 5-bit groups (group 4 on the left, group 1 on the right)
- To access element 5000 in a trie of depth 4:
  - The binary number in group 4 (00000) says to take the 0th reference
  - The binary number in group 3 (00100) says to take the 4th reference
  - The binary number in group 2 (11100) says to take the 28th reference
  - The binary number in group 1 (01000) says to take the 8th value
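
The index arithmetic in this example can be checked with a short sketch. It models trie nodes as plain Python lists of 32 slots; `index_path` and `lookup` are illustrative names, and this is not Clojure's actual code.

```python
BITS = 5                 # bits consumed per trie level
WIDTH = 1 << BITS        # 32 slots per node
MASK = WIDTH - 1         # 0b11111


def index_path(index, depth):
    """Split index into `depth` 5-bit groups, most significant group first."""
    return [(index >> (BITS * level)) & MASK for level in range(depth - 1, -1, -1)]


def lookup(root, depth, index):
    """Follow one slot per level, from the root down to the value in a leaf."""
    node = root
    for slot in index_path(index, depth):
        node = node[slot]
    return node


# A sparse depth-4 trie containing only the path to element 5000.
leaf = [None] * WIDTH
leaf[8] = "element 5000"
level2 = [None] * WIDTH
level2[28] = leaf
level3 = [None] * WIDTH
level3[4] = level2
root = [None] * WIDTH
root[0] = level3

path = index_path(5000, 4)          # [0, 4, 28, 8]
value = lookup(root, 4, 5000)       # "element 5000"
```

The check 4 * 1024 + 28 * 32 + 8 = 5000 confirms the four groups.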

Vector implementation II
- The trie can be treated as a “fat tree,” with the structure sharing discussed earlier
- Because the trie is fat (many children per node), there is a high proportion of actual data to structure
- Access time is “almost” O(1): each access follows one reference per level, so as the size increases the number of steps grows from 1 to 7 (the depth of the trie)
- This design is especially good for appending vectors
- For adding single elements to the end of the vector, there are additional special-case optimizations
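
As a rough sketch of how structure sharing works on such a trie, the code below updates one element by copying a single node per level and reusing every other node. It uses tuples as immutable 32-slot nodes; `make_trie`, `get`, and `assoc` are names chosen for this example, and the append optimizations mentioned above are not modeled.

```python
BITS, WIDTH = 5, 32
MASK = WIDTH - 1


def make_trie(depth, fill=0):
    """Build a full trie of the given depth whose elements all equal fill."""
    node = (fill,) * WIDTH
    for _ in range(depth - 1):
        node = (node,) * WIDTH  # immutable children can safely be shared
    return node


def get(node, depth, index):
    """Read one element by following a 5-bit slot at each level."""
    for level in range(depth - 1, -1, -1):
        node = node[(index >> (BITS * level)) & MASK]
    return node


def assoc(node, depth, index, value):
    """Return a new trie with index set to value, copying one node per level."""
    slot = (index >> (BITS * (depth - 1))) & MASK
    if depth == 1:
        child = value
    else:
        child = assoc(node[slot], depth - 1, index, value)
    return node[:slot] + (child,) + node[slot + 1:]


v1 = make_trie(depth=2)               # 1024 zeros
v2 = assoc(v1, 2, 100, "changed")     # 100 = 3 * 32 + 4
```

Only the root and the leaf in slot 3 are new in `v2`; the other 31 leaves are the same objects as in `v1`, and `v1` itself is unchanged.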

Persistent Hash Map
- Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could be implemented just like a vector
  - For a vector, the additional space required for the trie structure is a reasonable proportion of the total space
  - For a hash map, the additional space required is not reasonable
  - There will be a large number of 32-element arrays which contain mostly nulls
- The hard part is to use only as much space as needed
- Basic approach:
  - Use arrays of size N <= 32, where N is the number of non-null children
  - Use a 32-bit word to indicate which children are actually present
  - For example, a word with exactly five 1 bits indicates 5 children
  - Find a fast function to map numbers in the range [0, 31] into the range [0, N)
  - Many processors have an instruction to count the number of 1 bits in a word
- This would make a good assignment for the next time I teach this course
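
One way to get the fast mapping is to count the 1 bits below the requested position: that count is the child's index in the compact array. The sketch below is an assumption about how this could look, not the slide author's solution; `child_slot` and the sample bitmap (bits 1, 4, 9, 17, 30) are made up for the example.

```python
def child_slot(bitmap, position):
    """Map position in [0, 31] to an index in the compact child array.

    Returns None when the bitmap says no child is present at that position.
    """
    bit = 1 << position
    if not bitmap & bit:
        return None
    # Count the 1 bits below `position`; a hardware popcount would do this in one step.
    return bin(bitmap & (bit - 1)).count("1")


# Hypothetical node with children at positions 1, 4, 9, 17 and 30.
bitmap = (1 << 1) | (1 << 4) | (1 << 9) | (1 << 17) | (1 << 30)
children = ["a", "b", "c", "d", "e"]    # compact array, N = 5

n_children = bin(bitmap).count("1")     # 5
third = children[child_slot(bitmap, 9)] # "c"
```

The array holds exactly as many entries as there are 1 bits in the word, so no space is spent on null children.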

The End
Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.
--Sir Winston Churchill, Speech in November 1942