1
Persistent data structures
Yoav Rubin
2
About me
- Software engineer at IBM Research, Haifa
- Worked on
  - From large-scale products to small-scale research projects
- Domains
  - Software tools
  - Development environments
  - Simplified programming
- Technologies
  - Frontend engineering
  - Java, Clojure
- Lecturer of the course “Functional programming on the JVM” at Haifa University
{:name “Yoav Rubin, : :blog
3
Roadmap
- Why
- What
- How
4
Why
5
A few assumptions
- Modern software uses different kinds of data
- Modern software requires various ways to work with data
- We’re in a multi-core world
- Mutability and concurrency don’t get along
6
Concurrency and mutability don’t get along
- Modern software works with different kinds of data and requires different ways to work with it: data structures
- Concurrency and mutability don’t get along: immutability
- Data structures + immutability: persistent data structures
7
What
8
What is a data structure
- A way to organize data
- Provides contracts for read / find
- Provides contracts for update
  - Adding data elements
  - Removing data elements
- A data structure may contain other data structures
9
What’s in a contract
- Information given by the requester
  - For reads: nothing, or some identifier of the data
  - For writes: the data element itself, or the data element alongside additional info
- What is returned
  - For reads: the data elements, or a not-found identifier
  - For writes: the data structure itself
- The cost of the operation
10
E.g.,
- Hashtable
  - add(data-element, key), O(1)
  - read(key), O(1)
  - find(data-element), O(n)
- Balanced search tree
  - add(data-element), O(log n)
  - find(data-element), O(log n)
  - remove(data-element), O(log n)
11
E.g.,
- LIFO
  - push(data-element), O(1)
  - pop(), O(1)
  - contains(data-element), O(n)
- FIFO
  - enqueue(data-element), O(1)
  - dequeue(), O(1)
12
What is the data in a data structure
- Values
- Or other data structures
  - At the leaves: only values
- A value is something that cannot change
  - 7, \a, nil, “abc”
13
Is a data structure a value
- In a mutable world: no
- In an immutable world: yes
  - It cannot change!
14
Persistency
- A persistent data structure is a data structure
  - That acts as a value
    - Cannot be changed
  - While providing contracts for reading and writing
    - Without affecting other users of the data structure
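A minimal sketch of what “acts as a value” looks like from the user's side, using Clojure's built-in vector and map. The names `v1`, `v2`, `m1`, and `m2` are made up for illustration.

```clojure
(def v1 [1 2 3])
(def v2 (conj v1 4))       ; a "write" returns a new vector

(def m1 {:a 1})
(def m2 (assoc m1 :b 2))   ; a "write" returns a new map

;; Other users that still hold v1 and m1 see no change.
v1   ; [1 2 3]
v2   ; [1 2 3 4]
m1   ; {:a 1}
m2   ; {:a 1, :b 2}
```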
15
Writing to an immutable structure
- You don’t write to an immutable structure
- Copy-on-write?
  - Breaks performance guarantees
- Maintaining persistency
  - Create a structure that is perceived as updated by the update requester
  - Perceived as immutable by everyone else
16
Writing “together with”
- There are two “participants” in the write
  - The “write-to” structure
  - The added value
- Extract as much as possible from the “write-to” structure
  - Shared part
  - Non-shared part
- Complete the new structure with the added value and duplications of the non-shared part
17
How
18
Persistent data structures
- Persistent list
- Persistent vector
- Persistent map
19
Persistent list - conj
- list1: a b c d
- (def list2 (conj list1 x))
- list2: x a b c d
- Note that list2 uses all of list1
20
Persistent list – next (/ pop)
- list2: x a b c d
- (def list3 (next list2))
- list3: a b c d
- Note that list3 is list1
21
Persistent list
- Complete structural sharing upon “modification”
  - Write (conj): just adding the new value
  - Delete (pop): returning a pointer to the next element
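The sharing can be observed directly at the REPL. This is a small sketch that mirrors the `list1`, `list2`, and `list3` names from the slides; the `identical?` check relies on how the standard JVM implementation of the persistent list behaves, so treat it as an observation about that implementation rather than a documented guarantee.

```clojure
(def list1 (list 'a 'b 'c 'd))
(def list2 (conj list1 'x))    ; conj on a list adds at the front
(def list3 (next list2))       ; drop the front element again

list2                          ; (x a b c d)
list1                          ; (a b c d), untouched by the conj

;; list3 is not a copy: it is the very same object as list1.
(identical? list3 list1)       ; true
```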
22
Persistent lists
- O(1) for insertion (at the front)
- O(n) for going over the list
- O(1) for popping
23
Persistent vectors and maps
24
Behind the scenes
- A trie
  - A tree in which each node has an “alphabet map” to route the navigation to the children
  - Over the alphabet of 0-31
- Vector: a balanced, dense trie
- Map: a sparse trie
25
Why trie
- A trie allows holding both the data and the metadata of an element
  - The data is at the leaves
  - The metadata is derived from the structure of the path to the leaf
- Deriving information from the structure is a very powerful mechanism
  - Neural networks work that way
- In persistent vectors the metadata is the index
- In persistent maps the metadata is the hash code
26
Persistent vectors
- Values are at the leaves
- Each level can hold up to 32^level elements
- When that number is exceeded, a new level is created, and entry 0 of the new level points to the previous level
27
Adding elements
[Figure: elements being added to the vector's trie]
28
Finding an element
- Looking at the bit representation of the index
  - Each 5 bits correspond to a position in a specific level
- We know the trie’s height
  - The height tells us which bit quintet to start the find process with
29
Finding an element - example
- Assume that the trie height is 3
- At the top level we would look here
- At the next level we’d look here
- At the leaves level we’d look here
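The bit-quintet walk can be sketched as a small function. `index-path` is a hypothetical helper written for this illustration, not part of Clojure; it assumes a non-negative index and a known number of levels, and returns the position to follow at each level, top level first.

```clojure
(defn index-path
  "Splits idx into 5-bit groups, one per trie level, top level first."
  [idx levels]
  (mapv (fn [level]
          ;; shift the wanted quintet down to the low bits, then mask it
          (bit-and (bit-shift-right idx (* 5 level)) 31))
        (range (dec levels) -1 -1)))

;; 1093 = 1*32*32 + 2*32 + 5
(index-path 1093 3)   ; [1 2 5]
```

With a height of 3, the top level uses bits 10-14 of the index, the next level bits 5-9, and the leaves level bits 0-4.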
30
Persistent vector
- Very efficient
- Find, add: almost O(1)
  - O(near-constant-time): O(log32 n)
  - A very strong narrowing factor
    - 1M elements => need to handle 4 levels
  - For all practical uses, think of it as O(1)
- Subvec: O(1)
  - No dependency on the size of the subvector!
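The “4 levels for 1M elements” figure follows from 32^4 = 1,048,576. A short sketch of that arithmetic, followed by a `subvec` call; `levels-needed` is a hypothetical helper for illustration and says nothing about how Clojure computes the height internally.

```clojure
(defn levels-needed
  "Smallest number of levels whose capacity, 32^levels, is at least n."
  [n]
  (loop [levels 1, capacity 32]
    (if (>= capacity n)
      levels
      (recur (inc levels) (* capacity 32)))))

(levels-needed 1000000)   ; 4

;; subvec returns a view over the original vector instead of copying it.
(def v  (vec (range 100)))
(def sv (subvec v 10 20))
(count sv)                ; 10
(nth sv 0)                ; 10
```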
31
Persistent map
- A special trie: Hash Array Mapped Trie
  - Based on the work of Phil Bagwell
- Over the alphabet of 0-31
- Not dense
- Similar to the way the persistent vector works
  - Instead of indices, we use a 32-bit hash value of the key
32
What happens when modifying*
- We want to do (assoc m x v1)
- The big arrows represent the non-participating subtrees (each arrow can be several real arrows to several nodes)
[Figure: a tree with node n, entry x, and value v]
33
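(assoc m x v1)
[Figure: the old map m with nodes r, n, x and the value v, next to the new map m’ with the copied nodes r’, n’, x’ and the new value v1]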
34
Path copying
- Performing an action on a structure does not create an entirely new structure
- A new structure is created, and it shares a large portion of the old structure
35
What should be created
- Anything that is related to the new node
  - Remember: information about that node is found at the leaf (the content) and on the path (the metadata)
- Recreating the path to the changed node, and the changed node
36
Path copying - price
- The number of nodes that need to be created is O(tree-height)
- Using a very wide tree
  - 32 children for each parent
- Tree-height is log32 n
  - n is the number of nodes in the tree
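Path copying can be sketched on a toy tree of nested vectors. `path-assoc` is a hypothetical function written for this illustration (it behaves much like the built-in `assoc-in`); it rebuilds only the nodes along the path and reuses every other subtree as is.

```clojure
(defn path-assoc
  "Returns a new tree in which the leaf at `path` is v.
   Only the nodes on the path are recreated."
  [node [k & ks] v]
  (if ks
    (assoc node k (path-assoc (get node k) ks v))
    (assoc node k v)))

(def tree  [[1 2] [3 4]])
(def tree2 (path-assoc tree [0 1] 20))

tree2                                     ; [[1 20] [3 4]]
tree                                      ; [[1 2] [3 4]], unchanged
(identical? (nth tree 1) (nth tree2 1))   ; true: the untouched subtree is shared
(identical? (nth tree 0) (nth tree2 0))   ; false: this node is on the copied path
```

The number of nodes created equals the length of the path, which is the tree height.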
37
Performance concerns
- Very sparse structure
  - Each node should only point to existing children
  - Not be structured as a full node
- Still, need to locate a child in an efficient way
  - Constant time
38
Processing within a node
- Each node holds the following fields
  - A list of up to 32 “links” that point from the parent node to a child
    - Fewer cache misses
  - A bitmap of 32 bits (an integer)
    - A 1 in the ith bit means that the entry whose value in the segment is i is pointed to by the links list
    - Its index in the list is the number of ones to the right of the ith bit
- A very sparse tree: most of the links lists would be very small
39
Example
- A node is pointed to by entry 0 in level 0 and has 4 children
  - In the positions 2, 5, 14, 22
- How to assoc keys with the following hash codes
40
The node (in level 1): 2 5 14 22
41
First hash code:
- Node entries: 2 5 14 22
- The first entry in level 0 brought us to this node
42
First hash code:
- Node entries: 2 5 14 22
- Looking for entry 4 in this level, checking the bitmap and seeing that it has 0 there, therefore returning false
43
Second hash code:
- Node entries: 2 5 14 22
- The first entry in level 0 brought us to this node
44
Second hash code:
- Node entries: 2 5 14 22
- Looking for entry 14 in this level, checking the 14th bit in the map. There’s a 1 there.
- Counting the number of 1s to the right of that bit: getting 2
- Continue on the link in cell 2 (remember: zero-based indexing)
45
How many actions are taken
- Looking at the ith bit
  - A simple masking
- Counting the number of 1s
  - Using the processor instruction CTPOP
    - Population count
    - Found on many processors
    - Counts the number of ones
- Masking and counting
  - A constant number of bit operations
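The mask-and-count step can be replayed on the example node with children at positions 2, 5, 14, and 22. `bitmap-of` and `child-index` are hypothetical helpers for this sketch; `Long/bitCount` is the JVM's population count, used here in place of a raw processor instruction.

```clojure
(defn bitmap-of
  "Builds a bitmap with a 1 at each of the given positions."
  [positions]
  (reduce bit-set 0 positions))

(defn child-index
  "Index in the links list for `segment`, or nil if the node has no such child."
  [bitmap segment]
  (when (bit-test bitmap segment)
    ;; keep only the bits to the right of `segment`, then count the ones
    (Long/bitCount (bit-and bitmap (dec (bit-shift-left 1 segment))))))

(def node-bitmap (bitmap-of [2 5 14 22]))

(child-index node-bitmap 4)    ; nil: bit 4 is 0, nothing to follow
(child-index node-bitmap 14)   ; 2: bits 2 and 5 lie to the right of bit 14
```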
46
That’s not all
- We traveled down the link
  - If there’s a node with a map there
    - Continue the same with the next segment
  - If there’s a node with a value there
    - Return that value
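Putting the per-node step and the descent together gives a toy lookup. Everything here is an assumption made for illustration: nodes are plain maps with `:bitmap` and `:links`, leaves are maps with `:value`, segments are consumed from the low bits of the hash code upward, and key comparison and collisions are ignored. It is a sketch of the idea, not Clojure's actual implementation.

```clojure
(defn lookup
  "Follows 5-bit segments of hash-code down the toy trie.
   Returns the value at the leaf, or nil when a segment has no child."
  [node hash-code shift]
  (if (contains? node :value)
    (:value node)
    (let [segment (bit-and (bit-shift-right hash-code shift) 31)
          bitmap  (:bitmap node)]
      (when (bit-test bitmap segment)
        (let [idx (Long/bitCount
                   (bit-and bitmap (dec (bit-shift-left 1 segment))))]
          (recur (nth (:links node) idx) hash-code (+ shift 5)))))))

;; A two-level toy trie: entry 1 at the root, then entry 3 below it.
(def leaf  {:value :found})
(def inner {:bitmap (bit-set 0 3) :links [leaf]})
(def root  {:bitmap (bit-set 0 1) :links [inner]})

(lookup root 97 0)   ; :found, since 97 = 1 + 3*32
(lookup root 98 0)   ; nil: segment 2 has no child at the root
```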
47
Disclaimer
- There are other ways to implement the internal processing within a node
- This specific way is a simplification of the process that was presented in the original “Ideal Hash Trees” paper
  - Which was once used in Clojure
  - It has since been replaced with a version that has several more optimizations
48
Why trie
- A linear structure would need to copy the entire structure
  - Too much of a performance hit: O(n) time for each action
  - Too much memory consumption
- Path copying results in needing to create only a few nodes for each change
  - The wider the trie, the fewer nodes
  - 32 children per parent is a very wide trie
49
Zippers
50
Zippers
- Generic tree handling API
  - Walking
  - Editing
- Purely functional data structure
  - Persistent
- Excellent performance
- Found in clojure.zip
- First described by Gérard Huet in 1997
51
Zippers - rationale
- In functional data structures, we replaced the O(1) for updates with O(log n)
  - Gained immutability
- If we assume that the updates occur near a cursor on the structure, we can regain the O(1) for updates
- Think of a text editor (a tree of lines, each with characters in it)
  - All the changes occur at the cursor
52
What is a zipper
53
Zippers - creation
- Can be based on any tree structure
- Need to provide the following three functions
  - branch?: can the given node have children
  - children: get the children of the given node
  - make-node: creation of a new node
54
Zippers - creation
- What does the zipper have upon creation
  - The three functions
  - Empty bookkeeping data
  - The root of the tree (as the initial cursor)
- Creation with practically no performance cost
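A short sketch of creating a zipper over nested vectors with `clojure.zip/zipper`, passing the three functions explicitly. The sample `tree` and the name `z` are made up for illustration; for vectors, the library's `vector-zip` does essentially the same thing.

```clojure
(require '[clojure.zip :as zip])

(def tree [1 [2 3] 4])

(def z
  (zip/zipper vector?                               ; branch?: can this node have children
              seq                                   ; children: the node's children
              (fn [_node children] (vec children))  ; make-node: build a node from children
              tree))

;; Right after creation the cursor is at the root.
(zip/node z)   ; [1 [2 3] 4]
```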
55
What is possible
- Going around
  - left, right, up, down
  - Going to the root
  - Getting the current node
- DFS preorder travelling (in Clojure)
  - With prev or next
- Each movement returns a new zipper!
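A small navigation sketch using `clojure.zip` over a sample nested vector. The names `z` and `at-2` are illustrative; note that the original zipper `z` still points at the root after all the moves.

```clojure
(require '[clojure.zip :as zip])

(def z (zip/vector-zip [1 [2 3] 4]))

;; down to 1, right to [2 3], down to 2
(def at-2 (-> z zip/down zip/right zip/down))

(zip/node at-2)              ; 2
(zip/node (zip/up at-2))     ; [2 3]
(zip/node (zip/next at-2))   ; 3, the next node in DFS preorder
(zip/node z)                 ; [1 [2 3] 4], z itself was never changed
```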
56
[Figure: a tree rooted at n1 with leaves x, y, c, a, b and a target node marked “We want to get here”; the zipper is described by its left context, focus, right context, and path]
57
[Figure: the same tree with nodes n1 to n5; the zipper state (left context, focus, right context, path) after a step toward the target]
58
[Figure: the same tree; the zipper state after a further step toward the target]
59
[Figure: the same tree; the zipper state after a further step toward the target]
60
What is possible
- Editing
  - In O(creation-of-new-node)
  - Not O(log n + creation-of-new-node)
  - Treating the zipper as the root of the tree
- Note: this is a functional data structure
  - Nothing is changed
  - A new element is created
  - Think of path copying, but without the path
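An editing sketch with `clojure.zip`: move the cursor to a leaf, edit it there, and only then ask for the rebuilt root. The sample `tree` and the name `edited` are made up for illustration.

```clojure
(require '[clojure.zip :as zip])

(def tree [1 [2 3] 4])

(def edited
  (-> (zip/vector-zip tree)
      zip/down zip/right zip/down   ; cursor on the leaf 2
      (zip/edit * 10)               ; replace it with (* 2 10)
      zip/root))                    ; zip back up and return the new tree

edited   ; [1 [20 3] 4]
tree     ; [1 [2 3] 4], the original is unchanged
```

Several edits near the cursor can be chained before calling `zip/root`, so the path up to the root is rebuilt once rather than once per edit.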
61
More info
- Phil Bagwell’s paper about hash array mapped tries
- State, value and identity (Rich Hickey)
- Persistent maps and vectors inner working in Clojure
- Functional structures in Scala (Daniel Spiewak)
- Designing data structures (Phil Bagwell)
- Gérard Huet’s paper about zippers
- Clojure Zippers
- Zippers: Making Functional “Updates” Efficient
- Purely functional data structures by Chris Okasaki