Data Structures: Segment Trees, Fenwick Trees
Segment Trees
- Idea: build a tree on top of a base array, where each node keeps information about a range of the array
- Example: where is the minimum in a range of the array?
- Example: where is the maximum?
- Example: what is the sum over some range?
- Segment trees are a way of computing these range queries
- They work for dynamically changing data
- Static data has an easier dynamic programming solution (we will see later)
- See section 2.4.3 of the book
Storing the Segment Tree
- Similar to a heap
- Use an array of 4 times the base array size (could be smaller, but this is an easy upper bound)
- Root is index 1 and covers [0,n-1] in the original array
- Each node stores the result for the range it covers. Example: the index of the minimum value in that range, or the max, or the sum, etc.
- Left child of node i (covering [L,R]) is node 2*i; right child is node 2*i+1
- Left child covers range [L, (L+R)/2]
- Right child covers range [(L+R)/2+1, R]
- Parent of node i is always node i/2 (integer division)
Assume n=16 (each entry is node index: range covered; leaves show the array index in parentheses)
1: 0-15
2: 0-7   3: 8-15
4: 0-3   5: 4-7   6: 8-11   7: 12-15
8: 0-1   9: 2-3   10: 4-5   11: 6-7   12: 8-9   13: 10-11   14: 12-13   15: 14-15
16: (0)   17: (1)   18: (2)   19: (3)   20: (4)   21: (5)   22: (6)   23: (7)   24: (8)   25: (9)   26: (10)   27: (11)   28: (12)   29: (13)   30: (14)   31: (15)
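The layout above can be reproduced with a few lines of code. This is a minimal sketch, assuming the index rules from the storage slide (root 1, children 2*i and 2*i+1); the function name `node_ranges` is made up for illustration.

```python
def node_ranges(n):
    """Map each segment tree node index to the (L, R) range it covers."""
    ranges = {}

    def visit(i, lo, hi):
        ranges[i] = (lo, hi)
        if lo == hi:                     # leaf: covers a single element
            return
        mid = (lo + hi) // 2
        visit(2 * i, lo, mid)            # left child covers [L, (L+R)/2]
        visit(2 * i + 1, mid + 1, hi)    # right child covers [(L+R)/2+1, R]

    visit(1, 0, n - 1)                   # root is node 1 and covers [0, n-1]
    return ranges


ranges = node_ranges(16)
print(ranges[1], ranges[5], ranges[28])  # (0, 15) (4, 7) (12, 12)
print(max(ranges) < 4 * 16)              # True: 4*n slots are enough here
```

For n=16 the largest node index is 31, well inside the 4*n array.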
Building and updating the tree
- At the base level, each “leaf” holds the result for a single element of the original array
- When a node covers [k,k], it holds the result for a[k] in the original array a
- Can build the tree bottom-up: compute the children, then merge their results for the parent
- If an array element gets updated, update all nodes on the path from the root to that leaf: traverse down to the leaf, then update bottom-up
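A minimal sketch of the build and update steps, assuming the tree stores range minimums (sum or max would only change the merge line). The names `build` and `update` and the sample data are illustrative, not taken from the book.

```python
def build(tree, a, i, lo, hi):
    """Fill tree[i] with the minimum of a[lo..hi], children first."""
    if lo == hi:
        tree[i] = a[lo]                  # leaf holds a single element
        return
    mid = (lo + hi) // 2
    build(tree, a, 2 * i, lo, mid)
    build(tree, a, 2 * i + 1, mid + 1, hi)
    tree[i] = min(tree[2 * i], tree[2 * i + 1])   # merge children for parent


def update(tree, i, lo, hi, k, value):
    """Set element k to value in the tree and fix the nodes on its path."""
    if lo == hi:
        tree[i] = value
        return
    mid = (lo + hi) // 2
    if k <= mid:
        update(tree, 2 * i, lo, mid, k, value)
    else:
        update(tree, 2 * i + 1, mid + 1, hi, k, value)
    tree[i] = min(tree[2 * i], tree[2 * i + 1])   # recompute on the way back up


a = [5, 3, 8, 6, 2, 7, 4, 1]
n = len(a)
tree = [0] * (4 * n)                     # easy upper bound on the tree size
build(tree, a, 1, 0, n - 1)
print(tree[1])                           # 1 (minimum of the whole array)
update(tree, 1, 0, n - 1, 7, 9)          # element 7 becomes 9 in the tree
print(tree[1])                           # 2
```

Only the nodes on one root-to-leaf path are touched by an update, so it costs O(log n).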
Answering range queries with the tree
- For query [a,b], checking node i covering [L,R]
- Assumption: [a,b] is within [L,R], i.e. a >= L, b <= R
- If a==L and b==R, just return the node's value
- If [a,b] overlaps only one child, recursively call on that child:
  - if b <= (L+R)/2, return query [a,b] on left child, node 2*i
  - if a >= (L+R)/2+1, return query [a,b] on right child, node 2*i+1
- If [a,b] overlaps both children, combine the results of both calls:
  - query [a,(L+R)/2] on left child, node 2*i
  - query [(L+R)/2+1,b] on right child, node 2*i+1
  - return the merged result of both (e.g. sum, or min, or max, or lcm, etc.)
Query [6,12] (n=16, step by step)
- Node 1 (0-15): [6,12] overlaps both sides, so query [6,7] on node 2 and [8,12] on node 3
- Node 2 (0-7): [6,7] overlaps right side only, so go to node 5
- Node 3 (8-15): [8,12] overlaps both sides, so query [8,11] on node 6 and [12,12] on node 7
- Node 5 (4-7): [6,7] overlaps right side only, so go to node 11
- Node 6 (8-11): [8,11] is exactly this, so stop
- Node 7 (12-15): [12,12] on left only, so go to node 14
- Node 11 (6-7): [6,7] is exactly this, so stop
- Node 14 (12-13): [12,12] on left only, so go to node 28
- Node 28 (12): [12,12] is this, so stop
- Result: merge the values of nodes 11, 6, and 28
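The query rules and the [6,12] walk-through can be sketched as follows. This is an illustrative version that assumes a sum tree; the `visited` list is an extra added here only to show which nodes the recursion touches.

```python
def build(tree, values, i, lo, hi):
    """Fill tree[i] with the sum of values[lo..hi]."""
    if lo == hi:
        tree[i] = values[lo]
        return
    mid = (lo + hi) // 2
    build(tree, values, 2 * i, lo, mid)
    build(tree, values, 2 * i + 1, mid + 1, hi)
    tree[i] = tree[2 * i] + tree[2 * i + 1]


def query(tree, i, lo, hi, a, b, visited):
    """Sum over [a, b], assuming [a, b] lies within [lo, hi]."""
    visited.append(i)
    if a == lo and b == hi:              # node range matches exactly: stop
        return tree[i]
    mid = (lo + hi) // 2
    if b <= mid:                         # overlaps left child only
        return query(tree, 2 * i, lo, mid, a, b, visited)
    if a >= mid + 1:                     # overlaps right child only
        return query(tree, 2 * i + 1, mid + 1, hi, a, b, visited)
    # overlaps both children: split the query at mid and merge the results
    left = query(tree, 2 * i, lo, mid, a, mid, visited)
    right = query(tree, 2 * i + 1, mid + 1, hi, mid + 1, b, visited)
    return left + right


values = list(range(16))                 # element k simply holds the value k
tree = [0] * (4 * len(values))
build(tree, values, 1, 0, 15)
visited = []
print(query(tree, 1, 0, 15, 6, 12, visited))   # 63
print(visited)                           # [1, 2, 5, 11, 3, 6, 7, 14, 28]
```

The recursion stops at nodes 11, 6, and 28, matching the walk-through above.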
Square Root Decomposition
- Another way (simpler, but somewhat less powerful than segment trees) to improve range queries
- Idea: break an array of size n into k buckets of size n/k, and store the result for each whole bucket
- Then, a query over a range will look at:
  - the first bucket item-by-item, from the starting point on
  - then entire buckets, until the last one
  - then the last bucket item-by-item, until the ending point
- Ideal number of buckets is k = sqrt(n), so each bucket also holds sqrt(n) elements
- Example: 16 elements, range [0,15], break into 4 buckets
- Query [1,9]: first bucket looks at 1, 2, 3; second bucket covers 4-7; third bucket looks at 8, 9
Assume n=16, split into 4 buckets of 4 elements
Bucket 0-3: (0) (1) (2) (3)
Bucket 4-7: (4) (5) (6) (7)
Bucket 8-11: (8) (9) (10) (11)
Bucket 12-15: (12) (13) (14) (15)
Range [6,12] (bucket by bucket)
- Bucket 0-3: [6,12] does not overlap. Skip it.
- Bucket 4-7: [6,12] overlaps. Check elements one by one (6, 7).
- Bucket 8-11: [6,12] completely covers. Use the stored bucket result.
- Bucket 12-15: [6,12] overlaps. Check elements one by one (12).
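A minimal sketch of square root decomposition for range sums, assuming a bucket size of floor(sqrt(n)). The function names are illustrative, and `math.isqrt` requires Python 3.8 or later.

```python
import math


def build_buckets(values):
    """Precompute the sum of each bucket of about sqrt(n) elements."""
    n = len(values)
    size = max(1, math.isqrt(n))                 # bucket size
    sums = [0] * ((n + size - 1) // size)
    for idx, v in enumerate(values):
        sums[idx // size] += v
    return size, sums


def range_sum(values, size, sums, a, b):
    """Sum of values[a..b], inclusive."""
    total = 0
    i = a
    while i <= b:
        if i % size == 0 and i + size - 1 <= b:
            total += sums[i // size]             # bucket completely covered
            i += size
        else:
            total += values[i]                   # partial bucket: item by item
            i += 1
    return total


def point_update(values, size, sums, k, value):
    """Set values[k] = value and adjust the one bucket containing k."""
    sums[k // size] += value - values[k]
    values[k] = value


values = list(range(16))                         # element k holds the value k
size, sums = build_buckets(values)
print(size, sums)                                # 4 [6, 22, 38, 54]
print(range_sum(values, size, sums, 6, 12))      # 63
print(range_sum(values, size, sums, 1, 9))       # 45
```

A query touches at most two partial buckets plus the whole buckets between them, so it costs O(sqrt(n)); an update changes one element and one bucket sum.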
Fenwick Trees (Binary Indexed Tree)
- A very efficient implementation of trees (stored as an array)
- Uses binary codes to quickly address node elements
- Typically used for range-sum queries (RSQ): the sum over a range
- Again, good for dynamic data
- Indexes into the array run from 1 to n
- See section 2.4.4 of the book
Storing the Fenwick tree
- LSOne(x) = the least significant one bit of x (the lowest bit that is set)
- Can be calculated as x & (-x) (-x is in two’s complement)
- Node i of the Fenwick tree covers [i-LSOne(i)+1, i]
  1: [1,1]   (1-1+1 = 1)
  2: [1,2]   (2-2+1 = 1)
  3: [3,3]   (3-1+1 = 3)
  4: [1,4]   (4-4+1 = 1)
  5: [5,5]   (5-1+1 = 5)
  6: [5,6]   (6-2+1 = 5)
  7: [7,7]   (7-1+1 = 7)
  8: [1,8]   (8-8+1 = 1)
  etc.
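The range table above follows directly from the LSOne formula. A small sketch (the helper name `covered_range` is made up here):

```python
def lsone(x):
    """Least significant one bit of x, e.g. lsone(6) == 2."""
    return x & (-x)


def covered_range(i):
    """Range of array positions summarized by Fenwick node i."""
    return (i - lsone(i) + 1, i)


for i in range(1, 9):
    print(i, covered_range(i))

print(covered_range(12))    # (9, 12)
```

Python integers are not fixed-width, but `x & (-x)` still behaves as in two's complement for positive x.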
Assume n=16 (each entry is node index: range covered)
16: 1-16
8: 1-8
4: 1-4   12: 9-12
2: 1-2   6: 5-6   10: 9-10   14: 13-14
1: 1-1   3: 3-3   5: 5-5   7: 7-7   9: 9-9   11: 11-11   13: 13-13   15: 15-15
Array elements: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16)
Range Sum Query on Fenwick Tree
- For [1,b], sum the results from a sequence of nodes, stripping off LSOne each time:
  - start at node b
  - b’ = b-LSOne(b)
  - repeat until b’ = 0
- Total number of nodes visited = number of 1 bits in the binary representation of b
- e.g. for [1,7]:
  - b = 7 (binary 0111): node covers [7,7]
  - b’ = 7-1 = 6 (binary 0110): node covers [5,6]
  - b’’ = 6-2 = 4 (binary 0100): node covers [1,4]
  - b’’’ = 4-4 = 0 (binary 0000): stop
- To compute [a,b], compute [1,b] - [1,a-1]
Query [6,12] = [1,12] - [1,5] (12 = 01100, 5 = 00101)
- [1,12] = 01100 = 01100 (node 12, covers 9-12) + 01000 (node 8, covers 1-8)
- [1,5] = 00101 = 00101 (node 5, covers 5-5) + 00100 (node 4, covers 1-4)
- Result: (node 12 + node 8) - (node 5 + node 4)
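The query above can be sketched in code. This version builds the tree directly from the definition of what node i covers, so it does not depend on the update operation; the function names and the `visited` list are illustrative additions.

```python
def lsone(x):
    return x & (-x)


def build_fenwick(values):
    """values[0] is element 1. Returns a 1-based tree; tree[0] is unused."""
    n = len(values)
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] + values[i - 1]
    # node i stores the sum of elements [i - LSOne(i) + 1, i]
    return [0] + [prefix[i] - prefix[i - lsone(i)] for i in range(1, n + 1)]


def prefix_sum(tree, b, visited=None):
    """Sum of elements [1, b], stripping off LSOne at each step."""
    total = 0
    while b > 0:
        if visited is not None:
            visited.append(b)
        total += tree[b]
        b -= lsone(b)
    return total


def range_sum(tree, a, b):
    """Sum of elements [a, b] as [1, b] minus [1, a-1]."""
    return prefix_sum(tree, b) - prefix_sum(tree, a - 1)


tree = build_fenwick(list(range(1, 17)))   # element i holds the value i
nodes_12, nodes_5 = [], []
prefix_sum(tree, 12, nodes_12)
prefix_sum(tree, 5, nodes_5)
print(nodes_12, nodes_5)                   # [12, 8] [5, 4]
print(range_sum(tree, 6, 12))              # 63
```

Each prefix sum visits one node per 1 bit of b, so a range query costs O(log n).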
Updating Fenwick tree
- Updating element k: must update the sequence of nodes whose range contains k
- k’ = k+LSOne(k)
- repeat until k’ > n, where n is the size of the array
- e.g. for n=12, k = 3:
  - k = 3 (binary 0011)
  - k’ = 3+1 = 4 (0011 + 0001 = 0100)
  - k’’ = 4+4 = 8 (0100 + 0100 = 1000)
  - k’’’ = 8+8 = 16 (1000 + 1000 = 10000), which is > 12, therefore stop
Update element 11 = 01011 (from array of size 16)
- First update node 01011 (11), which covers 11-11 (it is <= 16)
- 01011 + 00001 = 01100 (12), which covers 9-12 (still <= 16), so update it
- 01100 + 00100 = 10000 (16), which covers 1-16 (still <= 16), so update it
- 10000 + 10000 = 100000 (32) (NOT <= 16, so stop!)
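A minimal sketch of the update loop. It assumes an update means adding a delta to element k, the usual form for a sum tree; the slides do not say whether updates add or overwrite. The `visited` list is added only to show the node sequence.

```python
def lsone(x):
    return x & (-x)


def update(tree, k, delta, visited=None):
    """Add delta to element k; tree is 1-based with tree[0] unused."""
    n = len(tree) - 1
    while k <= n:
        if visited is not None:
            visited.append(k)
        tree[k] += delta             # this node's range contains element k
        k += lsone(k)                # move to the next node covering k


tree = [0] * 17                      # empty Fenwick tree for n = 16
path = []
update(tree, 11, 5, path)
print(path)                          # [11, 12, 16]

small = [0] * 13                     # n = 12
path = []
update(small, 3, 1, path)
print(path)                          # [3, 4, 8]
```

To overwrite element k with a new value instead, pass the difference between the new and old value as `delta`.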