Published by Meagan Washington. Modified over 9 years ago.
1
ANSWERING SUM-QUERIES: A SECURE AND EFFICIENT APPROACH
Mauro Mezzini
University of Rome “La Sapienza”, Computer Science Department
2
Introduction
Statistical database: users are allowed to ask for statistical information, such as sum, count, average, max and min queries, on a numerical attribute.

Table Retail:
PRODUCT     SALES(€)
storage     90000
router      30000
server      30000
mainframe   25000

select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';
r = 120000
3
Introduction
Definition: the target K of a query q is the set of categories selected by q.

select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';

K = {storage, router}
4
The efficiency issue
To speed up the answering of a sum-query, the query system is endowed with a set of pre-computed sum-queries, called the set of materialised views.

Materialised views:
$q_1$: select sum(SALES) from Retail;   $r_1$ = 175000
$q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';   $r_2$ = 120000

New query:
q: select sum(SALES) from Retail where PRODUCT = 'server' or PRODUCT = 'mainframe';
$r = r_1 - r_2$ = 55000
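The subtraction on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names `RETAIL`, `views` and `answer_from_views` are hypothetical, and the sketch only tries a single difference of two views, not the general inference model introduced later.

```python
# Sales table from the slides (PRODUCT -> SALES in euro).
RETAIL = {"storage": 90000, "router": 30000, "server": 30000, "mainframe": 25000}

def sum_query(target):
    """Evaluate a sum-query directly on the base table."""
    return sum(RETAIL[p] for p in target)

# Materialised views: target (as a frozenset) -> pre-computed result.
views = {
    frozenset(RETAIL): sum_query(RETAIL),                                  # q1
    frozenset({"storage", "router"}): sum_query({"storage", "router"}),   # q2
}

def answer_from_views(target):
    """Answer a query whose target is the difference of two view targets."""
    target = frozenset(target)
    for big, r_big in views.items():
        for small, r_small in views.items():
            if small < big and big - small == target:
                return r_big - r_small
    return None  # not derivable by a single subtraction

r = answer_from_views({"server", "mainframe"})
```

Here `r` is obtained as r1 - r2 without touching the base table again.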
5
Protection issue
To protect the confidentiality of the numerical attribute, the query system is endowed with the list of all sensitive categories.

$q_1$: select sum(SALES) from Retail where PRODUCT = 'storage';
$q_2$: select sum(SALES) from Retail where PRODUCT = 'router';

PRODUCT     SALES(€)
storage     90000
router      30000
server      30000
mainframe   25000
6
Protection issue
$q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';   $r_1$ = 60000
$q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';   $r_2$ = 120000
$q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';   $r_3$ = 120000

With $x_1$, $x_2$, $x_3$ the sales of router, server and storage:
$x_1 + x_2 = r_1$
$x_2 + x_3 = r_2$
$x_1 + x_3 = r_3$

The values of all confidential information can be inferred from the answers to the non-confidential queries $\{q_1, q_2, q_3\}$.
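A minimal sketch of this inference, assuming the attacker only sees the three answers. The answers are computed here from the Retail table of the earlier slide; the function name `infer` and the closed-form solution of the 3×3 system are illustrative additions, not part of the slides.

```python
from fractions import Fraction

# Confidential values (known to the database, not to the attacker).
RETAIL = {"router": 30000, "server": 30000, "storage": 90000}

# Targets of q1, q2, q3 and the answers the attacker receives.
targets = [("router", "server"), ("storage", "server"), ("storage", "router")]
results = [sum(RETAIL[p] for p in K) for K in targets]

def infer(r1, r2, r3):
    """Solve x1 + x2 = r1, x2 + x3 = r2, x1 + x3 = r3 exactly."""
    x1 = Fraction(r1 - r2 + r3, 2)
    x2 = Fraction(r1 + r2 - r3, 2)
    x3 = Fraction(r2 + r3 - r1, 2)
    return x1, x2, x3

router, server, storage = infer(*results)
```

Each single-category value is recovered although no query asked for it directly.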
7
The inference model
Efficiency: given a set of sum-queries $V = \{q_1, \dots, q_n\}$, determine whether the result of a sum-query q can be inferred from V.
Protection: given a set of sum-queries $V = \{q_1, \dots, q_n\}$, determine, for every inferable sum-query q, whether the result of q is sensitive information.
8
The inference model
Let $V = \{q_1, q_2, \dots, q_n\}$.
Let $K_i$ and $r_i$ be the target and the result of $q_i$, respectively.
Let $\mathcal{C} = \{C_1, C_2, \dots, C_m\}$ be the coarsest partition of $\bigcup_{i=1,\dots,n} K_i$ such that each $K_i$ can be obtained as the union of one or more elements of $\mathcal{C}$.
The inference model is based on the following system of linear constraints:
(1)   $\sum_{j=1,\dots,m} a_{i,j}\, x_j = r_i$ for $i = 1, \dots, n$, with $x \in F^m$,
where $a_{i,j} = 1$ if $C_j \subseteq K_i$ and $a_{i,j} = 0$ otherwise, and F is the domain of the numerical attribute.
9
The inference model. An example
$q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';   $r_1$ = 60000
$q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';   $r_2$ = 120000
$q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';   $r_3$ = 120000

$K_1$ = {router, server}, $K_2$ = {storage, server}, $K_3$ = {storage, router}
$C_1$ = {router}, $C_2$ = {server}, $C_3$ = {storage}
F is the set of non-negative reals.

$x_1 + x_2 = r_1$
$x_2 + x_3 = r_2$
$x_1 + x_3 = r_3$
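The construction of the partition and of the 0/1 coefficients can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, and it relies on the observation that two categories fall in the same class exactly when they belong to the same targets. Classes are sorted alphabetically only to make the output deterministic.

```python
def coarsest_partition(targets):
    """Group categories by the set of queries whose target contains them."""
    signature = {}
    for i, K in enumerate(targets):
        for cat in K:
            signature.setdefault(cat, set()).add(i)
    classes = {}
    for cat, sig in signature.items():
        classes.setdefault(frozenset(sig), set()).add(cat)
    return sorted((frozenset(c) for c in classes.values()), key=lambda c: sorted(c))

def constraint_matrix(targets, partition):
    """a[i][j] = 1 if class C_j is contained in target K_i, else 0."""
    return [[1 if C <= set(K) else 0 for C in partition] for K in targets]

targets = [{"router", "server"}, {"storage", "server"}, {"storage", "router"}]
partition = coarsest_partition(targets)
A = constraint_matrix(targets, partition)
```

For the example of this slide, `partition` has the three singleton classes C1, C2, C3 and `A` is the coefficient matrix of the three equations.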
10
The inference model
Definition: given a subset S of {1, 2, …, m}, the sum-expression $\sum_{j \in S} x_j$ is an F-invariant if it takes on the same value for every solution x of (1).
An F-invariant sum is the result of the sum-query with target $\bigcup_{j \in S} C_j$.
11
The inference model
Definitions: given the partition $\mathcal{C} = \{C_1, \dots, C_m\}$ and a query q with target K, consider the two sets
$S = \{ j : C_j \subseteq K \}$, the support of q;
$S' = \{ j : C_j \cap K \neq \emptyset \text{ and } C_j - K \neq \emptyset \}$, the cosupport of q.
The sum $\sum_{j \in S \cup S'} x_j$ is called the sum-expression associated to q.
12
The inference model. An example
$q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';   $r_1$ = 60000
$q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';   $r_2$ = 120000
$q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';   $r_3$ = 120000

$K_1$ = {router, server}, $K_2$ = {storage, server}, $K_3$ = {storage, router}
$C_1$ = {router}, $C_2$ = {server}, $C_3$ = {storage}

$x_1 + x_2 = r_1$
$x_2 + x_3 = r_2$
$x_1 + x_3 = r_3$

q: select sum(SALES) from Retail where PRODUCT = 'storage';
K = {storage}
The support of q is {3}, the cosupport is empty, and the sum-expression associated to q is trivially $x_3$.
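A small sketch of how support and cosupport could be computed. It assumes the reading of the definitions in which the support collects the classes contained in K and the cosupport collects the classes that meet K without being contained in it; the function name is hypothetical and indices are 1-based as on the slides.

```python
def support_and_cosupport(partition, K):
    """Return (support, cosupport) of a query with target K, as 1-based index lists."""
    K = set(K)
    support = [j for j, C in enumerate(partition, start=1) if C <= K]
    cosupport = [j for j, C in enumerate(partition, start=1)
                 if (C & K) and (C - K)]
    return support, cosupport

# Partition of the example: C1 = {router}, C2 = {server}, C3 = {storage}.
partition = [{"router"}, {"server"}, {"storage"}]
support, cosupport = support_and_cosupport(partition, {"storage"})
```

With a coarser partition, such as {server, mainframe} and {storage, router}, a query on 'server' alone has an empty support and a non-empty cosupport.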
13
Problem definitions
1) Given a sum-expression $\sum_{j \in S} x_j$, decide whether it is an F-invariant.
2) Given a sum-expression $\sum_{j \in S} x_j$ that is not an F-invariant, find a nonempty subset $S'$ of S such that $\sum_{j \in S'} x_j$ is an F-invariant.
14
Problem (2)
Let S be a subset of {1, …, m} and let s be the characteristic vector of S. Then, for i = 1, …, m,
s(i) = 1 if $i \in S$, and s(i) = 0 if $i \notin S$.
15
Problem (2)
We can rewrite system (1) as: $A x = r$, $x \in F^m$.
An m-dimensional vector f is a linear combination of rows of A if
$f = \sum_{i=1,\dots,n} \lambda_i\, a_i$, with $\lambda_i \in R$,
where $a_i$, i = 1, …, n, is the i-th row of A.
16
Problem (2)
Definition: a subset S of {1, 2, …, m} is said to be algebraic if its characteristic vector can be expressed as a linear combination of the rows of the matrix A.
If F is R or Z, then $\sum_{j \in S} x_j$ is F-invariant if and only if S is algebraic.
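Whether a subset is algebraic can be checked with a rank test: the characteristic vector lies in the row space of A exactly when appending it to A does not increase the rank. The sketch below uses exact rational arithmetic; `rank` and `is_algebraic` are hypothetical helper names, and the small matrix is an illustrative example, not one taken from the slides.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def is_algebraic(A, S):
    """S is a set of 1-based column indices of A."""
    s = [1 if j in S else 0 for j in range(1, len(A[0]) + 1)]
    return rank(A + [s]) == rank(A)

# Two queries over three classes: x1 + x2 = r1 and x2 + x3 = r2.
A = [[1, 1, 0], [0, 1, 1]]
```

For this matrix, {1, 2} is algebraic (it is the first row), while {3} and {1, 3} are not.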
17
The NAS Problem
Problem definition: given a sum-expression $\sum_{j \in S} x_j$ that is not R-invariant, find a non-empty algebraic subset of S (NAS Problem).
NAS Problem: find a non-empty subset F of S such that the characteristic vector of F is expressible as a linear combination of rows of A.
18
The NAS Problem
The subset sum problem (SSP): given a set S = {1, …, p} and a mapping $a : S \to Z$ such that a(i) > 0 for i = 1, …, p-1 and a(i) < 0 for i = p, find a non-empty subset F of S such that $\sum_{i \in F} a(i) = 0$.
19
The NAS Problem
Let c be a q-dimensional vector, with q ≥ p, such that c(1) = a(1), c(2) = a(2), …, c(p) = a(p), and $c(j) \in R$ for p < j ≤ q.
Let M = (I, c) be the q × (q+1) matrix obtained from c.
20
The NAS Problem
Example: let S = {1, 2, 3, 4} and a(1) = 1, a(2) = 2, a(3) = 5, a(4) = -7.
The subset F = {2, 3, 4} of S is a solution of the SSP, since a(2) + a(3) + a(4) = 2 + 5 - 7 = 0.
21
If we choose q = 5, the vector c is (1, 2, 5, -7, c(5)) and the matrix M is
1 0 0 0 0  1
0 1 0 0 0  2
0 0 1 0 0  5
0 0 0 1 0 -7
0 0 0 0 1  c(5)
22
The NAS Problem
The vector $\hat{c} = (-c, 1)$ is a solution of the equation M y = 0:
$y_1 + 1\, y_6 = 0$
$y_2 + 2\, y_6 = 0$
$y_3 + 5\, y_6 = 0$
$y_4 - 7\, y_6 = 0$
$y_5 + c(5)\, y_6 = 0$
23
The NAS Problem
Theorem: given the matrix M and the set S = {1, …, p}, the SSP has a solution if and only if there exists a nonempty algebraic subset of S.
Proof: the (q+1)-dimensional vector $\hat{c} = (-c, 1)$ spans the null space of M, that is, the solutions of M y = 0, and the null space of M has dimension equal to one.
24
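The NAS Problem
If $F \subseteq S$ is an algebraic set, then its characteristic vector f is expressible as a linear combination of rows of M. Every row of M is orthogonal to $\hat{c}$, so f and $\hat{c}$ are orthogonal, and therefore
$\sum_{i=1,\dots,q+1} f(i)\, \hat{c}(i) = 0$, that is, $0 = \sum_{i \in F} c(i) = \sum_{i \in F} a(i)$.
qed.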
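The theorem can be checked by brute force on the small example a = (1, 2, 5, -7) with q = 5. In the sketch below the value 3 for the free entry c(5) is an arbitrary assumption, the helper names are hypothetical, and the enumeration of all subsets is only meant for tiny instances.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

def build_M(a, extra):
    """Return (M, c): c is a followed by the free entries `extra`, M = (I, c)."""
    c = list(a) + list(extra)
    q = len(c)
    return [[1 if j == i else 0 for j in range(q)] + [c[i]] for i in range(q)], c

def nonempty_algebraic_subsets(a, extra):
    """All non-empty subsets F of {1..p} whose characteristic vector is in the row space of M."""
    M, c = build_M(a, extra)
    base_rank, p, width = rank(M), len(a), len(c) + 1
    found = []
    for size in range(1, p + 1):
        for F in combinations(range(1, p + 1), size):
            f = [1 if j in F else 0 for j in range(1, width + 1)]
            if rank(M + [f]) == base_rank:
                found.append(set(F))
    return found

a = [1, 2, 5, -7]
M, c = build_M(a, extra=[3])          # c(5) = 3 is an arbitrary choice
c_hat = [-v for v in c] + [1]         # the vector (-c, 1)
null_check = [sum(m * y for m, y in zip(row, c_hat)) for row in M]
algebraic = nonempty_algebraic_subsets(a, extra=[3])
zero_sum = [set(F) for size in range(1, 5)
            for F in combinations(range(1, 5), size)
            if sum(a[i - 1] for i in F) == 0]
```

On this instance the algebraic subsets of S and the zero-sum subsets coincide, as the theorem states.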
25
The NAS Problem
Example: let S = {1, 2, 3} and a(1) = 2, a(2) = 2, a(3) = -4. Then
$c_0$ = (2, 2, -4)
$c_1$ = (-1, -1, 1)
$c_2$ = (-1, -1, 1)
$c_3$ = (2, 2, -2, 1, 1)
Let c = ($c_0$, $c_1$, $c_2$, $c_3$).
26
The NAS Problem
Then M would be
1 0 0 0 0 0 0 0 0 0 0 0 0 0 2
0 1 0 0 0 0 0 0 0 0 0 0 0 0 2
0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
0 0 0 1 0 0 0 0 0 0 0 0 0 0 -1
0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
27
The NAS Problem
Step (1): row 1 becomes row 1 + row 4 + row 5, which cancels its last entry (2 - 1 - 1 = 0).
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 2
0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
0 0 0 1 0 0 0 0 0 0 0 0 0 0 -1
0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
28
The NAS Problem
Step (3): row 6 is added to rows 4 and 5, which cancels their last entries (-1 + 1 = 0).
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 2
0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
29
The NAS Problem
Step (4): row 2 becomes row 2 + row 7 + row 8 (2 - 1 - 1 = 0), and row 9 is added to rows 7 and 8 (-1 + 1 = 0).
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
30
The NAS Problem
Step (5): row 3 becomes row 3 + row 10 + row 11, which cancels its last entry (-4 + 2 + 2 = 0).
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 2
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
31
The NAS Problem
Step (6): row 12 is added to rows 10 and 11, which cancels their last entries (2 - 2 = 0).
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
32
The NAS Problem
Final step: row 12 becomes row 12 + row 13 + row 14, which cancels its last entry (-2 + 1 + 1 = 0). Every entry of the matrix is now 0 or 1.
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Row blocks: rows 1-3 correspond to $c_0$, rows 4-6 to $c_1$, rows 7-9 to $c_2$, rows 10-14 to $c_3$.
33
For a(i) > 1, i = 1, …, p-1:
$c_i = (\,\dots\,)$
$k_i = \lfloor \log_2 a(i) \rfloor$
34
The NAS Problem
a(i) = 7: $c_i$ = (-3, -3, 3, -1, -1, 1), $k_i = \lfloor \log_2 7 \rfloor = 2$
a(i) = 8: $c_i$ = (-4, -4, 4, -2, -2, 2, -1, -1, 1), $k_i = \lfloor \log_2 8 \rfloor = 3$
35
The NAS Problem
$B = \max\{ |a(i)| : i = 1, \dots, p \}$
The SSP has input dimension equal to $O(p \times \log_2 B)$.
$k_i \le \log_2 B$
The dimension of the matrix M is q × (q+1), where $q \le (p + 1) \times 3 \log_2 B$, that is, $q \in O(p \times \log_2 B)$.
36
Solving problem (1)
Given $A x = r$, $x \in F^m$: is $\sum_{j \in S} x_j$ an F-invariant?
Case considered: A is the vertex-edge incidence matrix of a graph, F is the set of reals and S is a singleton.
[Figure: a graph with edges labelled $x_1, \dots, x_8$ and vertices labelled $r_1, \dots, r_6$.]
37
Solving problem (1)
Consider the homogeneous system associated to system (1):
(2)   $A y = 0$, $y \in R^m$
We call a solution y of system (2) a circulation.
[Figure: a graph whose edges carry the values +, - and 0 of a circulation.]
38
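Solving problem (1)
Definition: given a circulation y, its support is the set $C = \{ e : y_e \neq 0 \}$.
[Figure: a graph whose edges carry the values +, - and 0 of a circulation; the edges with non-zero value form the support.]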
39
Solving problem (1)
Theorem 1: the unknown $x_e$ is an R-invariant if and only if, for every circulation y with support C, $e \notin C$.
Proof: let x* be a particular solution of (1). Then every solution of (1) can be written as x = x* + y for some circulation y.
So if $y_e = 0$ for every circulation y, then $x_e = x_e^*$ for every solution x of (1).
Conversely, if $x_e$ is invariant, then $x_e - x_e^* = 0 = y_e$ for every solution x of (1). Therefore $y_e = 0$ for every circulation y.
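Theorem 1 suggests a direct, if not the most efficient, test: compute a basis of the circulations and keep the edges on which every basis vector is zero. The sketch below does this with exact arithmetic; the function names and the two small graphs are illustrative assumptions, and vertices are numbered from 0.

```python
from fractions import Fraction

def null_space(A):
    """Basis of {y : A y = 0}, computed by exact row reduction."""
    m = [[Fraction(v) for v in row] for row in A]
    ncols = len(m[0])
    pivots, rk = [], 0
    for col in range(ncols):
        piv = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        lead = m[rk][col]
        m[rk] = [v / lead for v in m[rk]]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        pivots.append(col)
        rk += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        y = [Fraction(0)] * ncols
        y[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            y[pc] = -row[free]
        basis.append(y)
    return basis

def incidence_matrix(n_vertices, edges):
    """Vertex-edge incidence matrix: one row per vertex, one column per edge."""
    return [[1 if v in e else 0 for e in edges] for v in range(n_vertices)]

def invariant_edges(n_vertices, edges):
    """Edges e such that y_e = 0 for every circulation y."""
    basis = null_space(incidence_matrix(n_vertices, edges))
    return [e for j, e in enumerate(edges) if all(y[j] == 0 for y in basis)]

# A 4-cycle 0-1-2-3 with a pendant edge (3, 4).
square_with_tail = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
# A triangle 0-1-2 with a pendant edge (2, 3).
triangle_with_tail = [(0, 1), (1, 2), (2, 0), (2, 3)]
```

In the first graph only the pendant edge is invariant, because the 4-cycle carries an alternating circulation; in the second graph the only circulation is zero, so every edge is invariant.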
40
Solving problem (1)
Definition: a circulation y with support C is minimal if there is no non-zero circulation with support C' such that $C' \subset C$.
[Figure: graphs with edge values +, -, +3 and -2 illustrating circulations.]
41
Solving problem (1)
The supports of minimal circulations are called circuits; they are the even cycles and the L-oddsets of the graph.
[Figure: examples of circuits with edge values +, -, +2 and -2.]
42
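Solving problem (1)
Given a circulation y, we can write
$y = \sum_{i=1,\dots,p} \lambda_i\, y_i$, where $\lambda_i \in R$,
$B = \{y_1, \dots, y_p\}$ is a basis of N, the space of circulations,
and the support of each $y_i$ is a circuit of G.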
43
Solving problem (1)
[Figure: a graph with edge values including +, -, +2, +β and -β.]
44
Solving problem (1)
Theorem 2: the unknown $x_e$ is an R-invariant if and only if, for every circuit $y_i$ with support C, $e \notin C$.
Proof: $y_e = \sum_{i=1,\dots,p} \lambda_i\, y_{i,e} = 0$
45
Solving problem (1)
An odd edge is an edge of G belonging to every odd cycle of G.
A bridge is an edge of G whose removal disconnects G.
46
Solving problem (1)
Theorem 3: the unknown $x_e$ is an R-invariant if and only if e is an odd edge or e is a bridge that disconnects a bipartite component of G.
Proof:
1) If e belongs to all odd cycles of G, then G cannot contain an L-oddset.
2) If e is a bridge, then it cannot belong to an even cycle.
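A naive sketch of the test in Theorem 3, one edge at a time. It makes two assumptions that are not stated on the slide: the graph is simple (no parallel edges), and an odd edge is read as an edge whose removal turns a non-bipartite component into a bipartite one. The function names are hypothetical and the procedure is quadratic, unlike the DFS-based method of the later slides.

```python
from collections import deque

def components(n, edges):
    """List of (vertex set, is_bipartite) for each connected component."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [None] * n
    result = []
    for s in range(n):
        if colour[s] is not None:
            continue
        colour[s] = 0
        comp, bipartite, queue = {s}, True, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]
                    comp.add(v)
                    queue.append(v)
                elif colour[v] == colour[u]:
                    bipartite = False   # an odd cycle was closed
        result.append((comp, bipartite))
    return result

def invariant_by_theorem3(n, edges):
    """Edges that are odd edges, or bridges with a bipartite side."""
    whole = components(n, edges)
    out = []
    for e in edges:
        u, v = e
        after = components(n, [f for f in edges if f != e])
        before_u = next(c for c in whole if u in c[0])
        comp_u = next(c for c in after if u in c[0])
        comp_v = next(c for c in after if v in c[0])
        if comp_u[0] != comp_v[0]:
            # e is a bridge: keep it if one of the two sides is bipartite.
            if comp_u[1] or comp_v[1]:
                out.append(e)
        elif not before_u[1] and comp_u[1]:
            # Removing e destroys every odd cycle, so e is an odd edge.
            out.append(e)
    return out

# A triangle 0-1-2 and a 4-cycle 0-2-3-4 sharing the edge (2, 0).
shared = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
# Two triangles joined by the bridge (2, 3).
barbell = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
```

In the first graph the edges (0, 1) and (1, 2) lie on both odd cycles and on no even cycle; in the second graph no edge qualifies, since the bridge separates two non-bipartite components.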
47
Solving problem (1)
The case when e is an odd edge.
Suppose, for contradiction, that D is an even cycle containing e, and let C be an odd cycle of G (so C also contains e).
$D \,\Delta\, C$ is a set of edge-disjoint cycles not containing e.
$|D \,\Delta\, C| = |D| + |C| - 2\,|D \cap C|$
$|D \,\Delta\, C|$ is odd, so $D \,\Delta\, C$ must contain at least one odd cycle (contradiction).
48
Solving problem (1)
The case when e is a bridge disconnecting a bipartite component.
[Figure: a bridge e joining a non-bipartite component to a bipartite component.]
49
Solving problem (1)
E(H) = { e : e is a bridge of G }
V(H) = { v : v is a connected component of G - E(H) }
[Figure: a graph G and the corresponding graph H.]
50
Solving problem (1) Step 1
51
Solving problem (1) Step 2
52
Solving problem (1) Step 3
53
Solving problem (1) Step 4
54
Solving problem (1) Step 5
55
Solving problem (1) Step 6
56
Solving problem (1) Step 7
57
Solving problem (1) Step 8
58
Solving problem (1)
A DFS traversal of a graph G gives a partition of the edges of G into tree edges and back edges.
Each back edge e generates a cycle C(e).
The cycle C(e) is called a fundamental cycle with respect to the tree T.
59
Solving problem (1)
Proposition: every cycle of G can be obtained as the symmetric difference of one or more fundamental cycles.
If e is an odd edge then
1) it must belong to every fundamental odd cycle of G;
2) no fundamental even cycle of G contains e.
60
Solving problem (1)
A back edge e belongs to every fundamental odd cycle of G if and only if C(e) is the only fundamental odd cycle.
For every tree edge e we count the number of odd and even fundamental cycles containing e.
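The counting step can be sketched as follows, assuming a simple undirected graph with vertices numbered from 0. The function names are hypothetical; the sketch walks each fundamental cycle explicitly, which is simpler but slower than a linear-time implementation, and it only returns candidates, since the two conditions of the previous slide are stated there as necessary conditions.

```python
def fundamental_cycles(n, edges):
    """Map each back edge index to the edge indices of its fundamental cycle C(e)."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    depth = [None] * n
    parent = [None] * n
    parent_edge = [None] * n          # index of the tree edge to the parent
    seen_edge = [False] * len(edges)
    cycles = {}

    def visit(u):
        for v, idx in adj[u]:
            if seen_edge[idx]:
                continue
            seen_edge[idx] = True
            if depth[v] is None:      # tree edge
                depth[v] = depth[u] + 1
                parent[v], parent_edge[v] = u, idx
                visit(v)
            else:                     # back edge: v is an ancestor of u
                cycle, w = [idx], u
                while w != v:
                    cycle.append(parent_edge[w])
                    w = parent[w]
                cycles[idx] = cycle

    for root in range(n):
        if depth[root] is None:
            depth[root] = 0
            visit(root)
    return cycles

def odd_edge_candidates(n, edges):
    """Edges lying on every odd fundamental cycle and on no even one."""
    odd = [0] * len(edges)
    even = [0] * len(edges)
    total_odd = 0
    for cycle in fundamental_cycles(n, edges).values():
        is_odd = len(cycle) % 2 == 1
        total_odd += is_odd
        for idx in cycle:
            (odd if is_odd else even)[idx] += 1
    return [e for i, e in enumerate(edges)
            if total_odd and odd[i] == total_odd and even[i] == 0]

# A triangle 0-1-2 and a 4-cycle 0-2-3-4 sharing the edge (2, 0).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
candidates = odd_edge_candidates(5, edges)
```

On this graph the DFS from vertex 0 finds two odd fundamental cycles, and only the edges (0, 1) and (1, 2) lie on both.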