Solving Failing Queries *) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA


1 Solving Failing Queries *) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA ras@uncc.edu

2 Failing Query Problems
Problem 1. Given S(A) with hierarchical attributes and a query q(B) that returns an empty answer, how can one relax the query's constraints so that it returns a non-empty set of tuples?
Assumption: S(A) is an information system based on attributes from A, and q(B) is a query based on attributes from B. Query q(B) is not local for system S(A) if ¬[B ⊆ A].
Problem 2. Given S(A), which represents one of the sites of a distributed autonomous information system, and a non-local query q(B) submitted to S(A), where A ∩ B ≠ ∅, how can q(B) be modified so that it can be answered?

3 Failing Query Problem 1
Problem 1. [Cooperative Query Answering] [Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]
Attribute hierarchies:
age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
Example of a query: (age, 18) ∧ (salary, 40k)
Possible relaxations:
(age, young) ∧ (salary, 40k)
(age, 18) ∧ (salary, low)
(age, young) ∧ (salary, low)
Preference for relaxation: [1 - age, 2 - salary]
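The relaxations listed on this slide can be enumerated mechanically. The sketch below is only an illustration: the names `HIERARCHY`, `generalize` and `relaxations` are hypothetical, and the ordering rule (generalize fewer attributes first, break ties by the stated preference) is an assumption about how the preference list is meant to be read.

```python
from itertools import combinations

# Leaf values grouped under their parent label, as on the slide.
HIERARCHY = {
    "age": {
        "young": range(18, 30),
        "middle-aged": range(30, 61),
        "old": range(61, 81),
    },
    "salary": {
        "low": range(10_000, 40_001),
        "medium": {50_000, 60_000, 70_000},
        "high": range(80_000, 100_001),
    },
}


def generalize(attribute, value):
    """Return the parent label of a leaf value, or the value itself if none is found."""
    for label, members in HIERARCHY[attribute].items():
        if value in members:
            return label
    return value


def relaxations(query, preference):
    """List relaxed queries: fewer generalized attributes first, ties broken by preference."""
    relaxed = []
    for size in range(1, len(preference) + 1):
        for attributes in combinations(preference, size):
            candidate = dict(query)
            for attribute in attributes:
                candidate[attribute] = generalize(attribute, query[attribute])
            relaxed.append(candidate)
    return relaxed


result = relaxations({"age": 18, "salary": 40_000}, ["age", "salary"])
for candidate in result:
    print(candidate)
```

With the preference [age, salary], the three candidates come out in the same order as the slide lists them.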

4 Failing Query Problem 1
Information System S1:

      a        b     c     e
x1    a[1,2]   b1    c2    e1
x2             b1    c2    e1
x3    a[1,1]   b1    c1    e1
x4    a[2,1]   b2    c2    e2
x5    a[1,1]   b2    c1    e2

q = a[1,2] * c1 submitted to S1 fails (no objects in S1 satisfy q).
Solution: q can be generalized by QAS to q1 = a1 * c1, which matches objects x3 and x5 in S1.
Question: Which of these two objects (x3 or x5) is closer to q?
Attribute a is hierarchical, with the following structure in Lisp-like notation: a(a1(a[1,1], a[1,2]), a2(a[2,1], …))
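A small sketch of the failing query and its generalization on this slide. Hierarchical values of a are encoded as index tuples, so a[1,2] is `(1, 2)` and a1 is `(1,)`. The encoding and the helper names `covers` and `answer` are assumptions made for illustration. A missing value (attribute a of x2) is treated here as not matching, which is also an assumption.

```python
# Information System S1 from the slide; x2 has no value for attribute a.
S1 = {
    "x1": {"a": (1, 2), "b": "b1", "c": "c2", "e": "e1"},
    "x2": {"b": "b1", "c": "c2", "e": "e1"},
    "x3": {"a": (1, 1), "b": "b1", "c": "c1", "e": "e1"},
    "x4": {"a": (2, 1), "b": "b2", "c": "c2", "e": "e2"},
    "x5": {"a": (1, 1), "b": "b2", "c": "c1", "e": "e2"},
}


def covers(query_value, stored_value):
    """True if the stored value equals the query value or lies below it in the hierarchy."""
    if stored_value is None:
        return False  # assumption: a missing value is not counted as a match
    if isinstance(query_value, tuple):
        return stored_value[: len(query_value)] == query_value
    return stored_value == query_value


def answer(system, query):
    """Names of the objects that satisfy every condition of a conjunctive query."""
    return [
        name
        for name, row in system.items()
        if all(covers(value, row.get(attr)) for attr, value in query.items())
    ]


q = {"a": (1, 2), "c": "c1"}   # a[1,2] * c1
q1 = {"a": (1,), "c": "c1"}    # a1 * c1, the generalized query

print(answer(S1, q))    # empty: the query fails
print(answer(S1, q1))   # objects matched after generalization
```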

5 Failing Query Problem 1
Information System S1 (same table as on slide 4).
q = a[1,2] * c1 submitted to S1 fails (no objects in S1 satisfy q).
Question: Which of these two objects (x3 or x5) is closer to q?
Let k ≤ m. Then the distance is:
δ_a(a[i(1), i(2), …, i(k)], a[j(1), j(2), …, j(m)]) = if [i(1) = j(1) ∧ … ∧ i(n) = j(n) ∧ [n = k ≤ m ∨ i(n+1) ≠ j(n+1)]], then 1/2^n, else 0
δ(x_i, x_j) = δ_a + δ_b + δ_c + δ_e
δ(q, x3) = ½ + 1 + 1 + 1 = 3½
δ(q, x5) = ½ + 1 + 1 + 1 = 3½
Result: both are OK

6 [Muslea, KDD'04] On-line, query-guided algorithm for relaxing failing DNF queries
Example. A = {Price, CPU, Display, Weight}.
Failing query: q(A) = [Price ≤ $2000] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 3lbs].
Select a small, randomly chosen subset of the target DB to discover implicit relationships between the values of the attributes used in the query.
Discovered rules:
r1 = [[Price ≤ $2900] ∧ [Display ≥ 18''] ∧ [Weight ≤ 4lbs] → [CPU ≥ 2.5GHz]].
r2 = [[Price ≥ $3500] → [CPU ≥ 2.5GHz]].
r3 = …….
A nearest-neighbor technique is used to identify which rule is most similar to the failing query. Assume that r1 is such a rule.
Relaxed query: [Price ≤ $2900] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 4lbs].
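The last step of this example, combining the failing query with the chosen rule, can be sketched as below. This is not Muslea's algorithm: rule learning and the nearest-neighbor choice are skipped and r1 is simply given. The function `relax` is hypothetical, and the comparison directions (upper bounds on Price and Weight, lower bounds on CPU and Display) are assumed from the laptop scenario. For each attribute the looser of the two bounds is kept.

```python
# Each constraint is (comparator, bound); the directions are an assumption.
query = {
    "Price": ("<=", 2000),
    "CPU": (">=", 2.5),
    "Display": (">=", 17),
    "Weight": ("<=", 3),
}
rule_r1 = {
    "Price": ("<=", 2900),
    "Display": (">=", 18),
    "Weight": ("<=", 4),
    "CPU": (">=", 2.5),
}


def relax(query, rule):
    """Keep, per attribute, the looser bound of the query and the rule."""
    relaxed = {}
    for attribute, (comparator, bound) in query.items():
        if attribute in rule and rule[attribute][0] == comparator:
            other = rule[attribute][1]
            # A larger upper bound or a smaller lower bound admits more tuples.
            bound = max(bound, other) if comparator == "<=" else min(bound, other)
        relaxed[attribute] = (comparator, bound)
    return relaxed


relaxed_query = relax(query, rule_r1)
for attribute, (comparator, bound) in relaxed_query.items():
    print(attribute, comparator, bound)
```

Under these assumptions the result agrees with the relaxed query on the slide: Price and Weight take the rule's bounds, while Display keeps the query's own, looser bound.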

7 Failing Query Problem 2
Problem 2. [Collaborative Query Answering] [Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska]
Example of a non-local query
Database: Flights(airline; departure time; arrival time; departure airport; arrival airport).
select * from Flights where airline = "Delta" and departure time = "morning" and departure airport = "Charlotte" and aircraft = "Boeing"
The query is non-local because aircraft is not an attribute of Flights.

8 Query Processing in Collaborative Systems
System S1:

      a        b     c     e
x1    a[1,2]   b1    c2    e1
x2             b1    c2    e1
x3    a[1,1]   b1    c1    e1
x4    a[2,1]   b2    c2    e2
x5    a[1,1]   b2    c2    e2

System S:

      a     b        c     d
y1    a1    b2       c1    d1
y2    a2    b1       c2    d2
y3    a1    b[1,1]   c2    d2
y4    a1             c2    d1
y5    a2    b2       c1    d2

q = a1 * b1 * e1 submitted to S fails, because attribute e is not in S (clearly b[1,1] is also b1).
Find the definition of e1 in S1: b1 → e1; c1 → e1; a[1,2] → e1.
q = a1 * b1 * e1 is rewritten as a1 * b1 * (b1 + c1 + a[1,2]) = a1 * b1 + a1 * b1 * c1 + a1 * b1 * a[1,2] = a1 * b1.
Objects y3, y4 satisfy the query q.
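The three definitions of e1 quoted on this slide can be recovered from S1 by checking which single attribute values always co-occur with e1. The sketch below does only that. The function `certain_rules` is a hypothetical name, and restricting the search to one-condition rules is a simplification rather than the rule-discovery method used by the author.

```python
# System S1 as on this slide; x2 has no value for attribute a.
S1 = {
    "x1": {"a": "a[1,2]", "b": "b1", "c": "c2", "e": "e1"},
    "x2": {"b": "b1", "c": "c2", "e": "e1"},
    "x3": {"a": "a[1,1]", "b": "b1", "c": "c1", "e": "e1"},
    "x4": {"a": "a[2,1]", "b": "b2", "c": "c2", "e": "e2"},
    "x5": {"a": "a[1,1]", "b": "b2", "c": "c2", "e": "e2"},
}


def certain_rules(system, target_attr, target_value):
    """Conditions (attr, value) such that every object having them has the target value."""
    outcomes = {}
    for row in system.values():
        for attr, value in row.items():
            if attr != target_attr:
                outcomes.setdefault((attr, value), []).append(row.get(target_attr))
    return sorted(
        condition
        for condition, seen in outcomes.items()
        if all(outcome == target_value for outcome in seen)
    )


rules_for_e1 = certain_rules(S1, "e", "e1")
for attr, value in rules_for_e1:
    print(f"{value} -> e1")
```

Substituting the disjunction of these conditions for e1, and then applying absorption, gives the rewritten query a1 * b1 shown on the slide.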

9 Query Processing in Collaborative Systems
Systems S1 and S (same tables as on slide 8).
q = a[1,2] * b[1,1] submitted to S1 fails because of the granularity of b.
Find the definition of b[1,1] in S: a1 * c2 → b[1,1].
q = a[1,2] * b[1,1] is rewritten as a[1,2] * a1 * c2 = a[1,2] * c2.
Objects x1, x2 satisfy the query q.

10 Failing Query Problem 2

11 Failing Query Problem 2
Query Processing in Incomplete IS
X is a set of objects, A is a set of attributes, V_a is the set of values of attribute a, where a ∈ A, and V = ∪{V_a : a ∈ A}.
S = (X, A, V) is a partially incomplete information system of type λ if the following two conditions hold for any x ∈ X, a ∈ A:
- if a_S(x) is defined, then [a_S(x) ∈ V_a or a_S(x) = {(v_i, p_i): 1 ≤ i ≤ m}],
- if [a_S(x) = {(v_i, p_i): 1 ≤ i ≤ m}], then [Σ_{i=1…m} p_i = 1 and (∀i)(p_i ≥ λ)].
Also, if [a_S(x) = v], then the value v has the same meaning as {(v, 1)}.
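The two conditions on a weighted attribute value can be checked directly. In this sketch a weighted value is a dictionary from values to weights and a plain value is anything else. The representation and the name `is_valid_value` are assumptions made for illustration, and the threshold is assumed to be at most 1.

```python
from fractions import Fraction


def is_valid_value(value, lam):
    """Check one attribute value against the type-lambda conditions."""
    if not isinstance(value, dict):
        # A plain value v is read as {(v, 1)}; valid as long as lam <= 1.
        return True
    weights = list(value.values())
    return sum(weights) == 1 and all(p >= lam for p in weights)


weighted = {"v1": Fraction(1, 3), "v2": Fraction(2, 3)}
print(is_valid_value(weighted, Fraction(1, 4)))   # both weights reach the threshold
print(is_valid_value(weighted, Fraction(1, 2)))   # 1/3 is below the threshold
```

Exact fractions are used so that the test for the weights summing to 1 is not affected by floating-point rounding.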

12 Incomplete Information System
Queries: q1(a,b) = a1 * b1, q2(a,b) = a1 + b1
J(a1) = {(x1, 1/3), (x3, 1), (x5, 2/3)}
J(b1) = {(x1, 2/3), (x2, 1/3), (x4, 1/2), (x5, 1), (x7, 1/4)}
What about J(a1 * b1) = J(a1) ⊗ J(b1), J(a1 + b1) = J(a1) ⊕ J(b1)?

13 Interpretations for ⊗ and ⊕
Assume that J(a1) = {(x_i, p_i): i ∈ K} and J(b1) = {(x_i, q_i): i ∈ K}.
Interpretation T0
J(a1) ⊗0 J(b1) = {(x_i, S1(p_i, q_i)): i ∈ K}, where S1(p_i, q_i) = [if max(p_i, q_i) = 1, then min(p_i, q_i), else 0].
J(a1) ⊕0 J(b1) = {(x_i, S2(p_i, q_i)): i ∈ K}, where S2(p_i, q_i) = [if min(p_i, q_i) = 0, then max(p_i, q_i), else 1].
Interpretation T1
J(a1) ⊗1 J(b1) = {(x_i, max{0, p_i + q_i - 1}): i ∈ K}
J(a1) ⊕1 J(b1) = {(x_i, min{1, p_i + q_i}): i ∈ K}
Interpretation T2
J(a1) ⊗2 J(b1) = {(x_i, [p_i · q_i]/[2 - (p_i + q_i - p_i · q_i)]): i ∈ K}
J(a1) ⊕2 J(b1) = {(x_i, [p_i + q_i]/[1 + p_i · q_i]): i ∈ K}

14 Interpretations for ⊗ and ⊕
Interpretation T3
J(a1) ⊗3 J(b1) = {(x_i, p_i · q_i): i ∈ K}
J(a1) ⊕3 J(b1) = {(x_i, p_i + q_i - p_i · q_i): i ∈ K}
Interpretation T4
J(a1) ⊗4 J(b1) = {(x_i, [p_i · q_i]/[p_i + q_i - p_i · q_i]): i ∈ K}
J(a1) ⊕4 J(b1) = {(x_i, [p_i + q_i - 2 · p_i · q_i]/[1 - p_i · q_i]): i ∈ K}
Fuzzy Interpretation T5
J(a1) ⊗5 J(b1) = {(x_i, min{p_i, q_i}): i ∈ K}
J(a1) ⊕5 J(b1) = {(x_i, max{p_i, q_i}): i ∈ K}
Another possible interpretation T
J(a1) ⊗3 J(b1) = {(x_i, p_i · q_i): i ∈ K}
J(a1) ⊕5 J(b1) = {(x_i, max{p_i, q_i}): i ∈ K}
Interpretations T0, T5, T satisfy the property: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
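To see how the choice of interpretation changes an answer, the sketch below applies four of the interpretations (T0, T1, T3, T5) to the weighted sets J(a1) and J(b1) given on slide 12. It assumes that an object absent from a set has weight 0 and that zero-weight objects are dropped from the result. The names `INTERPRETATIONS` and `combine` are hypothetical. T2 and T4 are left out only to keep the example short.

```python
from fractions import Fraction as F

# Weighted answer sets from slide 12.
J_a1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_b1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}

# Each interpretation is a pair (conjunction, disjunction) on weights p, q.
INTERPRETATIONS = {
    "T0": (
        lambda p, q: min(p, q) if max(p, q) == 1 else F(0),
        lambda p, q: max(p, q) if min(p, q) == 0 else F(1),
    ),
    "T1": (lambda p, q: max(F(0), p + q - 1), lambda p, q: min(F(1), p + q)),
    "T3": (lambda p, q: p * q, lambda p, q: p + q - p * q),
    "T5": (min, max),
}


def combine(operation, left, right):
    """Apply a weight operation objectwise; missing objects count as weight 0."""
    objects = sorted(set(left) | set(right))
    combined = {x: operation(left.get(x, F(0)), right.get(x, F(0))) for x in objects}
    return {x: w for x, w in combined.items() if w != 0}


for name, (conj, disj) in INTERPRETATIONS.items():
    print(name, "a1 * b1:", combine(conj, J_a1, J_b1))
    print(name, "a1 + b1:", combine(disj, J_a1, J_b1))
```

For the conjunction a1 * b1, object x1 survives with weight 2/9 under T3 and 1/3 under T5, but drops out under T0 and T1, so the interpretation affects both the weights and the membership of the answer.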

15 Failing Query Problem 2
Incomplete IS [S2 is finer than S1]
Assume:
- S1, S2 are partially incomplete IS of type λ
- The same objects are stored in both systems
- The same attributes are used to describe objects
- a_S1(x) = {(a_1i, p_1i): 1 ≤ i ≤ m_1}, a_S2(x) = {(a_2i, p_2i): 1 ≤ i ≤ m_2}

16 Incomplete Information System
S2 is finer than S1 if:
- (∀x ∈ X)(∀a ∈ A)[card(a_S1(x)) ≥ card(a_S2(x))]
- (∀x ∈ X)(∀a ∈ A)[card(a_S1(x)) = card(a_S2(x))] → [Σ_{i≠j} |p_2i - p_2j| > Σ_{i≠j} |p_1i - p_1j|]

17 S2 finer than S1
[Figure: systems S2 and S1]

18 Failing Queries in Collaborative IS
Assume: Query q = q(B) is submitted to S = (X, A, V), where:
- B is the set of all attributes used in q
- A ∩ B ≠ ∅
- Attributes in B\(A ∩ B) are foreign for S
Two information systems can collaborate if they agree on the ontology of some of their common attributes.
The granularity of values of attributes used in a query q may differ from the granularity of values of the same attributes in S.

19 Failing Queries in Collaborative IS
Query q(B) can be processed at site S by discovering definitions of values of attributes from B\(A ∩ B) at some of the remote sites for S. With each certain rule discovered at a remote site, a number of additional rules can also be discovered.

20 Failing Query Problem 2
Example
age (child (≤ 17), young (18, …, 29), middle-aged (30, …, 60), old (61, …, 80), senile (≥ 81))
salary (low (0, …, 40K), medium (50K, …, 70K), high (80K, …, 100K), very-high (> 100K))
(age, young) ∧ (salary, 40K)
(age, young) ∧ (salary, low)
(age, N) ∧ (salary, 40K)
(age, N) ∧ (salary, low)

21 Failing Queries in Collaborative IS
S = (X, A, V) is the client site
A = {a, b, d, …}, c ∉ A
V_a = {a_1, a_2, a_3}
V_b = {b_{1,1}, b_{1,2}, b_{1,3}, b_{2,1}, b_{2,2}, b_{2,3}, b_{3,1}, b_{3,2}, b_{3,3}}
V_d = {d_1, d_2, d_3}
Semantics of hierarchical attributes {a, b, c, d} used by S and systems collaborating with S:
a(a_1 [a_{1,1}, a_{1,2}, a_{1,3}], a_2 [a_{2,1}, a_{2,2}, a_{2,3}], a_3 [a_{3,1}, a_{3,2}, a_{3,3}])
b(b_1 [b_{1,1}, b_{1,2}, b_{1,3}], b_2 [b_{2,1}, b_{2,2}, b_{2,3}], b_3 [b_{3,1}, b_{3,2}, b_{3,3}])
c(c_1 [c_{1,1}, c_{1,2}, c_{1,3}], c_2 [c_{2,1}, c_{2,2}, c_{2,3}], c_3 [c_{3,1}, c_{3,2}, c_{3,3}])
d(d_1 [d_{1,1}, d_{1,2}, d_{1,3}], d_2 [d_{2,1}, d_{2,2}, d_{2,3}], d_3 [d_{3,1}, d_{3,2}, d_{3,3}])

22 Assume: Query q = a_{i,1} * b_i * c_{i,3} * d_i is submitted to S.
S: a[i], b[i,j], d[i] (granularity of the attribute values in S)
q = a_{i,1} * [b_{i,1} + b_{i,2} + b_{i,3}] * c_{i,3} * d_i = [a_{i,1} * b_{i,1} * c_{i,3} * d_i] + [a_{i,1} * b_{i,2} * c_{i,3} * d_i] + [a_{i,1} * b_{i,3} * c_{i,3} * d_i]
How to solve query q?
1. Generalize a_{i,1} to a_i and c_{i,3} to c. The query has a new form: q1 = a_i * [b_{i,1} + b_{i,2} + b_{i,3}] * d_i
2.a. Objects matching q1 may satisfy q.
2.b. Generalizations decrease the chance that retrieved objects will match query q. Check what values of attributes a and c are implied by d_i * b_{i,1}, d_i * b_{i,2}, or d_i * b_{i,3} at remote sites for S, and whether any of these rules have high confidence and support.

23 q = a_{i,1} * [b_{i,1} + b_{i,2} + b_{i,3}] * c_{i,3} * d_i = [a_{i,1} * b_{i,1} * c_{i,3} * d_i] + [a_{i,1} * b_{i,2} * c_{i,3} * d_i] + [a_{i,1} * b_{i,3} * c_{i,3} * d_i]
S: a[i], b[i,j], d[i] (granularity of the attribute values in S)
How to solve query q?
1. Generalize a_{i,1} to a_i and c_{i,3} to c. The query has a new form: q1 = a_i * b_i * d_i = [a_i * b_{i,1} * d_i] + [a_i * b_{i,2} * d_i] + [a_i * b_{i,3} * d_i]
2. Check what values of attributes a and c are implied by d_i * b_{i,1}, d_i * b_{i,2}, or d_i * b_{i,3} at remote sites for S, and whether any of these rules have high confidence and support.
Assume that d_i * b_{i,1} → a_{i,2} and d_i * b_{i,2} → c_{i,3} are certain rules, extracted at a remote site for S.
The first rule contradicts a_{i,1} in the first disjunct, which is therefore dropped. The second rule makes c_{i,3} redundant in the second disjunct.
We get q rewritten as [a_{i,1} * b_{i,2} * d_i] + [a_{i,1} * b_{i,3} * c_{i,3} * d_i], where the first disjunct is local and the second is non-local.
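The rewriting on this slide can be traced with a short sketch. Each disjunct is a dictionary from attribute to value, and a certain rule is a pair (premise, conclusion). The functions `expand` and `apply_rules` are hypothetical names, and the sketch assumes that two different values of the same attribute at the same level are contradictory, which holds for the sibling values a[i,1] and a[i,2] used here.

```python
def expand(query, attribute, children):
    """Replace a generalized value by the disjunction of its children."""
    return [{**query, attribute: child} for child in children]


def apply_rules(disjuncts, rules):
    """Drop disjuncts a rule contradicts; drop conjuncts a rule already implies."""
    kept = []
    for conjunct in disjuncts:
        conjunct = dict(conjunct)
        contradicted = False
        for premise, (attr, value) in rules:
            premise_holds = all(conjunct.get(a) == v for a, v in premise.items())
            if premise_holds and attr in conjunct:
                if conjunct[attr] == value:
                    del conjunct[attr]      # implied by the rule, so redundant
                else:
                    contradicted = True     # assumption: distinct sibling values clash
                    break
        if not contradicted:
            kept.append(conjunct)
    return kept


q = {"a": "a[i,1]", "b": "b[i]", "c": "c[i,3]", "d": "d[i]"}
disjuncts = expand(q, "b", ["b[i,1]", "b[i,2]", "b[i,3]"])
rules = [
    ({"d": "d[i]", "b": "b[i,1]"}, ("a", "a[i,2]")),
    ({"d": "d[i]", "b": "b[i,2]"}, ("c", "c[i,3]")),
]
result = apply_rules(disjuncts, rules)
for conjunct in result:
    print(" * ".join(conjunct.values()))
```

Two disjuncts remain: one without the foreign attribute c, and one that still contains c[i,3] and therefore needs help from a remote site.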

24 Failing Query Problem 2
q = q(a[3,1,3,2], b1, c2)
Possible generalization: q1 = q1(a3, b1, c2)
Rules extracted at remote sites which define any of the values below a[3] will help in solving q.
Rules describing values not belonging to {a[3,1], a[3,1,3], a[3,1,3,2]} are used to reduce the size of the query (to remove some conjuncts).

25 Questions? Thank You

