Published by Howard Russell. Modified over 9 years ago.
1
Mining Association Rules of Simple Conjunctive Queries
Bart Goethals, Wim Le Page, Heikki Mannila
SIAM 08
2015/8/26
2
Outline.
Motivation
Preliminaries
Conqueror: algorithm
◦ Selection loop
◦ Projection loop
◦ Constants loop
◦ Eliminating redundancies
Experiments
Conclusion
3
Motivation.
The first query asks for the actors who have starred in a movie of the genre ‘drama’.
The second query asks for the actors who have starred in both a ‘drama’ movie and a ‘comedy’ movie.
Now suppose the answer to the first query consists of 1000 actors, and the answer to the second query consists of 900 actors.
4
Motivation (cont)
Together, these answers reveal the potentially interesting pattern that actors starring in ‘drama’ movies typically (with a probability of 90%) also star in a ‘comedy’ movie.
In general, we are looking for pairs of queries Q1, Q2 such that Q1 asks for a set of tuples satisfying a certain condition and Q2 asks for those tuples satisfying a more specific condition.
5
Preliminaries.
Relational database: R(R1, …, Rn)
Definition 1: Simple Conjunctive Query
◦ π_X σ_F (R1 × ··· × Rn)
◦ F: a conjunction of equalities of the form Ri.A = Rj.B or Rk.A = "c"
◦ X: a set of attributes from R
Example: Q1: π_{A,B} R and Q2: π_{A,B} σ_{A=B} R
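To make Definition 1 concrete, the following is a minimal sketch of how a simple conjunctive query could be evaluated by brute force over the Cartesian product. The function `evaluate`, its argument layout, and the toy relation R are assumptions made for illustration; they are not taken from the presentation.

```python
from itertools import product

def evaluate(relations, equalities, constants, projection):
    """Evaluate pi_X sigma_F (R1 x ... x Rn) naively.

    relations:  dict mapping a relation name to a list of rows (attribute -> value)
    equalities: list of ((rel, attr), (rel, attr)) pairs, the R_i.A = R_j.B part of F
    constants:  list of ((rel, attr), value) pairs, the R_k.A = "c" part of F
    projection: list of (rel, attr) references, the attribute set X
    """
    names = list(relations)
    answer = set()  # a set, so duplicate tuples are counted once
    for combo in product(*(relations[name] for name in names)):
        row = dict(zip(names, combo))

        def value(ref):
            rel, attr = ref
            return row[rel][attr]

        if (all(value(a) == value(b) for a, b in equalities)
                and all(value(ref) == c for ref, c in constants)):
            answer.add(tuple(value(ref) for ref in projection))
    return answer

# Hypothetical toy instance of a single relation R(A, B).
instance = {"R": [{"A": 1, "B": 1}, {"A": 1, "B": 2}, {"A": 2, "B": 2}]}

# Q1 = pi_{A,B} R and Q2 = pi_{A,B} sigma_{A=B} R, as in the example above.
q1 = evaluate(instance, [], [], [("R", "A"), ("R", "B")])
q2 = evaluate(instance, [(("R", "A"), ("R", "B"))], [], [("R", "A"), ("R", "B")])
```

On this toy instance Q2 returns a subset of Q1, which is the kind of pair the rules in the following slides are built from.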
6
Preliminaries (cont)
Definition 2: Containment
◦ For two conjunctive queries Q1 and Q2 over R, we write Q1 ⊆ Q2 if for every possible instance I of R, Q1(I) ⊆ Q2(I).
Definition 3: Diagonal containment
◦ Q1 is diagonally contained in Q2 if Q1 is contained in a projection of Q2 (Q1 ⊆ π_X Q2). We write Q1 ⊆Δ Q2.
7
Preliminaries (cont)
Definition 4: Association Rule
◦ An association rule is of the form Q1 ⇒ Q2, such that Q1 and Q2 are both simple conjunctive queries and Q2 ⊆Δ Q1.
8
Preliminaries (cont)
Definition 5: Support
◦ The support of a conjunctive query Q in an instance I is the number of distinct tuples in the answer of Q on I.
◦ A query is said to be frequent in I if its support exceeds a given minimal support threshold.
◦ The support of an association rule Q1 ⇒ Q2 in I is the support of Q2 in I. An association rule is called frequent in I if Q2 is frequent in I.
9
Preliminaries (cont)
Definition 6: Confidence
◦ An association rule Q1 ⇒ Q2 is said to be confident if the support of Q2 divided by the support of Q1 exceeds a given minimal confidence threshold.
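The support and confidence definitions can be sketched as follows, modelling a query answer as a collection of tuples. The helper names, the toy answers (which mirror the 1000 and 900 actors from the motivation), and the threshold value are illustrative assumptions.

```python
def support(answer):
    """Support of a query: the number of distinct tuples in its answer."""
    return len(set(answer))

def confidence(answer_q1, answer_q2):
    """Confidence of Q1 => Q2: support(Q2) divided by support(Q1)."""
    return support(answer_q2) / support(answer_q1)

# Hypothetical answers: actor ids 0..999 play in a drama,
# and ids 0..899 also play in a comedy.
drama_actors = [(actor_id,) for actor_id in range(1000)]
drama_and_comedy_actors = [(actor_id,) for actor_id in range(900)]

min_confidence = 0.8  # assumed threshold, not from the slides
rule_confidence = confidence(drama_actors, drama_and_comedy_actors)
is_confident = rule_confidence > min_confidence  # "exceeds" the threshold
```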
10
Conqueror: Algorithm
The algorithm is divided into two phases.
◦ In the first phase, all frequent simple conjunctive queries are generated.
◦ In the second phase, all confident association rules over these frequent queries are generated.
11
Algorithm (cont)
Property 1:
◦ Let Q1 and Q2 be two simple conjunctive queries. If Q2 ⊆Δ Q1, then support(Q1) ≥ support(Q2).
12
Algorithm (cont)
Selection loop:
◦ Generate all instantiations of F, without constants, in a breadth-first manner.
Projection loop:
◦ For each generated selection, generate all instantiations of X in a breadth-first manner, and test their frequency.
Constants loop:
◦ For each generated query in the projection loop, add constant assignments to F in a breadth-first manner.
13
Algorithm (cont)
Selection loop
◦ We use so-called restricted growth strings to generate all partitions of the attributes.
◦ A restricted growth string is an array a[1...m], where m is the total number of attributes occurring in the database.
◦ With a[1] = 1, a restricted growth string satisfies the following growth inequality for i = 1, 2, ..., m − 1: a[i+1] ≤ 1 + max{a[1], a[2], ..., a[i]}.
14
Algorithm (cont)
a[1] = 1
i = 1: a[2] ≤ 1 + max{a[1]} = 2
i = 2: a[3] ≤ 1 + max{a[1], a[2]} ≤ 3
Example 4.
◦ Let A1, A2, A3, A4 be the set of all attributes occurring in the database. Then the restricted growth string 1221 represents the conjunction of equalities A1 = A4, A2 = A3.
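As an illustration of the growth inequality, the sketch below enumerates all restricted growth strings of a given length and decodes one into attribute equalities. The function names are hypothetical, and the enumeration is a simple depth-first recursion, not the breadth-first order the algorithm itself uses.

```python
def restricted_growth_strings(m):
    """Yield every restricted growth string of length m as a tuple."""
    def extend(prefix, current_max):
        if len(prefix) == m:
            yield tuple(prefix)
            return
        # The next entry may be at most 1 + max of the entries so far.
        for value in range(1, current_max + 2):
            yield from extend(prefix + [value], max(current_max, value))

    if m > 0:
        yield from extend([1], 1)  # a[1] = 1 by definition

def to_equalities(rgs, attributes):
    """Decode a restricted growth string into attribute equalities.

    Attributes with the same label form one block; each block is
    expressed as equalities between its first attribute and the rest.
    """
    blocks = {}
    for attribute, label in zip(attributes, rgs):
        blocks.setdefault(label, []).append(attribute)
    equalities = []
    for block in blocks.values():
        equalities.extend((block[0], other) for other in block[1:])
    return equalities

all_strings = list(restricted_growth_strings(4))
example = to_equalities((1, 2, 2, 1), ["A1", "A2", "A3", "A4"])
```

For four attributes this yields 15 strings, one per set partition, and decoding 1221 gives the two equalities from Example 4.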
15
Algorithm (cont)
16
Algorithm (cont)
Before generating possible projections for a given selection, we first determine whether the selection represents a Cartesian product.
17
Algorithm (cont)
What is a Cartesian product?
◦ To determine whether a selection represents a Cartesian product, we interpret each simple conjunctive query as an undirected graph, such that each relation or constant is a node, and each equality in the selection of the query is an edge between the nodes occurring in that equality.
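The graph interpretation can be sketched as a connectivity test. The code below assumes that a selection counts as a Cartesian product exactly when the relations do not all lie in a single connected component; the slide does not spell out this criterion, so treat it, the function name, and the input layout as assumptions.

```python
from collections import defaultdict, deque

def is_cartesian_product(relations, edges):
    """Return True if the relations are not all in one connected component.

    relations: list of relation names (the nodes that must be connected)
    edges:     list of (node, node) pairs, one per equality in the selection;
               a node is a relation name or a constant
    """
    if not relations:
        return False
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Breadth-first search from the first relation.
    seen = {relations[0]}
    queue = deque([relations[0]])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return not all(relation in seen for relation in relations)

# M and GM joined on MID: connected, so not a Cartesian product.
joined = is_cartesian_product(["M", "GM"], [("M", "GM")])
# Adding A without any equality leaves A isolated.
unjoined = is_cartesian_product(["M", "GM", "A"], [("M", "GM")])
```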
18
Algorithm (cont)
Projection loop
◦ For every generated projection, we first check whether all more general queries are known to be frequent, and if so, the resulting query is evaluated against the database.
19
Algorithm (cont)
Constants loop
◦ Every block of attribute equalities of the selection can also be set equal to a constant.
20
Algorithm (cont)
Candidate evaluation
◦ Candidates are evaluated against the database by translating each query to SQL.
◦ The result of such a query is then stored in a temporary table (τ).
    SELECT A, COUNT(*) AS sup FROM τ GROUP BY A
◦ The result of these queries is stored in a new temporary table (τ_A) holding the constant values together with their support.
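The GROUP BY step can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table names `tau` and `tau_A`, the toy rows, and the final filter on a minimal support threshold are assumptions, and the presentation does not say which database system was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# tau stands in for the temporary table holding the answer of a query.
conn.execute("CREATE TEMP TABLE tau (A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO tau VALUES (?, ?)",
    [("drama", 1), ("drama", 2), ("comedy", 3), ("drama", 4)],
)

# tau_A holds each constant value of A together with its support.
conn.execute(
    "CREATE TEMP TABLE tau_A AS "
    "SELECT A, COUNT(*) AS sup FROM tau GROUP BY A"
)

min_support = 2  # assumed threshold
frequent_constants = conn.execute(
    "SELECT A, sup FROM tau_A WHERE sup > ? ORDER BY A", (min_support,)
).fetchall()
all_constants = conn.execute("SELECT A, sup FROM tau_A ORDER BY A").fetchall()
```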
21
Algorithm (cont)
Let τ_A and τ_B be the temporary tables holding the constant values for the attributes A and B together with their support. We can now generate the table τ_{A,B}.
Likewise, the query for getting the values for τ_{A,B,C} is generated using the temporary tables τ, τ_{A,B}, τ_{A,C}, τ_{B,C}.
22
Algorithm (cont)
Association rule generation
◦ For every query Q1, the algorithm finds all queries Q2 such that Q2 ⊆Δ Q1. It then computes the confidence of the rule Q1 ⇒ Q2 and tests whether the rule is confident.
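The rule generation phase can be sketched as a double loop over the frequent queries. Here the diagonal containment relation is passed in as a precomputed set of pairs, which is a simplification; the query names, supports, and threshold are made-up toy values.

```python
def generate_rules(supports, diagonally_contained, min_confidence):
    """Return all confident rules (Q1, Q2, confidence).

    supports:             dict mapping each frequent query to its support
    diagonally_contained: set of (Q2, Q1) pairs meaning Q2 is diagonally
                          contained in Q1 (assumed to be given)
    """
    rules = []
    for q1, support_q1 in supports.items():
        for q2, support_q2 in supports.items():
            if q1 != q2 and (q2, q1) in diagonally_contained:
                rule_confidence = support_q2 / support_q1
                if rule_confidence > min_confidence:
                    rules.append((q1, q2, rule_confidence))
    return rules

# Hypothetical frequent queries with their supports.
supports = {"Qa": 1000, "Qb": 900, "Qc": 300}
diagonally_contained = {("Qb", "Qa"), ("Qc", "Qa"), ("Qc", "Qb")}

rules = generate_rules(supports, diagonally_contained, min_confidence=0.5)
```

With these toy numbers only Qa ⇒ Qb passes the threshold; the rules ending in Qc have confidence 0.3 and roughly 0.33.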
23
Algorithm (cont)
Eliminating redundancies
◦ Consider the following association rules, each based on a vertical containment:
    π_{R.A,R.B,S.E} σ_{R.C=S.F}(R × S) ⇒ π_{R.A,S.E} σ_{R.C=S.F}(R × S)
    π_{R.A,S.E} σ_{R.C=S.F}(R × S) ⇒ π_{R.A} σ_{R.C=S.F}(R × S)
    π_{R.A,R.B,S.E} σ_{R.C=S.F}(R × S) ⇒ π_{R.A} σ_{R.C=S.F}(R × S)
◦ Now suppose the first association rule has a confidence of 100%. Then the confidences of the second and third association rules must be equal.
24
Algorithm (cont)
Lemma. An association rule Q1 ⇒ Q2 is redundant if
◦ 1. There exists an association rule Q3 ⇒ Q1 with confidence 100%.
◦ 2. There exists an association rule Q4 ⇒ Q2 with confidence 100%, and Q4 ⊆Δ Q1.
25
Experiments.
The IMDB snapshot consists of three tables, ACTORS (A), MOVIES (M) and GENRES (G), and two tables that represent the connections between them, namely ACTORMOVIES (AM) and GENREMOVIES (GM).
We can conclude that every movie has a genre because of the following association rule with 100% confidence:
    π_{M.MID}(M) ⇒ π_{M.MID} σ_{GM.MID=M.MID}(M × GM)
26
Experiments (cont)
In our database, not every movie has to have an actor associated with it, as the following rule only has 75.91% confidence:
    π_{M.MID}(M) ⇒ π_{M.MID} σ_{AM.MID=M.MID}(M × AM)
We can find ‘frequent’ genres in which actors play. The rule for ‘Documentary’ (genre id 3) has 40.44% confidence, so 40.44% of the actors play in a ‘Documentary’, while the same rule for ‘Drama’ has 49.85% confidence.
27
Experiments (cont)
81.60% of the actors in genre ‘Music’ (genre id 16) play in only one movie, whereas the same rule for genre ‘Crime’ has only 49.87% confidence.
28
Conclusion.