Published by Christiana Holt. Modified over 9 years ago.
1
August 7, 1998, Data Engineering Lab, 성 유진
Exploratory Mining and Pruning Optimizations of Constrained Association Rules
2
Abstract
From the standpoint of supporting human-centered knowledge discovery, association mining has three problems:
– lack of user exploration and control
– lack of focus
– rigid notion of relationship
Constrained association queries (CAQs)
– pruning using anti-monotonicity and succinctness
3
Introduction
Problem 1 (Lack of User Exploration and Control)
– the mining process is a black box (the user cannot preempt it and may need to wait for hours)
– establish clear breakpoints to allow user feedback
Problem 2 (Lack of Focus)
– the user should be able to say where to focus the mining, e.g. find associations between sets of items whose types do not overlap
4
Problem 2 (Lack of Focus), continued
– e.g. associations from item sets whose total price is at least $1,000
– provide a rich interface for the user to express focus: constrained association queries (CAQ)
Problem 3 (Rigid Notion of Relationship)
– significance metrics
– separate criteria for selecting candidates for the antecedent and the consequent, e.g. associations from items to sets of types: pepsi => snacks
6
Architecture
Phase 1
– the user initially specifies a CAQ, which includes a set of constraints C
– C applies to the antecedent and the consequent
– output: pairs of candidates (S_a, S_c), where S_a and S_c each have support above the thresholds
– the user can add, delete, or modify the constraints as many times as desired
7
Phase 2
The user specifies:
– a significance metric
– a threshold for the metric
– whatever further conditions are to be imposed on the antecedent and consequent
Classical association mining corresponds to:
– confidence as the significance metric
– a confidence threshold
– requiring (S_a ∪ S_c) to be frequent
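The classical instance of Phase 2 can be illustrated with a minimal Python sketch. The transactions, the function names (`support`, `confidence`, `phase2`), and the thresholds are all hypothetical and chosen only for illustration; the slides do not prescribe an implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(S_a ∪ S_c) / support(S_a); defined as 0.0 if S_a never occurs."""
    ante = support(antecedent, transactions)
    both = support(set(antecedent) | set(consequent), transactions)
    return both / ante if ante else 0.0


def phase2(pairs, transactions, min_conf, min_sup):
    """Keep candidate pairs (S_a, S_c) whose union is frequent and whose
    confidence reaches the threshold (the classical choice of metric)."""
    return [
        (a, c)
        for a, c in pairs
        if support(set(a) | set(c), transactions) >= min_sup
        and confidence(a, c, transactions) >= min_conf
    ]


# Hypothetical toy data, not taken from the slides.
transactions = [
    frozenset({"pepsi", "chips"}),
    frozenset({"pepsi", "chips", "beer"}),
    frozenset({"pepsi"}),
    frozenset({"beer"}),
]
candidates = [({"pepsi"}, {"chips"}), ({"beer"}, {"chips"})]
rules = phase2(candidates, transactions, min_conf=0.6, min_sup=0.5)
```

Any other significance metric could be swapped in for `confidence` without changing the shape of `phase2`.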
9
Constrained Association Queries
CAQ
– S ⊆ Item: S is a set variable on the Item domain
– {(S_1, S_2) | C}, where C is a set of constraints on S_1, S_2
– frequency constraints: freq(S_i)
– relations: trans(TID, Itemset), iteminfo(Item, Type, Price)
– S.Price ≤ 100: all items in S have a price less than or equal to $100
– {snacks, sodas} ⊆ S.Type
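To make the notation concrete, here is a small Python sketch of the two relations and of the constraints S.Price ≤ v, {snacks, sodas} ⊆ S.Type, and freq(S). The data and all helper names (`s_type`, `s_price`, `price_at_most`, `covers_types`, `freq`) are assumptions made for illustration, and the frequency threshold is given as an absolute count.

```python
# Hypothetical iteminfo(Item, Type, Price) relation, keyed by item.
iteminfo = {
    "chips": ("snacks", 2),
    "cola": ("sodas", 1),
    "wine": ("wines", 120),
}

# Hypothetical trans(TID, Itemset) relation, keyed by transaction id.
trans = {
    1: {"chips", "cola"},
    2: {"cola", "wine"},
    3: {"chips", "cola", "wine"},
}


def s_type(s):
    """S.Type: the set of types of the items in S."""
    return {iteminfo[i][0] for i in s}


def s_price(s):
    """S.Price: the set of prices of the items in S."""
    return {iteminfo[i][1] for i in s}


def price_at_most(s, v):
    """S.Price <= v: every item in S costs at most v."""
    return all(p <= v for p in s_price(s))


def covers_types(s, required):
    """required ⊆ S.Type: S contains at least one item of each required type."""
    return set(required) <= s_type(s)


def freq(s, min_count):
    """freq(S): S occurs in at least `min_count` transactions."""
    return sum(set(s) <= items for items in trans.values()) >= min_count
```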
10
CAQ Examples
– {(S_1, S_2) | S_1 ⊆ Item & S_2 ⊆ Item & count(S_1) = 1 & count(S_2) = 1 & freq(S_1) & freq(S_2)}
– S_1.Type ∩ S_2.Type = ∅ and max(S_1.Price) ≤ avg(S_2.Price)
– {(S_1, S_2) | agg_1(S_1.Price) ≤ 100 & agg_2(S_2.Price) ≥ 1000}
– {(S_1, S_2) | S_1.Type ⊆ {Snacks} & S_2.Type ⊆ {beers} & max(S_1.Price) ≤ min(S_2.Price)}
Sound/Complete
– an algorithm is sound if it only finds frequent sets that satisfy the given constraints
– an algorithm is complete if all frequent sets satisfying the given constraints are found
11
Goal
– push the constraints as deeply as possible inside the computation of frequent sets
– running the classical algorithm and then testing its output for constraint satisfaction is too inefficient
– stay sound and complete while pruning: exploit anti-monotonicity and succinctness
12
Anti-Monotone Constraints
Find constraints that are anti-monotone
– they prune away a significant number of candidates
Definition
– a 1-var constraint C is anti-monotone iff for all sets S, S′: S ⊇ S′ & S satisfies C ⇒ S′ satisfies C
Identify which constraints are anti-monotone
– see Fig. 3
– min(S) ≥ v is anti-monotone; min(S) ≤ v is not
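The min(S) example can be checked by brute force on a small item universe. The sketch below is only an illustration: the prices, the threshold `v`, and the helper names are hypothetical, and the check is restricted to non-empty sets because min is undefined on the empty set. A `True` result on one universe does not prove anti-monotonicity in general, but a `False` result is a genuine counterexample.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = list(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_anti_monotone(constraint, universe):
    """Whenever S satisfies the constraint, every non-empty S' ⊆ S must too."""
    for s in nonempty_subsets(universe):
        if constraint(s):
            for sub in nonempty_subsets(s):
                if not constraint(sub):
                    return False  # S satisfies C but its subset does not
    return True


# Hypothetical prices and threshold.
prices = {"chips": 2, "soda": 1, "wine": 12, "cheese": 7}
v = 5


def min_at_least_v(s):
    return min(prices[i] for i in s) >= v


def min_at_most_v(s):
    return min(prices[i] for i in s) <= v


holds = is_anti_monotone(min_at_least_v, prices)
fails = is_anti_monotone(min_at_most_v, prices)
```

Here {chips, wine} satisfies min(S) ≤ 5, but its subset {wine} does not, which is why the second check fails.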
14
Succinct Constraints
Prune once and for all (before any iteration takes place)
– not the generate-and-test paradigm
– succinctness is exploited through member generating functions (MGFs)
– definition: SAT_C(Item) is the set of item sets satisfying C, called the pruned space
– C_1 ≡ S.Price ≥ 100: the pruned space for C_1 contains only item sets in which each item has a price of at least $100
– selection predicate p
16
Example
C_1 ≡ S.Price ≥ 100
– let Item_1 = σ_{Price ≥ 100}(Item)
– C_1 is succinct because its pruned space SAT_{C_1}(Item) is simply 2^{Item_1}
C_2 ≡ {snacks, sodas} ⊆ S.Type
– let Item_2, Item_3, Item_4 be the sets σ_{Type = 'snacks'}(Item), σ_{Type = 'sodas'}(Item), σ_{Type ≠ 'snacks' ∧ Type ≠ 'sodas'}(Item)
– C_2 is succinct because SAT_{C_2}(Item) can be expressed as 2^{Item} − 2^{Item_2} − 2^{Item_3} − 2^{Item_4} − 2^{Item_2 ∪ Item_4} − 2^{Item_3 ∪ Item_4}
17
Example
– C_1 ≡ S.Price ≥ 100: MGF = {X | X ⊆ Item_1 & X ≠ ∅}
– C_2 ≡ {snacks, sodas} ⊆ S.Type: MGF = {X_1 ∪ X_2 ∪ X_3 | X_1 ⊆ Item_2 & X_1 ≠ ∅ & X_2 ⊆ Item_3 & X_2 ≠ ∅ & X_3 ⊆ Item_4}
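A member generating function produces exactly the satisfying sets without testing any candidate. The Python sketch below enumerates the MGF for the {snacks, sodas} ⊆ S.Type constraint on a hypothetical four-item catalogue and compares the result with a generate-and-test enumeration. The item names and the helpers `subsets` and `mgf_c2` are assumptions made for illustration.

```python
from itertools import combinations


def subsets(items, allow_empty):
    """Yield subsets of `items` as frozensets, optionally including ∅."""
    items = sorted(items)
    start = 0 if allow_empty else 1
    for r in range(start, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


# Hypothetical catalogue: item -> type.
item_type = {"chips": "snacks", "nuts": "snacks", "cola": "sodas", "beer": "beers"}

item2 = {i for i, t in item_type.items() if t == "snacks"}  # snack items
item3 = {i for i, t in item_type.items() if t == "sodas"}   # soda items
item4 = set(item_type) - item2 - item3                      # everything else


def mgf_c2():
    """X1 ∪ X2 ∪ X3 with X1 ⊆ Item_2, X1 ≠ ∅, X2 ⊆ Item_3, X2 ≠ ∅, X3 ⊆ Item_4."""
    for x1 in subsets(item2, allow_empty=False):
        for x2 in subsets(item3, allow_empty=False):
            for x3 in subsets(item4, allow_empty=True):
                yield x1 | x2 | x3


generated = set(mgf_c2())

# Generate-and-test over the full powerset, for comparison only.
brute_force = {
    s
    for s in subsets(item_type, allow_empty=True)
    if {"snacks", "sodas"} <= {item_type[i] for i in s}
}
```

On this catalogue both routes yield the same six item sets, but the MGF never materializes a set that violates the constraint.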
18
Algorithms
Algorithm Apriori+
– computes the frequent sets; those frequent sets that satisfy the constraints form the answer set
Algorithm Hybrid(m)
– when (C − C_freq) is more selective than C_freq, Apriori+ is inefficient
– first check C_freq for m iterations
– then, to reduce the remaining I/O cost, switch to checking (C − C_freq)
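Apriori+ as described above amounts to a classical level-wise frequent-set computation followed by a constraint filter. The following Python sketch is a simplified illustration, not the authors' implementation: support is an absolute count, the data is hypothetical, and the names `apriori` and `apriori_plus` are chosen here.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise frequent-set mining; returns {itemset: support count}."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        k += 1
        # Join frequent (k-1)-sets into k-sets, then keep only candidates
        # whose (k-1)-subsets are all frequent.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = {
            c
            for c in joined
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        }
    return frequent


def apriori_plus(transactions, min_count, constraint):
    """Mine all frequent sets first, then test each one against the constraint."""
    return {
        s: n for s, n in apriori(transactions, min_count).items() if constraint(s)
    }


# Hypothetical toy data.
prices = {"chips": 2, "cola": 1, "beer": 5, "wine": 12}
transactions = [
    frozenset({"chips", "cola", "beer"}),
    frozenset({"chips", "cola"}),
    frozenset({"chips", "beer"}),
    frozenset({"cola", "beer"}),
    frozenset({"chips", "cola", "wine"}),
]


def cheap(s):
    """S.Price <= 3: every item in S costs at most 3."""
    return all(prices[i] <= 3 for i in s)


all_frequent = apriori(transactions, min_count=2)
answer = apriori_plus(transactions, min_count=2, constraint=cheap)
```

The constraint is only consulted after all counting is finished, which is the inefficiency the later strategies try to avoid.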
20
CAP Algorithm: 4 Cases
Succinct and anti-monotone
– replace C_1 in the Apriori algorithm by C_1^c
Succinct but not anti-monotone
21
Anti-monotone but non-succinct
– define C_k as in the Apriori algorithm, but drop a candidate S if S fails C
– constraint satisfaction is tested before counting is done
Neither
– induce any weaker constraint C′ from C; depending on whether C′ is anti-monotone and/or succinct, use the strategies above
– once all frequent sets are generated, test them for satisfaction of C
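The anti-monotone but non-succinct case can be sketched in Python as a level-wise loop that tests the constraint before counting. This is a simplified illustration under stated assumptions, not the authors' code: the data, the function name `cap_anti_monotone`, and the example constraint sum(S.Price) ≤ 4 (anti-monotone when prices are non-negative) are all chosen here.

```python
from itertools import combinations


def cap_anti_monotone(transactions, min_count, constraint):
    """Level-wise mining that drops candidates failing an anti-monotone
    constraint before their support is counted.

    Returns the answer sets with their counts, plus the number of
    candidates whose support was actually counted."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}
    answers, counted, k = {}, 0, 1
    while level:
        # Constraint test first: failing candidates are never counted.
        level = {c for c in level if constraint(c)}
        counted += len(level)
        survivors = {}
        for c in level:
            n = sum(c <= t for t in transactions)
            if n >= min_count:
                survivors[c] = n
        answers.update(survivors)
        k += 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = {
            c
            for c in joined
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        }
    return answers, counted


# Hypothetical toy data.
prices = {"chips": 2, "cola": 1, "beer": 5, "wine": 12}
transactions = [
    frozenset({"chips", "cola", "beer"}),
    frozenset({"chips", "cola"}),
    frozenset({"chips", "beer"}),
    frozenset({"cola", "beer"}),
    frozenset({"chips", "cola", "wine"}),
]


def total_at_most_4(s):
    """sum(S.Price) <= 4."""
    return sum(prices[i] for i in s) <= 4


answers, counted = cap_anti_monotone(transactions, 2, total_at_most_4)
```

On this data only three candidates are counted, whereas an unconstrained level-wise pass would count eight (four singletons, three pairs, one triple). Because the constraint is anti-monotone, discarding a failing candidate cannot lose any satisfying superset, so the result stays sound and complete.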