August 7, 1998, Data Engineering Lab, 성 유진: Exploratory Mining and Pruning Optimization of Constrained Associations Rules


1 Exploratory Mining and Pruning Optimization of Constrained Associations Rules

2 Abstract
Standpoint: supporting human-centered discovery of knowledge. Problems addressed:
–lack of user exploration and control
–lack of focus
–rigid notion of relationship
Constrained association queries
–pruning using anti-monotonicity and succinctness

3 Introduction
Problem 1 (Lack of User Exploration and Control)
–the mining process is a black box
–the user cannot preempt it and has to wait for hours
–remedy: establish clear breakpoints to allow user feedback
Problem 2 (Lack of Focus)
–the user needs to say on which part of the data to focus the mining
 e.g. find associations between sets of items whose types do not overlap

4 Problem 2 (continued)
 e.g. find associations from item sets whose total price is at least $1,000
–remedy: provide a rich interface for the user to express focus (CAQ)
Problem 3 (Rigid Notion of Relationship)
–significance metrics:
–separate criteria for selecting candidates for the antecedent and the consequent
 e.g. associations from items to sets of types: pepsi => snacks

6 Architecture
Phase 1
–the user initially specifies a CAQ, which includes a set of constraints C
–C is applicable to the antecedent and the consequent
–output: pairs of candidates (S_a, S_c), where S_a and S_c each have support above the threshold
–the user can add, delete, or modify the constraints as many times as desired

7 Phase 2
The user specifies
–a significance metric
–a threshold for the metric
–whatever further conditions are to be imposed on the antecedent and the consequent
 e.g. classical association mining
 - confidence (as the significance metric)
 - confidence threshold
 - require (S_a ∪ S_c) to be frequent
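The classical instance of Phase 2 can be sketched in a few lines. This is only an illustration under assumptions: the helper names `support` and `confidence` and the toy transactions are hypothetical and do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(S_a ∪ S_c) / support(S_a); 0.0 if S_a never occurs."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / s_a


# Toy data (assumption, for illustration only).
transactions = [
    frozenset({"pepsi", "chips"}),
    frozenset({"pepsi", "chips", "beer"}),
    frozenset({"pepsi"}),
    frozenset({"chips"}),
]
s_a, s_c = frozenset({"pepsi"}), frozenset({"chips"})
conf = confidence(s_a, s_c, transactions)

# Phase 2 check: confidence threshold plus frequency of S_a ∪ S_c.
MIN_CONF, MIN_SUPPORT = 0.6, 0.5
is_rule = conf >= MIN_CONF and support(s_a | s_c, transactions) >= MIN_SUPPORT
```

Any other significance metric could be swapped in for `confidence` without touching Phase 1, which is the point of separating the two phases.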

9 Constrained Association Queries
CAQ
–S ⊆ Item: S is a set variable on the Item domain
–{(S_1, S_2) | C}, where C is a set of constraints on S_1, S_2
–frequency constraints: freq(S_i)
–relations: trans(TID, Itemset), iteminfo(Item, Type, Price)
–S.Price ≤ 100: all items in S have a price less than or equal to $100
–{snacks, sodas} ⊆ S.Type: S contains at least one item of type snacks and one of type sodas
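To make the two constraint forms above concrete, here is a small sketch that evaluates them against an in-memory stand-in for iteminfo(Item, Type, Price). The dictionary contents and the function names `price_at_most` and `covers_types` are assumptions made for this example.

```python
# Hypothetical iteminfo relation: item -> (type, price).
ITEMINFO = {
    "chips": ("snacks", 3),
    "cola": ("sodas", 2),
    "wine": ("alcohol", 150),
}


def prices(itemset):
    """S.Price: the prices of the items in S."""
    return [ITEMINFO[item][1] for item in itemset]


def types(itemset):
    """S.Type: the set of types of the items in S."""
    return {ITEMINFO[item][0] for item in itemset}


def price_at_most(itemset, v):
    """S.Price <= v: every item in S costs at most v."""
    return all(p <= v for p in prices(itemset))


def covers_types(itemset, required):
    """required ⊆ S.Type: S has at least one item of each required type."""
    return set(required) <= types(itemset)
```

For instance, `{"chips", "cola"}` satisfies both S.Price ≤ 100 and {snacks, sodas} ⊆ S.Type, while adding `"wine"` breaks the price constraint.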

10 CAQ Examples
–{(S_1, S_2) | S_1 ⊆ Item & S_2 ⊆ Item & count(S_1) = 1 & count(S_2) = 1 & freq(S_1) & freq(S_2)}
–S_1.Type ∩ S_2.Type = ∅ and max(S_1.Price) ≤ avg(S_2.Price)
–{(S_1, S_2) | agg_1(S_1.Price) ≤ 100 & agg_2(S_2.Price) ≥ 1000}
–{(S_1, S_2) | S_1.Type ⊆ {snacks} & S_2.Type ⊆ {beers} & max(S_1.Price) ≤ min(S_2.Price)}
Sound/Complete
–an algorithm is sound if it only finds frequent sets that satisfy the given constraints
–an algorithm is complete if all frequent sets satisfying the given constraints are found

11 Goal
–push the constraints as deeply as possible inside the computation of frequent sets
–running the classical algorithm and then testing its output for constraint satisfaction is too inefficient
–two properties allow pruning while staying sound and complete: anti-monotonicity and succinctness

12 Anti-Monotone Constraints
Find constraints that are anti-monotone
–they prune away a significant number of candidates
Definition
–A 1-var constraint C is anti-monotone iff for all sets S, S': S ⊇ S' & S satisfies C ⟹ S' satisfies C
Identify which constraints are anti-monotone
–Fig. 3
–min(S) ≥ v (anti-monotone), min(S) ≤ v (not anti-monotone)
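The definition can be checked by brute force on a small domain. The sketch below is an illustration only: `is_anti_monotone` is a hypothetical helper, and passing the check on one small domain is evidence, not a proof, that a constraint is anti-monotone in general.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = list(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_anti_monotone(constraint, domain):
    """True if, on this domain, every non-empty subset of a satisfying set also satisfies."""
    for s in nonempty_subsets(domain):
        if not constraint(s):
            continue
        for sub in nonempty_subsets(s):
            if not constraint(sub):
                return False
    return True


PRICES = [1, 5, 10, 20]  # toy domain of item prices (assumption)
v = 5


def min_ge(s):
    return min(s) >= v  # min(S) >= v


def min_le(s):
    return min(s) <= v  # min(S) <= v
```

On this domain `min_ge` passes, because removing elements can only raise the minimum. `min_le` fails: {1, 10} satisfies min(S) ≤ 5 but its subset {10} does not.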

14 Succinct Constraints
Pruning is done once and for all (before any iteration takes place)
–not a generate-and-test paradigm
–succinctness is exploited through member generating functions (MGFs)
–definition: SAT_C(Item) is the set of item sets satisfying C, i.e. the pruned space
–C_1 ≡ S.Price ≥ 100: the pruned space for C_1 contains only item sets such that each item in the set has a price of at least $100
–a succinct set is one selected from Item by a selection predicate p: σ_p(Item)

16 Example
–C_1 ≡ S.Price ≥ 100: let Item_1 = σ_{Price ≥ 100}(Item)
 C_1 is succinct because its pruned space SAT_{C_1}(Item) is simply 2^{Item_1}
–C_2 ≡ {snacks, sodas} ⊆ S.Type: let Item_2, Item_3, Item_4 be the sets σ_{Type = 'snacks'}(Item), σ_{Type = 'sodas'}(Item), σ_{Type ≠ 'snacks' ∧ Type ≠ 'sodas'}(Item)
 C_2 is succinct because SAT_{C_2}(Item) can be expressed as 2^{Item} - 2^{Item_2} - 2^{Item_3} - 2^{Item_4} - 2^{Item_2 ∪ Item_4} - 2^{Item_3 ∪ Item_4}

17 Example (member generating functions)
–C_1 ≡ S.Price ≥ 100: MGF = {X | X ⊆ Item_1 & X ≠ ∅}
–C_2 ≡ {snacks, sodas} ⊆ S.Type: MGF = {X_1 ∪ X_2 ∪ X_3 | X_1 ⊆ Item_2 & X_1 ≠ ∅ & X_2 ⊆ Item_3 & X_2 ≠ ∅ & X_3 ⊆ Item_4}
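The member generating function for C_2 can be compared against a generate-and-test enumeration on a tiny catalogue. The item names below are assumptions for this example; `mgf_c2` is a hypothetical helper that simply spells out the set-builder expression above.

```python
from itertools import combinations, product


def subsets(items, allow_empty):
    """All subsets of `items` as frozensets, optionally including the empty set."""
    items = sorted(items)
    start = 0 if allow_empty else 1
    return [
        frozenset(combo)
        for k in range(start, len(items) + 1)
        for combo in combinations(items, k)
    ]


def mgf_c2(item2, item3, item4):
    """{X1 ∪ X2 ∪ X3 | X1 ⊆ Item_2, X1 ≠ ∅, X2 ⊆ Item_3, X2 ≠ ∅, X3 ⊆ Item_4}."""
    return {
        x1 | x2 | x3
        for x1, x2, x3 in product(
            subsets(item2, False), subsets(item3, False), subsets(item4, True)
        )
    }


# Toy catalogue (assumption): snacks, sodas, and everything else.
ITEM_2 = {"chips", "pretzels"}
ITEM_3 = {"cola"}
ITEM_4 = {"bread"}

generated = mgf_c2(ITEM_2, ITEM_3, ITEM_4)

# Generate-and-test baseline: enumerate every non-empty item set, then filter.
all_items = ITEM_2 | ITEM_3 | ITEM_4
brute = {s for s in subsets(all_items, False) if s & ITEM_2 and s & ITEM_3}
```

Both routes yield the same item sets, but the MGF never constructs a set that violates C_2, which is what "once and for all" pruning means here.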

18 Algorithms
Algorithm Apriori+
–computes the frequent sets first; among them, those that satisfy the constraints form the answer set
Algorithm Hybrid(m)
–when (C - C_freq) is more selective than C_freq, Apriori+ is inefficient
–first checks C_freq for m iterations
–then, to reduce the remaining I/O cost, switches to checking (C - C_freq)
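A minimal sketch of the Apriori+ idea, assuming a textbook level-wise Apriori, in-memory transactions, and an absolute support count. The function names, the price table, and the data are hypothetical; the slides do not give an implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count transactions."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset({i}) for i in items]
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join frequent sets, keep candidates whose (k-1)-subsets are all frequent.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


def apriori_plus(transactions, min_count, constraint):
    """Compute all frequent sets first, then keep those that satisfy the constraint."""
    frequent = apriori(transactions, min_count)
    return {s: n for s, n in frequent.items() if constraint(s)}


# Toy data (assumption).
PRICE = {"a": 50, "b": 120, "c": 200}
transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("abc"),
]
answers = apriori_plus(
    transactions, 3, lambda s: all(PRICE[i] >= 100 for i in s)
)
```

Here six item sets are frequent but only three survive the price filter, so half of the counting work was spent on sets that could never be answers. That waste is what the later algorithms aim to avoid.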

20 CAP Algorithm
4 cases
 Succinct and anti-monotone
–replace C_1 in the Apriori algorithm by C_1^c
 Succinct but not anti-monotone

21 CAP Algorithm (continued)
 Anti-monotone but not succinct
–define C_k (the candidate sets of size k) as in the Apriori algorithm, but drop a candidate S if S fails C
–constraint satisfaction is tested before counting is done
 Neither
–induce any weaker constraint C' from C; depending on whether C' is anti-monotone and/or succinct, use the above strategies
–once all frequent sets are generated, test them for satisfaction of C
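The anti-monotone but not succinct case can be sketched as a level-wise loop that tests the constraint before counting. This is an illustrative sketch, not the authors' code: `cap_anti_monotone`, the price table, and the example constraint sum(S.Price) ≤ 250 (anti-monotone when prices are non-negative) are assumptions.

```python
from itertools import combinations


def cap_anti_monotone(transactions, min_count, constraint):
    """Level-wise mining that drops candidates failing an anti-monotone constraint.

    Returns (answers, counted): the frequent sets satisfying the constraint,
    and how many candidates had their support counted.
    """
    items = sorted({i for t in transactions for i in t})
    level = [frozenset({i}) for i in items]
    answers, counted, k = {}, 0, 1
    while level:
        # Test the constraint before any support counting is done.
        level = [c for c in level if constraint(c)]
        counted += len(level)
        current = {}
        for c in level:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        answers.update(current)
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return answers, counted


# Toy data (assumption).
PRICE = {"a": 50, "b": 120, "c": 200}
transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("abc"),
]
answers, counted = cap_anti_monotone(
    transactions, 3, lambda s: sum(PRICE[i] for i in s) <= 250
)
```

Dropping a failing candidate is safe only because the constraint is anti-monotone: every superset of a failing set also fails, so no answer is lost. On this toy data, {b, c} is discarded without a support scan and {a, b, c} is never generated.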
