Association Rules & Correlations

Presentation transcript:

1 Association Rules & Correlations
- Basic concepts
- Efficient and scalable frequent itemset mining methods:
  - Apriori, and improvements
  - FP-growth
- Rule postmining: visualization and validation
- Interesting association rules

2 Rule Validations
- Only a small subset of derived rules might be meaningful or useful
  - A domain expert must validate the rules
- Useful tools:
  - Visualization
  - Correlation analysis

3 Visualization of Association Rules: Plane Graph

4 Visualization of Association Rules (SGI/MineSet 3.0)

5 Pattern Evaluation
- Association rule algorithms tend to produce too many rules
  - Many of them are uninteresting or redundant
  - confidence(A → B) = P(B|A) = P(A & B)/P(A)
  - Confidence alone is not a discriminative enough criterion
  - We need to go beyond the original support and confidence
  - Interestingness measures can be used to prune or rank the derived patterns

6 Application of Interestingness Measure Interestingness Measures

7 Computing Interestingness Measure
- Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table

Contingency table for X → Y:

            Y      ¬Y
    X      f11    f10    f1+
    ¬X     f01    f00    f0+
           f+1    f+0    |T|

- f11: support of X and Y
- f10: support of X and ¬Y
- f01: support of ¬X and Y
- f00: support of ¬X and ¬Y
- Used to define various measures: support, confidence, lift, Gini, J-measure, etc.
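The four cells of the contingency table can be tallied directly from transaction data. The following is a minimal Python sketch rather than anything from the lecture itself; the function name `contingency_counts` and the toy baskets are assumptions made for illustration.

```python
def contingency_counts(transactions, x, y):
    """Tally f11, f10, f01, f00 for the rule X -> Y over a list of transactions."""
    x, y = set(x), set(y)
    counts = {"f11": 0, "f10": 0, "f01": 0, "f00": 0}
    for t in transactions:
        items = set(t)
        has_x = x <= items  # every item of X appears in the transaction
        has_y = y <= items  # every item of Y appears in the transaction
        counts["f" + str(int(has_x)) + str(int(has_y))] += 1
    return counts


# Hypothetical toy baskets, not taken from the slides.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

table = contingency_counts(baskets, {"milk"}, {"bread"})
total = sum(table.values())                                # |T|
support = table["f11"] / total                             # P(X, Y)
confidence = table["f11"] / (table["f11"] + table["f10"])  # P(Y | X) = f11 / f1+
print(table, support, confidence)
```

Support and confidence fall out of the table as f11/|T| and f11/f1+; the other measures named on the slide can be derived from the same four counts.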

8 Drawback of Confidence

            Coffee   ¬Coffee   Total
    Tea       15        5        20
    ¬Tea

- Association rule: Tea → Coffee
- Confidence = P(Coffee|Tea) = 0.75, but P(Coffee) = 0.9 … > 0.75
- Although confidence is high, the rule is misleading
- P(Coffee|¬Tea) = … >> 0.75

9 Statistical-Based Measures
- Measures that take into account statistical dependence:

  $$\text{Lift} = \frac{P(Y \mid X)}{P(Y)}$$

  $$\text{Interest} = \frac{P(X, Y)}{P(X)\,P(Y)}$$

  $$PS = P(X, Y) - P(X)\,P(Y)$$

- Does X lift the probability of Y? Lift is the probability of Y given X over the probability of Y. This is the same as the interest factor I.
- I = 1: independence; I > 1: positive association; I < 1: negative association
- Many other measures exist
- PS: Piatetsky-Shapiro
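As a rough illustration of the three measures on this slide, the sketch below computes lift, interest, and PS from the four contingency counts of a rule X → Y. The function name `measures` and the example counts are hypothetical, and the code assumes X and Y each occur at least once so that no division by zero arises.

```python
def measures(f11, f10, f01, f00):
    """Return (lift, interest, ps) for X -> Y from contingency counts.

    Assumes non-zero marginals, i.e. X and Y each occur in some transaction.
    """
    n = f11 + f10 + f01 + f00
    p_x = (f11 + f10) / n
    p_y = (f11 + f01) / n
    p_xy = f11 / n
    lift = (p_xy / p_x) / p_y      # P(Y|X) / P(Y)
    interest = p_xy / (p_x * p_y)  # P(X,Y) / (P(X) P(Y))
    ps = p_xy - p_x * p_y          # Piatetsky-Shapiro: P(X,Y) - P(X) P(Y)
    return lift, interest, ps


# Hypothetical counts chosen only for illustration.
lift, interest, ps = measures(f11=30, f10=20, f01=20, f00=30)
print(lift, interest, ps)
```

Lift and interest are algebraically the same quantity, since P(Y|X) = P(X,Y)/P(X), which is why the two values agree up to floating-point rounding.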

10 Example: Lift/Interest

            Coffee   ¬Coffee   Total
    Tea       15        5        20
    ¬Tea

- Association rule: Tea → Coffee
- Confidence = P(Coffee|Tea) = 0.75, but P(Coffee) = 0.9
- Lift = 0.75/0.9 ≈ 0.8333 (< 1, therefore Tea and Coffee are negatively associated)

11 Drawback of Lift & Interest

First table:

            Y      ¬Y
    X       10      0
    ¬X

Second table:

            Y      ¬Y
    X       90      0
    ¬X

- Statistical independence: if P(X,Y) = P(X)P(Y), then Lift = 1
- Lift favors infrequent items
- Other criteria have been proposed: Gini, J-measure, etc.
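The claim that lift favors infrequent items can be checked numerically. In the sketch below, both tables describe X and Y that always occur together, once as a rare pair and once as a frequent pair. The counts are illustrative values chosen for this sketch, and the function name `lift` is an assumption.

```python
def lift(f11, f10, f01, f00):
    """Lift of X -> Y from contingency counts (assumes non-zero marginals)."""
    n = f11 + f10 + f01 + f00
    p_x = (f11 + f10) / n
    p_y = (f11 + f01) / n
    p_xy = f11 / n
    return p_xy / (p_x * p_y)


# Illustrative counts: X and Y co-occur perfectly in both cases.
rare_pair = lift(f11=10, f10=0, f01=0, f00=90)      # pair appears in 10% of transactions
frequent_pair = lift(f11=90, f10=0, f01=0, f00=10)  # pair appears in 90% of transactions
print(rare_pair, frequent_pair)
```

Under these assumed counts the rare pair scores a lift of about 10 while the frequent pair scores only about 1.11, even though the association is equally perfect in both tables.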

12 There are lots of measures proposed in the literature
- Some measures are good for certain applications, but not for others
- What criteria should we use to determine whether a measure is good or bad?
- What about Apriori-style support-based pruning? How does it affect these measures?

13 Association Rules & Correlations
- Basic concepts
- Efficient and scalable frequent itemset mining methods:
  - Apriori, and improvements
  - FP-growth
- Rule derivation, visualization and validation
- Multi-level associations
- Summary

14 Multiple-Level Association Rules
- Items often form a hierarchy.
- Items at the lower level are expected to have lower support.
- Rules regarding itemsets at appropriate levels could be quite useful.
- The transaction database can be encoded based on dimensions and levels.
- We can explore shared multi-level mining.

[Figure: concept hierarchy with nodes Food; bread, milk; skim, 2%; white, wheat; Sunset, Fraser]

15 Mining Multi-Level Associations
- A top-down, progressive deepening approach:
  - First find high-level strong rules: milk → bread [20%, 60%].
  - Then find their lower-level “weaker” rules: 2% milk → wheat bread [6%, 50%].
- Variations in mining multiple-level association rules:
  - Level-crossed association rules: 2% milk → Wonder wheat bread
  - Association rules with multiple, alternative hierarchies: 2% milk → Wonder bread

16 Multi-level Association: Uniform Support vs. Reduced Support
- Uniform support: the same minimum support for all levels
  - (+) One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
  - (-) Lower-level items do not occur as frequently. If the support threshold is
    - too high → miss low-level associations
    - too low → generate too many high-level associations
- Reduced support: reduced minimum support at lower levels
  - There are 4 search strategies:
    - Level-by-level independent
    - Level-cross filtering by k-itemset
    - Level-cross filtering by single item
    - Controlled level-cross filtering by single item

17 Uniform Support
Multi-level mining with uniform support
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]

18 Reduced Support
Multi-level mining with reduced support
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
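The difference between the two thresholding schemes can be reproduced with the supports and thresholds given on the uniform-support and reduced-support slides. This is only a sketch; the function name `frequent_items` and the data layout are assumptions.

```python
# item -> (hierarchy level, support), using the figures from the two slides
supports = {
    "Milk": (1, 0.10),
    "2% Milk": (2, 0.06),
    "Skim Milk": (2, 0.04),
}


def frequent_items(supports, min_sup_by_level):
    """Keep the items whose support meets the threshold of their own level."""
    return [
        item
        for item, (level, sup) in supports.items()
        if sup >= min_sup_by_level[level]
    ]


uniform = frequent_items(supports, {1: 0.05, 2: 0.05})  # same threshold at both levels
reduced = frequent_items(supports, {1: 0.05, 2: 0.03})  # lower threshold at level 2
print(uniform)
print(reduced)
```

With a uniform 5% threshold, Skim Milk (4%) is dropped; lowering the level-2 threshold to 3% keeps it.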

19 Multi-level Association: Redundancy Filtering
- Some rules may be redundant due to “ancestor” relationships between items.
- Example:
  - milk → wheat bread [support = 8%, confidence = 70%]
  - Say that 2% milk is 25% of milk sales; then:
  - 2% milk → wheat bread [support = 2%, confidence = 72%]
- We say the first rule is an ancestor of the second rule.
- A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.
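The redundancy check in this example comes down to one multiplication: the expected support of the descendant rule is the ancestor's support times the descendant item's share of the ancestor item (8% × 25% = 2%). The sketch below encodes that test. The function names and the 10% relative tolerance are assumptions, since the slide only says the support must be "close" to the expected value.

```python
def expected_support(ancestor_support, share):
    """Support a descendant rule would have if it merely mirrored its ancestor."""
    return ancestor_support * share


def is_redundant(actual_support, ancestor_support, share, rel_tol=0.1):
    """Flag the rule when its support is within rel_tol of the expected value.

    rel_tol is a hypothetical tolerance, not a value given in the slides.
    """
    expected = expected_support(ancestor_support, share)
    return abs(actual_support - expected) <= rel_tol * expected


# Figures from the slide: ancestor support 8%, 2% milk is 25% of milk sales,
# and the descendant rule has support 2%.
print(expected_support(0.08, 0.25))
print(is_redundant(0.02, 0.08, 0.25))
```

A descendant rule whose support is well above or below the expected 2% would not be flagged, because it carries information beyond what its ancestor already implies.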

20 Multi-Level Mining: Progressive Deepening
- A top-down, progressive deepening approach:
  - First mine high-level frequent items: milk (15%), bread (10%)
  - Then mine their lower-level “weaker” frequent itemsets: 2% milk (5%), wheat bread (4%)
- Different min_support thresholds across levels lead to different algorithms:
  - If adopting the same min_support across all levels, then toss t if any of t’s ancestors is infrequent.
  - If adopting reduced min_support at lower levels, then examine only those descendants whose ancestor’s support is frequent/non-negligible.
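The top-down pass described above can be sketched for single items as follows. The milk and bread supports come from the slide; the juice and apple juice entries, the per-level thresholds, and the function name `progressive_deepening` are hypothetical additions used only to show an item being skipped because its ancestor is infrequent.

```python
def progressive_deepening(levels, parent, min_sup):
    """Mine frequent single items level by level, top level first.

    levels  : list of dicts (item -> support), one dict per hierarchy level
    parent  : dict mapping each lower-level item to its ancestor one level up
    min_sup : list of minimum supports, one per level
    """
    frequent_per_level = []
    previous = None
    for items, threshold in zip(levels, min_sup):
        current = set()
        for item, sup in items.items():
            # Below the top level, examine only descendants of frequent ancestors.
            if previous is not None and parent[item] not in previous:
                continue
            if sup >= threshold:
                current.add(item)
        frequent_per_level.append(current)
        previous = current
    return frequent_per_level


levels = [
    {"milk": 0.15, "bread": 0.10, "juice": 0.04},                   # juice is hypothetical
    {"2% milk": 0.05, "wheat bread": 0.04, "apple juice": 0.035},   # apple juice is hypothetical
]
parent = {"2% milk": "milk", "wheat bread": "bread", "apple juice": "juice"}

# Hypothetical reduced-support thresholds: 5% at level 1, 3% at level 2.
result = progressive_deepening(levels, parent, min_sup=[0.05, 0.03])
print(result)
```

Here apple juice would pass the level-2 threshold on its own, but it is never examined because juice fails at level 1.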

21 Association Rules & Correlations
- Basic concepts
- Efficient and scalable frequent itemset mining methods:
  - Apriori, and improvements
  - FP-growth
- Rule derivation, visualization and validation
- Multi-level associations
- Temporal associations and frequent sequences [later]
- Other association mining methods
- Summary

22 Other Association Mining Methods
- CHARM: mining frequent itemsets using a vertical data format
- Mining frequent closed patterns
- Mining max-patterns
- Mining quantitative associations [e.g., what is the implication between age and income?]
- Constraint-based association mining
- Frequent patterns in data streams: a very difficult problem. Performance is a real issue.
- Constraint-based (query-directed) mining
- Mining sequential and structured patterns

23 Summary
- Association rule mining
  - Probably the most significant contribution from the database community in KDD
- New interesting research directions
  - Association analysis in other types of data: spatial data, multimedia data, time series data
- Association rule mining for data streams: a very difficult challenge.

24 Statistical Independence
- Population of 1000 students
  - 600 students know how to swim (S)
  - 700 students know how to bike (B)
  - 420 students know how to swim and bike (S,B)
  - P(S∧B) = 420/1000 = 0.42
  - P(S) × P(B) = 0.6 × 0.7 = 0.42
  - P(S∧B) = P(S) × P(B) => Statistical independence
  - P(S∧B) > P(S) × P(B) => Positively correlated
  - P(S∧B) < P(S) × P(B) => Negatively correlated
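The three cases on this slide can be checked mechanically. The sketch below compares P(S∧B) with P(S) × P(B) using integer cross-multiplication, which avoids floating-point rounding in the equality test. The function name `correlation_type` and the two variant counts (500 and 300) are assumptions added for illustration.

```python
def correlation_type(n, n_s, n_b, n_sb):
    """Compare P(S and B) with P(S) * P(B) using exact integer arithmetic.

    P(S and B) ? P(S) * P(B)  is equivalent to  n_sb * n ? n_s * n_b.
    """
    joint = n_sb * n
    product = n_s * n_b
    if joint == product:
        return "statistically independent"
    if joint > product:
        return "positively correlated"
    return "negatively correlated"


# Figures from the slide: 1000 students, 600 swim, 700 bike, 420 do both.
print(correlation_type(1000, 600, 700, 420))

# Hypothetical variations of the joint count, for comparison.
print(correlation_type(1000, 600, 700, 500))
print(correlation_type(1000, 600, 700, 300))
```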