1 Classification Using Statistically Significant Rules Sanjay Chawla School of IT University of Sydney (joint work with Florian Verhein and Bavani Arunasalam)

2 Overview Data Mining Tasks Associative Classifiers Support and Confidence for Imbalanced Datasets The use of exact tests to mine rules Experiments Conclusion

3 Data Mining Data Mining research has settled into an equilibrium involving four tasks: Pattern Mining (Association Rules), Classification, Clustering, and Anomaly (Outlier) Detection. [Diagram: an associative classifier sits at the intersection of the database (DB) and machine learning (ML) strands of this work.]

4 Outlier Detection Outlier Detection (Anomaly Detection) can be studied from two aspects: Unsupervised, as a nearest neighbor or k-nearest neighbor problem; and Supervised, as classification on an imbalanced dataset, e.g., fraud detection or medical diagnosis.

5 Association Rule Mining In terms of impact, nothing rivals association rule mining within the data mining community: SIGMOD 93 (~4,100 citations) Agrawal, Imielinski, Swami; VLDB 94 (~4,900 citations) Agrawal, Srikant. For comparison: C4.5 (~7,000 citations) Ross Quinlan; Gibbs Sampling (IEEE PAMI 84, ~5,000 citations) Geman & Geman; Content Addressable Network (~3,000 citations) Ratnasamy, Francis, Handley, Karp.

6 Association Rules (Agrawal, Imielinski and Swami, SIGMOD 93) An association rule is an implication expression of the form X → Y, where X and Y are itemsets. Example: {Milk, Diaper} → {Beer}. Rule evaluation metrics: Support (s), the fraction of transactions that contain both X and Y; Confidence (c), which measures how often items in Y appear in transactions that contain X. From "Introduction to Data Mining", Tan, Steinbach and Kumar.
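To make the two metrics concrete, here is a minimal Python sketch (the transaction data is the toy table from slide 12; the function names are ours, not from the talk) that computes support and confidence:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = support(X and Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...
```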

7 Association Rule Mining (ARM) Task Given a set of transactions T, the goal of association rule mining is to find all rules having support ≥ minsup threshold and confidence ≥ minconf threshold. Brute-force approach: list all possible association rules, compute the support and confidence for each rule, and prune rules that fail the minsup and minconf thresholds. Computationally prohibitive!

8 Mining Association Rules Two-step approach: 1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup. 2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. Frequent itemset generation is still computationally expensive.

9 Frequent Itemset Generation Given d items, there are 2^d possible candidate itemsets. From "Introduction to Data Mining", Tan, Steinbach and Kumar.

10 Reducing Number of Candidates Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent. The principle holds due to the following property of the support measure: the support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.
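The pruning this enables is easiest to see in code. Below is a minimal, illustrative Apriori sketch (not the talk's implementation): candidate k-itemsets are generated only from frequent (k-1)-itemsets, so any itemset with an infrequent subset is never even counted.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all itemsets with support >= minsup (levelwise search)."""
    n = len(transactions)
    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items if sup(frozenset([i])) >= minsup]
    result, k = list(frequent), 2
    while frequent:
        freq_set = set(frequent)
        # Generate size-k candidates from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in freq_set
                             for s in combinations(c, k - 1))]
        frequent = [c for c in candidates if sup(c) >= minsup]
        result.extend(frequent)
        k += 1
    return result
```

On the toy data from slide 12 with minsup = 0.6, this returns the four frequent single items plus four pairs such as {Diaper, Beer}; no triple survives.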

11 Illustrating Apriori Principle [Figure: the itemset lattice; once an itemset is found to be infrequent, all of its supersets are pruned.] From "Introduction to Data Mining", Tan, Steinbach and Kumar.

12 Classification using ARM

TID | Items                     | Gender
1   | Bread, Milk               | F
2   | Bread, Diaper, Beer, Eggs | M
3   | Milk, Diaper, Beer, Coke  | M
4   | Bread, Milk, Diaper, Beer | M
5   | Bread, Milk, Diaper, Coke | F

In a classification task we want to predict the class label (Gender) using the attributes. A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.

13 Classification Based on Association (CBA): Liu, Hsu and Ma (KDD 98) Mine association rules of the form A → c on the training data. Prune or select rules using a heuristic. Rank rules: higher confidence first, then higher support, then smallest antecedent. New data is passed through the ordered rule set: apply the first matching rule, or a variation thereof (a sketch of this prediction step follows).
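A minimal sketch of CBA-style prediction (the rule list is illustrative, derived from the slide-12 toy data, not from the paper): rules are sorted by the ranking above, and the first rule whose antecedent is contained in the instance fires.

```python
# Each rule: (antecedent, class label, confidence, support).
rules = [
    ({"Beer", "Diaper"}, "M", 1.00, 0.6),   # illustrative mined rules
    ({"Bread", "Milk"}, "F", 0.67, 0.4),
]
# CBA ranking: confidence desc, then support desc, then smaller antecedent.
rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))

def classify(instance, rules, default="F"):
    """Apply the first matching rule; fall back to a default class."""
    for antecedent, label, _conf, _sup in rules:
        if antecedent <= instance:
            return label
    return default

print(classify({"Milk", "Diaper", "Beer", "Coke"}, rules))  # -> "M"
```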

14 Several Variations

Acronym | Authors           | Forum         | Comments
CMAR    | Li, Han, Pei      | ICDM 01       | Uses chi-squared
--      | Antonie, Zaiane   | DMKD 04       | Positive and negative rules
FARMER  | Cong et al.       | SIGMOD 04     | Row enumeration
Top-K   | Cong et al.       | SIGMOD 05     | Limits the number of rules
CCCS    | Arunasalam et al. | ACM SIGKDD 06 | Support-free

15 Downsides of Support (1) High support does not necessarily mean a rule is statistically significant. Megiddo and Srikant (KDD 98) claim that high support and confidence filter out non-significant rules. However, their null hypothesis is that the true support s = minsup, with alternative hypothesis s > minsup. This assumes rules with high support are significant; it provides no evidence that they are.

16 Downside of Support (2) High support does not mean good classification performance: many good rules have low support! This is evidenced by the low support thresholds that CBA, CMAR, etc. must use.

17 Downsides of Support (3) Support is biased towards the majority class. E.g., with classes = {yes, no} and sup({yes}) = 90%, minSup > 10% wipes out any rule predicting "no". Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class (3% of all instances is 3/10 of the 10% minority). In summary, support has many downsides, especially for classification.

18 Downside of Confidence (1) Conf(A → C) = 20/25 = 0.8; Support(A → C) = 20/100 = 0.2. [The slide computes the correlation between A and C; the value did not survive the transcript, but per the conclusion below it is not positive.] Thus, when the dataset is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.
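A possible reconstruction of the missing computation (the consequent's support is an assumption here; take sup(C) = 90/100, consistent with the imbalanced setting): using lift as the correlation measure,

```latex
\mathrm{lift}(A \Rightarrow C)
  = \frac{\mathrm{conf}(A \Rightarrow C)}{\sup(C)}
  = \frac{20/25}{90/100}
  = \frac{0.8}{0.9} \approx 0.89 < 1,
```

i.e., A and C would be negatively correlated despite the 0.8 confidence.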

19 Downside of Confidence (2) It is reasonable to expect that for "good" rules the antecedent and consequent are not independent! Suppose P(Class=Yes) = 0.9 and P(Class=Yes|X) = 0.9. Then the rule X → Yes has confidence 0.9, yet X and the class are independent: the rule tells us nothing.

20 Complement Class Support (CCS) The following are equivalent for a rule A → C: 1. A and C are positively correlated. 2. The support of the antecedent (A) is less than CCS(A → C). 3. Conf(A → C) is greater than the support of the consequent (C).
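The equivalence of (1) and (3) follows in one line from the definitions (a short derivation added here for completeness):

```latex
P(A, C) > P(A)\,P(C)
\;\Longleftrightarrow\;
\frac{P(A, C)}{P(A)} > P(C)
\;\Longleftrightarrow\;
\mathrm{conf}(A \Rightarrow C) > \sup(C).
```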

21 Downsides of Confidence (3) Another useful observation: higher confidence (support) for a rule in the minority class implies higher correlation, and lower correlation in the minority class implies lower confidence; neither implication holds for the majority class. Confidence (support) is thus biased towards the majority class.

22 Statistically Significant Rules Support is a computationally efficient measure (anti-monotonic), and there is a tendency to "force" a statistical interpretation onto it. Let's instead start with a statistically correct approach and "force" it to be computationally efficient.

23 Exact Tests Let the class variable be {0, 1}. Suppose we have two rules X → 1 and Y → 1. We want to determine whether the two rules are different, i.e., whether they have different effects on "causing" or "associating with" 1 or 0 (e.g., medicine vs. placebo).

24 Exact Tests We assume X and Y are binomial random variables with the same parameter p. We want to determine the probability that a specific table instance [a, b; c, d] occurs purely by chance.

25 Exact Tests We can calculate the exact probability of a specific table instance without resorting to asymptotic approximations. This can be used to calculate the p-value of [a,b;c,d]
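The slide's formula did not survive the transcript; for reference, under the fixed-margins model used by Fisher's test, the probability of the table [a, b; c, d] with n = a + b + c + d is the standard hypergeometric probability:

```latex
P([a,b;c,d])
  = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}
  = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\;a!\;b!\;c!\;d!}.
```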

26 Fisher Exact Test Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining the given table, or a more positively associated one, under the assumption that X and Y come from the same distribution.
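A self-contained sketch of the one-sided test (pure Python; for what it's worth, scipy.stats.fisher_exact with alternative='greater' computes the same quantity):

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """p-value: probability of the observed table [a, b; c, d] or a more
    positively associated one, with all margins held fixed."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    # Tables at least as positively associated have a top-left cell >= a.
    return sum(table_prob(x) for x in range(a, min(row1, col1) + 1))

print(fisher_exact_greater(8, 2, 3, 7))  # ~0.035: significant at 0.05
```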

27 Forcing "Anti-Monotonicity" We test a rule X → 1 against all its immediate generalizations {X − z → 1 : z ∈ X}. The rule is significant if its p-value < significance level (typically 0.05). Use a bottom-up approach and only test rules whose immediate generalizations are significant. Webb [06] has used Fisher Exact Tests for generic association rule mining.
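A minimal sketch of this bottom-up search (illustrative; details such as how each contingency table against a generalization is built are left to the supplied test function):

```python
def mine_significant_rules(items, is_significant, max_len=4):
    """Levelwise search that only tests candidates all of whose immediate
    generalizations were themselves found significant.
    `is_significant(X)` should FET-test X -> 1 against X's immediate
    generalizations and compare the p-value with the significance level."""
    level = [frozenset([i]) for i in items if is_significant(frozenset([i]))]
    significant = set(level)
    k = 2
    while level and k <= max_len:
        # Size-k candidates from pairs of significant (k-1)-antecedents.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if all(c - {z} in significant for z in c)  # pruning step
                 and is_significant(c)]
        significant.update(level)
        k += 1
    return significant
```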

28 Example Suppose we have already determined that the rules (A1 = a1) → 1 and (A2 = a2) → 1 are significant. Now we want to test whether X = (A1 = a1) ∧ (A2 = a2) → 1 is significant. We carry out a FET on X against X − {A1 = a1} and on X against X − {A2 = a2}. If the minimum of their p-values is less than the significance level, we keep the rule X → 1; otherwise we discard it.
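Tying this to the sketch after slide 26 (all counts here are hypothetical, purely to show the acceptance criterion):

```python
# FET of X -> 1 against each immediate generalization of X.
p1 = fisher_exact_greater(12, 3, 8, 9)   # X vs. X - {A1 = a1}
p2 = fisher_exact_greater(12, 3, 10, 7)  # X vs. X - {A2 = a2}
keep_rule = min(p1, p2) < 0.05           # the slide's acceptance criterion
```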

29 Contingency Table [The slide's contingency table did not survive the transcript.]

30 Ranking Rules We have already observed that a high-confidence rule for the majority class may be "more" negatively correlated than the same rule predicting the other class, and that a highly positively correlated rule predicting the minority class may have lower confidence than the same rule predicting the other class.

31 Experiments: Random Dataset Attributes are independent and uniformly distributed, so it makes no sense to find any rules, other than by chance. However, minSup = 1% and minConf = 0.5 mines 4,149 rules, over 31% of all possible rules. Using our FET technique at the standard significance level, we find only 11 (0.08%).

32 Experiments: Balanced Dataset Similar performance (within 1%)

33 Experiments: Balanced Dataset But it mines only 0.06% as many rules, by searching only 0.5% of the search space and using only 0.4% of the time.

34 Experiments: Imbalanced Dataset Higher performance than support-confidence techniques, using 0.07% of the search space and time and 0.7% as many rules.

35 Contributions Strong evidence and arguments against the use of support and confidence for imbalanced classification. A simple technique for using Fisher's Exact Test to find positively associated and statistically significant rules: compared with support-confidence techniques it uses on average 0.4% of the time, searches only 0.5% of the search space, and finds only 0.06% as many rules, with similar performance on balanced datasets and higher performance on imbalanced ones. Parameter free (except for the significance level).

36 References Verhein and Chawla, Classification Using Statistically Significant Rules. Arunasalam and Chawla, CCCS: A Top-Down Associative Classifier for Imbalanced Class Distribution, ACM SIGKDD 2006.