1 Associative Classification of Imbalanced Datasets Sanjay Chawla School of IT University of Sydney.


1 Associative Classification of Imbalanced Datasets Sanjay Chawla School of IT University of Sydney

2 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

3 Data Mining Data Mining research has settled into an equilibrium involving four tasks: Pattern Mining (Association Rules), Classification, Clustering, and Anomaly or Outlier Detection. Associative classifiers sit at the intersection of the database (DB) and machine learning (ML) communities.

4 Association Rule Mining In terms of impact, nothing rivals association rule mining within the data mining community –SIGMOD 93 (~4100 citations) Agrawal, Imielinski, Swami –VLDB 94 (~4900 citations) Agrawal, Srikant –C4.5 (~7000 citations) Ross Quinlan –Gibbs Sampling 84 (IEEE PAMI, ~5000 citations) Geman & Geman –Content Addressable Network (~3000 citations) Ratnasamy, Francis, Handley, Karp

5 Association Rules (Agrawal, Imielinski and Swami, SIGMOD 93) An association rule is an implication expression of the form X → Y, where X and Y are itemsets –Example: {Milk, Diaper} → {Beer} Rule Evaluation Metrics –Support (s): the fraction of transactions that contain both X and Y –Confidence (c): measures how often items in Y appear in transactions that contain X From "Introduction to Data Mining", Tan, Steinbach and Kumar

6 Mining Association Rules Two-step approach: 1. Frequent Itemset Generation –Generate all itemsets whose support ≥ minsup 2. Rule Generation –Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is computationally expensive
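The first (and expensive) step can be sketched in a few lines of Python. This is a minimal, illustrative Apriori-style sketch rather than the implementation behind the talk, run on a small hypothetical market-basket dataset:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return every itemset whose support (fraction of transactions
    containing it) is at least minsup, mapped to that support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent, k = {}, 1
    while current:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates, pruning any
        # candidate with an infrequent k-subset (anti-monotonicity).
        keys = list(survivors)
        current = set()
        for a, b in combinations(keys, 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in survivors
                                       for s in combinations(u, k)):
                current.add(u)
        k += 1
    return frequent

T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
freq = apriori(T, minsup=0.6)
# {Beer, Diaper} appears in 3 of 5 transactions, so its support is 0.6
```

The subset-pruning check inside the join is what makes the search tractable: a (k+1)-itemset is counted only if all of its k-subsets already survived.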

7 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

8 Associative Classifiers Most associative classifiers are based on rules discovered using the support-confidence criterion. The classifier itself is a collection of rules ranked by their support or confidence.

9 Associative Classifiers (2)

TID  Items                        Gender
1    Bread, Milk                  F
2    Bread, Diaper, Beer, Eggs    M
3    Milk, Diaper, Beer, Coke     M
4    Bread, Milk, Diaper, Beer    M
5    Bread, Milk, Diaper, Coke    F

In a classification task we want to predict the class label (Gender) using the attributes. A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.
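Support and confidence for such a rule can be computed directly from the transactions; a minimal sketch (variable names are illustrative):

```python
# The table above as (items, class) records.
data = [({"Bread", "Milk"}, "F"),
        ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
        ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
        ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
        ({"Bread", "Milk", "Diaper", "Coke"}, "F")]

antecedent, consequent = {"Beer", "Diaper"}, "M"
n = len(data)
n_x = sum(1 for items, _ in data if antecedent <= items)   # transactions with X
n_xy = sum(1 for items, c in data
           if antecedent <= items and c == consequent)     # with X and y

support = n_xy / n       # 3/5 = 0.6
confidence = n_xy / n_x  # 3/3 = 1.0
```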

10 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

11 Imbalanced Data Sets In some application domains, data sets are imbalanced: –The proportion of samples from one class is much smaller than that of the other class/classes. –And the smaller class is the class of interest. Support and confidence are biased toward the majority class and do not perform well in such cases.

12 Downsides of Support Support is biased towards the majority class –E.g.: classes = {yes, no}, sup({yes}) = 90% –minSup > 10% wipes out any rule predicting "no" –Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class!
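The arithmetic behind this example, spelled out (the dataset size of 1000 is a hypothetical figure chosen to be consistent with the stated percentages):

```python
n = 1000                          # hypothetical dataset size
n_no = 100                        # minority class "no": 10% of the data
n_rule = 30                       # transactions covered by the rule X -> no

support = n_rule / n              # 0.03: pruned by any minSup > 3%
minority_coverage = n_rule / n_no # 0.30: yet 30% of the minority class
```

A support threshold that looks modest globally (10%) is unreachable for any rule confined to a 10% minority class, no matter how accurate the rule is.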

13 Downside of Confidence (1) Conf(A → C) = 20/25 = 0.8 Support(A → C) = 20/100 = 0.2 Correlation between A and C: corr(A → C) = supp(A ∪ C) / (supp(A) · supp(C)); for instance, if supp(C) = 0.9, then corr(A → C) = 0.2 / (0.25 × 0.9) ≈ 0.89 < 1. Thus, when the data set is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.

14 Downside of Confidence (2) It is reasonable to expect that for "good" rules the antecedent and consequent are not independent! Suppose –P(Class = Yes) = 0.9 –P(Class = Yes | X) = 0.9 Then X and the class are independent, yet the rule X → Yes has a confidence of 0.9.

15 Downsides of Confidence (3) Another useful observation: higher confidence (support) for a rule in the minority class implies higher correlation, and lower correlation in the minority class implies lower confidence; neither implication holds for the majority class. Confidence (support) therefore tends to favour the majority class.

16 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

17 Contingency Table A 2 × 2 contingency table for X → y; rows are {X, ¬X}, columns are {y, ¬y}:

        y    ¬y
  X     a    b
 ¬X     c    d

We will use the notation [a, b; c, d] to represent this table.

18 Fisher Exact Test Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining that table under the hypothesis that {X, ¬X} and {y, ¬y} are independent. The margin sums (row sums, column sums) are held fixed.

19 Fisher Exact Test (2) The p-value is the hypergeometric sum over all tables, with the same margins, that are at least as positively associated:

p([a, b; c, d]) = Σ_{i=0}^{min(b,c)} (a+b)! (c+d)! (a+c)! (b+d)! / (n! (a+i)! (b−i)! (c−i)! (d+i)!)

We only use rules whose p-value is below the desired level of significance (e.g. 0.01). Rules that pass this test are statistically significant in the positively associated direction (e.g. X → y).
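A minimal implementation of the one-sided test for a table [a, b; c, d], using the standard hypergeometric form (a sketch, not necessarily the authors' exact code):

```python
from math import comb

def fisher_p(a, b, c, d):
    """One-sided Fisher Exact Test p-value for [a, b; c, d]: the
    probability, with margin sums fixed, of a table at least as
    positively associated, i.e. [a+i, b-i; c-i, d+i] for i >= 0."""
    n = a + b + c + d
    total = 0.0
    for i in range(min(b, c) + 1):
        # Hypergeometric probability of the table [a+i, b-i; c-i, d+i].
        total += comb(a + c, a + i) * comb(b + d, b - i) / comb(n, a + b)
    return total

# Fisher's classic "lady tasting tea" table [3, 1; 1, 3]
print(fisher_p(3, 1, 1, 3))  # 17/70, approximately 0.243
```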

20 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

21 Class Correlation Ratio In class correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y. The correlation is defined by

corr(X → y) = (supp(X ∪ y) · |T|) / (supp(X) · supp(y))

where |T| = n is the number of transactions and supp(·) denotes absolute support (a count).

22 Class Correlation Ratio (2) We then use corr() to measure how correlated X is with y compared to ¬y. X and y are positively correlated if corr(X→y)>1, and negatively correlated if corr(X→y)<1.

23 Class Correlation Ratio (3) Based on the correlation corr(), we define the Class Correlation Ratio (CCR):

CCR(X → y) = corr(X → y) / corr(X → ¬y) = a(b + d) / (b(a + c))

in the notation of the contingency table. The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g. y), relative to the alternative class (e.g. ¬y).
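Both corr() and the CCR can be computed directly from the contingency-table counts [a, b; c, d]. A sketch, with a hypothetical table chosen to be consistent with the slide-13 numbers (a = 20, confidence 0.8, a 90% majority class):

```python
def corr(n_xy, n_x, n_y, n):
    # corr(X -> y) = (supp(X ∪ y) · |T|) / (supp(X) · supp(y)), in counts
    return n_xy * n / (n_x * n_y)

def ccr(a, b, c, d):
    # CCR(X -> y) = corr(X -> y) / corr(X -> ¬y),
    # which simplifies to a(b + d) / (b(a + c)) in table notation.
    n = a + b + c + d
    return corr(a, a + b, a + c, n) / corr(b, a + b, b + d, n)

# Hypothetical table: conf = 20/25 = 0.8, supp = 20/100 = 0.2,
# and class y covers 90 of the 100 transactions.
a, b, c, d = 20, 5, 70, 5
print(ccr(a, b, c, d))  # about 0.44: below 1, X is more correlated with ¬y
```

Despite 80% confidence, the CCR exposes that the rule's antecedent is actually more strongly associated with the class it does not predict.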

24 Class Correlation Ratio (4) We only use rules with a CCR above a desired threshold, so that no rule is more positively associated with a class it does not predict.

25 The Two Measurements We perform the following tests to determine whether a potentially interesting rule is indeed interesting: –Check the significance of the rule X → y by performing Fisher's Exact Test. –Check whether CCR(X → y) > 1. Rules that pass both tests are candidates for the classification task.

26 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

27 Search and Pruning Strategies To avoid examining the whole set of possible rules, we use search strategies that ensure the notion of being potentially interesting is anti-monotonic: X → y may be considered potentially interesting if and only if all rules in {X' → y | X' ⊂ X} have been found to be potentially interesting.

28 Search and Pruning Strategies (2) For the aggressive search strategy, the contingency table [a, b; c, d] is used to test the significance of the rule X → y in comparison to one of its generalizations X − {z} → y.

29 Example Suppose we have already determined that the rules (A = a1) → 1 and (A = a2) → 1 are significant. Now we want to test whether X = (A = a1) ∧ (A = a2) → 1 is significant. We carry out a FET and calculate the CCR for X against each of its generalizations X − {z}: once with z = {A = a1} (leaving (A = a2) → 1) and once with z = {A = a2} (leaving (A = a1) → 1). If the minimum of the p-values is below the significance level and the CCRs are greater than 1, we keep the rule X → 1; otherwise we discard it.

30 Ranking Rules Strength Score (SS): –To determine how interesting a rule is, we need a ranking (ordering) of the rules; the ordering is defined by the Strength Score.

31 Overview Data Mining Tasks Associative Classifiers Downside of Support and Confidence Mining Rules from Imbalanced Data Sets –Fisher’s Exact Test –Class Correlation Ratio (CCR) –Searching and Pruning Strategies –Experiments

32 Experiments (Balanced Data) The preceding approach is referred to as "SPARCCC". Experiments on balanced data sets show that the average accuracy of SPARCCC compares favourably to CBA and C4.5. –The table below reports prediction accuracy on the balanced data sets.

33 Experiments (Imbalanced Data) The True Positive Rate (recall/sensitivity) is a better performance measure for imbalanced data sets. SPARCCC outperforms other rule-based techniques such as CBA and CCCS. –The table below reports the True Positive Rate of the minority class on imbalanced versions of the data sets.

34 References Florian Verhein and Sanjay Chawla. Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, October 28-31, 2007.