Measuring Association Rules
Shan “Maggie” Duanmu
Project for CSCI 765
Dec 9th, 2002

Outline
The problems
Our solutions
Work to do

Definitions
Association rule: association rule mining searches for interesting relationships among items in a given data set. Such relationships are typically expressed as an association rule of the form X => Y, where X and Y are sets of items, read as: whenever a transaction T contains X, it probably also contains Y.
Metrics: this probability is defined as the percentage of transactions containing Y in addition to X, with respect to the overall number of transactions containing X; it is called the confidence (or strength) of the rule. While confidence represents the certainty of a rule, support represents its usefulness [1]. Formally, the support of a rule is the percentage of transactions containing both X and Y with respect to the number of transactions in the database.
Interesting rules: a rule is considered interesting if its confidence and support exceed certain thresholds. Such thresholds are generally assumed to be given by domain experts.
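To make the two metrics concrete, here is a minimal Python sketch of how support and confidence would be computed for a candidate rule X => Y; the transactions and item names are made up for illustration and are not from the paper.

    # Minimal sketch: support and confidence of a rule X => Y over a list of
    # transactions. The transactions and item names are hypothetical.
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]

    def support(itemset, transactions):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(X, Y, transactions):
        # P(Y | X): support of X union Y divided by support of X.
        return support(X | Y, transactions) / support(X, transactions)

    X, Y = {"diapers"}, {"beer"}
    print(support(X | Y, transactions))    # support of the rule: 3/5 = 0.6
    print(confidence(X, Y, transactions))  # confidence: 0.6 / 0.8 = 0.75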

The Problems
While the support-confidence framework has been widely used for measuring the interestingness of association rules, it is known that:
1. The resulting rules may be misleading [4-8]; a rule with high support and high confidence may still not indicate that X and Y are dependent.
2. Pruning with support and confidence thresholds may obscure important rules.
3. Many unimportant rules may still remain in the resulting rule set.

Many metrics…
To address the problems with the support-confidence framework, many other metrics have been proposed: interest, conviction, Gini index, Laplace, phi-coefficient, collective strength, reliability, …. So far, at least 21 metrics can be found in the literature. What to choose?
P. Tan, V. Kumar, J. Srivastava, “Selecting the Right Interestingness Measure for Association Patterns,” ACM SIGKDD ’02, 2002.
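For reference, a small sketch of how a few of these alternative metrics can be computed from the basic probabilities P(X), P(Y), and P(XY). The formulas follow common usage in the literature; in particular, "reliability" is taken here as P(Y|X) - P(Y), one of several definitions in use, so treat it as an assumption rather than the exact definition intended on this slide.

    # Sketch of a few alternative metrics written in terms of P(X), P(Y), P(XY).
    # "Reliability" is assumed to be P(Y|X) - P(Y); other definitions exist.
    def interest(p_x, p_y, p_xy):
        # interest (lift): 1 under independence, > 1 positive, < 1 negative correlation
        return p_xy / (p_x * p_y)

    def conviction(p_x, p_y, p_xy):
        # P(X)P(not Y) / P(X, not Y); infinite for a rule that is never violated
        p_x_not_y = p_x - p_xy
        return float("inf") if p_x_not_y == 0 else p_x * (1 - p_y) / p_x_not_y

    def reliability(p_x, p_y, p_xy):
        # assumed definition: confidence minus the prior probability of Y
        return p_xy / p_x - p_y

    # Using the diapers/beer numbers from the previous sketch:
    # P(X) = 0.8, P(Y) = 0.6, P(XY) = 0.6
    print(interest(0.8, 0.6, 0.6))      # 1.25
    print(conviction(0.8, 0.6, 0.6))    # 0.32 / 0.2 = 1.6
    print(reliability(0.8, 0.6, 0.6))   # 0.75 - 0.6 = 0.15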

Our Solutions
Six principles plus a partial order, in contrast to the total or partial order of the prior support-confidence framework:
1. Implication
2. Correlation
3. Novelty
4. Utility
5. Top-N rules
6. Efficiency

Implication principle
Principle 1 (implication principle): If a set of measures is defined to reflect the interestingness of an association rule, then at least one measure m_i(X => Y) in the set should satisfy the constraint m_i(X => Y) > m_i(Y => X) when P(X) < P(Y).
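A quick numerical illustration on hypothetical probabilities with P(X) < P(Y): confidence satisfies this constraint, while interest, being symmetric in X and Y, cannot.

    # Implication-principle check on hypothetical probabilities with P(X) < P(Y).
    p_x, p_y, p_xy = 0.2, 0.5, 0.15

    conf_xy = p_xy / p_x           # confidence(X => Y) = 0.75
    conf_yx = p_xy / p_y           # confidence(Y => X) = 0.30
    print(conf_xy > conf_yx)       # True: confidence distinguishes the direction

    int_xy = p_xy / (p_x * p_y)    # interest(X => Y) = 1.5
    int_yx = p_xy / (p_y * p_x)    # interest(Y => X) = 1.5 (symmetric)
    print(int_xy > int_yx)         # False: interest alone cannot express implication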

Correlation principle
Principle 2 (correlation principle): If a set of measures is defined to reflect the interestingness of an association rule X => Y, then at least one measure m_i(X => Y) in the set should be directly proportional to the covariance of X and Y.

Novelty principle
Principle 3 (novelty principle): If a set of measures is defined to reflect the interestingness of an association rule X => Y, then for a given P(XY), at least one measure m_i in the set should reflect its novelty. The novelty measure m_i should be inversely proportional to p = max{P(X), P(Y)}.
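A small illustration of the intended behaviour, using hypothetical numbers: holding P(XY) and one marginal fixed while max{P(X), P(Y)} grows, a novelty measure should decrease. Interest behaves this way.

    # Novelty illustration: fix P(XY) and P(X), grow P(Y) = max{P(X), P(Y)};
    # interest decreases, which is the behaviour the principle asks for.
    p_xy, p_x = 0.1, 0.2
    for p_y in (0.2, 0.4, 0.8):
        print(p_y, p_xy / (p_x * p_y))   # interest: 2.5, 1.25, 0.625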

Utility principle
Principle 4 (utility principle): If a set of measures is defined to reflect the interestingness of an association rule X => Y, then at least one measure m_i in the set should reflect its utility, i.e., m_i is a monotone increasing function with respect to P(XY).

Top-N-rule principle
Principle 5 (top-N-rule principle): If a synthetic measure is defined to sort the rules for presenting the top N rules to users, then it is desirable that this measure obey Principles 1-4.

Efficiency principle
Principle 6 (efficiency principle): If a set of measures is defined to reflect the interestingness of an association rule, then it is desirable that the thresholds used with those measures help reduce computational complexity.

Partial results
Which measures satisfy which principles. Measures considered: support, confidence, interest, conviction, reliability.
Implication: x, X (when positively correlated)
Correlation: X, X, X
Novelty: X, X (when negatively related)
Utility: X

A few conclusions
No single measure is absolutely better than the others for obtaining the top-N rules.
When using a synthetic measure such as reliability or conviction, support is still an important utility measure.
Interest should still be used as a novelty measure in order to fully characterize rules.
Interest can serve both as a good correlation measure and as a good novelty measure; it is always 1 when the rule contains no novel information (i.e., when X and Y are independent).
When interest is used as a synthetic measure for ranking rules, confidence should also be included in addition to support, because interest is a poor measure for implication examination.
While there are three alternative frameworks for fully characterizing rules (support-confidence-interest, support-conviction-interest, support-reliability-interest), the support-confidence-interest framework is best; the other two work well only when rules are positively correlated.

Partial Order
Instead of the support-confidence framework, we suggest:
Support-confidence-interest framework
Support-conviction-interest framework
Support-reliability-interest framework
Other frameworks? Which is the best?
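One way to read the partial-order suggestion is as Pareto dominance over the three measures of a framework, rather than a single total order. The sketch below ranks rules under the support-confidence-interest framework using that interpretation; the rule names and numbers are made up.

    # Sketch of a partial order over rules under the support-confidence-interest
    # framework: rule a dominates rule b if a >= b on all three measures and is
    # strictly better on at least one. Rule names and numbers are hypothetical.
    rules = {
        "A => B": (0.40, 0.80, 1.60),   # (support, confidence, interest)
        "C => D": (0.30, 0.70, 1.20),
        "E => F": (0.25, 0.90, 0.95),
    }

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    # Keep the non-dominated rules (the "front" that would be shown to users first).
    front = [r for r, m in rules.items()
             if not any(dominates(m2, m) for r2, m2 in rules.items() if r2 != r)]
    print(front)   # ['A => B', 'E => F']; C => D is dominated by A => B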

Work to Do
Evaluate the frameworks with realistic application data (image data, KDD Cup data, Skyrocket data, …); the work has been criticized for its lack of supporting applications.
Investigate the efficiency principle: P-tree algorithms and other algorithms for comparison.
Explore other possible frameworks.
Our principles target objective metrics; how can subjective metrics be combined to select top-N rules?