Mining interesting association rules
Loo Kin Kong
22 Feb 2002

Plan
- Motivation
- Interestingness measures for association rules
  - Objective measures
  - Subjective measures
- Research issues
- Conclusion

Motivation
- KDD aims at finding new and interesting knowledge from databases, but...
- Numerous association rules are generated in a mining process
- Some of the mined rules may be trivial facts, e.g.
  "Pregnant" ⇒ "Female", Supp = 20%, Conf = 100%

Motivation (cont'd)
- ... while some other rules may be redundant:
  "Drive fast" ⇒ "Had an accident", Supp = 10%, Conf = 40%
  "Drive fast" and "Born in HK" ⇒ "Had an accident", Supp = 9%, Conf = 42%
- The study of "interestingness" of association rules aims at presenting only the rules that are interesting to the user
- Closely related to the study of "surprisingness" or "unexpectedness" of association rules
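As a concrete companion to the figures above, here is a minimal sketch (Python, with made-up transactions) of how support and confidence of a rule X ⇒ Y are computed; the function name and the sample data are illustrative and not part of the original slides.

```python
def support_confidence(transactions, antecedent, consequent):
    """Support = fraction of transactions containing X ∪ Y;
    confidence = that count divided by the transactions containing X."""
    n = len(transactions)
    ante = sum(1 for t in transactions if antecedent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return both / n, (both / ante if ante else 0.0)

# Illustrative transactions (each is a set of items)
transactions = [
    {"drive fast", "had accident"},
    {"drive fast"},
    {"drive fast", "born in HK", "had accident"},
    {"born in HK"},
]
print(support_confidence(transactions, {"drive fast"}, {"had accident"}))
# -> (0.5, 0.666...)
```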

Interestingness: two approaches
- Objective measures (data-driven)
- Subjective measures (user-driven)

Objective measures
- Mined rules are ranked by a pre-defined ranking system, or
- Mined rules are filtered by a set of pre-defined pruning rules

Pruning rules
[Shah 99] proposed five pruning rules. For two rules r1 and r2 with similar strength:
- if r1 = A ⇒ C and r2 = A ∧ B ⇒ C, then r2 is redundant
- if r1 = A ⇒ C and r2 = B ⇒ C, where B ⇒ A holds but A ⇒ B does not, then r2 is redundant
- if r1 = A ⇒ C and r2 = B ⇒ C, where both B ⇒ A and A ⇒ B hold, then both r1 and r2 are weak
- if r1 = A ⇒ B and r2 = A ⇒ B ∧ C, then r1 is redundant
- if r1 = A ⇒ B and r2 = A ⇒ C, where B ⇒ C, then r2 is redundant
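A minimal sketch of how a post-processing filter might apply the first of these pruning rules (same consequent, more specific antecedent, similar strength). The Rule representation, the use of confidence as "strength", and the similarity threshold are all assumptions made for illustration, not details from [Shah 99].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: frozenset      # antecedent items
    rhs: frozenset      # consequent items
    confidence: float   # used here as the rule's "strength" (assumption)

def similar(a, b, tol=0.05):
    """Treat two strengths as 'similar' if they differ by at most tol (assumed threshold)."""
    return abs(a - b) <= tol

def prune_redundant(rules):
    """First pruning rule: if r1 = A ⇒ C and r2 = A ∧ B ⇒ C have similar
    strength, the more specific rule r2 is redundant and is dropped."""
    redundant = set()
    for r1 in rules:
        for r2 in rules:
            if (r1 is not r2
                    and r1.rhs == r2.rhs
                    and r1.lhs < r2.lhs          # r2's antecedent strictly contains r1's
                    and similar(r1.confidence, r2.confidence)):
                redundant.add(r2)
    return [r for r in rules if r not in redundant]
```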

Small disjuncts
- A disjunct is a conjunctive set of conditions, e.g. each parenthesized group in
  (C11 ∧ C12 ∧ ... ∧ C1m) ∨ (C21 ∧ C22 ∧ ... ∧ C2n) ∨ ...
- The size of a disjunct is determined by the number of tuples covered by the disjunct
- Small disjuncts may contain surprising knowledge, although they are prone to errors

Small disjuncts (cont'd)
- [Freitas 98] proposed that an association rule can be regarded as a disjunct
- For a rule (disjunct) to be considered surprising:
  for each rule r = (i1 ∧ i2 ∧ ... ∧ in) ⇒ Ic:
    count(r) = 0
    for each minimal generalization r' of r:
      if RHS of r' ≠ RHS of r: count(r)++
- Rules are ranked according to their counts
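A small sketch of this counting idea, assuming mined rules are stored as a dict from antecedent itemset to predicted consequent and that a minimal generalization removes exactly one antecedent item; this illustrates the scheme rather than reproducing the paper's exact procedure.

```python
def surprisingness_ranking(rules):
    """Rank rules by how many of their minimal generalizations predict a
    different consequent (the counting scheme described above).

    rules: dict mapping frozenset of antecedent items -> predicted consequent.
    """
    counts = {}
    for lhs, rhs in rules.items():
        count = 0
        for item in lhs:
            gen = lhs - {item}            # minimal generalization: drop one item
            gen_rhs = rules.get(gen)      # its prediction, if that rule was mined
            if gen_rhs is not None and gen_rhs != rhs:
                count += 1
        counts[(lhs, rhs)] = count
    # Highest count first: rules that disagree most with their generalizations
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```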

Subjective measures
- Users are required to specify whether the mined rules are interesting...
- ... but it is impossible to do so rule by rule
- Hence rules are handled collectively

Rule templates
- Interesting and uninteresting rules can be specified with templates [Klemettinen et al. 94]
- A rule template specifies which attributes may occur in the LHS and RHS of a rule
- e.g., any rule of the form "Pregnant" & (any number of conditions) ⇒ "Female" is uninteresting
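A minimal sketch of matching a rule against such a template, assuming a template is simply a pair of item sets and that extra LHS conditions are allowed, mirroring the "Pregnant & ... ⇒ Female" example above; the representation is an assumption for illustration.

```python
def matches_template(rule_lhs, rule_rhs, tmpl_lhs, tmpl_rhs):
    """A rule matches a template if its LHS contains all the template's LHS
    attributes (any number of extra conditions allowed) and its RHS equals
    the template's RHS."""
    return tmpl_lhs <= rule_lhs and tmpl_rhs == rule_rhs

# Any rule "Pregnant" & (anything) ⇒ "Female" is flagged as uninteresting
uninteresting = ({"Pregnant"}, {"Female"})
rule = ({"Pregnant", "Age < 30"}, {"Female"})
print(matches_template(rule[0], rule[1], *uninteresting))  # True
```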

Eliminating uninteresting rule families
- Proposed in [Sahar 99]
- For a rule r = A ⇒ B, r' = a ⇒ b is an ancestor rule of r if a ⊆ A and b ⊆ B; r' is said to cover r
- An ancestor rule can be classified as one of the following:
  - True-Not-Interesting (TNI)
  - Not-True-Interesting (NTI)
  - Not-True-Not-Interesting (NTNI)
  - True-Interesting (TI)

Eliminating uninteresting rule families (cont'd)
The algorithm:
- Let R denote the set of association rules obtained from data mining
- Iteratively:
  - the ancestor rule r' that covers the largest number of rules in R is presented to the user for classification
  - r' is classified as one of TNI, NTI, NTNI and TI
  - R is pruned according to the classification of r'
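A rough sketch of this interactive loop, assuming rules are (LHS, RHS) itemset pairs, ancestor candidates take one item from each side, and the user's verdict arrives through a classify callback. The per-class pruning action is simplified here; [Sahar 99] treats each of the four classes differently.

```python
from itertools import product

def covers(ancestor, rule):
    """ancestor = (a, b), rule = (A, B); ancestor covers rule if a ⊆ A and b ⊆ B."""
    (a, b), (A, B) = ancestor, rule
    return a <= A and b <= B

def eliminate_families(rules, classify):
    """rules: set of (frozenset LHS, frozenset RHS) pairs.
    classify: callback asking the user to label an ancestor rule as
    'TNI', 'NTI', 'NTNI' or 'TI'.  Returns the rules that survive pruning."""
    remaining = set(rules)
    while remaining:
        # Candidate ancestor rules: one LHS item and one RHS item of some remaining rule
        ancestors = {(frozenset({x}), frozenset({y}))
                     for A, B in remaining for x, y in product(A, B)}
        # Present the ancestor covering the most remaining rules
        best = max(ancestors, key=lambda anc: sum(covers(anc, r) for r in remaining))
        label = classify(best)
        if label in ("TNI", "NTNI"):
            # Family judged not interesting: drop every rule it covers (simplification)
            remaining = {r for r in remaining if not covers(best, r)}
        else:
            break  # simplification: stop once the user flags an interesting family
    return remaining
```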

Research issues
- The problem of rule interestingness is difficult because it requires domain knowledge and/or user interaction [Sahar 99]
- Possible research directions:
  - machine learning on interesting rules
  - how interestingness information can be used in a data mining process

Conclusion
- Association rule interestingness is an important, but difficult, problem
- Measures of rule interestingness include subjective and objective ones
- Objective interestingness measures are data-driven
- Subjective interestingness measures require users to specify whether a rule is interesting

References
[Dong et al. 01] Guozhu Dong and Kaustubh Deshpande. Efficient Mining of Niches and Set Routines. PAKDD 2001.
[Freitas 98] Alex A. Freitas. On Objective Measures of Rule Surprisingness. PKDD 1998.
[Klemettinen et al. 94] Mika Klemettinen et al. Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994.
[Sahar 99] Sigal Sahar. Interestingness Via What Is Not Interesting. KDD 1999.
[Shah 99] Devavrat Shah. Interestingness and Pruning of Mined Patterns. DMKD 1999.

Discussion