Association Rules
Hawaii International Conference on System Sciences (HICSS-40), January 2007
David L. Olson, Yanhong Li



Fuzzy Association Rules
Association rules mining provides information to assess significant correlations in large databases
IF X THEN Y
  –Initial data mining analysis
  –Not predictive
SUPPORT: degree to which relationship appears in data
CONFIDENCE: probability that if X, then Y
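The two measures defined on this slide can be sketched for ordinary (crisp, binary) data as follows. This is a minimal illustration, not the authors' code: the function names and the five toy loan records are hypothetical, and confidence is taken as the usual ratio support(X and Y) / support(X).

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= r) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(X and Y) / support(X)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical toy records: each set lists the binary attributes that hold.
records = [
    {"income_middle", "credit_good", "on_time"},
    {"income_middle", "credit_good", "on_time"},
    {"income_middle", "on_time"},
    {"credit_good"},
    {"income_middle"},
]

sup = support(records, {"income_middle", "on_time"})        # 3 of 5 records
conf = confidence(records, {"income_middle"}, {"on_time"})  # 3 of the 4 with income_middle
```

Here the rule IF income_middle THEN on_time appears in 3 of 5 records and holds in 3 of the 4 records where the antecedent is true.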

Association Rule Algorithms
APriori: Agrawal et al., 1993; Agrawal & Srikant, 1994
  –Find correlations among transactions, binary values
Weighted association rules: Cai et al., 1998; Lu et al
Cardinal data: Srikant & Agrawal, 1996
  –Partitions attribute domain, combines adjacent partitions until binary

Fuzzy Analysis
Deal with vagueness & uncertainty
Fuzzy Set Theory
  –Zadeh [1965]
Probability Theory
  –Pearl [1988]
Rough Set Theory
  –Pawlak [1982]
Set Pair Theory
  –Zhao [2000]

Fuzzy Association Rules
Most based on APriori algorithm
Treat all attributes as uniform
Can increase number of rules by decreasing minimum support, decreasing minimum confidence
  –Generates many uninteresting rules
  –Software takes a lot longer

Gyenesei (2000)
Studied weighted quantitative association rules in fuzzy domain
  –With & without normalization
  –NONNORMALIZED
     Used product operator to define combined weight and fuzzy value
     If weight small, support level small, tends to have data overflow
  –NORMALIZED
     Used geometric mean of item weights as combined weight
     Support then very small
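The two weight combiners named on this slide, the product operator and the geometric mean, can be compared side by side. This sketch covers only how item weights are combined. How the combined weight is then folded into the fuzzy values is not stated on the slide and is not modelled here. The function names and example weights are hypothetical.

```python
import math


def product_weight(weights):
    """Combined weight as the plain product of the item weights."""
    return math.prod(weights)


def geometric_mean_weight(weights):
    """Combined weight as the geometric mean of the item weights."""
    return math.prod(weights) ** (1.0 / len(weights))


# Hypothetical weights for a two-item itemset.
weights = [0.5, 0.8]
w_product = product_weight(weights)           # 0.5 * 0.8
w_geometric = geometric_mean_weight(weights)  # square root of 0.5 * 0.8
```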

Algorithm
Get membership functions, minimum support, minimum confidence
Assign weight to each fuzzy membership for each attribute (categorical)
Calculate support for each fuzzy region
If support > minimum, OK
If confidence > minimum, OK
If both OK, generate rules
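The last three steps of the algorithm amount to a filter on two thresholds. A minimal sketch follows, assuming each candidate rule already carries its support and confidence. The dictionary layout, rule names, and numbers are hypothetical.

```python
def keep_rules(candidates, minsup, minconf):
    """Keep rules whose support and confidence both exceed the minimums."""
    return [
        r for r in candidates
        if r["support"] > minsup and r["confidence"] > minconf
    ]


# Hypothetical candidates, not taken from the presentation's data.
candidates = [
    {"rule": "A -> C", "support": 0.49, "confidence": 0.95},
    {"rule": "B -> C", "support": 0.20, "confidence": 0.99},  # fails minsup
    {"rule": "A -> B", "support": 0.40, "confidence": 0.60},  # fails minconf
]

kept = keep_rules(candidates, minsup=0.25, minconf=0.9)
```

Only the first candidate passes both tests, which shows why a rule with very high confidence can still be dropped when its support is low.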

Demo Model: Loan App
Case  Age  Income  Risk  Credit  Result
Red Green Green Amber Green Green Green Green Green Red 1

Fuzzified Age
Figure 2: The membership functions of attribute Age (x-axis: Age; y-axis: Membership value; curves: Young, Middle, Old)
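Membership functions like those in Figure 2 are often drawn as overlapping trapezoids. The sketch below assumes that shape and uses made-up breakpoints, since the figure's actual values are not recoverable from this transcript. `trapezoid`, `AGE_SETS`, and `fuzzify_age` are hypothetical names.

```python
def trapezoid(x, a, b, c, d):
    """Membership rises on (a, b), equals 1 on [b, c], and falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical breakpoints in years; NOT the values used in the presentation.
AGE_SETS = {
    "Young": (-1, 0, 25, 35),
    "Middle": (25, 35, 45, 55),
    "Old": (45, 55, 120, 121),
}


def fuzzify_age(age):
    """Return the membership of `age` in each fuzzy category."""
    return {name: trapezoid(age, *pts) for name, pts in AGE_SETS.items()}


borderline = fuzzify_age(30)  # partly Young, partly Middle
clear_cut = fuzzify_age(40)   # fully Middle
```

With these breakpoints an age of 30 belongs half to Young and half to Middle, which is the kind of overlap a fuzzified attribute is meant to capture.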

Fuzzify Age
Case  Age  Young  Middle  Old

Calculate Support for Each Pair of Fuzzy Categories
Membership value
  –Identify weights for each attribute
  –Identify highest fuzzy membership category for each case
Membership value = minimum weight associated with highest fuzzy membership category
Support
  –Average membership value for all cases
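One possible reading of this slide is sketched below. The assumption is that a case contributes to an itemset only when every listed category is the highest-membership category of its attribute, that its membership value is then the smallest of the associated weights, and that support is the average over all cases. This interpretation, the function names, the weights, and the four toy cases are all assumptions, not the authors' implementation.

```python
def dominant_category(memberships):
    """Category with the highest fuzzy membership for one attribute."""
    return max(memberships, key=memberships.get)


def case_value(case, itemset, weights):
    """Membership value of one case for a list of (attribute, category) pairs.

    Assumed reading: zero unless every listed category is dominant for its
    attribute; otherwise the minimum of the associated weights.
    """
    for attr, cat in itemset:
        if dominant_category(case[attr]) != cat:
            return 0.0
    return min(weights[(attr, cat)] for attr, cat in itemset)


def fuzzy_support(cases, itemset, weights):
    """Average membership value over all cases."""
    return sum(case_value(c, itemset, weights) for c in cases) / len(cases)


# Hypothetical fuzzified cases and weights.
cases = [
    {"Income": {"High": 0.2, "Middle": 0.8, "Low": 0.0}, "Credit": {"Good": 1.0, "Bad": 0.0}},
    {"Income": {"High": 0.0, "Middle": 0.6, "Low": 0.4}, "Credit": {"Good": 0.0, "Bad": 1.0}},
    {"Income": {"High": 0.9, "Middle": 0.1, "Low": 0.0}, "Credit": {"Good": 1.0, "Bad": 0.0}},
    {"Income": {"High": 0.0, "Middle": 0.7, "Low": 0.3}, "Credit": {"Good": 1.0, "Bad": 0.0}},
]
weights = {("Income", "Middle"): 0.8, ("Credit", "Good"): 0.6}

single = fuzzy_support(cases, [("Income", "Middle")], weights)
pair = fuzzy_support(cases, [("Income", "Middle"), ("Credit", "Good")], weights)
```

In this toy data three of four cases are dominantly Income Middle, and two of those are also Credit Good, so the pair's support is lower than the single item's.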

Support by Single Item
Category  Weight  Sup(R jk)
Age Young R
Age Middle R
Age Old R
Income High R
Income Middle R
Income Low R
Risk High R
Risk Middle R
Risk Low R
Credit Good R
Credit Bad R

Support
If support for pair of categories is above minimum support, retain
Identifies all pairs of fuzzy categories with sufficiently strong relationship
For outcomes, R 51 (On Time) strong, R 52 (Default) not

Support by Pair: minsup 0.25
R 11 R R 22 R R 11 R R 22 R R 11 R R 31 R R 11 R R 31 R R 22 R R 41 R

Support by Triplet: minsup 0.25
R 22 R 41 R
R 22 R 31 R
R 22 R 31 R
R 31 R 41 R

Quartets
None qualify, so algorithm stops
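The progression from single items to pairs, triplets, and quartets, stopping as soon as a level yields nothing, is the level-wise search used by Apriori-style algorithms. The sketch below shows that loop on crisp toy data. `levelwise`, `crisp_support`, and the transactions are hypothetical, and a fuzzy support function could be passed in instead.

```python
from itertools import combinations


def levelwise(items, support, minsup):
    """Grow itemsets one item at a time until a level has no survivors."""
    frequent = {}
    level = [frozenset([i]) for i in items if support(frozenset([i])) > minsup]
    size = 1
    while level:
        frequent[size] = level
        size += 1
        prev = set(level)
        # Join surviving itemsets that differ by exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Apriori pruning: every smaller subset must itself have survived.
        level = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
            and support(c) > minsup
        ]
    return frequent


# Hypothetical crisp transactions used only to exercise the loop.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]


def crisp_support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


frequent = levelwise("ABC", crisp_support, minsup=0.5)
```

In this toy run all single items and all pairs survive, the one triplet falls below the minimum, and the search stops there, in the same way the quartet level ends the search on this slide.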

Confidence
Identify direction
For those training set cases involving the pair of attributes, what proportion came out as predicted?
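"Identify direction" matters because one itemset yields several directed rules with different confidences. The sketch below enumerates them, assuming the standard ratio support(itemset) / support(antecedent). The support values and the name `directed_rules` are hypothetical.

```python
from itertools import combinations


def directed_rules(itemset, supports):
    """List (antecedent, consequent, confidence) for every split of an itemset."""
    items = frozenset(itemset)
    rules = []
    for r in range(1, len(items)):
        for ante in combinations(sorted(items), r):
            ante = frozenset(ante)
            conf = supports[items] / supports[ante]
            rules.append((ante, items - ante, conf))
    return rules


# Hypothetical supports for two items and their pair.
supports = {
    frozenset({"X"}): 0.50,
    frozenset({"Y"}): 0.80,
    frozenset({"X", "Y"}): 0.48,
}

rules = directed_rules({"X", "Y"}, supports)
conf_by_rule = {(tuple(a), tuple(c)): conf for a, c, conf in rules}
```

Here X → Y has confidence 0.96 while Y → X has only 0.6, so with a minimum confidence of 0.9 only the first direction would be kept.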

Confidence Values: Pairs
Minimum confidence 0.9
R 22 →R R 41 R 22 →R R 41 →R R 41 R 51 →R R 22 →R R 22 R 51 →R R 51 →R R 31 R 41 →R R 31 →R R 31 R 51 →R R 41 →R R 51 R 41 →R R 31 →R R 51 →R R 41 →R R 51 →R

4 Rules
IF Income is Middle THEN Outcome is On-Time
  –R 22 → R 51 support 0.490 confidence
IF Credit is Good THEN Outcome is On-Time
  –R 41 → R 51 support 0.576 confidence
IF Income is Middle AND Credit is Good THEN Outcome is On-Time
  –R 22 R 41 → R 51 support 0.419 confidence
IF Risk is High AND Credit is Good THEN Outcome is On-Time
  –R 31 R 41 → R 51 support 0.266 confidence 0.993

Rules vs. Support

Rules vs. Confidence

Higher order combinations
Try triplets
  –If ambitious, sets of 4, and beyond
Here, none
Problems:
  –Computational complexity explodes
  –Doesn’t guarantee total coverage
     That also would explode complexity
     Can control by lowering minsup, minconf

Simulation Testing
Selected 550 cases
  –Held out 100
Randomly assigned weights to each fuzzy region of each attribute
  –minsup {0.35, 0.45, 0.55, 0.65}
  –minconf {0.7, 0.8, 0.9}
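The experimental design on this slide can be set up in a few lines. The sketch assumes the 100 held-out cases are drawn from the 550 selected ones (the slide does not say which), uses uniform random weights in [0, 1), and fixes a seed for repeatability. `split_cases`, `random_weights`, and the region labels are hypothetical.

```python
import random
from itertools import product


def split_cases(n_cases=550, n_holdout=100, seed=0):
    """Shuffle case ids and split them into (training, held-out) lists."""
    rng = random.Random(seed)
    ids = list(range(n_cases))
    rng.shuffle(ids)
    return ids[n_holdout:], ids[:n_holdout]


def random_weights(regions, seed=0):
    """Assign a random weight in [0, 1) to each fuzzy region."""
    rng = random.Random(seed)
    return {region: rng.random() for region in regions}


MINSUP = [0.35, 0.45, 0.55, 0.65]
MINCONF = [0.7, 0.8, 0.9]

train, holdout = split_cases()
weights = random_weights(["Age Young", "Age Middle", "Age Old"])
grid = list(product(MINSUP, MINCONF))  # every (minsup, minconf) setting to test
```

The grid has 4 × 3 = 12 threshold settings, each of which would be run on the training cases and checked against the held-out ones.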

Simulation Results