Synthesizing High-Frequency Rules from Different Data Sources Xindong Wu and Shichao Zhang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 2, MARCH/APRIL 2003

Presentation transcript:

1 Synthesizing High-Frequency Rules from Different Data Sources Xindong Wu and Shichao Zhang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 2, MARCH/APRIL 2003

2 Pre-work Knowledge management. Knowledge discovery. Data mining. Data warehousing.

3 Knowledge Management Building a data warehouse through knowledge management

4 Knowledge Discovery and Data Mining Data mining is a tool for knowledge discovery

5 Why data mining Simon / Commodities / Supermarket. If a supermarket manager, Simon, wants to arrange these commodities in his supermarket, how should he do it to earn more revenue and offer more convenience? If a customer buys milk, then he is likely to buy bread, so ...

6 Why data mining Before long, if Simon wants to send advertising letters to customers, accounting for individual differences becomes an important task. Mary always buys diapers and milk powder, so she may have a baby, so ...

7 The role of Data mining Preprocessed data → useful patterns → knowledge and strategy

8 Mining association rules Bread → Milk: IF bread is bought THEN milk is bought

9 Mining steps Step 1: define minsup and minconf (e.g., minsup=50%, minconf=50%). Step 2: find large itemsets. Step 3: generate association rules.
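The three steps can be sketched on a toy basket, a minimal illustration only; the transactions, items, and thresholds below are made up for the example and are not from the paper:

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "beer"},
    {"bread", "milk", "beer"},
]
minsup, minconf = 0.5, 0.5          # Step 1: user-defined thresholds
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

# Step 2: enumerate itemsets of size 1 and 2, keep the large (frequent) ones.
items = sorted(set().union(*transactions))
large = [frozenset(c)
         for size in (1, 2)
         for c in combinations(items, size)
         if support(frozenset(c)) >= minsup]

# Step 3: from each large 2-itemset, generate rules X -> Y with
# confidence = supp(X ∪ Y) / supp(X), keeping those above minconf.
rules = []
for itemset in (s for s in large if len(s) == 2):
    for x in itemset:
        lhs = frozenset([x])
        conf = support(itemset) / support(lhs)
        if conf >= minconf:
            rules.append((set(lhs), set(itemset - lhs), support(itemset), conf))
```

With these transactions, bread → milk comes out with confidence 1.0, matching the intuition on the earlier slide.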

10 Example Large itemsets

11 Outline Introduction Weights of Data Sources Rule Selection Synthesizing High-Frequency Rules Algorithm Relative Synthesizing Model Experiments Conclusion

12 Introduction Framework: each data source DB1, DB2, ..., DBn is mined locally into a rule set RD1, RD2, ..., RDn (e.g., AB→C, A→D, B→E); the rule sets are then weighted, ranked, and synthesized into a global rule base (GRB) of high-frequency rules.

13 Weights of Data Sources Definitions: Di: data sources; Si: set of association rules from Di; Ri: association rule. 3 steps: Step 1: union of all Si. Step 2: assign each Ri a weight. Step 3: assign each Di a weight and normalize.

14 Example 3 data sources (minsupp=0.2, minconf=0.3). S1: AB→C with supp=0.4, conf=0.72; A→D with supp=0.3, conf=0.64; B→E with supp=0.34, conf=0.7. S2: B→C with supp=0.45, conf=0.87; A→D with supp=0.36, conf=0.7; B→E with supp=0.4, conf=0.6. S3: AB→C with supp=0.5, conf=0.82; A→D with supp=0.25, conf=0.62.

15 Step 1 Union of all Si: S' = S1 ∪ S2 ∪ S3. R1: AB→C (S1, S3) → 2 times. R2: A→D (S1, S2, S3) → 3 times. R3: B→E (S1, S2) → 2 times. R4: B→C (S2) → 1 time.

16 Step 2 Assign each Ri a weight: wRi = Num(Ri) / ΣNum. wR1 = 2/8 = 0.25; wR2 = 3/8 = 0.375; wR3 = 2/8 = 0.25; wR4 = 1/8 = 0.125.

17 Step 3 Assign each Di a weight: wDi = Σ Num(R) × wR over the rules in Si. wD1 = 2×0.25 + 3×0.375 + 2×0.25 = 2.125. wD2 = 1×0.125 + 3×0.375 + 2×0.25 = 1.75. wD3 = 2×0.25 + 3×0.375 = 1.625. Normalization: wD1 → 2.125/(2.125+1.75+1.625) = 0.386; wD2 → 1.75/5.5 = 0.318; wD3 → 1.625/5.5 = 0.295.
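The three weighting steps can be written as a short sketch. The rule frequencies and source contents below come from the example slides; the dictionary layout and names are just one possible encoding:

```python
# Frequencies from Step 1: AB->C twice, A->D three times, B->E twice, B->C once.
rule_freq = {"AB->C": 2, "A->D": 3, "B->E": 2, "B->C": 1}
sources = {                      # which rules each data source reported
    "D1": ["AB->C", "A->D", "B->E"],
    "D2": ["B->C", "A->D", "B->E"],
    "D3": ["AB->C", "A->D"],
}

# Step 2: a rule's weight is its frequency over the total frequency.
total = sum(rule_freq.values())                        # 8
rule_w = {r: f / total for r, f in rule_freq.items()}  # e.g. AB->C -> 0.25

# Step 3: a source's raw weight sums Num(R) * w(R) over its rules
# (e.g. wD1 = 2*0.25 + 3*0.375 + 2*0.25 = 2.125), then normalize to sum 1.
raw = {d: sum(rule_freq[r] * rule_w[r] for r in rs) for d, rs in sources.items()}
norm = sum(raw.values())
src_w = {d: w / norm for d, w in raw.items()}
```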

18 Why Rule Selection? Goal: extract high-frequency rules; low-frequency rules are noise. Solution: if Num(Ri) / n < λ, where n is the number of data sources and Num(Ri) is the frequency of Ri, then rule Ri is wiped out.

19 Rule Selection Example: 10 data sources. D1~D9: {R1: X→Y}; D10: {R1: X→Y, R2: X1→Y1, ..., R11: X10→Y10}. Let λ=0.8. Num(R1)/10 = 10/10 = 1 > λ → keep; Num(R2~11)/10 = 1/10 = 0.1 < λ → wiped out. After selection, D1~D10: {R1: X→Y}; wR1 = 10/10 = 1, so wD1~10 = 10×1 / (10×10×1) = 0.1 each.
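The selection test above is a one-line filter; this sketch just applies the Num(R)/n ≥ λ rule to the 10-source example (the function and variable names are illustrative):

```python
def select_rules(rule_freq, n, lam):
    """Keep only rules R with Num(R) / n >= lam; the rest are wiped out."""
    return {r: f for r, f in rule_freq.items() if f / n >= lam}

# The 10-source example: R1 appears in every source, R2..R11 only in D10.
freqs = {"R1": 10, **{f"R{i}": 1 for i in range(2, 12)}}
kept = select_rules(freqs, n=10, lam=0.8)   # only R1 survives
```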

20 Comparison Without rule selection: wD1~9 ≈ 0.099 each, wD10 ≈ 0.109. With rule selection: wD1~10 = 0.1. From the high-frequency-rule point of view, the weight errors are D1~9 → |0.1 − 0.099| = 0.001 and D10 → |0.1 − 0.109| = 0.009, so Total Error ≈ 0.01.

21 Synthesizing High-Frequency Rules Algorithm 5 steps: Step 1: rule selection. Step 2: weights of data sources (Step 2.1: union of all Si; Step 2.2: assign each Ri a weight; Step 2.3: assign each Di a weight and normalize). Step 3: compute supp & conf of each Ri. Step 4: rank all rules by support. Step 5: output the high-frequency rules.

22 An Example 3 data sources (the same S1, S2, S3 as before), λ=0.4, minsupp=0.2, minconf=0.3.

23 Step 1 Rule selection: R1: AB→C (S1, S3) → 2 times, Num(R1)/3 = 0.67 → keep. R2: A→D (S1, S2, S3) → 3 times, Num(R2)/3 = 1 → keep. R3: B→E (S1, S2) → 2 times, Num(R3)/3 = 0.67 → keep. R4: B→C (S2) → 1 time, Num(R4)/3 = 0.33 → wiped out.

24 Step 2: Weights of Data Sources. Weights of Ri: wR1 = 2/7 = 0.29; wR2 = 3/7 = 0.42; wR3 = 2/7 = 0.29. Weight of Di: wD1 = 2×0.29 + 3×0.42 + 2×0.29 = 2.42; wD2 = 3×0.42 + 2×0.29 = 1.84; wD3 = 2×0.29 + 3×0.42 = 1.84. Normalization: wD1 → 2.42/(2.42+1.84+1.84) = 0.396; wD2 → 1.84/6.1 = 0.302; wD3 → 1.84/6.1 = 0.302.

25 Step 3 Compute supp & conf of each Ri (wD1=0.396, wD2=0.302, wD3=0.302). Support: AB→C: 0.396×0.4 + 0.302×0.5 = 0.3094; A→D: 0.396×0.3 + 0.302×0.36 = 0.228; B→E: 0.396×0.34 + 0.302×0.4 = 0.255. Confidence: AB→C: 0.396×0.72 + 0.302×0.82 = 0.532; A→D: 0.396×0.64 + 0.302×0.7 = 0.465; B→E: 0.396×0.7 + 0.302×0.6 = 0.458.

26 Step 4 & Step 5 Rank all rules by support and output (minsupp=0.2, minconf=0.3). Ranking: 1. AB→C (0.3094); 2. B→E (0.255); 3. A→D (0.228). Output (3 rules): AB→C (0.3094, 0.532); B→E (0.255, 0.458); A→D (0.228, 0.465).
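Steps 3 to 5 amount to weighted sums followed by a sort. This sketch, under the assumption that each rule is synthesized over the sources that report it, uses the example's source weights and two of its rules; the dictionary encoding is illustrative:

```python
# Normalized source weights from Step 2 of the example.
src_w = {"D1": 0.396, "D2": 0.302, "D3": 0.302}

# (support, confidence) reported per source, from the example data.
reported = {
    "AB->C": {"D1": (0.4, 0.72), "D3": (0.5, 0.82)},
    "B->E":  {"D1": (0.34, 0.7), "D2": (0.4, 0.6)},
}

def synthesize(rule):
    """Step 3: weighted sum of support and confidence over reporting sources."""
    obs = reported[rule]
    supp = sum(src_w[d] * s for d, (s, c) in obs.items())
    conf = sum(src_w[d] * c for d, (s, c) in obs.items())
    return supp, conf

# Step 4: rank by synthesized support, highest first.
ranked = sorted(reported, key=lambda r: synthesize(r)[0], reverse=True)
```

For AB→C this reproduces the slide's 0.396×0.4 + 0.302×0.5 = 0.3094.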

27 Relative Synthesizing Model Framework: the same rule X→Y is reported from different Internet sources (Web pages, books, journals) with conf=0.7, 0.72, and 0.68. What is the synthesized confidence when the data sources Di are unknown? Two synthesizing approaches: the clustering method and the rough method.

28 Synthesizing Methods Physical meaning: if the confidences are irregularly distributed, use the maximum, minimum, or average synthesizing operator. If the confidences X follow a normal distribution, use clustering to find an interval [a, b] satisfying: 1. P{a ≤ X ≤ b} (= m/n) ≥ α; 2. |b − a| ≤ ε; 3. a, b > minconf.

29 Clustering Method 5 steps: Step 1: closeness c(i, j) = 1 − |confi − confj| → the distance relation table. Step 2: closeness degree measure → the confidence-confidence matrix. Step 3: are two confidences close enough? → the confidence relationship matrix. Step 4: class creation; [a, b] is the interval of the confidence of rule X→Y. Step 5: interval verification: does [a, b] satisfy the constraints?
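Steps 1 to 4 can be sketched as pairwise closeness plus greedy grouping. This is a simplified stand-in for the matrix-based procedure on the slides, not the paper's exact algorithm, and the threshold `delta` below is illustrative:

```python
def closeness(ci, cj):
    """Step 1: closeness of two confidences, 1 - |conf_i - conf_j|."""
    return 1 - abs(ci - cj)

def cluster(confs, delta):
    """Steps 2-4 (simplified): put each confidence into the first class
    whose members are all close enough (closeness >= delta), else start
    a new class."""
    classes = []
    for c in confs:
        for cls in classes:
            if all(closeness(c, other) >= delta for other in cls):
                cls.append(c)
                break
        else:
            classes.append([c])
    return classes
```

With confidences like [0.7, 0.72, 0.68, 0.5, 0.71, 0.69, 0.7] and delta = 0.95, the 0.68-0.72 values fall into one class and the outlier 0.5 into another, mirroring the class structure in the example.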

30 An Example Assume rule X→Y with conf1=0.7, conf2=0.72, conf3=0.68, conf4=0.5, conf5=0.71, conf6=0.69, conf7=0.7, conf8=... Parameters: α=0.7, ε=0.08, δ=0.69.

31 Step 1: Closeness Example: conf1=0.7, conf2=0.72 → c(1, 2) = 1 − |conf1 − conf2| = 1 − |0.7 − 0.72| = 0.98.

32 Step 2: Closeness Degree Measure Example: the pairwise closeness values are assembled into the confidence-confidence matrix.

33 Step 3: Close Enough? Example: each entry of the confidence-confidence matrix is compared with the threshold 6.9; entries above it mark pairs of confidences as close enough, giving the confidence relationship matrix.

34 Step 4: Classes Creating Example: Class 1: conf1~3, conf5~7. Class 2: conf4. Class 3: conf8.

35 Step 5: Interval Verifying Example: Class 1: conf1=0.7, conf2=0.72, conf3=0.68, conf5=0.71, conf6=0.69, conf7=0.7; [min, max] = [conf3, conf2] = [0.68, 0.72]. Constraint 1: P{0.68 ≤ X ≤ 0.72} = 6/8 = 0.75 ≥ α (0.7). Constraint 2: |0.72 − 0.68| = 0.04 < ε (0.08). Constraint 3: 0.68, 0.72 > minconf (0.65). In the same way, Class 2 and Class 3 are wiped out. Result: X→Y with conf = [0.68, 0.72]. The support is synthesized into an interval in the same way.
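The three constraints from Step 5 can be checked directly; a minimal sketch using the example's parameter values (the function signature is an assumption):

```python
def verify(cls, n_total, a, b, alpha=0.7, eps=0.08, minconf=0.65):
    """Check the interval [a, b] for a class against the three constraints."""
    inside = sum(a <= c <= b for c in cls)
    return (inside / n_total >= alpha       # 1. enough confidences in [a, b]
            and b - a <= eps                # 2. the interval is narrow enough
            and a > minconf and b > minconf)  # 3. both ends above minconf

# Class 1 from the example: 6 of the 8 confidences fall in [0.68, 0.72].
class1 = [0.7, 0.72, 0.68, 0.71, 0.69, 0.7]
ok = verify(class1, n_total=8, a=min(class1), b=max(class1))  # passes
```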

36 Rough Method Example R: AB→C with supp1=0.4, conf1=0.72 and supp2=0.5, conf2=0.82. Maximum: max(supp(R)) = max(0.4, 0.5) = 0.5; max(conf(R)) = max(0.72, 0.82) = 0.82. Minimum: 0.4, 0.72. Average: 0.45, 0.77.
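The three rough operators are one-liners over the reported (support, confidence) pairs; a minimal sketch with the example's numbers (the function name is illustrative):

```python
def rough(observations, op):
    """Synthesize (supp, conf) pairs with the max, min, or avg operator."""
    supports = [s for s, c in observations]
    confs = [c for s, c in observations]
    if op == "avg":
        return sum(supports) / len(supports), sum(confs) / len(confs)
    f = max if op == "max" else min
    return f(supports), f(confs)

obs = [(0.4, 0.72), (0.5, 0.82)]   # AB->C as reported by the two sources
```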

37 Experiments Time: SWNBS (without rule selection) vs. SWBRS (with rule selection); SWNBS takes longer than SWBRS. Error on the first 20 frequent itemsets: Max = ..., Avg = ...

38 Conclusion Synthesizing model: when the data sources are known, use weighting; when the data sources are unknown, use the clustering method or the rough method.

39 Future work Sequential patterns. Combining GA with other techniques.