Mining Multiple-level Association Rules in Large Databases. Authors: Jiawei Han, Simon Fraser University, British Columbia; Yongjian Fu, University of Missouri-Rolla, Missouri.

Similar presentations
Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.

Recap: Mining association rules from large datasets
Huffman Codes and Association Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Identifying Interesting Association Rules with Genetic Algorithms
Association Analysis (Data Engineering). Type of attributes in assoc. analysis Association rule mining assumes the input data consists of binary attributes.
gSpan: Graph-based substructure pattern mining
Mining Multiple-level Association Rules in Large Databases
Mining Multiple-level Association Rules in Large Databases Authors : Jiawei Han Simon Fraser University, British Columbia. Yongjian Fu University of Missouri-Rolla,
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
LOGO Association Rule Lecturer: Dr. Bo Yuan
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)
1 Association Graphs Selim Mimaroglu University of Massachusetts Boston.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules Mining Part III. Multiple-Level Association Rules Items often form hierarchy. Items at the lower level are expected to have lower support.
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining Multiple-level Association Rules in Large Databases Authors : Jiawei Han Simon Fraser University, British Columbia. (now: University of Illinois.
1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.
Fast Algorithms for Mining Association Rules * CS401 Final Presentation Presented by Lin Yang University of Missouri-Rolla * Rakesh Agrawal, Ramakrishnam.
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University Ch 2 Discovering Association Rules COMP 578 Data Warehousing & Data Mining.
Fast Algorithms for Association Rule Mining
1 Association Rules & Correlations: Basic concepts; Efficient and scalable frequent itemset mining methods: Apriori and improvements, FP-growth; Rule.
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
Mining Frequent Patterns I: Association Rule Discovery Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
SEG Tutorial 2 – Frequent Pattern Mining.
Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation Lauri Lahti.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
Ch5 Mining Frequent Patterns, Associations, and Correlations
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
1 Mining Association Rules Mohamed G. Elfeky. 2 Introduction Data mining is the discovery of knowledge and useful information from the large amounts of.
Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )
Information Systems Data Analysis – Association Mining Prof. Les Sztandera.
Mining various kinds of Association Rules
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Association Rule Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Indexing and Mining Free Trees Yun Chi, Yirong Yang, Richard R. Muntz Department of Computer Science University of California, Los Angeles, CA {
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Mining Multiple-level Association Rules in Large Databases
Mining Association Rules
Frequent Pattern Mining
©Jiawei Han and Micheline Kamber
Market Basket Many-to-many relationship between different objects
Big Data Analytics: HW#2
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Data Mining Association Rules: Advanced Concepts and Algorithms
Transactional data Algorithm Applications
Association Rule Mining
Unit 3 MINING FREQUENT PATTERNS ASSOCIATION AND CORRELATIONS
©Jiawei Han and Micheline Kamber
Presentation transcript:

Mining Multiple-level Association Rules in Large Databases
Authors: Jiawei Han, Simon Fraser University, British Columbia; Yongjian Fu, University of Missouri-Rolla, Missouri.
Presented by Ebrahim Kobeissi.
IEEE Transactions on Knowledge and Data Engineering.

OUTLINE
1. Introduction
2. Multiple-Level Association Rules
3. A Method for Mining M-L Association Rules
4. Conclusions
5. Future Work
6. Exam Questions

INTRODUCTION
- Why Multiple-Level (ML) Association Rules
- Pre-requisites for M-L Data Mining (MLDM*)
- Possible Approaches and Rationale
- Assumptions
- How this differs from previous research
*MLDM = Multiple-Level Data Mining

WHY MLDM?
- Rule A: 70% of customers who bought diapers also bought beer.
- Rule B: 45% of customers who bought cloth diapers also bought dark beer.
- Rule C: 35% of customers who bought Pampers also bought Samuel Adams beer.

WHY MLDM?
What are the conceptual differences between the three rules?
- Rule A applies at a generic, higher level of abstraction (product).
- Rule B applies at a more specific level of abstraction (category).
- Rule C applies at the lowest level of abstraction (brand).
Moving from the general rule to the more specific ones is called drilling down.

WHY MLDM?
- Lower levels carry more specific information.
- Hence it is essential to mine at different levels of any concept hierarchy.
- Association rules at different levels enable different strategies.
- Mining at multiple levels helps factor out uninteresting or coincidental rules.

PRE-REQUISITES FOR MLDM
1. Data representation at different levels of abstraction (see the sketch below):
   Level 1: {DIAPERS, BEER}
   Level 2: {CLOTH, DISPOSABLE}; {REGULAR, LITE}
   Level 3: {'BUMKINS', 'KUSHIES', 'PAMPERS', 'HUGGIES'}; {'BUDWEISER', 'MILLER LITE', 'SAMUEL ADAMS', 'HEINEKEN'}
2. Efficient methods for ML rule mining (the focus of this paper)
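One simple way to hold such a taxonomy in memory is a parent-to-children mapping. This is purely an illustrative sketch: the grouping of brands under CLOTH/DISPOSABLE and REGULAR/LITE is my assumption, and the paper itself uses the GID encoding shown later.

```python
# Hypothetical in-memory taxonomy: parent -> list of children.
taxonomy = {
    "DIAPERS": ["CLOTH", "DISPOSABLE"],
    "BEER": ["REGULAR", "LITE"],
    "CLOTH": ["BUMKINS", "KUSHIES"],          # assumed grouping
    "DISPOSABLE": ["PAMPERS", "HUGGIES"],     # assumed grouping
    "REGULAR": ["BUDWEISER", "SAMUEL ADAMS", "HEINEKEN"],
    "LITE": ["MILLER LITE"],
}

def descendants(node, taxonomy):
    """All items below 'node' in the hierarchy (children, grandchildren, ...)."""
    out = []
    for child in taxonomy.get(node, []):
        out.append(child)
        out.extend(descendants(child, taxonomy))
    return out

print(descendants("DIAPERS", taxonomy))
# ['CLOTH', 'BUMKINS', 'KUSHIES', 'DISPOSABLE', 'PAMPERS', 'HUGGIES']
```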

HIERARCHY TYPES
- Generalization/Specialization (is-a relationships)
- Generalization/Specialization with multiple inheritance
- Whole-Part hierarchies (is-part-of; has-part)

GENERALIZATION-SPECIALIZATION
Example hierarchy: Vehicle → {2-Wheels, 4-Wheels}; 2-Wheels → {Motorcycle, Bicycle}; 4-Wheels → {SUV, Sedan}

GENERALIZATION-SPECIALIZATION WITH MULTIPLE INHERITANCE
Example hierarchy: Vehicle → {Recreational, Commuting}; Recreational → {Snowmobile, Bicycle}; Commuting → {Bicycle, Car}; Bicycle inherits from both parents

WHOLE-PART HIERARCHIES
Example: Computer has-part {Hard Drive, CPU, Motherboard, RAM}; Hard Drive has-part {Platter, RW Head}

FOCUS OF THE PAPER
Efficient mining of multiple-level association rules.

DIFFERENT METHODS
One option: apply the single-level Apriori algorithm to each of the multiple levels under the same miniconf and minisup.
Potential problems:
- Higher levels of abstraction have higher support; lower levels have lower support.
- What is the single optimal minisup for all levels?
- Too high a minisup => too few itemsets survive at the lower levels.
- Too low a minisup => far too many uninteresting rules.

POSSIBLE SOLUTIONS
- Have a different minisup for different levels.
- Maybe also a different miniconf at different levels.
- Progressively decrease minisup as we go down the hierarchy to lower levels.
- This paper studies a progressive deepening method developed by extending the Apriori algorithm (a sketch of the top-down loop follows).
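The overall shape of such a progressive deepening loop might look as follows. This is only a sketch of the idea, not the paper's implementation: the helper names (mine_level, ancestor_at) and the data layout are assumptions, and the per-level candidate generation itself is ordinary Apriori.

```python
def mine_multiple_levels(transactions, min_supports, mine_level, ancestor_at):
    """Top-down progressive deepening with a per-level minimum support.
    transactions : list of sets of encoded leaf-level items.
    min_supports : {level: minsup}, typically decreasing, e.g. {1: 4, 2: 3, 3: 3}.
    mine_level   : function(transactions, level, minsup) -> dict mapping frozenset
                   itemsets (generalized to that level) to their support counts.
    ancestor_at  : function(item, level) -> the item's ancestor at that level."""
    results = {}
    frequent_items = None                      # frequent 1-itemsets one level up
    for level in sorted(min_supports):
        if frequent_items is not None:
            # Keep only descendants of items frequent at the level above,
            # and drop any transaction that is left empty.
            transactions = [
                {i for i in t
                 if frozenset([ancestor_at(i, level - 1)]) in frequent_items}
                for t in transactions
            ]
            transactions = [t for t in transactions if t]
        results[level] = mine_level(transactions, level, min_supports[level])
        frequent_items = {s for s in results[level] if len(s) == 1}
    return results
```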

MAIN ASSUMPTION
Explore only the descendants of frequent items at any level. In other words, if an item is not frequent at one level, its descendants no longer figure in further analysis.
Are there any problems that can arise because of this assumption?

Will this potentially eliminate interesting rules for itemsets at one level whose ancestors are not frequent at the higher levels?
If so, it can be addressed by a workaround: use two minisup values at the higher levels, one for filtering out infrequent items, and one for passing items down to the lower levels; the latter is called the level passage threshold (lph).
The lph may be adjusted by the user to allow the descendants of subfrequent items to be examined (see the sketch below).
What is a potential problem with this approach?
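A small illustrative helper for that split (the names are mine, not the paper's): with lph set at or below minisup, items that reach lph but not minisup are "subfrequent" and do not generate rules, yet their descendants are still examined at the next level.

```python
def split_by_thresholds(item_counts, minisup, lph):
    """item_counts: {item: support count} at the current level, with lph <= minisup.
    Returns (frequent, passed_down):
      frequent    - items that may appear in rules at this level,
      passed_down - items whose descendants are examined at the next level."""
    frequent = {i for i, c in item_counts.items() if c >= minisup}
    passed_down = {i for i, c in item_counts.items() if c >= lph}
    return frequent, passed_down

# Example: with minisup = 4 and lph = 3, an item seen 3 times is only passed down.
print(split_by_thresholds({"1**": 5, "2**": 5, "3**": 3, "5**": 2}, 4, 3))
# ({'1**', '2**'}, {'1**', '2**', '3**'})   (set order may vary)
```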

DIFFERENCES FROM PREVIOUS RESEARCH
Other studies use the same minisup across the different levels of the hierarchy. This study:
- uses different minisup values at different levels of the hierarchy,
- analyzes different optimization techniques,
- studies the use of interestingness measures.

MULTIPLE-LEVEL ASSOCIATION RULES
- Definitions
- Example taxonomy
- Working of the algorithm

DEFINITIONS - 1
Assume the database contains:
1. An item data set containing item descriptions.
2. A transaction data set T containing a set of transactions {Tid, {Ap, …, Aq}}, where Tid is a transaction identifier (key).

DEFINITION 2.1
A pattern or itemset A is one item Ai or a set of conjunctive items Ai Λ … Λ Aj.
The support of a pattern A in a set S, σ(A|S), is the number of transactions in S that contain A versus the total number of transactions.
The confidence of a rule A => B in S is φ(A => B) = σ(A ∪ B) / σ(A), i.e. the conditional probability that B occurs given that A has occurred.
Two thresholds are specified: minisup (σ') and miniconf (φ'), with different values at different levels.
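These two measures translate directly into code. A minimal sketch over an in-memory list of transaction sets (function and variable names are illustrative, not from the paper):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """sigma(A u B) / sigma(A): conditional probability of B given A."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Toy data echoing the diapers/beer rules from the introduction.
txns = [{"diapers", "beer"}, {"diapers"}, {"beer", "bread"}, {"diapers", "beer"}]
print(support({"diapers", "beer"}, txns))        # 0.5
print(confidence({"diapers"}, {"beer"}, txns))   # 0.666... (2 of the 3 diaper buyers)
```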

DEFINITION 2.2
A pattern A is frequent in a set S at level l if the support of A is no less than the corresponding minimum support threshold σ'.
A rule A => B is strong for a set S if:
a. each ancestor of every item in A and B is frequent at its corresponding level,
b. A Λ B is frequent at the current level, and
c. the confidence of A => B is no less than the miniconf at that level.
This ensures that the patterns examined at the lower levels arise from itemsets that have high support at the higher levels.
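Read as a checklist, the definition is easy to encode. The following checker is hypothetical: is_frequent, ancestors, conf, and minconf are assumed helpers (conf could be the confidence function sketched above), not part of the paper's algorithm.

```python
def is_strong(A, B, level, is_frequent, ancestors, conf, minconf):
    """Definition 2.2 as a predicate. Assumed helpers:
      is_frequent(itemset, level) -> bool
      ancestors(item)             -> iterable of (ancestor_level, ancestor_item)
      conf(A, B)                  -> confidence of the rule A => B
      minconf                     -> {level: miniconf threshold}"""
    items = set(A) | set(B)
    # (a) every ancestor of every item in A and B is frequent at its own level
    if not all(is_frequent({anc}, lvl) for i in items for lvl, anc in ancestors(i)):
        return False
    # (b) A u B is frequent at the current level
    if not is_frequent(items, level):
        return False
    # (c) the confidence reaches the miniconf for this level
    return conf(A, B) >= minconf[level]
```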

HOW DOES THE METHOD WORK?
Example: find multiple-level strong association rules for purchase patterns related to category, content, and brand.
1. Retrieve the relevant data from TABLE 2 and merge it into a generalized sales_item table (TABLE 3), with the relevant bar codes replaced by a bar_code set.
2. Find frequent patterns and strong rules at the highest level. 1-item, 2-item, …, k-item itemsets may be discovered, of the form {bread, vegetable, milk, …}.
3. At the next level the process is repeated, but the itemsets will be more specific, e.g. {2% milk, lettuce, white bread}.
4. Repeat steps 2 and 3 at all levels until no more frequent patterns are found.

Referenced Tables

OUTLINE
1. Introduction
2. Multiple-Level Association Rules
3. A Method for Mining M-L Association Rules
4. Conclusions
5. Future Work
6. Exam Questions

ALGORITHM: Taxonomy For This Exercise
food → {milk, bread}
milk → {2%, chocolate}; bread → {white, wheat}
Brands at the lowest level include Dairyland and Foremost (milk) and Old Mills and Wonder (bread).
L1, L2 and L3 correspond to the 3 levels of the hierarchy.

ALGORITHM: Dataset For This Exercise

TABLE 1: Sales-transaction table
Trans_id | Bar_code_set
… | {17325, 92108, …}
… | {23423, 56432, …}

TABLE 2: sales_item (description) relation
Bar_code | Category | Brand | Content | Size | Price
17325 | milk | Foremost | 2% | 1 Gal | $3.31
… | … | … | … | … | …

TABLE 3: Generalized sales_item description table
GID | Barcode_set | Category | Content | Brand
112 | {17325, 31414, 91265, …} | Milk | 2% | Foremost
… | … | … | … | …

ALGORITHM: Explanation of GID
GID 112 = [level 1, item 1: 'Milk'] [level 2, item 1: '2%'] [level 3, item 2: 'Foremost'], i.e. Foremost 2% milk. Each digit of the GID identifies the branch taken at the corresponding level of the hierarchy.
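Because the encoding is positional, an item's ancestor at any level is obtained by keeping a prefix of the GID and masking the remaining digits. A tiny illustrative helper (mine, not the paper's code):

```python
def ancestor_gid(gid, level):
    """Generalize an encoded item to a level by masking the lower digits."""
    return gid[:level] + "*" * (len(gid) - level)

print(ancestor_gid("112", 1))   # 1**  (milk)
print(ancestor_gid("112", 2))   # 11*  (2% milk)
print(ancestor_gid("112", 3))   # 112  (the leaf itself: Foremost 2% milk)
```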

ALGORITHM: Encoded Transaction Table T[1]
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}
T6 | {211, 323, 524}
T7 | {323, 411, 524, 713}

ALGORITHM: Step 1
Input: the encoded transaction table T[1] shown above, with a level-1 minisup of 4.

Level-1 frequent 1-item itemsets, L[1,1]
Itemset | Support
{1**} | 5
{2**} | 5

ALGORITHM: Step 2
Only keep items from T[1] whose level-1 ancestor is in L[1,1]; this gives the filtered table T[2]. T7 is dropped because none of its items survive the filter. The level-1 frequent 2-item itemset L[1,2] is then found from the filtered table.

Filtered T[2]
TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222}
T3 | {112, 122, 221}
T4 | {111, 121}
T5 | {111, 122, 211, 221}
T6 | {211}

L[1,2]
Itemset | Support
{1**, 2**} | 4
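A sketch of steps 1 and 2 over the encoded example table, in plain Python (variable names are mine); the computed supports match the L[1,1] and filtered T[2] shown in the slides.

```python
from collections import Counter

# Encoded transaction table T[1] from the slide above.
T1_table = {
    "T1": {"111", "121", "211", "221"},
    "T2": {"111", "211", "222", "323"},
    "T3": {"112", "122", "221", "411"},
    "T4": {"111", "121"},
    "T5": {"111", "122", "211", "221", "413"},
    "T6": {"211", "323", "524"},
    "T7": {"323", "411", "524", "713"},
}

# Step 1: generalize every item to level 1 ('111' -> '1**'), count it once per
# transaction, and keep the items that reach the level-1 minisup of 4.
level1_counts = Counter(
    g for items in T1_table.values() for g in {i[0] + "**" for i in items}
)
L_1_1 = {g: c for g, c in level1_counts.items() if c >= 4}
print(L_1_1)                  # {'1**': 5, '2**': 5}  (order may vary)

# Step 2: build the filtered table T[2] by dropping every item whose level-1
# ancestor is not in L[1,1]; transactions left empty (here T7) are removed.
T2_filtered = {}
for tid, items in T1_table.items():
    kept = {i for i in items if i[0] + "**" in L_1_1}
    if kept:
        T2_filtered[tid] = kept
print(sorted(T2_filtered))    # ['T1', 'T2', 'T3', 'T4', 'T5', 'T6']
print(T2_filtered["T6"])      # {'211'}  (323 and 524 are pruned)
```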

ALGORITHM: Step 3
Mine the filtered table T[2] (shown above) at level 2, with a level-2 minsup of 3.

L[2,1] (frequent 1-item itemsets)
Itemset | Support
{11*} | 5
{12*} | 4
{21*} | 4
{22*} | 4

L[2,2] (frequent 2-item itemsets)
Itemset | Support
{11*, 12*} | 4
{11*, 21*} | 3
{11*, 22*} | 4
{12*, 22*} | 3
{21*, 22*} | 3

L[2,3] (frequent 3-item itemsets)
Itemset | Support
{11*, 12*, 22*} | 3
{11*, 21*, 22*} | 3
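The same numbers fall out of a short Apriori-style pass over the filtered table, once its items are generalized to level 2. This is an illustrative reconstruction of the step, not the paper's code; the simple candidate generation below is standard Apriori.

```python
from collections import Counter
from itertools import combinations

# Filtered T[2] (T1..T6) with every item generalized to level 2 ('111' -> '11*').
T2_level2 = [
    {"11*", "12*", "21*", "22*"},   # T1
    {"11*", "21*", "22*"},          # T2
    {"11*", "12*", "22*"},          # T3
    {"11*", "12*"},                 # T4
    {"11*", "12*", "21*", "22*"},   # T5
    {"21*"},                        # T6
]
minsup = 3                          # level-2 minimum support

# L[2,1]: frequent 1-item itemsets.
counts = Counter(i for t in T2_level2 for i in t)
L21 = {i: c for i, c in counts.items() if c >= minsup}
print(L21)   # {'11*': 5, '12*': 4, '21*': 4, '22*': 4}  (order may vary)

# L[2,2] and L[2,3]: candidates are k-item combinations of the frequent items
# whose (k-1)-item subsets were all frequent in the previous pass.
prev = [frozenset([i]) for i in L21]
for k in (2, 3):
    items = sorted({i for s in prev for i in s})
    candidates = [
        frozenset(c) for c in combinations(items, k)
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    ]
    counted = {c: sum(c <= t for t in T2_level2) for c in candidates}
    level_k = {c: n for c, n in counted.items() if n >= minsup}
    print({tuple(sorted(c)): n for c, n in level_k.items()})
    prev = list(level_k)
# k=2 -> {('11*','12*'): 4, ('11*','21*'): 3, ('11*','22*'): 4,
#         ('12*','22*'): 3, ('21*','22*'): 3}
# k=3 -> {('11*','12*','22*'): 3, ('11*','21*','22*'): 3}
```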

ALGORITHM: Level 3 Operations
Mine the filtered table T[2] at level 3, with a level-3 minisup of 3.

L[3,1] (frequent 1-item itemsets)
Itemset | Support
{111} | 4
{211} | 4
{221} | 3

L[3,2] (frequent 2-item itemsets)
Itemset | Support
{111, 211} | 3

CONCLUSIONS
- Extended association rules from single-level to multiple-level.
- A top-down progressive deepening technique is developed for mining multiple-level association rules.
- Filtering of uninteresting association rules.

FUTURE WORK
- Developing efficient algorithms for mining multiple-level sequential patterns.
- Mining multiple-level correlations in databases.
- Cross-level associations.
- Interestingness of rules.

Exam Question 1
Q. What is a major drawback to multiple-level data mining using the same minisup at all levels of a concept hierarchy?
A. Support is large at the higher levels of the hierarchy and smaller at the lower levels. To ensure that sufficiently strong association rules are generated at the lower levels, we must reduce the minimum support, which in turn could result in the generation of many uninteresting rules at the higher levels. Thus we face the problem of determining the optimal minisup for all levels.

Exam Question 2
Q. What are the two pre-requisites to performing multiple-level association rule mining?
A. To explore multiple-level association rule mining, one needs to provide: 1) data at multiple levels of abstraction, and 2) efficient methods for multiple-level rule mining.

Exam Question 3
Q. Give an example of a multiple-level association rule.
A. Multiple-level association rules operate on a taxonomy or concept hierarchy. At a higher level in the hierarchy one may have a very general rule, such as "80% of people who buy bread also buy milk." At a lower level in the hierarchy the rule becomes more specific, for example "24% of the people who buy Foremost 2% milk also buy Wonder bread."