Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.

Presenter: Huai-Ping Chu, 2008/11/15

Outline Abstract Introduction Review of related mining algorithms The proposed algorithm An example Conclusion

Abstract In real applications, different items may have different support criteria to judge their importance, taxonomic relationships among items may appear, and data may have quantitative values. A fuzzy multiple-level mining algorithm for extracting knowledge implicit in quantitative transactions with multiple minimum supports of items is proposed to derive large itemsets and discover cross-level fuzzy association rules under the maximum-itemset minimum-taxonomy support constraint.

Introduction An association rule is expressed in the form A → B, where A and B are sets of items, such that the presence of A in a transaction implies the presence of B in the same transaction. Srikant & Agrawal proposed a method for mining association rules from data sets with quantitative and categorical attributes. Hong et al. proposed a fuzzy mining algorithm for managing quantitative data.

Introduction (cont.) Liu et al. proposed an approach for mining association rules with non-uniform minimum support values, which allows users to specify different minimum supports for different items and uses the lowest minimum support among the items in an itemset as the minimum support of that itemset. Lee, Hong & Lin proposed a simple and efficient algorithm based on the Apriori approach to generate large itemsets under the maximum constraint of multiple minimum supports.

Introduction (cont.) Han et al. and Agrawal et al. each proposed algorithms to discover association rules on multiple-level taxonomic relationships among items. This paper thus proposes a fuzzy multiple-level mining algorithm with multiple supports of items for extracting implicit knowledge from transactions stored as quantitative values, which integrates fuzzy-set concepts, data-mining technologies and multiple-level taxonomy to find fuzzy association rules.

Review of related mining algorithms Mining multiple-level association rules. Mining association rules with multiple minimum supports.

1. Mining multiple-level association rules Relevant item taxonomies are usually predefined in real-world applications and can be represented as hierarchy trees. Terminal nodes on the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed from lower-level nodes.

The method of Han & Fu: Nodes in predefined taxonomies are first encoded using sequences of numbers and the symbol "*" according to their positions in the hierarchy tree.
1**
  11*
    111
    112
  12*
2**
  21*
    211
    212
  22*
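The encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-dict input format, the item names, and the function name `encode_taxonomy` are hypothetical, and each node is assumed to have at most nine children so that one digit per level is enough.

```python
def encode_taxonomy(tree, depth, prefix=""):
    """Return {name: code} for a taxonomy given as nested dicts.

    tree  -- hypothetical input format: {name: {child: {...}}}; leaves map to {}
    depth -- number of levels; shorter codes are padded with '*' to this length
    """
    codes = {}
    for position, (name, children) in enumerate(tree.items(), start=1):
        code = prefix + str(position)         # one digit per level (assumes <= 9 siblings)
        codes[name] = code.ljust(depth, "*")  # '*' marks the levels below this node
        codes.update(encode_taxonomy(children, depth, code))
    return codes


# Hypothetical three-level taxonomy, used only for illustration.
taxonomy = {
    "Food": {
        "Milk": {"Chocolate milk": {}, "Plain milk": {}},
        "Bread": {},
    },
    "Drink": {
        "Juice": {"Apple juice": {}, "Orange juice": {}},
        "Tea": {},
    },
}
codes = encode_taxonomy(taxonomy, depth=3)
for name, code in codes.items():
    print(code, name)
```

With this input the codes come out in the same shape as the tree on the slide: `1**`, `11*`, `111`, `112`, `12*`, `2**`, `21*`, `211`, `212`, `22*`.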

A top-down, progressively deepening search approach is used, and exploration of "level-crossing" association relationships is allowed. Candidate itemsets at a certain level may thus contain items from other levels. EX: Large items at level 2 may be paired with large items at level 1 to form candidate 2-itemsets at level 2 (such as {11*, 2**}).

2. Mining association rules with multiple minimum supports Liu et al. proposed an approach for mining association rules with non-uniform minimum support values, allowing users to specify different minimum supports for different items. The minimum support value of an itemset is defined as the lowest of the minimum supports of the items in the itemset.

The minimum support of an item means that the occurrence frequency of the item must be larger than or equal to it for the item to be considered in the next mining steps. If the support of an item is below its support threshold, the item is not worth considering. When the minimum support value of an itemset is defined as the lowest of the minimum supports of the items in it, the itemset may be large even though some items included in it are small.

EX: The minimum support of item A is 20% and the minimum support of item B is 40%. If the support of item B is 30%, smaller than its minimum support of 40%, then the 2-itemset {A,B} should not be worth considering. It is therefore meaningful to assign the minimum support of an itemset as the maximum of the minimum supports of the items contained in the itemset.
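The difference between the two constraints can be checked with a small Python sketch. The function name `itemset_min_support` and the observed supports of A and of {A,B} are assumptions made for illustration; only the two minimum supports and B's support of 30% come from the example.

```python
def itemset_min_support(itemset, min_supports, constraint="max"):
    """Threshold of an itemset under the minimum or the maximum constraint."""
    values = [min_supports[item] for item in itemset]
    return max(values) if constraint == "max" else min(values)


min_supports = {"A": 0.20, "B": 0.40}   # from the example
supports = {"A": 0.35, "B": 0.30}       # A's support is hypothetical
support_ab = 0.25                       # hypothetical support of {A, B}

# Minimum constraint (Liu et al.): threshold 0.20, so {A, B} counts as large
# even though B alone fails its own threshold of 0.40.
large_under_min = support_ab >= itemset_min_support(("A", "B"), min_supports, "min")

# Maximum constraint: threshold 0.40, so {A, B} is rejected, consistent with B being small.
large_under_max = support_ab >= itemset_min_support(("A", "B"), min_supports, "max")

b_is_large = supports["B"] >= min_supports["B"]
print(large_under_min, large_under_max, b_is_large)
```

Under the maximum constraint an itemset can only be large if every item in it is large, which is what lets an Apriori-style level-wise search prune candidates.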

The proposed algorithm The mining algorithm for fuzzy multiple-level association rules under the maximum-itemset minimum-taxonomy support constraint of multiple minimum supports:
INPUT: A set of quantitative transaction data, a taxonomy with the primitive items assigned their own minimum supports, a set of membership functions, and a minimum confidence value.
OUTPUT: A set of fuzzy multiple-level association rules under the maximum constraint of multiple minimum supports.

Step 1: Encode the taxonomy using a sequence of numbers and the symbol "*".
Step 2: Translate the item names in the transaction data according to the encoding scheme.
Step 3: Group the items with the same first k digits in each transaction $D_i$, and add up the amounts of the items in the same group in $D_i$.
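Step 3 can be sketched as follows, assuming each transaction is stored as a dict from encoded item to purchased amount. The storage format, the function name `group_by_level`, and the sample amounts are hypothetical.

```python
from collections import defaultdict


def group_by_level(transaction, k):
    """Sum the amounts of items that share the same first k digits."""
    grouped = defaultdict(int)
    for code, amount in transaction.items():
        # Keep the first k digits and mask the remaining positions with '*'.
        group = code[:k].ljust(len(code), "*")
        grouped[group] += amount
    return dict(grouped)


# Hypothetical encoded transaction: item code -> quantity.
transaction = {"111": 3, "112": 4, "211": 2, "221": 5}

level1 = group_by_level(transaction, 1)
level2 = group_by_level(transaction, 2)
print(level1)
print(level2)
```

At level 1 the four items collapse into the groups `1**` and `2**`; at level 2 they collapse into `11*`, `21*` and `22*`.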

Step 4: Calculate the occurrence count of each group over all the transactions. Remove the groups whose counts are less than their respective support thresholds.
Step 5: Transform the quantitative value of each remaining group in each transaction into a fuzzy set $f_{ij}^k$ represented as $(f_{ij1}^k / R_{j1}^k + f_{ij2}^k / R_{j2}^k + \dots + f_{ijh}^k / R_{jh}^k)$, where $k$ is the level number, $h$ is the number of fuzzy regions for $I_j^k$, and $f_{ijl}^k$ is the membership value of the quantity in region $R_{jl}^k$.
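Step 5 depends on the membership functions supplied as input, which this transcript does not reproduce. The sketch below therefore assumes three hypothetical regions (Low, Middle, High) with simple piecewise-linear shapes; the breakpoints 1, 6 and 11 are illustrative choices, not values taken from the paper.

```python
def low(x):
    """Hypothetical 'Low' region: 1 up to x = 1, falling linearly to 0 at x = 6."""
    return 1.0 if x <= 1 else max(0.0, (6 - x) / 5)


def middle(x):
    """Hypothetical 'Middle' region: triangle peaking at x = 6, zero at 1 and 11."""
    return max(0.0, 1 - abs(x - 6) / 5)


def high(x):
    """Hypothetical 'High' region: rising linearly from x = 6, 1 from x = 11 on."""
    return 1.0 if x >= 11 else max(0.0, (x - 6) / 5)


def fuzzify(quantity, regions):
    """Map a quantity to its membership value in every fuzzy region."""
    return {name: fn(quantity) for name, fn in regions.items()}


regions = {"Low": low, "Middle": middle, "High": high}
fuzzy_7 = fuzzify(7, regions)   # fuzzy set for a group with quantity 7
print(fuzzy_7)
```

A quantity of 7 lies just past the peak of Middle, so under these assumed functions it belongs mostly to Middle, slightly to High, and not at all to Low.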

Step 6: Collect the fuzzy regions (linguistic terms) with membership values > 0 to form the candidate set $C_1^k$.
Step 7: Check whether the value $count_{jl}^k$ of each region $R_{jl}^k$ in $C_1^k$ is ≥ its threshold, which is the minimum of the minimum supports of the primitive items descending from it. If $R_{jl}^k$ satisfies the threshold, put it into the set of large 1-itemsets ($L_1^k$) for level k.
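The threshold rule in Step 7 can be illustrated as below. The helper name `region_threshold`, the minimum supports of the primitive items, and the region counts are all hypothetical; the only thing taken from the step is that a higher-level item inherits the minimum of its descendants' minimum supports.

```python
def region_threshold(code, primitive_min_supports):
    """Minimum of the minimum supports of the primitive items under `code`."""
    prefix = code.rstrip("*")   # '21*' -> '21', '1**' -> '1', '111' -> '111'
    return min(
        support
        for item, support in primitive_min_supports.items()
        if item.startswith(prefix)
    )


# Hypothetical minimum supports (as counts) of the primitive items.
primitive_min_supports = {"111": 4.0, "112": 3.0, "211": 2.5, "212": 5.0}

# Hypothetical scalar cardinalities of candidate regions: (item code, region) -> count.
counts = {("1**", "Middle"): 3.2, ("1**", "Low"): 1.0, ("21*", "High"): 2.4}

# Keep the regions whose count reaches the threshold of their item.
large_1 = {
    region: count
    for region, count in counts.items()
    if count >= region_threshold(region[0], primitive_min_supports)
}
print(large_1)
```

Here `1**` inherits the threshold 3.0 from item `112` and `21*` inherits 2.5 from item `211`, so only the region (`1**`, Middle) survives.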

Step 8: Generate the candidate set $C_2^k$ from $L_1^1, L_1^2, \dots, L_1^k$ to find "level-crossing" large itemsets satisfying the following conditions:
- Each 2-itemset in $C_2^k$ must contain at least one item in $L_1^k$.
- The two regions in a 2-itemset may not have the same item name.
- The two item names in a 2-itemset may not be in a hierarchy relation in the taxonomy.
- The support values of both large 1-itemsets comprising a candidate 2-itemset must be ≥ the maximum of the minimum supports of the two large 1-itemsets.
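The four conditions of Step 8 can be written as one filter over pairs of large 1-itemsets. This is a sketch under assumptions: regions are represented as hypothetical `(code, region)` tuples, and the counts and thresholds are made-up numbers.

```python
from itertools import combinations


def level_of(code):
    """Level of an encoded item: the number of digits before the '*' padding."""
    return len(code.rstrip("*"))


def related(a, b):
    """True if the two codes are equal or one is an ancestor of the other."""
    pa, pb = a.rstrip("*"), b.rstrip("*")
    return pa.startswith(pb) or pb.startswith(pa)


def candidate_pairs(large_1, thresholds, k):
    """Build candidate 2-itemsets for level k from large 1-itemsets of levels 1..k."""
    pairs = []
    for (r1, c1), (r2, c2) in combinations(sorted(large_1.items()), 2):
        code1, code2 = r1[0], r2[0]
        if level_of(code1) != k and level_of(code2) != k:
            continue                  # condition 1: at least one item from level k
        if related(code1, code2):
            continue                  # conditions 2 and 3: same item or hierarchy relation
        bound = max(thresholds[code1], thresholds[code2])
        if c1 >= bound and c2 >= bound:
            pairs.append((r1, r2))    # condition 4: both supports reach the larger threshold
    return pairs


# Hypothetical large 1-itemsets (with counts) and per-item thresholds.
large_1 = {
    ("2**", "Middle"): 3.0,
    ("3**", "Middle"): 2.8,
    ("21*", "Middle"): 2.6,
    ("22*", "Low"): 2.4,
}
thresholds = {"2**": 2.0, "3**": 2.5, "21*": 2.0, "22*": 2.2}

c2 = candidate_pairs(large_1, thresholds, k=2)
print(c2)
```

In this run the pair (`22*`, `3**`) is dropped by the last condition because the count 2.4 of `22*` is below the larger threshold 2.5, while (`2**`, `21*`) is dropped because `2**` is an ancestor of `21*`.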

Step 9: Do the following substeps for each newly formed candidate 2-itemset $s$ with regions $(s_1, s_2)$ in $C_2^k$:
- Calculate the fuzzy value of $s$ in each transaction $D_i$ as $f_{is} = f_{is_1} \wedge f_{is_2}$.
- Calculate the scalar cardinality of $s$ in all the transaction data as $count_s = \sum_i f_{is}$.
- If $count_s$ is ≥ the maximum of the minimum supports of the items contained in it, put $s$ into $L_2^k$.
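Step 9 can be sketched in Python by taking the $\wedge$ operator as the minimum, which is the usual choice for fuzzy intersection and is assumed here. The function name `fuzzy_count`, the fuzzified transactions and the thresholds are hypothetical.

```python
def fuzzy_count(itemset, fuzzy_transactions):
    """Scalar cardinality: sum over transactions of the minimum membership value."""
    return sum(
        min(t.get(region, 0.0) for region in itemset)  # a missing region counts as 0
        for t in fuzzy_transactions
    )


# Hypothetical fuzzified transactions: (item code, region) -> membership value.
fuzzy_transactions = [
    {("21*", "Middle"): 0.8, ("22*", "Low"): 0.6},
    {("21*", "Middle"): 0.4, ("22*", "Low"): 1.0},
    {("21*", "Middle"): 1.0},
]

s = (("21*", "Middle"), ("22*", "Low"))
count_s = fuzzy_count(s, fuzzy_transactions)   # min values: 0.6, 0.4, 0.0

# Hypothetical minimum supports of the two items; the itemset threshold is their maximum.
min_supports = {"21*": 0.8, "22*": 0.9}
is_large = count_s >= max(min_supports[code] for code, _ in s)
print(count_s, is_large)
```

The third transaction contributes nothing because it has no membership in (`22*`, Low), so the count is the sum of the first two minima.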

Step 10: Repeat the above steps in a similar way to generate all large q-itemsets.
Step 11: Construct the fuzzy association rules for each large q-itemset with the following substeps:
- Form all possible association rules as $s_1 \wedge \dots \wedge s_{r-1} \wedge s_{r+1} \wedge \dots \wedge s_q \rightarrow s_r$, for $r = 1$ to $q$.
- Calculate the confidence values of all association rules [formula not preserved in the transcript].

Step 12: Output the rules with confidence values ≥ the predefined minimum confidence value.

An example

All possible association rules are formed as follows: If 2** = Middle, then 3** = Middle; If 3** = Middle, then 2** = Middle; If 21* = Middle, then 22* = Low; If 22* = Low, then 21* = Middle; If 22* = Low, then 32* = Middle; If 32* = Middle, then 22* = Low.

The confidence values of the above association rules are calculated:
If 2** = Middle, then 3** = Middle, with conf = [value not preserved]
If 3** = Middle, then 2** = Middle, with conf = [value not preserved]
If 21* = Middle, then 22* = Low, with conf = [value not preserved]
If 22* = Low, then 21* = Middle, with conf = 0.46
If 22* = Low, then 32* = Middle, with conf = [value not preserved]
If 32* = Middle, then 22* = Low, with conf = 1.0

Assume the minimum confidence is set at 0.8 in this example. The following three association rules are generated:
If 21* = Middle, then 22* = Low, with conf = [value not preserved]
If 22* = Low, then 32* = Middle, with conf = [value not preserved]
If 32* = Middle, then 22* = Low, with conf = 1.0

Conclusion
- This algorithm offers a solution for three issues that usually occur in real mining applications: using different criteria to judge the importance of different items, managing taxonomic relationships among items, and dealing with quantitative data sets.
- In this algorithm, the minimum support for an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it, and the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset.

THANK YOU !!