Mining Association Rules from Stars

Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02), December 9-12, 2002, Maebashi City, Japan. Department of Information & Computer Education, NTNU. Advisor: Jia-Ling Koh. Speaker: Chen-Yi Lin.

Outline: Introduction; Problem Definition; The Proposed Method; Experimental Results; Conclusions.

Introduction: In real life, a database is typically made up of multiple tables, and one important case is where some of the tables form a star schema, that is, a fact table (FT) linked to a set of dimension tables.

Problem Definition (1/2): Each dimension table contains a primary key (tid) and some other attributes, and has no foreign keys. The attributes in the dimension tables are unique and take categorical values. The fact table (FT) stores the tids from the dimension tables as foreign keys.
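The layout described on this slide can be pictured with a small sketch. The table names, attribute names, and values below are invented for illustration and do not come from the paper; only the shape (dimension tables keyed by tid, a fact table of foreign keys) follows the slide.

```python
# Hypothetical toy star schema; all names and values are made up.
# Each dimension table maps a primary key (tid) to categorical attributes.
dim_a = {
    "a1": {"colour": "red", "size": "small"},
    "a2": {"colour": "blue", "size": "small"},
}
dim_b = {
    "b1": {"region": "north"},
    "b2": {"region": "south"},
}

# Each fact-table row stores one tid per dimension table as foreign keys.
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a1", "b1")]


def foreign_keys_valid(ft, dims):
    """Check that every tid in every FT row exists in its dimension table."""
    return all(tid in dim for row in ft for tid, dim in zip(row, dims))


print(foreign_keys_valid(fact_table, [dim_a, dim_b]))
```

Note that the dimension tables hold no foreign keys themselves; only the fact table refers to other tables.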

Problem Definition (2/2): [Figure: a dimension table (tid, categorical values) and its binary representation.]
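Since the figure itself is not preserved, here is one plausible reading of a "binary representation", sketched under the assumption that every attribute=value pair becomes one item and each tid gets a 0/1 flag per item. The function name `to_binary` and the sample table are hypothetical.

```python
def to_binary(dim):
    """Turn a categorical dimension table into items plus 0/1 rows.

    Assumption: one item per distinct (attribute, value) pair.
    """
    items = sorted({(a, v) for row in dim.values() for a, v in row.items()})
    rows = {
        tid: [int(row.get(a) == v) for a, v in items]
        for tid, row in dim.items()
    }
    return items, rows


# Made-up dimension table for illustration.
dim_a = {
    "a1": {"colour": "red", "size": "small"},
    "a2": {"colour": "blue", "size": "small"},
}
items, rows = to_binary(dim_a)
print(items)
print(rows)
```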

The Proposed Method (1/8): A tid_list is an ordered list of elements of the form tid(count). [Figure: example tid_list.]
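A minimal sketch of how such a tid_list might be built, assuming that the count attached to a tid is the number of fact-table rows that reference that tid (the slide does not spell this out). The helper `build_tid_lists` and the sample data are hypothetical.

```python
from collections import Counter


def build_tid_lists(dim, ft_column):
    """Map each item to an ordered list of (tid, count) pairs.

    Assumption: count = number of FT rows that reference the tid.
    dim maps tid -> set of items; ft_column is the FT column for this table.
    """
    occurrences = Counter(ft_column)
    tid_lists = {}
    for tid in sorted(dim):  # sorted tids keep every list ordered
        if occurrences[tid] == 0:
            continue  # tid never appears in FT, so it adds no support
        for item in dim[tid]:
            tid_lists.setdefault(item, []).append((tid, occurrences[tid]))
    return tid_lists


# Made-up data: three dimension rows and the matching FT column.
dim_a = {1: {"x", "y"}, 2: {"x"}, 3: {"y", "z"}}
ft_a_column = [1, 1, 2, 3, 3, 3]
tid_lists = build_tid_lists(dim_a, ft_a_column)
print(tid_lists["x"])  # tid 1 occurs twice in FT, tid 2 once
```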

The Proposed Method (2/8): [Figure: example with Minsup = 5; the counts shown are 6 and 5.] Hence the itemset is frequent.
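The frequency check can be sketched as follows, assuming that the support of an itemset is the sum of the counts in the intersection of its items' tid_lists. The numbers are chosen to echo Minsup = 5 from the slide but are not taken from its figure, and the function names are hypothetical.

```python
def intersect(list_x, list_y):
    """Merge-intersect two tid_lists that are ordered by tid."""
    out, i, j = [], 0, 0
    while i < len(list_x) and j < len(list_y):
        tid_x, tid_y = list_x[i][0], list_y[j][0]
        if tid_x == tid_y:
            out.append(list_x[i])  # same tid, so the same FT count
            i += 1
            j += 1
        elif tid_x < tid_y:
            i += 1
        else:
            j += 1
    return out


def support(tid_list):
    """Assumed support: total number of FT rows covered by the list."""
    return sum(count for _, count in tid_list)


minsup = 5
x = [(1, 2), (3, 3), (4, 1)]
y = [(1, 2), (3, 3), (5, 2)]
xy = intersect(x, y)
print(support(x), support(xy), support(xy) >= minsup)
```

Because both lists are ordered, the intersection takes a single linear pass.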

The Proposed Method (3/8): Binding multiple dimension tables: (1) assign a new tid to each combination of a tid from A and a tid from B that occurs in FT, and (2) set the tids in the tid_lists of the items in AB to the corresponding new tids.
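The two binding steps might look like the sketch below. It assumes new tids are handed out in order of first appearance in FT and that each new tid carries the number of FT rows with that (A, B) combination; neither detail is stated on the slide, and `bind` and `rebind_tid_list` are hypothetical names.

```python
def bind(fact_rows):
    """Step (1): give each distinct (tid_a, tid_b) pair in FT a new tid."""
    new_tid, counts = {}, {}
    for pair in fact_rows:
        if pair not in new_tid:
            new_tid[pair] = len(new_tid) + 1
        counts[new_tid[pair]] = counts.get(new_tid[pair], 0) + 1
    return new_tid, counts


def rebind_tid_list(tid_list, new_tid, counts, side):
    """Step (2): rewrite a tid_list of A (side=0) or B (side=1) with new tids."""
    old = {tid for tid, _ in tid_list}
    return sorted(
        (n, counts[n]) for pair, n in new_tid.items() if pair[side] in old
    )


# Made-up fact table over A (tids 1, 2) and B (tids 10, 20).
ft = [(1, 10), (1, 20), (2, 10), (1, 10)]
new_tid, counts = bind(ft)
item_x = rebind_tid_list([(1, 3)], new_tid, counts, side=0)   # item from A
item_p = rebind_tid_list([(10, 3)], new_tid, counts, side=1)  # item from B
print(new_tid, item_x, item_p)
```

After binding, items from A and B share one tid space, so their tid_lists can be intersected directly: here `item_x` and `item_p` share only new tid 1, with count 2.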

The Proposed Method (4/8): [Figure: an example of a "binding" order. Labels: the set of frequent itemsets with items from table A; the set of frequent itemsets with items from tables A and/or B.]

The Proposed Method (5/8): [Figure with panels (1) and (2).]

The Proposed Method (6/8): The fact table FT is scanned once, and the information is stored in a prefix tree, in which each node has a label (a tid) and a counter.
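A rough sketch of such a structure, assuming each FT row is inserted as a path of tids (one level per dimension table) and each node's counter records how many rows pass through it. The class and function names are hypothetical, and the paper's actual tree may differ in detail.

```python
class Node:
    """Prefix-tree node: a label (a tid) and a counter."""

    def __init__(self, tid=None):
        self.tid = tid
        self.count = 0
        self.children = {}


def build_prefix_tree(fact_rows):
    """Build the tree in a single scan of the fact table."""
    root = Node()
    for row in fact_rows:
        node = root
        node.count += 1  # the root counts all rows
        for tid in row:
            if tid not in node.children:
                node.children[tid] = Node(tid)
            node = node.children[tid]
            node.count += 1
    return root


# Made-up fact table over two dimension tables.
ft = [(1, 10), (1, 20), (2, 10), (1, 10)]
root = build_prefix_tree(ft)
print(root.children[1].count, root.children[1].children[10].count)
```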

The Proposed Method (7/8): [Figure: prefix tree structure; each node shows a tid and its counter.]

The Proposed Method (8/8): Collapsing the prefix tree.

Experimental Results (1/5): All experiments are conducted on a SUN Ultra-Enterprise (Generic_106541-18) running SunOS 5.7 with 8192 MB of main memory. Programs are written in C++.

Experimental Results (2/5): In the first dataset, items in A and B are strongly related, so that frequent itemsets contain items across A and B, while items in C are not involved. In the second dataset, items in A, B and C are all strongly related, so that maximal frequent itemsets always contain items from all of A, B and C.

Experimental Results (3/5): masl: tid_list implemented as a linked list structure. masb: tid_list implemented as a fixed-size bitmap and an array of counts. fpt: the join-before-mine approach with the FP-tree algorithm [HPY00]. [Figure: running time for the (A, B) related and (A, B, C) related datasets.]
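To make the masb variant concrete, here is a loose sketch of a tid_list stored as a bitmap plus an array of counts. It assumes bit i marks the presence of tid i and that counts are indexed by tid; the paper's C++ layout is not shown in the slides, so the function names and layout are hypothetical. A Python integer stands in for the fixed-size bitmap.

```python
def make_masb(tid_list, num_tids):
    """Store a tid_list as (bitmap, counts); bit i is set if tid i is present."""
    bitmap = 0
    counts = [0] * num_tids
    for tid, count in tid_list:
        bitmap |= 1 << tid
        counts[tid] = count
    return bitmap, counts


def masb_support(a, b):
    """Support of the intersection: AND the bitmaps, then sum matching counts."""
    common = a[0] & b[0]
    counts = a[1]  # counts for shared tids agree in both lists
    total, tid = 0, 0
    while common:
        if common & 1:
            total += counts[tid]
        common >>= 1
        tid += 1
    return total


x = make_masb([(1, 2), (3, 3), (4, 1)], num_tids=8)
y = make_masb([(1, 2), (3, 3), (5, 2)], num_tids=8)
print(masb_support(x, y))
```

With this layout, intersecting two lists is a single bitwise AND, whereas a linked list (masl) needs a merge pass over both lists.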

Experimental Results (4/5): Mixture datasets: 10% of transactions contain frequent itemsets from only A, 10% from only B, and 10% from only C; 15% each contain frequent itemsets from AB, BC, and AC; 10% contain frequent itemsets from ABC; and 15% are random noise.

Experimental Results (5/5): [Figure: running time for the mixture datasets.]

Conclusions: The paper proposes a new algorithm for mining association rules on a star schema without performing the natural join. The proposed method can be generalized to a snowflake structure.