Farzaneh Mirzazadeh Fall 2007

Sampling Large Databases for Association Rules (Toivonen’s Approach, 1996). Farzaneh Mirzazadeh, Fall 2007

Outline
- Introduction
- Preliminaries: definitions and problem statement
- Two general approaches
- Sampling method for mining association rules
- The algorithm
- Analysis
- Experimental results

Introduction
- Problem: discovery of association rules
- Domain: very large databases
- Bottleneck: time
  - Main-memory processing: negligible
  - Disk I/O: an influential factor
- Suggestion: minimize the number of scans of the database
  - Only one full pass over the database

Introduction (cont'd): Overview of Toivonen’s Method
- Main steps:
  1. Pick a random sample from the database.
  2. Use the sample to determine all probable association rules.
  3. Verify the results against the rest of the database, i.e. eliminate incorrectly detected association rules and add missing ones.
- Main contribution: to show that all exact frequencies can be found efficiently by first analyzing a random sample and then the whole database with the proposed method.

Preliminaries
- Items: I = {I1, I2, …, Im}
- Transactions: r = {t1, t2, …, tn}, with each tj ⊆ I
- Support of an itemset: the percentage of transactions that contain that itemset
- Frequent itemsets
- Association rules
- Strong association rules

Preliminaries
- Association rule: an implication X ⇒ Y, where X, Y ⊆ I and X ∩ Y = Ø
- Support of the rule X ⇒ Y: the percentage of transactions that contain X ∪ Y
- Confidence of the rule X ⇒ Y: the ratio of the number of transactions that contain X ∪ Y to the number that contain X
- Problem: find the strong association rules over a given item set I, i.e. the rules whose support is at least the threshold min_fr and whose confidence is at least min_conf.
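
The two measures defined above can be sketched in a few lines of Python. This is only an illustration: the function names and the four-transaction basket data are hypothetical and do not come from the slides.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y: count(X u Y) / count(X).

    Assumes X occurs in at least one transaction (otherwise the
    ratio is undefined and this raises ZeroDivisionError).
    """
    both = frozenset(x) | frozenset(y)
    return count(both, transactions) / count(x, transactions)


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

With min_fr = 0.5 and min_conf = 0.6, the rule bread ⇒ milk would count as strong on this toy data, since two of the four transactions contain both items and two of the three bread transactions also contain milk.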

Algorithms for Mining Association Rules
- Level-wise algorithms
  - Idea: if a set is not frequent, then its supersets cannot be frequent.
  - On level k, candidate itemsets X of size k are generated such that all proper subsets of X are frequent.
- Partition algorithm
  - Idea: partition the data into sections small enough to be handled in main memory.
  - First pass: find the locally frequent itemsets of each partition.
  - Second pass: take the union of the locally frequent itemsets as candidates and count them over the whole database.
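
As a rough illustration of the level-wise idea, the sketch below generates size-k candidates only from frequent sets of size k-1 and prunes any candidate that has an infrequent subset. It is a minimal in-memory version written for readability, not an optimized Apriori implementation; the function name and the toy data are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_fr):
    """Level-wise search: return all itemsets with frequency >= min_fr."""
    n = len(transactions)

    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if freq(frozenset([i])) >= min_fr}
    result = set(level)
    k = 1
    while level:
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count step: one scan of the data per level.
        level = {c for c in candidates if freq(c) >= min_fr}
        result |= level
    return result


# Hypothetical toy data, for illustration only.
toy = [frozenset(t) for t in ("ABC", "AB", "AC", "BC", "ABC")]
found = frequent_itemsets(toy, min_fr=0.5)
print(sorted("".join(sorted(s)) for s in found))
```

Each level costs one scan of the data, which is exactly the cost the sampling method tries to avoid on very large databases.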

Sampling for Frequent Sets
Major steps:
1. Draw a random sample.
2. Find the frequent itemsets of the sample.
3. Find other probable candidates using the concept of the negative border.
4. Use the rest of the database to check the candidates.

Negative Border
- All sets that are not in our collection of frequent itemsets, but all of whose proper subsets are.
- Equivalently: the minimal itemsets not in S, where S is the collection of frequent itemsets.
- Example, over the items A to F:
  S = {{A}, {B}, {C}, {F}, {A,B}, {A,C}, {A,F}, {C,F}, {A,C,F}}
  Negative border of S = {{B,C}, {B,F}, {D}, {E}}
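
The example above can be checked mechanically. The sketch below computes the negative border of a collection S over a given set of items; it assumes S is closed under subsets (as any collection of frequent itemsets is), so only the subsets one item smaller need to be tested. The function name is an assumption made for this example.

```python
from itertools import combinations


def negative_border(S, items):
    """Minimal itemsets over `items` that are not in S.

    Assumes S is downward closed: every non-empty subset of a member
    of S is also in S. The empty set is treated as always frequent.
    """
    S = {frozenset(s) for s in S}
    border = set()
    # A singleton outside S is minimal: its only proper subset is empty.
    for i in items:
        if frozenset([i]) not in S:
            border.add(frozenset([i]))
    # Any larger border set is a member of S extended by one item.
    for s in S:
        for i in items:
            if i in s:
                continue
            c = s | {i}
            if c in S:
                continue
            if all(frozenset(sub) in S for sub in combinations(c, len(c) - 1)):
                border.add(c)
    return border


S = ["A", "B", "C", "F", "AB", "AC", "AF", "CF", "ACF"]
border = negative_border(S, "ABCDEF")
print(sorted("".join(sorted(b)) for b in border))
```

For instance, {A,B,C} is not in the border even though it is missing from S, because its subset {B,C} is already missing.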

Frequent Set Discovery
- Intuition: given a collection S of sets that are frequent, the negative border contains the closest itemsets that could be frequent too.
- After finding the collection S of frequent itemsets in the sample, we check the negative border of S against the database:
  - Success: no itemset in the negative border turns out to be frequent. We can conclude that all frequent sets have already been found. (Why?)
  - Failure: at least one itemset in the negative border is frequent. We can conclude that some of its supersets may be frequent too. (Why?)
- Lowering the minimum support used on the sample increases the chance of success.
- In the case of failure, we can either report failure and stop, or scan the database again and check the supersets to find the exact result.

Toivonen’s Algorithm
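
The steps described on the preceding slides can be put together roughly as follows. This is a sketch under stated assumptions, not a reproduction of the algorithm as printed in the paper: the function names, the `low_fr` parameter (the lowered threshold applied to the sample), and the in-memory list standing in for the database are all choices made for this example. It relies on the fact that the candidates a level-wise search evaluates and rejects are exactly the negative border.

```python
import random
from itertools import combinations


def levelwise(rows, items, threshold):
    """Return (frequent sets, negative border) of `rows` at `threshold`."""
    n = len(rows)
    frequent, border = set(), set()
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        level = {c for c in candidates
                 if sum(1 for t in rows if c <= t) / n >= threshold}
        border |= candidates - level  # evaluated but not frequent
        frequent |= level
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent, border


def sampling_pass(database, items, min_fr, low_fr, sample_size, rng):
    """One sample plus one full pass; returns (frequent sets, failure flag)."""
    sample = rng.sample(database, sample_size)
    S, border = levelwise(sample, items, low_fr)
    # Single pass over the database: update every candidate's counter.
    counts = {c: 0 for c in S | border}
    for t in database:
        for c in counts:
            if c <= t:
                counts[c] += 1
    found = {c for c, n in counts.items() if n / len(database) >= min_fr}
    # Failure: a border set is frequent, so frequent supersets may be missing.
    failure = any(c in found for c in border)
    return found, failure


# Hypothetical toy data; the sample here is the whole database, so the
# result is exact and no failure can occur.
db = [frozenset(t) for t in ("ABC", "AB", "AC", "BC", "ABC")]
found, failure = sampling_pass(db, "ABC", 0.5, 0.5, len(db), random.Random(0))
print(len(found), failure)
```

When `failure` is true, the sets in `found` are genuinely frequent, but the collection may be incomplete and a further pass is needed.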

Failure Handling In the fraction of cases where a possible failure is reported, all frequent sets can be found by making a second pass over the database: The algorithm simply computes the collection of all sets that could possibly be frequent.

Analysis of Sampling Sample Size and Probability of Failure
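
The figures from this slide are not preserved in the transcript. As a hedged illustration of how sample size relates to estimation error, the sketch below uses the standard Hoeffding (Chernoff-type) bound for a single itemset: a random sample of size at least ln(2/δ) / (2ε²) keeps the error of the estimated frequency within ε with probability at least 1 − δ. The function name and the parameter values are assumptions chosen for this example.

```python
import math


def sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta (Hoeffding bound).

    eps:   tolerated absolute error of one itemset's sample frequency
    delta: tolerated probability of exceeding that error
    """
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


for eps, delta in [(0.01, 0.01), (0.01, 0.001), (0.001, 0.01)]:
    print(eps, delta, sample_size(eps, delta))
```

The bound does not depend on the size of the database, and it grows with 1/ε² but only logarithmically with 1/δ, so tightening the error tolerance is far more expensive than tightening the failure probability.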

Experimental Results

Conclusion
- Advantages: reduced failure probability, while keeping the candidate count low enough for memory
- Disadvantages: potentially large number of candidates in the second pass

References [1] H. Toivonen, Sampling Large Databases for Association Rules, Proc. of VLDB Conference, India, 1996.

Questions?

Thank you