Identifying Interesting Association Rules with Genetic Algorithms

Presentation transcript:

Identifying Interesting Association Rules with Genetic Algorithms Elnaz Delpisheh York University Department of Computer Science and Engineering April-10-17

Data mining
Data (too much data) → Data mining → Association rules
I = {i1, i2, ..., in} is a set of items.
D = {t1, t2, ..., tn} is a transactional database; each ti is a nonempty subset of I.
An association rule is of the form A→B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A∩B=∅. Example: {milk, eggs}→{bread}.
The Apriori algorithm is the one most commonly used for association rule mining. Other algorithms exist, such as FP-growth.

Apriori Algorithm
TID     List of item IDs
T100    I1, I2, I3
T200    I2, I4
T300    I1, I2, I3, I5
T900    I1, I2, I3
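As a rough illustration of how Apriori works on the table above, here is a minimal Python sketch of level-wise frequent itemset mining. It uses only the four transactions that survive in this transcript. The function name `apriori` and the threshold `min_count=2` are assumptions made for the example and do not come from the slides.

```python
from itertools import combinations

# The four transactions that appear in the transcript.
transactions = {
    "T100": {"I1", "I2", "I3"},
    "T200": {"I2", "I4"},
    "T300": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    rows = list(transactions.values())
    candidates = [frozenset([i]) for i in sorted(set().union(*rows))]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate k-itemset.
        counts = {c: sum(1 for r in rows if c <= r) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# min_count=2 is a hypothetical threshold chosen for this example.
frequent = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step relies on the Apriori property: an itemset can only be frequent if every one of its subsets is frequent, which is what keeps the candidate set small.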

Apriori Algorithm (Cont.)

Association rule mining
Data (too much data) → Data mining → Association rules (too many association rules)

Interestingness criteria Comprehensibility. Conciseness. Diversity. Generality. Novelty. Utility. ...

Interestingness measures
Subjective measures: the data and the user's prior knowledge are considered. Comprehensibility, novelty, surprisingness, utility.
Objective measures: the structure of an association rule is considered. Conciseness, diversity, generality, peculiarity.
Example: support. It represents the generality of a rule A→B. It counts the number of transactions containing both A and B.
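The support measure described above can be sketched in a few lines of Python. The `baskets` list is hypothetical data invented for this example (only the rule {milk, eggs}→{bread} appears in the slides), and the function names are illustrative.

```python
def support_count(transactions, antecedent, consequent):
    """Number of transactions that contain every item of both A and B."""
    both = set(antecedent) | set(consequent)
    return sum(1 for t in transactions if both <= set(t))

def support(transactions, antecedent, consequent):
    """Support as a fraction of all transactions."""
    return support_count(transactions, antecedent, consequent) / len(transactions)

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"milk", "eggs", "bread", "butter"},
    {"eggs"},
]

count = support_count(baskets, {"milk", "eggs"}, {"bread"})
fraction = support(baskets, {"milk", "eggs"}, {"bread"})
print(count, fraction)
```

On this toy data, two of the four baskets contain milk, eggs and bread together, so the count is 2 and the relative support is 0.5.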

Drawbacks of objective measures
Database dependence: lack of knowledge about the database; threshold dependence.
Solution: multiple database reanalysis.
Problem: a large number of disk I/O operations. Some databases are simply too large for repeated reanalysis, and association rule mining must confront exponential search spaces.
Database independence: this approach does not require users to specify thresholds. Instead of generating an unknown number of interesting rules, as traditional models do, it extracts only the most interesting rules according to the interestingness measure defined by the fitness function.

Genetic algorithm-based learning (ARMGA)
Initialize the population.
Evaluate the individuals in the population.
Repeat until a stopping criterion is met:
    Select individuals from the current population.
    Recombine them to obtain more individuals.
    Evaluate the new individuals.
    Replace some or all of the individuals of the current population with the offspring.
Return the best individual seen so far.
Genetic algorithms for rule mining are usually divided into two groups according to how they encode rules in the population of chromosomes:
-Michigan approach. Many people have used this approach. However, if the number of rules is too large, this approach is impractical.
-Pittsburgh approach.
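The generic loop above can be sketched as follows. This is a plain genetic algorithm over bit strings, not the ARMGA implementation: the tournament selection, one-point crossover, bit-flip mutation, the parameter defaults and the toy fitness function (`sum`, which counts the 1 bits) are all assumptions chosen to keep the example short.

```python
import random

def genetic_search(fitness, n_bits, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Generic GA over bit lists; returns the best individual seen so far."""
    rng = random.Random(seed)
    # Initialize the population with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Stopping criterion (assumed): a fixed number of generations.
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Select: binary tournament, the fitter of two random individuals wins.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Recombine: one-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutate: flip each bit with probability p_mut (an extra step, not in the list above).
            child = [1 - b if rng.random() < p_mut else b for b in child]
            offspring.append(child)
        # Replace the whole population with the offspring.
        population = offspring
        # Evaluate the new individuals and remember the best one seen so far.
        best = max(population + [best], key=fitness)
    return best

# Toy fitness: number of 1 bits. A rule miner would plug in an interestingness measure.
best = genetic_search(sum, n_bits=16)
print(best, sum(best))
```

Because the best individual is tracked separately from the population, replacing the whole population each generation never loses the best solution found.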

ARMGA Modeling
Given an association rule X→Y.
Requirement: Conf(X→Y) > Supp(Y).
The aim is to find rules whose confidence Conf(X→Y) exceeds Supp(Y), since we are only interested in positive rules.
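A small sketch of the requirement Conf(X→Y) > Supp(Y), using the standard definitions Supp(Z) = fraction of transactions containing Z and Conf(X→Y) = Supp(X∪Y) / Supp(X). The `baskets` data and the function names are hypothetical and only serve the illustration.

```python
def supp(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def conf(transactions, x, y):
    """Confidence of X -> Y: Supp(X u Y) / Supp(X); 0.0 if X never occurs."""
    sx = supp(transactions, x)
    return supp(transactions, set(x) | set(y)) / sx if sx > 0 else 0.0

def is_positive_rule(transactions, x, y):
    """True when Conf(X -> Y) > Supp(Y), the requirement stated above."""
    return conf(transactions, x, y) > supp(transactions, y)

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"milk", "eggs", "bread", "butter"},
    {"eggs"},
]

print(is_positive_rule(baskets, {"milk", "eggs"}, {"bread"}))  # conf 1.0 vs supp 0.75
print(is_positive_rule(baskets, {"bread"}, {"eggs"}))          # conf 2/3 vs supp 0.75
```

Intuitively, the rule is positive when knowing that X is in a transaction makes Y more likely than it is overall.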

ARMGA Encoding
Michigan strategy: given an association k-rule X→Y, where X, Y ⊂ I, I = {i1, i2, ..., in} is a set of items, and X∩Y=∅. For example, {A1,...,Aj}→{Aj+1,...,Ak}.
Michigan approach: each rule is encoded into an individual.
Pittsburgh approach: a set of rules is encoded into a chromosome.

ARMGA Encoding (Cont.)
The aforementioned encoding depends heavily on the length of the chromosome. We use another type of encoding:
Given a set of items {A,B,C,D,E,F}, the association rule ACF→BE is encoded as 00A 11B 00C 01D 11E 00F.
00: the item is in the antecedent
11: the item is in the consequent
01/10: the item is absent
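The two-bit encoding can be checked mechanically. The sketch below assumes single-letter item names and writes absent items as 01; the function names `encode` and `decode` are illustrative. Decoding the string shown on the slide, 00A11B00C01D11E00F, under the stated code table gives antecedent {A, C, F} and consequent {B, E}.

```python
import re

def decode(chromosome):
    """Split a string such as '00A11B...' into (antecedent, consequent) item lists."""
    antecedent, consequent = [], []
    for code, item in re.findall(r"([01]{2})([A-Z])", chromosome):
        if code == "00":
            antecedent.append(item)
        elif code == "11":
            consequent.append(item)
        # '01' and '10' mean the item is absent from the rule.
    return antecedent, consequent

def encode(items, antecedent, consequent):
    """Encode a rule over `items`; absent items are written as '01' (a free choice)."""
    parts = []
    for item in items:
        if item in antecedent:
            parts.append("00" + item)
        elif item in consequent:
            parts.append("11" + item)
        else:
            parts.append("01" + item)
    return "".join(parts)

antecedent, consequent = decode("00A11B00C01D11E00F")
print("".join(antecedent), "->", "".join(consequent))
print(encode("ABCDEF", "ACF", "BE"))
```

Every chromosome has the same length regardless of how many items the rule uses, which is the point of giving each item its own two-bit slot.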

ARMGA Operators Select Crossover Mutation

ARMGA Operators-Select
select(c, ps): acts as a filter on the chromosome.
c: chromosome
ps: pre-specified probability
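The slide gives only the signature of select, so the following is a guess at its simplest possible form: a filter that lets a chromosome through with probability ps. Whether the actual ARMGA operator also takes the chromosome's fitness into account is not stated here; the body below and the helper `filter_population` are assumptions.

```python
import random

def select(c, ps, rng=random):
    """Assumed behaviour: return True with probability ps, ignoring the content of c."""
    return rng.random() < ps

def filter_population(population, ps, rng=random):
    """Keep only the chromosomes that pass the select filter."""
    return [c for c in population if select(c, ps, rng)]

# Hypothetical chromosomes in the two-bit encoding, invented for illustration.
population = ["00A11B01C", "11A00B00C", "01A00B11C"]
survivors = filter_population(population, ps=0.5, rng=random.Random(0))
print(survivors)
```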

ARMGA Operators-Crossover This operation uses a two-point strategy
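A minimal sketch of generic two-point crossover, assuming chromosomes are equal-length lists of two-bit codes. The parent values below are invented, and how ARMGA chooses the two cut points is not given in the transcript, so uniform random cut points are an assumption.

```python
import random

def two_point_crossover(a, b, rng=random):
    """Swap the segment between two random cut points of equal-length parents."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    # Two distinct cut points i < j, so the swapped segment is never empty.
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    child1 = a[:i] + b[i:j] + a[j:]
    child2 = b[:i] + a[i:j] + b[j:]
    return child1, child2

# Hypothetical parents: one two-bit code per item A..F.
parent_a = ["00", "11", "00", "01", "11", "00"]
parent_b = ["11", "00", "01", "00", "10", "11"]
child1, child2 = two_point_crossover(parent_a, parent_b, random.Random(1))
print(child1)
print(child2)
```

Each position in the children holds the code from one parent or the other, so the operator recombines existing item roles without inventing new codes.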

ARMGA Operators-Mutate

ARMGA Initialization

ARMGA Algorithm

Empirical studies and evaluation
Implement the entire procedure using Visual C++.
Use WEKA to produce interesting association rules.
Compare the results.