Discriminative Pattern Mining


Discriminative Pattern Mining By Mohammad Hossain

Based on the paper "Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data" by Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, and Vipin Kumar.

What is a Discriminative Pattern? A pattern is said to be discriminative when its occurrence in two data sets (or in two different classes of a single data set) differs significantly. One way to measure the discriminative power of a pattern is to take the difference between the supports of the pattern in the two data sets. When this support difference (DiffSup) is greater than a threshold, the pattern is called discriminative.

An example

Data set D+:
  Transaction-id  Items
  10              A, C
  20              B, C
  30              A, B, C
  40              A, B, C, D

Data set D−:
  Transaction-id  Items
  10              A, B
  20              A, C
  30              A, B, E
  40              A, C, D

  Pattern  Support in D+  Support in D−  DiffSup
  A        3              4              1
  B        3              2              1
  C        4              2              2
  AB       2              2              0
  AC       3              2              1
  ABC      2              0              2

If we set the DiffSup threshold to 2, then the patterns C and ABC become interesting patterns.
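The support table above can be reproduced with a short script. This is a minimal sketch; the names `D_plus`, `D_minus`, and the helper `support` are mine, not from the paper.

```python
# Hypothetical names (D_plus, D_minus, support) chosen for illustration;
# the transactions are the ones from the slide's example.
D_plus  = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
D_minus = [{"A", "B"}, {"A", "C"}, {"A", "B", "E"}, {"A", "C", "D"}]

def support(pattern, dataset):
    """Count the transactions that contain every item of the pattern."""
    return sum(1 for t in dataset if pattern <= t)

# Rebuild the DiffSup table (absolute support difference, as on the slide).
for p in ({"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}):
    s1, s2 = support(p, D_plus), support(p, D_minus)
    print("".join(sorted(p)), s1, s2, abs(s1 - s2))
```

With a threshold of 2, only C (4 vs. 2) and ABC (2 vs. 0) come out as discriminative, matching the slide.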

Importance. Discriminative patterns have been shown to be useful for improving classification performance on data sets where combinations of features have better discriminative power than the individual features. For example, in biomarker discovery from case-control data (e.g., disease vs. normal samples), it is important to identify groups of biological entities, such as genes and single-nucleotide polymorphisms (SNPs), that are collectively associated with a certain disease or other phenotype.

DiffSup is NOT Anti-monotonic

Consider four patterns over disjoint sets of items: P1 = {i1, i2, i3}, P2 = {i5, i6, i7}, P3 = {i9, i10}, P4 = {i12, i13, i14}. The slide tabulates the supports of P1 through P4 in the two classes C1 and C2 and the resulting DiffSup values; for instance, P2 occurs in 6 instances of each class (DiffSup = 0), while P4 occurs in 9 instances of C1 and 2 of C2 (DiffSup = 7).

DiffSup is not anti-monotonic: a superset can have a larger DiffSup than its subsets. As a result, it will not work in an Apriori-like framework.

Apriori: A Candidate Generation-and-Test Approach
Apriori pruning principle: if an itemset is infrequent, its supersets should not be generated or tested.
Method:
- Initially, scan the DB once to get the frequent 1-itemsets
- Generate length-(k+1) candidate itemsets from the length-k frequent itemsets
- Test the candidates against the DB
- Terminate when no frequent or candidate set can be generated

The Apriori Algorithm: An Example (Supmin = 2)

Database TDB:
  Tid  Items
  10   A, C, D
  20   B, C, E
  30   A, B, C, E
  40   B, E

1st scan, C1: {A}:2  {B}:3  {C}:3  {D}:1  {E}:3
L1 (after pruning {D}): {A}:2  {B}:3  {C}:3  {E}:3

C2: {A,B}  {A,C}  {A,E}  {B,C}  {B,E}  {C,E}
2nd scan, supports: {A,B}:1  {A,C}:2  {A,E}:1  {B,C}:2  {B,E}:3  {C,E}:2
L2: {A,C}:2  {B,C}:2  {B,E}:3  {C,E}:2

C3: {B,C,E}
3rd scan, L3: {B,C,E}:2
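The steps above can be sketched as a small level-wise Apriori routine. This is a minimal sketch, not the paper's implementation; the function and variable names are mine, and it runs on the slide's TDB with Supmin = 2.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori: returns {itemset: support} for all frequent itemsets."""
    def support(c):
        return sum(1 for t in transactions if c <= t)

    # Frequent 1-itemsets (first DB scan).
    items = sorted({i for t in transactions for i in t})
    Lk = {}
    for i in items:
        c = frozenset([i])
        s = support(c)
        if s >= min_sup:
            Lk[c] = s
    frequent = dict(Lk)

    while Lk:
        prev = list(Lk)
        k = len(prev[0]) + 1
        # Join step: unions of frequent (k-1)-itemsets that give k-itemsets.
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): drop candidates with an infrequent subset.
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = {}
        for c in cands:
            s = support(c)
            if s >= min_sup:
                Lk[c] = s
        frequent.update(Lk)
    return frequent

# The slide's database TDB with min_sup = 2.
TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(TDB, 2)
print(freq[frozenset({"B", "C", "E"})])  # support of {B, C, E} is 2
```

Candidates such as {A,B,C} never reach the counting step here, because their subset {A,B} is already infrequent.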

  Pattern  Support in D+  Support in D−  DiffSup
  A        3              4              1
  B        3              2              1
  C        4              2              2
  AB       2              2              0
  AC       3              2              1
  ABC      2              0              2

But here we see that although the patterns AB and AC both have DiffSup below the threshold (2), their superset ABC has DiffSup = 2, which meets the threshold and thus becomes interesting. So AB and AC cannot be pruned.

BASIC TERMINOLOGY AND PROBLEM DEFINITION. Let D be a dataset with a set of m items, I = {i1, i2, ..., im}, and two class labels S1 and S2. The instances of classes S1 and S2 are denoted by D1 and D2, with |D| = |D1| + |D2|. For a pattern (itemset) α = {α1, α2, ..., αl}, the sets of instances in D1 and D2 that contain α are denoted by Dα1 and Dα2. The relative supports of α in classes S1 and S2 are RelSup1(α) = |Dα1|/|D1| and RelSup2(α) = |Dα2|/|D2|. The absolute difference of the relative supports of α in D1 and D2 is DiffSup(α) = |RelSup1(α) − RelSup2(α)|.
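These definitions translate directly into code. A minimal sketch follows, with RelSup and DiffSup computed on the earlier toy example; the names `rel_sup`, `diff_sup`, `D1`, and `D2` are mine.

```python
def rel_sup(pattern, D):
    """Relative support: fraction of instances of D that contain the pattern."""
    return sum(1 for t in D if pattern <= t) / len(D)

def diff_sup(pattern, D1, D2):
    """DiffSup(alpha) = |RelSup1(alpha) - RelSup2(alpha)|."""
    return abs(rel_sup(pattern, D1) - rel_sup(pattern, D2))

# The toy classes from the earlier example (D+ and D-).
D1 = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
D2 = [{"A", "B"}, {"A", "C"}, {"A", "B", "E"}, {"A", "C", "D"}]
print(diff_sup({"A", "B", "C"}, D1, D2))  # 0.5, i.e. 2/4 - 0/4
```

Note that with both classes of size 4, the slide's absolute-count threshold of 2 corresponds to a relative DiffSup threshold of 0.5.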

New functions. Some new functions have been proposed that have the anti-monotone property and can therefore be used for pruning in an Apriori-like framework. One of them is BiggerSup, defined as:
BiggerSup(α) = max(RelSup1(α), RelSup2(α)).
BiggerSup is anti-monotonic and is an upper bound of DiffSup, so it can be used for pruning in an Apriori-like framework.
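The upper-bound property is easy to check numerically: DiffSup never exceeds BiggerSup, so a pattern whose BiggerSup falls below the threshold can be safely pruned. A minimal sketch, reusing the toy classes from the earlier example (the function names are mine):

```python
def rel_sup(pattern, D):
    return sum(1 for t in D if pattern <= t) / len(D)

def bigger_sup(pattern, D1, D2):
    """BiggerSup(alpha) = max(RelSup1(alpha), RelSup2(alpha)):
    an anti-monotone upper bound of DiffSup, safe for Apriori-style pruning."""
    return max(rel_sup(pattern, D1), rel_sup(pattern, D2))

def diff_sup(pattern, D1, D2):
    return abs(rel_sup(pattern, D1) - rel_sup(pattern, D2))

D1 = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
D2 = [{"A", "B"}, {"A", "C"}, {"A", "B", "E"}, {"A", "C", "D"}]
# DiffSup <= BiggerSup holds for every pattern, since
# |x - y| <= max(x, y) for non-negative x, y.
for p in ({"C"}, {"A", "B"}, {"A", "B", "C"}):
    assert diff_sup(p, D1, D2) <= bigger_sup(p, D1, D2)
```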

However, BiggerSup is a weak upper bound of DiffSup. For instance, in the previous example, if we use it to find discriminative patterns with threshold 4, P3 can be pruned because it has a BiggerSup of 3, but P2 cannot be pruned (BiggerSup(P2) = 6) even though it is not discriminative (DiffSup(P2) = 0). More generally, BiggerSup-based pruning can only prune non-discriminative patterns with relatively low support, not frequent non-discriminative patterns.

A new measure: SupMaxK. The SupMaxK of an itemset α in D1 and D2 is defined as
SupMaxK(α) = RelSup1(α) − max{RelSup2(β) : β ⊆ α, |β| = K}.
If K = 1 it is called SupMax1, defined as SupMax1(α) = RelSup1(α) − max{RelSup2({a}) : a ∈ α}. Similarly, with K = 2 we get SupMax2, which is also called SupMaxPair.
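A direct implementation of the definition, parameterized by K, can look as follows. This is a sketch under my own naming; the handling of patterns smaller than K is my choice, not specified on the slide.

```python
from itertools import combinations

def rel_sup(pattern, D):
    return sum(1 for t in D if pattern <= t) / len(D)

def sup_max_k(alpha, D1, D2, k):
    """SupMaxK(alpha) = RelSup1(alpha) - max over size-k subsets beta of alpha
    of RelSup2(beta). k=1 gives SupMax1; k=2 gives SupMax2 (SupMaxPair)."""
    alpha = set(alpha)
    if len(alpha) < k:  # degenerate case; fall back to plain support difference
        return rel_sup(alpha, D1) - rel_sup(alpha, D2)
    return rel_sup(alpha, D1) - max(
        rel_sup(set(beta), D2) for beta in combinations(alpha, k))

# Toy classes from the earlier example.
D1 = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
D2 = [{"A", "B"}, {"A", "C"}, {"A", "B", "E"}, {"A", "C", "D"}]
print(sup_max_k({"A", "B", "C"}, D1, D2, 1))  # 0.5 - 1.0 = -0.5 (A is in all of D2)
print(sup_max_k({"A", "B", "C"}, D1, D2, 2))  # 0.5 - 0.5 = 0.0
```

Larger K subtracts the support of a larger subset of α in D2, so the bound can only move up (toward DiffSup) as K grows.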

Properties of the SupMaxK Family

Relationship between DiffSup, BiggerSup and the SupMaxK Family

SupMaxPair: A Special Member Suitable for High-Dimensional Data. In SupMaxK, as K increases we obtain a more complete set of discriminative patterns, but the complexity of computing SupMaxK also increases; in fact it is O(m^K). So for high-dimensional data (where m is large), a high value of K (K > 2) is infeasible. In that case SupMaxPair can be used.
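The O(m^2) cost shows up concretely as a table of pair supports in D2 that can be built once and reused for every candidate pattern. A minimal sketch of this idea (the names and the precomputation strategy are mine; it assumes patterns of size at least 2):

```python
from itertools import combinations

def pair_support_table(D2):
    """Precompute RelSup2 for all item pairs once: O(m^2) entries at most,
    which is why K is capped at 2 for high-dimensional data."""
    counts = {}
    for t in D2:
        for pair in combinations(sorted(t), 2):
            key = frozenset(pair)
            counts[key] = counts.get(key, 0) + 1
    n = len(D2)
    return {p: c / n for p, c in counts.items()}

def sup_max_pair(alpha, D1, pair_table):
    """SupMaxPair via the precomputed table; alpha must have >= 2 items."""
    rel1 = sum(1 for t in D1 if alpha <= t) / len(D1)
    worst = max(pair_table.get(frozenset(p), 0.0)
                for p in combinations(sorted(alpha), 2))
    return rel1 - worst

# Toy classes from the earlier example.
D1 = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "C", "D"}]
D2 = [{"A", "B"}, {"A", "C"}, {"A", "B", "E"}, {"A", "C", "D"}]
tbl = pair_support_table(D2)
print(sup_max_pair({"A", "B", "C"}, D1, tbl))  # 0.5 - 0.5 = 0.0
```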