Slides for KDD07 Mining statistically important equivalence classes and delta-discriminative emerging patterns Jinyan Li School of Computer Engineering.

Presentation transcript:

Slides for KDD07: Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns
Jinyan Li, School of Computer Engineering, Nanyang Technological University, Singapore
Joint work with Guimei Liu and Limsoon Wong
13 August 2007

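The research problem
Input data: an m × n matrix with one row per sample and one column per feature:
$$\begin{pmatrix} x_{11} & x_{12} & x_{13} & x_{14} & \cdots & x_{1n} \\ \vdots & & & & & \vdots \\ x_{m1} & x_{m2} & x_{m3} & x_{m4} & \cdots & x_{mn} \end{pmatrix}$$
Columns: n features (on the order of 1000), e.g. gene1, gene2, gene3, gene4, …, gene_n.
Rows: m samples, each labelled with a class, P or N.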

Objectives
To discover which itemsets are statistically important for separating the classes, and which itemsets are redundant (giving a concise representation).
Test statistics: odds ratio, relative risk, Student's t, chi-square, etc.
Output: a ranked list of equivalence classes under a chosen statistical test, each given as <generators, closed pattern>.
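Every itemset in an equivalence class occurs in exactly the same transactions, so all of them have the same support in each class and therefore the same value under any of these statistics; this is why the output ranks classes rather than individual itemsets. The sketch below computes three of the listed statistics from a pattern's supports in P and N. It assumes the standard 2×2 contingency-table formulas (relative risk taken as P(class P | pattern) / P(class P | no pattern)); the function names are ours, the paper's exact definitions may differ, and Student's t is omitted.

```python
import math


def _ratio(num: float, den: float) -> float:
    """num / den, returning +inf for x/0 with x > 0 and NaN for 0/0."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def table(sup_p: int, sup_n: int, n_p: int, n_n: int):
    """2x2 contingency table (a, b, c, d) for one pattern.

    a: P samples containing the pattern   b: N samples containing it
    c: P samples without it              d: N samples without it
    """
    return sup_p, sup_n, n_p - sup_p, n_n - sup_n


def odds_ratio(sup_p, sup_n, n_p, n_n):
    a, b, c, d = table(sup_p, sup_n, n_p, n_n)
    return _ratio(a * d, b * c)


def relative_risk(sup_p, sup_n, n_p, n_n):
    # P(class P | pattern present) / P(class P | pattern absent)
    a, b, c, d = table(sup_p, sup_n, n_p, n_n)
    return _ratio(a * (c + d), (a + b) * c)


def chi_square(sup_p, sup_n, n_p, n_n):
    # Pearson chi-square for a 2x2 table, without continuity correction
    a, b, c, d = table(sup_p, sup_n, n_p, n_n)
    n = a + b + c + d
    return _ratio(n * (a * d - b * c) ** 2, (a + b) * (c + d) * (a + c) * (b + d))
```

For a hypothetical pattern present in 8 of 10 P samples and 2 of 10 N samples, this gives an odds ratio of 16, a relative risk of 4 and a chi-square of 7.2. Because the inputs are only per-class supports, one evaluation per equivalence class suffices.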

New problem
Not an enumeration of frequent itemsets.
Not an enumeration of closed patterns alone.
Not an enumeration of generators alone.
Not simply the union of the closed patterns and the generators.
The output pairs each closed pattern with its generators:
closed pattern, its generators
closed pattern, its generators
closed pattern, its generators
…

Contributions
A depth-first search that finds closed patterns and their associated generators in parallel.
A unified approach that works with any of the test statistics.
Multiple classes of data are handled directly: no one-vs-one decomposition and no all-vs-all decomposition (exhaustive pairwise, as in SVMs).
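The output format and the claim of a unified approach can be illustrated together: keep, for each equivalence class, its closed pattern, its generators and its support in every class, and rank by whatever statistic is passed in. This is a sketch only. The `EquivalenceClass` record, the `rank` helper, the gene items and all support numbers are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class EquivalenceClass:
    closed: FrozenSet[str]                  # the maximal itemset of the class
    generators: Tuple[FrozenSet[str], ...]  # the minimal itemsets of the class
    class_supports: Tuple[int, ...]         # support in each class (any number of classes)


# A statistic only sees per-class supports, which every member of an EC shares.
Statistic = Callable[[Tuple[int, ...]], float]


def rank(ecs: List[EquivalenceClass], statistic: Statistic) -> List[EquivalenceClass]:
    """Return the ECs ordered from highest to lowest statistic value."""
    return sorted(ecs, key=lambda ec: statistic(ec.class_supports), reverse=True)


# Hypothetical ECs with made-up supports in classes (P, N) of 10 samples each.
ECS = [
    EquivalenceClass(frozenset({"gene1", "gene2", "gene3"}),
                     (frozenset({"gene1", "gene2"}),), (6, 1)),
    EquivalenceClass(frozenset({"gene4", "gene5"}),
                     (frozenset({"gene4"}), frozenset({"gene5"})), (3, 5)),
]


def support_difference(s: Tuple[int, ...]) -> float:
    """Relative support in P minus relative support in N (class sizes assumed 10)."""
    return s[0] / 10 - s[1] / 10
```

Swapping `support_difference` for, say, an odds ratio or a multi-class score changes only the function argument; the enumeration and the record layout stay the same.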

A data set

Frequent itemsets (patterns)
Support threshold = 2
A total of 16 frequent patterns
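The data-set table did not survive in the transcript. The five transactions below are a hypothetical reconstruction, chosen so that their frequent itemsets and tid-sets match the equivalence classes listed on the next slide; any infrequent items in the original are not modelled. With a support threshold of 2, a brute-force enumeration recovers 16 frequent patterns (the empty set included). This is a naive sketch for checking the counts, not the paper's depth-first algorithm, and the names `DATA`, `tidset` and `frequent_itemsets` are ours.

```python
from itertools import combinations

# Hypothetical transactions consistent with the equivalence classes on the slides.
DATA = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}


def tidset(itemset, data=DATA):
    """Ids of the transactions that contain every item of `itemset`."""
    return frozenset(t for t, items in data.items() if itemset <= items)


def frequent_itemsets(min_sup=2, data=DATA):
    """Map every itemset with support >= min_sup to its support (brute force)."""
    items = sorted(set().union(*data.values()))
    result = {}
    for k in range(len(items) + 1):  # k = 0 yields the empty set
        for combo in combinations(items, k):
            tids = tidset(frozenset(combo), data)
            if len(tids) >= min_sup:
                result[frozenset(combo)] = len(tids)
    return result
```

Raising the threshold to 3 drops the six itemsets that occur only in T3 and T5, leaving 10.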

Equivalence classes
The empty set ({}:5); its tid-set = {T1, …, T5}
{b:4, e:4, be:4}; its tid-set = {T2, …, T5}
{c:4}; its tid-set = {T1, T2, T3, T5}
{a:3, ac:3}; its tid-set = {T1, T3, T5}
{bc:3, ce:3, bce:3}; its tid-set = {T2, T3, T5}
{ab:2, ae:2, abc:2, abe:2, ace:2, abce:2}; its tid-set = {T3, T5}
An equivalence class (EC) is a set of itemsets that occur in exactly the same set of transactions, i.e. that share one tid-set.
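Grouping frequent itemsets by their tid-set recovers these classes. Since the slide's transaction table was not preserved, the five transactions below are a hypothetical reconstruction chosen to reproduce the classes listed above; the helper name `equivalence_classes` is ours and the enumeration is brute force.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transactions consistent with the equivalence classes on this slide.
DATA = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}


def equivalence_classes(data, min_sup=2):
    """Group frequent itemsets by tid-set; each group is one equivalence class."""
    items = sorted(set().union(*data.values()))
    groups = defaultdict(list)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = frozenset(t for t, row in data.items() if itemset <= row)
            if len(tids) >= min_sup:
                groups[tids].append(itemset)
    return dict(groups)
```

The 16 frequent itemsets fall into 6 classes, keyed by their shared tid-sets.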

Closed patterns and generators
A closed pattern is the maximal pattern of an equivalence class; the minimal ones are called generators.
Support threshold = 2
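Within one equivalence class, the closed pattern is the union of all its members (it contains every member and is itself a member), and the generators are the members that have no proper subset inside the class. A sketch that works on a single class given as a list of itemsets; the helper name is ours. Applied to the class {ab, ae, abc, abe, ace, abce} it returns abce as the closed pattern and ab, ae as the generators.

```python
def closed_and_generators(ec):
    """Return (closed pattern, generators) for one equivalence class.

    `ec` is a complete equivalence class given as a list of frozensets.
    """
    closed = frozenset().union(*ec)  # the unique maximal member
    generators = [x for x in ec if not any(y < x for y in ec)]  # minimal members
    return closed, generators
```

Because every member of a class shares one tid-set, reporting the pair (closed pattern, generators) loses no information about the class.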

An example

Observation 1

Observation 2

Revised FP-tree for pruning non-generators
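The transcript does not preserve how the FP-tree is revised, but the pruning it enables rests on two standard facts about generators: an itemset is a generator exactly when each of its immediate subsets has strictly larger support, and every subset of a generator is again a generator, so once an itemset fails the test none of its supersets needs to be examined. The sketch shows only that test over a support table (the function name and dict representation are assumptions), not the FP-tree procedure itself; the supports are those listed on the equivalence-class slide.

```python
def is_generator(itemset, support):
    """True if every immediate subset of `itemset` has strictly larger support.

    `support` maps frozensets to support counts and must contain all
    immediate subsets of `itemset` (true for a table of frequent itemsets).
    """
    return all(support[itemset - {i}] > support[itemset] for i in itemset)


# Supports of the 16 frequent itemsets, as listed on the equivalence-class slide.
SUPPORT = {frozenset(k): v for k, v in {
    "": 5, "a": 3, "b": 4, "c": 4, "e": 4,
    "ab": 2, "ac": 3, "ae": 2, "bc": 3, "be": 4, "ce": 3,
    "abc": 2, "abe": 2, "ace": 2, "bce": 3, "abce": 2,
}.items()}
```

Here `be` fails the test because `b` has the same support 4, so no superset of `be` can be a generator either. Exactly 9 itemsets pass: {}, a, b, c, e, ab, ae, bc and ce, matching the generators of the six classes.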

To identify closed patterns in parallel:
(1) a tail structure is added;
(2) all full-support items are stored.
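Item (2) can be illustrated on its own. In the conditional (projected) database of a prefix, an item that occurs in every transaction always co-occurs with the prefix, so adding all such full-support items to the prefix yields its closure, i.e. the closed pattern of the prefix's equivalence class. The sketch is simplified: projected transactions hold all remaining items, not only those after the prefix in FP-tree order. The tail structure of item (1) is not modelled, and the helper names are ours.

```python
def full_support_items(projected):
    """Items that occur in every transaction of the conditional database."""
    if not projected:
        return frozenset()
    return frozenset.intersection(*map(frozenset, projected))


def closure(prefix, projected):
    """Closed pattern of `prefix`: the prefix plus all full-support items."""
    return frozenset(prefix) | full_support_items(projected)
```

For example, if the transactions containing a project to {c}, {b, c, e} and {b, c, e} (a hypothetical projection consistent with the class {a, ac}), then c is the only full-support item and the closure of a is ac.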

An option to find delta-discriminative equivalence classes
They are non-redundant.
They are in fact emerging patterns (EPs), but here the equivalences among the minimal EPs are also identified, which was unknown before.
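The transcript does not define delta-discriminative. A common reading, assumed here rather than taken from the slide, is that an equivalence class is delta-discriminative for a home class when it is frequent there (support at least a minimum support) while its combined support in all other classes is at most delta. Because every member of an equivalence class shares the same per-class supports, the test runs once per class, and it extends to any number of classes without pairwise decomposition. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def delta_discriminative_home(class_supports: Sequence[int],
                              min_sup: int, delta: int) -> Optional[int]:
    """Index of the class in which the EC is delta-discriminative, or None.

    Assumed definition: support >= min_sup in the home class and combined
    support <= delta over all other classes. With delta < min_sup at most
    one class can qualify, so returning the first match is unambiguous.
    """
    total = sum(class_supports)
    for k, s in enumerate(class_supports):
        if s >= min_sup and total - s <= delta:
            return k
    return None
```

With hypothetical supports (20, 1, 0) across three classes, min_sup = 10 and delta = 2, the class is delta-discriminative for class 0; with (20, 2, 3) it is not, since 5 samples outside the home class contain it.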

Performance comparison

Conclusion
Useful for:
classification problems, in particular multiple-class classification problems;
risk-factor assessment in financial market analysis;
bioinformatics: evaluation of motifs and signature patterns.