Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure.

Presentation transcript:

Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure. How do we find good features from semi-structured raw data for classification?

Feature Construction. Most data mining and machine learning models assume the following structured data: (x1, x2, ..., xk) -> y, where the xi are independent variables and y is the dependent variable. If y is drawn from a discrete set, the task is classification; if y is drawn from a continuous variable, it is regression. When the feature vectors are good, the differences in accuracy among learners are small, as the sketch below illustrates. Question: where do good features come from?
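As a quick illustration of that claim (a minimal sketch, assuming scikit-learn is available; the dataset and the choice of classifiers are examples, not part of the original slides): with a well-constructed feature vector representation, several standard learners reach very similar accuracy.

```python
# Minimal sketch: with good structured features (x_1, ..., x_k) -> y,
# different off-the-shelf classifiers reach similar cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # already a good feature vector representation
for clf in (DecisionTreeClassifier(), LogisticRegression(max_iter=1000), SVC()):
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))
```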

Frequent Pattern-Based Feature Extraction. Often the data is not in the form of pre-defined feature vectors: transactions, biological sequences, graph databases. Frequent patterns are good candidates for discriminative features. So, how do we mine them? A small illustration of the setting follows.
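A tiny illustration (a sketch, not the authors' code; the toy transactions and the helper name support are made up for illustration): semi-structured data such as a transaction database has no pre-defined feature vector, and a frequent itemset is a natural candidate feature.

```python
# Semi-structured data: a transaction database, each transaction a set of items.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "d"}),
    frozenset({"a", "c", "d"}),
]

def support(pattern, db):
    """Fraction of transactions that contain the pattern."""
    return sum(pattern <= t for t in db) / len(db)

# {a, c} occurs in 3 of 4 transactions -> frequent, a candidate feature.
print(support(frozenset({"a", "c"}), transactions))  # 0.75
```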

FP: Sub-graph. A discovered sub-graph pattern shared by several NSC anti-cancer compounds, e.g. NSC 4960 (example borrowed from George Karypis' presentation).

Frequent Pattern Feature Vector Representation. Each example is encoded as a binary vector over the mined patterns P1, P2, P3, ... (1 if the example contains the pattern, 0 otherwise). On that representation you can train any classifier you can name: NN, DT, SVM, LR, ... (the slide shows the classic iris decision tree splitting on Petal.Length < 2.45 and Petal.Width < 1.75 into setosa, versicolor and virginica). Mining these predictive features is an NP-hard problem: even 100 examples can generate a combinatorially large number of patterns, and most of them are useless. A sketch of the representation follows.
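A minimal sketch of the representation (assuming scikit-learn; the toy transactions and patterns are illustrative only): each example becomes a binary vector over the mined patterns, and any standard classifier can then be trained on it.

```python
# Once patterns P1, P2, ... are fixed, every example becomes a binary vector
# ("does it contain the pattern?") and any classifier (DT, SVM, LR, NN, ...) applies.
from sklearn.tree import DecisionTreeClassifier

transactions = [frozenset({"a", "b"}), frozenset({"a", "c"}),
                frozenset({"b", "c"}), frozenset({"c"})]
labels = [1, 1, 0, 0]
patterns = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]  # mined patterns

X = [[int(p <= t) for p in patterns] for t in transactions]  # pattern feature vectors
clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict([[1, 0, 1]]))  # classify a new example by its pattern features
```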

Example: 192 examples. At 12% support (at least 12% of the examples contain the pattern), itemset mining returns 8,600 patterns: 192 examples vs. 8,600 patterns? At 4% support, 92,000 patterns: 192 vs. 92,000?? Most of these patterns have no predictive power and cannot be used to construct features. Our algorithm finds only 20 highly predictive patterns, which are enough to construct a decision tree with about 90% accuracy.

Data in a bad feature space. Discriminative patterns are a non-linear combination of single features; they increase the expressive and discriminative power of the feature space. An example with binary attributes X, Y and class C: the data is non-linearly separable in the original (x, y) space.

New Feature Space. Mine and transform: solve the problem by mapping the data to a different space. Mining yields the itemset F: x=0, y=0 (equivalently the association rule x=0 => y=0); adding F as a new feature makes the data linearly separable in (x, y, F). A sketch of the mapping follows.
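A minimal sketch of the mapping (assuming scikit-learn; the four-point toy dataset is illustrative): adding the mined feature F = [x=0 and y=0] turns data that no linear classifier can separate into data that a linear SVM separates.

```python
# XOR-like data: no linear separator exists in (x, y); adding the mined
# pattern F = (x == 0 and y == 0) as a third feature makes it linearly separable.
from sklearn.svm import LinearSVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [1, 0, 0, 1]                                   # non-linearly separable in (x, y)

F = [int(a == 0 and b == 0) for a, b in X]         # pattern feature F: x=0 AND y=0
X_new = [row + [f] for row, f in zip(X, F)]        # mapped space (x, y, F)

print(LinearSVC().fit(X, y).score(X, y))           # < 1.0: no linear separator in (x, y)
print(LinearSVC().fit(X_new, y).score(X_new, y))   # 1.0: separable in (x, y, F)
```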

Computational Issues. A pattern is measured by its frequency or support, e.g. frequent subgraphs with sup >= 10%, meaning at least 10% of the examples contain the pattern (restated in formula form below). Ordered enumeration: one cannot enumerate the patterns at sup = 10% without first enumerating all patterns with support above 10%. This is an NP-hard problem, easily producing a combinatorially large number of patterns for a realistic dataset. Most patterns are non-discriminative, while low-support patterns can have high discriminative power: a bad combination. Random sampling does not work since it is not exhaustive, and because most patterns are useless, randomly sampling patterns (or blindly enumerating them without considering frequency) yields mostly useless features. Mining on a small number of examples does not help either: if those examples cover only a subset of the vocabulary, the search is incomplete; if they cover the complete vocabulary, it helps little but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns.
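For reference, the support measure in formula form (a standard restatement, not new material from the slides; D is the example database, t ranges over examples, theta is the minimum support threshold):

```latex
\mathrm{sup}(\alpha) \;=\; \frac{\lvert \{\, t \in D : \alpha \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad \alpha \text{ is frequent} \iff \mathrm{sup}(\alpha) \ge \theta .
```

Because support is anti-monotone, level-wise miners must enumerate every pattern above a given threshold before any pattern below it can be reached, which is exactly the ordered-enumeration problem described above.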

Conventional Procedure: Feature Construction and Selection (Two-Step Batch Method). 1. Mine frequent patterns (support > min_sup) from the dataset; 2. Select the most discriminative patterns; 3. Represent the data in the feature space defined by those patterns (binary vectors over F1, F2, ...); 4. Build classification models with any classifier you can name: NN, DT, SVM, LR, ... A sketch of this two-step pipeline follows.
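A compact sketch of the two-step batch pipeline (assuming scikit-learn for the final model; the level-wise miner, the info_gain helper and the toy data are illustrative, not the authors' implementation):

```python
# Two-step batch method: (1) mine all frequent itemsets above min_sup,
# (2) rank them by information gain on the whole dataset,
# (3) build pattern feature vectors, (4) train a model.
from math import log2
from sklearn.tree import DecisionTreeClassifier

def mine_frequent_itemsets(db, min_sup):
    """Level-wise (Apriori-style) enumeration of every itemset with support >= min_sup."""
    def sup(p):
        return sum(p <= t for t in db) / len(db)
    level = [frozenset({i}) for i in {i for t in db for i in t}]
    frequent = []
    while level:
        level = [p for p in level if sup(p) >= min_sup]
        frequent += level
        level = list({a | b for a in level for b in level if len(a | b) == len(a) + 1})
    return frequent

def entropy(ys):
    return -sum((c / len(ys)) * log2(c / len(ys)) for c in (ys.count(0), ys.count(1)) if c)

def info_gain(pattern, db, ys):
    """Information gain of splitting the dataset on 'example contains pattern'."""
    yes = [y for t, y in zip(db, ys) if pattern <= t]
    no = [y for t, y in zip(db, ys) if not pattern <= t]
    cond = sum(len(part) / len(ys) * entropy(part) for part in (yes, no) if part)
    return entropy(ys) - cond

db = [frozenset(t) for t in (["a", "b"], ["a", "c"], ["b", "c"], ["c", "d"])]   # toy data
labels = [1, 1, 0, 0]

patterns = mine_frequent_itemsets(db, min_sup=0.25)                   # step 1: mine
patterns.sort(key=lambda p: info_gain(p, db, labels), reverse=True)   # step 2: select ...
top = patterns[:3]                                                    # ... most discriminative
X = [[int(p <= t) for p in top] for t in db]                          # step 3: represent
clf = DecisionTreeClassifier().fit(X, labels)                         # step 4: build the model
```

Both problems discussed next show up directly in this sketch: step 1 explodes as min_sup shrinks, and step 2 ranks patterns by information gain on the whole dataset, each pattern evaluated independently of the others.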

Two Problems. Mine step: combinatorial explosion. 1. Exponential explosion in the number of frequent patterns mined from the dataset; 2. Patterns are not considered at all if min_support isn't small enough.

Two Problems. Select step: issue of discriminative power. 3. Information gain is evaluated against the complete dataset, NOT on subsets of examples; 4. Correlation among patterns is not directly evaluated through their joint predictive power.

Direct Mining & Selection via Model-based Search Tree. Basic flow (divide-and-conquer based frequent pattern mining): starting from the full dataset at node 1, mine frequent patterns on the node's own examples with a local support threshold (P = 20%), select the most discriminative feature F based on information gain, split the data on F, and recurse on each branch (nodes 2, 3, 4, ...) until only a few examples remain in a node. The patterns selected along the way form a compact set of highly discriminative patterns, so the procedure is both a feature miner and a classifier. Because mining is local, the effective global support can be as low as 10 * 20% / 10000 = 0.02%. A sketch of this flow follows.
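A minimal sketch of this flow (illustrative only, not the authors' code; it reuses the toy db and labels and the mine_frequent_itemsets and info_gain helpers from the two-step sketch above):

```python
# Model-based search tree sketch: each node mines frequent patterns on its OWN
# examples with a fixed local support, keeps the single most discriminative
# pattern by information gain, splits on it, and recurses.  A 20% local support
# at a node holding 10 of 10,000 examples corresponds to a global support of
# 10 * 20% / 10000 = 0.02%.
MIN_NODE_SIZE = 2   # "few data": stop splitting below this many examples
LOCAL_SUP = 0.2     # local support threshold used at every node

def build_mbt(node_db, node_labels, mined_features):
    """Mine-select-split recursively; returns (pattern, yes_subtree, no_subtree) or a leaf label."""
    if len(node_db) < MIN_NODE_SIZE or len(set(node_labels)) == 1:
        return max(set(node_labels), key=node_labels.count)       # leaf: majority class
    candidates = mine_frequent_itemsets(node_db, LOCAL_SUP)       # mine on this node only
    if not candidates:
        return max(set(node_labels), key=node_labels.count)
    best = max(candidates, key=lambda p: info_gain(p, node_db, node_labels))
    yes = [(t, y) for t, y in zip(node_db, node_labels) if best <= t]
    no = [(t, y) for t, y in zip(node_db, node_labels) if not best <= t]
    if not yes or not no:                                         # pattern does not split
        return max(set(node_labels), key=node_labels.count)
    mined_features.append(best)                                   # keep as constructed feature
    return (best,
            build_mbt([t for t, _ in yes], [y for _, y in yes], mined_features),
            build_mbt([t for t, _ in no], [y for _, y in no], mined_features))

features = []                       # compact set of highly discriminative patterns
tree = build_mbt(db, labels, features)
print(features)                     # the tree itself doubles as the classifier
```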

Analyses (I) 1. Scalability (Theorem 1): an upper bound on the number of patterns examined, and the scale-down ratio achieved when mining extremely low-support patterns. 2. A bound on the number of returned features (Theorem 2).

Analyses (II) 3. Subspace is important for discriminative patterns. On the original (full) data set a pattern can carry no information gain: let C1 and C0 be the numbers of examples belonging to class 1 and class 0, and let P1 and P0 be the numbers of examples in C1 and C0, respectively, that contain a pattern alpha; the gain of alpha on the full data is zero whenever P1/C1 = P0/C0. Subsets of the examples, however, can still yield information gain for the same pattern. 4. Non-overfitting. 5. Optimality under exhaustive search.
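Restating the gain computation in formula form (using the quantities defined above, with H the binary entropy; this is the standard information-gain expression, not a new result):

```latex
IG(\alpha) \;=\; H\!\left(\frac{C_1}{C_0 + C_1}\right)
 \;-\; \frac{P_0 + P_1}{C_0 + C_1}\, H\!\left(\frac{P_1}{P_0 + P_1}\right)
 \;-\; \frac{(C_0 - P_0) + (C_1 - P_1)}{C_0 + C_1}\,
        H\!\left(\frac{C_1 - P_1}{(C_0 - P_0) + (C_1 - P_1)}\right),
\qquad H(p) = -p\log_2 p - (1-p)\log_2(1-p).
```

IG(alpha) is zero precisely when P1/(P0+P1) = C1/(C0+C1), i.e. when P1/C1 = P0/C0: the pattern covers both classes in the same proportion as the whole dataset. Restricted to a subset of the examples those proportions change, so the same pattern can have strictly positive gain there.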

Experimental Studies: Itemset Mining (I) Scalability comparison on five UCI datasets (Adult, Chess, Hypo, Sick, Sonar): for each dataset the table reports the number of patterns mined at MbT's effective support level and the ratio of the number of patterns MbT actually uses to that number; the ratio is only a small percentage in every case (close to 0% for Chess).

Experimental Studies: Itemset Mining (II) Accuracy of the mined itemsets: 4 wins, 1 loss, achieved with a much smaller number of patterns.

Experimental Studies: Itemset Mining (III) Convergence

Experimental Studies: Graph Mining (I) 9 NCI anti-cancer screen datasets (The PubChem Project, URL: pubchem.ncbi.nlm.nih.gov); active (positive) class: around 1% - 8.3%. 2 AIDS anti-viral screen datasets (URL:); H1: CM+CA - 3.5%; H2: CA - 1%.

Experimental Studies: Graph Mining (II) Scalability comparison.

Experimental Studies: Graph Mining (III) AUC and accuracy. AUC: 11 wins. Accuracy: 10 wins, 1 loss.

Experimental Studies: Graph Mining (IV) MbT vs. benchmarks: AUC of MbT and DT; 7 wins, 4 losses.

Summary. Model-based Search Tree: integrated feature mining and construction; dynamic support, so it can mine patterns with extremely small support; it is both a feature construction method and a classifier; and it is not limited to one type of frequent pattern: plug-and-play. Experiment results on itemset mining and graph mining. Software and dataset available from: