Building Global Models from Local Patterns
A.J. Knobbe

Feature continuum
attributes → (constructed) features → patterns → classifiers → target concept

Two-phased process
Break discovery up into two phases: transform a complex problem into a simpler one.
Pattern Discovery phase:
- frequent patterns
- correlated patterns
- interesting subgroups
- decision boundaries
- ...
Pattern Combination phase:
- redundancy reduction
- dependency modeling
- global model building
- ...
yielding Pattern Teams, pattern networks, global predictive models, ...

Task: Subgroup Discovery
Subgroup Discovery: find subgroups that show a substantially different distribution of the target concept.
- top-down search for patterns
- inductive constraints (sometimes monotonic)
- evaluation measures: novelty, χ², information gain
- also known as rule discovery, correlated pattern mining
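The slides do not spell out the search itself. As a minimal illustrative sketch (not the authors' implementation), a top-down beam search over nominal attribute-value conditions could look as follows, assuming a pandas DataFrame of nominal attributes and any quality measure with the signature quality(cover, target), such as the novelty function sketched after the next slide:

```python
import numpy as np

def beam_search(df, target, quality, beam_width=5, depth=2):
    """Top-down subgroup search: refine descriptions one condition at a time."""
    # A subgroup is (description, boolean cover); start from the empty description.
    beam = [((), np.ones(len(df), dtype=bool))]
    found = []
    for _ in range(depth):
        candidates = []
        for desc, cover in beam:
            for col in df.columns:              # refine with every attribute-value test
                for val in df[col].unique():
                    new_cover = cover & (df[col] == val).to_numpy()
                    candidates.append((desc + ((col, val),), new_cover))
        candidates.sort(key=lambda c: quality(c[1], target), reverse=True)
        beam = candidates[:beam_width]          # keep only the best refinements
        found.extend(beam)
    found.sort(key=lambda c: quality(c[1], target), reverse=True)
    return found[:beam_width]
```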

Novelty
Also known as weighted relative accuracy: a balance between coverage and unexpectedness.
nov(S, T) = p(ST) − p(S) · p(T)
Values lie between −.25 and .25; 0 means uninteresting.
[2x2 contingency table of subgroup vs. target omitted.] Worked example:
nov(S, T) = p(ST) − p(S) · p(T) = .42 − .297 = .123
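For concreteness, a minimal sketch of the novelty/WRAcc computation over boolean membership arrays (the function name and toy data are mine, not from the slides):

```python
import numpy as np

def novelty(subgroup: np.ndarray, target: np.ndarray) -> float:
    """Weighted relative accuracy: nov(S, T) = p(ST) - p(S) * p(T).

    Both arguments are boolean arrays marking, per example, membership
    in the subgroup S and in the target concept T.
    """
    p_st = np.mean(subgroup & target)  # p(ST): covered by both S and T
    return p_st - np.mean(subgroup) * np.mean(target)

# Toy usage: a subgroup that mostly falls inside the target is novel.
rng = np.random.default_rng(0)
target = rng.random(1000) < 0.5
subgroup = target & (rng.random(1000) < 0.9)
print(round(novelty(subgroup, target), 3))  # clearly positive
```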

Demo: Subgroup Discovery
Redundancy exists in the set of local patterns.

Demo: Subgroup Discovery (screenshot)

Pattern Combination phase
- Feature selection, redundancy reduction: Pattern Teams
- Dependency modeling: Bayesian networks, association rules
- Global modeling: classifiers, regression models

Pattern Teams & Pattern Networks

Pattern Teams
Pattern Discovery typically produces very many patterns with high levels of redundancy, so report a small informative subset with specific properties:
- promote dissimilarity of the patterns reported
- additional value of individual patterns (each pattern should add something beyond the rest)
- consider the extent of patterns: treat patterns as binary features/items

Intuitions
- No two patterns should cover the same set of examples.
- No pattern should cover the complement of another pattern.
- No pattern should cover a logical combination of two or more other patterns.
- Patterns should be mutually exclusive.
- The pattern set should lead to the best performing classifier.
- Patterns should lie on the convex hull in ROC space.

Quality measures for pattern sets
Judge pattern sets on the basis of a quality function:
- Joint entropy (maximally informative k-itemsets), unsupervised
- Exclusive coverage, unsupervised
- Wrapper accuracy, supervised
- Area under the curve in ROC space, supervised
- Bayesian Dirichlet equivalent uniform (BDeu), supervised
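A sketch, under my own assumptions, of the unsupervised joint-entropy measure together with a simple greedy forward selection of a k-pattern team; greedy selection is just one heuristic and not necessarily the algorithm used in the demo:

```python
import numpy as np
from collections import Counter

def joint_entropy(patterns: np.ndarray) -> float:
    """Joint entropy (in bits) of an (n_examples, k) 0/1 pattern matrix."""
    counts = Counter(map(tuple, patterns))           # truth-assignment frequencies
    probs = np.array(list(counts.values())) / len(patterns)
    return float(-np.sum(probs * np.log2(probs)))

def greedy_pattern_team(patterns: np.ndarray, k: int) -> list[int]:
    """Greedy forward selection of k columns maximizing joint entropy."""
    team: list[int] = []
    for _ in range(k):
        rest = (j for j in range(patterns.shape[1]) if j not in team)
        team.append(max(rest, key=lambda j: joint_entropy(patterns[:, team + [j]])))
    return team
```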

Pattern Teams (demo)
82 subgroups discovered; 4 subgroups in the pattern team.

Pattern Network
- Again, treat patterns as binary features.
- Bayesian networks: conditional independence of patterns.
- Explain relationships between patterns.
- Explain the role of patterns in the Pattern Team.
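The slides do not show how the network is learned. As one hedged building block, a marginal independence test between two binary patterns can be phrased as a chi-square test (scipy's chi2_contingency); a real structure learner would use conditional versions of such tests, or a score such as BDeu:

```python
import numpy as np
from scipy.stats import chi2_contingency

def independent(p1: np.ndarray, p2: np.ndarray, alpha: float = 0.05) -> bool:
    """Chi-square test of (marginal) independence of two binary patterns.

    Assumes neither pattern is constant, so the 2x2 contingency table
    has no empty row or column.
    """
    table = np.array([[np.sum((p1 == a) & (p2 == b)) for b in (0, 1)]
                      for a in (0, 1)])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value > alpha
```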

Demo: Pattern Team & Network
Redundancy removed to find truly diverse patterns, in this case using maximization of joint entropy.

Demo: Pattern Team & Network
[Screenshot: target distributions with peaks around 16k, 39k, and 89k.] The pattern team and related patterns can be presented in a Bayesian network.

Properties of the Subgroup Discovery phase in Pattern Combination
What knowledge about Subgroup Discovery parameters can be exploited in Combination?
- Interestingness: are interesting subgroups diverse? Are interesting subgroups correlated?
- Information content
- Support of patterns

Joint entropy of 2 interesting subgroups
[Plot: very novel subgroups carry about 1 bit of joint information; relatively novel subgroups carry up to 2 bits.]

Correlation of interesting subgroups
[Plot: novel subgroups are potentially independent; very novel subgroups correlate.]

Building Classifiers from Local Patterns

Combination strategies
How to interpret a pattern set?
- Conjunctive (intersection of patterns)
- Disjunctive (union of patterns)
- Majority vote (equal-weight linear separator)
- ...
- Contingencies/classifiers
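A minimal sketch of the three simple interpretations of a pattern set, treating patterns as columns of a binary matrix (names are illustrative):

```python
import numpy as np

def combine(patterns: np.ndarray, strategy: str) -> np.ndarray:
    """Interpret an (n_examples, n_patterns) 0/1 matrix as one prediction per example."""
    if strategy == "conjunctive":   # example must satisfy all patterns
        return patterns.all(axis=1)
    if strategy == "disjunctive":   # example must satisfy at least one pattern
        return patterns.any(axis=1)
    if strategy == "majority":      # equal-weight linear separator
        return patterns.sum(axis=1) * 2 > patterns.shape[1]
    raise ValueError(f"unknown strategy: {strategy}")
```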

Decision Table Majority (DTM)
- Treat every truth assignment as a contingency.
- Classification based on conditional probability.
- Use the majority class for empty contingencies.
- Only works with a Pattern Team (otherwise overfitting).
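A hedged sketch of DTM as described above: one cell per truth assignment, the majority class per observed cell, and the global majority class for empty cells (class and method names are mine):

```python
import numpy as np
from collections import Counter, defaultdict

class DecisionTableMajority:
    """One contingency per truth assignment over the pattern team."""

    def fit(self, patterns: np.ndarray, y: np.ndarray):
        self.majority_ = Counter(y).most_common(1)[0][0]  # fallback for empty cells
        cells = defaultdict(Counter)
        for row, label in zip(map(tuple, patterns), y):
            cells[row][label] += 1
        # Most probable class given each observed truth assignment.
        self.table_ = {row: c.most_common(1)[0][0] for row, c in cells.items()}
        return self

    def predict(self, patterns: np.ndarray) -> np.ndarray:
        return np.array([self.table_.get(tuple(row), self.majority_)
                         for row in patterns])
```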

Support Vector Machine (SVM)
SVM with a linear kernel:
- binary data, so all dimensions have the same scale
- works with large pattern sets
- subgroup discovery has removed XOR-like dependencies
- interesting subgroups correlate
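For illustration, a linear SVM fit directly on a binary pattern matrix; scikit-learn's LinearSVC and the synthetic data are my stand-ins, not the setup from the demo:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
patterns = rng.integers(0, 2, size=(200, 10))   # binary pattern features
y = patterns[:, 0] | patterns[:, 1]             # illustrative target

clf = LinearSVC().fit(patterns, y)              # linear kernel; no scaling needed
print(clf.score(patterns, y))                   # separable toy data: near-perfect fit
```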

XOR-like dependencies
[Figure: examples plotted by two binary patterns p1 and p2; the four truth assignments (0,0), (1,0), (0,1), (1,1) illustrate an XOR-like dependency between patterns, which no linear separator over p1 and p2 alone can capture.]

Division of labour between the 2 phases
Subgroup Discovery phase:
- feature selection
- decision boundary finding/thresholding
- multivariate dependencies (XOR)
Pattern Combination phase:
- pattern selection
- combination (XOR?)
- class assignment

Combination-aware Subgroup Discovery
- Better global model.
- Superficially uninteresting patterns can be reported.
- Pruning of the search space (new rule measures).
[Plot: individual subgroups are not novel, yet the team is optimal.]

Combination-aware Subgroup Discovery
Subgroup Discovery++: find a set of subgroups that shows a substantially different distribution of the target concept.
Considerations:
- support of the pattern
- diversity of the pattern
- ...

Conclusions
- A less hasty approach to model building.
- Interesting patterns serve two purposes: understandable knowledge, and building blocks of a global model.
- Pattern discovery without combination is limited.
- Information exchange between the phases.
- Integration of the two phases is non-trivial.