Exploiting the Power of Group Differences to Solve Data Analysis Problems: Classification. Guozhu Dong, PhD, Professor, CSE. guozhu.dong@wright.edu

Where Are We Now
- Introduction and overview
- Preliminaries
- Emerging patterns: definitions and mining
- Using emerging patterns as features and regression terms
- Classification using emerging patterns
- Clustering and clustering evaluation using emerging patterns
- Outlier and intrusion detection using emerging patterns
- Ranking attributes for problems with complex multi-attribute interactions using emerging patterns
- Pattern aided regression and classification
- Interesting applications of emerging patterns

Emerging Pattern Based Classification
- Preliminaries on classification
- CAEP: Classification by Aggregating the Power of EPs; also, handling imbalanced classification with score normalization
- DeEPs: instance based classification using EPs
- ECP: using CAEP on tiny training datasets for lead compound optimization
- Why and how EPs are useful for classification
Note: this part uses EPs as basic conditions or as basic classifiers; later we use EPs as subpopulation handles for pattern aided regression and classification.

Classification: The Problem
- We want to build a classification model that accurately predicts categorical class labels of data instances.
- The model is built from a training dataset whose instances carry class labels.
- Mathematically, a classifier (model) is a function mapping tuples to classes.
- There are many ways to specify classifier models, many methods to build them, and many measures to evaluate their performance.
Typical applications:
- Credit/loan approval or denial
- Medical diagnosis: is a tissue cancerous or benign?
- Fraud detection: is a transaction fraudulent?
- Web page categorization: which category does a page belong to?

Classifier Performance Evaluation Measures
(1: Positive, 0: Negative.) A classifier's predictions on a labeled test set form a confusion matrix:

            PredP   PredN
   TrueP    TP: 2   FN: 2   (P: 4)
   TrueN    FP: 1   TN: 1   (N: 2)

(The counts come from a small example table with attributes A1, A2, A3, true class C, and predicted class PredC.)
The usual measures (e.g. accuracy, precision, recall) are computed from these counts. Also: AUC of ROC.
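The measure formulas were not preserved in the transcript; as a minimal sketch, here is how the standard measures are computed in Python from the slide's confusion matrix (the definitions are the textbook ones, not specific to this tutorial):

```python
# Confusion-matrix counts from the slide (1: Positive, 0: Negative).
TP, FN = 2, 2   # true positives, false negatives (TrueP row, P = 4)
FP, TN = 1, 1   # false positives, true negatives (TrueN row, N = 2)

accuracy  = (TP + TN) / (TP + FN + FP + TN)   # (2+1)/6 = 0.5
precision = TP / (TP + FP)                    # 2/3
recall    = TP / (TP + FN)                    # 2/4 = 0.5 (a.k.a. sensitivity)
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```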

Challenges
- High dimensional data
- Imbalanced classification
- Hard datasets:
  - complex interactions
  - small disjuncts*
  - complicated class boundaries
  - many classes
* Small disjuncts are patterns covering a very small number of examples.

Why and How EPs Are Useful for Classification
Discriminative patterns are useful because:
- they can capture complex interactions;
- they can capture small disjuncts;
- they can be mined even from tiny training datasets;
- they are highly predictive (on the data they match).

CAEP: Classification by Aggregating the Power of EPs [Dong+Zhang et al 99]
We want a classification method that:
- uses a fairly complete set of discriminative patterns;
- combines the discriminative power of multiple matching discriminative patterns;
- uses all matching patterns in the pattern set of the model.

CAEP Description
For a class Ci and an instance t, CAEP aggregates the strengths of all EPs of Ci that match t:

  score(Ci, t) = sum, over EPs e of Ci matching t, of sup(e) * GrowthRate(e) / (GrowthRate(e) + 1)

If we only have JEPs, then score(Ci, t) = the sum of the supports of the matching JEPs (for a JEP, GrowthRate/(GrowthRate+1) = 1).

CAEP Illustration
We have two classes, P and N; there are 200 EPs for P and 150 EPs for N. We want to classify an instance t. Suppose 3 EPs for P match t and 2 EPs for N match t. The matching patterns have the following characteristics:

  Class P   sup    GrowthRate      Class N   sup    GrowthRate
  P1        0.3    5               Q1        0.2    7
  P2        0.1    10              Q2        0.05   20
  P3        0.05   infinite

Score(P,t) = 0.3*5/(5+1) + 0.1*10/11 + 0.05 = 0.39
Score(N,t) = 0.2*7/8 + 0.05*20/21 = 0.22
If normalization is not performed, t is predicted as belonging to P.
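As a minimal sketch of the aggregation, the following Python reproduces the illustration's scores; encoding each matching EP as a (support, growth rate) pair is my own simplification, with float('inf') standing for P3's infinite growth rate:

```python
def strength(sup, gr):
    """Contribution of one matching EP: sup * gr/(gr+1); equals sup for a JEP (gr = inf)."""
    return sup if gr == float('inf') else sup * gr / (gr + 1)

def caep_score(matching_eps):
    """Aggregate the strengths of all EPs (of one class) that match the instance."""
    return sum(strength(sup, gr) for sup, gr in matching_eps)

# EPs of each class that match instance t (from the illustration table)
eps_P = [(0.3, 5), (0.1, 10), (0.05, float('inf'))]
eps_N = [(0.2, 7), (0.05, 20)]

print(round(caep_score(eps_P), 2))  # 0.39
print(round(caep_score(eps_N), 2))  # 0.22 -> t is predicted as class P (before normalization)
```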

Strength of a Matching Pattern in the Score
  strength(P) = sup(P) * GrowthRate(P) / (GrowthRate(P) + 1)
- sup(P): patterns matching more instances have larger impact.
- GrowthRate(P) / (GrowthRate(P)+1): patterns with larger growth rate have larger impact; this factor is approximately 1 if GrowthRate(P) is very large, and exactly 1 if P is a JEP.

Normalizing Scores for Imbalanced Data
- Data sizes of the different classes are imbalanced, and the numbers of patterns of the different classes are also imbalanced.
- CAEP handles this problem by normalizing (dividing) score(Ci,·) for each class Ci by a fixed percentile (e.g. the 85th percentile) of the bag of scores {score(Ci,x) | x in Ci}. Let score'(Ci,t) denote the normalized score.
- CAEP assigns an instance t to the class Cj whose normalized score is largest: score'(Cj,t) = max {score'(C1,t), score'(C2,t)}.
- If the numbers of EPs for the classes are small, we can instead divide the scores for a class by the sum of sup*GrowthRate over all patterns of the class [Auer et al 2016].
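A minimal sketch of the percentile normalization, assuming raw CAEP scores are already computed; the nearest-rank percentile implementation and the example numbers are my own illustration:

```python
def percentile(values, q=85):
    """The q-th percentile of a bag of scores (simple nearest-rank version)."""
    vals = sorted(values)
    idx = min(int(q / 100 * len(vals)), len(vals) - 1)
    return vals[idx]

def predict(raw_scores, class_training_scores):
    """raw_scores: {class: score(Ci, t)};
    class_training_scores: {class: [score(Ci, x) for x in Ci]} from the training data.
    Returns the class with the largest normalized score score'(Ci, t)."""
    def normalized(c):
        base = percentile(class_training_scores[c]) or 1.0  # guard against division by zero
        return raw_scores[c] / base
    return max(raw_scores, key=normalized)

# Hypothetical numbers: class N's raw scores run higher (e.g. it has more EPs), but
# dividing by each class's own 85th-percentile training score corrects for this.
print(predict({'P': 0.39, 'N': 0.44},
              {'P': [0.2, 0.4, 0.5, 0.6], 'N': [0.5, 0.9, 1.0, 1.2]}))  # -> 'P'
```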

Factors Influencing Selection of Emerging Patterns
The number of candidate emerging patterns for selection can be large. The following factors are important in pattern selection:
- support of patterns (individually)
- GrowthRate of patterns (individually)
- mds overlap among patterns
- GrowthRate similarity and support similarity among patterns with similar mds

Most Expressive Jumping Emerging Patterns
Observation: if P and Q are two JEPs for a class Ci such that P ⊂ Q, then:
- P is more expressive: it matches more instances (it matches all instances that match Q, and may match more);
- P and Q give the same confidence (signal) for assigning instances t matching them to Ci.
So we call the minimal JEPs (in the set containment sense) the most expressive JEPs. Minimal JEPs were used to build powerful classifiers in several methods [Li+Dong+Kotagiri 2001].
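A small sketch of selecting the most expressive (minimal) JEPs by set containment; encoding each JEP as a frozenset of items is my own choice:

```python
def minimal_jeps(jeps):
    """Keep only JEPs that contain no other JEP as a proper subset."""
    return [p for p in jeps
            if not any(q < p for q in jeps)]  # q < p: q is a proper subset of p

jeps = [frozenset({'a'}), frozenset({'a', 'b'}), frozenset({'c', 'd'})]
print(minimal_jeps(jeps))  # keeps {'a'} and {'c','d'}; {'a','b'} is pruned since {'a'} matches more
```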

Performance of CAEP
Experiments show that CAEP has good performance and is noise tolerant. We will point out additional advantages later.
Note: the no-free-lunch theorem says that no method is best all the time.

Comparison with Other Methods
Set of rules used: decision tree algorithms are very greedy in selecting attributes for nodes, so they are very greedy in selecting the set of rules they use; they use a very small subset of all possible rules. The CBA method also uses a very small set of rules [Liu et al 98]. CAEP uses many more discriminative patterns.
How rules are used: both CBA and decision trees use only one rule to classify an instance; they do not let multiple rules vote. CAEP combines the discriminative power of all matching patterns.

Many Interesting Applications for CAEP
- Sequence classification for gene start site prediction
- Activity recognition from image data, with applications to senior home monitoring/care
- Lead compound optimization in drug candidate selection (chemoinformatics)
- ...

Many New Classification Methods are CAEP-Like
- CMAR [Li+Han+Pei 2001]: combines multiple rules, using the rules with maximum normalized chi-square
- CPAR [Yin+Han 2003]: combines multiple rules, using the average accuracy of the best k matching rules for each class
- Causal Associative Classification [Yu+Wu et al 2009]: uses a Markov blanket to select patterns and uses CAEP to score; uses fewer patterns
These methods differ in how they mine patterns, how they select patterns, and how they aggregate the discriminative power of matching patterns. Many related methods exist; they belong to the families of "associative classifiers" and "rule/pattern based classifiers" [Yu+Wu et al 2009].

DeEPs: Instance Based Classification Using EPs [Li+Dong et al 2004]
Training data: classes C1, C2 (this generalizes to more than 2 classes). For each case t to be classified:
- Let Data(Ci,t) be the projection of Ci onto t: remove all attribute values not occurring in t.
- Mine JEPs for each class C1, C2, yielding pattern sets LazyEPs(C1,t) and LazyEPs(C2,t).
- Let LazyData(Ci,t) be the subset of Data(Ci,t) matching some JEP in LazyEPs(Ci,t).
- Use the sizes of LazyData(C1,t) and LazyData(C2,t), as percentages of |C1| and |C2|, to decide t's class.
The matching-size based scoring also handles the imbalanced class problem.
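A minimal sketch of the projection and matching-size scoring steps, assuming instances are encoded as attribute-to-value dicts and patterns as sets of (attribute, value) pairs; the lazy JEP mining step is not shown, so jeps_C1 below is a hypothetical mining output:

```python
def project(instances, t):
    """Data(Ci, t): keep, in each training instance, only the attribute values occurring in t."""
    return [{a: v for a, v in x.items() if t.get(a) == v} for x in instances]

def matching_fraction(instances, t, jeps):
    """|LazyData(Ci,t)| / |Ci|: fraction of Ci whose projection matches some mined JEP."""
    proj = project(instances, t)
    def matches(x, p):  # pattern p is a frozenset of (attribute, value) pairs
        return p <= set(x.items())
    hits = sum(1 for x in proj if any(matches(x, p) for p in jeps))
    return hits / len(instances)

# Hypothetical toy data: t is assigned to the class with the larger matching fraction.
C1 = [{'color': 'red', 'size': 'big'}, {'color': 'red', 'size': 'small'}]
t = {'color': 'red', 'size': 'big'}
jeps_C1 = [frozenset({('color', 'red'), ('size', 'big')})]  # pretend lazy-mining output
print(matching_fraction(C1, t, jeps_C1))  # 0.5
```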

ECP: Using CAEP on Tiny Training Datasets for Lead Compound Optimization
- Drug design often requires molecule design and molecule selection: we need to search for highly potent molecule structures to use as drug candidates.
- Researchers use some intuition to guide the search. E.g.: pick a set of likely positive and likely negative examples; use this set as the basis to find the most potent molecules (not necessarily in the picked set).
- One approach is to use the picked set as a training set to build a classifier that predicts the potency of other molecules. Picking the training set is labor intensive, so we want small training sets, e.g. 3 positives vs 3 negatives, 5 vs 5, or 10 vs 10 [Auer+Bajorath 2006, 2008].
- ECP: Emerging Chemical Pattern.

[Results table, not fully preserved in the transcript: average accuracies of the three methods across the training set sizes; the avg rows read 0.697 0.575 0.5 / 0.705 0.568 0.605 / 0.731 0.587 0.613.]
BIN: Binary QSAR; DT: Decision Trees; ECP: CAEP using Emerging Chemical Patterns.

Simulated Lead Optimization using CAEP
[Auer and Bajorath 2006] used an iterative procedure for simulated lead optimization, exploiting CAEP's strength on small training data:
- In each iteration, they randomly selected a small set of compounds from the current set of test compounds, obtained their potency, and divided them into a high potency class and a low potency class (using the mean potency value as the threshold); k = 3 or 5 examples per class.
- This compound set was then used to train the ECP (CAEP) classifier to distinguish higher from lower potency compounds.
- The class labels of the remaining test compounds were predicted, assigning each test compound to the high or low potency class.
- All compounds predicted to have low potency were then removed from the test set; only compounds classified as highly potent were retained for the next iteration.
The final (enriched) set, after hundreds of iterations, should be highly potent. See the figures on the next slide.
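A schematic sketch of one run of the enrichment loop described above; the function names, the sampling details, and the train_and_predict stub (standing in for the actual ECP/CAEP machinery) are my own assumptions:

```python
import random
import statistics

def enrich(compounds, measure_potency, train_and_predict, k=3, iterations=100):
    """One run of the iterative lead-optimization loop.
    compounds: current pool of test compounds;
    measure_potency: oracle giving the potency of a sampled compound;
    train_and_predict: stub for ECP/CAEP; labels remaining compounds 'high'/'low'."""
    pool = list(compounds)
    for _ in range(iterations):
        if len(pool) <= 2 * k:
            break
        sample = random.sample(pool, 2 * k)
        potencies = {c: measure_potency(c) for c in sample}
        threshold = statistics.mean(potencies.values())  # mean potency as class boundary
        high = [c for c in sample if potencies[c] >= threshold]
        low = [c for c in sample if potencies[c] < threshold]
        rest = [c for c in pool if c not in sample]
        labels = train_and_predict(high, low, rest)  # {compound: 'high' or 'low'}
        # discard compounds predicted (or measured) to be of low potency
        pool = [c for c in rest if labels[c] == 'high'] + high
    return pool  # the enriched, hopefully highly potent, set
```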

Simulated Lead Optimization using Iterative CAEP
Boxplots for 500 runs of the iterative process. In pharmacology, potency is a measure of drug activity, expressed in terms of the amount needed to produce an effect of a given intensity. We want molecules with a small IC50 (the concentration needed to inhibit a target by 50%), i.e. high potency.