Rule-Based Data Mining Methods for Classification Problems in Biomedical Domains
Jinyan Li and Limsoon Wong
Copyright © 2004 by Jinyan Li and Limsoon Wong
Part 2: Rule-Based Approaches
Outline
Overview of Supervised Learning
Decision Trees
Ensembles
–Bagging
–Boosting
–Random forest
–Randomization trees
–CS4
Overview of Supervised Learning
Computational Supervised Learning
Also called classification
Learn from past experience, and use the learned knowledge to classify new data
The knowledge is learned by intelligent algorithms
Examples:
–Clinical diagnosis for patients
–Cell type classification
Data
A classification application involves more than one class of data. E.g.,
–Normal vs disease cells for a diagnosis problem
Training data is a set of instances (samples, points) with known class labels
Test data is a set of instances whose class labels are to be predicted
Notation
Training data: {⟨x_1, y_1⟩, ⟨x_2, y_2⟩, …, ⟨x_m, y_m⟩}, where the x_j are n-dimensional vectors and the y_j are from a discrete space Y. E.g., Y = {normal, disease}.
Test data: {⟨u_1, ?⟩, ⟨u_2, ?⟩, …, ⟨u_k, ?⟩}
Process
Training data X with class labels Y → learn f, a classifier (a mapping, a hypothesis)
Test data U → predicted class labels f(U)
Relational Representation of Gene Expression Data
An m × n matrix: m samples (rows) by n features (columns, on the order of 1000), where entry x_ij is the expression of gene j in sample i; each sample also carries a class label (e.g., P or N).

gene 1  gene 2  gene 3  gene 4  …  gene n   class
x_11    x_12    x_13    x_14    …  x_1n     P
x_21    x_22    x_23    x_24    …  x_2n     N
…       …       …       …       …  …        …
x_m1    x_m2    x_m3    x_m4    …  x_mn     P
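As a concrete illustration, the matrix above maps directly onto a pair of arrays. The numbers and labels below are a made-up toy example, not real expression data.

```python
import numpy as np

# Hypothetical toy example: m = 4 samples, n = 5 genes
# (real microarray datasets have thousands of genes)
X = np.array([[2.1, 0.3, 5.7, 1.2, 0.9],   # sample 1
              [1.8, 4.1, 0.6, 3.0, 1.1],   # sample 2
              [2.4, 0.2, 6.0, 1.4, 0.8],   # sample 3
              [1.7, 3.9, 0.7, 2.8, 1.0]])  # sample 4: rows = samples, columns = genes
y = np.array(["P", "N", "P", "N"])         # class label of each sample
```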
Features
Also called attributes
Categorical features
–e.g., feature color = {red, blue, green}
Continuous or numerical features
–gene expression
–age
–blood pressure
Discretization converts continuous features into categorical ones
An Example
Overall Picture of Supervised Learning
Data from many domains (biomedical, financial, government, scientific) are fed to classifiers ("M-Doctors"): decision trees, emerging patterns, SVMs, neural networks
Evaluation of a Classifier
Performance on independent blind test data
K-fold cross validation: divide the dataset into k even parts; in turn, k-1 parts are used for training and the remaining part is treated as test data
LOOCV (leave-one-out cross validation), a special case of k-fold CV with k equal to the number of samples
Accuracy, error rate
False positive rate, false negative rate, sensitivity, specificity, precision (a small sketch of these measures follows below)
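A minimal sketch of the measures listed above, computed from true vs predicted labels on a blind test set; the "P"/"N" label names are placeholders rather than anything from the original slides.

```python
# Accuracy, error rate, sensitivity, specificity, precision, FP/FN rates
# from true vs predicted labels (binary case, positive class 'P').
def evaluate(y_true, y_pred, pos="P", neg="N"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos and p == pos)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == neg and p == neg)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == neg and p == pos)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos and p == neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "error rate": (fp + fn) / len(y_true),
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # 1 - false positive rate
        "precision": tp / (tp + fp),
        "false positive rate": fp / (fp + tn),
        "false negative rate": fn / (fn + tp),
    }

# K-fold cross validation: split the data into k even parts; in turn, hold one
# part out as test data and train on the other k-1 parts. LOOCV is the special
# case where k equals the number of samples.
```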
Requirements of Biomedical Classification
High accuracy
High comprehensibility
Importance of Rule-Based Methods
Systematic selection of a small number of features used for decision making
Increases the comprehensibility of the knowledge patterns
C4.5 and CART are two commonly used rule induction algorithms, also called decision tree induction algorithms
Structure of Decision Trees
A tree consists of a root node, internal nodes, and leaf nodes; internal nodes test features (x_1, x_2, x_3, x_4, …) against thresholds (a_1, a_2, …) and leaf nodes carry class labels (A, B)
Each root-to-leaf path is a rule, e.g.: if x_1 > a_1 and x_2 > a_2, then it is class A
C4.5 and CART are two of the most widely used
Easy interpretation, but accuracy generally unattractive
Elegance of Decision Trees
(figure: a small decision tree with leaves labelled A and B)
Brief History of Decision Trees
CLS (Hunt et al., 1966): cost driven
ID3 (Quinlan, 1986, MLJ): information driven
C4.5 (Quinlan, 1993): gain ratio + pruning ideas
CART (Breiman et al., 1984): Gini index
A Simple Dataset
9 "Play" samples, 5 "Don't" samples, a total of 14
A Decision Tree
Root: outlook. If outlook = overcast, Play. If outlook = sunny, test humidity: <= 75 gives Play, > 75 gives Don't. If outlook = rain, test windy: false gives Play, true gives Don't.
Finding an optimal (smallest) decision tree is an NP-complete problem.
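Written out as code, the tree above becomes a set of nested if-then rules; this is just a transcription of the tree, using the feature names from the dataset.

```python
def classify(outlook, humidity, windy):
    # Each root-to-leaf path of the tree is one rule
    if outlook == "overcast":
        return "Play"
    if outlook == "sunny":
        return "Play" if humidity <= 75 else "Don't"
    # outlook == "rain"
    return "Don't" if windy else "Play"

print(classify("sunny", 70, True))   # instance 1 of the dataset -> Play
```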
Construction of a Decision Tree
Determination of the root node of the tree and the root nodes of its sub-trees
Most Discriminatory Feature
Every feature can be used to partition the training data
If the partitions contain a pure class of training instances, then this feature is the most discriminatory
Example of Partitions
Categorical feature
–The number of partitions of the training data is equal to the number of values of this feature
Numerical feature
–Two partitions (below and above a chosen split point)
Instance #  Outlook   Temp  Humidity  Windy  Class
1           Sunny     75    70        true   Play
2           Sunny     80    90        true   Don't
3           Sunny     85    85        false  Don't
4           Sunny     72    95        true   Don't
5           Sunny     69    70        false  Play
6           Overcast  72    90        true   Play
7           Overcast  83    78        false  Play
8           Overcast  64    65        true   Play
9           Overcast  81    75        false  Play
10          Rain      71    80        true   Don't
11          Rain      65    70        true   Don't
12          Rain      75    80        false  Play
13          Rain      68    80        false  Play
14          Rain      70    96        false  Play
Partition by Outlook (total 14 training instances):
Outlook = sunny: instances 1, 2, 3, 4, 5 with classes P, D, D, D, P
Outlook = overcast: instances 6, 7, 8, 9 with classes P, P, P, P
Outlook = rain: instances 10, 11, 12, 13, 14 with classes D, D, P, P, P
Partition by Temperature (total 14 training instances):
Temperature <= 70: instances 5, 8, 11, 13, 14 with classes P, P, D, P, P
Temperature > 70: instances 1, 2, 3, 4, 6, 7, 9, 10, 12 with classes P, D, D, D, P, P, P, D, P
Three Measures
Gini index
Information gain
Gain ratio
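For reference, the standard definitions of these measures (S is the set of training instances at a node, p_c the fraction of S in class c, and S_v the subset taking value v of the candidate feature F):

```latex
\[
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gini}(S) = 1 - \sum_{c} p_c^2
\]
\[
\mathrm{Gain}(S, F) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)
\qquad
\mathrm{GainRatio}(S, F) = \frac{\mathrm{Gain}(S, F)}{-\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}}
\]
```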
Steps of Decision Tree Construction
Select the best feature as the root node of the whole tree
After partitioning by this feature, select the best feature (w.r.t. the subset of training data) as the root node of each sub-tree
Recurse until the partitions become pure or almost pure
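A minimal sketch (not the authors' code) that computes the three measures for the Outlook split on the 14-instance dataset above:

```python
from collections import Counter
from math import log2

# (outlook, class) pairs for the 14 training instances in the table
data = [
    ("sunny", "P"), ("sunny", "D"), ("sunny", "D"), ("sunny", "D"), ("sunny", "P"),
    ("overcast", "P"), ("overcast", "P"), ("overcast", "P"), ("overcast", "P"),
    ("rain", "D"), ("rain", "D"), ("rain", "P"), ("rain", "P"), ("rain", "P"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = [y for _, y in data]
parts = {v: [y for x, y in data if x == v] for v in set(x for x, _ in data)}

# Information gain = entropy before the split - weighted entropy after it
gain = entropy(labels) - sum(len(p) / len(data) * entropy(p) for p in parts.values())

# Split information penalises features with many values; gain ratio = gain / split info
split_info = -sum(len(p) / len(data) * log2(len(p) / len(data)) for p in parts.values())
gain_ratio = gain / split_info

# Weighted Gini index of the partitions (lower is better)
gini_split = sum(len(p) / len(data) * gini(p) for p in parts.values())

print(f"gain={gain:.3f}  gain_ratio={gain_ratio:.3f}  gini={gini_split:.3f}")
# For 'outlook' this prints gain of about 0.247, matching the classic result.
```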
Characteristics of C4.5 Trees
Single coverage of the training data (elegance)
Divide-and-conquer splitting strategy
Fragmentation problem
Rules that are locally reliable but globally insignificant
Many globally significant rules are missed, which can mislead the system
Decision Tree Ensembles
Bagging
Boosting
Random forest
Randomization trees
CS4
Motivating Example
Suppose h_1, h_2, h_3 are independent classifiers, each with accuracy 60%; C_1 and C_2 are the only classes; and t is a test instance in C_1.
The committee predicts by majority vote: h(t) = argmax_{C in {C_1, C_2}} |{h_j in {h_1, h_2, h_3} : h_j(t) = C}|
Then prob(h(t) = C_1)
= prob(h_1(t)=C_1 & h_2(t)=C_1 & h_3(t)=C_1) + prob(h_1(t)=C_1 & h_2(t)=C_1 & h_3(t)=C_2) + prob(h_1(t)=C_1 & h_2(t)=C_2 & h_3(t)=C_1) + prob(h_1(t)=C_2 & h_2(t)=C_1 & h_3(t)=C_1)
= 60% * 60% * 60% + 60% * 60% * 40% + 60% * 40% * 60% + 40% * 60% * 60%
= 64.8%
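The same calculation generalises to any committee size via the binomial formula for a correct majority; this quick check is an addition, not part of the original slides.

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that a strict majority of k independent classifiers,
    each correct with probability p, votes for the true class."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(k // 2 + 1, k + 1))

print(majority_accuracy(0.6, 3))    # 0.648, the 64.8% computed above
print(majority_accuracy(0.6, 21))   # accuracy keeps improving as the committee grows
```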
Bagging
Proposed by Breiman (1996)
Also called bootstrap aggregating
Makes use of randomness injected into the training data
Main Ideas
From the original training set (e.g., 50 positives + 50 negatives), draw bootstrap samples such as 48 p + 52 n, 49 p + 51 n, 53 p + 47 n, …
A base inducer such as C4.5 is trained on each sample, giving a committee H of classifiers: h_1, h_2, …, h_k
Decision Making by Bagging
Given a new test sample T, every classifier in the committee casts one vote, and the majority class is predicted (equal voting)
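A minimal sketch of bagging, assuming scikit-learn's DecisionTreeClassifier as a stand-in for the C4.5 base inducer (the original work used C4.5 itself):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    rng = np.random.default_rng(seed)
    committee = []
    for _ in range(k):
        # Bootstrap sample: draw |X| instances with replacement
        idx = rng.integers(0, len(X), size=len(X))
        committee.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return committee

def bagging_predict(committee, T):
    # Equal voting: each tree casts one vote per test sample
    votes = np.array([h.predict(T) for h in committee])
    return np.array([max(set(col), key=list(col).count) for col in votes.T])
```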
Boosting
AdaBoost by Freund & Schapire (1995)
Also called adaptive boosting
Makes use of weighted instances and weighted voting
Main Ideas
Start with (say) 100 instances, all with equal weight; train a classifier h_1 and compute its error e_1
If the error is 0 or > 0.5, stop
Otherwise re-weight: multiply the weights of correctly classified instances by e_1/(1 - e_1), then renormalize
Train the next classifier h_2 on the instances with their new weights, compute its error, and repeat
Decision Making by AdaBoost.M1
Given a new test sample T, the committee votes with weights: classifier h_i votes with weight log(1/β_i), where β_i = e_i/(1 - e_i), and the class with the largest total weight is predicted
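A minimal sketch of AdaBoost.M1's re-weighting and weighted voting, assuming shallow scikit-learn trees as the weak learner; an illustration of the idea, not the exact implementation behind the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, rounds=10):
    w = np.ones(len(X)) / len(X)                 # start with equal weights
    committee = []
    for _ in range(rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = h.predict(X) != y
        err = w[wrong].sum()
        if err == 0 or err > 0.5:                # stopping rule from the slide
            break
        beta = err / (1 - err)
        w[~wrong] *= beta                        # down-weight correctly classified instances
        w /= w.sum()                             # renormalize
        committee.append((h, np.log(1 / beta)))  # classifier's voting weight = log(1/beta)
    return committee

def adaboost_m1_predict(committee, T):
    # Weighted voting over the committee
    classes = committee[0][0].classes_
    scores = sum(wt * (h.predict(T)[:, None] == classes) for h, wt in committee)
    return classes[np.argmax(scores, axis=1)]
```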
Bagging vs Boosting
Bagging
–The bagging classifiers are constructed independently of one another
–Equal voting
Boosting
–The construction of a new boosting classifier depends on the performance of the previous classifier, i.e. sequential construction (a series of classifiers)
–Weighted voting
Random Forest
Proposed by Breiman (2001)
Similar to bagging, but the base inducer is not the standard C4.5
Makes use of randomness twice: in the bootstrap samples and in the feature selection at each node
Main Ideas
As in bagging, bootstrap samples (48 p + 52 n, 49 p + 51 n, 53 p + 47 n, …) are drawn from the original training set (50 p + 50 n)
A base inducer (not standard C4.5 but a revised version) is trained on each sample, giving a committee H of classifiers: h_1, h_2, …, h_k
A Revised C4.5 as Base Classifier
When splitting a node, the best feature is selected not from the original n features but from m_try randomly chosen features
Decision Making by Random Forest
Given a new test sample T, each tree in the committee votes and the majority class is predicted (equal voting); a usage sketch follows below
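In practice the two layers of randomness map directly onto scikit-learn's RandomForestClassifier; a usage sketch with illustrative parameter values:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,      # size of the committee
    max_features="sqrt",   # m_try = sqrt(n) features examined at each node
    bootstrap=True,        # bagging-style resampling of the training set
)
# forest.fit(X_train, y_train); forest.predict(T) then aggregates the trees'
# decisions (scikit-learn averages class probabilities rather than counting raw votes).
```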
Randomization Trees
Proposed by Dietterich (2000)
Makes use of randomness in the selection of the best split point
Main Ideas
At a node, instead of always taking the single best split over the original n features, the candidate splits (e.g., feature 1: choices 1, 2, 3; feature 2: choices 1, 2; …; feature 8: choices 1, 2, 3) are ranked, and one of the 20 best candidates is selected at random
Equal voting on the committee of such decision trees
A small sketch of this random split selection follows below
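A minimal sketch of the randomised split choice; the helper arguments are hypothetical, and in practice the scoring function would be information gain or gain ratio on the node's data.

```python
import random

def choose_split(candidate_splits, score, top=20):
    """Pick a split at random from the `top` best-scoring candidates.

    candidate_splits: list of (feature, split point) pairs.
    score: a function rating each candidate, e.g. its information gain.
    """
    ranked = sorted(candidate_splits, key=score, reverse=True)
    return random.choice(ranked[:top])
```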
CS4
Proposed by Li et al. (2003)
CS4: Cascading-and-Sharing for decision trees
Does not make use of randomness
Main Ideas
Selection of root nodes is done in a cascading manner: the top-ranked feature becomes the root of tree 1, the second-ranked feature the root of tree 2, …, the k-th ranked feature the root of tree k, giving a total of k trees
Decision Making by CS4
Given a new test sample, the k trees vote, but not with equal weight (weighted voting)
Summary of Ensemble Classifiers
Bagging, Random Forest, AdaBoost.M1, Randomization Trees: their rules may not be correct when applied to the training data
CS4: its rules are correct on the training data
Any Questions?