1 CS2220: Computation Foundation in Bioinformatics
Limsoon Wong, Institute for Infocomm Research
Lecture slides for 29 January 2005
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
For written notes on this lecture, please read “Rule-Based Data Mining Methods for Classification Problems in Biomedical Domains”, a tutorial at PKDD04 by Jinyan Li and Limsoon Wong, September 2004. http://sdmc.i2r.a-star.edu.sg/~limsoon/talks/pkdd04/

2 Course Plan

3 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Outline
Overview of Supervised Learning
Decision Trees
Ensembles
–Bagging
–Boosting
–Random forest
–Randomization trees
–CS4
Other Methods
–K-Nearest Neighbour
–Support Vector Machines
–Bayesian Approach
–Hidden Markov Models
–Artificial Neural Networks

4 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Overview of Supervised Learning

5 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Computational Supervised Learning
Also called classification
Learn from past experience, and use the learned knowledge to classify new data
Knowledge learned by intelligent algorithms
Examples:
–Clinical diagnosis for patients
–Cell type classification

6 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Data
A classification application involves > 1 class of data. E.g.,
–Normal vs disease cells for a diagnosis problem
Training data is a set of instances (samples, points) with known class labels
Test data is a set of instances whose class labels are to be predicted

7 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Typical Notations
Training data {⟨x_1, y_1⟩, ⟨x_2, y_2⟩, …, ⟨x_m, y_m⟩}, where the x_j are n-dimensional vectors and the y_j are from a discrete space Y. E.g., Y = {normal, disease}.
Test data {⟨u_1, ?⟩, ⟨u_2, ?⟩, …, ⟨u_k, ?⟩}

8 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Process
Training data X with class labels Y → learn f(X), a classifier (a mapping, a hypothesis) → apply f to test data U to obtain predicted class labels f(U)

9 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Relational Representation of Gene Expression Data
An m × n matrix (x_ij): m samples (rows), each with a class label P or N, and n features (columns gene 1 … gene n, on the order of 1000 genes)

10 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Features
Also called attributes
Categorical features
–color = {red, blue, green}
Continuous or numerical features
–gene expression
–age
–blood pressure
Discretization converts continuous features into categorical ones

11 An Example Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

12 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Overall Picture of Supervised Learning
Data from many domains (biomedical, financial, government, scientific) is fed to classifiers (decision trees, emerging patterns, SVM, neural networks), which play a role akin to medical doctors making decisions

13 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Evaluation of a Classifier
Performance on independent blind test data
K-fold cross validation: divide the dataset into k roughly equal parts; in turn, k-1 parts are used for training and the remaining part is treated as test data
LOOCV (leave-one-out cross validation), a special case of k-fold CV with k equal to the number of instances
Accuracy, error rate
False positive rate, false negative rate, sensitivity, specificity, precision
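As a rough illustration of these evaluation ideas (an addition, not part of the original slides), here is a minimal Python sketch of k-fold cross validation and the listed measures; `train` and `predict` are hypothetical placeholders for any base classifier.

```python
# Minimal sketch of k-fold cross validation and the evaluation measures above.
# `train(X, y)` and `predict(model, x)` are stand-ins for any classifier.
import random

def k_fold_cv_accuracy(X, y, train, predict, k=10, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]                 # k roughly even parts
    correct = 0
    for i in range(k):
        train_idx = [j for f in folds[:i] + folds[i+1:] for j in f]
        model = train([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct += sum(predict(model, X[j]) == y[j] for j in folds[i])
    return correct / len(X)                               # LOOCV is the case k == len(X)

def confusion_measures(true, pred, positive):
    # (assumes both classes occur in `true` and `pred`)
    tp = sum(t == positive and p == positive for t, p in zip(true, pred))
    fp = sum(t != positive and p == positive for t, p in zip(true, pred))
    tn = sum(t != positive and p != positive for t, p in zip(true, pred))
    fn = sum(t == positive and p != positive for t, p in zip(true, pred))
    return {"accuracy":    (tp + tn) / len(true),
            "error rate":  (fp + fn) / len(true),
            "sensitivity": tp / (tp + fn),                # 1 - false negative rate
            "specificity": tn / (tn + fp),                # 1 - false positive rate
            "precision":   tp / (tp + fp)}
```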

14 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Requirements of Biomedical Classification High accuracy/sensitivity/specificity/precision High comprehensibility

15 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Importance of Rule-Based Methods
Systematic selection of a small number of features used for the decision making → increase the comprehensibility of the knowledge patterns
C4.5 and CART are two commonly used rule induction algorithms---a.k.a. decision tree induction algorithms

16 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Structure of Decision Trees
(Figure: a tree with a root node, internal nodes, and leaf nodes labelled A or B, splitting on features x_1, x_2, x_3, x_4 with tests such as x_1 > a_1 and x_2 > a_2)
If x_1 > a_1 & x_2 > a_2, then it’s A class
C4.5, CART, two of the most widely used
Easy interpretation, but accuracy generally unattractive

17 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Elegance of Decision Trees
(Figure: a decision tree with leaves labelled A and B)

18 Brief History of Decision Trees
CLS (Hunt et al. 1966) --- cost driven
ID3 (Quinlan, 1986) --- information driven
C4.5 (Quinlan, 1993) --- gain ratio + pruning ideas
CART (Breiman et al. 1984) --- Gini index

19 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
A Simple Dataset
9 Play samples, 5 Don’t; a total of 14

20 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
A Decision Tree
(Figure: root node splits on outlook; sunny leads to a humidity test (<= 75 Play, > 75 Don’t), overcast leads directly to Play, rain leads to a windy test (false Play, true Don’t); leaf counts 2, 3, 4, 3, 2)
Construction of a tree is equivalent to determination of the root node of the tree and the root node of its sub-trees

21 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Most Discriminatory Feature Every feature can be used to partition the training data If the partitions contain a pure class of training instances, then this feature is most discriminatory

22 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Example of Partitions Categorical feature –Number of partitions of the training data is equal to the number of values of this feature Numerical feature –Two partitions

23 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Instance #  Outlook   Temp  Humidity  Windy  Class
1           Sunny     75    70        true   Play
2           Sunny     80    90        true   Don’t
3           Sunny     85    85        false  Don’t
4           Sunny     72    95        true   Don’t
5           Sunny     69    70        false  Play
6           Overcast  72    90        true   Play
7           Overcast  83    78        false  Play
8           Overcast  64    65        true   Play
9           Overcast  81    75        false  Play
10          Rain      71    80        true   Don’t
11          Rain      65    70        true   Don’t
12          Rain      75    80        false  Play
13          Rain      68    80        false  Play
14          Rain      70    96        false  Play

24 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Total 14 training instances, partitioned by Outlook:
Outlook = sunny: instances 1,2,3,4,5 with classes P,D,D,D,P
Outlook = overcast: instances 6,7,8,9 with classes P,P,P,P
Outlook = rain: instances 10,11,12,13,14 with classes D,D,P,P,P

25 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Total 14 training instances, partitioned by Temperature:
Temperature <= 70: instances 5,8,11,13,14 with classes P,P,D,P,P
Temperature > 70: instances 1,2,3,4,6,7,9,10,12 with classes P,D,D,D,P,P,P,D,P

26 Steps of Decision Tree Construction
Select the “best” feature as the root node of the whole tree
After partitioning by this feature, select the best feature (wrt the subset of training data) as the root node of each sub-tree
Recurse until the partitions become pure or almost pure
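As a rough sketch of this recursive procedure (an illustration only, not the exact C4.5 or CART algorithm, and limited to categorical features), the following outline uses a `score_split` function that can be any of the three measures introduced on the next slides.

```python
# Sketch of recursive decision tree construction for categorical features.
# `rows` is a list of dicts (feature -> value); `score_split(rows, labels, f)`
# returns how discriminatory feature f is (e.g. information gain of its partition).
from collections import Counter

def build_tree(rows, labels, features, score_split, min_purity=1.0):
    counts = Counter(labels)
    majority, majority_n = counts.most_common(1)[0]
    if not features or majority_n / len(labels) >= min_purity:
        return {"leaf": majority}                          # pure (or almost pure) partition
    best = max(features, key=lambda f: score_split(rows, labels, f))
    branches = {}
    for value in {r[best] for r in rows}:                  # one branch per value of the best feature
        keep = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     [f for f in features if f != best],
                                     score_split, min_purity)
    return {"split_on": best, "branches": branches, "default": majority}
```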

27 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Three Measures to Evaluate Which Feature is Best Gini index Information gain Information gain ratio

28 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Gini Index

29 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Information Gain

30 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Information Gain Ratio
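The formulas on slides 28-30 appear only as images in the original deck. As a textbook-style sketch (an addition, not the slides' own material), the three measures can be computed in Python and checked against the Outlook partition of the weather data above:

```python
# Gini index, information gain, and information gain ratio for a candidate split.
from collections import Counter
from math import log2

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, partitions):
    n = len(parent)
    return entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)

def gain_ratio(parent, partitions):
    n = len(parent)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions)
    return information_gain(parent, partitions) / split_info

labels = ["P","D","D","D","P", "P","P","P","P", "D","D","P","P","P"]   # instances 1-14
by_outlook = [labels[0:5], labels[5:9], labels[9:14]]                   # sunny / overcast / rain
print(round(gini(labels), 3))                           # 0.459 for the whole set
print(round(information_gain(labels, by_outlook), 3))   # 0.247 bits
print(round(gain_ratio(labels, by_outlook), 3))         # 0.156
```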

31 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Characteristics of C4.5 Trees
Single coverage of training data (elegance)
Divide-and-conquer splitting strategy
Fragmentation problem → locally reliable but globally insignificant rules
→ Missing many globally significant rules; misleads the system

32 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Example Use of Decision Tree Methods: Proteomics Approaches to Biomarker Discovery In prostate and bladder cancers (Adam et al. Proteomics, 2001) In serum samples to detect breast cancer (Zhang et al. Clinical Chemistry, 2002) In serum samples to detect ovarian cancer (Petricoin et al. Lancet; Li & Rao, PAKDD 2004).

33 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Decision Tree Ensembles Bagging Boosting Random forest Randomization trees CS4

34 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Motivating Example
h_1, h_2, h_3 are indep classifiers w/ accuracy = 60%
C_1, C_2 are the only classes
t is a test instance in C_1
h(t) = argmax_{C ∈ {C_1, C_2}} |{h_j ∈ {h_1, h_2, h_3} : h_j(t) = C}|   (majority vote)
Then prob(h(t) = C_1)
= prob(h_1(t)=C_1 & h_2(t)=C_1 & h_3(t)=C_1) + prob(h_1(t)=C_1 & h_2(t)=C_1 & h_3(t)=C_2) + prob(h_1(t)=C_1 & h_2(t)=C_2 & h_3(t)=C_1) + prob(h_1(t)=C_2 & h_2(t)=C_1 & h_3(t)=C_1)
= 60% * 60% * 60% + 60% * 60% * 40% + 60% * 40% * 60% + 40% * 60% * 60%
= 64.8%
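The 64.8% figure can be checked directly; a small illustrative sketch (not from the slides; `math.comb` requires Python 3.8+):

```python
# Probability that a majority of n independent classifiers, each with accuracy p,
# gives the right answer -- the motivation for committees of classifiers.
from math import comb

def majority_accuracy(n, p=0.6):
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

print(majority_accuracy(3))     # 0.648, as computed above
print(majority_accuracy(101))   # roughly 0.98: larger committees of independent classifiers help
```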

35 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Bagging
Proposed by Breiman (1996)
Also called Bootstrap aggregating
Makes use of randomness injected into the training data

36 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Main Ideas
From the original training set (e.g. 50 p + 50 n), draw bootstrap samples of 100 instances with replacement (e.g. 48 p + 52 n, 49 p + 51 n, 53 p + 47 n, …)
Apply a base inducer such as C4.5 to each sample
This yields a committee H of classifiers: h_1, h_2, …, h_k

37 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Decision Making by Bagging
Given a new test sample T, every classifier in the committee casts one vote and T is assigned the majority class (equal voting)
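A minimal sketch of this procedure (illustrative only; `train_tree` is an assumed base inducer that returns objects with a `predict` method):

```python
# Bagging: bootstrap samples -> committee of classifiers -> equal voting.
import random
from collections import Counter

def bagging_fit(X, y, train_tree, k=50, seed=0):
    rng = random.Random(seed)
    committee = []
    for _ in range(k):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # draw with replacement
        committee.append(train_tree([X[i] for i in idx], [y[i] for i in idx]))
    return committee

def bagging_predict(committee, t):
    votes = Counter(h.predict(t) for h in committee)           # equal voting
    return votes.most_common(1)[0][0]
```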

38 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Boosting AdaBoost by Freund & Schapire (1995) Also called Adaptive Boosting Make use of weighted instances and weighted voting

39 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Main Ideas
Start with 100 instances with equal weight; train a classifier h_1 and compute its error
If the error is 0 or > 0.5, stop
Otherwise re-weight the correctly classified instances by e_1/(1-e_1) and renormalize the total weight to 1, so that the samples misclassified by h_i gain relative weight
Train the next classifier h_2 on the 100 instances with these different weights, and repeat

40 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Decision Making by AdaBoost.M1
Given a new test sample T, the committee classifies it by weighted voting: classifiers with lower training error receive larger voting weights
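A hedged sketch of the AdaBoost.M1 loop described above (assuming a base inducer `train_weighted(X, y, w)` that accepts instance weights; names are illustrative):

```python
# AdaBoost.M1: re-weight instances after each round; final decision by weighted voting.
from collections import defaultdict
from math import log

def adaboost_fit(X, y, train_weighted, rounds=10):
    m = len(X)
    w = [1.0 / m] * m                            # equal weights to start
    committee = []                               # (classifier, voting weight) pairs
    for _ in range(rounds):
        h = train_weighted(X, y, w)
        wrong = [h.predict(x) != t for x, t in zip(X, y)]
        error = sum(wi for wi, bad in zip(w, wrong) if bad)
        if error == 0 or error >= 0.5:           # stop condition from the slide
            break
        beta = error / (1 - error)
        w = [wi * (1 if bad else beta) for wi, bad in zip(w, wrong)]  # shrink correct instances
        total = sum(w)
        w = [wi / total for wi in w]             # renormalize total weight to 1
        committee.append((h, log(1 / beta)))     # lower-error classifiers get larger voting weight
    return committee

def adaboost_predict(committee, t):
    votes = defaultdict(float)
    for h, alpha in committee:
        votes[h.predict(t)] += alpha             # weighted voting
    return max(votes, key=votes.get)
```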

41 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Bagging vs Boosting
Bagging
–Bagging classifiers are constructed independently of one another
–Equal voting
Boosting
–Construction of a new Boosting classifier depends on the performance of its previous classifier, i.e. sequential construction (a series of classifiers)
–Weighted voting

42 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Random Forest
Proposed by Breiman (2001)
Similar to Bagging, but the base inducer is not the standard C4.5
Makes use of randomness twice: in the bootstrap samples and in the feature selection at each node

43 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Main Ideas
As in Bagging, draw bootstrap samples from the original training set (50 p + 50 n → 48 p + 52 n, 49 p + 51 n, 53 p + 47 n, …)
Apply a base inducer to each sample (not C4.5, but a revised version of it)
This yields a committee H of classifiers: h_1, h_2, …, h_k

44 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
A Revised C4.5 as Base Classifier
At each node (starting from the root), the split feature is not selected from all of the original n features, but from m_try features chosen at random

45 Decision Making by Random Forest
Given a new test sample T, each tree in the forest votes and T is assigned the class receiving the most votes (equal voting)
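Random forest combines the bagging sketch above with a modified split selection; a rough illustration of the extra piece (with `score_split` as in the earlier tree-building sketch, and `m_try` commonly set near the square root of n):

```python
# At every node of a random-forest tree, the best split is chosen only among
# m_try randomly selected features; otherwise training mirrors bagging, and
# prediction is again by equal voting over the committee of trees.
import random

def choose_split_feature(rows, labels, features, score_split, m_try, rng=random):
    candidates = rng.sample(features, min(m_try, len(features)))
    return max(candidates, key=lambda f: score_split(rows, labels, f))
```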

46 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Randomization Trees Proposed by Dietterich (2000) Make use of randomness in the selection of the best split point

47 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Main Ideas
At the root node (and other nodes), instead of always taking the single best split, select one at random from a pool of candidate splits over the original n features, e.g. 20 candidates such as {feature 1: choices 1, 2, 3; feature 2: choices 1, 2, …; feature 8: choices 1, 2, 3}
Equal voting on the committee of such decision trees

48 CS4 Proposed by Li et al (2003) CS4: Cascading and Sharing for decision trees Doesn’t make use of randomness

49 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Main Ideas
Selection of root nodes is done in a cascading manner: the top-ranked feature 1 becomes the root node of tree-1, feature 2 the root node of tree-2, …, feature k the root node of tree-k, for a total of k trees

50 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Decision Making by CS4
Not equal voting

51 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Summary of Ensemble Classifiers
Bagging, Random Forest, AdaBoost.M1: rules may not be correct when applied to training data
Randomization Trees, CS4: rules correct

52 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Any Question?

53 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Other Machine Learning Approaches K-Nearest Neighbour Support Vector Machine Bayesian Approach Hidden Markov Model Artificial Neural Networks

54 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Outline K-Nearest Neighbour Support Vector Machines Bayesian Approach Hidden Markov Models Artificial Neural Networks

55 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong K-Nearest Neighbours

56 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
How kNN Works
Given a new case
Find k “nearest” neighbours, i.e., k most similar points in the training data set
Assign the new case to the same class to which most of these neighbours belong
A common “distance” measure betw samples x and y sums a per-feature difference, where f ranges over the features of the samples (e.g. squared differences for Euclidean distance)
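A minimal kNN sketch (the exact distance formula on the slide is an image in the original; Euclidean distance is assumed here as one common choice):

```python
# k-nearest-neighbour classification with Euclidean distance and majority vote.
from collections import Counter
from math import sqrt

def distance(x, y):
    return sqrt(sum((xf - yf) ** 2 for xf, yf in zip(x, y)))   # sum over features

def knn_predict(train_X, train_y, new_case, k=8):
    nearest = sorted(range(len(train_X)),
                     key=lambda i: distance(train_X[i], new_case))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]            # class of most of the k neighbours
```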

57 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Illustration of kNN (k=8)
(Figure: the neighbourhood of the query point contains 5 points of one class and 3 of the other. Image credit: Zaki)

58 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Some Issues
Simple to implement
But need to compare new case against all training cases → may be slow during prediction
No need to train
But need to design distance measure properly → may need expert for this
Can’t explain prediction outcome → can’t provide a model of the data

59 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Example Use of kNN: Segmentation of White Matter Lesions in MRI
Anbeek et al, NeuroImage 21:1037-1044, 2004
Use kNN for automated segmentation of white matter lesions in cranial MR images
Rely on info from T1-weighted, inversion recovery, proton density-weighted, T2-weighted, & fluid attenuation inversion recovery scans

60 Example Use of kNN: Ovarian Cancer Diagnosis Based on SELDI Proteomic Data Li et al, Bioinformatics 20:1638-1640, 2004 Use kNN to diagnose ovarian cancers using proteomic spectra Data set is from Petricoin et al., Lancet 359:572-577, 2002 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

61 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Example Use of kNN: Prediction of Compound Signature Based on Gene Expression Profiles
Hamadeh et al, Toxicological Sciences 67:232-240, 2002
Store gene expression profiles corr to biological responses to exposures to known compounds (e.g. enzyme inducers, peroxisome proliferators) whose toxicological and pathological endpoints are well characterized
Use kNN to infer effects of an unknown compound based on the gene expr profiles induced by it

62 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong Support Vector Machines

63 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Basic Idea
(a) Linear separation not possible w/o errors
(b) Better separation by nonlinear surfaces in input space
(c) Nonlinear surface corr to linear surface in feature space. Map from input to feature space by “kernel” function Φ
→ “Linear learning machine” + kernel function as classifier
Image credit: Zien

64 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Linear Learning Machines
Hyperplane separating the x’s and o’s points is given by (W·X) + b = 0, with (W·X) = Σ_j W[j]*X[j]
→ Decision function is llm(X) = sign((W·X) + b)

65 Linear Learning Machines
Solution is a linear combination of training points X_k with labels Y_k:
W[j] = Σ_k α_k*Y_k*X_k[j], with α_k > 0 and Y_k = ±1
→ llm(X) = sign(Σ_k α_k*Y_k*(X_k·X) + b)
The “data” appears only in the dot product!

66 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Kernel Function
llm(X) = sign(Σ_k α_k*Y_k*(X_k·X) + b)
svm(X) = sign(Σ_k α_k*Y_k*(Φ(X_k)·Φ(X)) + b)
→ svm(X) = sign(Σ_k α_k*Y_k*K(X_k,X) + b), where K(X_k,X) = Φ(X_k)·Φ(X)

67 Kernel Function
svm(X) = sign(Σ_k α_k*Y_k*K(X_k,X) + b)
→ K(A,B) can be computed w/o computing Φ
In fact replace it w/ lots of more “powerful” kernels besides (A·B). E.g.,
–K(A,B) = (A·B)^d
–K(A,B) = exp(– ||A – B||² / (2*σ²)), ...

68 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
How SVM Works
svm(X) = sign(Σ_k α_k*Y_k*K(X_k,X) + b)
To find the α_k is a quadratic programming problem
max: Σ_k α_k – 0.5 * Σ_k Σ_h α_k*α_h*Y_k*Y_h*K(X_k,X_h)
subject to: Σ_k α_k*Y_k = 0 and, for all k, C ≥ α_k ≥ 0
To find b, estimate by averaging Y_h – Σ_k α_k*Y_k*K(X_h,X_k) over all h with α_h ≠ 0
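A sketch of the SVM decision function above (illustrative only): the α_k and b are assumed to come from the quadratic programming step (a QP solver or an off-the-shelf SVM package would supply them), and the RBF kernel is one of the example kernels from slide 67.

```python
# svm(X) = sign( sum_k alpha_k * Y_k * K(X_k, X) + b )
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svm_decision(x, support_X, support_y, alphas, b, kernel=rbf_kernel):
    s = sum(a_k * y_k * kernel(x_k, x)
            for a_k, y_k, x_k in zip(alphas, support_y, support_X))
    return 1 if s + b >= 0 else -1               # the sign of the decision value
```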

69 Example Use of SVM: Prediction of Protein- Protein Interaction Sites From Sequences Koike et al, Protein Engineering Design & Selection 17:165-173, 2004 Identification of protein- protein interaction sites is impt for mutant design & prediction of protein- protein networks Interaction sites were predicted here using SVM & profiles of sequentially/spatially neighbouring residues Legend: green=TP, white=TN, yellow=FN, red=FP A : human macrophage migration inhibitory factor B & C: the binding proteins Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

70 Example Use of SVM: Prediction of Gene Function From Gene Expression
Brown et al., PNAS 97:262-267, 2000
Use SVM to identify sets of genes w/ a common function based on their expression profiles
Use SVM to predict functional roles of uncharacterized yeast ORFs based on their expression profiles
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

71 Example Use of SVM: Recognition of Protein Translation Initiation Sites Zien et al., Bioinformatics 16:799-807, 2000 Use SVM to recognize protein translation initiation sites from genomic sequences Raw data set is same as Liu & Wong, JBCB 1:139-168, 2003 TIS Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

72 Bayesian Approach

73 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Bayes Theorem
P(h) = prior prob that hypothesis h holds
P(d|h) = prob of observing data d given h holds
P(h|d) = posterior prob that h holds given observed data d
Bayes Theorem: P(h|d) = P(d|h) * P(h) / P(d)

74 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Bayesian Approach
Let H be all possible classes. Given a test instance w/ feature vector {f_1 = v_1, …, f_n = v_n}, the most probable classification is given by
h_MAP = argmax_{h_j ∈ H} P(h_j | f_1=v_1, …, f_n=v_n)
Using Bayes Theorem, this rewrites to
h_MAP = argmax_{h_j ∈ H} P(f_1=v_1, …, f_n=v_n | h_j) * P(h_j) / P(f_1=v_1, …, f_n=v_n)
Since the denominator is indep of h_j, this simplifies to
h_MAP = argmax_{h_j ∈ H} P(f_1=v_1, …, f_n=v_n | h_j) * P(h_j)

75 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Naïve Bayes
But estimating P(f_1=v_1, …, f_n=v_n | h_j) accurately may not be feasible unless the training data set is sufficiently large
“Solved” by assuming f_1, …, f_n are indep, so that P(f_1=v_1, …, f_n=v_n | h_j) = Π_i P(f_i=v_i | h_j)
Then the predicted class is argmax_{h_j ∈ H} P(h_j) * Π_i P(f_i=v_i | h_j), where P(h_j) and P(f_i=v_i | h_j) can often be estimated reliably from a typical training data set
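A minimal naïve Bayes sketch for categorical features, following the formula above (an illustration only; probabilities are estimated by plain counting, and in practice some smoothing is usually added for unseen values):

```python
# Naive Bayes: pick the class h maximizing P(h) * prod_i P(f_i = v_i | h).
from collections import Counter, defaultdict

def nb_fit(X, y):
    priors = Counter(y)                                   # class counts for P(h)
    cond = defaultdict(Counter)                           # (class, feature index) -> value counts
    for row, h in zip(X, y):
        for i, v in enumerate(row):
            cond[(h, i)][v] += 1
    return priors, cond, len(y)

def nb_predict(model, row):
    priors, cond, n = model
    def score(h):
        p = priors[h] / n                                 # P(h)
        for i, v in enumerate(row):
            p *= cond[(h, i)][v] / priors[h]              # P(f_i = v_i | h); zero if value unseen
        return p
    return max(priors, key=score)
```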

76 Example Use of Bayesian: Design of Screens for Macromolecular Crystallization Hennessy et al., Acta Cryst D56:817-827, 2000 Xtallization of proteins requires search of expt settings to find right conditions for diffraction- quality xtals BMCD is a db of known xtallization conditions Use Bayes to determine prob of success of a set of expt conditions based on BMCD Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

77 Hidden Markov Models

78 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
How HMM Works
HMM is a stochastic generative model for sequences
Defined by
–finite set of states S
–finite alphabet A
–transition prob matrix T
–emission prob matrix E
Move from state to state according to T while emitting symbols according to E

79 How HMM Works
In an nth order HMM, T & E depend on all n previous states
E.g., for a 1st order HMM, given emissions X = x_1, x_2, … and states S = s_1, s_2, …, the prob of this seq is Prob(X, S) = P(s_1) E(s_1, x_1) * Π_{i>1} T(s_{i-1}, s_i) E(s_i, x_i)
If the seq of emissions X is given, use the Viterbi algo to get the seq of states S such that S = argmax_S Prob(X, S)
If the model parameters (T & E) are unknown, estimate them with the Baum-Welch algo

80 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Example: Dishonest Casino
Casino has two dice:
–Fair dice: P(i) = 1/6, i = 1..6
–Loaded dice: P(i) = 1/10 for i = 1..5; P(6) = 1/2
Casino switches betw fair & loaded die with prob 1/2. Initially, the dice is always fair
Game:
–You bet $1
–You roll
–Casino rolls
–Highest number wins $2
Question: Suppose we played 2 games, and the sequence of rolls was 1, 6, 2, 6. Were we likely to be cheated?

81 “Visualization” of Dishonest Casino Copyright © 2004 by Jinyan Li and Limsoon Wong

82 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong 1, 6, 2, 6? We were probably cheated...
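A small Viterbi sketch for the dishonest casino (an illustration only; it takes the casino's own rolls in the two games above to be the two 6s):

```python
# Viterbi algorithm: most probable state path S = argmax_S Prob(X, S) for a 1st order HMM.
def viterbi(obs, states, start_p, trans_p, emit_p):
    v = {s: start_p[s] * emit_p[s][obs[0]] for s in states}   # best path prob ending in state s
    back = []
    for o in obs[1:]:
        prev_v, v, ptr = v, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev_v[r] * trans_p[r][s])
            v[s] = prev_v[best] * trans_p[best][s] * emit_p[s][o]
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: v[s])]                   # trace the best path backwards
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

states  = ["fair", "loaded"]
start_p = {"fair": 1.0, "loaded": 0.0}                         # initially the dice is always fair
trans_p = {"fair": {"fair": 0.5, "loaded": 0.5},               # switch with prob 1/2
           "loaded": {"fair": 0.5, "loaded": 0.5}}
emit_p  = {"fair":   {i: 1/6 for i in range(1, 7)},
           "loaded": {**{i: 1/10 for i in range(1, 6)}, 6: 1/2}}

print(viterbi([6, 6], states, start_p, trans_p, emit_p))       # ['fair', 'loaded'] -- probably cheated
```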

83 Example Use of HMM: Protein Families Modelling
Baldi et al., PNAS 91:1059-1063, 1994
HMM is used to model families of biological sequences, such as kinases, globins, & immunoglobulins
Bateman et al., NAR 32:D138-D141, 2004
HMM is used to model 6190 families of protein domains in Pfam
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

84 Example Use of HMM: Gene Finding in Bacterial Genomes Borodovsky et al., NAR 23:3554-3562, 1995 Investigated statistical features of 3 classes (wrt level of codon usage bias) of E. coli genes HMM for nucleotide sequences of each class was developed Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

85 Artificial Neural Networks

86 What are ANNs? ANNs are highly connected networks of “neural computing elements” that have ability to respond to input stimuli and learn to adapt to the environment... Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

87 Computing Element
Behaves as a monotone function y = f(net), where net is the cumulative input stimuli to the neuron
net is usually defined as a weighted sum of the inputs
f is usually a sigmoid
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

88 How ANN Works
Computing elements are connected into layers in a network
The network is used for classification as follows:
–Inputs x_i are fed into the input layer
–each computing element produces its corr output
–which are fed as inputs to the next layer, and so on
–until outputs are produced at the output layer
What makes ANN work is how the weights on the links are learned
Usually achieved using “back propagation”
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

89 Back Propagation
v_ji = weight on link betw x_i and the jth computing element in the 1st layer
w_j = weight of link betw the jth computing element in the 1st layer and the computing element in the last layer
z_j = output of the jth computing element in the 1st layer
Then z_j = f(Σ_i v_ji * x_i) and y = f(Σ_j w_j * z_j)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

90 Back Propagation
For a given sample, y may differ from the target output t by some amount
→ Need to propagate this error backwards by adjusting weights in proportion to the error gradient
For math convenience, define the squared error as E = 0.5 * (t – y)²
→ To find an expression for the weight adjustment, we differentiate E wrt v_ji and w_j to obtain the error gradients for these weights
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

91 Applying the chain rule a few times and recalling the definitions of y, z_j, E, and f (a sigmoid, so f' = f(1 – f)), we derive the error gradients ∂E/∂w_j = –(t – y) y (1 – y) z_j and ∂E/∂v_ji = –(t – y) y (1 – y) w_j z_j (1 – z_j) x_i ...

92 Back Propagation
(Figure: the network with weights v_ji, w_j and hidden outputs z_j, annotated with the derived weight-update formulas)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
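A rough numpy sketch of the network on slides 89-92 and one back-propagation step (sigmoid units, squared error E = 0.5*(t - y)²; an illustration of the derived gradients, not a transcription of the slide's own formulas):

```python
# One hidden layer: z = f(V x), y = f(w . z); weights adjusted along the error gradient.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, V, w):
    z = sigmoid(V @ x)                                # hidden outputs z_j (V[j, i] = v_ji)
    y = sigmoid(w @ z)                                # single network output y
    return z, y

def backprop_step(x, t, V, w, eta=0.1):
    z, y = forward(x, V, w)
    delta_out = (t - y) * y * (1 - y)                 # error gradient at the output
    delta_hidden = delta_out * w * z * (1 - z)        # error propagated back to the hidden layer
    w = w + eta * delta_out * z                       # adjust w_j in proportion to the gradient
    V = V + eta * np.outer(delta_hidden, x)           # adjust v_ji
    return V, w, 0.5 * (t - y) ** 2                   # updated weights and current squared error
```

Repeating backprop_step over the training samples for many epochs is the usual training loop.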

93 Example Use of ANN: T-Cell Epitopes Prediction Honeyman et al., Nature Biotechnology 16:966-969, 1998 Use ANN to predict candidate T-cell epitopes Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

94 Any Question?

95 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
References
L. Breiman, et al. Classification and Regression Trees. Wadsworth and Brooks, 1984
L. Breiman, “Bagging predictors”. Machine Learning, 24:123--140, 1996
L. Breiman, “Random forests”. Machine Learning, 45:5--32, 2001
J. R. Quinlan, “Induction of decision trees”. Machine Learning, 1:81--106, 1986
J. R. Quinlan, C4.5: Program for Machine Learning. Morgan Kaufmann, 1993
C. Gini, “Measurement of inequality of incomes”, The Economic Journal, 31:124--126, 1921

96 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
References
Y. Freund, et al. “Experiments with a new boosting algorithm”. ICML 1996, pages 148--156
T. G. Dietterich, “An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization”. Machine Learning, 40:139--157, 2000
J. Li, et al. “Ensembles of cascading trees”. ICDM 2003, pages 585--588

97 Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong 2-4pm every Friday at LT33

