Presentation is loading. Please wait.

Presentation is loading. Please wait.

ISQS 6347, Data & Text Mining1 Ensemble Methods. ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data.

Similar presentations


Presentation on theme: "ISQS 6347, Data & Text Mining1 Ensemble Methods. ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data."— Presentation transcript:

1 ISQS 6347, Data & Text Mining1 Ensemble Methods

2 ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made by multiple classifiers

3 ISQS 6347, Data & Text Mining 3 General Idea

4 ISQS 6347, Data & Text Mining 4 Why does it work? Suppose there are 25 base classifiers  Each classifier has error rate,  = 0.35  Assume classifiers are independent  Probability that the ensemble classifier makes a wrong prediction:

5 ISQS 6347, Data & Text Mining 5 Combined Ensemble Models Training Data Sample 1 Score Data Modeling Method Ensemble Model (Average) Model 1 Model 2 Model 3 Sample 2 Sample 3

6 ISQS 6347, Data & Text Mining 6 Combined Ensemble Models Training Data Score Data Modeling Method A Ensemble Model (Average) Model A Model B Modeling Method B

7 ISQS 6347, Data & Text Mining 7 Examples of Ensemble Methods How to generate an ensemble of classifiers?  Bagging  Boosting

8 ISQS 6347, Data & Text Mining 8 Bagging Sampling with replacement Build classifier on each bootstrap sample Each sample has probability (1 – 1/n) n of being selected The probability an observation is not selected is 1 - (1 – 1/n) n. When n is large enough, (1 – 1/n) n  1/e. So the probability is 1 – 1/e  0.632

9 ISQS 6347, Data & Text Mining 9 Boosting An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records  Initially, all N records are assigned equal weights  Unlike bagging, weights may change at the end of boosting round

10 ISQS 6347, Data & Text Mining 10 Boosting Records that are wrongly classified will have their weights increased Records that are classified correctly will have their weights decreased Example 4 is hard to classify Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds


Download ppt "ISQS 6347, Data & Text Mining1 Ensemble Methods. ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data."

Similar presentations


Ads by Google