
Classifier Ensembles: Facts, Fiction, Faults and Future Ludmila I Kuncheva School of Computer Science Bangor University, Wales, UK.



2 Classifier Ensembles: Facts, Fiction, Faults and Future Ludmila I Kuncheva School of Computer Science Bangor University, Wales, UK

3 1. Facts

4 Classifier ensembles [Diagram: feature values (object description) → classifier → class label]

5 Classifier ensembles [Diagram: feature values (object description) → classifiers → “combiner” → class label]

6 Classifier ensembles [Diagram: feature values (object description) → classifier, a neural network, classifier → combiner → class label] ensemble?

7 Classifier ensembles [Diagram: feature values (object description) → classifiers → combiner → class label] ensemble? a fancy combiner?

8 Classifier ensembles [Diagram: feature values (object description) → classifiers → combiner → class label] classifier? a fancy feature extractor?
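To make the picture concrete: an ensemble is just several base classifiers, each mapping the same feature values to a class label, plus a combiner that merges their outputs. A minimal sketch of that structure (illustrative only, assuming scikit-learn is available; the iris data and the three base models are placeholders, and the combiner is a plain majority vote):

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base classifiers: each maps the feature values (object description) to a class label.
base = [DecisionTreeClassifier(random_state=0), GaussianNB(), KNeighborsClassifier(3)]
for clf in base:
    clf.fit(X_tr, y_tr)

def combiner(labels):
    """The 'combiner': a plain majority vote over the predicted labels."""
    return Counter(labels).most_common(1)[0][0]

# The ensemble, seen from outside, is itself a classifier: features in, one label out.
votes = [clf.predict(X_te) for clf in base]
ensemble_pred = [combiner(col) for col in zip(*votes)]
accuracy = sum(int(p == t) for p, t in zip(ensemble_pred, y_te)) / len(y_te)
print(f"majority-vote ensemble accuracy: {accuracy:.3f}")
```

Seen from outside, the whole construction is again a single classifier, which is exactly the “ensemble or just a fancy combiner?” question the slides above are poking at.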

9 Why classifier ensembles then?
a. because we like to complicate entities beyond necessity (anti-Occam’s razor)
b. because we are lazy and stupid and can’t be bothered to design and train one single sophisticated classifier
c. because democracy is so important to our society, it must be important to classification

10 Classifier ensembles Juan : “I just like to combine things…”

11 Classifier ensembles Juan : “I just like combining things…”

12 Classifier ensembles
combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
committees of neural networks [Bishop95,Drucker94]
consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94,Filippi94,Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81,Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.
(slide annotations mark the “fanciest” and the “oldest” entries)

13 The method of collective recognition. Moscow, Energoizdat, 1981. ≈ 1 c

14 [Diagram: classifier ensemble; classifier selection (regions of competence); weighted majority vote]

15 Collective statistical decisions in [pattern] recognition Moscow, Radio I svyaz’, 1983

16 weighted majority vote
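Both of these early books lean on the weighted majority vote. As a quick illustration (not taken from the books), here is a sketch in which each classifier's vote is weighted by log(p/(1-p)), the textbook weighting that is optimal for independent classifiers with individual accuracies p:

```python
import numpy as np

def weighted_majority_vote(label_matrix, accuracies, classes):
    """label_matrix: (L, n) array of predicted labels from L classifiers for n objects."""
    p = np.asarray(accuracies, dtype=float)
    weights = np.log(p / (1.0 - p))            # one weight per classifier
    scores = np.zeros((len(classes), label_matrix.shape[1]))
    for c_idx, c in enumerate(classes):
        # each classifier adds its weight to the class it voted for
        scores[c_idx] = ((label_matrix == c) * weights[:, None]).sum(axis=0)
    return np.asarray(classes)[scores.argmax(axis=0)]

# three classifiers, five objects, binary labels (illustrative numbers)
preds = np.array([[0, 1, 1, 0, 1],
                  [0, 0, 1, 1, 1],
                  [1, 1, 1, 0, 0]])
print(weighted_majority_vote(preds, accuracies=[0.9, 0.7, 0.6], classes=[0, 1]))
```

The plain majority vote is the special case of equal weights.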

17 This superb graph was borrowed from “Fuzzy models and digital signal processing (for pattern recognition): Is this a good marriage?”, Digital Signal Processing, 3, 1993, 253-270, by my good friend Jim Bezdek. [Graph: Expectation over time (1965, 1970, 1975, 1980, 1985, 1993): Naive euphoria → Peak of hype → Overreaction to immature technology → Depth of cynicism → True user benefit → Asymptote of reality]

18 [The same expectation curve, 1965-1993: Naive euphoria, Peak of hype, Overreaction to immature technology, Depth of cynicism, True user benefit, Asymptote of reality] So where are we?

19 [The expectation curve redrawn for classifier ensembles, 1978-2008, with five candidate positions (1-5) marked along it] So where are we?

20 To make the matter worse...
Expert 1: J. Ghosh. Forum: 3rd International Workshop on Multiple Classifier Systems, 2002 (invited lecture). Quote: “... our current understanding of ensemble-type multiclassifier systems is now quite mature...”
Expert 2: T.K. Ho. Forum: invited book chapter, 2002. Quote: “Many of the above questions are there because we do not yet have a scientific understanding of the classifier combination mechanisms”
half full / half empty

21 Number of publications (13 Nov 2008; counts for 2008 incomplete) for the search terms: 1. Classifier ensembles; 2. AdaBoost – (1); 3. Random Forest – (1) – (2); 4. Decision Templates – (1) – (2) – (3)

22 Literature. “One cannot embrace the unembraceable.” Kozma Prutkov. [The same publication-count chart: 1. Classifier ensembles; 2. AdaBoost – (1); 3. Random Forest – (1) – (2); 4. Decision Templates – (1) – (2) – (3); 13 Nov 2008, counts for 2008 incomplete]

23 [Word map: ICPR 2008, 984 papers, ~2000 words in the titles, plotted on the first 2 principal components; prominent words include image, segment, feature, video, track, object, local, select, and classifier ensembles]

24 [The expectation curve again, 1978-2008, positions 1-5] So where are we? still here… somewhere…

25 2. Fiction

26 Fiction? Diversity: diverse ensembles are better ensembles? Diversity = independence? AdaBoost: “the best off-the-shelf classifier”?

27 Minority Report: a science fiction short story by Philip K. Dick, first published in 1956. It is about a future society where murders are prevented through the efforts of three mutants (“precogs”) who can see two weeks ahead into the future. Each of the three “precogs” generates its own report or prediction. The three reports are analysed by a computer. If these reports differ from one another, the computer identifies the two reports with the greatest overlap and produces a “majority report,” taking this as the accurate prediction of the future. But the existence of majority reports implies the existence of a “minority report.” The story was made into a popular film in 2002. Minority Report ↔ Classifier Ensemble

28 And, of course, the most interesting case is when the classifiers disagree – the minority report. Diversity is good. [Diagram: 15 objects marked Correct/Wrong for 3 classifiers] individual accuracy = 10/15 = 0.667; independent classifiers: ensemble accuracy (majority vote) = 11/15 = 0.733; identical classifiers: ensemble accuracy (majority vote) = 10/15 = 0.667

29 dependent classifiers 1: ensemble accuracy (majority vote) = 7/15 = 0.467; dependent classifiers 2: ensemble accuracy (majority vote) = 15/15 = 1.000. Myth: Independence is the best scenario. Myth: Diversity is always good. Summary: identical 0.667; independent 0.733; dependent 1: 0.467 (worse than individual); dependent 2: 1.000 (better than independence)
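Where does the 0.733 for independent classifiers come from? For L independent classifiers, each correct with probability p, the majority vote is correct with probability sum over k > L/2 of C(L, k) · p^k · (1-p)^(L-k). A minimal check with the slide's numbers (illustrative sketch):

```python
from math import comb

def majority_vote_accuracy(p, L):
    """Accuracy of a majority vote over L independent classifiers,
    each individually correct with probability p (L odd)."""
    return sum(comb(L, k) * p**k * (1 - p)**(L - k) for k in range(L // 2 + 1, L + 1))

p = 10 / 15                              # individual accuracy from the slide (0.667)
print(majority_vote_accuracy(p, 3))      # ~0.741, close to the 11/15 = 0.733 counted on the slide
print(majority_vote_accuracy(p, 15))     # independence pays off more as the ensemble grows
```

For p = 10/15 and L = 3 this gives roughly 0.74, in line with the 11/15 counted on the slide; the two “dependent” voting patterns show that particular dependence structures can fall well below, or rise well above, this independent-classifier value.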

30 Example. The set-up:
UCI data repository, “heart” data set
First 9 features; all 280 different partitions into [3, 3, 3]
Ensemble of 3 linear classifiers
Majority vote
10-fold cross-validation
What we measured: the individual accuracies of the ensemble members, the ensemble accuracy, and the ensemble diversity (just one of all these measures…)
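A rough re-creation of this set-up, to make the protocol concrete. This is a sketch under assumptions, not the original code: the local file name heart.dat, the use of linear discriminant analysis as the “linear classifier”, and the pairwise disagreement used as the single diversity measure are all placeholders.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mode
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Statlog "heart" data from the UCI repository, assumed saved locally as heart.dat
data = np.loadtxt("heart.dat")
X, y = data[:, :9], data[:, -1]          # first 9 features only

def partitions_3_3_3():
    """All 280 unordered splits of the 9 features into groups of [3, 3, 3]."""
    feats = set(range(9))
    for g1 in combinations(sorted(feats), 3):
        if 0 not in g1:                  # anchor feature 0 in group 1 to avoid duplicates
            continue
        rest = feats - set(g1)
        for g2 in combinations(sorted(rest), 3):
            if min(rest) not in g2:      # anchor the smallest remaining feature in group 2
                continue
            yield g1, g2, tuple(sorted(rest - set(g2)))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for groups in partitions_3_3_3():
    # one linear classifier per 3-feature group, evaluated by 10-fold cross-validation
    preds = np.array([cross_val_predict(LinearDiscriminantAnalysis(), X[:, list(g)], y, cv=cv)
                      for g in groups])
    individual = (preds == y).mean(axis=1)                       # accuracies of the 3 members
    ensemble = (mode(preds, axis=0).mode.ravel() == y).mean()    # majority-vote accuracy
    # pairwise disagreement, used here as one (arbitrary) example of a diversity measure
    diversity = np.mean([(preds[i] != preds[j]).mean()
                         for i, j in combinations(range(3), 2)])
    print(groups, individual.mean().round(3), round(ensemble, 3), round(diversity, 3))
```

Each of the 280 partitions yields one ensemble and therefore one value each of mean individual accuracy, majority-vote accuracy and diversity, which are the quantities plotted on the following slides.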

31 Example, 280 ensembles. [Scatter plot: ensemble accuracy vs individual accuracy, with average, minimum and maximum individual accuracy shown; the region above the diagonal is where the ensemble is better]

32 Example. [Scatter plot: ensemble accuracy vs diversity; more diverse ensembles appear less accurate?]

33 Example

34 [Schematic: individual accuracy vs diversity; large ensemble accuracy is expected where both are large]

35 AdaBoost is everything [Images: AdaBoost as a Swiss Army Knife, Bagging as a Russian Army Knife] Surely, there is more to combining classifiers than Bagging and AdaBoost

36 Example – Rotation Forest. “This altogether gives a very bad impression of ill-conceived experiments and confusing and unreliable conclusions.... The current spotty conclusions are incomprehensible, and are of no generalization or reference value.” versus “This is a potentially great new method and any experimental analysis would be very useful for understanding its potential. Good study, with very useful information in the Conclusions.”

37 [Kappa-error diagrams for the vowel data; ensemble accuracies shown: 89.52, 91.93, 92.37, 95.77]

38 [Kappa-error diagrams for the wave data; ensemble accuracies shown: 83.94, 81.78, 81.45, 81.89]
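For reference, each point of a kappa-error diagram corresponds to one pair of ensemble members: the x-coordinate is the pairwise kappa agreement, the y-coordinate the average error of the pair. A sketch of the computation, assuming scikit-learn; the bagged trees and the breast-cancer data are stand-ins, not the ensembles or the vowel/wave data behind the figures:

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A bagged ensemble of trees stands in for the ensembles behind the vowel/wave figures.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=1)
bag.fit(X_tr, y_tr)
member_preds = np.array([est.predict(X_te) for est in bag.estimators_])

points = []
for i, j in combinations(range(len(member_preds)), 2):
    kappa = cohen_kappa_score(member_preds[i], member_preds[j])   # x-axis: pairwise agreement
    err = 1 - 0.5 * ((member_preds[i] == y_te).mean()
                     + (member_preds[j] == y_te).mean())          # y-axis: mean error of the pair
    points.append((kappa, err))

# Each (kappa, error) pair is one point on the diagram: smaller kappa means a more
# diverse pair, smaller error a more accurate one, so good ensembles cluster bottom-left.
print(len(points), "points; e.g.", points[0])
```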

39 [Chart: % of data sets (out of 32) where the respective ensemble method is best, as a function of ensemble size]

40 So, no, AdaBoost is NOT everything

41 3. Faults

42 OUR faults!
Complacent: We don’t care about terminology.
Vain: To get publications, we invent complex models for simple problems or, even worse, complex non-existent problems.
Untidy: There is little effort to systemise the area.
Ignorant and lazy: By virtue of ignorance we tackle problems well and truly solved by others. Krassi’s motto: “I don’t have time to read papers because I am busy writing them”.
Haughty: Simple things that work do not impress us until they get proper theoretical proofs.

43 Terminology. God, seeing what the people were doing, gave each person a different language to confuse them and scattered the people throughout the earth… [image taken from http://en.wikipedia.org/wiki/Tower_of_Babel] Pattern recognition land, Data mining kingdom, Machine learning ocean, Statistics underworld and… Weka…

44 instance, example, observation, data point, object, attribute, feature, variable, classifier, hypothesis, SVM, SMO, nearest neighbour, lazy learner, classifier ensemble, learner, naïve Bayes, AODE, decision tree, C4.5, J48

45 [Table with columns ML / Stats / Weka]
instance = example = observation = data point = object
attribute = feature = variable
classifier = hypothesis = learner
classifier ensemble = meta learner
SVM ↔ SMO (Weka); nearest neighbour ↔ lazy learner (Weka); naïve Bayes ↔ AODE (Weka); decision tree / C4.5 ↔ J48 (Weka)

46 Classifier ensembles - names
combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
committees of neural networks [Bishop95,Drucker94]
consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94,Filippi94,Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81,Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.
(slide annotations mark entries as “Out of fashion” or “Subsumed”)

47 combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98] classifier ensembles [Drucker94,Filippi94,Sharkey99] United terminology! Yey! MCS – Multiple Classifier Systems Workshops 2000-2009

48 Simple things that work… We detest simple things that work well for an unknown reason!!!

49 Ideal scenario: empirics and applications → flagship of THEORY… Real theory: … hijacked by heuristics → HEURISTICS. Simple things that work… We detest simple things that work well for an unknown reason!!!

50 Lessons from the past: Fuzzy sets. Stability of the system? Reliability? Optimality? Why not probability? Who cares?... temperature for washing machine programmes, automatic focus in digital cameras, ignition angle of internal combustion engines in cars. Because it is computationally simpler (faster), easier to build, interpret and maintain. Learn to trust heuristics and empirics…

51 4. Future

52 Future. [The expectation curve, 1978-2008, approaching the asymptote of reality; where to branch out?] Multiple instance learning; non-i.i.d. examples; skewed class distributions; noisy class labels; sparse data; non-stationary data (classifier ensembles for changing environments, classifier ensembles for change detection)

53 Future: to-do list
More forums - meetings/joint projects/workshops at ECML, ICML, COLT
Science magazines and media profiles
Specialised journal on Multiple Classifier Systems?
Competitions on MCS
Databases of resources – bibliography, software, categorised data repositories
Chocolate with negative calories, please!!!

54 D.J. Hand, Classifier Technology and the Illusion of Progress, Statistical Science 21(1), 2006, 1-14. “… I am not suggesting that no major advances in classification methods will ever be made. Such a claim would be absurd in the face of developments such as the bootstrap and other resampling approaches, which have led to significant advances in classification and other statistical models. All I am saying is that much of the purported advance may well be illusory.”... So have we truly made progress or are we just kidding ourselves? Empty-y-y… (not even half empty-y-y-y …)

55 Bo-o-ori-i-i-ing....

