3ème Journée Doctorale G&E, Bordeaux, March 2015
Wei FENG, Geo-Resources and Environment Lab, Bordeaux INP (Bordeaux Institute of Technology), France
Supervisor: Samia BOUKIR
CLASSIFICATION OF SATELLITE IMAGES USING MARGIN-BASED ENSEMBLE METHODS. APPLICATION TO LAND COVER MAPPING OF NATURAL SITES
Outline
Context and objectives
Ensemble learning and margin theory
Mislabeled training data identification and filtering based on ensemble margin
Conclusions
Context and Objectives
Objective: a multiple classifier framework, based on the ensemble margin, to effectively and efficiently map remote sensing data.
Major challenges in ensemble learning:
Training data class imbalance
Training data redundancy
Training data class mislabeling
[Figure: forest image and corresponding land cover map (ICIP 2013, Melbourne, Australia)]
I. Ensemble learning and margin theory
Introduction to ensemble learning
Condorcet's jury theorem (1785): if each member of a group has a better than 50% chance of individually making the right decision, a majority vote of the group has nearly a 100% chance of making the right decision as the group grows.
First use of the ensemble concept in machine learning: Hansen & Salamon (IEEE PAMI, 1990).
Classification by Random Forests (a decision tree-based ensemble): Breiman (Machine Learning, 2001).
[Portrait: Marquis de Condorcet, French mathematician and philosopher (1743-1794)]
Introduction to ensemble learning
Typical ensemble method phases: production of multiple homogeneous classifiers, then their combination.
Typical ensemble creation method: bagging, i.e. bootstrap sampling (with replacement) over the training data to produce diverse classifiers, the components of the ensemble.
Typical combination of multiple classifiers: majority voting, where the ensemble decision is the class with the most votes (a minimal sketch follows).
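A minimal sketch of this bagging-plus-majority-voting scheme in Python with scikit-learn; the synthetic dataset, the number of trees and all variable names are illustrative assumptions, not taken from the slides.

```python
# Minimal bagging + majority voting sketch (illustrative, not the thesis code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data standing in for the remote sensing datasets
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_estimators = 25
rng = np.random.default_rng(0)
trees = []
for _ in range(n_estimators):
    # Bootstrap sampling (with replacement) over the training data
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Majority voting: the ensemble decision is the class with the most votes
votes = np.stack([t.predict(X) for t in trees])   # shape (n_estimators, n_samples)
ensemble_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy on its own training set:", (ensemble_pred == y).mean())
```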
Margin of ensemble methods
The ensemble margin is a difference between the votes received by different classes; it measures classification confidence.
One popular ensemble margin is the difference between the fraction of classifiers voting correctly and the fraction voting incorrectly (sketched below).
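As a rough illustration, the sketch below computes this margin for the `votes` and `y` arrays of the bagging sketch above; the function name and the reuse of those arrays are assumptions for illustration only.

```python
import numpy as np

def ensemble_margin(votes: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Supervised ensemble margin: fraction of base classifiers voting for the
    true class minus the fraction voting for any other class, in [-1, 1]."""
    n_estimators = votes.shape[0]
    correct = (votes == y).sum(axis=0)    # votes for the true label
    incorrect = n_estimators - correct    # votes for any other label
    return (correct - incorrect) / n_estimators

margins = ensemble_margin(votes, y)
# Low or negative margins flag instances on which the ensemble is not confident.
print("margin range:", margins.min(), margins.max())
```

Other margin definitions exist (for example, the difference between the votes of the two most-voted classes, which does not require the true label); this sketch implements only the popular supervised variant described above.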
II. Mislabeled training data identification and filtering based on ensemble margin
Mislabeling problem in machine learning
[Cartoon illustrating a mislabeling error: a mislabeled training example ("I am a cow!") confuses the classifier ("I am confused!!")]
Ensemble-based class noise identification
A typical class noise filtering approach is the majority vote filter (Brodley et al., 1999).
Principle: if more than half (>50%) of the base classifiers of the ensemble classify an instance incorrectly, then this instance is tagged as mislabeled (see the sketch below).
Weakness: it also tags as mislabeled all the clean training instances that are simply misclassified by the ensemble, producing false positives.
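A minimal sketch of this majority vote filter, reusing the hypothetical `votes` and `y` arrays from the previous sketches (assumed names, not the authors' code):

```python
import numpy as np

def majority_vote_filter(votes: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Tag an instance as mislabeled when more than half of the base
    classifiers misclassify it (majority vote filter principle)."""
    n_estimators = votes.shape[0]
    n_wrong = (votes != y).sum(axis=0)
    return n_wrong > n_estimators / 2   # boolean mask of suspected instances

suspects = majority_vote_filter(votes, y)
print("instances tagged as mislabeled:", int(suspects.sum()))
```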
Margin-based mislabeled instance elimination algorithm
Approach: noise ranking.
Noise evaluation function: N(x_i) = |margin(x_i)|, for (x_i, y_i) ∈ S with C(x_i) ≠ y_i.
Algorithm:
1. Construct an ensemble classifier C with all the training data (x_i, y_i) ∈ S.
2. Compute the margin of each training instance x_i.
3. Order all the misclassified training instances x_i according to their noise evaluation values N(x_i), in descending order.
4. Eliminate the first M most likely mislabeled instances x_i to form a new, cleaner training set.
5. Evaluate the cleaned training set by its classification performance on a validation set V.
6. Select the best filtered training set.
(A sketch of the ranking and elimination step is given below.)
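A minimal sketch of the ranking and elimination step (steps 2-4) for a fixed M, building on the `ensemble_margin` helper and the hypothetical arrays defined above; M and all names are illustrative assumptions:

```python
import numpy as np

def margin_based_elimination(X, y, votes, ensemble_pred, M):
    """Remove the M misclassified instances with the largest |margin|,
    i.e. those the ensemble misclassifies most confidently."""
    margins = ensemble_margin(votes, y)
    misclassified = np.flatnonzero(ensemble_pred != y)
    # Descending order of the noise evaluation value N(x_i) = |margin(x_i)|
    ranked = misclassified[np.argsort(-np.abs(margins[misclassified]))]
    keep = np.setdiff1d(np.arange(len(y)), ranked[:M])
    return X[keep], y[keep]

X_clean, y_clean = margin_based_elimination(X, y, votes, ensemble_pred, M=50)
```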
Margin-based mislabeled instance correction algorithm
Approach: noise ranking.
Noise evaluation function: N(x_i) = |margin(x_i)|, for (x_i, y_i) ∈ S with C(x_i) ≠ y_i.
Algorithm:
1. Construct an ensemble classifier C with all the training data (x_i, y_i) ∈ S.
2. Compute the margin of each training instance x_i.
3. Order all the misclassified training instances x_i according to their noise evaluation values N(x_i), in descending order.
4. Correct the labels of the first M most likely mislabeled instances x_i using the predicted labels, to form a new cleaner training set.
5. Evaluate the repaired training set by its classification performance on a validation set V.
6. Select the best corrected training set.
(A sketch of the relabeling step is given below.)
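The correction variant differs only in step 4: the M suspect instances are relabeled with the ensemble's predictions instead of being removed. A minimal sketch, under the same assumptions as the elimination sketch:

```python
import numpy as np

def margin_based_correction(X, y, votes, ensemble_pred, M):
    """Relabel (rather than remove) the M most confidently misclassified
    training instances, using the ensemble's predicted labels."""
    margins = ensemble_margin(votes, y)
    misclassified = np.flatnonzero(ensemble_pred != y)
    ranked = misclassified[np.argsort(-np.abs(margins[misclassified]))]
    y_repaired = y.copy()
    y_repaired[ranked[:M]] = ensemble_pred[ranked[:M]]  # relabel, keep the instance
    return X, y_repaired
```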
Margin-based mislabeled instance identification results
Data sets: three remote sensing data sets for land cover mapping of sites of different types.

Table 1. Data sets
Data set    | Training | Validation | Test | Variables | Classes
Forest      | 1946     | 973        | 1946 | 4         | 2
Urban       | 6800     | 3400       | 6800 | 3         | 4
Agriculture | 2000     | 1000       | 2000 | 36        | 6
Margin-based mislabeled instance identification results: noise filtering performance
Noise-sensitive ensemble classifier: boosting.
Two types of class noise:
Random noise = 20% (training and validation sets with a percentage of randomly mislabeled instances)
Actual noise = unknown amount and type
Noise filter strategy: adaptive filtering, which eliminates or corrects the number of ordered detected mislabeled instances that leads to the maximum accuracy on the validation set (see the sketch below).
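A minimal sketch of the adaptive filtering loop described above, assuming a held-out validation set (X_val, y_val) and reusing the elimination helper; the use of scikit-learn's AdaBoostClassifier as the noise-sensitive boosting classifier and all variable names are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def adaptive_filtering(X, y, votes, ensemble_pred, X_val, y_val, candidate_Ms):
    """Try several numbers M of eliminated suspects and keep the filtered
    training set that maximizes accuracy on the validation set."""
    best_set, best_acc, best_M = (X, y), -np.inf, 0
    for M in candidate_Ms:
        X_f, y_f = margin_based_elimination(X, y, votes, ensemble_pred, M)
        acc = AdaBoostClassifier(random_state=0).fit(X_f, y_f).score(X_val, y_val)
        if acc > best_acc:
            best_set, best_acc, best_M = (X_f, y_f), acc, M
    return best_set, best_M, best_acc
```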
Margin-based mislabeled instance identification results: artificial noise
Table 2. Overall accuracies (%) of the boosting classifier with no filtering, majority vote filtering and margin-based filtering of the training sets, on artificially corrupted data sets (noise rate = 20%).
Increase in accuracy of up to 5%.
Margin-based mislabeled instance identification results: actual noise
Table 3. Overall accuracies (%) of the boosting classifier with no filtering, majority vote filtering and margin-based filtering of the training sets, on the original data sets.
Increase in accuracy of 2%.
Conclusions
The ensemble margin is an effective and efficient guideline for ensemble design.
Ensemble learning and the ensemble margin are effective for land cover mapping.
Ensemble margin-based class noise filters are significantly more accurate than majority vote filters.