3ème Journée Doctorale G&E (3rd G&E Doctoral Day), Bordeaux, March 2015

CLASSIFICATION OF SATELLITE IMAGES USING MARGIN-BASED ENSEMBLE METHODS. APPLICATION TO LAND COVER MAPPING OF NATURAL SITES

Wei FENG
Geo-Resources and Environment Lab, Bordeaux INP (Bordeaux Institute of Technology), France
Supervisor: Samia BOUKIR

Outline
- Context and objectives
- Ensemble learning and margin theory
- Mislabeled training data identification and filtering based on ensemble margin
- Conclusions

Context and objectives

Objective: a multiple-classifier framework, based on the ensemble margin, to map remote sensing data effectively and efficiently.

Major challenges in ensemble learning:
- Training data class imbalance
- Training data redundancy
- Training data class mislabeling

[Figure panels: Forest image; Land cover map. Slide footer: ICIP 2013, September 2013, Melbourne, Australia]

I. Ensemble learning and margin theory

Introduction to ensemble learning

Condorcet theorem (1785): if each member of a group has only slightly better than a 50% chance of individually making the right decision, and the members decide independently, a majority vote of the group has a nearly 100% chance of making the right decision once the group is large enough.
(Marquis de Condorcet, French mathematician and philosopher, 1743-1794)

First use of the ensemble concept in machine learning: Hansen & Salamon (IEEE PAMI 1990)
Classification by Random Forests (decision tree-based ensemble): Breiman (Machine Learning 2001)
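The effect described by the Condorcet theorem can be checked numerically. The sketch below assumes independent voters that all share the same individual accuracy; the function name `majority_accuracy` and the 55% figure are illustrative choices, not taken from the slides.

```python
from math import comb

def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is right.

    Assumes every voter is correct with the same probability p_correct
    and that votes are independent (binomial model).
    """
    needed = n_voters // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

# Accuracy of the majority vote grows with group size when p_correct > 0.5.
for n in (1, 11, 101, 1001):
    print(n, round(majority_accuracy(n, 0.55), 3))
```

With an individual accuracy of exactly 50% the majority stays at 50% whatever the group size, which is why the individual accuracy has to exceed one half.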

Introduction to ensemble learning

A typical ensemble method has two phases:
- Production of multiple homogeneous classifiers
- Their combination

Typical ensemble creation method: bagging
- Bootstrap sampling (with replacement) over the training data produces diverse classifiers, the components of the ensemble

Typical multiple-classifier combination: majority voting
- Ensemble decision = class with the most votes
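The two phases can be sketched in a few lines. This is a toy illustration only: the base learner is a 1-nearest-neighbour rule on a single feature, and the data, class names and function names are hypothetical rather than taken from the presentation.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) items with replacement (one bootstrap sample)."""
    return rng.choices(data, k=len(data))

def nearest_neighbour(sample):
    """Toy base classifier: returns the label of the closest sample point."""
    def predict(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def bagging(data, n_classifiers, seed=0):
    """Phase 1: train one base classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [nearest_neighbour(bootstrap(data, rng)) for _ in range(n_classifiers)]

def majority_vote(ensemble, x):
    """Phase 2: the ensemble decision is the class with the most votes."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training data: (feature value, class label).
data = [(0.1, "water"), (0.2, "water"), (0.3, "water"),
        (0.7, "forest"), (0.8, "forest"), (0.9, "forest")]
ensemble = bagging(data, n_classifiers=25)
print(majority_vote(ensemble, 0.15), majority_vote(ensemble, 0.85))
```

Each bootstrap sample leaves out some training points, so the base classifiers differ from one another; the vote then smooths out their individual errors.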

Margin of ensemble methods
- Difference between the votes given to different classes
- A measure of classification confidence
- One popular ensemble margin: the difference between the fraction of classifiers voting correctly and the fraction voting incorrectly
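A minimal sketch of this margin follows, reading the definition literally: every vote that is not for the true class counts as incorrect. The function name is illustrative, and other multi-class margin definitions exist (for example, comparing the true class only with the most voted wrong class), so this is one possible reading rather than the definitive formula.

```python
def ensemble_margin(votes, true_label):
    """Fraction of correct votes minus fraction of incorrect votes.

    Ranges from -1 (all base classifiers wrong) to +1 (all correct).
    """
    correct = sum(1 for v in votes if v == true_label)
    incorrect = len(votes) - correct
    return (correct - incorrect) / len(votes)

print(ensemble_margin(["forest", "forest", "water", "forest"], "forest"))
```

A margin near +1 or -1 means the ensemble is confident (rightly or wrongly); a margin near 0 means the base classifiers disagree.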

II. Mislabeled training data identification and filtering based on ensemble margin

Mislabeling problem in machine learning

[Figure: cartoon illustrating a mislabeling error. Captions: "I am a cow!", "I am confused!!"]

Ensemble-based class noise identification

Typical class noise filtering approach: majority vote filter (Brodley et al. 1999)

Principle: if more than half (>50%) of the base classifiers of the ensemble classify an instance incorrectly, the instance is tagged as mislabeled.

Weakness: every clean training instance that the ensemble classifier misclassifies is also tagged as mislabeled (false positives).
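The principle of the majority vote filter can be sketched as follows, assuming the votes of the base classifiers are already available as one list per training instance. The function name and the input layout are illustrative assumptions.

```python
def majority_vote_filter(votes_per_instance, labels):
    """Return indices of instances that more than half of the base classifiers get wrong."""
    flagged = []
    for i, (votes, label) in enumerate(zip(votes_per_instance, labels)):
        wrong = sum(1 for v in votes if v != label)
        if wrong > len(votes) / 2:  # strictly more than 50% of the ensemble
            flagged.append(i)
    return flagged

# Hypothetical votes of a 3-classifier ensemble on four training instances.
votes = [["a", "a", "a"], ["b", "b", "b"], ["b", "b", "a"], ["a", "b", "b"]]
labels = ["a", "a", "a", "b"]
print(majority_vote_filter(votes, labels))
```

The filter makes no distinction between an instance rejected unanimously and one rejected by a narrow majority, which is where the false positives come from.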

Margin-based mislabeled instance elimination algorithm

Approach: noise ranking-based

Noise evaluation function: $N(x_i) = |\mathrm{margin}(x_i)|, \quad \forall (x_i, y_i) \in S \text{ such that } C(x_i) \neq y_i$

Algorithm:
1. Construct an ensemble classifier C with all training data (x_i, y_i) ∈ S
2. Compute the margin of each training instance x_i
3. Order all the misclassified training instances x_i according to their noise evaluation values N(x_i), in descending order
4. Eliminate the first M most likely mislabeled instances x_i to form a new, cleaner training set
5. Evaluate the cleaned training set by classification performance on a validation set V
6. Select the best filtered training set
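Steps 2 to 4 can be sketched as below. The sketch assumes the base classifier votes are given per training instance, takes the ensemble prediction C(x_i) to be the majority vote, and uses the "correct minus incorrect fraction" margin; all function names are illustrative.

```python
from collections import Counter

def margin(votes, label):
    """Fraction of correct votes minus fraction of incorrect votes."""
    correct = sum(1 for v in votes if v == label)
    return (2 * correct - len(votes)) / len(votes)

def rank_suspects(votes_per_instance, labels):
    """Indices of misclassified instances, ordered by N(x) = |margin(x)|, descending."""
    scored = []
    for i, (votes, label) in enumerate(zip(votes_per_instance, labels)):
        predicted = Counter(votes).most_common(1)[0][0]  # ensemble decision C(x)
        if predicted != label:
            scored.append((abs(margin(votes, label)), i))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scored]

def eliminate(training_set, suspects, m):
    """Drop the first m suspects to form a cleaner training set."""
    drop = set(suspects[:m])
    return [item for i, item in enumerate(training_set) if i not in drop]

votes = [["a", "a", "a"], ["b", "b", "b"], ["b", "b", "a"], ["a", "b", "b"]]
labels = ["a", "a", "a", "b"]
suspects = rank_suspects(votes, labels)
print(suspects, eliminate(list(zip(range(4), labels)), suspects, m=1))
```

Unlike the majority vote filter, the ranking lets the caller remove only the most confidently misclassified instances first.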

Margin-based mislabeled instance correction algorithm

Approach: noise ranking-based

Noise evaluation function: $N(x_i) = |\mathrm{margin}(x_i)|, \quad \forall (x_i, y_i) \in S \text{ such that } C(x_i) \neq y_i$

Algorithm:
1. Construct an ensemble classifier C with all training data (x_i, y_i) ∈ S
2. Compute the margin of each training instance x_i
3. Order all the misclassified training instances x_i according to their noise evaluation values N(x_i), in descending order
4. Correct the labels of the first M most likely mislabeled instances x_i, using the predicted labels, to form a new, cleaner training set
5. Evaluate the repaired training set by classification performance on a validation set V
6. Select the best corrected training set
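The correction variant differs only in step 4: suspects keep their place in the training set but take the label predicted by the ensemble. A minimal sketch, assuming the ranked suspect indices and the ensemble predictions have already been computed (names and data are hypothetical):

```python
def correct_labels(labels, predicted, suspects, m):
    """Replace the labels of the first m suspects with the ensemble predictions."""
    repaired = list(labels)  # leave the original labels untouched
    for i in suspects[:m]:
        repaired[i] = predicted[i]
    return repaired

labels = ["a", "a", "a", "b"]
predicted = ["a", "b", "b", "b"]   # hypothetical ensemble decisions C(x_i)
suspects = [1, 2]                  # hypothetical ranking, most suspicious first
print(correct_labels(labels, predicted, suspects, m=1))
```

Correction keeps the training set size unchanged, at the risk of introducing a wrong label when the ensemble prediction is itself wrong.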

Margin-based mislabeled instance identification results: data sets

- Three remote sensing data sets for land cover mapping of sites of different types

Table 1. Data sets

Data set      Training   Validation   Test   Variables   Classes
Forest        1946       973          1946   4           2
Urban         6800       3400         6800   3           4
Agriculture   2000       1000         2000   36          6

Margin-based mislabeled instance identification results: noise filtering performance

- Noise-sensitive ensemble classifier: boosting
- Two types of class noise:
  - Random noise = 20% (training and validation sets with a percentage of randomly mislabeled instances)
  - Actual noise = unknown (amount and type)
- Noise filter strategy: adaptive filtering

Adaptive filtering eliminates or corrects a number of the ordered, detected mislabeled instances: the number that led to the maximum accuracy on the validation set.
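Adaptive filtering amounts to a small search over the number M of removed instances. The sketch below shows the elimination case and assumes a caller-supplied `evaluate` function that trains on a candidate training set and returns its validation accuracy; here it is replaced by a toy stand-in, and every name and value is hypothetical.

```python
def adaptive_filter(training_set, suspects, evaluate, candidate_sizes):
    """Keep the number of removed suspects that maximises the validation score."""
    best_m, best_score, best_set = 0, evaluate(training_set), training_set
    for m in candidate_sizes:
        drop = set(suspects[:m])
        cleaned = [item for i, item in enumerate(training_set) if i not in drop]
        score = evaluate(cleaned)
        if score > best_score:  # ties keep the smaller m
            best_m, best_score, best_set = m, score, cleaned
    return best_m, best_set

# Toy stand-in for "train on the set, measure accuracy on validation set V":
# it simply rewards training sets with fewer items tagged "bad".
def evaluate(candidate):
    return sum(1 for _, tag in candidate if tag == "ok") / len(candidate)

training = [("x0", "ok"), ("x1", "bad"), ("x2", "ok"), ("x3", "bad"), ("x4", "ok")]
suspects = [1, 3, 0]  # hypothetical ranking, most suspicious first
best_m, best_set = adaptive_filter(training, suspects, evaluate, candidate_sizes=[1, 2, 3])
print(best_m, best_set)
```

In practice `evaluate` would retrain the boosting classifier for each candidate, so the list of candidate sizes controls the cost of the search.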

Margin-based mislabeled instance identification results: artificial noise

Table 2. Overall accuracies (%) of the boosting classifier with no filter, majority vote filtered and margin-based filtered training sets on artificially corrupted data sets (noise rate = 20%)

Increase in accuracy of up to 5%

Margin-based mislabeled instance identification results: actual noise

Table 3. Overall accuracies (%) of the boosting classifier with no filter, majority vote filtered and margin-based filtered training sets on the original data sets

Increase in accuracy of 2%

Conclusions
- The ensemble margin is an effective and efficient guideline for ensemble design
- Ensemble learning and the ensemble margin are effective for land cover mapping
- Ensemble margin-based class noise filters are significantly more accurate than majority vote filters
