Published by Patrick Goodwin. Modified over 11 years ago.
1
Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing
Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu
2
What is sample selection bias?
Inductive learning: training data (x,y) is sampled from the universe of examples.
In many applications, training data (x,y) is not sampled randomly.
Insurance and mortgage data: you only know the outcomes for the people you gave a policy to.
School data: students self-select.
3
What is sample selection bias?
4
Ubiquitous: loan approval, drug screening, weather forecasting, ad campaigns, fraud detection, user profiling, biomedical informatics, intrusion detection, insurance, etc.
5
Different types of sample selection bias
There are different possibilities for how (x,y) is selected. S=1 denotes that (x,y) is chosen.
S is independent of x and y: total random sample (no bias).
S depends on y but not on x: class bias.
S depends on x but not on y: feature bias.
S depends on both x and y: complete bias (both class and feature).
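The four cases can be illustrated with a small simulation. This is only a sketch: the function names `selection_prob` and `draw_sample`, the toy universe (one feature x in [0, 1], binary label y), and all the probability values are assumptions made for illustration, not taken from the slides.

```python
import random

def selection_prob(x, y, bias_type):
    """Toy P(S=1 | x, y); x is a float in [0, 1], y is 0 or 1."""
    if bias_type == "none":       # S independent of x and y
        return 0.5
    if bias_type == "class":      # S depends on y only
        return 0.8 if y == 1 else 0.2
    if bias_type == "feature":    # S depends on x only
        return 0.8 if x > 0.5 else 0.2
    if bias_type == "complete":   # S depends on both x and y
        return 0.9 if (x > 0.5 and y == 1) else 0.1
    raise ValueError(f"unknown bias type: {bias_type}")

def draw_sample(universe, bias_type, rng):
    """Keep each (x, y) with probability P(S=1 | x, y)."""
    return [(x, y) for x, y in universe
            if rng.random() < selection_prob(x, y, bias_type)]

rng = random.Random(0)
# Hypothetical universe: x uniform on [0, 1], y a fair coin.
universe = [(rng.random(), rng.randint(0, 1)) for _ in range(10000)]
class_biased = draw_sample(universe, "class", rng)
positive_fraction = sum(y for _, y in class_biased) / len(class_biased)
```

Under class bias the positive class is over-represented in the sample even though it makes up about half of the universe.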
6
Our method
Original Dataset -> Structural Discovery -> Structural Re-balancing -> Corrected Dataset
7
Our method: Structural Discovery via automatic clustering
Key idea: (1) Binary divide: split a cluster into two. (2) Stop dividing when most of the labeled data in the cluster have the same label.
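One way to read the two key ideas is as a recursive bisecting procedure. The sketch below is an assumption-laden illustration, not the authors' code: it uses plain 2-means for the binary divide, and the names `purity`, `two_means`, `discover` and the thresholds `min_purity` and `min_size` are hypothetical. Each example is a pair (x, y), where x is a tuple of features and y is a label or None for unlabeled data.

```python
import random

def purity(cluster):
    """Fraction of labeled examples that carry the majority label."""
    labels = [y for _, y in cluster if y is not None]
    if not labels:
        return 1.0  # nothing labeled: treat as pure so we stop dividing
    return max(labels.count(l) for l in set(labels)) / len(labels)

def two_means(cluster, rng, iters=20):
    """Binary divide with a basic 2-means (an assumed choice of splitter)."""
    centers = rng.sample([x for x, _ in cluster], 2)
    for _ in range(iters):
        parts = ([], [])
        for x, y in cluster:
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            parts[0 if d[0] <= d[1] else 1].append((x, y))
        if not parts[0] or not parts[1]:
            break  # degenerate split; caller keeps the cluster whole
        centers = [tuple(sum(col) / len(p) for col in zip(*(x for x, _ in p)))
                   for p in parts]
    return parts

def discover(cluster, rng, min_purity=0.9, min_size=4):
    """Divide until most labeled data in each cluster share one label."""
    if len(cluster) < min_size or purity(cluster) >= min_purity:
        return [cluster]
    left, right = two_means(cluster, rng)
    if not left or not right:
        return [cluster]
    return (discover(left, rng, min_purity, min_size)
            + discover(right, rng, min_purity, min_size))

# Toy data: two separated groups, partly unlabeled (None).
data = [((i / 10,), 0 if i % 2 else None) for i in range(10)]
data += [((10 + i / 10,), 1 if i % 3 else None) for i in range(10)]
clusters = discover(data, random.Random(0))
```

The purity threshold stands in for "most of the labeled data have the same label"; the slides do not say what fraction counts as "most".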
8
Our method: Structural Re-balancing via sample selection
Key idea: (1) Select the same proportion of examples from each cluster. (2) Select the confident and representative examples. (3) Label the unlabeled examples using their neighbors.
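The three steps can be sketched as follows. This is a hedged illustration only: "representative" is approximated here by closeness to the cluster centroid, and unlabeled examples take the label of their nearest labeled neighbor in the same cluster. Both choices are assumptions; the slides do not state the selection criterion. The names `centroid`, `dist`, and `rebalance` are hypothetical, and each cluster is a list of (x, y) pairs with y set to None for unlabeled examples.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of feature tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def rebalance(clusters, proportion):
    """Pick the same proportion of examples from every cluster."""
    corrected = []
    for cluster in clusters:
        labeled = [(x, y) for x, y in cluster if y is not None]
        if not labeled:
            continue  # no neighbor to borrow a label from in this sketch
        center = centroid([x for x, _ in cluster])
        k = max(1, math.ceil(proportion * len(cluster)))
        # Assumed notion of "representative": nearest to the centroid.
        ranked = sorted(cluster, key=lambda p: dist(p[0], center))
        for x, y in ranked[:k]:
            if y is None:
                # Label by the nearest labeled neighbor in the cluster.
                y = min(labeled, key=lambda p: dist(x, p[0]))[1]
            corrected.append((x, y))
    return corrected

# Toy clusters: the first is sparsely labeled, the second densely labeled.
sparse = [((0.0,), None), ((0.1,), 0), ((0.2,), None), ((0.3,), None)]
dense = [((5.0 + i / 10,), 1 if i else None) for i in range(8)]
corrected = rebalance([sparse, dense], proportion=0.5)
```

In the toy data the labeled examples alone are split 1 to 7 between the two clusters, while the corrected set is split 2 to 4, matching the cluster sizes of 4 and 8.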
9
Our method: Theoretical analysis
Lemma 3.1 explains why selecting the same proportion of examples from each cluster can reduce sample selection bias.
Lemma 3.2 derives a criterion for selecting confident examples.
10
Feature bias: accuracy of the corrected dataset minus accuracy of the original.
11
Class bias: accuracy of the corrected dataset minus accuracy of the original.
12
Complete bias: corrected vs. original.
13
Advantages: 1. Type independent 2. Model independent 3. Straightforward
The experiment datasets and the related MATLAB code can be downloaded at ftp://202.116.65.69/sxx/SDM08 or http://www.weifan.info