Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing. Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu.

1 Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing. Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu

2 What is sample selection bias? Inductive learning assumes that the training data (x, y) are sampled randomly from the universe of examples. In many applications, the training data (x, y) are not sampled randomly. Insurance and mortgage data: outcomes are known only for the people who were given a policy. School data: students self-select.

3 What is sample selection bias?

4 Ubiquitous: loan approval, drug screening, weather forecasting, ad campaigns, fraud detection, user profiling, biomedical informatics, intrusion detection, insurance, etc.

5 Different types of sample selection bias. There are different possibilities for how (x, y) is selected; S = 1 denotes that (x, y) is chosen. (1) S is independent of x and y: totally random sample. (2) S depends on y but not on x: class bias. (3) S depends on x but not on y: feature bias. (4) S depends on both x and y: both class and feature bias (complete bias).
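
The four cases on this slide can be made concrete with a small simulation. The sketch below is only an illustration: the universe (x uniform on [0, 1], P(y = 1 | x) = x), the selection probabilities, and every function name are assumptions chosen for the example, not values from the presentation.

```python
import random

def make_universe(n=10000, seed=1):
    """Toy universe (an assumption): x ~ U[0, 1] and P(y = 1 | x) = x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if rng.random() < x else 0
        data.append((x, y))
    return data

def select(examples, prob_fn, seed=0):
    """Keep each (x, y) with probability prob_fn(x, y), i.e. P(S = 1 | x, y)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if rng.random() < prob_fn(x, y)]

# Hypothetical selection mechanisms, one per case on the slide.
MECHANISMS = {
    "random": lambda x, y: 0.5,                         # S independent of x and y
    "class": lambda x, y: 0.8 if y == 1 else 0.2,       # S depends on y only
    "feature": lambda x, y: 0.8 if x > 0.5 else 0.2,    # S depends on x only
    "complete": lambda x, y: 0.9 if (x > 0.5 and y == 1) else 0.1,  # both
}

def positive_rate(examples):
    """Fraction of examples with y = 1."""
    return sum(y for _, y in examples) / len(examples)

def mean_x(examples):
    """Average feature value."""
    return sum(x for x, _ in examples) / len(examples)

universe = make_universe()
samples = {name: select(universe, fn) for name, fn in MECHANISMS.items()}
for name, sample in samples.items():
    print(name, len(sample), round(positive_rate(sample), 3), round(mean_x(sample), 3))
```

Comparing each sample's positive rate and mean x against the universe shows which marginal each mechanism distorts: the class-biased sample over-represents y = 1, the feature-biased sample over-represents large x, and the random sample stays close to the universe on both.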

6 Our method: Original Dataset → Structural Discovery → Structural Re-balancing → Corrected Dataset

7 Our method: Structural Discovery via automatic clustering. Key idea: (1) Divide the data into two clusters (binary divide). (2) Stop dividing a cluster when most of its labeled data have the same label.
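
The two key ideas on this slide can be sketched as a recursive bisecting procedure. This is a minimal illustration, not the authors' implementation: the slide does not say which clustering algorithm performs the binary divide, so a plain 2-means is assumed here, and the names `two_means`, `purity`, `discover` and the thresholds `min_purity` and `min_size` are hypothetical. Unlabeled examples are marked with `None`.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def two_means(points, iters=20, seed=0):
    """Binary divide (assumed to be 2-means): returns a 0/1 assignment per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = _mean(members)
    return assign

def purity(labels):
    """Share of labeled examples carrying the majority label (None = unlabeled)."""
    known = [l for l in labels if l is not None]
    if not known:
        return 1.0
    return max(known.count(v) for v in set(known)) / len(known)

def discover(points, labels, min_purity=0.9, min_size=4, seed=0):
    """Bisect clusters until most labeled points in each share one label.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters, stack = [], [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        if len(idx) < min_size or purity([labels[i] for i in idx]) >= min_purity:
            clusters.append(idx)          # stopping rule from the slide
            continue
        assign = two_means([points[i] for i in idx], seed=seed)
        left = [i for i, a in zip(idx, assign) if a == 0]
        right = [i for i, a in zip(idx, assign) if a == 1]
        if not left or not right:         # degenerate split: keep cluster as is
            clusters.append(idx)
            continue
        stack.extend([left, right])
    return clusters
```

For example, calling `discover(points, labels)` on two well-separated groups whose labeled members disagree across groups but agree within each group yields one cluster per group, because the first split already makes both halves pure.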

8 Our method: Structural Re-balancing via sample selection. Key idea: (1) Select the same proportion of examples from each cluster. (2) Select confident and representative examples. (3) Label the unlabeled examples by their neighbors.
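
A rough sketch of the three re-balancing steps follows. The slide does not define "confident and representative", so closeness to the cluster centre stands in for it here, and an unlabeled example simply borrows the label of its nearest labeled neighbor in the same cluster. Both choices, the function name `rebalance`, and the `ratio` parameter are assumptions made for illustration.

```python
import math

def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def _centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def rebalance(points, labels, clusters, ratio=0.5):
    """Pick the same fraction of examples from every cluster.

    points   : list of feature tuples
    labels   : list of labels, None for unlabeled examples
    clusters : list of index lists (for instance, the output of a clustering step)
    Returns a list of (index, label) pairs forming the corrected training set.
    """
    selected = []
    for idx in clusters:
        center = _centroid([points[i] for i in idx])
        labeled = [i for i in idx if labels[i] is not None]
        k = max(1, math.ceil(ratio * len(idx)))   # (1) same proportion per cluster
        # (2) "representative" is approximated by distance to the centre (assumption).
        ranked = sorted(idx, key=lambda i: _dist2(points[i], center))
        for i in ranked[:k]:
            if labels[i] is not None:
                selected.append((i, labels[i]))
            elif labeled:
                # (3) label an unlabeled example by its nearest labeled neighbor.
                nearest = min(labeled, key=lambda j: _dist2(points[i], points[j]))
                selected.append((i, labels[nearest]))
            # A cluster with no labeled example offers no label to borrow,
            # so its unlabeled picks are skipped in this sketch.
    return selected
```

Because every cluster contributes the same fraction of its members, a region that the biased labeling under-sampled is represented in the corrected set in proportion to its actual size rather than to its number of labels.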

9 Our method: Theoretical analysis. Lemma 3.1 explains why selecting the same proportion of examples from each cluster can reduce sample selection bias. Lemma 3.2 derives a criterion for selecting confident examples.
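
The slide does not reproduce either lemma. As a hedged intuition for the first one only (this is not the paper's statement or proof), suppose every cluster c is sampled with the same probability α, an assumption matching key idea (1) of the re-balancing step. Bayes' rule then shows that the sample preserves the cluster proportions of the universe:

```latex
% Assumption: P(S = 1 \mid c) = \alpha for every cluster c.
\begin{align*}
P(c \mid S = 1)
  &= \frac{P(S = 1 \mid c)\, P(c)}{\sum_{c'} P(S = 1 \mid c')\, P(c')}
   = \frac{\alpha\, P(c)}{\alpha \sum_{c'} P(c')}
   = P(c).
\end{align*}
```

If, in addition, each cluster is nearly pure in its label, which is what the stopping rule of the discovery step aims for, then matching the cluster proportions also brings the label distribution of the selected sample close to that of the universe.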

10 Feature Bias: accuracy of corrected minus accuracy of original

11 Class Bias: accuracy of corrected minus accuracy of original

12 Complete Bias: corrected vs. original

13 Advantages: (1) Type independent. (2) Model independent. (3) Straightforward. The experimental datasets and the related MATLAB code can be downloaded from ftp://202.116.65.69/sxx/SDM08 or http://www.weifan.info

