Published by Matthew Stanley. Modified over 11 years ago.
1
Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing
Jiangtao Ren (1), Xiaoxiao Shi (1), Wei Fan (2), Philip S. Yu (3)
(1) Sun Yat-sen University, China
(2) IBM T.J. Watson
(3) University of Illinois at Chicago
2
What is sample selection bias?
Inductive learning: training data (x,y) is sampled from the universe of examples.
In many applications, training data (x,y) is not sampled randomly.
  - Insurance and mortgage data: you only observe outcomes for the people to whom you gave a policy.
  - School data: students self-select.
There are different possibilities for how (x,y) is selected (Zadrozny04). S=1 denotes that (x,y) is chosen.
  - S is independent of x and y: totally random sample.
  - S depends on y but not on x: class bias.
  - S depends on x but not on y: feature bias.
  - S depends on both x and y: both class and feature bias.
Ubiquitous: Loan Approval, Drug Screening, Weather Forecasting, Ad Campaign, Fraud Detection, User Profiling, Biomedical Informatics, Intrusion Detection, Insurance, etc.
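The four selection cases above can be illustrated with a small simulation. This is only a sketch: the synthetic population, the function names `selection_probability` and `draw_sample`, and the specific probabilities (0.8, 0.2, and so on) are assumptions chosen for illustration and do not come from the slides.

```python
import numpy as np

def selection_probability(x, y, kind):
    """Return P(S=1 | x, y) for one illustrative selection mechanism."""
    if kind == "none":      # S independent of x and y: totally random sample
        return np.full(len(x), 0.5)
    if kind == "class":     # S depends on y only: class bias
        return np.where(y == 1, 0.8, 0.2)
    if kind == "feature":   # S depends on x only: feature bias
        return np.where(x > 0, 0.8, 0.2)
    if kind == "both":      # S depends on both x and y
        return np.where((x > 0) & (y == 1), 0.9, 0.1)
    raise ValueError(f"unknown selection kind: {kind}")

def draw_sample(x, y, kind, rng):
    """Keep each example with probability P(S=1 | x, y)."""
    s = rng.random(len(x)) < selection_probability(x, y, kind)
    return x[s], y[s]

# Hypothetical population: one feature, binary label correlated with it.
rng = np.random.default_rng(0)
x = rng.normal(size=20000)
y = (x + rng.normal(size=20000) > 0).astype(int)

for kind in ("none", "class", "feature", "both"):
    xs, ys = draw_sample(x, y, kind, rng)
    # Compare the selected sample's feature mean and class rate to the population's.
    print(kind, round(float(xs.mean()), 2), round(float(ys.mean()), 2))
```

Comparing the printed feature mean and positive-class rate against the population values shows which kinds of selection distort the feature distribution, the class distribution, or both.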
3
Our method
Key ideas: Original Dataset -> Structural Discovery -> Structural Rebalance -> Corrected Dataset
Structural Discovery: automatic clustering
Structural Rebalance:
  1. The same proportion
  2. Select trustful ones
  3. Label by neighbors
Advantages:
  1. Type Independent
  2. Model Independent
  3. Straightforward
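The pipeline on this slide can be sketched roughly as follows. The slide gives only the step names, so every detail here is an assumption: plain k-means stands in for "automatic clustering", "trustful" is read as "labeled and close to the cluster center", and "label by neighbors" is read as copying the label of the nearest labeled example. The names `kmeans` and `rebalance` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed stand-in for automatic clustering)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

def rebalance(X, y, labeled, k=3, proportion=0.2, seed=0):
    """Return (indices, labels) of a rebalanced sample.

    X: (n, d) array of all examples; y: labels, used only where `labeled`
    is True; labeled: boolean mask (must contain at least one True).
    """
    rng = np.random.default_rng(seed)
    assign, centers = kmeans(X, k, rng)
    lab_idx = np.flatnonzero(labeled)
    chosen, labels = [], []
    for j in range(k):
        members = np.flatnonzero(assign == j)
        if len(members) == 0:
            continue
        # Step 1: take the same proportion from every cluster.
        quota = max(1, int(round(proportion * len(members))))
        # Step 2 (assumed reading of "trustful"): labeled examples first,
        # then closest to the cluster center.
        d = np.linalg.norm(X[members] - centers[j], axis=1)
        order = np.lexsort((d, (~labeled[members]).astype(int)))
        for i in members[order[:quota]]:
            if labeled[i]:
                labels.append(y[i])
            else:
                # Step 3: label by the nearest labeled neighbor.
                gaps = np.linalg.norm(X[lab_idx] - X[i], axis=1)
                labels.append(y[lab_idx[gaps.argmin()]])
            chosen.append(i)
    return np.array(chosen), np.array(labels)

# Hypothetical data: three blobs, with labels concentrated in the first one.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 0], 100)
labeled = np.zeros(300, dtype=bool)
labeled[:60] = True
labeled[100:105] = True
labeled[200:205] = True

idx, lab = rebalance(X, y, labeled)
print(len(idx), np.bincount(lab))
```

Because every cluster contributes the same fraction of its members, regions that were over-represented among the labeled examples no longer dominate the corrected sample, regardless of which kind of bias produced the imbalance.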