Predictive Modeling with Heterogeneous Sources
Xiaoxiao Shi, Qi Liu, Wei Fan, Qiang Yang, Philip S. Yu


1 Predictive Modeling with Heterogeneous Sources
Xiaoxiao Shi (1), Qi Liu (2), Wei Fan (3), Qiang Yang (4), Philip S. Yu (1)
(1) University of Illinois at Chicago
(2) Tongji University, China
(3) IBM T. J. Watson Research Center
(4) Hong Kong University of Science and Technology

2 Why learn with heterogeneous sources?
Standard supervised learning: a classifier trained on labeled New York Times data and tested on unlabeled New York Times data reaches 85.5%.

3 Why heterogeneous sources?
In reality, labeled data are insufficient: with too little labeled New York Times training data, the result on the unlabeled New York Times test data drops to 47.3%. How can the performance be improved?

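4 Why heterogeneous sources?
Labeled data from other sources (Reuters) are used for the unlabeled target-domain test data (New York Times): 82.6%, compared with 47.3% without them. The sources can differ from the target in three ways:
1. Different distributions
2. Different outputs
3. Different feature spaces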

5 Real-world examples
Social network:
– Can various bookmarking systems help predict social tags for a new system, given that their outputs (social tags) and data (documents) are different?
[Figure: Wikipedia, ODP, Backflip, Blink, …… and a new system marked "?"]

6 Real-world examples
Applied sociology:
– Can suburban housing price census data help predict downtown housing prices?

#rooms  #bathrooms  #windows  price
5       2           12        XXX
6       3           11        XXX

#rooms  #bathrooms  #windows  price
2       1           4         XXXXX
4       2           5         XXXXX

7 Other examples
Bioinformatics
– Previous years' flu data → new swine flu
– Drug efficacy data against breast cancer → drug data against lung cancer
– ……
Intrusion detection
– Existing types of intrusions → unknown types of intrusions
Sentiment analysis
– Reviews from SDM → reviews from KDD

8 Learning with Heterogeneous Sources
The paper mainly attacks two sub-problems:
– Heterogeneous data distributions: clustering-based KL divergence and a corresponding sampling technique
– Heterogeneous outputs (for regression problems): unifying outputs by preserving similarity

9 Learning with Heterogeneous Sources
General framework
[Figure: two stages, "Unifying data distributions" and "Unifying outputs", each shown with source data and target data]

10 Unifying Data Distributions
Basic idea:
– Combine the source and target data and perform clustering.
– Select the clusters in which the target and source data are similarly distributed, as measured by KL divergence.

11 An Example
[Figure: data sets D and T, the combined data, and the result of adaptive clustering]
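The cluster-then-select idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm: the plain k-means routine, the symmetrised per-cluster KL term, the `threshold` value, and all function names are hypothetical choices made for the sketch, and the slides' "adaptive clustering" step is not reproduced.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def cluster_divergence(labels, is_source, k, eps=1e-9):
    """Per-cluster term of a symmetrised KL divergence between the share of
    source points and the share of target points in each cluster (assumed measure)."""
    n_src = sum(is_source)
    n_tgt = len(is_source) - n_src
    terms = []
    for c in range(k):
        src = sum(1 for l, s in zip(labels, is_source) if l == c and s)
        tgt = sum(1 for l, s in zip(labels, is_source) if l == c and not s)
        p_s = src / n_src + eps  # eps avoids log(0) for one-sided clusters
        p_t = tgt / n_tgt + eps
        terms.append((p_t - p_s) * math.log(p_t / p_s))
    return terms

def select_source_points(source, target, k=2, threshold=1.0):
    """Keep the source points that fall in clusters where source and target
    shares are similar (divergence term at or below the assumed threshold)."""
    points = source + target
    is_source = [True] * len(source) + [False] * len(target)
    labels = kmeans(points, k)
    terms = cluster_divergence(labels, is_source, k)
    keep = {c for c in range(k) if terms[c] <= threshold}
    return [p for p, l in zip(source, labels[:len(source)]) if l in keep]

# Toy data: the source has a group near 10 that the target never visits.
source = [(0.0,), (0.2,), (0.4,), (10.0,), (10.2,), (10.4,)]
target = [(0.1,), (0.3,), (0.5,)]
kept = select_source_points(source, target)
print(kept)
```

On this toy data the source points near 10 land in a cluster with no target points, so that cluster's divergence term is large and its points are discarded; only the source points near 0 are kept.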

12 Unifying Outputs
Basic idea:
– Generate initial outputs according to the regression model.
– For instances that are similar in the original output space, make their new outputs closer.

13 [Figure: initial outputs and their modification; values shown: 16, 37, 26.5, 21.25, 31.75]
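One way to read "make their new outputs closer" is as a smoothing step: each initial output is pulled part of the way toward the mean initial output of the instances that are close to it in the original output space. The sketch below assumes exactly that; the function name, the `radius` neighbourhood test, and the step size `alpha` are hypothetical and do not come from the slides. With `alpha = 0.25`, two similar instances with initial outputs 16 and 37 move to 21.25 and 31.75, which matches numbers on the example slide, but whether the authors use this update rule is not confirmed by the transcript.

```python
def unify_outputs(initial, original, alpha=0.25, radius=1.0):
    """Pull together the initial outputs of instances that are similar
    in the original output space (assumed smoothing rule).

    initial[i]  : output the regression model generated for instance i
    original[i] : output of instance i in its original output space
    """
    unified = []
    for i, y in enumerate(initial):
        # Instances count as "similar" if their original outputs differ by at most radius.
        neighbours = [initial[j] for j in range(len(initial))
                      if j != i and abs(original[j] - original[i]) <= radius]
        if neighbours:
            mean = sum(neighbours) / len(neighbours)
            y = (1 - alpha) * y + alpha * mean  # move part of the way toward the neighbours
        unified.append(y)
    return unified

# The first two instances are similar in the original output space; the third is not.
initial = [16.0, 37.0, 80.0]
original = [3.0, 3.5, 9.0]
print(unify_outputs(initial, original))  # [21.25, 31.75, 80.0]
```

The third instance has no neighbour within `radius`, so its output is left unchanged; a larger `alpha` would pull similar instances closer together, up to their common mean.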

14 Experiment
Bioinformatics data set:

15 Experiment

16 Experiment
Applied sociology data set:

17 Experiment

18 Conclusions
Problem: learning with heterogeneous sources
– Heterogeneous data distributions
– Heterogeneous outputs
Solution:
– Clustering-based KL divergence helps perform sampling
– Similarity-preserving output generation helps unify outputs

19 Thanks!

