
1 Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao (1), Feng Liang (2), Wei Fan (3), Yizhou Sun (1), Jiawei Han (1)
(1) CS, UIUC; (2) STAT, UIUC; (3) IBM T. J. Watson
NIPS 2009

2 2/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

3 3/46 Ensemble
Combine multiple models into one!
(figure: data fed into model 1, model 2, …, model k, whose outputs are merged into a single ensemble model)
Applications: classification, clustering, collaborative filtering, anomaly detection……

4 4/46 Stories of Success
Million-dollar prize
–Improve the baseline movie recommendation approach of Netflix by 10% in accuracy
–The top submissions all combine several teams and algorithms as an ensemble
Data mining competitions
–Classification problems
–Winning teams employ an ensemble of classifiers

5 5/46 Why Ensemble Works? (1)
Intuition
–combining diverse, independent opinions in human decision-making as a protective mechanism (e.g. stock portfolio)
Uncorrelated error reduction
–Suppose we have 5 completely independent classifiers for majority voting
–If accuracy is 70% for each, the majority vote accuracy is 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) ≈ 83.7%
–With 101 such classifiers, majority vote accuracy rises to 99.9%
(from T. Holloway, Introduction to Ensemble Learning, 2007)
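The majority-vote numbers on the slide above can be checked with a short calculation. A minimal sketch in Python (not part of the presentation), assuming n independent classifiers that are each correct with probability p:

    from math import comb

    def majority_vote_accuracy(n, p):
        # probability that a strict majority of n independent classifiers,
        # each correct with probability p, votes for the right label (n odd)
        k_min = n // 2 + 1
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

    print(majority_vote_accuracy(5, 0.7))    # ~0.837, the 83.7% on the slide
    print(majority_vote_accuracy(101, 0.7))  # ~0.9999, in line with the 99.9% on the slide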

6 6/46 Why Ensemble Works? (2)
(figure: models 1–6 each capture a different part of some unknown distribution; the ensemble gives the global picture!)
(from W. Fan, Random Decision Tree)

7 7/46 Why Ensemble Works? (3)
Overcome limitations of a single hypothesis
–The target function may not be implementable with individual classifiers, but may be approximated by model averaging
(figure: decision boundary of a single decision tree vs. model averaging)
(from I. Davidson et al., When Efficient Model Averaging Out-Performs Boosting and Bagging, ECML 06)

8 8/46 Research Focus
Base models
–Improve diversity!
Combination scheme
–Consensus (unsupervised)
–Learn to combine (supervised)
Tasks
–Classification (supervised ensemble)
–Clustering (unsupervised ensemble)

9 9/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

10 10/46 Bagging
Bootstrap
–Sampling with replacement
–Each sample contains around 63.2% of the original records
Ensemble
–Train a classifier on each bootstrap sample
–Use majority voting to determine the class label of the ensemble classifier
Discussions
–Incorporates diversity through bootstrap samples
–Sensitive (high-variance) base classifiers work better, such as decision trees
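A minimal bagging sketch in Python (not from the original slides), using decision trees as the base classifier and X, y as placeholder training data with integer class labels:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25, seed=0):
        # train one tree per bootstrap sample (each sample covers ~63.2% of the distinct records)
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, n, size=n)  # sampling with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        # majority vote over the base trees (assumes integer labels 0..K-1)
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)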

11 11/46 Boosting
Principles
–Boost a set of weak learners into a strong learner
–Make records that are currently misclassified more important
AdaBoost
–Initially, set uniform weights on all the records
–At each round:
 Create a bootstrap sample based on the weights
 Train a classifier on the sample and apply it to the original training set
 Records that are wrongly classified have their weights increased
 Records that are classified correctly have their weights decreased
 If the error rate is higher than 50%, start over
–Final prediction is a weighted average of all the classifiers, with each weight reflecting that classifier's accuracy on the training set
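The slide describes the resampling form of AdaBoost; the sketch below (not from the presentation) uses the more common reweighting form instead, passing the record weights directly to the weak learner, with binary labels y in {-1, +1} and decision stumps as weak learners:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=20):
        # y must be in {-1, +1}
        n = len(X)
        w = np.full(n, 1.0 / n)                      # uniform initial weights
        models, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()                 # weighted training error
            if err >= 0.5:                           # no better than chance: stop (the slide restarts instead)
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            w *= np.exp(-alpha * y * pred)           # increase weights of misclassified records, decrease the rest
            w /= w.sum()
            models.append(stump)
            alphas.append(alpha)
        return models, alphas

    def adaboost_predict(models, alphas, X):
        # weighted vote: classifiers with lower weighted error get larger weight
        return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))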

12 12/46 (figure: classifications (colors) and weights (size) after 1, 3, and 20 iterations of AdaBoost)
(from John Elder, From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods, 2007)

13 13/46 Random Forest
Algorithm
–For each tree:
 Choose a training set by sampling N times with replacement from the original training set
 At each node, randomly choose m < M features and calculate the best split
 Grow the tree fully; do not prune
–Use majority voting among all the trees
Discussions
–Bagging + random features: improves diversity
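The recipe on this slide (a bootstrap sample per tree, m < M random features per node, unpruned trees, majority vote) is what standard random forest implementations provide. A minimal usage sketch, with X_train, y_train, X_test as placeholders:

    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(
        n_estimators=100,        # number of trees
        max_features="sqrt",     # m randomly chosen features evaluated at each node
        bootstrap=True,          # each tree trained on a sample drawn with replacement
    )
    # forest.fit(X_train, y_train)
    # y_pred = forest.predict(X_test)   # majority vote over the trees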

14 14/46 Random Decision Tree
(figure: a tree over features B1: {0,1}, B2: {0,1}, B3: continuous, where the splitting feature at each node is chosen randomly and continuous features split at a random threshold, e.g. 0.3 or 0.6)
(from W. Fan, Random Decision Tree)

15 15/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

16 16/46 Clustering Ensemble
Goal
–Combine “weak” clusterings into a better one
(from A. Topchy et al., Clustering Ensembles: Models of Consensus and Weak Partitions, PAMI, 2005)

17 17/46 Methods
Base Models
–Bootstrap samples, different subsets of features
–Different clustering algorithms
–Random number of clusters
Combination
–find the correspondence between the labels in the partitions and fuse the clusters with the same labels
–treat each output as a categorical variable and cluster in the new feature space (see the sketch below)
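A minimal sketch (not from the presentation) of the second combination idea above: treat the label assigned by each base clustering as a categorical feature, one-hot encode those features, and cluster in the resulting space. The use of k-means and the number of consensus clusters are illustrative assumptions:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.cluster import KMeans

    def consensus_by_relabeling(base_labels, n_clusters):
        # base_labels: (n_objects, n_models) array; column j holds model j's cluster labels
        features = OneHotEncoder().fit_transform(base_labels).toarray()
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)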

18 18/46 Meta Clustering (1)
Cluster-based
–Regard each cluster from a base model as a record
–Similarity is defined as the percentage of shared examples
–Conduct meta-clustering and assign each record to the associated meta-cluster
Instance-based
–Compute the similarity between two records as the percentage of models that put them into the same cluster (see the co-association sketch below)
(figure: objects v1–v6 and base clusters c1–c10)
(from A. Gionis et al., Clustering Aggregation, TKDD, 2007)
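A minimal sketch (not from the presentation) of the instance-based similarity: a co-association matrix whose (i, j) entry is the fraction of base models that put objects i and j in the same cluster. The hierarchical-clustering step at the end is just one illustrative way to turn that similarity into a consensus partition:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def co_association(base_labels):
        # base_labels: (n_objects, n_models); returns an (n_objects, n_objects) similarity matrix
        n, m = base_labels.shape
        S = np.zeros((n, n))
        for k in range(m):
            col = base_labels[:, k]
            S += (col[:, None] == col[None, :])
        return S / m

    def consensus_partition(base_labels, n_clusters):
        S = co_association(base_labels)
        dist = squareform(1.0 - S, checks=False)   # condensed distance form
        Z = linkage(dist, method="average")
        return fcluster(Z, t=n_clusters, criterion="maxclust")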

19 19/46 Meta Clustering (2)
Probability-based
–Assume the output comes from a mixture of models
–Use the EM algorithm to learn the model
Spectral clustering
–Formulate the problem as a bipartite graph
–Use spectral clustering to partition the graph
(figure: bipartite graph between objects v1–v6 and base clusters c1–c10)
(from A. Gionis et al., Clustering Aggregation, TKDD, 2007)

20 20/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

21 21/46 Multiple Source Classification
–Image categorization: images, descriptions, notes, comments, albums, tags……
–Like? Dislike?: movie genres, cast, director, plots……; users' viewing history, movie ratings……
–Research area: publication and co-authorship network, published papers……

22 22/46 Model Combination helps!
–Some areas share similar keywords
–People may publish in relevant but different areas
–There may be cross-discipline collaborations
(figure: combining supervised, unsupervised, and either supervised or unsupervised models built from the different sources)

23 23/46 Problem

24 24/46 Motivations
Consensus maximization
–Combine the outputs of multiple supervised and unsupervised models on a set of objects
–The predicted labels should agree with the base models as much as possible
Motivations
–Unsupervised models provide useful constraints for classification tasks
–Model diversity improves prediction accuracy and robustness
–Model combination at the output level is needed when raw data cannot be shared (privacy) or comes in incompatible formats

25 25/46 Related Work (1)
Single models
–Supervised: SVM, logistic regression, ……
–Unsupervised: K-means, spectral clustering, ……
–Semi-supervised learning, transductive learning
Supervised ensemble
–Require raw data and labels: bagging, boosting, Bayesian model averaging
–Require labels: mixture of experts, stacked generalization
–Majority voting works at the output level and does not require labels

26 26/46 Related Work (2)
Unsupervised ensemble
–find a consensus clustering from multiple partitionings without accessing the features
Multi-view learning
–a joint model is learnt from both labeled and unlabeled data from multiple sources
–it can be regarded as a semi-supervised ensemble requiring access to the raw data

27 27/46 Related Work (3)

28 28/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

29 29/46 A Toy Example
(figure: seven objects x1–x7 as labeled by four base models; some models assign class labels 1–3, others partition the objects into clusters)

30 30/46 Groups-Objects
(figure: the output of each base model is broken into groups g1–g12, three groups per model, each group containing a subset of the objects x1–x7)

31 31/46 Bipartite Graph
(figure: a bipartite graph with group nodes from models M1–M4 on one side and object nodes on the other; an adjacency edge links object i to group j when the model assigns the object to that group; each node carries a conditional probability vector, and groups produced by the classifiers get initial probabilities [1 0 0], [0 1 0], [0 0 1])

32 32/46 Objective
Minimize disagreement over the bipartite graph:
–an object and a group should have similar conditional probability vectors if they are connected
–a group's vector should not deviate much from its initial probability
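In symbols, the objective described above can be written roughly as follows (a sketch, not the paper's exact formulation: a_ij is the adjacency between object i and group j, u_i and q_j are their conditional probability vectors, y_j is the initial probability of a group produced by a classifier, and alpha trades off the two terms):

    \min_{\{u_i\},\,\{q_j\}} \;
      \sum_{i}\sum_{j} a_{ij}\,\lVert u_i - q_j \rVert^2
      \;+\; \alpha \sum_{j \in \text{classifier groups}} \lVert q_j - y_j \rVert^2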

33 33/46 Methodology
Iterate until convergence:
–update the probability vector of each group
–update the probability vector of each object
(see the sketch below)
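A minimal Python sketch (not from the paper) of these alternating updates, assuming each update is the adjacency-weighted average implied by the objective sketched above; A, Y, from_classifier, and alpha are placeholder inputs:

    import numpy as np

    def consensus_maximization(A, Y, from_classifier, alpha=2.0, n_iter=100):
        # A: (n_objects, n_groups) adjacency, 1 if object i belongs to group j
        # Y: (n_groups, n_classes) initial group probabilities (one-hot for classifier groups, zero rows otherwise)
        # from_classifier: (n_groups,) boolean mask of groups that carry an initial label
        n_obj, n_cls = A.shape[0], Y.shape[1]
        U = np.full((n_obj, n_cls), 1.0 / n_cls)       # object probabilities, start uniform
        pull = alpha * from_classifier[:, None]        # only classifier groups are anchored to Y
        for _ in range(n_iter):
            # update each group: weighted average of its objects, pulled toward its initial label
            Q = (A.T @ U + pull * Y) / (A.sum(axis=0)[:, None] + pull)
            # update each object: average of the groups it is connected to
            U = (A @ Q) / A.sum(axis=1)[:, None]
        return U                                        # consensus class-probability estimates per object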

34 34/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

35 35/46 Constrained Embedding
(figure: the objective interpreted as an embedding of objects and groups, with constraints on the groups that come from classification models)

36 36/46 Ranking on Consensus Structure
(figure: the updates interpreted as ranking over the bipartite consensus structure, with the class labels acting as queries, the graph's adjacency matrix as the link structure, and personalized damping factors)

37 37/46 Incorporating Labeled Information
The objective is extended to account for objects whose labels are known; the same two steps are iterated:
–update the probability vector of each group
–update the probability vector of each object

38 38/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

39 39/46 Experiments-Data Sets
20 Newsgroups
–newsgroup message categorization
–only text information available
Cora
–research paper area categorization
–paper abstracts and citation information available
DBLP
–researchers' area prediction
–publication and co-authorship network, and publication content
–conferences' areas are known

40 40/46 Experiments-Baseline Methods (1)
Single models
–20 Newsgroups: logistic regression, SVM, K-means, min-cut
–Cora: abstracts, citations (with or without a labeled set)
–DBLP: publication titles, links (with or without labels from conferences)
Proposed method
–BGCM
–BGCM-L: semi-supervised version combining four models
–2-L, 3-L: combining only two or three of the models

41 41/46 Experiments-Baseline Methods (2)
Ensemble approaches
–clustering ensemble on all four models: MCLA, HBGF

42 42/46 Accuracy (1)

43 43/46 Accuracy (2)

44 44/46

45 45/46 Conclusions
Ensemble
–Combining independent, diversified models improves accuracy
–Information explosion; various learning packages are available
Consensus Maximization
–Combines the complementary predictive powers of multiple supervised and unsupervised models
–Propagates label information between group and object nodes iteratively over a bipartite graph
–Two interpretations: constrained embedding and ranking on consensus structure
Applications
–Multiple source learning, ranking, truth finding……

46 46/46 Thanks! Any questions? http://www.ews.uiuc.edu/~jinggao3/nips09bgcm.htm jinggao3@illinois.edu

