Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao (CS, UIUC), Feng Liang (Statistics, UIUC), Wei Fan (IBM T. J. Watson), Yizhou Sun (CS, UIUC), Jiawei Han (CS, UIUC)
NIPS 2009

2/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

3/46 Ensemble
Combine multiple models into one!
[Diagram: data feeds model 1, model 2, ……, model k, whose outputs are combined into a single ensemble model]
Applications: classification, clustering, collaborative filtering, anomaly detection, ……

4/46 Stories of Success
Million-dollar prize
–Improve the baseline movie recommendation approach of Netflix by 10% in accuracy
–The top submissions all combine several teams and algorithms as an ensemble
Data mining competitions
–Classification problems
–Winning teams employ an ensemble of classifiers

5/46 Why Do Ensembles Work? (1)
Intuition
–Combining diverse, independent opinions in human decision-making acts as a protective mechanism (e.g. a stock portfolio)
Uncorrelated error reduction
–Suppose we have 5 completely independent classifiers for majority voting
–If each has 70% accuracy, a majority (at least 3 of 5) is correct with probability 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) ≈ 83.7%
–With 101 such classifiers, the majority vote accuracy reaches 99.9%
from T. Holloway, Introduction to Ensemble Learning, 2007
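The two numbers on this slide can be reproduced with a short binomial computation (a minimal sketch; the classifiers are assumed to be independent and to have identical 70% accuracy):

```python
from math import comb

def majority_vote_accuracy(n_classifiers, p=0.7):
    """Probability that a majority of independent classifiers,
    each correct with probability p, votes for the right label."""
    k_min = n_classifiers // 2 + 1  # smallest winning majority
    return sum(comb(n_classifiers, k) * p**k * (1 - p)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

print(majority_vote_accuracy(5))    # ~0.837 = 10*(.7^3)(.3^2) + 5*(.7^4)(.3) + (.7^5)
print(majority_vote_accuracy(101))  # ~0.999
```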

6/46 Why Do Ensembles Work? (2)
[Figure: models 1–6 each capture only part of some unknown distribution; the ensemble gives the global picture!]
from W. Fan, Random Decision Tree

7/46 Why Do Ensembles Work? (3)
Overcome the limitations of a single hypothesis
–The target function may not be implementable by individual classifiers, but may be approximated by model averaging
[Figure: decision boundary of a single decision tree vs. model averaging]
from I. Davidson et al., When Efficient Model Averaging Out-Performs Boosting and Bagging, ECML 2006

8/46 Research Focus
Base models
–Improve diversity!
Combination scheme
–Consensus (unsupervised)
–Learn to combine (supervised)
Tasks
–Classification (supervised ensemble)
–Clustering (unsupervised ensemble)

9/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

10/46 Bagging
Bootstrap
–Sampling with replacement
–Each sample contains around 63.2% of the original records
Ensemble
–Train a classifier on each bootstrap sample
–Use majority voting to determine the class label of the ensemble classifier
Discussions
–Incorporates diversity through the bootstrap samples
–Base classifiers that are sensitive to the training data, such as decision trees, work better
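A minimal sketch of this recipe (the decision-tree base learner and integer class labels are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, rng=np.random.default_rng(0)):
    """Train one decision tree per bootstrap sample."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    trees = []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap: ~63.2% unique records
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote over the base trees (assumes non-negative integer labels)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_estimators, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```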

11/46 Boosting
Principles
–Boost a set of weak learners into a strong learner
–Make currently misclassified records more important
AdaBoost
–Initially, set uniform weights on all the records
–At each round:
  Create a bootstrap sample based on the weights
  Train a classifier on the sample and apply it to the original training set
  Records that are wrongly classified have their weights increased
  Records that are classified correctly have their weights decreased
  If the error rate is higher than 50%, start over
–The final prediction is a weighted average of all the classifiers, with each weight reflecting that classifier's training accuracy
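A compact AdaBoost-style sketch of the loop described above; as simplifications for illustration, it passes sample weights to the learner directly instead of resampling, uses decision stumps, and assumes binary labels in {-1, +1}:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must be in {-1, +1}. Returns (stumps, alphas) for a weighted vote."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform record weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()                 # weighted training error
        if err >= 0.5:                           # worse than chance: stop (slide: "start over")
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)           # up-weight mistakes, down-weight correct records
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```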

12/46 [Figure: classifications (colors) and weights (size) after 1, 3, and 20 iterations of AdaBoost]
from J. Elder, From Trees to Forests and Rule Sets: A Unified Overview of Ensemble Methods

13/46 Random Forest
Algorithm
–For each tree:
  Choose a training set by sampling N times with replacement from the original training set
  At each node, randomly choose m < M features and calculate the best split
  Grow the tree fully; do not prune
–Use majority voting among all the trees
Discussions
–Bagging + random features: improves diversity
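This recipe is available off the shelf; a brief usage sketch with scikit-learn (note that scikit-learn's implementation averages the trees' predicted probabilities rather than taking the hard majority vote described on the slide; the data variables in the comment are hypothetical):

```python
from sklearn.ensemble import RandomForestClassifier

# 100 fully grown trees, each split considering a random subset of sqrt(M) features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
# forest.fit(X_train, y_train); forest.predict(X_test)
```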

14/46 Random Decision Tree
[Figure: a random decision tree over features B1: {0,1}, B2: {0,1}, B3: continuous; at each node the splitting feature is chosen randomly, and continuous features are split at a random threshold (e.g. 0.3, 0.6)]
from W. Fan, Random Decision Tree

15/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

16/46 Clustering Ensemble
Goal
–Combine "weak" clusterings into a better one
from A. Topchy et al., Clustering Ensembles: Models of Consensus and Weak Partitions, PAMI, 2005

17/46 Methods
Base models
–Bootstrap samples, different subsets of features
–Different clustering algorithms
–Random number of clusters
Combination
–Find the correspondence between the labels in the partitions and fuse the clusters with the same labels
–Treat each output as a categorical variable and cluster in the new feature space

18/46 Meta Clustering (1)
Cluster-based
–Regard each cluster from a base model as a record
–Similarity is defined as the percentage of shared examples
–Conduct meta-clustering and assign each record to its associated meta-cluster
Instance-based
–Compute the similarity between two records as the percentage of models that put them into the same cluster
[Figure: objects v1–v6 and the clusters c1–c10 produced for them by different base models]
from A. Gionis et al., Clustering Aggregation, TKDD, 2007
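A minimal sketch of the instance-based idea: build a co-association matrix whose (i, j) entry is the fraction of base clusterings that put objects i and j together, then feed it to any similarity-based clusterer; the average-linkage step here is just one possible choice, not the method of any particular paper:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association(partitions):
    """partitions: array of shape (n_models, n_objects) holding cluster labels."""
    partitions = np.asarray(partitions)
    m, n = partitions.shape
    S = np.zeros((n, n))
    for labels in partitions:
        S += (labels[:, None] == labels[None, :])  # 1 if this model co-clusters i and j
    return S / m

def consensus_clustering(partitions, n_clusters):
    S = co_association(partitions)
    D = 1.0 - S                                    # turn similarity into a distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D), method="average")   # average-linkage on consensus distances
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```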

19/46 Meta Clustering (2)
Probability-based
–Assume the outputs come from a mixture of models
–Use the EM algorithm to learn the model
Spectral clustering
–Formulate the problem as a bipartite graph between objects and clusters
–Use spectral clustering to partition the graph
[Figure: bipartite graph between objects v1–v6 and clusters c1–c10]
from A. Gionis et al., Clustering Aggregation, TKDD, 2007
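The bipartite formulation connects each object to the clusters it was assigned to by each base model. A minimal sketch of that construction (labels are assumed to be consecutive integers starting at 0), with scikit-learn's spectral co-clustering used as one convenient stand-in for the spectral partitioning step:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def object_cluster_incidence(partitions):
    """Binary object-by-group matrix: entry (i, g) is 1 if a base model put object i in group g."""
    partitions = np.asarray(partitions)
    blocks = []
    for labels in partitions:                        # one block of columns per base model
        k = labels.max() + 1
        blocks.append(np.eye(k, dtype=int)[labels])  # one-hot encoding of this model's clustering
    return np.hstack(blocks)

B = object_cluster_incidence([[0, 0, 1, 1, 2, 2],    # toy: three base clusterings of six objects
                              [0, 0, 0, 1, 1, 1],
                              [0, 1, 1, 1, 2, 2]])
model = SpectralCoclustering(n_clusters=3, random_state=0).fit(B)
print(model.row_labels_)                             # consensus labels for the six objects
```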

20/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

21/46 Multiple Source Classification
Image categorization: images, descriptions, notes, comments, albums, tags, ……
Like? Dislike?: movie genres, cast, director, plots, ……; users' viewing history, movie ratings, ……
Research area: publication and co-authorship network, published papers, ……

22/46 Model Combination Helps!
–Some areas share similar keywords
–People may publish in relevant but different areas
–There may be cross-discipline collaborations
[Figure: the available sources are modeled by supervised, unsupervised, or either kind of models]

23/46 Problem

24/46 Motivations
Consensus maximization
–Combine the outputs of multiple supervised and unsupervised models on a set of objects
–The predicted labels should agree with the base models as much as possible
Motivations
–Unsupervised models provide useful constraints for classification tasks
–Model diversity improves prediction accuracy and robustness
–Model combination at the output level is needed when raw data cannot be shared (privacy-preserving) or the formats are incompatible

25/46 Related Work (1)
Single models
–Supervised: SVM, logistic regression, ……
–Unsupervised: k-means, spectral clustering, ……
–Semi-supervised learning, transductive learning
Supervised ensembles
–Require raw data and labels: bagging, boosting, Bayesian model averaging
–Require labels: mixture of experts, stacked generalization
–Majority voting works at the output level and does not require labels

26/46 Related Work (2)
Unsupervised ensembles
–Find a consensus clustering from multiple partitionings without accessing the features
Multi-view learning
–A joint model is learnt from both labeled and unlabeled data from multiple sources
–It can be regarded as a semi-supervised ensemble that requires access to the raw data

27/46 Related Work (3)

28/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

29/46 A Toy Example
[Figure: seven objects x1–x7 and the groupings produced for them by four base models]

30/46 Groups-Objects
[Figure: the four base models' outputs on objects x1–x7, represented as groups g1–g12]

31/46 Bipartite Graph
[Figure: a bipartite graph with group nodes from models M1–M4 on one side and object nodes on the other; object i is connected to group j by an adjacency entry, each node carries a conditional probability vector, and groups produced by classification models receive initial probabilities such as [1 0 0], [0 1 0], [0 0 1]]

32/46 Objective
Minimize disagreement over the bipartite graph of groups (M1–M4) and objects:
–An object and a group that are connected should have similar conditional probability vectors
–Groups with initial probabilities ([1 0 0], [0 1 0], [0 0 1]) should not deviate much from them
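In symbols, the disagreement being minimized can be written roughly as below, a reconstruction consistent with these bullets: u_i is the probability vector of object i, q_j that of group j, a_ij the adjacency, y_j the initial probability of a group produced by a classification model, and alpha the trade-off weight; the exact formulation and constraints in the paper may differ in details.

```latex
\min_{U,\,Q}\;\sum_{i=1}^{n}\sum_{j=1}^{v} a_{ij}\,\lVert u_i - q_j\rVert^2
\;+\;\alpha \sum_{j \in \text{classification groups}} \lVert q_j - y_j\rVert^2
```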

33/46 Methodology
Iterate until convergence over the bipartite graph of groups (M1–M4) and objects:
–Update the probability vector of each group
–Update the probability vector of each object
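A minimal sketch of these alternating updates, derived as the block-coordinate solutions of the objective sketched above; the normalization details and the value of alpha are assumptions and may differ from the paper:

```python
import numpy as np

def bgcm(A, Y, alpha=2.0, n_iter=100):
    """A: (n_objects, n_groups) adjacency; Y: (n_groups, n_classes) initial group
    probabilities (rows of zeros for groups coming from clustering models)."""
    n, v = A.shape
    c = Y.shape[1]
    Q = Y.astype(float).copy()                  # group probability vectors
    U = np.full((n, c), 1.0 / c)                # object probability vectors
    is_labeled = Y.sum(axis=1) > 0              # groups produced by classification models
    for _ in range(n_iter):
        # each group moves toward the average of its objects, anchored to its initial label
        num = A.T @ U + alpha * Y
        den = A.sum(axis=0)[:, None] + alpha * is_labeled[:, None]
        Q = num / np.maximum(den, 1e-12)
        # each object moves toward the average of its groups
        U = (A @ Q) / np.maximum(A.sum(axis=1)[:, None], 1e-12)
    return U.argmax(axis=1)                     # consensus label per object
```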

34/46 Outline An overview of ensemble methods –Introduction –Supervised ensemble techniques –Unsupervised ensemble techniques Consensus among supervised and unsupervised models –Problem and motivation –Methodology –Interpretations –Experiments

35/46 Constrained Embedding
[Equations: objects and groups are embedded jointly, with constraints on the embeddings of groups that come from classification models]

36/46 Ranking on Consensus Structure
[Figure: the bipartite graph of groups (M1–M4) and objects; its adjacency matrix is the consensus structure, the labeled groups act as the query, and personalized damping factors control the propagation]
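The ranking view treats the bipartite graph as the consensus structure and the labeled groups of one class as the query, so scores propagate much like personalized PageRank. A minimal illustrative sketch of that analogy only; the single global damping factor and plain power iteration here are simplifications, whereas the slide refers to personalized (node-specific) damping factors:

```python
import numpy as np

def personalized_pagerank(W, query, d=0.85, n_iter=100):
    """W: symmetric adjacency of the consensus graph (object and group nodes stacked);
    query: restart distribution concentrated on the labeled groups of one class."""
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic transitions
    restart = query / query.sum()
    r = restart.copy()
    for _ in range(n_iter):
        r = d * (P.T @ r) + (1 - d) * restart                # propagate, then restart at the query
    return r  # ranking scores; per-class scores can be normalized into label probabilities
```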

37/46 Incorporating Labeled Information
[Equations: the objective and the iterative updates of group and object probabilities, now with labeled information incorporated, over the bipartite graph of groups (M1–M4) and objects]

38/46 Outline
An overview of ensemble methods
–Introduction
–Supervised ensemble techniques
–Unsupervised ensemble techniques
Consensus among supervised and unsupervised models
–Problem and motivation
–Methodology
–Interpretations
–Experiments

39/46 Experiments: Data Sets
20 Newsgroups
–newsgroup message categorization
–only text information available
Cora
–research paper area categorization
–paper abstracts and citation information available
DBLP
–researcher area prediction
–publication and co-authorship network, and publication content
–conferences' areas are known

40/46 Experiments: Baseline Methods (1)
Single models
–20 Newsgroups: logistic regression, SVM, k-means, min-cut
–Cora: abstracts, citations (with or without a labeled set)
–DBLP: publication titles, links (with or without labels from conferences)
Proposed method
–BGCM
–BGCM-L: semi-supervised version combining four models
–2-L: two models
–3-L: three models

41/46 Experiments: Baseline Methods (2)
Ensemble approaches
–clustering ensembles on all four models: MCLA, HBGF

42/46 Accuracy (1)

43/46 Accuracy (2)

44/46

45/46 Conclusions
Ensembles
–Combining independent, diversified models improves accuracy
–Information explosion; various learning packages are available
Consensus maximization
–Combines the complementary predictive powers of multiple supervised and unsupervised models
–Propagates label information between group and object nodes iteratively over a bipartite graph
–Two interpretations: constrained embedding and ranking on the consensus structure
Applications
–Multiple-source learning, ranking, truth finding, ……

46/46 Thanks! Any questions?