Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao † Wei Fan ‡ Yizhou Sun † Jiawei Han † †University of Illinois at Urbana-Champaign.

Presentation transcript:

Heterogeneous Consensus Learning via Decision Propagation and Negotiation
Jing Gao†, Wei Fan‡, Yizhou Sun†, Jiawei Han†
†University of Illinois at Urbana-Champaign
‡IBM T. J. Watson Research Center
KDD’09, Paris, France

2/24 Information Explosion
Not only at scale, but also at available sources!
[Figure labels: fan site, descriptions, pictures, videos, blogs, reviews]

3/24 Multiple Source Classification
Image Categorization
–images, descriptions, notes, comments, albums, tags, ...
Like? Dislike?
–movie genres, cast, director, plots, ...
–users' viewing history, movie ratings, ...
Research Area
–publication and co-authorship network, published papers, ...

4/24 Model Combination Helps!
Some areas share similar keywords
People may publish in relevant but different areas
There may be cross-discipline cooperation
[Figure labels: supervised; unsupervised; supervised or unsupervised]

5/24 Motivation
Multiple sources provide complementary information
–We may want to use all of them to derive a better classification solution
Concatenation of information sources is impossible
–Information sources have different formats
–We may only have access to classification or clustering results due to privacy issues
Ensemble of supervised and unsupervised models
–Combine their outputs on the same set of objects
–Derive a consolidated solution
–Reduce errors made by individual models
–More robust and stable

6/24 Consensus Learning

7/24 Problem Formulation
Principles
–Consensus: maximize agreement among supervised and unsupervised models
–Constraints: label predictions should be close to the outputs of the supervised models
Objective function
–Combines a consensus term and a constraints term [equation not preserved in the transcript]
–Optimizing it is NP-hard!

8/24 Methodology
Step 1: Group-level predictions
–How to propagate and negotiate?
Step 2: Combine multiple models using local weights
–How to compute local model weights?

9/24 Group-level Predictions (1)
Groups:
–similarity: percentage of common members
–initial labeling: category information from supervised models
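
The similarity between two groups can be sketched as follows. This is a minimal illustration, assuming the "percentage of common members" is measured against the union of the two groups (the Jaccard coefficient); the slide does not state the denominator. The function name `group_similarity` and the example groups are hypothetical.

```python
def group_similarity(group_a, group_b):
    """Fraction of members shared by two groups.

    Assumption: the shared count is divided by the size of the union
    (Jaccard coefficient). The slide only says "percentage of common members".
    """
    a, b = set(group_a), set(group_b)
    union = a | b
    if not union:
        return 0.0  # two empty groups share nothing
    return len(a & b) / len(union)


# Hypothetical groups of object ids: one class predicted by a classifier
# and one cluster produced by a clustering model.
classifier_class_1 = {1, 2, 3, 4}
cluster_a = {3, 4, 5, 6}
similarity = group_similarity(classifier_class_1, cluster_a)
```

Groups from supervised models would additionally carry their class as an initial label, while groups from clustering models start unlabeled.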

10/24 Group-level Predictions (2)
Principles
–Conditional probability estimates should be smooth over the graph
–Estimates should not deviate too much from the initial labeling
[Figure labels: labeled nodes, unlabeled nodes]
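
The two principles above can be illustrated with a generic regularized label propagation loop. This is a sketch under stated assumptions, not necessarily the update rule used in the paper: each group's estimate is a mix of its neighbors' weighted average (smoothness) and its own initial labeling (fidelity), with a hypothetical trade-off parameter `alpha`. The function name `propagate` and the example graph are also hypothetical.

```python
def propagate(weights, initial, alpha=0.5, iterations=50):
    """Generic regularized label propagation over a group similarity graph.

    weights: symmetric n x n similarity matrix between groups (list of lists).
    initial: n x k matrix; one-hot rows for labeled groups, zero rows otherwise.
    alpha:   assumed trade-off between smoothness and the initial labeling.
    """
    n, k = len(weights), len(initial[0])
    # Row-normalize similarities so each node averages over its neighbors.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total > 0 else 0.0 for w in row])
    scores = [list(row) for row in initial]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            new_row = []
            for c in range(k):
                neighbor_avg = sum(norm[i][j] * scores[j][c] for j in range(n))
                new_row.append(alpha * neighbor_avg + (1 - alpha) * initial[i][c])
            updated.append(new_row)
        scores = updated
    # Turn scores into per-group conditional probability estimates.
    result = []
    for row in scores:
        total = sum(row)
        result.append([v / total for v in row] if total > 0 else [1.0 / k] * k)
    return result


# Hypothetical graph: groups 0 and 1 come from a classifier (labeled),
# groups 2 and 3 come from a clustering model (unlabeled).
weights = [
    [0.0, 0.0, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.8],
    [0.8, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
]
initial = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
estimates = propagate(weights, initial)
```

In this example, cluster 2 overlaps mostly with the class-0 group, so its estimate leans toward class 0, while the labeled groups stay close to their initial labels.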

11/24 Local Weighting Scheme (1)
Principles
–If M makes a more accurate prediction on x, M’s weight on x should be higher
Difficulties
–“Unsupervised” model combination: cannot use cross-validation

12/24 Local Weighting Scheme (2)
Method
–Consensus
–To compute M_i’s weight on x, treat each of M_1, ..., M_{i-1}, M_{i+1}, ..., M_r as the true model and compute the average accuracy
–Use consistency in x’s neighbors’ label predictions between two models to approximate accuracy
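
A rough sketch of this weighting idea follows. It assumes that each model supplies a label for every object, that a neighbor list for x is given, and that the "accuracy" of model i against model j is approximated by the fraction of x's neighbors on which the two agree. How neighbors are defined, and the names `local_weights` and `combine`, are assumptions for illustration and are not taken from the slides.

```python
def local_weights(predictions, neighbors, x):
    """Approximate each model's weight on object x.

    predictions: list of dicts, one per model, mapping object -> predicted label.
    neighbors:   dict mapping object -> list of neighboring objects (assumed given).
    """
    r = len(predictions)
    nbrs = neighbors[x]
    if r < 2 or not nbrs:
        return [1.0 / r] * r  # nothing to compare against: fall back to uniform
    raw = []
    for i in range(r):
        agreements = []
        for j in range(r):
            if j == i:
                continue
            # Treat model j as the "true" model and measure agreement nearby.
            same = sum(1 for n in nbrs if predictions[i][n] == predictions[j][n])
            agreements.append(same / len(nbrs))
        raw.append(sum(agreements) / len(agreements))
    total = sum(raw)
    if total == 0:
        return [1.0 / r] * r
    return [w / total for w in raw]


def combine(probabilities, weights):
    """Weighted average of per-model class probabilities for one object."""
    k = len(probabilities[0])
    return [sum(w * p[c] for w, p in zip(weights, probabilities)) for c in range(k)]


# Hypothetical example: three models, object "x" with neighbors a, b, c.
predictions = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
neighbors = {"x": ["a", "b", "c"]}
weights_on_x = local_weights(predictions, neighbors, "x")
combined = combine([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]], weights_on_x)
```

The third model disagrees with the other two on most of x's neighbors, so it receives the smallest weight when the per-model probabilities for x are combined.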

13/24 Experiments-Data Sets
20 Newsgroups
–newsgroup message categorization
–only text information available
Cora
–research paper area categorization
–paper abstracts and citation information available
DBLP
–researcher area prediction
–publication and co-authorship network, and publication content
–conferences’ areas are known
Yahoo! Movie
–user viewing interest analysis (favored movie types)
–movie ratings and synopses
–movie genres are known

14/24 Experiments-Baseline Methods
Single models
–logistic regression, SVM, K-means, min-cut
Ensemble approaches
–majority-voting classification ensemble
–majority-voting clustering ensemble
–clustering ensemble on all of the four models
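
For reference, the majority-voting classification baseline can be sketched in a few lines. This is a generic illustration, not the authors' experimental code; the function name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (aligned by object) by plurality vote.

    Ties go to the label seen first among the models, which is how
    Counter.most_common orders equal counts.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers on four objects.
model_a = ["sports", "politics", "sports", "tech"]
model_b = ["sports", "tech", "sports", "tech"]
model_c = ["politics", "politics", "sports", "sports"]
voted = majority_vote([model_a, model_b, model_c])
```

Unlike the local weighting scheme, this baseline gives every model the same weight on every object.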

15/24 Empirical Results - Accuracy

16/24 Conclusions
Summary
–We propose to integrate multiple information sources for better classification
–We study the problem of consolidating outputs from multiple supervised and unsupervised models
–The proposed two-step algorithm solves the problem by propagating and negotiating among multiple models
–The algorithm runs in linear time
–Results on various data sets show the improvements

17/24 Thanks! Any questions?