Collective Inference With Ambiguous Node (CIAN)
Advisor: Prof. Sing Ling Lee
Student: Chao Chih Wang
Date:
Outline
- Introduction
  - Network data
  - Traditional vs. networked data classification
  - Collective classification
  - ICA
  - Problem
- Our method: Collective Inference With Ambiguous Node (CIAN)
- Experiments
- Conclusion
Network data
- Traditional data: instances are independent of each other.
- Network data: instances may be related to each other.
- Applications: web pages, paper citations.
Traditional vs. network data classification
[Figure: the same nodes A-H, with classes 1 and 2, classified independently (traditional) and using their links (network data); in the network view, node B's class follows from its neighbors' classes.]
Collective classification
Goal: classify interrelated instances using both content features and link features.
A link feature summarizes the class labels of an instance's neighbors (class 1, class 2, class 3) and can be encoded as Binary, Count, or Proportion; we use Proportion. For example, a node with one class-1 neighbor and one class-2 neighbor gets the proportion link feature (1/2, 1/2, 0). A minimal sketch of the three encodings follows.
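Below is a minimal sketch of the three link-feature encodings; the function name and calling convention are illustrative assumptions, not code from the thesis.

```python
# A minimal sketch of the three link-feature encodings over a node's
# neighbor labels; names and signature are illustrative assumptions.
from collections import Counter

def link_feature(neighbor_labels, classes, mode="proportion"):
    """Encode a node's neighbor labels as one value per class."""
    counts = Counter(neighbor_labels)
    if mode == "binary":    # 1 if at least one neighbor has the class
        return [int(counts[c] > 0) for c in classes]
    if mode == "count":     # number of neighbors with the class
        return [counts[c] for c in classes]
    total = max(len(neighbor_labels), 1)
    return [counts[c] / total for c in classes]  # proportion of neighbors

# Neighbors labeled [1, 2] over classes (1, 2, 3):
# binary -> [1, 1, 0], count -> [1, 1, 0], proportion -> [1/2, 1/2, 0]
```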
ICA: Iterative Classification Algorithm
Step 1 (initial): train the local classifier and use content features to predict the unlabeled instances.
Step 2 (iterate):
  for each unlabeled instance {
    set the unlabeled instance's link feature
    use the local classifier to predict the unlabeled instance
  }
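A minimal Python sketch of ICA under these assumptions: `clf` is a scikit-learn-style classifier already fitted on concatenated [content | link] features of the training nodes (a zero link vector is usable for bootstrapping), and `graph` maps each node id to its neighbor ids. All names are illustrative.

```python
import numpy as np

def ica(clf, X_content, graph, labels, unlabeled, classes, n_iter=5):
    """labels: dict node id -> class label, pre-filled for training nodes."""
    k = len(classes)

    def features(u, link):
        return np.concatenate([X_content[u], link]).reshape(1, -1)

    # Step 1: bootstrap each unlabeled node from content alone
    # (zero link feature).
    for u in unlabeled:
        labels[u] = clf.predict(features(u, np.zeros(k)))[0]

    # Step 2: iteratively rebuild link features and re-predict.
    for _ in range(n_iter):
        for u in unlabeled:
            nbr = [labels[v] for v in graph[u] if v in labels]
            link = [nbr.count(c) / max(len(nbr), 1) for c in classes]
            labels[u] = clf.predict(features(u, np.array(link)))[0]
    return labels
```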
[Figure: worked ICA example with labeled training nodes and unlabeled nodes; link features are label proportions over neighbors, e.g. A: (2/3, 0, 1/3), B: (1/2, 1/2, 0).]
Problem
The local classifier judges some labels with difficulty and may label the wrong class; the mistake then propagates to neighboring instances through their link features.
[Figure: nodes A-G; content features such as woman/man, age ≤ 20 / age > 20, and non-smoking/smoking leave a node undecided between class 1 and class 2.]
[Figure: example with an ambiguous node. Unlabeled node A has link feature (2/3, 1/3, 0); its neighbor C is ambiguous (true class 1 or 2), so C's predicted label can mislead A's prediction.]
Our method
- Make a new prediction for the neighbors of each unlabeled instance.
- Use the predicted probabilities to compute the link feature.
- Retrain the CC classifier.
Computing the link feature with probabilities
Example: A's neighbors are predicted (1, 80%), (2, 60%), (3, 70%).
- Our method: class 1: 80/(80+60+70), class 2: 60/(80+60+70), class 3: 70/(80+60+70).
- General method: class 1: 1/3, class 2: 1/3, class 3: 1/3.
A sketch of this weighting follows.
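A minimal sketch of the probability-weighted link feature, assuming each neighbor carries its current (label, probability) prediction; names are illustrative.

```python
def weighted_link_feature(neighbor_preds, classes):
    """neighbor_preds: list of (label, probability) pairs, one per neighbor."""
    total = sum(p for _, p in neighbor_preds) or 1.0  # avoid division by zero
    return [sum(p for lbl, p in neighbor_preds if lbl == c) / total
            for c in classes]

# Neighbors predicted (1, 0.80), (2, 0.60), (3, 0.70):
# -> [0.80/2.10, 0.60/2.10, 0.70/2.10], i.e. roughly [0.38, 0.29, 0.33],
# where the general (unweighted) method would give [1/3, 1/3, 1/3].
```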
Ambiguous node vs. noise node
To tell them apart, predict the unlabeled instance's neighbors again.
[Figure: two panels. B is originally predicted (1, 70%). If the re-prediction flips to class 2 with moderate confidence, e.g. (2, 60%), B is an ambiguous node; if it flips with high confidence, e.g. (2, 90%), B is a noise node.]
Re-predicting an unlabeled instance's neighbors
- The first iteration always predicts the neighbors again.
- If the new label differs from the original label:
  ▪ this iteration does not adopt the new prediction
  ▪ the next iteration needs to predict the node again
- If the new label matches the original label:
  ▪ average the two probabilities
  ▪ the next iteration does not need to predict the node again
Example: a neighbor originally predicted (2, 80%) and re-predicted (2, 60%) keeps class 2 with the averaged probability, (2, 70%). A sketch of this rule follows.
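A minimal sketch of this rule, assuming (label, probability) pairs; the returned flag says whether the node must be re-predicted in the next iteration. Names are illustrative.

```python
def update_neighbor(old, new):
    """old/new: (label, probability). Returns (prediction, repredict_next)."""
    if old[0] == new[0]:
        # Same label: average the probabilities, stop re-predicting.
        return (old[0], (old[1] + new[1]) / 2), False
    # Different label: do not adopt the change this iteration; flag for next.
    return old, True

# e.g. update_neighbor((2, 0.80), (2, 0.60)) -> ((2, 0.70), False)
```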
Handling a changed class
When the re-prediction changes a neighbor x's class, there are three ways to enter x into the link feature:
- Method A (do not adopt the change): keep the original label and probability.
- Method B (change the class): adopt the new label and probability.
- Method C (do not change the class): keep the original label but set its probability to 0, so x contributes nothing to the link feature.
In the example, x's true label is 2 and the methods give x as Method A: (1, 50%), Method B: (2, 60%), Method C: (1, 0%). If x really is an ambiguous (or noise) node, Method B > Method C > Method A; if x is not, Method A > Method C > Method B. Methods A and B are too extreme, so we choose Method C. A sketch contrasting the three follows.
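A minimal sketch contrasting the three methods for a class-changing neighbor; combined with the probability-weighted link feature above, Method C's zero probability removes the ambiguous node's influence. Names are illustrative.

```python
def resolve_changed_class(old, new, method="C"):
    """old/new: (label, probability) for a neighbor whose class changed."""
    if method == "A":          # do not adopt: keep the original prediction
        return old
    if method == "B":          # change class: adopt the new prediction
        return new
    return (old[0], 0.0)       # Method C: original class with zero weight

# With weighted_link_feature(), a (1, 0.0) entry adds nothing to class 1,
# so an ambiguous or noise node cannot mislead its neighbors' link features.
```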
[Figure: accuracy comparison.]
Retraining the CC classifier
After each iteration, the newly predicted unlabeled instances are added to the training data and the local classifier is retrained.
[Figure: initial (ICA) predictions for nodes A-E, e.g. (1, 90%), (2, 60%), (2, 70%), (1, 80%), (3, 70%), fed back to retrain the classifier.]
Example: ambiguous node
[Figure: comparison of ICA and our method on a graph where B is an ambiguous node; our method re-predicts B's label, while ICA keeps it. Link feature shown: (1/2, 1/2, 0).]
Example: noise node
[Figure: the same comparison on a graph where B is a noise node; our method re-predicts B's label, while ICA keeps it.]
CIAN: Collective Inference With Ambiguous Node
Step 1 (initial): train the local classifier and use content features to predict the unlabeled instances.
Iterate {
  for each unlabeled instance A {
    for each neighbor nb of A {
      if nb needs to be predicted again:
        (class label, probability) = local classifier(nb)    // step 2
    }
    set A's link feature                                     // step 3
    (class label, probability) = local classifier(A)         // step 4
  }
  retrain the local classifier                               // step 5
}
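Putting the pieces together, here is a minimal end-to-end sketch of the loop above. It assumes a scikit-learn-style probabilistic classifier (e.g. naive Bayes, or an SVM with probability estimates, matching the NB/SVM classifiers in the experiments); the data layout and every name are illustrative assumptions, not the thesis implementation.

```python
# Assumptions: `clf` is already fitted on [content | link] features of the
# training nodes; `graph` maps node id -> neighbor ids; `preds` maps every
# labeled node to (label, probability), with training nodes as
# (true_label, 1.0).
import numpy as np

def weighted_link_feature(pairs, classes):
    total = sum(p for _, p in pairs) or 1.0
    return [sum(p for l, p in pairs if l == c) / total for c in classes]

def cian(clf, X_content, graph, preds, train_ids, y_train, unlabeled,
         classes, n_iter=5):
    k = len(classes)

    def predict(u, link):
        x = np.concatenate([X_content[u], link]).reshape(1, -1)
        p = clf.predict_proba(x)[0]
        return clf.classes_[int(np.argmax(p))], float(p.max())

    def node_link(u):
        return np.array(weighted_link_feature(
            [preds[v] for v in graph[u] if v in preds], classes))

    # Step 1: bootstrap the unlabeled nodes with empty link features.
    for u in unlabeled:
        preds[u] = predict(u, np.zeros(k))
    repredict = dict.fromkeys(unlabeled, True)  # first iteration: all flagged

    for _ in range(n_iter):
        for u in unlabeled:
            contrib = []
            for nb in graph[u]:
                if nb not in preds:
                    continue
                pred = preds[nb]
                if repredict.get(nb, False):
                    # Step 2: re-predict a still-flagged neighbor.
                    new = predict(nb, node_link(nb))
                    if new[0] == pred[0]:
                        # Same class: average probabilities, unflag the node.
                        pred = (pred[0], (pred[1] + new[1]) / 2)
                        preds[nb] = pred
                        repredict[nb] = False
                    else:
                        # Method C: original class with zero weight this
                        # round; the node stays flagged for the next iteration.
                        pred = (pred[0], 0.0)
                contrib.append(pred)
            # Step 3: set A's link feature from the resolved neighbor pairs.
            link = np.array(weighted_link_feature(contrib, classes))
            # Step 4: re-predict the unlabeled instance itself.
            preds[u] = predict(u, link)
        # Step 5: retrain on training nodes plus current unlabeled predictions.
        X, y = [], []
        for t, lbl in zip(train_ids, y_train):
            X.append(np.concatenate([X_content[t], node_link(t)]))
            y.append(lbl)
        for u in unlabeled:
            X.append(np.concatenate([X_content[u], node_link(u)]))
            y.append(preds[u][0])
        clf.fit(np.array(X), y)
    return preds
```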
Dataset characteristics

Characteristics   | Cora | CiteSeer | WebKB-texas | WebKB-washington
Instances         |      |          |             |
Class labels      | 7    | 6        | 5           | 5
Link number       |      |          |             |
Content features  |      |          |             |
Link features     | 7    | 6        | 5           | 5
Fixed experimental settings

Characteristics           | Cora | CiteSeer | WebKB-texas | WebKB-washington
Instances                 |      |          |             |
Max ambiguous nodes (NB)  | 429  | 590      | 52          | 33
Max ambiguous nodes (SVM) | 356  | 365      | 20          | 31
Training data             |      |          |             |
Iteration                 | 5    | 5        | 5           | 5

‧ Compared methods: CO, ICA, CIAN.
Experiments
1. Misclassified nodes: vary the proportion of misclassified nodes (0%~30%, 80%).
2. Ambiguous nodes: NB vs. SVM.
3. Misclassified and ambiguous nodes: vary the proportion of misclassified and ambiguous nodes (0%~30%, 80%).
4. Iteration and stability: number of iterations.
Cora
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
CiteSeer
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
WebKB-texas
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
WebKB-washington
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
80% misclassified nodes
[Figure: accuracy at 80% misclassified nodes.]
Cora
[Figure: ambiguous-node experiment; max ambiguous nodes: 429 and 356.]
CiteSeer
[Figure: ambiguous-node experiment; max ambiguous nodes: 590 and 365.]
WebKB-texas
[Figure: ambiguous-node experiment; max ambiguous nodes: 52 and 20.]
WebKB-washington
[Figure: ambiguous-node experiment; max ambiguous nodes: 33 and 31.]
‧ How many of the ambiguous nodes are the same between NB and SVM?

Characteristics                              | Cora  | CiteSeer | WebKB-texas | WebKB-washington
Instances                                    |       |          |             |
Max ambiguous nodes (NB)                     | 429   | 590      | 52          | 33
Max ambiguous nodes (SVM)                    | 356   | 365      | 20          | 31
The same ambiguous nodes                     |       |          |             |
Proportion of the same ambiguous nodes (NB)  | 36.5% | 27.7%    | 28.8%       | 34%
Proportion of the same ambiguous nodes (SVM) | 44.1% | 44.9%    | 75%         | 54.8%
Cora
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
CiteSeer
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
WebKB-texas
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
WebKB-washington
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
80% misclassified and ambiguous nodes
[Figure: accuracy at 80% misclassified and ambiguous nodes.]
‧ When is the accuracy of ICA lower than that of CO?
Cora
[Figure: accuracy vs. number of iterations.]
CiteSeer
[Figure: accuracy vs. number of iterations.]
WebKB-texas
[Figure: accuracy vs. number of iterations.]
WebKB-washington
[Figure: accuracy vs. number of iterations.]