Collective Inference With Ambiguous Node (CIAN)
Advisor: Prof. Sing Ling Lee
Student: Chao Chih Wang
Date:
Outline
- Introduction
  - Network data
  - Traditional vs. networked data classification
  - Collective classification
  - ICA
  - Problem
- Our method: Collective Inference With Ambiguous Node (CIAN)
- Experiments
- Conclusion
Network data
- Traditional data: instances are independent of each other.
- Network data: instances may be related to each other.
- Applications: web pages, paper citations.
Traditional vs. network data classification
[Figure: the same nodes A-H, with classes 1 and 2, classified independently (traditional) and using their links (network data); in the network view, node B's class follows from its neighbors' classes.]
Collective classification
Goal: classify interrelated instances using both content features and link features.
A link feature summarizes the class labels of an instance's neighbors (class 1, class 2, class 3) and can be encoded as Binary, Count, or Proportion; we use Proportion. For example, a node with one class-1 neighbor and one class-2 neighbor gets the proportion link feature (1/2, 1/2, 0). A minimal sketch of the three encodings follows.
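Below is a minimal sketch of the three link-feature encodings; the function name and calling convention are illustrative assumptions, not code from the thesis.

```python
# A minimal sketch of the three link-feature encodings over a node's
# neighbor labels; names and signature are illustrative assumptions.
from collections import Counter

def link_feature(neighbor_labels, classes, mode="proportion"):
    """Encode a node's neighbor labels as one value per class."""
    counts = Counter(neighbor_labels)
    if mode == "binary":    # 1 if at least one neighbor has the class
        return [int(counts[c] > 0) for c in classes]
    if mode == "count":     # number of neighbors with the class
        return [counts[c] for c in classes]
    total = max(len(neighbor_labels), 1)
    return [counts[c] / total for c in classes]  # proportion of neighbors

# Neighbors labeled [1, 2] over classes (1, 2, 3):
# binary -> [1, 1, 0], count -> [1, 1, 0], proportion -> [1/2, 1/2, 0]
```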
ICA: Iterative Classification Algorithm
Step 1 (initial): train the local classifier and use content features to predict the unlabeled instances.
Step 2 (iterate):
  for each unlabeled instance {
    set the unlabeled instance's link feature
    use the local classifier to predict the unlabeled instance
  }
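A minimal Python sketch of ICA under these assumptions: `clf` is a scikit-learn-style classifier already fitted on concatenated [content | link] features of the training nodes (a zero link vector is usable for bootstrapping), and `graph` maps each node id to its neighbor ids. All names are illustrative.

```python
import numpy as np

def ica(clf, X_content, graph, labels, unlabeled, classes, n_iter=5):
    """labels: dict node id -> class label, pre-filled for training nodes."""
    k = len(classes)

    def features(u, link):
        return np.concatenate([X_content[u], link]).reshape(1, -1)

    # Step 1: bootstrap each unlabeled node from content alone
    # (zero link feature).
    for u in unlabeled:
        labels[u] = clf.predict(features(u, np.zeros(k)))[0]

    # Step 2: iteratively rebuild link features and re-predict.
    for _ in range(n_iter):
        for u in unlabeled:
            nbr = [labels[v] for v in graph[u] if v in labels]
            link = [nbr.count(c) / max(len(nbr), 1) for c in classes]
            labels[u] = clf.predict(features(u, np.array(link)))[0]
    return labels
```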
[Figure: worked ICA example with labeled training nodes and unlabeled nodes; link features are label proportions over neighbors, e.g. A: (2/3, 0, 1/3), B: (1/2, 1/2, 0).]
Problem
The local classifier judges some labels with difficulty and may label the wrong class; the mistake then propagates to neighboring instances through their link features.
[Figure: nodes A-G; content features such as woman/man, age ≤ 20 / age > 20, and non-smoking/smoking leave a node undecided between class 1 and class 2.]
[Figure: example with an ambiguous node. Unlabeled node A has link feature (2/3, 1/3, 0); its neighbor C is ambiguous (true class 1 or 2), so C's predicted label can mislead A's prediction.]
Our method
- Make a new prediction for the neighbors of each unlabeled instance.
- Use the predicted probabilities to compute the link feature.
- Retrain the CC classifier.
Computing the link feature with probabilities
Example: A's neighbors are predicted (1, 80%), (2, 60%), (3, 70%).
- Our method: class 1: 80/(80+60+70), class 2: 60/(80+60+70), class 3: 70/(80+60+70).
- General method: class 1: 1/3, class 2: 1/3, class 3: 1/3.
A sketch of this weighting follows.
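A minimal sketch of the probability-weighted link feature, assuming each neighbor carries its current (label, probability) prediction; names are illustrative.

```python
def weighted_link_feature(neighbor_preds, classes):
    """neighbor_preds: list of (label, probability) pairs, one per neighbor."""
    total = sum(p for _, p in neighbor_preds) or 1.0  # avoid division by zero
    return [sum(p for lbl, p in neighbor_preds if lbl == c) / total
            for c in classes]

# Neighbors predicted (1, 0.80), (2, 0.60), (3, 0.70):
# -> [0.80/2.10, 0.60/2.10, 0.70/2.10], i.e. roughly [0.38, 0.29, 0.33],
# where the general (unweighted) method would give [1/3, 1/3, 1/3].
```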
Ambiguous node vs. noise node
To tell them apart, predict the unlabeled instance's neighbors again.
[Figure: two panels. B is originally predicted (1, 70%). If the re-prediction flips to class 2 with moderate confidence, e.g. (2, 60%), B is an ambiguous node; if it flips with high confidence, e.g. (2, 90%), B is a noise node.]
Re-predicting an unlabeled instance's neighbors
- The first iteration always predicts the neighbors again.
- If the new label differs from the original label:
  ▪ this iteration does not adopt the new prediction
  ▪ the next iteration needs to predict the node again
- If the new label matches the original label:
  ▪ average the two probabilities
  ▪ the next iteration does not need to predict the node again
Example: a neighbor originally predicted (2, 80%) and re-predicted (2, 60%) keeps class 2 with the averaged probability, (2, 70%). A sketch of this rule follows.
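A minimal sketch of this rule, assuming (label, probability) pairs; the returned flag says whether the node must be re-predicted in the next iteration. Names are illustrative.

```python
def update_neighbor(old, new):
    """old/new: (label, probability). Returns (prediction, repredict_next)."""
    if old[0] == new[0]:
        # Same label: average the probabilities, stop re-predicting.
        return (old[0], (old[1] + new[1]) / 2), False
    # Different label: do not adopt the change this iteration; flag for next.
    return old, True

# e.g. update_neighbor((2, 0.80), (2, 0.60)) -> ((2, 0.70), False)
```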
Handling a changed class
When the re-prediction changes a neighbor x's class, there are three ways to enter x into the link feature:
- Method A (do not adopt the change): keep the original label and probability.
- Method B (change the class): adopt the new label and probability.
- Method C (do not change the class): keep the original label but set its probability to 0, so x contributes nothing to the link feature.
In the example, x's true label is 2 and the methods give x as Method A: (1, 50%), Method B: (2, 60%), Method C: (1, 0%). If x really is an ambiguous (or noise) node, Method B > Method C > Method A; if x is not, Method A > Method C > Method B. Methods A and B are too extreme, so we choose Method C. A sketch contrasting the three follows.
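A minimal sketch contrasting the three methods for a class-changing neighbor; combined with the probability-weighted link feature above, Method C's zero probability removes the ambiguous node's influence. Names are illustrative.

```python
def resolve_changed_class(old, new, method="C"):
    """old/new: (label, probability) for a neighbor whose class changed."""
    if method == "A":          # do not adopt: keep the original prediction
        return old
    if method == "B":          # change class: adopt the new prediction
        return new
    return (old[0], 0.0)       # Method C: original class with zero weight

# With weighted_link_feature(), a (1, 0.0) entry adds nothing to class 1,
# so an ambiguous or noise node cannot mislead its neighbors' link features.
```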
[Figure: accuracy comparison.]
Retraining the CC classifier
After each iteration, the newly predicted unlabeled instances are added to the training data and the local classifier is retrained.
[Figure: initial (ICA) predictions for nodes A-E, e.g. (1, 90%), (2, 60%), (2, 70%), (1, 80%), (3, 70%), fed back to retrain the classifier.]
Example: ambiguous node
[Figure: comparison of ICA and our method on a graph where B is an ambiguous node; our method re-predicts B's label, while ICA keeps it. Link feature shown: (1/2, 1/2, 0).]
Example: noise node
[Figure: the same comparison on a graph where B is a noise node; our method re-predicts B's label, while ICA keeps it.]
CIAN: Collective Inference With Ambiguous Node
Step 1 (initial): train the local classifier and use content features to predict the unlabeled instances.
Iterate {
  for each unlabeled instance A {
    for each neighbor nb of A {
      if nb needs to be predicted again:
        (class label, probability) = local classifier(nb)    // step 2
    }
    set A's link feature                                     // step 3
    (class label, probability) = local classifier(A)         // step 4
  }
  retrain the local classifier                               // step 5
}
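Putting the pieces together, here is a minimal end-to-end sketch of the loop above. It assumes a scikit-learn-style probabilistic classifier (e.g. naive Bayes, or an SVM with probability estimates, matching the NB/SVM classifiers in the experiments); the data layout and every name are illustrative assumptions, not the thesis implementation.

```python
# Assumptions: `clf` is already fitted on [content | link] features of the
# training nodes; `graph` maps node id -> neighbor ids; `preds` maps every
# labeled node to (label, probability), with training nodes as
# (true_label, 1.0).
import numpy as np

def weighted_link_feature(pairs, classes):
    total = sum(p for _, p in pairs) or 1.0
    return [sum(p for l, p in pairs if l == c) / total for c in classes]

def cian(clf, X_content, graph, preds, train_ids, y_train, unlabeled,
         classes, n_iter=5):
    k = len(classes)

    def predict(u, link):
        x = np.concatenate([X_content[u], link]).reshape(1, -1)
        p = clf.predict_proba(x)[0]
        return clf.classes_[int(np.argmax(p))], float(p.max())

    def node_link(u):
        return np.array(weighted_link_feature(
            [preds[v] for v in graph[u] if v in preds], classes))

    # Step 1: bootstrap the unlabeled nodes with empty link features.
    for u in unlabeled:
        preds[u] = predict(u, np.zeros(k))
    repredict = dict.fromkeys(unlabeled, True)  # first iteration: all flagged

    for _ in range(n_iter):
        for u in unlabeled:
            contrib = []
            for nb in graph[u]:
                if nb not in preds:
                    continue
                pred = preds[nb]
                if repredict.get(nb, False):
                    # Step 2: re-predict a still-flagged neighbor.
                    new = predict(nb, node_link(nb))
                    if new[0] == pred[0]:
                        # Same class: average probabilities, unflag the node.
                        pred = (pred[0], (pred[1] + new[1]) / 2)
                        preds[nb] = pred
                        repredict[nb] = False
                    else:
                        # Method C: original class with zero weight this
                        # round; the node stays flagged for the next iteration.
                        pred = (pred[0], 0.0)
                contrib.append(pred)
            # Step 3: set A's link feature from the resolved neighbor pairs.
            link = np.array(weighted_link_feature(contrib, classes))
            # Step 4: re-predict the unlabeled instance itself.
            preds[u] = predict(u, link)
        # Step 5: retrain on training nodes plus current unlabeled predictions.
        X, y = [], []
        for t, lbl in zip(train_ids, y_train):
            X.append(np.concatenate([X_content[t], node_link(t)]))
            y.append(lbl)
        for u in unlabeled:
            X.append(np.concatenate([X_content[u], node_link(u)]))
            y.append(preds[u][0])
        clf.fit(np.array(X), y)
    return preds
```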
Dataset characteristics

Characteristics   | Cora | CiteSeer | WebKB-texas | WebKB-washington
Instances         |      |          |             |
Class labels      | 7    | 6        | 5           | 5
Link number       |      |          |             |
Content features  |      |          |             |
Link features     | 7    | 6        | 5           | 5
Fixed experimental settings

Characteristics           | Cora | CiteSeer | WebKB-texas | WebKB-washington
Instances                 |      |          |             |
Max ambiguous nodes (NB)  | 429  | 590      | 52          | 33
Max ambiguous nodes (SVM) | 356  | 365      | 20          | 31
Training data             |      |          |             |
Iteration                 | 5    | 5        | 5           | 5

‧ Compared methods: CO, ICA, CIAN.
Experiments
1. Misclassified nodes: vary the proportion of misclassified nodes (0%~30%, 80%).
2. Ambiguous nodes: NB vs. SVM.
3. Misclassified and ambiguous nodes: vary the proportion of misclassified and ambiguous nodes (0%~30%, 80%).
4. Iteration and stability: number of iterations.
Cora
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
CiteSeer
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
WebKB-texas
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
WebKB-washington
[Figure: accuracy vs. proportion of misclassified nodes (0%~30%).]
80% misclassified nodes
[Figure: accuracy at 80% misclassified nodes.]
Cora
[Figure: ambiguous-node experiment; max ambiguous nodes: 429 and 356.]
CiteSeer
[Figure: ambiguous-node experiment; max ambiguous nodes: 590 and 365.]
WebKB-texas
[Figure: ambiguous-node experiment; max ambiguous nodes: 52 and 20.]
WebKB-washington
[Figure: ambiguous-node experiment; max ambiguous nodes: 33 and 31.]
‧ How many of the ambiguous nodes are the same between NB and SVM?

Characteristics                              | Cora  | CiteSeer | WebKB-texas | WebKB-washington
Instances                                    |       |          |             |
Max ambiguous nodes (NB)                     | 429   | 590      | 52          | 33
Max ambiguous nodes (SVM)                    | 356   | 365      | 20          | 31
The same ambiguous nodes                     |       |          |             |
Proportion of the same ambiguous nodes (NB)  | 36.5% | 27.7%    | 28.8%       | 34%
Proportion of the same ambiguous nodes (SVM) | 44.1% | 44.9%    | 75%         | 54.8%
Cora
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
CiteSeer
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
WebKB-texas
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
WebKB-washington
[Figure: accuracy vs. proportion of misclassified and ambiguous nodes (10%~30%).]
80% misclassified and ambiguous nodes
[Figure: accuracy at 80% misclassified and ambiguous nodes.]
‧ When is the accuracy of ICA lower than that of CO?
Cora
[Figure: accuracy vs. number of iterations.]
CiteSeer
[Figure: accuracy vs. number of iterations.]
WebKB-texas
[Figure: accuracy vs. number of iterations.]
WebKB-washington
[Figure: accuracy vs. number of iterations.]