Multi-class SVM with Negative Data Selection for Web Page Classification
  Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao
  International Joint Conference on Neural Networks, 2004
Motivation
  Many new websites are launched every day
  Search needs to be fast and efficient
  Search engines organize websites under a topic hierarchy (taxonomy)
  This requires a classifier: one-against-all SVM
  Catch: the large amount of negative data increases training time
Negative Data Selection
  Support vectors in the negative data are much more similar to the positive data than the other negative data are
Negative Data Selection
  1. Feature selection: take the top n keywords from the positive data.
  2. Represent every website as a vector over these top n keywords.
  3. Cosine similarity: sim(a, b) = (a · b) / (||a|| ||b||) for document vectors a and b.
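The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation. The slides do not say how the top n keywords are ranked or which positive vector each negative document is compared against, so the sketch assumes raw term frequency for ranking and the centroid of the positive vectors as the comparison target. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def top_n_keywords(positive_docs, n):
    """Return the n most frequent terms in the positive documents.

    Assumption: keywords are ranked by raw term frequency; the slides
    do not name the ranking criterion.
    """
    counts = Counter()
    for doc in positive_docs:
        counts.update(doc.lower().split())
    return [term for term, _ in counts.most_common(n)]


def to_vector(doc, keywords):
    """Represent a document as term counts over the selected keywords."""
    counts = Counter(doc.lower().split())
    return [counts[k] for k in keywords]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented for illustration only.
positive_docs = ["oil prices rise", "crude oil output falls"]
negative_docs = ["oil tanker trade route", "bank lending rates"]

keywords = top_n_keywords(positive_docs, n=3)
pos_vectors = [to_vector(d, keywords) for d in positive_docs]

# Assumption: each negative document is compared with the positive centroid.
centroid = [sum(col) / len(pos_vectors) for col in zip(*pos_vectors)]
scores = [cosine_similarity(to_vector(d, keywords), centroid)
          for d in negative_docs]
```

In this toy run the negative document that shares the keyword "oil" with the positive class scores higher than the one that shares no keywords, which is the property the selection step relies on.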
Negative Data Selection
  Plot the similarity scores of the negative documents to the positive documents in descending order
  [Figure: axes "Similarity Scores in Descending Order" and "Negative Documents"; the "Convergence Point" is marked on the curve]
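The slide does not state how the convergence point is located, so the following is only one possible reading. It assumes the convergence point is the first rank at which the sorted score comes within a tolerance of the lowest score, and it keeps the negative documents ranked before that point. The function name `select_negatives`, the `tolerance` parameter, and the sample scores are hypothetical.

```python
def select_negatives(scores, tolerance=0.01):
    """Return indices of negative documents ranked before the convergence point.

    Assumption: the curve has "converged" at the first rank whose score is
    within `tolerance` of the minimum score. The slides do not define the
    convergence point, so this rule is illustrative only.
    """
    if not scores:
        return []
    # Indices of the negative documents, most similar first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    floor = min(scores)
    cutoff = len(ranked)
    for rank, idx in enumerate(ranked):
        if scores[idx] - floor <= tolerance:
            cutoff = rank
            break
    return ranked[:cutoff]


# Invented similarity scores for six negative documents.
example_scores = [0.9, 0.05, 0.6, 0.04, 0.05, 0.3]
selected = select_negatives(example_scores, tolerance=0.02)
```

Under this rule, documents on the flat tail of the curve are dropped and only those with clearly higher similarity are passed on as negative training data. If all scores are equal, nothing is selected, so a real pipeline would need a fallback for that case.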
Experiments
  Reuters dataset (10802 training, 565 test)
  Table columns: Class, Number of Positive Data, Number of Negative Data
  Classes: Crude, Trade, Dlr, Nat-gas, Acq