PEBL: Web Page Classification without Negative Examples

1 PEBL: Web Page Classification without Negative Examples
Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang
IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004
Presented by Chirayu Wongchokprasitti

2 Introduction
Web page classification is one of the main techniques for Web mining.
Constructing a classifier requires positive and negative training examples.
Negative training examples are laborious to collect and must be chosen carefully to avoid bias.

3 Typical Learning Framework

4 Positive Example Based Learning (PEBL) Framework
Learn from positive data and unlabeled data.
Unlabeled data are random samples of the universal set.
Apply the Mapping-Convergence (M-C) algorithm.

5 Mapping-Convergence (M-C) Algorithm
The algorithm is divided into two stages.
Mapping stage: use any classifier that does not generate false negatives. The authors chose 1-DNF (monotone Disjunctive Normal Form).
Convergence stage: maximize the margin. The authors chose SVM (Support Vector Machine).

6 Mapping Stage
Use a weak classifier to draw an initial approximation of “strong” negative data.
First, identify strong positive features from the positive and unlabeled data by comparing how frequently each feature occurs in the two sets.
If a feature occurs more frequently in the positive data than in the universal (unlabeled) data, it is a strong positive feature.
Then filter out every unlabeled example that could possibly be positive, leaving only the strong negatives.
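The mapping stage described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a list of feature tokens, it compares document frequencies (the fraction of pages containing a feature), and the function names `strong_positive_features` and `mapping_stage` are hypothetical.

```python
from collections import Counter

def strong_positive_features(positive_docs, unlabeled_docs):
    """Return features whose document frequency is higher in the
    positive set than in the unlabeled (universal) set."""
    # Count each feature at most once per document.
    pos_freq = Counter(f for doc in positive_docs for f in set(doc))
    unl_freq = Counter(f for doc in unlabeled_docs for f in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    return {
        f for f, count in pos_freq.items()
        if count / n_pos > unl_freq[f] / n_unl  # missing features count as 0
    }

def mapping_stage(positive_docs, unlabeled_docs):
    """Split the unlabeled set into strong negatives (no strong positive
    feature at all) and the remaining, still undecided documents."""
    strong = strong_positive_features(positive_docs, unlabeled_docs)
    strong_negatives = [d for d in unlabeled_docs if not (set(d) & strong)]
    remaining = [d for d in unlabeled_docs if set(d) & strong]
    return strong, strong_negatives, remaining

# Toy data (made up for illustration only).
positive = [["course", "homework", "exam"],
            ["course", "syllabus"],
            ["homework", "lecture", "course"]]
unlabeled = [["course", "exam"],
             ["football", "score"],
             ["recipe", "pasta"],
             ["news", "score"]]

strong, strong_negatives, remaining = mapping_stage(positive, unlabeled)
print(sorted(strong))
print(strong_negatives)
print(remaining)
```

Because a document is kept as a strong negative only when it contains none of the strong positive features, the filter errs on the side of discarding anything that might be positive, which matches the requirement that the mapping classifier generate no false negatives.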

7 Convergence Stage
Use SVM to narrow down the class boundary.
Run the SVM iteratively, each time extracting more negative data from the unlabeled data.
The boundary converges toward the true boundary.
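The iteration on this slide can be sketched as a loop that retrains a classifier and moves newly rejected unlabeled points into the negative set until nothing more is rejected. This is a sketch under stated assumptions: to stay dependency-free it uses a nearest-centroid classifier as a stand-in for the SVM (so it does not maximize a margin), and the names `train_centroid` and `convergence_stage`, the stopping rule, and the `max_iter` cap are all hypothetical.

```python
def train_centroid(pos, neg):
    """Stand-in trainer (NOT an SVM): returns a predicate that is True
    when a point is at least as close to the positive centroid."""
    def centroid(points):
        dim = len(points[0])
        return [sum(p[i] for p in points) / len(points) for i in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sqdist(x, cp) <= sqdist(x, cn)

def convergence_stage(positives, strong_negatives, remaining,
                      train=train_centroid, max_iter=50):
    """Grow the negative set from the strong negatives until the
    classifier stops rejecting any of the undecided points."""
    negatives = list(strong_negatives)
    undecided = list(remaining)
    classifier = train(positives, negatives)
    for _ in range(max_iter):
        newly_negative = [x for x in undecided if not classifier(x)]
        if not newly_negative:
            break  # no change, so the boundary has stopped moving
        negatives.extend(newly_negative)
        undecided = [x for x in undecided if classifier(x)]
        classifier = train(positives, negatives)  # retrain on the larger set
    return classifier, negatives, undecided

# Toy 1-D data (made up for illustration only).
positives = [[1.0], [1.2], [0.8]]
strong_negatives = [[10.0], [9.0]]
remaining = [[1.1], [5.0], [6.0], [7.5]]

clf, negatives, undecided = convergence_stage(positives, strong_negatives, remaining)
print(negatives)   # strong negatives plus points absorbed in later rounds
print(undecided)   # points still on the positive side at convergence
```

In this toy run the point `[5.0]` is accepted by the first classifier but rejected after the negative set grows, which illustrates how the boundary moves step by step toward the positive data. Passing a different `train` callable would let a real SVM trainer be plugged in.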

8 Support Vector Machines
Visualization of a Support Vector Machine

9 Convergence of SVM

10 Data Flow Diagram

11 Experimental Results
Results are reported as the precision-recall breakeven point (P-R).
Experiment 1: the Internet. Uses DMOZ as the universal set.
Experiment 2: University CS department. Uses the WebKB data set.
Mixture Models
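The slides do not say how the breakeven point was computed. One common way, sketched below as an assumption, is to rank the test pages by classifier score and measure precision among the top k, where k is the number of truly positive pages; at that cutoff precision and recall share the same numerator and denominator, so they are equal. The function name `pr_breakeven` and the sample scores are hypothetical.

```python
def pr_breakeven(scores, labels):
    """Precision (= recall) among the top-k scored items, where k is
    the number of true positives in `labels` (1 = positive, 0 = negative)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    # precision = true_pos / k and recall = true_pos / n_pos, with k == n_pos
    return true_pos / n_pos

# Made-up scores and labels for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(pr_breakeven(scores, labels))  # 2 of the top 3 are positive
```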

12 Experiment 1

13 Experiment 2

14 Mixture Models

15 Summary and Conclusions
The PEBL framework eliminates the need for manually collecting negative training examples.
The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of traditional SVM.
PEBL needs faster training time.

