Mark Chavira Ulises Robles


1 Classifying Web Pages
Mark Chavira, Ulises Robles

2 Motivation
The World Wide Web is huge.
Computers help some, but not enough.
We would like computers to help more, e.g. by answering “Who is the president of Stanford University?”
Problem: the WWW was designed for human understanding.

3 Project Highlights
Demonstrate a simple way by which knowledge may be extracted from the Web.
Classify Web pages from Computer Science Departments.
Learner: Naive Bayes.
Features: word counts.
Ran 60 experiments, each using different values for various parameters.
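The slides name the learner (Naive Bayes) and the features (word counts) but do not show an implementation. The following is a minimal sketch of a multinomial Naive Bayes classifier over word counts, not the authors' code. The add-one (Laplace) smoothing, the class names "faculty" and "course", and the toy pages are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs   -- list of token lists (one per page)
    labels -- parallel list of class names
    alpha  -- smoothing constant (assumed; the slides do not specify one)
    """
    class_docs = Counter(labels)          # number of pages per class
    word_counts = defaultdict(Counter)    # per-class word counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_label in class_docs.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_label / len(docs))
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def classify(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    counts = Counter(tokens)

    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(
            n * log_likelihood[w] for w, n in counts.items() if w in log_likelihood
        )

    return max(model, key=score)


# Toy, hypothetical training pages (not from the slides' data set).
toy_docs = [
    ["professor", "research", "publications"],
    ["homework", "lecture", "exam"],
    ["research", "students", "professor"],
    ["syllabus", "homework", "lecture"],
]
toy_labels = ["faculty", "course", "faculty", "course"]
toy_model = train_naive_bayes(toy_docs, toy_labels)
predicted = classify(toy_model, ["lecture", "homework", "homework"])
```

Working in log space avoids underflow when pages contain many words, which is the usual reason to sum log probabilities rather than multiply raw ones.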

4 Data Set

5 Some Parameters
Which words do we count? Select words using: Pointwise Mutual Information vs. Average Mutual Information vs. χ².
What form do feature values take? “Raw” word counts vs. word counts normalized for page length.
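The slides list the three word-selection criteria and the two feature forms without giving formulas. As a rough sketch, the functions below use the standard textbook definitions computed from a 2x2 word/class contingency table of document counts; whether the authors used exactly these variants is an assumption. The function names and the count arguments a, b, c, d are hypothetical, and the χ² formula assumes no row or column total is zero.

```python
import math


def selection_scores(a, b, c, d):
    """Score one word against one class from document counts.

    a -- pages in the class that contain the word
    b -- pages outside the class that contain the word
    c -- pages in the class that lack the word
    d -- pages outside the class that lack the word
    Returns (pointwise MI, average MI, chi-squared).
    """
    n = a + b + c + d

    # Pointwise mutual information: log P(word, class) / (P(word) P(class)).
    pmi = math.log(a * n / ((a + b) * (a + c))) if a > 0 else float("-inf")

    # Average mutual information: expectation of the log ratio over all four cells.
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    ami = sum(
        (joint / n) * math.log(joint * n / (row * col))
        for joint, row, col in cells
        if joint > 0
    )

    # Chi-squared statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return pmi, ami, chi2


def normalize_for_length(word_counts):
    """Turn raw word counts into fractions of the page's total word count."""
    total = sum(word_counts.values())
    return {w: n / total for w, n in word_counts.items()} if total else {}


independent = selection_scores(10, 10, 10, 10)   # word unrelated to class
perfect = selection_scores(20, 0, 0, 20)         # word occurs only in the class
fractions = normalize_for_length({"lecture": 3, "exam": 1})
```

A word that is statistically independent of the class scores zero on all three criteria, while a word that appears only on pages of that class scores high on all three, so the criteria differ mainly in how they rank the cases in between.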

6 Number of Experiments
(5 Data Sets) * (2 Feature Types) * (3 Feature Selection Techniques) * (2 Normalization Methods) = 60 Experiments.
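The count above is the size of a full parameter grid. A small sketch of how such a grid could be enumerated follows. The data set and feature type names are placeholders, since the slides give only their number; the selection techniques and normalization options are the ones named on the previous slide.

```python
from itertools import product

# Placeholder names: the slides state only that there are 5 data sets and 2 feature types.
data_sets = [f"data_set_{i}" for i in range(1, 6)]
feature_types = ["feature_type_1", "feature_type_2"]

# Options named on the "Some Parameters" slide.
selection_techniques = ["pointwise_mi", "average_mi", "chi_squared"]
normalizations = ["raw_counts", "normalized_for_length"]

# One tuple per experiment: every combination of the four parameters.
experiments = list(
    product(data_sets, feature_types, selection_techniques, normalizations)
)
n_experiments = len(experiments)
```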

7 Results

8 Results (cont.)

9 Total Results

10 Best Results: 85% Correct Classification
Using:
Feature Selection: Pointwise Mutual Information
Normalization: Normalized for Document Length

