1
Classifying Web Pages
Mark Chavira, Ulises Robles
2
Motivation
The World Wide Web is huge.
Computers help some, but not enough.
We would like computers to help more, for example by answering “Who is the president of Stanford University?”
Problem: the WWW was designed for human understanding.
3
Project Highlights
Demonstrate a simple way by which knowledge may be extracted from the Web.
Classify Web pages from Computer Science departments.
Learner: Naive Bayes.
Features: word counts.
Ran 60 experiments, each using a different combination of parameter values.
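To make the "Naive Bayes over word counts" setup concrete, here is a minimal sketch of a multinomial Naive Bayes classifier. The slides do not say which Naive Bayes variant or smoothing was used, so the multinomial model, the add-one smoothing (`alpha`), the function names, and the toy pages and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs   -- list of token lists (one per page)
    labels -- list of class names, parallel to docs
    alpha  -- additive smoothing constant (assumed; not given in the slides)
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    model = {}
    for label, n_in_class in class_doc_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_in_class / n_docs),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def classify(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][w]
            for w in tokens
            if w in params["log_likelihood"]
        )
    return max(model, key=score)


# Toy pages and labels, invented for this example only.
docs = [
    "homework lecture syllabus exam".split(),
    "lecture notes homework grading".split(),
    "professor research publications students".split(),
    "research interests publications professor".split(),
]
labels = ["course", "course", "faculty", "faculty"]

model = train_naive_bayes(docs, labels)
prediction = classify(model, "homework exam lecture".split())
print(prediction)  # course
```

Working in log space avoids numeric underflow when a page contains many words, which is why the sketch sums log probabilities instead of multiplying raw ones.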
4
Data Set
5
Some Parameters
Which words do we count? Select words using: Pointwise Mutual Information vs. Average Mutual Information vs. χ².
What form do feature values take? “Raw” word counts vs. word counts normalized for page length.
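The three word-selection scores and the length normalization named above can be sketched as follows. The slides give no formulas, so these are the standard textbook definitions computed from a 2x2 table of document counts (word present or absent, page inside or outside a class). Treating a word as a present/absent event per page, along with every function name and the toy data, is an assumption of this example.

```python
import math


def contingency(docs, labels, word, target):
    """Count pages by (word present?, page in class `target`?).

    Returns (a, b, c, d):
      a = word present, in class      b = word present, not in class
      c = word absent,  in class      d = word absent,  not in class
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_word = word in tokens
        in_class = label == target
        if has_word and in_class:
            a += 1
        elif has_word:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pointwise_mi(a, b, c, d):
    """log P(word, class) / (P(word) P(class)) for the 'present, in class' cell."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + b) * (a + c)))


def average_mi(a, b, c, d):
    """Mutual information averaged over all four cells of the table."""
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    total = 0.0
    for joint, row, col in cells:
        if joint > 0:  # empty cells contribute nothing
            total += (joint / n) * math.log(joint * n / (row * col))
    return total


def chi_squared(a, b, c, d):
    """Chi-squared statistic for independence of word and class."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def normalize_counts(counts):
    """Divide raw word counts by page length (total number of counted words)."""
    length = sum(counts.values())
    return {w: c / length for w, c in counts.items()} if length else {}


# Toy pages and labels, invented for this example only.
docs = [
    "homework lecture syllabus exam".split(),
    "lecture notes homework grading".split(),
    "professor research publications students".split(),
    "research interests publications professor".split(),
]
labels = ["course", "course", "faculty", "faculty"]

table = contingency(docs, labels, "lecture", "course")
pmi = pointwise_mi(*table)
ami = average_mi(*table)
chi2 = chi_squared(*table)
normalized = normalize_counts({"lecture": 3, "exam": 1})
```

In a selection step, each candidate word would be scored this way against each class and only the top-scoring words kept as features; how many words were kept is not stated in the slides.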
6
Number of Experiments
(5 Data Sets) * (2 Feature Types) * (3 Feature Selection Techniques) * (2 Normalization Methods) = 60 Experiments.
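The count of 60 is the size of the full grid of parameter combinations, which can be enumerated as sketched below. The slides name neither the five data sets nor the two feature types, so those entries are placeholders; the selection and normalization labels are taken from the parameters slide.

```python
from itertools import product

# Placeholders: the slides do not name the data sets or the feature types.
data_sets = [f"data set {i}" for i in range(1, 6)]
feature_types = ["feature type A", "feature type B"]

# Taken from the parameters listed in the slides.
selection_techniques = [
    "pointwise mutual information",
    "average mutual information",
    "chi-squared",
]
normalization_methods = ["raw counts", "length-normalized counts"]

# One experiment per combination of settings.
experiments = list(
    product(data_sets, feature_types, selection_techniques, normalization_methods)
)
print(len(experiments))  # 60
```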
7
Results
8
Results (cont.)
9
Total Results
10
Best Results: 85% Correct Classification
Using:
Feature Selection: Pointwise Mutual Information
Normalization: Normalized for Document Length