Artface (Automated reorganization to fit approximate client expectations) Mike Venzke 9/19/2018.

1 Artface (Automated reorganization to fit approximate client expectations)
Mike Venzke 9/19/2018

2 Artface Goals
Provide a method for determining the approximate expectations of a web client.
Examine the feasibility of using this information in an automated manner.

3 Description
Using Open Directory categories, create a model for classifying web pages.
Fetch, parse, and classify the referring pages of local web hits.
The result is the approximate expectations people have when they arrive at different parts of your website.
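
The "referring page of local web hits" step starts from the web server log. The following is a minimal sketch of that first step, assuming the log is in the common Combined Log Format; the function name `external_referrals` and the `local_host` parameter are hypothetical and do not come from the slides.

```python
import re
from urllib.parse import urlsplit

# Combined Log Format ends with: "REQUEST" status bytes "REFERER" "USER-AGENT"
# (assumed format; the slides do not say which log format was used).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def external_referrals(lines, local_host):
    """Yield (referrer_url, local_path) for hits that arrived from other sites."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a parseable log line
        referrer = match.group("referrer")
        host = urlsplit(referrer).hostname
        # "-" (no referrer) has no hostname; same-site navigation is skipped too.
        if host is None or host == local_host:
            continue
        yield referrer, match.group("path")
```

Each yielded referrer URL is a candidate page to fetch and classify, and the paired local path records which part of the site the visitor reached.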

4 Classification Categories
Used DMOZ categories: the directory's pages are already classified, which provides good training data.
Went 3 levels deep in the directory: the goal was an approximate expectation, not one so specific that very similar items are considered different.
Time and constraints also limited the depth.
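
Limiting categories to three levels can be sketched as a simple path truncation. This assumes categories are written as slash-separated paths with an optional `Top` root; the function name `truncate_category` is hypothetical.

```python
def truncate_category(path, depth=3, root="Top"):
    """Collapse a slash-separated category path to at most `depth` levels.

    Assumes an optional leading root component (here "Top") that does not
    count as a level.
    """
    parts = [part for part in path.split("/") if part]
    if parts and parts[0] == root:
        parts = parts[1:]
    return "/".join(parts[:depth])
```

With this, a deep path such as `Top/Computers/Programming/Languages/Python/Modules` and its sibling subcategories all map to the same three-level label, so very similar pages share one class.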

5 Page Fetching
Used Python's SGMLParser class (from the sgmllib module).
It is good at parsing out irrelevant data, fast enough, and easy to use.
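
The sgmllib module was removed in Python 3, so the sketch below uses the standard library's `html.parser.HTMLParser` as a stand-in, not the parser from the slides. It shows one way to keep a page's visible text while dropping script and style content; the class and function names are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting plain text is what a text classifier would receive in the next step.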

6 Classification
Rainbow: LGPL’d Naïve Bayesian text classifier.
Used ~9000 documents as training data, with the expanded category as the classification.
~7000 test pages taken from web logs of and
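
To illustrate the idea behind a naive Bayes text classifier, here is a small pure-Python sketch using a multinomial model with add-one smoothing. It is not Rainbow and makes no claim about Rainbow's options or internals; the class name, the whitespace tokenization, and the toy training data in the usage note are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated words (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def train(self, documents):
        """documents: iterable of (text, category) pairs."""
        for text, category in documents:
            words = text.lower().split()
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

In use, the training pairs would be (page text, truncated directory category), and `classify` would be called on the extracted text of each referring page.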

7 Data Results
Fairly accurate results: http://webgraph.canbelearned.com

8 Automation Possibilities
Determine ‘good’ categories by self-site classification or user input.
Track traffic from ‘good’ categories and provide higher-level links to local pages.
The set of bad categories is small and generally universal.
Take action against local sites based on how they’re being used, not what they have.
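
Tracking traffic from 'good' categories could look like the sketch below, which counts the local pages reached from each good category so the most visited ones can be surfaced as higher-level links. The function names, the shape of the `hits` input, and the example category labels are hypothetical.

```python
from collections import Counter, defaultdict


def landing_pages_by_category(hits, good_categories):
    """Count local pages reached from each 'good' referrer category.

    hits: iterable of (referrer_category, local_path) pairs.
    """
    pages = defaultdict(Counter)
    for category, path in hits:
        if category in good_categories:
            pages[category][path] += 1
    return pages


def top_links(pages, category, limit=3):
    """Return the most frequently reached local paths for one category."""
    counts = pages.get(category, Counter())
    return [path for path, _ in counts.most_common(limit)]
```

For example, if most visitors referred from pages classified as `Computers/Programming/Languages` end up on one tutorial page, that page is a candidate for a prominent link.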

9 Automation Possibilities (contd)
Provide custom pages based on what the user expected, rather than what the page contains.
The user may not have found what they wanted.
The user may be interested in a broader topic.

10 Process Enhancement Ideas
More training data: use all levels of DMOZ data, but push classification up to the threshold level.
Handle more page errors: scripting and authentication errors provide false data.
Remove or special-parse ‘classless’ information pages, such as search engines.
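
One possible reading of "special-parse" for search-engine referrers is to classify the visitor's search query instead of the results page. The sketch below pulls the query out of the referrer URL; the host-to-parameter table is a small illustrative assumption, not a list from the slides.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: search-engine host -> name of its query parameter.
SEARCH_QUERY_PARAMS = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
}


def search_terms(referrer):
    """Return the search query carried by a search-engine referrer, or None."""
    parts = urlsplit(referrer)
    param = SEARCH_QUERY_PARAMS.get(parts.hostname)
    if param is None:
        return None  # not a known search engine
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None
```

A referrer for which this returns `None` would go through the normal fetch-and-classify path, while a returned query string could be classified directly or excluded.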

