Discovering User Access Patterns on the World-Wide Web


1 Discovering User Access Patterns on the World-Wide Web
David W. Cheung, Ben Kao, Joseph Lee

2 Classifying Web Tools
Level 0: Mosaic, Netscape
  retrieve documents for a user on direct orders
Level 1: Alta Vista
  provides a user-initiated searching facility
Level 2: WebWatcher, SIFT
  maintain user profiles
  have an active component that notifies users whenever new relevant information is found

3 Level 3: DiffAgent, Letizia
  have a learning and deductive component for user profiles
Level 4
  capable of learning the behavior of both information users and information sources

4 Requirements
Discover its users' topics of interest automatically
Learn its users' access patterns and the update patterns of information sources
Make efficient use of network resources
Maintain a database and a full-text index of retrieved documents
Be compatible with most WWW browsers

5 System Architecture
[Architecture diagram. Labels: Internet, WWW Browser (Netscape, Mosaic), Proxy Server, Suggestion Agent, Search Engine, Learning Agent, Monitor Agent, Document Manager, Access Log, Document Database, User Accounts, User Profile]

6 Learning Agent
discovers user access patterns and topics of interest by analyzing the access log created by the Proxy
generates a user profile for each user
Topics of interest → Search Engine
Time-related access patterns → Monitor Agent

7 Discovery of Access Patterns
[Pipeline diagram] User Access Log → Log Preprocessing → Raw Term Vectors → Relevancy Determination → Adjusted Term Vectors → Topics Discovery → User Topics

8 Phase 1
process each textual document recorded in the user access log
produce a term vector of (keyword, weight) pairs
  ex. (NBA, 50), (basketball, 35)
weights are calculated with a modified TF-IDF formula
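
A minimal sketch of how Phase 1 might turn documents into (keyword, weight) vectors. The slides do not give the modified TF-IDF formula, so this uses the textbook form, weight = tf * log(N / df), purely as a stand-in. The names `tokenize` and `term_vectors` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vectors(documents):
    """Return one {keyword: weight} dict per document.

    Assumption: plain TF-IDF, weight = tf * log(N / df). The modified
    formula mentioned on the slide is not specified there.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors
```

With this stand-in formula, a term that occurs in every document gets weight 0, which is one reason a real system would likely modify it.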

9 Phase 2
the raw term vectors contain noise
  ex. a reference page has a large number of reference hyperlinks
use heuristics to adjust the relevancy of the raw term vectors extracted in the first phase
output: a sequence of term vectors with their weights adjusted

10 Heuristics for Phase 2
the number of hyperlinks
the amount of time that a user has spent in a document
a browse movement graph
  a high fan-out number
the functional role of a keyword in a document
  ex. <TITLE>, <H1>
the keywords found in URLs
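
One way three of these heuristics could be combined, as a rough sketch only. The slides list the signals but not how they are weighted, so every threshold and factor below (`link_threshold`, `min_seconds`, `title_boost`, the 0.5 penalties) is an assumption chosen for illustration, and `adjust_vector` is a hypothetical name.

```python
def adjust_vector(raw_vector, n_links, seconds_viewed, title_terms,
                  link_threshold=50, min_seconds=5, title_boost=2.0):
    """Return a copy of raw_vector with heuristically adjusted weights.

    All numeric parameters are illustrative assumptions, not values
    taken from the slides.
    """
    scale = 1.0
    if n_links > link_threshold:
        # Many hyperlinks: probably a reference page, so trust it less.
        scale *= 0.5
    if seconds_viewed < min_seconds:
        # The user barely looked at the page, so trust it less.
        scale *= 0.5

    adjusted = {}
    for term, weight in raw_vector.items():
        # Terms from <TITLE> or <H1> are treated as more representative.
        boost = title_boost if term in title_terms else 1.0
        adjusted[term] = weight * scale * boost
    return adjusted
```

For example, a link-heavy page viewed for a minute keeps the full weight of a title term (halved, then doubled) while its other terms are halved.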

11 Phase 3
produce the topics of interest from the adjusted term vectors
clustering technique
output: a small number of topic vectors
  ex. (NBA, basketball, stadium, arena)
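
The slides do not say which clustering technique is used. As an illustration only, the sketch below applies a greedy single-pass clustering with cosine similarity, then reads a topic vector off each cluster as its highest-weighted keywords. The function names and the `threshold` value are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each vector joins the first cluster whose summed vector is similar
    enough; otherwise it starts a new cluster.
    """
    centroids = []
    for vec in vectors:
        for centroid in centroids:
            if cosine(vec, centroid) >= threshold:
                for term, weight in vec.items():
                    centroid[term] = centroid.get(term, 0.0) + weight
                break
        else:
            centroids.append(dict(vec))
    return centroids


def topic_terms(centroid, k=4):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(term for term, _ in ranked[:k])
```

Single-pass clustering depends on input order; it is used here only because it is short, not because the source describes it.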

