Published by Julius Gaines. Modified over 6 years ago.
1
Discovering User Access Patterns on the World-Wide Web
David W. Cheung, Ben Kao, Joseph Lee
2
Classifying Web Tools
Level 0: Mosaic, Netscape
retrieve documents for a user on the user's direct orders
Level 1: Alta Vista
provides a user-initiated searching facility
Level 2: WebWatcher, SIFT
maintain user profiles
have an active component for notifying users whenever new relevant information is found
3
Level 3: DiffAgent, Letizia
have a learning and deductive component for user profiles
Level 4
capable of learning the behavior of both information users and information sources
4
Requirements
Discover its users' topics of interest automatically
Learn its users' access patterns and the information sources' update patterns
Make efficient use of network resources
Maintain a database and a full-text index on retrieved documents
Be compatible with most WWW browsers
5
System Architecture
[Diagram. Labeled components: Internet, WWW browsers (Netscape, Mosaic), Proxy Server, Suggestion Agent, Search Engine, Learning Agent, Monitor Agent, Document Manager. Labeled data stores: Internet Access Log, Document Database, User Accounts, User Profile.]
6
Learning Agent
discovers user access patterns and topics of interest by analyzing the access log created by the Proxy
generates a user profile for each user
Topics of interest → Search Engine
Time-related access patterns → Monitor Agent
7
Discovery of Access Patterns
User Access Log → Log Preprocessing → Raw Term Vectors → Relevancy Determination → Adjusted Term Vectors → Topics Discovery → User Topics
8
Phase 1
process each textual document recorded in the user access log
produce a term vector of (keyword, weight) pairs
ex. (NBA, 50), (basketball, 35)
weight calculation: a modified TFIDF formula
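The slides do not give the modified TFIDF formula, so the sketch below falls back on the textbook variant, weight = tf * log(N / df), purely as an assumption. It shows how a set of logged documents could be turned into (keyword, weight) term vectors; the function names `tokenize` and `term_vectors` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vectors(documents):
    """Return one {keyword: weight} dict per document.

    Assumption: plain TF-IDF, weight = tf * log(N / df). The modified
    formula mentioned on the slide is not specified, so it is not used here.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical page texts standing in for documents found in the access log.
docs = [
    "NBA basketball scores and NBA playoffs",
    "basketball arena tickets",
    "stock market news",
]
vectors = term_vectors(docs)
# Show the three highest-weighted keywords of the first document.
print(sorted(vectors[0].items(), key=lambda kv: kv[1], reverse=True)[:3])
```

A term that occurs in every document gets weight 0 under this variant, which is one reason a production system might prefer a smoothed or otherwise modified formula.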
9
Phase 2
noise in the raw term vectors
ex. a reference page has a large number of reference hyperlinks
use heuristics to adjust the relevancy of the raw term vectors extracted in the first phase
output: a sequence of term vectors with their weights adjusted
10
Heuristics for Phase 2
the number of hyperlinks in a document
the amount of time that a user has spent in a document
the browse movement graph, ex. a high fan-out number
the functional role of a keyword in a document, ex. <TITLE>, <H1>
the keywords found in URLs
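One way the heuristics above might be combined is as multiplicative factors on each raw weight. The sketch below is illustrative only: the slides name the signals but give no formulas, so the function `adjust_vector`, the direction of each factor, and every constant (60 seconds, 50 links, the 2.0 and 1.5 boosts) are assumptions. The browse movement graph is left out for brevity.

```python
def adjust_vector(raw_vector, seconds_spent, hyperlink_count,
                  title_terms=(), url_terms=()):
    """Scale raw (keyword, weight) pairs by simple relevancy factors.

    All factors and constants are hypothetical; the slides list the
    heuristics but not how they are computed.
    """
    # Longer reading time suggests higher relevancy, capped at one minute.
    time_factor = min(seconds_spent / 60.0, 1.0)
    # Many hyperlinks suggest a reference page, so damp its weights.
    link_factor = 1.0 / (1.0 + hyperlink_count / 50.0)
    adjusted = {}
    for term, weight in raw_vector.items():
        boost = 1.0
        if term in title_terms:   # keyword appears in <TITLE> or <H1>
            boost *= 2.0
        if term in url_terms:     # keyword appears in the URL
            boost *= 1.5
        adjusted[term] = weight * time_factor * link_factor * boost
    return adjusted


raw = {"nba": 50.0, "basketball": 35.0}
# A page read for a full minute, with no outgoing links, titled with "nba".
print(adjust_vector(raw, seconds_spent=60, hyperlink_count=0,
                    title_terms={"nba"}))
# The same page, but link-heavy and only skimmed for 30 seconds.
print(adjust_vector(raw, seconds_spent=30, hyperlink_count=50))
```

Multiplying the factors keeps each heuristic independent, so one can be tuned or dropped without touching the others.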
11
Phase 3
produce the topics of interest from the adjusted term vectors
clustering technique
output: a small number of topic vectors
ex. (NBA, basketball, stadium, arena)
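The slides do not say which clustering technique is used. As a minimal sketch, the code below assumes a greedy single-pass clustering over cosine similarity: each adjusted term vector joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new one. The names `cosine` and `discover_topics`, the threshold of 0.3, and the sample vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def discover_topics(vectors, threshold=0.3, top_k=4):
    """Greedy single-pass clustering (an assumed technique, not the authors').

    Each cluster is kept as the sum of its member vectors; cosine
    similarity ignores scale, so a sum behaves like a mean here.
    """
    clusters = []
    for vec in vectors:
        best = max(clusters, key=lambda c: cosine(vec, c), default=None)
        if best is not None and cosine(vec, best) >= threshold:
            for term, weight in vec.items():
                best[term] = best.get(term, 0.0) + weight
        else:
            clusters.append(dict(vec))
    # A topic vector is the top_k heaviest keywords of each cluster.
    return [tuple(sorted(c, key=c.get, reverse=True)[:top_k]) for c in clusters]


# Hypothetical adjusted term vectors from Phase 2.
adjusted = [
    {"nba": 50, "basketball": 35},
    {"basketball": 40, "arena": 20, "stadium": 10},
    {"stocks": 30, "market": 25},
]
topics = discover_topics(adjusted)
print(topics)
```

The result of a single-pass scheme depends on the order of the input vectors; a batch method such as k-means or hierarchical clustering would avoid that at a higher cost.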