1
Mining Web Logs to Improve Website Organization
Ramakrishnan Srikant and Yinghui Yang
Professor: Wan-Shiou Yang
An algorithm to automatically find pages in a website whose location is different from where visitors expect to find them.
2
Introduction
The key insight is that visitors will backtrack if they don't find a page where they expect it: the point from which they backtrack is the expected location for that page.
Expected locations with a significant number of hits are presented to the website administrator, who can add navigation links from the expected location to the target page.
3
Model of Visitor Search Pattern
4
Identifying Target Pages
Analysis of well-known websites:
Amazon: there is a clear separation between content pages and index pages, so target pages are easy to identify.
Yahoo: lists websites on the internal nodes of its hierarchy, not just on the leaf nodes.
5
Website & Search Pattern
6
Test: Find Expected Location
for i := 2 to n-2 begin
    if (P(i-1) = P(i+1)) or (no link from P(i) to P(i+1)) then add P(i) to B
Tracing an example visit P1..P8:
i=2: Is P1 = P3, or is there no link P2 -> P3? No, so P2 is not added to B.
i=3: Is P2 = P4, or is there no link P3 -> P4? Yes, so P3 is added to B.
i=4: Is P3 = P5, or is there no link P4 -> P5? No, so P4 is not added to B.
i=5: Is P4 = P6, or is there no link P5 -> P6? Yes, so P5 is added to B.
i=6: Is P5 = P7, or is there no link P6 -> P7? Yes, so P6 is added to B.
7
Algorithm: Find Expected Location
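A minimal Python sketch of the Find Expected Location loop traced on the previous slide, assuming a visit is a list of page names and the site structure is a set of (source, destination) link pairs (both representations are illustrative, not from the paper):

```python
from typing import List, Set, Tuple

def find_expected_locations(session: List[str],
                            links: Set[Tuple[str, str]]) -> List[str]:
    """Return B, the backtrack points (expected locations) in one visit.

    session: pages P1..Pn in visit order; Pn is the target page.
    links:   (src, dst) pairs describing the site's link structure.
    """
    B = []
    n = len(session)
    # 0-based index i corresponds to P(i+1) in the slide's notation,
    # so this loop is the slide's "for i := 2 to n-2".
    for i in range(1, n - 2):
        backtracked = session[i - 1] == session[i + 1]
        no_link = (session[i], session[i + 1]) not in links
        if backtracked or no_link:
            B.append(session[i])
    return B

# A visitor tries Sports, backtracks to Home, then reaches the target via News.
session = ["Home", "Sports", "Home", "News", "Target"]
links = {("Home", "Sports"), ("Home", "News"), ("News", "Target")}
print(find_expected_locations(session, links))  # ['Sports']
```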
8
Limitations
When the website doesn't have a clear separation between content pages and index pages, it can be hard to distinguish target pages from other pages.
Another limitation is that only visitors who successfully find a target page generate an expected location for that page.
9
Optimizing the Set of Navigation Links
We consider three approaches for recommending additional links to the website administrator:
1. FirstOnly: easy and simple; uses only each visitor's first expected location.
2. OptimizeBenefit: greedily maximizes benefit, considering all expected locations in order.
3. OptimizeTime: greedily minimizes the time visitors spend backtracking.
10
FirstOnly
The algorithm recommends the frequent first expected locations (the pages that occur frequently as E1, the first expected location of a visit) to the website administrator, ignoring any subsequent expected locations the visitor may have considered.
Disadvantage: it satisfies the information needs of only a small fraction of visitors.
11
Example: FirstOnly
12
Algorithm: FirstOnly
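A minimal sketch of FirstOnly under the same illustrative representation; the support threshold min_support is an assumed parameter, not something the slides specify:

```python
from collections import Counter
from typing import List

def first_only(visits: List[List[str]], min_support: int) -> List[str]:
    """FirstOnly: recommend pages that frequently occur as the *first*
    expected location E1 of a visit; later expected locations are ignored.

    visits: one expected-location list [E1, E2, ...] per visitor
            (e.g. the output of find_expected_locations).
    """
    counts = Counter(E[0] for E in visits if E)  # count only E1 of each visit
    return [page for page, c in counts.items() if c >= min_support]
```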
13
OptimizeBenefit
This is a greedy algorithm that attempts to maximize the benefit to the website of adding additional links. In each pass, it:
finds the page with the maximum benefit;
adds it to the set of recommendations;
nulls out all instances of this page and the succeeding pages, and recomputes the benefits.
14
Example: OptimizeBenefit
15
Algorithm: OptimizeBenefit
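A sketch of the greedy passes described above; counting one unit of benefit per visit that still contains the page is an assumption standing in for the paper's benefit function, which can be substituted:

```python
from collections import Counter
from typing import List

def optimize_benefit(visits: List[List[str]], k: int) -> List[str]:
    """OptimizeBenefit: greedy selection of up to k pages. In each pass,
    pick the page with maximum benefit, recommend it, null out that page
    and everything after it in each visit, then recompute.
    """
    visits = [list(E) for E in visits]  # work on a copy
    recommendations = []
    for _ in range(k):
        benefit = Counter()
        for E in visits:
            for page in set(E):
                benefit[page] += 1
        if not benefit:
            break
        best = max(benefit, key=benefit.get)
        recommendations.append(best)
        # Null out this page and its succeeding pages in every visit:
        # once the visitor finds the new link, later backtracks disappear.
        visits = [E[:E.index(best)] if best in E else E for E in visits]
    return recommendations
```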
16
OptimizeTime
The goal of this algorithm is to minimize the number of backtracks the visitor has to make.
Saving time for each visitor improves the perceived performance of the website.
The algorithm is also a greedy search, and is quite similar to OptimizeBenefit.
17
Example: OptimizeTime
18
Algorithm: OptimizeTime
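A sketch of OptimizeTime with an assumed cost model: adding a link at the j-th expected location of a visit spares the visitor the backtracks that would have followed it. The exact TimeSaved definition is this sketch's assumption, not quoted from the slides:

```python
from collections import Counter
from typing import List

def optimize_time(visits: List[List[str]], k: int) -> List[str]:
    """OptimizeTime: same greedy structure as OptimizeBenefit, but scored
    by time saved rather than benefit.
    """
    visits = [list(E) for E in visits]
    recommendations = []
    for _ in range(k):
        saved = Counter()
        for E in visits:
            for j, page in enumerate(E):
                saved[page] += len(E) - j  # later positions save fewer steps
        if not saved:
            break
        best = max(saved, key=saved.get)
        recommendations.append(best)
        # As in OptimizeBenefit: the new link removes the later backtracks.
        visits = [E[:E.index(best)] if best in E else E for E in visits]
    return recommendations
```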
19
Algorithm: OptimizeTime&Profit
We can additionally emphasize pages the web designer especially wants to recommend by assigning each page Pi a weight Pi_num.
P := page with the highest support, scored as TimeSaved(Pi) * Pi_num.
This yields a list of recommendations that reflects the web designer's focus.
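A sketch of this weighted variant; pi_num is an illustrative name for the slide's Pi_num designer weights, and the time-saved scoring carries over the assumptions of the OptimizeTime sketch:

```python
from collections import Counter
from typing import Dict, List

def optimize_time_profit(visits: List[List[str]], k: int,
                         pi_num: Dict[str, float]) -> List[str]:
    """OptimizeTime&Profit: each pass picks
    P := page with the highest TimeSaved(Pi) * Pi_num.
    pi_num maps a page to its designer-assigned weight
    (treated as 1 when no preference is expressed).
    """
    visits = [list(E) for E in visits]
    recommendations = []
    for _ in range(k):
        saved = Counter()
        for E in visits:
            for j, page in enumerate(E):
                saved[page] += len(E) - j
        if not saved:
            break
        # The only change from OptimizeTime: weight time saved by Pi_num.
        best = max(saved, key=lambda p: saved[p] * pi_num.get(p, 1))
        recommendations.append(best)
        visits = [E[:E.index(best)] if best in E else E for E in visits]
    return recommendations
```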