Mining Web Logs to Improve Website Organization
Ramakrishnan Srikant and Yinghui Yang
Professor: Wan-Shiou Yang
An algorithm to automatically find pages in a website whose location differs from where visitors expect to find them.
Introduction
The key insight is that visitors will backtrack if they don't find a page where they expect it: the point from which they backtrack is the expected location for that page.
Expected locations with a significant number of hits are presented to the website administrator, who can add navigation links from the expected location to the target page.
Model of Visitor Search Pattern
Identifying Target Pages
Analysis of well-known websites:
Amazon: there is a clear separation between content pages and index pages (e.g., reference itemsets).
Yahoo: lists websites on the internal nodes of its hierarchy, not just on the leaf nodes.
Website & Search Pattern
Test: Find Expected Location

for i := 2 to n-1 do
    if (P(i-1) = P(i+1)) or (no link from P(i) to P(i+1)) then
        add P(i) to B

Worked trace (X = condition false; B = P(i) added to the backtrack set B):
i=2: P1 = P3, or no link P2 -> P3? P2 => X
i=3: P2 = P4, or no link P3 -> P4? P3 => B
i=4: P3 = P5, or no link P4 -> P5? P4 => X
i=5: P4 = P6, or no link P5 -> P6? P5 => B
i=6: P5 = P7, or no link P6 -> P7? P6 => B
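The detection loop above can be sketched in Python. This is a minimal illustration, not the paper's code: the trail is assumed to be a list of page IDs, and `links` is an assumed mapping from each page to the set of pages it links to.

```python
def find_backtrack_points(trail, links):
    """Return pages where the visitor backtracked: P(i) is a backtrack
    point if P(i-1) == P(i+1) (the visitor hit the back button) or there
    is no link from P(i) to P(i+1) (the visitor jumped elsewhere)."""
    backtracks = []
    for i in range(1, len(trail) - 1):  # i = 2 .. n-1 in the slide's notation
        prev_page, page, next_page = trail[i - 1], trail[i], trail[i + 1]
        if prev_page == next_page or next_page not in links.get(page, set()):
            backtracks.append(page)
    return backtracks
```

For example, in the trail A, B, A, C the visitor reached B, went back to A, and continued to C, so B is reported as a backtrack point (and hence A, the page backtracked to, is a candidate expected location).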
Algorithm: Find Expected Location
Limitations
When the website doesn't have a clear separation between content pages and index pages, it can be hard to distinguish target pages from other pages.
Another limitation is that only people who successfully find a target page will generate an expected location for that page.
Optimizing the Set of Navigation Links
We consider three approaches for recommending additional links to the website administrator:
1. FirstOnly: easy and simple; uses only each visitor's first expected location.
2. OptimizeBenefit: considers all expected locations, in order, to maximize benefit.
3. OptimizeTime: reduces backtracking time for both visitors and the website.
FirstOnly
The algorithm recommends the frequent first expected locations (pages that occur frequently as E1, the first expected location) to the website administrator, ignoring any subsequent expected locations the visitor may have considered.
Disadvantage: it satisfies only a small fraction of the information people needed.
Example: FirstOnly
Algorithm: FirstOnly
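The FirstOnly idea above can be sketched as a small Python function. This is an illustrative sketch, assuming each visitor's expected locations arrive as an ordered list and that a simple support threshold decides what gets recommended; the names are hypothetical.

```python
from collections import Counter

def first_only(expected_location_lists, min_support):
    """Recommend pages that frequently occur as a visitor's FIRST
    expected location (E1); all later expected locations are ignored."""
    counts = Counter(lst[0] for lst in expected_location_lists if lst)
    return [page for page, c in counts.items() if c >= min_support]
```

With visitor lists [X, Y], [X], [Z] and a support threshold of 2, only X is recommended; Y and Z are never considered or fall below the threshold.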
OptimizeBenefit
This is a greedy algorithm that attempts to maximize the benefit to the website of adding additional links. In each pass, it:
1. finds the page with the maximum benefit,
2. adds it to the set of recommendations,
3. nulls out all instances of this page and its succeeding pages, and recomputes the benefit.
Example: OptimizeBenefit
Algorithm: OptimizeBenefit
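The greedy passes described above can be sketched as follows. This is a simplified sketch, not the paper's exact algorithm: it assumes benefit(p) is simply the number of visitor lists in which p still appears, and "nulling out" is modeled by truncating each list at the first occurrence of the chosen page (a visitor who finds the new link there would stop searching).

```python
from collections import Counter

def optimize_benefit(visitor_lists, k):
    """Greedily pick up to k pages: each pass recommends the page with
    the highest remaining benefit, then nulls out that page and all
    succeeding pages in every visitor list that contains it."""
    lists = [list(lst) for lst in visitor_lists]
    recommendations = []
    for _ in range(k):
        counts = Counter(p for lst in lists for p in lst)
        if not counts:
            break
        best, _ = counts.most_common(1)[0]
        recommendations.append(best)
        # Truncate at `best`: its instances and succeeding pages are nulled out.
        lists = [lst[:lst.index(best)] if best in lst else lst for lst in lists]
    return recommendations
```

After X is recommended, visitors whose lists start with X contribute nothing further, so the second pass recomputes benefit over what remains.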
OptimizeTime
The goal of this algorithm is to minimize the number of backtracks the visitor has to make. Saving time for each record (visitor) improves the website's perceived performance. The algorithm is also a greedy search, and is quite similar to OptimizeBenefit.
Example: OptimizeTime
Algorithm: OptimizeTime
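A sketch of the OptimizeTime variant, under an assumed cost model (not taken from the paper): if a link is added at the j-th expected location in a visitor's list of length m, that visitor is spared the backtracks from position j onward, i.e. m - j of them. The greedy structure mirrors OptimizeBenefit; only the scoring changes.

```python
def optimize_time(visitor_lists, k):
    """Greedily pick up to k pages, ranking each page by the total
    number of backtracks it would save across all visitors."""
    lists = [list(lst) for lst in visitor_lists]
    recommendations = []
    for _ in range(k):
        saved = {}
        for lst in lists:
            for j, page in enumerate(lst):
                # Backtracks this visitor avoids if a link is added at `page`.
                saved[page] = saved.get(page, 0) + (len(lst) - j)
        if not saved:
            break
        best = max(saved, key=saved.get)
        recommendations.append(best)
        lists = [lst[:lst.index(best)] if best in lst else lst for lst in lists]
    return recommendations
```

Note how a page that appears early in long lists outranks a page that appears in more lists but later, which is exactly where OptimizeTime and OptimizeBenefit can disagree.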
Algorithm: OptimizeTime&Profit
We can emphasize pages the web designer specially recommends by assigning each page Pi a weight Pi_num.
P := page with the highest score, TimeSaved(Pi) * Pi_num
This yields a list of recommendations that reflects the web designer's focus.
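The weighted selection step above amounts to one line of scoring. A minimal sketch, assuming `time_saved` holds each page's TimeSaved value and `weight` holds the designer-supplied Pi_num values (pages without an explicit weight default to 1.0; all names are illustrative):

```python
def pick_best(time_saved, weight):
    """Select the page maximizing TimeSaved(Pi) * Pi_num."""
    return max(time_saved, key=lambda p: time_saved[p] * weight.get(p, 1.0))
```

For instance, with TimeSaved X=3, Y=2 and a designer weight of 2.0 on Y, the weighted score makes Y (2 * 2.0 = 4) win over X (3 * 1.0 = 3), shifting the recommendation toward the designer's focus.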