Page Rank Modifications & Alternatives Brett Harper.


1 Page Rank Modifications & Alternatives Brett Harper

2 Overview
Computing Customized Page Ranks
Adaptive Ranking of Web Pages
Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
An Approach to Confidence Based Page Ranking for User-Oriented Web Search
Web Page Ranking using Link Attributes

3 Computing Customized Page Ranks Page rank usually depends on how related a document is to a query, and the quality of the document. PageRank introduces document authority. Similar to the citation problem. Most proposed web ranking algorithms are based on connectivity rather than content. For customized ranks, the concept of page importance depends on the situation.

4 Computing Customized Page Ranks Current solutions build different ranks for topics, users, or queries. Automatic building of the ranking function from a set of user examples.

5 Computing Customized Page Ranks
Brin & Page's PageRank
Generalized PageRank, where x is a vector containing ranks, W is an n*n matrix, and e is an n-vector.
Parametric PageRank, where the a's sum to 1.
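The formulas themselves did not survive on this slide. As a rough illustration only, the sketch below iterates x = d*W*x + e, assuming the standard formulation in which W is the column-normalized link matrix and d is the damping factor; setting every entry of e to (1-d) recovers the usual Brin & Page form. The function name, toy graph, and iteration count are illustrative assumptions, not taken from the slides.

```python
def generalized_pagerank(W, e, d=0.85, iterations=100):
    """Iterate x = d*W*x + e (assumed form; W[i][j] is the weight of link j -> i)."""
    n = len(W)
    x = [1.0] * n
    for _ in range(iterations):
        x = [d * sum(W[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
    return x

# Toy 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 (each column sums to 1).
W = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
d = 0.85
classic = generalized_pagerank(W, [1 - d] * 3, d)          # e = (1-d) for every page
boosted = generalized_pagerank(W, [1 - d, 0.5, 1 - d], d)  # larger e entry for page 1
```

Raising a single entry of e raises that page's rank, which is the basic lever that customized ranks rely on.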

6 Computing Customized Page Ranks
User requirements are represented as an optimization problem where the variables are the user requirements and the total number of constraints.
The issue of how to obtain constraints is not discussed.
A cost function (quadratic or linear) allows the ranks to be changed in accordance with the requirements.
Methods for infeasible requirements:
–Penalty function
–Number of satisfied constraints, in addition to the cost function

7 Computing Customized Page Ranks
WT10G data set
–Constraints defined
–Adaptive rank computed
–Compared to PageRank on entire WT10G data set

8 Computing Customized Page Ranks

9

10 Adaptive Ranking of Web Pages
Alter PageRank by modifying the PageRank equation.
Can be done from the perspective of the user or of web site administrators.
Modify rank by changing (1-d) in the original PageRank.
–Dynamic Control
–Static Control

11 Adaptive Ranking of Web Pages
Rules
–B is an r*n matrix, b is a rule vector of size r
–Inputs and outputs should be positive
The cost function allows the rank of certain pages to be modified while keeping the current rank of other pages.

12 Adaptive Ranking of Web Pages
Initial solution was to structure the problem as a quadratic programming problem.
Second solution uses clusters to reduce the number of dimensions.
Pages are clustered based on score.
Vector E contains k parameters.
Vector A is the sum of the columns in (I-dW)^-1 that correspond to a certain class.

13 Adaptive Ranking of Web Pages Vector E contains k parameters. Vector A is the sum of the columns in M that correspond to a certain class. H is defined as BA is the quadratic term is the linear term

14 Adaptive Ranking of Web Pages
Contradicting constraints
–Relax constraints to arrive at a sub-optimal solution
–Add s to the cost function (used to balance the importance of the constraints and the original cost function)

15 Adaptive Ranking of Web Pages
Use a clustering algorithm to split webpages into clusters.
Compute Ai.
If there is a feasible solution, use the first formula to find the optimal parameters e1,...,ek.
If no feasible solution exists, use the version for relaxed constraints to find sub-optimal parameters e1,...,ek.
Compute rank as
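The final formula is missing from this slide. The sketch below shows one way the clustered computation could look, assuming the rank is the combination sum_i e_i * A_i, which follows if the generalized PageRank is x = (I-dW)^-1 E and E takes the same value e_i for every page in cluster i. The parameters e1,...,ek are simply given here; the quadratic programming step that would choose them is not implemented. All function names and the toy graph are hypothetical.

```python
def solve_fixed_point(W, e, d=0.85, iterations=200):
    """Solve x = d*W*x + e by iteration, i.e. approximate x = (I - dW)^-1 e."""
    n = len(W)
    x = list(e)
    for _ in range(iterations):
        x = [d * sum(W[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
    return x

def cluster_vectors(W, clusters, k, d=0.85):
    """A_i = sum of the columns of (I - dW)^-1 that belong to cluster i."""
    n = len(W)
    vectors = []
    for c in range(k):
        # Multiplying the inverse by a 0/1 indicator sums the selected columns.
        indicator = [1.0 if clusters[j] == c else 0.0 for j in range(n)]
        vectors.append(solve_fixed_point(W, indicator, d))
    return vectors

def rank_from_parameters(A, e):
    """Assumed final step: rank = sum over clusters of e_i * A_i."""
    n = len(A[0])
    return [sum(e[c] * A[c][p] for c in range(len(A))) for p in range(n)]

# Toy graph with column-normalized W; page 0 in cluster 0, pages 1 and 2 in cluster 1.
W = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
clusters = [0, 1, 1]
A = cluster_vectors(W, clusters, k=2)
ranks = rank_from_parameters(A, [0.15, 0.30])

# Reference: solve the full n-dimensional problem with the expanded vector E.
direct = solve_fixed_point(W, [0.15, 0.30, 0.30])
```

The point of the clustering is that only k parameters need to be optimized instead of n, while the A_i vectors can be computed once.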

16 Adaptive Ranking of Web Pages
Used the WT10G data set for experiments.
First experiment: Swap importance of two pages located some distance Δ apart.
–Effectively modifies the PageRank.
–Constraints on highly ranked pages disturb the rest of the pages more significantly.
–These disruptions appear in blocks due to clustering.
–When swapping two pages, the effect is greater on lower ranked than on higher ranked pages.
Quality of results is influenced by # of clusters.

17 Adaptive Ranking of Web Pages
Second experiment: Change # of clusters
–Gradually increase # of clusters used from 5 to 100.
–Cost function stops improving at ~60 clusters.
–Clustering can reduce the complexity level of the problem.
–# of clusters is quite small compared to the size of the collection.

18 Adaptive Ranking of Web Pages
Clustering techniques
–Cluster by score
–Cluster by rank (variable-sized cluster dimensions)
–Cluster by rank with fixed-size cluster dimensions

19 Adaptive Ranking of Web Pages
PageRanks can be modified, but constraints on some pages cause the ranks of all pages to be affected.
The effect of these constraints depends on how highly ranked the constrained page is.

20 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Damping functions reduce page importance propagation on long paths.
Focus on linear, exponential, and hyperbolic decay.
Exponential corresponds to original PageRank.

21 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
For functional rankings, a link matrix is used.
–Normalization
–Dangling nodes
If P is the resulting matrix after normalization, the rank is defined as

22 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms An equivalent approach takes into account the branching contribution. Rank of a node is the weighted sum of incoming paths, with weights that decay exponentially with path length. PageRank is a functional ranking where the damping function is (1-α)α^t.
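The claim that PageRank is a functional ranking with damping (1-α)α^t can be checked numerically. The sketch below assumes P is row-normalized (P[i][j] is the probability of following a link from i to j), that there are no dangling nodes, and that the functional rank is the sum over path lengths t of damping(t) times the uniform start vector pushed t steps along P. The function names and toy matrix are illustrative, not from the slides.

```python
def functional_rank(P, damping, max_len):
    """Sum over path lengths t of damping(t) * (uniform vector moved t steps along P)."""
    n = len(P)
    walk = [1.0 / n] * n   # weight arriving at each node over paths of length t (t = 0)
    rank = [0.0] * n
    for t in range(max_len + 1):
        w = damping(t)
        rank = [r + w * v for r, v in zip(rank, walk)]
        # Advance one step: walk_next[j] = sum_i walk[i] * P[i][j]
        walk = [sum(walk[i] * P[i][j] for i in range(n)) for j in range(n)]
    return rank

def pagerank_iterative(P, alpha, iterations):
    """Classic iteration x = (1-alpha)/n + alpha * x P, for comparison."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(iterations):
        x = [(1 - alpha) / n + alpha * sum(x[i] * P[i][j] for i in range(n))
             for j in range(n)]
    return x

# Toy row-normalized link matrix: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
alpha = 0.85
fr = functional_rank(P, lambda t: (1 - alpha) * alpha ** t, max_len=200)
pr = pagerank_iterative(P, alpha, iterations=200)
```

Swapping in a different `damping` callable gives the other functional rankings without changing the rest of the computation.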

23 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

24 Linear Damping

25 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms Hyperbolic Damping
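The damping formulas are missing from the linear and hyperbolic slides. The sketch below fills them in under stated assumptions: linear damping is taken as 2(L-t)/(L(L+1)) for t < L and 0 otherwise, and hyperbolic damping as proportional to 1/(t+1)^β. Both forms should be checked against the cited paper. The hyperbolic weights are normalized over a truncated range of path lengths rather than over the infinite sum, which is a simplification made here.

```python
def exponential_damping(alpha):
    """Damping (1-alpha)*alpha^t, the form stated for PageRank on the earlier slide."""
    return lambda t: (1 - alpha) * alpha ** t

def linear_damping(L):
    """Assumed form: weights fall linearly and vanish for path lengths >= L."""
    return lambda t: 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0

def hyperbolic_damping(beta, max_len):
    """Assumed form: weights proportional to 1/(t+1)^beta, normalized over 0..max_len."""
    norm = sum(1.0 / (t + 1) ** beta for t in range(max_len + 1))
    return lambda t: (1.0 / (t + 1) ** beta) / norm if t <= max_len else 0.0

lin = linear_damping(8)
hyp = hyperbolic_damping(beta=1.5, max_len=50)
exp = exponential_damping(0.85)
lin_total = sum(lin(t) for t in range(60))
hyp_total = sum(hyp(t) for t in range(60))
```

With these forms, linear damping ignores all paths of length L or more, while hyperbolic damping keeps a heavier tail than exponential damping.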

26 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Empirical Damping
–Pages that are linked are similar, but the topic changes as the distance increases.
–Use the decrease in text similarity as an approximation to an empirical damping function.
–Data: .uk domain, 18m pages; 200 pages chosen at random; similarity measured using TF.IDF without stemming or stop-word removal.
–Results show that this is better approximated by linear damping with L=8 or 9 than by exponential damping.

27 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

28 Approximating Hyperbolic with Exponential Damping –Find the α that minimizes the difference of weights for different values of β and the maximum path length l.

29 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms Approximating Exponential with Linear Damping –Find the L that minimizes the difference of weights for different values of α and the maximum path length l.
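The slide does not say how the "difference of weights" is measured. As one possible reading, the sketch below does a simple grid search for the L whose linear weights are closest to the exponential weights (1-α)α^t, using the sum of absolute differences over path lengths 0..l. The distance measure, the candidate range, and the linear form 2(L-t)/(L(L+1)) are all assumptions made for illustration.

```python
def best_linear_L(alpha, max_len, candidates=range(1, 51)):
    """Grid-search the L whose linear damping best matches exponential damping."""
    def exp_w(t):
        return (1 - alpha) * alpha ** t

    def lin_w(L, t):
        # Assumed linear damping form; zero for paths of length >= L.
        return 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0

    def gap(L):
        # Assumed distance: sum of absolute weight differences for t = 0..max_len.
        return sum(abs(exp_w(t) - lin_w(L, t)) for t in range(max_len + 1))

    return min(candidates, key=gap)

L_slow_decay = best_linear_L(alpha=0.9, max_len=20)
L_fast_decay = best_linear_L(alpha=0.5, max_len=20)
```

A slowly decaying exponential (larger α) should map to a larger L than a quickly decaying one; the same search, with α as the unknown, would cover the hyperbolic-to-exponential case.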

30 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Parameters for the damping function
–Characteristic path length (average distance between two nodes) grows sub-logarithmically with the size of the graph.
–For a smaller graph, the damping function should decay faster.
–For two graphs with average path lengths L1 and L2, the sums of the weights up to those path lengths have to be similar for both rankings to behave in a similar way.

31 Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Experimental comparison of precision (PageRank vs. LinearRank)
–Used the WebTREC Gov2 collection (25m documents, .gov domain, 2004).
–Chose 50 queries at random to run.
–PageRank took 39 iterations to run. LinearRank was run for 5, 10, and 20 iterations.
–After the first 5 results, LinearRank had precision similar to PageRank.
–Useful when rankings can't be computed in advance.

32 An Approach to Confidence Based Page Ranking for User Oriented Web Search Confidence is the probability of accessing a page for a specific query given past behavior. Use this probability to enhance page rankings of most relevant pages. Should also take link structure into account. Merge pages with similar categories since users lose interest after first few results.

33 An Approach to Confidence Based Page Ranking for User Oriented Web Search Extract important features and categories from web pages. Prune pages from the graph that are not relevant. Calculate confidence for all features and categories of each page. Use citations (link structure) and confidence measure to recursively compute the page rank.

34 An Approach to Confidence Based Page Ranking for User Oriented Web Search Extract important features and categories from web pages. –Search the full-text and extended anchor text for most relevant features/categories. – in the set of features where N(P,i) is the total # of times page P is accessed for query i and O(i) is the total number of queries made for i. –Pages with high E(P,a) will likely be accessed for the topic a.

35 An Approach to Confidence Based Page Ranking for User Oriented Web Search
Prune pages from the graph that are not relevant.
–Pages without similar features/categories can be connected.
–These pages are used for extracting features/categories, but are pruned if the confidence does not meet a certain threshold.
–Citations of pruned pages are also removed.

36 An Approach to Confidence Based Page Ranking for User Oriented Web Search Calculate confidence for all features and categories of each page. – in the customized graph. –Calculating C(a,P) for the entire history is not realistic, so only take recent history into account.

37 An Approach to Confidence Based Page Ranking for User Oriented Web Search
Use citations (link structure) and confidence measure to recursively compute the page rank.
–PR(P,a) = (1-d) + d[PR(T1,a)/O(T1) + ... + PR(Tn,a)/O(Tn)], where Ti is a citing page and O(Ti) is the # of outgoing links.
–RPR(P,a) = PR(P,a) * C(a,P)
–New pages cited by many relevant high-ranked pages can be suppressed by including a time period.
–Substitute damping factor d with (1-C(a,P)).
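The two formulas on this slide translate fairly directly into code. The sketch below iterates PR(P,a) over a small citation graph and then scales each score by the confidence C(a,P) to get RPR(P,a). The confidence values are made up for illustration, out-degrees are counted only within the given graph, and the variant that replaces d with (1-C(a,P)) is not shown.

```python
def confidence_pagerank(citing, confidence, d=0.85, iterations=50):
    """citing[p] lists the pages T_i that link to p; confidence[p] plays the role of C(a, P)."""
    pages = list(confidence)
    # O(T): number of outgoing links of T, counted within this graph only (assumption).
    out_degree = {p: 0 for p in pages}
    for p in pages:
        for t in citing.get(p, []):
            out_degree[t] += 1
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # PR(P,a) = (1-d) + d * sum over citing pages T of PR(T,a) / O(T)
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_degree[t] for t in citing.get(p, []))
            for p in pages
        }
    # RPR(P,a) = PR(P,a) * C(a,P)
    return {p: pr[p] * confidence[p] for p in pages}

# Toy graph: A -> B, A -> C, B -> C, C -> A.
citing = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}
hypothetical_confidence = {"A": 0.2, "B": 0.9, "C": 0.5}
rpr = confidence_pagerank(citing, hypothetical_confidence)
plain = confidence_pagerank(citing, {"A": 1.0, "B": 1.0, "C": 1.0})
```

Setting every confidence to 1 reduces the result to the plain per-topic PageRank, which makes the effect of the confidence factor easy to compare.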

38 An Approach to Confidence Based Page Ranking for User Oriented Web Search The data set was constructed from a list of 7 queries, from which the top 30 results were obtained from Google. A graph of these nodes was then created, and further expanded to a depth of 2. This new graph contained 500-800 nodes. Higher ranked pages are not always accessed a higher number of times. Pages can be accessed for multiple queries. Pages with higher confidence tend to be ranked higher.

39 Web Page Ranking using Link Attributes
Tries to improve on current ranking techniques by assigning different weights to links (WLRank), based on:
–Relative position in the page
–Tag where the link is contained
–Length of anchor text

40 Web Page Ranking using Link Attributes
L(j,i) is 1 if a link exists or 0 otherwise, and c is a constant that gives a base weight to every link.
T(j,i) depends on the tag.
AL(j,i) is length of anchor text divided by average anchor text length d.
RP(j,i) is the relative position weighted by constant b.
If W(j,i) = L(j,i) then it is equal to PageRank.
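The weight formula itself is missing from this slide. The sketch below assumes an additive form, W(j,i) = L(j,i) * (c + T(j,i) + AL(j,i) + RP(j,i)), with AL taken as anchor length divided by d and RP as b times the relative position. These choices are guesses that are merely consistent with the definitions above and should be checked against the cited paper; the tag weight passed in is a hypothetical value.

```python
def wlrank_weight(link_exists, tag_weight=0.0, anchor_len=0.0, rel_pos=0.0,
                  c=1.0, b=1.0, d=100.0):
    """Assumed additive link weight: W = L * (c + T + AL + RP)."""
    if not link_exists:
        return 0.0                 # L(j,i) = 0: no link, no weight
    anchor_term = anchor_len / d   # AL(j,i), assumed to be anchor length over d
    position_term = b * rel_pos    # RP(j,i), assumed to be b times relative position
    return c + tag_weight + anchor_term + position_term

plain_link = wlrank_weight(True)   # only the base weight c
rich_link = wlrank_weight(True, tag_weight=0.5, anchor_len=50, rel_pos=0.25)
no_link = wlrank_weight(False, tag_weight=0.5, anchor_len=50, rel_pos=0.25)
```

With c = 1 and every attribute term at zero, the weight equals L(j,i), matching the slide's remark that the scheme then reduces to PageRank.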

41 Web Page Ranking using Link Attributes
Tested against 460k pages in the .CL domain.
Several users provided relevance judgements on the first 10 results of several queries.
Used c=1, b=1, and d=100.
Only used weights for and tags.
Compare precision based on a perfect ranking for the first 10 answers.
Improvement of 13% on average.

42 Web Page Ranking using Link Attributes

43 Conclusions PageRank can be modified to fit user requirements and specific categories. Different functions can be used to decay PageRank influence on path lengths. Can improve PageRank through clustering.

44 References
Tsoi, A. C., Hagenbuchner, M., and Scarselli, F. 2006. Computing customized page ranks. ACM Trans. Internet Technol. 6, 4 (Nov. 2006), 381-414.
Tsoi, A. C., Morini, G., Scarselli, F., Hagenbuchner, M., and Maggini, M. 2003. Adaptive ranking of web pages. In Proceedings of the 12th International Conference on World Wide Web (Budapest, Hungary, May 20 - 24, 2003). WWW '03. ACM, New York, NY, 356-365.
Baeza-Yates, R., Boldi, P., and Castillo, C. 2006. Generalizing PageRank: damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA, August 06 - 11, 2006). SIGIR '06. ACM, New York, NY, 308-315.
Mukhopadhyay, D., Giri, D., and Singh, S. R. 2003. An approach to confidence based page ranking for user oriented Web search. SIGMOD Rec. 32, 2 (Jun. 2003), 28-33.
Baeza-Yates, R. and Davis, E. 2004. Web page ranking using link attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters (New York, NY, USA, May 19 - 21, 2004). WWW Alt. '04. ACM, New York, NY, 328-329.

45 Questions

