Page Rank Modifications & Alternatives Brett Harper.


Overview
Computing Customized Page Ranks
Adaptive Ranking of Web Pages
Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
An Approach to Confidence Based Page Ranking for User-Oriented Web Search
Web Page Ranking using Link Attributes

Computing Customized Page Ranks
A page's rank usually depends on how related the document is to a query and on the quality of the document.
PageRank introduces document authority, similar to the citation problem.
Most proposed web ranking algorithms are based on connectivity rather than content.
For customized ranks, the concept of page importance depends on the situation.

Computing Customized Page Ranks
Current solutions build different ranks for topics, users, or queries.
Automatic building of the ranking function from a set of user examples.

Computing Customized Page Ranks
Brin & Page's PageRank
Generalized PageRank, where x is a vector containing ranks, W is an n*n matrix, and e is an n-vector.
Parametric PageRank, where the a's sum to 1.
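The generalized form can be sketched in a few lines. This is a minimal illustration that assumes the generalized equation is x = dWx + e, with W[i][j] = 1/outdegree(j) whenever page j links to page i. That form is suggested by the (I-dW)^-1 term on a later slide, but it is an assumption here, and the function name, toy graph, and parameter values are hypothetical. Setting every entry of e to (1-d) gives the unnormalized Brin & Page form.

```python
def generalized_pagerank(links, e, d=0.85, iterations=200):
    """Iterate x <- d*W*x + e until (approximately) converged.

    links maps a page index to the list of pages it links to.
    e is the n-vector of the generalized equation (an assumed form).
    """
    n = len(e)
    x = list(e)
    for _ in range(iterations):
        new_x = list(e)
        for j in range(n):
            targets = links.get(j, [])
            for i in targets:
                # Page j passes an equal share of its rank to each target.
                new_x[i] += d * x[j] / len(targets)
        x = new_x
    return x


# Hypothetical toy graph with no dangling pages.
links = {0: [1, 2], 1: [2], 2: [0]}
d = 0.85

# Uniform e reproduces the classic (unnormalized) PageRank.
classic = generalized_pagerank(links, [1 - d] * 3, d)

# A larger entry of e for page 2 raises that page's rank.
boosted = generalized_pagerank(links, [1 - d, 1 - d, 3 * (1 - d)], d)
```

Changing individual entries of e is the lever that the customized and adaptive rankings in this talk rely on.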

Computing Customized Page Ranks
User requirements are represented as an optimization problem where the variables are the user requirements and the total number of constraints.
The issue of how to obtain constraints is not discussed.
A cost function (quadratic and linear) allows the ranks to be changed in accordance with the requirements.
Methods for infeasible requirements:
–Penalty function
–Number of satisfied constraints, in addition to the cost function

Computing Customized Page Ranks
WT10G data set
–Constraints defined
–Adaptive rank computed
–Compared to PageRank on the entire WT10G data set

Adaptive Ranking of Web Pages
Alter PageRank by modifying the PageRank equation.
Can be done from the perspective of the user or of web site administrators.
Modify rank by changing (1-d) in the original PageRank.
–Dynamic Control
–Static Control

Adaptive Ranking of Web Pages
Rules
–B is an r*n matrix, b is a rule vector of size r
–Inputs and outputs should be positive
The cost function allows the rank of certain pages to be modified while keeping the current rank of other pages.

Adaptive Ranking of Web Pages
The initial solution was to structure the problem as a quadratic programming problem.
The second solution uses clusters to reduce the number of dimensions.
Pages are clustered based on score.
Vector E contains k parameters.
Vector A is the sum of the columns in (I-dW)^-1 that correspond to a certain class.

Adaptive Ranking of Web Pages
Vector E contains k parameters.
Vector A is the sum of the columns in M that correspond to a certain class.
H is defined as BA.
The cost function has a quadratic term and a linear term.

Adaptive Ranking of Web Pages
Contradicting constraints
–Relax constraints to arrive at a sub-optimal solution
–Add s to the cost function (used to balance the importance of the constraints and the original cost function)

Adaptive Ranking of Web Pages
Use a clustering algorithm to split web pages into clusters.
Compute Ai.
If there is a feasible solution, use the first formula to find the optimal parameters e1,...,ek.
If no feasible solution exists, use the version for relaxed constraints to find sub-optimal parameters e1,...,ek.
Compute the rank from the parameters e1,...,ek and the vectors Ai.
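The clustering step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the rank solves x = dWx + E with E constant inside each cluster, so that x = e1*A1 + ... + ek*Ak, where Ai is the sum of the columns of (I-dW)^-1 belonging to cluster i. The quadratic programming step that chooses e1,...,ek is omitted and the parameters are supplied by hand. All function names and the toy graph are hypothetical.

```python
def solve_rank(links, e, d=0.85, iterations=200):
    """Approximate (I - dW)^-1 * e by iterating x <- d*W*x + e."""
    n = len(e)
    x = list(e)
    for _ in range(iterations):
        new_x = list(e)
        for j in range(n):
            targets = links.get(j, [])
            for i in targets:
                new_x[i] += d * x[j] / len(targets)
        x = new_x
    return x


def cluster_vectors(links, clusters, k, d=0.85):
    """Return A_1..A_k: one n-vector per cluster.

    A_c equals (I - dW)^-1 applied to the indicator vector of cluster c,
    i.e. the sum of the columns of (I - dW)^-1 that belong to cluster c.
    """
    n = len(clusters)
    vectors = []
    for c in range(k):
        indicator = [1.0 if clusters[p] == c else 0.0 for p in range(n)]
        vectors.append(solve_rank(links, indicator, d))
    return vectors


def adaptive_rank(vectors, params):
    """Combine the cluster vectors: x = sum_c params[c] * A_c."""
    n = len(vectors[0])
    return [sum(params[c] * vectors[c][p] for c in range(len(params)))
            for p in range(n)]


# Hypothetical toy data: four pages split into two clusters.
links = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}
clusters = [0, 0, 1, 1]
d = 0.85

A = cluster_vectors(links, clusters, k=2, d=d)
baseline = adaptive_rank(A, [1 - d, 1 - d])      # reduces to plain PageRank
adapted = adaptive_rank(A, [1 - d, 2 * (1 - d)])  # favour cluster 1
```

Only k parameters are optimized instead of n, which is why the number of clusters controls the complexity of the problem.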

Adaptive Ranking of Web Pages
Used the WT10G data set for experiments.
First experiment: Swap the importance of two pages located some distance Δ apart.
–Effectively modifies the PageRank.
–Constraints on highly ranked pages disturb the rest of the pages more significantly.
–These disruptions appear in blocks due to clustering.
–When swapping two pages, the effect is greater on lower ranked than on higher ranked pages.
Quality of results is influenced by the number of clusters.

Adaptive Ranking of Web Pages
Second experiment: Change the number of clusters.
–Gradually increase the number of clusters used from 5 to 100.
–Cost function stops improving at ~60 clusters.
–Clustering can reduce the complexity level of the problem.
–The number of clusters is quite small compared to the size of the collection.

Adaptive Ranking of Web Pages
Clustering techniques
–Cluster by score
–Cluster by rank (variable-sized cluster dimensions)
–Cluster by rank with fixed-size cluster dimensions

Adaptive Ranking of Web Pages
PageRanks can be modified, but constraints on some pages cause the ranks of all pages to be affected.
The effect of these constraints depends on how highly ranked the constrained page is.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Damping functions reduce the propagation of page importance along long paths.
Focus on linear, exponential, and hyperbolic decay.
Exponential damping corresponds to the original PageRank.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
For functional rankings, a link matrix is used.
–Normalization
–Dangling nodes
If P is the resulting matrix after normalization, the rank is defined in terms of powers of P weighted by the damping function.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
An equivalent approach takes into account the branching contribution.
The rank of a node is the weighted sum of incoming paths, with weights that decay exponentially with path length.
PageRank is a functional ranking where the damping function is (1-α)α^t.
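A functional ranking of this kind can be sketched as a damped sum over walk lengths. The sketch assumes the rank is the sum over t of damping(t) * (1/N) * P^t applied to a uniform start vector, with P row-normalized and dangling pages treated as linking to every page. The exponential form is taken from the slide. The linear and hyperbolic forms below are assumed normalizations (weights summing to 1), and all names and the toy graph are hypothetical.

```python
def functional_rank(links, n, damping, max_len):
    """Sum damping(t) * (uniform start vector propagated t steps)."""
    v = [1.0 / n] * n      # probability mass after t steps
    rank = [0.0] * n
    for t in range(max_len):
        w = damping(t)
        for p in range(n):
            rank[p] += w * v[p]
        nxt = [0.0] * n
        for i in range(n):
            # Assumption: a dangling page spreads its mass over all pages.
            targets = links.get(i) or range(n)
            share = v[i] / len(targets)
            for j in targets:
                nxt[j] += share
        v = nxt
    return rank


def exponential(alpha):
    """PageRank's damping function, (1 - alpha) * alpha^t."""
    return lambda t: (1 - alpha) * alpha ** t


def linear(L):
    """Assumed linear decay that reaches zero at path length L."""
    return lambda t: 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0


def hyperbolic(beta, max_len):
    """Assumed hyperbolic decay, normalized over the truncated lengths."""
    norm = sum((t + 1) ** (-beta) for t in range(max_len))
    return lambda t: (t + 1) ** (-beta) / norm


# Hypothetical toy graph; page 3 is dangling.
links = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}

exp_rank = functional_rank(links, 4, exponential(0.85), max_len=200)
lin_rank = functional_rank(links, 4, linear(10), max_len=10)
hyp_rank = functional_rank(links, 4, hyperbolic(1.5, 50), max_len=50)
```

Linear damping needs only L propagation steps, whereas the exponential version has to run until the weights become negligible.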

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Linear Damping

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Hyperbolic Damping

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Empirical Damping
–Pages that are linked are similar, but the topic changes as the distance increases.
–Use the decrease in text similarity as an approximation to an empirical damping function.
–.uk domain, 18m pages, 200 pages chosen at random, similarity measured using TF.IDF without stemming or stop-word removal.
–Results show that this is better approximated by linear damping with L=8 or 9 than by exponential damping.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Approximating Hyperbolic with Exponential Damping
–Find the α that minimizes the difference of weights for different values of β and the maximum path length l.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Approximating Exponential with Linear Damping
–Find the L that minimizes the difference of weights for different values of α and the maximum path length l.
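The approximation can be illustrated with a brute-force search. This sketch assumes that "difference of weights" means the sum of absolute differences between the two damping functions over path lengths below the maximum l. The objective actually used in the paper may differ, and the linear weight formula and all names are assumptions. The same search works for approximating hyperbolic with exponential damping by scanning α instead of L.

```python
def exponential_weights(alpha, max_len):
    """Weights (1 - alpha) * alpha^t for t = 0 .. max_len - 1."""
    return [(1 - alpha) * alpha ** t for t in range(max_len)]


def linear_weights(L, max_len):
    """Assumed linear damping weights that reach zero at length L."""
    return [2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0
            for t in range(max_len)]


def weight_gap(a, b):
    """Sum of absolute differences between two weight sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_linear_length(alpha, max_len):
    """Scan every L up to max_len and keep the closest match."""
    target = exponential_weights(alpha, max_len)
    return min(range(1, max_len + 1),
               key=lambda L: weight_gap(target, linear_weights(L, max_len)))


best_L = best_linear_length(alpha=0.85, max_len=30)
```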

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Parameters for the damping function
–Characteristic path length (average distance between two nodes) grows sub-logarithmically with the size of the graph.
–For a smaller graph, the damping function should decay faster.
–For two graphs with average path lengths L1 and L2, the sums of the weights up to L1 and L2 have to be similar for both rankings to behave in a similar way.

Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
Experimental comparison of precision (PageRank vs. LinearRank)
–Used the WebTREC Gov2 collection (25m documents, .gov domain, 2004).
–Chose 50 queries at random to run.
–PageRank took 39 iterations to run. LinearRank was run for 5, 10, and 20 iterations.
–After the first 5 results, LinearRank had precision similar to PageRank.
–Useful when rankings can't be computed in advance.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Confidence is the probability of accessing a page for a specific query given past behavior.
Use this probability to enhance the page rankings of the most relevant pages.
Should also take link structure into account.
Merge pages with similar categories, since users lose interest after the first few results.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Extract important features and categories from web pages.
Prune pages from the graph that are not relevant.
Calculate confidence for all features and categories of each page.
Use citations (link structure) and the confidence measure to recursively compute the page rank.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Extract important features and categories from web pages.
–Search the full text and extended anchor text for the most relevant features/categories.
–E(P,a) is computed for each a in the set of features, where N(P,i) is the total number of times page P is accessed for query i and O(i) is the total number of queries made for i.
–Pages with high E(P,a) will likely be accessed for the topic a.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Prune pages from the graph that are not relevant.
–Pages without similar features/categories can be connected.
–These pages are used for extracting features/categories, but are pruned if the confidence does not meet a certain threshold.
–Citations of pruned pages are also removed.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Calculate confidence for all features and categories of each page.
–C(a,P) is computed for the pages in the customized graph.
–Calculating C(a,P) for the entire history is not realistic, so only recent history is taken into account.

An Approach to Confidence Based Page Ranking for User Oriented Web Search
Use citations (link structure) and the confidence measure to recursively compute the page rank.
–PR(P,a) = (1-d) + d[PR(T1,a)/O(T1) + ... + PR(Tn,a)/O(Tn)], where Ti is a citing page and O(Ti) is the number of outgoing links.
–RPR(P,a) = PR(P,a) * C(a,P)
–New pages may be cited by many relevant high-ranked pages; this effect can be suppressed by including a time period.
–Substitute damping factor d with (1-C(a,P)).
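The two formulas on this slide can be sketched for a single topic a. The confidence values C(a,P) are taken as given inputs, since the transcript does not show how they are computed. The graph, the confidence numbers, and the function name are hypothetical. The sketch covers only PR and RPR, not the variant that substitutes d with (1-C(a,P)).

```python
def confidence_rank(links, confidence, d=0.85, iterations=100):
    """Return (PR, RPR) for one topic.

    links maps each page to the pages it cites; every cited page must
    also appear as a key. confidence maps each page to C(a, P).
    """
    pages = list(links)
    # citing[p] lists the pages T_i that link to p.
    citing = {p: [t for t in pages if p in links[t]] for p in pages}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # PR(P) = (1 - d) + d * sum(PR(T_i) / O(T_i)) over citing pages.
        pr = {p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in citing[p])
              for p in pages}
    # RPR(P, a) = PR(P, a) * C(a, P)
    rpr = {p: pr[p] * confidence[p] for p in pages}
    return pr, rpr


# Hypothetical pruned graph and confidence values for one topic.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
confidence = {"A": 0.2, "B": 0.9, "C": 0.5}

pr, rpr = confidence_rank(links, confidence)
ordering = sorted(rpr, key=rpr.get, reverse=True)
```

Because RPR multiplies the link-based score by the confidence, a page with many citations but little past access for the topic drops in the final ordering.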

An Approach to Confidence Based Page Ranking for User Oriented Web Search
The data set was constructed from a list of 7 queries, from which the top 30 results were obtained from Google.
A graph of these nodes was then created, and further expanded to a depth of 2.
This new graph contained nodes.
Higher ranked pages are not always accessed a higher number of times.
Pages can be accessed for multiple queries.
Pages with higher confidence tend to be ranked higher.

Web Page Ranking using Link Attributes
WLRank tries to improve on current ranking techniques by assigning different weights to links, based on:
–Relative position in the page
–Tag where the link is contained
–Length of anchor text

Web Page Ranking using Link Attributes
L(j,i) is 1 if a link exists or 0 otherwise, and c is a constant that gives a base weight to every link.
T(j,i) depends on the tag.
AL(j,i) is the length of the anchor text divided by d, the average anchor text length.
RP(j,i) is the relative position weighted by constant b.
If W(j,i) = L(j,i), then it is equal to PageRank.
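The weighting can be sketched as below. The sketch assumes W(j,i) = L(j,i) * (c + T(j,i) + AL(j,i) + RP(j,i)) and that each page distributes its rank over its outgoing links in proportion to W. Both are assumptions made for illustration, as are the function names, the tag weight, the jump probability q, and the toy link data.

```python
def link_weight(exists, tag_weight, anchor_len, rel_pos,
                c=1.0, b=1.0, d=100.0):
    """Assumed form of W(j,i): base weight plus three attribute terms."""
    if not exists:
        return 0.0                # L(j,i) = 0, so the link has no weight
    al = anchor_len / d           # AL(j,i): anchor length over average d
    rp = b * rel_pos              # RP(j,i): relative position times b
    return c + tag_weight + al + rp


def weighted_rank(weights, n, q=0.15, iterations=100):
    """PageRank-style iteration where links carry unequal weights.

    weights maps (j, i) to W(j,i) for every link from page j to page i.
    """
    out_total = [0.0] * n
    for (j, _), w in weights.items():
        out_total[j] += w
    r = [1.0 / n] * n
    for _ in range(iterations):
        new_r = [q / n] * n
        for (j, i), w in weights.items():
            # Page j gives link (j, i) a share proportional to its weight.
            new_r[i] += (1 - q) * r[j] * w / out_total[j]
        r = new_r
    return r


# Hypothetical links: 0->1 has long anchor text inside a weighted tag,
# 0->2 is a plain link further down the page.
weights = {
    (0, 1): link_weight(True, tag_weight=1.0, anchor_len=150, rel_pos=0.9),
    (0, 2): link_weight(True, tag_weight=0.0, anchor_len=20, rel_pos=0.1),
    (1, 0): link_weight(True, 0.0, 0, 0.0),
    (2, 0): link_weight(True, 0.0, 0, 0.0),
}
ranks = weighted_rank(weights, n=3)
```

With every attribute term at zero and c=1, each weight reduces to L(j,i) and the iteration is ordinary PageRank.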

Web Page Ranking using Link Attributes
Tested against 460k pages in the .CL domain.
Several users provided relevance judgements on the first 10 results of several queries.
Used c=1, b=1, and d=100.
Only used weights for and tags.
Compared precision based on a perfect ranking for the first 10 answers.
Improvement of 13% on average.

Conclusions
PageRank can be modified to fit user requirements and specific categories.
Different functions can be used to decay PageRank influence over path lengths.
PageRank can be improved through clustering.

References
Tsoi, A. C., Hagenbuchner, M., and Scarselli, F. 2006. Computing customized page ranks. ACM Trans. Internet Technol. 6, 4 (Nov. 2006).
Tsoi, A. C., Morini, G., Scarselli, F., Hagenbuchner, M., and Maggini, M. 2003. Adaptive ranking of web pages. In Proceedings of the 12th International Conference on World Wide Web (Budapest, Hungary, May 2003). WWW '03. ACM, New York, NY.
Baeza-Yates, R., Boldi, P., and Castillo, C. 2006. Generalizing PageRank: damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA, August 2006). SIGIR '06. ACM, New York, NY.
Mukhopadhyay, D., Giri, D., and Singh, S. R. 2003. An approach to confidence based page ranking for user oriented Web search. SIGMOD Rec. 32, 2 (Jun. 2003).
Baeza-Yates, R. and Davis, E. 2004. Web page ranking using link attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters (New York, NY, USA, May 2004). WWW Alt. '04. ACM, New York, NY.

Questions