Methods and Apparatus for Ranking Web Page Search Results. Alireza Abbasi (abbasi@snu.ac.kr). 406.534 Industrial Information Technology, Patent Presentation. Information Technology Policy Program, Seoul National University.
Bibliographic Details: Title: Methods and Apparatus for Ranking Web Page Search Results. Inventor: Andrei Z. Broder, Menlo Park, CA. Assignee: AltaVista Company, Palo Alto, CA (US). Filing Date: Oct. 25, 2000.
Contents: Abstract; Background of Invention (Field of Invention, Related Art); Summary of Invention (Summary of Invention, Advantages & Applications); Detailed Description.
Abstract: To rank the quality of pages, form a linear combination of two or more matrices and use the coefficients of the eigenvector of the resulting matrix.
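The abstract's two steps (combine matrices, then read off eigenvector coefficients) can be illustrated with a minimal sketch. It assumes small dense matrices stored as Python lists of lists and uses power iteration to approximate the dominant eigenvector; the patent summary here does not say which eigen-solver is used, and the function names, matrices, and weights are hypothetical.

```python
def linear_combination(matrices, weights):
    """Return sum_k weights[k] * matrices[k] for equally sized square matrices."""
    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, m in zip(weights, matrices):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * m[i][j]
    return combined


def dominant_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Approximate the dominant eigenvector by power iteration.

    The vector is rescaled so its absolute entries sum to 1 after each step;
    iteration stops once successive vectors differ by less than `tol`.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(abs(x) for x in w)
        if total == 0:  # zero result: no direction to rank by
            return v
        w = [x / total for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Hypothetical example: two 2x2 page-information matrices with equal weights.
M = linear_combination([[[1, 1], [0, 0]], [[1, 0], [0, 1]]], [1.0, 1.0])
scores = dominant_eigenvector(M)  # one coefficient per page; larger = higher quality
```

Power iteration converges when one eigenvalue strictly dominates the others in magnitude, which holds for example for nonnegative primitive matrices; here `M` is `[[2, 1], [0, 1]]`, so the scores approach `[1, 0]`.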
Field of Invention: Computerized Information Retrieval. The invention relates generally to computerized information retrieval, and specifically to identifying related pages in a hyperlinked database environment such as the Web. The challenge is to retrieve only the resources most relevant to the query.
Related Art (1): Kleinberg's algorithm identifies ‘hub’ and ‘authority’ pages in a neighborhood graph built for a query, and determines related pages starting from a single page. Problems: it does not deal with popular URLs, and it does not analyze the contents of pages when computing the most related pages.
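Kleinberg's hub/authority computation can be sketched as the mutual-reinforcement iteration below. It is a minimal illustration, assuming the neighborhood graph is given as a Python dict mapping each page to the pages it links to; scores are rescaled to sum to 1 rather than to unit length, which changes only the scale. The function name is hypothetical.

```python
def hits(links, iterations=50):
    """Return (hub, authority) scores for a graph given as {page: [linked pages]}."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # A good hub points to good authorities.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Rescale each vector so its entries sum to 1.
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {p: s / a_total for p, s in auth.items()}
        hub = {p: s / h_total for p, s in hub.items()}
    return hub, auth


hub, auth = hits({"a": ["c"], "b": ["c", "d"]})
# "c" is cited by both hubs, so it outranks "d"; "b" cites more authorities than "a".
```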
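Related Art (2): The Google search engine uses ‘PageRank’ to prioritize results. PageRank: page A has pages T1…Tn which point to it; d ∈ [0, 1] is a damping factor; C(A) is the number of links going out of page A. PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)). In this form the PageRanks sum to the number of pages (assuming every page has at least one outgoing link); using (1-d)/N in place of (1-d), where N is the number of pages, makes the sum of all pages' PageRanks equal 1. Problems: the ranking is independent of the search query, and there is no provision for externally evaluating sites.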
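The PageRank formula can be sketched as a fixed-point iteration. This minimal illustration applies the unnormalized formula exactly as written, assumes d = 0.85 (the value Brin and Page report usually using), and does not handle dangling pages: a page with no out-links simply passes on nothing. The function name is hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over the pages T that link to A.

    `links` maps each page to the pages it links to; C(T) is len(links[T]).
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}
        for t, targets in links.items():
            for a in targets:
                new[a] += d * pr[t] / len(targets)
        pr = new
    return pr


pr = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
# The values sum to 3 (the number of pages), not 1; "a" ranks highest, "c" lowest.
```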
Related Art (3): R. Lempel and S. Moran, ‘SALSA’ (Stochastic Approach for Link-Structure Analysis), replace Kleinberg's mutual reinforcement with a stochastic method based on Markov chains, in which the coupling between hubs and authorities is less tight.
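The Markov-chain idea behind SALSA can be sketched for the authority side: from an authority, step back along a random in-link to a hub, then forward along a random out-link of that hub. This is a minimal illustration assuming a small, connected graph given as a dict of out-links; the stationary distribution is found by power iteration, and on a connected graph it comes out proportional to in-degree. The function name is hypothetical.

```python
def salsa_authorities(links, iterations=200):
    """Stationary distribution of SALSA's authority Markov chain.

    `links` maps each hub page to the pages it links to.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    in_links = {p: [] for p in pages}
    for h, targets in links.items():
        for a in targets:
            in_links[a].append(h)
    authorities = [p for p in pages if in_links[p]]
    if not authorities:
        return {}
    # P[j][k]: probability of moving from authority j to authority k by stepping
    # back along a random in-link of j to a hub h, then forward along a random
    # out-link of h.
    P = {j: {k: 0.0 for k in authorities} for j in authorities}
    for j in authorities:
        for h in in_links[j]:
            for k in links[h]:
                P[j][k] += 1.0 / (len(in_links[j]) * len(links[h]))
    # Power iteration for the stationary distribution pi = pi P.
    pi = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        pi = {k: sum(pi[j] * P[j][k] for j in authorities) for k in authorities}
    return pi


pi = salsa_authorities({"a": ["c"], "b": ["c", "d"]})
# In-degrees are c: 2, d: 1, so pi is about {"c": 2/3, "d": 1/3}.
```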
Summary of Invention (1): The invention provides “a method whereby a linear combination of matrices (pages’ information) can be used to rank the pages”, producing highly relevant results for the user's search. The coefficients of the eigenvector provide a measure of the quality of each page relative to the other pages. Ranking categories are determined based on the number of pages to be ranked, and each page is classified into one of the categories.
Summary of Invention (2): A fixed amount of storage represents the rank of each page; each bit pattern represents one of the categories, so the bits assigned to a page give its rank. The eigenvector coefficients of neighboring pages can be used to generate a hub score. As a result, only a small amount of storage and computational resources is needed.
Detailed Description: Improved Ranking Method. Fig. 1: Block diagram of a hyperlinked environment.
A flow diagram of a method for ranking pages
Determine Matrices to Include in Linear Combination (202): An example of matrices that may be used in the method.
Building the Neighborhood Graph: Assumptions: related pages tend to be ‘near’ the selected page, and the same keywords appear in the content of related pages. An initial page is selected, and the pages linked to it are represented as a graph in memory (see patent ‘Method for identifying related pages in a hyperlinked database’).
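The ‘nearness’ assumption can be sketched as a breadth-first search around the selected page. This is a hypothetical illustration, not the referenced patent's procedure: it assumes the link structure is available as a dict of out-links, follows links in both directions when measuring distance, and keeps pages within a chosen `radius` of link hops.

```python
from collections import deque


def neighborhood_graph(links, start, radius=1):
    """Return the sub-graph of pages within `radius` link hops of `start`.

    Links are followed in both directions when measuring distance; the returned
    dict keeps only the original (directed) links between collected pages.
    """
    undirected = {}
    for src, targets in links.items():
        for dst in targets:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if distance[page] == radius:
            continue  # do not expand past the radius
        for nxt in undirected.get(page, ()):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                queue.append(nxt)
    nodes = set(distance)
    return {p: [q for q in links.get(p, []) if q in nodes] for p in sorted(nodes)}


links = {"s": ["a"], "b": ["s"], "a": ["c"], "c": ["d"]}
graph = neighborhood_graph(links, "s", radius=1)
# -> {"a": [], "b": ["s"], "s": ["a"]}
```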
Building the Adjacency Matrix: Inputs: a collection C of websites, a given topic t, a root set S of sites, and a search-engine query q. From S, form a base set C consisting of: sites in the root set S; sites which point to a site in S (found using a search engine that stores linkage information; see patent ‘Web Page Connectivity Server’); and sites which are pointed to by a site in S. C and its link structure form a directed graph G, in which the directed edge ij appears if site i contains a link to site j. The |C| × |C| adjacency matrix of G has a 1 in entry (i, j) exactly when edge ij is present.
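Root-set expansion and the adjacency matrix can be sketched as follows. It is a minimal illustration that assumes the full link structure is already available as a dict of out-links (in practice, in-links would come from a connectivity server); the function name is hypothetical, and sites are indexed in sorted order.

```python
def base_set_adjacency(links, root_set):
    """Expand root set S into base set C and build C's adjacency matrix.

    `links` maps each site to the sites it links to. Returns the sites of C in
    sorted order and a |C| x |C| 0/1 matrix with G[i][j] = 1 if site i links to site j.
    """
    root = set(root_set)
    base = set(root)
    for src, targets in links.items():
        for dst in targets:
            if src in root:
                base.add(dst)  # site pointed to by a site in S
            if dst in root:
                base.add(src)  # site that points to a site in S
    order = sorted(base)
    index = {site: i for i, site in enumerate(order)}
    G = [[0] * len(order) for _ in order]
    for src, targets in links.items():
        for dst in targets:
            if src in index and dst in index:
                G[index[src]][index[dst]] = 1
    return order, G


order, G = base_set_adjacency({"x": ["s1"], "s1": ["y"], "y": ["z"], "s2": []}, {"s1", "s2"})
# order == ["s1", "s2", "x", "y"]; "z" is two hops from S and is left out.
```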
Determining Attractor Matrices: An attractor matrix may be determined from an indication provided by a viewer, or by a computerized utility program that analyzes page content and recognizes keywords, key phrases, page links, etc. This analysis may run at runtime, or offline with periodic updates. Examples: co-citation ($G^{T}G$), whose (i, j) entry is the number of sites that jointly cite the pages indexed by i and j; and bibliographic coupling ($GG^{T}$), whose (i, j) entry is the number of sites jointly referred to by the pages indexed by i and j. These matrices can be included in the linear combination.
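The two example attractor matrices can be computed directly from the adjacency matrix G. This is a minimal sketch using plain Python lists; the helper names are hypothetical. The diagonal entries simply count each page's in-links (co-citation) or out-links (bibliographic coupling).

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cocitation(G):
    """(G^T G)[i][j] = number of sites that link to both page i and page j."""
    return matmul(transpose(G), G)


def bibliographic_coupling(G):
    """(G G^T)[i][j] = number of sites that both page i and page j link to."""
    return matmul(G, transpose(G))


# Page 0 links to pages 1 and 2; page 1 links to page 2; page 2 has no links.
G = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
```

For this G, entry (1, 2) of the co-citation matrix is 1 because page 0 cites both pages 1 and 2, and entry (0, 1) of the bibliographic-coupling matrix is 1 because pages 0 and 1 both cite page 2.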
A flow diagram of a method for ranking pages according to eigenvector coefficients
Ranking According to Eigenvector Coefficients: Ranks are kept in minimal storage space, since the neighborhood graph may be very large. The number of sites whose eigenvector coefficient is less than a chosen value follows a power-law distribution. A hub score can be generated for one or more pages, based on the sum (or a function of the sum) of the eigenvector coefficients of neighboring pages; it indicates the quality of a page as a hub or directory of other pages, which is valuable information for the user.
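The hub score can be sketched as a sum of neighbors' eigenvector coefficients. This minimal illustration assumes that a page's ‘neighbors’ are the pages it links to and that coefficients are supplied in a dict; `f` stands for the optional ‘function of the sum’ and defaults to the identity. The names and values are hypothetical.

```python
def hub_scores(links, coeff, f=lambda s: s):
    """Hub score of each page = f(sum of eigenvector coefficients of pages it links to)."""
    return {page: f(sum(coeff.get(q, 0.0) for q in targets))
            for page, targets in links.items()}


coeff = {"a": 0.25, "b": 0.5}  # hypothetical eigenvector coefficients
links = {"dir": ["a", "b"], "a": ["b"]}
scores = hub_scores(links, coeff)
# "dir" links to both well-ranked pages, so it scores higher as a directory.
```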
Ranking According to Eigenvector Coefficients (Example): Suppose the result of a query is 0.5 billion pages, distributed geometrically over the categories. The 1st category holds the highest-ranked pages (50 pages); the 2nd category holds the next-highest-ranked pages (a geometric multiple of 50 pages); and so on. Each page is assigned to a category by a designated code in a multi-bit word; if 10 bits per page are allotted, 2^10 = 1024 categories are available.
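The category scheme can be sketched as below. It is a minimal illustration assuming that pages are sorted by eigenvector coefficient, that category c holds `first_size * ratio**c` pages (the slide fixes 50 pages for the first category but not the ratio, so `ratio` is a hypothetical parameter), and that category codes are packed into a Python integer, 10 bits per page. All names are hypothetical.

```python
BITS_PER_PAGE = 10  # 2**10 = 1024 possible categories


def category_of(position, first_size=50, ratio=2.0):
    """Map a 0-based rank position to a 0-based category index.

    Category 0 holds the first `first_size` pages, and each later category is
    `ratio` times larger than the one before it.
    """
    size, start, category = first_size, 0, 0
    while position >= start + size:
        start += size
        size *= ratio
        category += 1
    return category


def assign_categories(coeff, first_size=50, ratio=2.0):
    """Rank pages by eigenvector coefficient (highest first) and categorize them."""
    ranked = sorted(coeff, key=coeff.get, reverse=True)
    return {page: category_of(pos, first_size, ratio) for pos, page in enumerate(ranked)}


def pack(categories, bits=BITS_PER_PAGE):
    """Store one fixed-width category code per page in a single integer."""
    word = 0
    for i, c in enumerate(categories):
        if not 0 <= c < (1 << bits):
            raise ValueError(f"category {c} does not fit in {bits} bits")
        word |= c << (i * bits)
    return word


def unpack(word, count, bits=BITS_PER_PAGE):
    """Recover `count` category codes from a packed integer."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]
```

With `ratio = 2`, for instance, the last of 0.5 billion pages (position 499,999,999) falls in category 23, far fewer than the 1024 categories that 10 bits allow; a ratio closer to 1 would spread the pages over more categories.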
Advantages & Applications: The present invention can be distributed as a program product in a variety of forms. Each block-diagram component, flowchart step, etc., can be implemented by a wide range of hardware, software, firmware, or …
Thank you & Best wishes