CSE 450 – Web Mining Seminar Professor Brian D. Davison Fall 2005 A PRESENTATION on What is this Page Known for? Computing Web Page Reputations D. Rafiei & A.O. Mendelzon WWW9 Conference, Amsterdam, May 2000 by Osama Ahmed Khan
OVERVIEW: Introduction; Motivation; Random Walks on the Web Graph; One-level Influence Propagation; Two-level Influence Propagation; Issues; Evaluation; Limitations
Introduction. Goals: (1) to find a ranked set of pages that have a ‘reputation’ on a topic; (2) to find a ranked set of topics on which a web page has a ‘reputation’. ‘Reputation’ is evaluated on the basis of: navigation, subsumption, relatedness, refutation, justification.
Motivation: organizational review; page classification; personal review.
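Random Walks on the Web Graph. Given a set S = {s_1, s_2, ..., s_n} of states, a ‘random walk’ on S is a sequence of states, one per step; at each step the walk either switches to a new state or stays in the current state.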
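The definition above can be illustrated with a minimal sketch. The three states, their transition probabilities, and the helper name `random_walk` are all made up for illustration; none of them come from the slides or the paper.

```python
import random


def random_walk(transitions, start, steps, seed=0):
    """Walk over a set of states; at each step either switch or stay.

    transitions maps each state to a dict {next_state: probability}.
    A state may list itself, which models "staying".
    """
    rng = random.Random(seed)  # fixed seed so the walk is repeatable
    state = start
    path = [state]
    for _ in range(steps):
        candidates = list(transitions[state])
        weights = [transitions[state][s] for s in candidates]
        state = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(state)
    return path


# Hypothetical three-state example (probabilities are assumptions).
transitions = {
    "s1": {"s1": 0.5, "s2": 0.5},
    "s2": {"s1": 0.3, "s2": 0.2, "s3": 0.5},
    "s3": {"s1": 1.0},
}
path = random_walk(transitions, start="s1", steps=10)
```

Counting how often each state appears in a long `path` gives an empirical estimate of how much time the walk spends there, which is the intuition behind using visit counts as a reputation measure.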
One-level Influence Propagation. ‘Random surfer’: selects a page; follows a link. Reputation: total number of visits.
One-level Model. Reputation: probability that a random surfer looking for topic ‘t’ will visit page ‘p’ at step ‘n’ of the walk.
One-level Model (Contd.). One-level reputation rank: equilibrium probability of visiting page ‘p’ for topic ‘t’.
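The one-level model can be sketched as a power iteration. The slides do not reproduce the update rule, so this sketch assumes the commonly stated formulation: with probability `jump_prob` the surfer jumps to a page chosen uniformly among the pages that mention the topic, and otherwise follows a uniformly chosen out-link. The function name, the toy graph, and the value 0.15 are illustrative assumptions.

```python
def one_level_rank(links, topic_pages, jump_prob=0.15, iterations=50):
    """Approximate one-level reputation ranks for a single topic.

    links: dict mapping each page to the list of pages it links to.
    topic_pages: set of pages assumed to contain the topic term.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    n_t = len(topic_pages)
    # Start the surfer on a random topic page.
    rank = {p: (1.0 / n_t if p in topic_pages else 0.0) for p in pages}
    for _ in range(iterations):
        # Jump term: only pages on the topic receive it.
        nxt = {p: (jump_prob / n_t if p in topic_pages else 0.0) for p in pages}
        # Link-following term: each page splits its rank over its out-links.
        for q, outs in links.items():
            if not outs:
                continue  # pages without out-links simply leak probability here
            share = (1.0 - jump_prob) * rank[q] / len(outs)
            for p in outs:
                nxt[p] += share
        rank = nxt
    return rank


# Hypothetical toy graph; every page has at least one out-link.
links = {"home": ["docs", "blog"], "docs": ["home"], "blog": ["docs"]}
rank = one_level_rank(links, topic_pages={"docs", "blog"})
```

After enough iterations `rank[p]` approximates the equilibrium probability of visiting page `p` for the topic. A page off the topic, such as `home` here, can still earn a nonzero rank through links from pages on the topic.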
Two-level Influence Propagation. ‘Random surfer’: selects a page; follows a link, either forward or backward. Reputation: authority = total number of forward visits; hub = total number of backward visits.
Two-level Model. Reputation, authority: probability that a random surfer looking for topic ‘t’ makes a forward visit to page ‘p’ at step ‘n’ of the walk.
Two-level Model (Contd.). Reputation (Contd.), hub: probability that a random surfer looking for topic ‘t’ makes a backward visit to page ‘p’ at step ‘n’ of the walk.
Two-level Model (Contd.). Two-level reputation rank: equilibrium probability of visiting page ‘p’ for topic ‘t’ in the direction associated with ‘r’ (forward for authority, backward for hub).
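The two-level model can be sketched the same way. The update rule is again not on the slides; this sketch assumes that the surfer alternates between forward and backward link-following, and that a jump (probability `jump_prob`) lands on a uniformly chosen topic page with either direction equally likely. All names and the toy graph are illustrative assumptions.

```python
def two_level_rank(links, topic_pages, jump_prob=0.15, iterations=50):
    """Approximate authority and hub reputation ranks for a single topic.

    links: dict mapping each page to the list of pages it links to.
    topic_pages: set of pages assumed to contain the topic term.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    in_deg = {p: 0 for p in pages}
    for outs in links.values():
        for p in outs:
            in_deg[p] += 1
    n_t = len(topic_pages)
    # Jump term, split evenly between the two directions.
    base = {p: (jump_prob / (2 * n_t) if p in topic_pages else 0.0) for p in pages}
    auth = {p: (0.5 / n_t if p in topic_pages else 0.0) for p in pages}
    hub = dict(auth)
    for _ in range(iterations):
        new_auth, new_hub = dict(base), dict(base)
        for q, outs in links.items():
            for p in outs:
                # Forward visit q -> p: hub mass of q, split over q's out-links.
                new_auth[p] += (1.0 - jump_prob) * hub[q] / len(outs)
                # Backward visit p -> q: authority mass of p, split over p's in-links.
                new_hub[q] += (1.0 - jump_prob) * auth[p] / in_deg[p]
        auth, hub = new_auth, new_hub
    return auth, hub


# Hypothetical toy graph; every page has both in-links and out-links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
auth, hub = two_level_rank(links, topic_pages={"a", "b"})
```

Here `auth[p]` and `hub[p]` approximate the equilibrium probabilities of a forward and a backward visit to `p`, respectively; together they sum to 1 on a graph like this one, where no page lacks in-links or out-links.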
Issues. 1. Access to a large crawl of the Web: computation. Set of pages where ranks are computed: Algorithm 1 (one-level), Algorithm 2 (two-level). Set of topics on which ranks are computed: Algorithm 1 (one-level), Algorithm 2 (two-level).
Issues (Contd.). 2. No access to a large crawl of the Web: approximation. Set of pages where ranks are computed: generalization of the PageRank model (one-level), generalization of the Hubs and Authorities model (two-level). Set of topics on which ranks are computed: Algorithm 3 (one-level), Algorithm 4 (two-level).
Evaluation. Setting: no access to a large crawl of the Web (approximation). Set of topics on which ranks are computed: Algorithm 3 (one-level), Algorithm 4 (two-level).
Evaluation (Contd.). 1. Known authoritative pages
Evaluation (Contd.). 2. Personal home pages
Evaluation (Contd.). 3. Unregulated websites
Limitations: topic representation on the Web; page connectivity.
Thank You