Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley. Slides for Chapter 1: Information Retrieval and Web Search

Part I: Web Structure Mining
Chapter 2: Hyperlink Based Ranking
Social Network Analysis
PageRank
Authorities and Hubs
Link Based Similarity Search
Enhanced Techniques for Page Ranking

Social Networks
A social network is modeled as a directed graph with weights assigned to its edges. Nodes represent documents, and edges represent citations from one document to other documents. Prestige can be associated with the number of incoming edges to a node (its in-degree). Prestige is recursive: the prestige of a document depends on the authority (or, again, the prestige) of the documents that cite it.
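
As a minimal sketch of the simplest prestige measure, in-degree, the following counts incoming citation edges. The edge-list representation, the function name `in_degree_prestige`, and the example graph are illustrative assumptions; the sketch ignores edge weights and the recursive aspect of prestige.

```python
def in_degree_prestige(citations):
    """Return the in-degree (number of incoming citations) of every document.

    citations: list of (citing, cited) pairs, one per directed edge.
    """
    prestige = {}
    for citing, cited in citations:
        prestige.setdefault(citing, 0)            # a citer may itself be uncited
        prestige[cited] = prestige.get(cited, 0) + 1
    return prestige


# Hypothetical citation graph: d1 cites d2 and d3, d2 cites d3.
edges = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
print(in_degree_prestige(edges))  # → {'d1': 0, 'd2': 1, 'd3': 2}
```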

Social Networks
Adjacency matrix $A$: $A_{uv} = 1$ if document $u$ cites document $v$, and $A_{uv} = 0$ otherwise.
Prestige score of document $v$: $p_v = \sum_u A_{uv}\, p_u$, or in matrix form $P = A^T P$.
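
A single application of the prestige formula $p_v = \sum_u A_{uv}\, p_u$ (that is, computing $A^T P$) might look like the sketch below. The list-of-lists matrix, the function name `prestige_update`, and the 3-document graph are hypothetical choices for illustration.

```python
def prestige_update(A, p):
    """Apply p_v = sum_u A[u][v] * p_u once, i.e. compute A^T p.

    A: square 0/1 adjacency matrix as a list of lists,
       A[u][v] == 1 if document u cites document v.
    p: current prestige scores, one per document.
    """
    n = len(A)
    return [sum(A[u][v] * p[u] for u in range(n)) for v in range(n)]


# Hypothetical graph: 0 cites 1 and 2, 1 cites 2, 2 cites 0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(prestige_update(A, [1, 1, 1]))  # → [1, 1, 2]
```

Starting from equal prestige, one update simply reproduces the in-degrees; repeating the update is what lets prestige flow recursively through citing documents.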

Social Networks
Computing prestige by eigendecomposition: $\lambda P = A^T P$, where the eigenvector $P$ holds the prestige scores and $\lambda$ is the corresponding eigenvalue. The prestige vector is taken to be the principal eigenvector (the one with the largest eigenvalue), which for the non-negative matrix $A^T$ can be chosen with all entries non-negative.
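
As a small worked example of the eigendecomposition $\lambda P = A^T P$ (a hypothetical graph, not taken from the slides), take two documents that cite each other, so $A_{12} = A_{21} = 1$:

```latex
\begin{align*}
A &= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = A^T \\
\det(A^T - \lambda I) &= \lambda^2 - 1 = 0 \;\Rightarrow\; \lambda = 1 \text{ or } \lambda = -1 \\
\lambda = 1:\quad A^T P = P &\;\Rightarrow\; p_1 = p_2 \;\Rightarrow\; P = \tfrac{1}{\sqrt{2}}\,(1, 1)^T \\
\lambda = -1:\quad A^T P = -P &\;\Rightarrow\; p_1 = -p_2 \;\Rightarrow\; P = \tfrac{1}{\sqrt{2}}\,(1, -1)^T
\end{align*}
```

Only the eigenvector for the largest eigenvalue, $\lambda = 1$, has non-negative entries, and it assigns both documents equal prestige, as the symmetry of the graph suggests.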

Social Networks
[slide content not recoverable]

Social Networks
Power iteration: start from an initial vector $P^{(0)}$ (e.g., all ones).
Loop: $P^{(k+1)} = A^T P^{(k)}$, then normalize: $P^{(k+1)} = P^{(k+1)} / \|P^{(k+1)}\|$.
While $\|P^{(k+1)} - P^{(k)}\| > \epsilon$.
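
The power iteration loop can be sketched in plain Python as follows. The all-ones starting vector, Euclidean normalization, the threshold `eps`, and the safety cap `max_iter` are assumptions of this sketch; convergence also assumes the largest eigenvalue strictly dominates the others in magnitude.

```python
import math


def power_iteration(A, eps=1e-9, max_iter=1000):
    """Approximate the principal eigenvector of A^T by power iteration.

    A[u][v] == 1 if document u cites document v. Returns prestige scores
    normalized to unit Euclidean length. If the iteration has not
    converged after max_iter steps, the last estimate is returned.
    """
    n = len(A)
    p = [1.0] * n                          # initial guess: equal prestige
    for _ in range(max_iter):
        # Loop body: P' = A^T P
        q = [sum(A[u][v] * p[u] for u in range(n)) for v in range(n)]
        norm = math.sqrt(sum(x * x for x in q))
        if norm == 0:                      # no citations: nothing to propagate
            return q
        q = [x / norm for x in q]          # normalize P'
        # While-condition: stop once P' no longer differs from P
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) <= eps:
            return q
        p = q
    return p


# Hypothetical graph: 0 cites 1 and 2, 1 cites 2, 2 cites 0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(power_iteration(A))
```

For this graph the result satisfies $A^T P \approx \lambda P$ with $\lambda \approx 1.3247$, and document 2, cited by both others, receives the highest prestige.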

PageRank
A "random web surfer" keeps clicking on hyperlinks at random, choosing each link with uniform probability. This implements a random walk on the web graph. If page $u$ links to $N_u$ web pages, the probability of visiting a given page $v$ among them next is $1/N_u$. Hence the amount of prestige that page $v$ receives from page $u$ is $1/N_u$ of the prestige of $u$. Summing over all pages $u$ that link to $v$ gives $p_v = \sum_{u \to v} p_u / N_u$.
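
The random surfer can be simulated directly, as in the sketch below. The dictionary graph representation, the function name `random_surfer`, the fixed seed, and the assumption that every page has at least one out-link (no dead ends) are choices made for illustration only.

```python
import random
from collections import Counter


def random_surfer(links, start, steps, seed=0):
    """Simulate a surfer who follows a uniformly random out-link each step.

    links: dict mapping each page to the list of pages it links to;
    every page is assumed to have at least one out-link (no dead ends).
    Returns the fraction of steps spent on each page.
    """
    rng = random.Random(seed)
    page = start
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(links[page])     # each of the N_u links: prob 1/N_u
        visits[page] += 1
    return {p: visits[p] / steps for p in links}


# Hypothetical web graph: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(random_surfer(web, "A", 100_000))
```

For this graph the visit frequencies approach 0.4, 0.2, and 0.4, the values that solve $p_v = \sum_{u \to v} p_u / N_u$ with the ranks summing to 1.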

PageRank
Propagation of page rank [figure not recoverable]
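
One propagation step, in which each page $u$ passes $1/N_u$ of its rank to every page it links to, can be sketched as follows. The function name `propagate` and the example graph are hypothetical, and pages without out-links are not handled.

```python
def propagate(links, rank):
    """One propagation step: each page u passes rank[u] / N_u to every page
    it links to, where N_u = len(links[u]). Assumes no dead ends."""
    new_rank = {page: 0.0 for page in links}
    for u, targets in links.items():
        share = rank[u] / len(targets)     # 1/N_u of u's rank per out-link
        for v in targets:
            new_rank[v] += share
    return new_rank


# Hypothetical web graph: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(propagate(web, {"A": 1.0, "B": 1.0, "C": 1.0}))
# → {'A': 1.0, 'B': 0.5, 'C': 1.5}
```

Because every page passes on all of its rank, the total rank is preserved by each step.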

PageRank
Calculation of page rank [table or figure not recoverable; surviving labels: "Norm", "Integers"]
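
The slide's worked calculation did not survive extraction. As a hedged sketch of the calculation itself, the ranks can be computed by repeating the propagation $p_v = \sum_{u \to v} p_u / N_u$ until they stop changing. The uniform starting ranks, normalization to a sum of 1, the threshold `eps`, and the function name `pagerank` are assumptions; no damping factor is used, matching the pure random walk described for the random surfer, and every page is assumed to have at least one out-link.

```python
def pagerank(links, eps=1e-10, max_iter=1000):
    """Iterate p_v = sum_{u -> v} p_u / N_u until the ranks stop changing.

    links: dict mapping each page to the list of pages it links to
    (every page needs at least one out-link). Returns ranks summing to 1.
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}        # uniform start
    for _ in range(max_iter):
        new_rank = {page: 0.0 for page in links}
        for u, targets in links.items():
            for v in targets:
                new_rank[v] += rank[u] / len(targets)
        total = sum(new_rank.values())              # normalize so ranks sum to 1
        new_rank = {p: r / total for p, r in new_rank.items()}
        if max(abs(new_rank[p] - rank[p]) for p in links) < eps:
            return new_rank
        rank = new_rank
    return rank                                     # best estimate if not converged


# Hypothetical web graph: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(web))
```

For this graph the printed ranks are approximately 0.4 for A, 0.2 for B, and 0.4 for C, agreeing with the long-run visit frequencies of the random surfer.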