“Important” Vertices and the PageRank Algorithm Networked Life NETS 112 Fall 2014 Prof. Michael Kearns.

Slides:



Advertisements
Similar presentations
The Structure of the Web Mark Levene (Follow the links to learn more!)
Advertisements

Overview of this week Debugging tips for ML algorithms
Markov Models.
CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.
Analysis and Modeling of Social Networks Foudalis Ilias.
Web as Network: A Case Study Networked Life CIS 112 Spring 2010 Prof. Michael Kearns.
Models of Network Formation Networked Life NETS 112 Fall 2013 Prof. Michael Kearns.
Information Retrieval Lecture 8 Introduction to Information Retrieval (Manning et al. 2007) Chapter 19 For the MSc Computer Science Programme Dell Zhang.
The influence of search engines on preferential attachment Dan Li CS3150 Spring 2006.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
CS 206 Introduction to Computer Science II 11 / 11 / Veterans Day Instructor: Michael Eckmann.
Social Networks 101 P ROF. J ASON H ARTLINE AND P ROF. N ICOLE I MMORLICA.
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
Common Properties of Real Networks. Erdős-Rényi Random Graphs.
Link Analysis, PageRank and Search Engines on the Web
The Web as Network Networked Life CSE 112 Spring 2006 Prof. Michael Kearns.
Network Structure and Web Search Networked Life CIS 112 Spring 2010 Prof. Michael Kearns.
How is this going to make us 100K Applications of Graph Theory.
Network Science and the Web: A Case Study Networked Life CIS 112 Spring 2009 Prof. Michael Kearns.
News and Notes, 2/24 Homework 2 due at the start of Thursday’s class New required readings: –“Micromotives and Macrobehavior”, chapters 1, 3 and 4 –Watts,
HCC class lecture 22 comments John Canny 4/13/05.
Chapter 8 Web Structure Mining Part-1 1. Web Structure Mining Deals mainly with discovering the model underlying the link structure of the web Deals with.
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα
WEB SCIENCE: ANALYZING THE WEB. Graph Terminology Graph ~ a structure of nodes/vertices connected by edges The edges may be directed or undirected Distance.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 8: Information Retrieval II Aidan Hogan
MapReduce and Graph Data Chapter 5 Based on slides from Jimmy Lin’s lecture slides ( (licensed.
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
CS315 – Link Analysis Three generations of Search Engines Anchor text Link analysis for ranking Pagerank HITS.
Web Intelligence Web Communities and Dissemination of Information and Culture on the www.
Some Analysis of Coloring Experiments and Intro to Competitive Contagion Assignment Prof. Michael Kearns Networked Life NETS 112 Fall 2014.
Lectures 6 & 7 Centrality Measures Lectures 6 & 7 Centrality Measures February 2, 2009 Monojit Choudhury
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
Structural Properties of Networks: Introduction Networked Life NETS 112 Fall 2015 Prof. Michael Kearns.
The College of Saint Rose CSC 460 / CIS 560 – Search and Information Retrieval David Goldschmidt, Ph.D. from Search Engines: Information Retrieval in Practice,
Ch 14. Link Analysis Padmini Srinivasan Computer Science Department
1. 2 CIShell Features A framework for easy integration of new and existing algorithms written in any programming language. CIShell Sci2 Tool NWB Tool.
Web Intelligence Complex Networks I This is a lecture for week 6 of `Web Intelligence Example networks in this lecture come from a fabulous site of Mark.
Link Analysis Rong Jin. Web Structure  Web is a graph Each web site correspond to a node A link from one site to another site forms a directed edge 
Convergence of PageRank and HITS Algorithms Victor Boyarshinov Eric Anderson 12/5/02.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture IX: 2014/05/05.
How Do “Real” Networks Look?
Information Retrieval and Web Search Link analysis Instructor: Rada Mihalcea (Note: This slide set was adapted from an IR course taught by Prof. Chris.
Informatics tools in network science
Ljiljana Rajačić. Page Rank Web as a directed graph  Nodes: Web pages  Edges: Hyperlinks 2 / 25 Ljiljana Rajačić.
Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.
Models of Web-Like Graphs: Integrated Approach
A Sublinear Time Algorithm for PageRank Computations CHRISTIA N BORGS MICHAEL BRAUTBA R JENNIFER CHAYES SHANG- HUA TENG.
Graph Coloring: Background and Assignment Networked Life NETS 112 Fall 2014 Prof. Michael Kearns.
Mathematics of the Web Prof. Sara Billey University of Washington.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Structural Properties of Networks: Introduction
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2017 Lecture 7: Information Retrieval II Aidan Hogan
How Do “Real” Networks Look?
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2018 Lecture 7 Information Retrieval: Ranking Aidan Hogan
Structural Properties of Networks: Introduction
Generative Model To Construct Blog and Post Networks In Blogosphere
How Do “Real” Networks Look?
Centrality in Social Networks
How Do “Real” Networks Look?
Models of Network Formation
The likelihood of linking to a popular website is higher
Models of Network Formation
Models of Network Formation
How Do “Real” Networks Look?
CS 440 Database Management Systems
Models of Network Formation
Network Models Michael Goodrich Some slides adapted from:
Presentation transcript:

“Important” Vertices and the PageRank Algorithm Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

Lecture Roadmap Measures of vertex importance in a network Google’s PageRank algorithm

Important Vertices So far: have emphasized macroscopic aspects of network structure –degree distribution: all degrees –diameter: average over all vertex pairs –connectivity: giant component with most vertices –clustering: average over all vertices, compare to overall edge density Also interesting to identify “important” individuals within the network Some purely structural definitions of importance: –high degree –link between communities –betweeness centrality –Google’s PageRank algorithm

Background on Web Search Most important sources of information: words in query and on web pages For instance, on query “mountain biking”: –realize “bike” and “bicycle” are synonymous under context “mountain” –but “bike” and “motorcylce” are not –find documents with these words and their correlates (“Trek”, “trails”) Subject of the field of information retreival But many other “features” or “signals” may be useful for identifying good or useful sites: –font sizes –frequency of exclamation marks PageRank idea: use the link structure of the web

Directed Networks The web graph is a directed network: –page A can point/link to page B, without a reciprocal link –represent directed edges with arrows –each web page/site thus has “in-links” and “out-links” –in-degree = # of in-links; out-degree = # of out-links Q: What constitutes an “important” vertex in a directed network? –could just use in-degree; view directed links as referrals PageRank answer: an important page is pointed to by lots of other important pages This, of course, is circular… A F D B C E

The PageRank Algorithm Suppose p and q are pages where q  p Let R(q) be rank of q; let out(q) be out-degree of q Idea: q “distributes” its rank over its out-links Each out-link of q receives R(q)/out(q) So then p’s rank should be: A F D B C E

The PageRank Algorithm What guarantee do we have that all these equations are consistent? Idea: turn the equations into an update rule (algorithm) At each time step, pick some vertex p to update, and perform: Right-hand side R(q) are current/frozen values; R(p) is new rank of p Q: What if POINTS(p) is empty? A: “Random Surfer” Claim: Under broad conditions, this algorithm will converge: –at some point, updates no longer change any values –have found solution to all the R(p) equations Let’s experiment with this PageRank CalculatorCalculator

A DB C E

A F D B C E

A F D B C E G

Summary PageRank defines importance in a circular or self-referential fashion Circularity is broken with a simple algorithm that provably converges PageRank defines R(p) globally; more subtle than local properties of p