Centrality Spring 2012.


Why do we care?
- Diffusion (practices, information, disease)
- Structure, status, prestige
- Seeing, perspective, worldview
- Power as relational, constraints as relational
- Network location as dependent variable
- Explaining outcomes
- Supporting strategic “networking”

Example: 2-Step Flow of Communication*
A micro-macro link in communications theory. Lazarsfeld on mass media and voting (1940s): high-centrality nodes (opinion leaders) mediate broadcast information flow. Later (Lazarsfeld & Katz (1955)) formalized as the two-step flow of communication model: mass media messages are filtered through more-exposed central members of social groups.
*Remix of http://www.soc.umn.edu/~knoke/pages/SOC8412.htm

The Question: What Vertices Are Most Important?

Everyday Understandings
- Important = prominent
- Important = admired
- Important = linchpin
- Important = listened to
- Important = in the know
- Important = gate keeper
- Important = involved

Translations
Ordinary description: possible network interpretation
- prominent: vertex is “visible” to many other vertices
- admired: vertex is “chosen” by many other vertices
- listened to: vertex is “received” by many other vertices
- in the know: vertex is a short distance from many sources of information
- linchpin: vertex is an irreplaceable part
- gate keeper: vertex stands between one part of the graph and another
- involved: vertex is connected to many parts of the graph

Davis Southern Women Degree Centrality

Davis’ Southern Women - Centrality

A Simple Network [Figure: graph on vertices A to G with its adjacency matrix]

Degree centrality: $C_D(v_i) = k_i$, where $k_i$ is the degree of vertex $v_i$.
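Degree centrality is simply a count of each vertex's ties, so it can be sketched in a few lines. The edge list below is hypothetical: the slide's figure did not survive, and the edges were chosen only so that A ends up with degree 4, the value the deck reports for A.

```python
# Hypothetical undirected edge list (an assumption, not the slide's actual graph).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"), ("D", "E"), ("F", "G")]

def degree_centrality(edges):
    """Return {vertex: degree} for an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree

centrality = degree_centrality(edges)
for vertex in sorted(centrality):
    print(vertex, centrality[vertex])
```

In this sketch B, C, E and G all receive the same score of 1, which illustrates how degree alone can fail to separate vertices that sit in different positions.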

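Degree Centrality Can Fail to Differentiate [Figure: adjacency matrix for vertices A to G with a degree-centrality column; surviving values: $C_D(A) = 4$, and 1 for a later vertex]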

Degree Centrality Can Mislead

Closeness centrality: $C_c(v_i) = \frac{1}{\sum_{j=1}^{n} d(v_i, v_j)}$

Closeness Centrality
Closeness = 1 / total distance to other vertices
$C_c(v) = \frac{1}{\sum_{i=1}^{n} d(v, v_i)}$
$C_c(A) = \frac{1}{d_{AB} + d_{AC} + d_{AD} + d_{AE} + d_{AF} + d_{AG}}$
$C_c(A) = \frac{1}{1+1+1+2+1+2}$
$C_c(A) = \frac{1}{8} = 0.125$
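The worked example for vertex A can be reproduced with a breadth-first search. This is a minimal sketch: the adjacency list is an assumption, built only so that the distances from A are 1, 1, 1, 2, 1, 2 to B through G as in the slide, and the function names are illustrative. It also assumes the graph is connected.

```python
from collections import deque

# Hypothetical graph: edges chosen so distances from A match the slide's example.
graph = {
    "A": ["B", "C", "D", "F"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
    "F": ["A", "G"],
    "G": ["F"],
}

def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(graph, v):
    """1 / (total distance from v to all other vertices)."""
    dist = bfs_distances(graph, v)
    return 1 / sum(d for u, d in dist.items() if u != v)

print(closeness(graph, "A"))  # 1 / (1+1+1+2+1+2) = 0.125
```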

Compare Two Graphs
Compute the closeness centrality of vertex A in each graph:
$C_c(A) = \frac{1}{1+1+1} = \frac{1}{3} \approx 0.33$
$C_c(A) = \frac{1}{1+1+1+1+1} = \frac{1}{5} = 0.2$
What is the problem here? How would you fix it?

Normalization
Adjusting a formula to take into account things like graph size, usually by “mapping” values to 0…1 or -1…+1.
For closeness centrality:
$C'_c(v_i) = \frac{n-1}{\sum_{j=1}^{n} d(v_i, v_j)}$
where n is the number of vertices in the graph.

Compare Two Graphs
$C'_c(A) = \frac{3}{1+1+1} = 1$
$C'_c(A) = \frac{5}{1+1+1+1+1} = 1$
Intuitively, both blue vertices should have the same closeness centrality, since both are 1 step away from all other vertices.
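The two-graph comparison can be checked numerically. The sketch below assumes both graphs are stars with hub A (3 and 5 neighbours respectively), which is what the distance sums of 1+1+1 and 1+1+1+1+1 suggest; the helper names are illustrative.

```python
from collections import deque

def star(leaves):
    """Undirected star: hub 'A' joined directly to `leaves` other vertices."""
    names = [f"v{i}" for i in range(leaves)]
    graph = {"A": names}
    for name in names:
        graph[name] = ["A"]
    return graph

def total_distance(graph, source):
    """Sum of BFS distances from source to every other vertex (connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())

def closeness(graph, v, normalized=False):
    """Raw closeness 1/total, or normalized closeness (n-1)/total."""
    numerator = len(graph) - 1 if normalized else 1
    return numerator / total_distance(graph, v)

small, large = star(3), star(5)
print(closeness(small, "A"), closeness(large, "A"))
print(closeness(small, "A", normalized=True), closeness(large, "A", normalized=True))
```

The raw scores differ only because the graphs differ in size; multiplying by n-1 removes that dependence, so both hubs score 1.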

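Betweenness centrality: $C_b(v_i) = \sum_{s,t} \frac{n_{st}^{i}}{g_{st}}$, where $g_{st}$ is the number of shortest paths between s and t and $n_{st}^{i}$ is the number of those paths that pass through $v_i$.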

Betweenness Centrality
Fraction of shortest paths that include the vertex.
[Table over vertices A to G; surviving entries: 1,1 2,4 2,1 3,4]

Betweenness Centrality
Fraction of shortest paths that include the vertex.
Example: calculate the betweenness centrality of vertex A. For each of three vertex pairs, 1 shortest path of 4 goes through A:
$C_b(A) = \sum_{s,t} \frac{n_{st}^{A}}{g_{st}} = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 0.75$

Normalizing Betweenness
Should the middle vertices have the same $C_b$? Since the number of paths a vertex COULD be on is (n-1)(n-2)/2, we can use this as our denominator:
$C'_b = \frac{C_b}{(n-1)(n-2)/2}$
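Betweenness and its normalization can be sketched by counting shortest paths with breadth-first search. For each pair s, t the code adds the fraction of shortest s-t paths that pass through the vertex, then optionally divides by (n-1)(n-2)/2. The test graphs (a 3-vertex path, a 5-leaf star and a 4-cycle) are assumptions chosen for illustration, not the slide's figures, and this is a direct pairwise computation rather than an optimized algorithm.

```python
from collections import deque
from itertools import combinations

def bfs_counts(graph, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on some shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(graph, v, normalized=False):
    """Sum over pairs s,t (both != v) of the share of shortest paths through v."""
    info = {u: bfs_counts(graph, u) for u in graph}
    dist_v, sigma_v = info[v]
    total = 0.0
    others = [u for u in graph if u != v]
    for s, t in combinations(others, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue                     # unreachable pair contributes nothing
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    if normalized:
        n = len(graph)
        total /= (n - 1) * (n - 2) / 2
    return total

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"hub": ["v0", "v1", "v2", "v3", "v4"],
        **{f"v{i}": ["hub"] for i in range(5)}}
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

print(betweenness(path, "b"), betweenness(star, "hub"))
print(betweenness(path, "b", normalized=True), betweenness(star, "hub", normalized=True))
print(betweenness(square, "a"))
```

The middle vertex of the path and the hub of the star have very different raw scores but the same normalized score, since each lies on every path it could be on.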

Calculate $C_b(F)$ [Table over vertices A to G; surviving entries: 1, 4]

Vertex Centrality Comparison
Usually centrality metrics are positively correlated. When they are not, something interesting is going on.
- High degree, low closeness: ego is embedded in a cluster that is far from the rest of the network.
- High degree, low betweenness: ego's connections are redundant; communication bypasses him/her.
- High closeness, low degree: ego is tied to important or active alters.
- High closeness, low betweenness: there are probably multiple paths in the network; ego is near many people, but so are many others.
- High betweenness, low degree: ego's few ties are crucial for network flow.
- High betweenness, low closeness: very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.

Information Centrality
Betweenness only uses geodesic paths, but information can also flow on longer paths: sometimes we hear it through the grapevine. While betweenness focuses just on the geodesic, information centrality focuses on how information might flow through many different paths, weighted by strength of tie and distance. (Moody)

Information Centrality Chapter 2 Resistance Distance, Information Centrality, Node Vulnerability and Vibrations in Complex Networks by Ernesto Estrada and Naomichi Hatano

Diagrams by J Moody, Duke U.

Eigenvector centrality

Consider this Example The two red nodes have similar amounts of “local” centrality, but different amounts of “global” centrality.

Power/Eigenvector Centrality
Weakness of degree centrality: it counts your neighbors but not whether or not they count.
Basic idea: ego’s centrality is a function of neighbors’ centrality.
C(ego) = f(C(ego’s neighbors))

Algorithm
1. Assume all vertices have centrality C = 1.
2. Recalculate C by summing C of neighbors.
3. Repeat the process.
Each time we are “taking into account” the centralities of yet another “layer” of the vertices around us.
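The three steps above can be sketched directly. The graph here is a hypothetical four-vertex example (a triangle b, c, d with a pendant vertex a), not the one in the slides, and `iterate_centrality` is an illustrative name.

```python
# Hypothetical graph: triangle b-c-d with vertex a attached to b.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}

def iterate_centrality(graph, rounds):
    """Start every vertex at C = 1, then repeatedly replace each vertex's C
    with the sum of its neighbours' C values from the previous round."""
    c = {v: 1 for v in graph}
    for _ in range(rounds):
        c = {v: sum(c[w] for w in graph[v]) for v in graph}
    return c

for rounds in range(4):
    print(rounds, iterate_centrality(graph, rounds))
```

After one round every value equals the vertex's degree; later rounds fold in the neighbours' neighbours, and so on. The raw values keep growing, so in practice they are rescaled each round and only the relative sizes are compared.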

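[Figure sequence: vertex values on the example graph after successive rounds of the algorithm. Surviving labels, slide by slide: 1; 2 3 5 4; 6 7 13 10 18 9; 15 22 33 16 20 52 36 46 25; 40 58 126 52 72 139 94 209 92]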

Consider the xy coordinate plane, where a line from (0,0) to (x,y) is the vector (x,y). And consider the matrix [matrix not recoverable; surviving entries: 1, ½, 1]. What does this matrix “do” to the vector?

Matrix Multiplication as Distortion [Figure: two matrix-vector products joined by “BUT”; surviving entries: 1 ½ 1 ½ 1 and 1 ½ 1 1]

So, what is an Eigenvector?

Eigenvector Adjacency matrix redistributes vertex contents Some vector of contents is in equilibrium These are the eigenvector centralities
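The "equilibrium" idea can be illustrated with power iteration: multiply by the adjacency matrix, rescale, and repeat until the vector stops changing direction. This is a sketch under assumptions: the 4x4 matrix is a hypothetical graph (a triangle with one pendant vertex), rescaling by the largest entry is one common choice among several, and plain Python lists are used instead of a linear-algebra library.

```python
# Hypothetical adjacency matrix: vertex 0 is attached to vertex 1,
# and vertices 1, 2, 3 form a triangle.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def eigenvector_centrality(A, iterations=200):
    """Power iteration: returns (x, lam) with A x approximately equal to lam * x."""
    x = [1.0] * len(A)
    lam = 1.0
    for _ in range(iterations):
        y = matvec(A, x)
        lam = max(y)                 # rescale so the largest entry is 1
        x = [yi / lam for yi in y]
    return x, lam

x, lam = eigenvector_centrality(A)
print([round(xi, 3) for xi in x], round(lam, 3))
```

At equilibrium, redistributing the contents with A gives back the same vector up to the scale factor `lam`, which is what makes x an eigenvector of A.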

What is an Eigenvector? Consider a graph & its 5x5 adjacency matrix, A

And then consider a vector, x… a 5x1 vector of values, one for each vertex in the graph. In this case, we've used the degree centrality of each vertex.

What happens when… …we multiply the vector x by the matrix A? The result, of course, is another 5x1 vector.

Ax diffuses the vertex values
Look at the first element of the resulting vector. The 1s in the A matrix "pick up" the values of each vertex to which the first vertex is connected. The resulting value is the sum of the values of these vertices.
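A small numeric check of this "picking up" behaviour, using a hypothetical 5x5 adjacency matrix (the slide's own matrix did not survive) and the degree vector as x:

```python
# Hypothetical undirected graph on 5 vertices (an assumption for illustration).
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

# x holds one value per vertex; here, each vertex's degree (its row sum).
x = [sum(row) for row in A]

def matvec(A, x):
    """Each output entry sums the x values of that row's neighbours."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

Ax = matvec(A, x)
print(x)   # degrees
print(Ax)  # each entry: sum of the neighbours' degrees
```

The first vertex is connected to the second and third vertices, so the first entry of the product is x[1] + x[2].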

Intuitiveness Visible on Rearrangement

Eigenvector vs. Power (Bonacich)

Centrality in Social Networks: Power / Eigenvalue
In recent work, Borgatti (2003; 2005) discusses centrality in terms of two key dimensions. Substantively, the key question for centrality is knowing what is flowing through the network. The key features are:
- Whether the actor retains the good to pass to others (information, diseases) or whether they pass the good and then lose it (physical objects)
- Whether the key factor for spread is distance (disease with low $p_{ij}$) or multiple sources (information)
The off-the-shelf measures do not always match the social process of interest, so researchers need to be mindful of this.

What Can We Study with Centrality?
- City systems
- Illegal networks
- Marketing targets
- Opinion formation/spread
- Epidemiology

Directed Networks: Centrality v. Prestige
An actor is considered central if her ties make her visible to others: visibility by direct ties AND by indirect ties through intermediaries. Many social and economic phenomena, such as access to and control over resources and brokerage of information, involve centrality, since simply participating in interaction is what counts.
But the number of ties alone does not determine importance, so we distinguish a second type of visibility: prestige.
- Takes into account the direction of the tie
- Generally, more incoming ties → more prestige
- And, more incoming ties from higher-prestige vertices → more prestige
Based on Aliseya Wright, HCC Spring 2005, Wednesday, April 13, 2005 http://www.cs.berkeley.edu/~jfc/hcc/courseSP05/lecs/lec22/Centrality.ppt

Google(PageRank): Overview
- Pre-computes a rank vector
- Provides a-priori (offline) importance estimates for all pages on the Web
- Independent of the search query
- In-degree ≈ prestige
- Not all votes are worth the same
- Prestige of a page is the sum of the prestige of citing pages: $p = E^T p$
- Pre-compute a query-independent prestige score
- Query time: prestige scores are used in conjunction with query-specific IR scores
Mining the Web, Chakrabarti and Ramakrishnan www.cse.iitb.ac.in/soumen/mining-the-web/slides1/social.ppt

Google(PageRank)
- Assumption: the prestige of a page is proportional to the sum of the prestige scores of pages linking to it
- Random surfer on a strongly connected web graph
- E is the adjacency matrix of the Web, with no parallel edges
- Matrix L is derived from E by normalizing all row-sums to one: $L[u,v] = \frac{E[u,v]}{\sum_{w} E[u,w]}$

The PageRank
After the i-th step: $p_i = L^T p_{i-1}$
Convergence to the stationary distribution of L: p → principal eigenvector of $L^T$, called the PageRank.
Convergence criteria:
- L is irreducible: there is a directed path from every node to every other node
- L is aperiodic: for all u and v, there are paths with all possible numbers of links on them, except for a finite set of path lengths
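The iteration can be sketched on a tiny web graph. Everything concrete here is an assumption for illustration: a hypothetical three-page web (page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0) that is strongly connected and aperiodic, so both convergence criteria hold. The sketch follows the slides' basic formulation, in which L is E with rows normalized to sum to one and p is repeatedly replaced by L-transpose times p; it includes no damping or teleportation term, since the slides do not describe one.

```python
# Hypothetical web: E[u][v] = 1 when page u links to page v.
E = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

def row_normalize(E):
    """L[u][v] = E[u][v] / out-degree of u (every page here has an out-link)."""
    return [[e / sum(row) for e in row] for row in E]

def pagerank(E, iterations=100):
    """Repeatedly apply p <- L^T p, starting from the uniform distribution."""
    L = row_normalize(E)
    n = len(E)
    p = [1.0 / n] * n
    for _ in range(iterations):
        # New score of v: sum over pages u of (u's score) * (share of u's links going to v).
        p = [sum(L[u][v] * p[u] for u in range(n)) for v in range(n)]
    return p

p = pagerank(E)
print([round(score, 4) for score in p])
```

For this graph the scores settle at 0.4, 0.2 and 0.4: page 1 receives only half of page 0's vote, while page 2 collects votes from both other pages and passes its whole score back to page 0.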