Centrality in Social Networks

Slides:

Advertisements

Similar presentations

Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale Network Theory: Computational Phenomena and Processes Social Network.

Advertisements

Network Matrix and Graph. Network Size Network size – a number of actors (nodes) in a network, usually denoted as k or n Size is critical for the structure.

CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.

Introduction to Network Theory: Modern Concepts, Algorithms

Analysis and Modeling of Social Networks Foudalis Ilias.

Relationship Mining Network Analysis Week 5 Video 5.

CS345 Data Mining Link Analysis Algorithms Page Rank Anand Rajaraman, Jeffrey D. Ullman.

Link Analysis: PageRank

CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.

DATA MINING LECTURE 12 Link Analysis Ranking Random walks.

1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 March 23, 2005

Algorithmic and Economic Aspects of Networks Nicole Immorlica.

Introduction to PageRank Algorithm and Programming Assignment 1 CSC4170 Web Intelligence and Social Computing Tutorial 4 Tutor: Tom Chao Zhou

1 CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.

CS246: Page Selection. Junghoo "John" Cho (UCLA Computer Science) 2 Page Selection Infinite # of pages on the Web – E.g., infinite pages from a calendar.

Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1:

Page Rank.  Intuition: solve the recursive equation: “a page is important if important pages link to it.”  Maximailly: importance = the principal eigenvector.

1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 April 2, 2006

15-853Page :Algorithms in the Real World Indexing and Searching III (well actually II) – Link Analysis – Near duplicate removal.

Centrality Measures These measure a nodes importance or prominence in the network. The more central a node is in a network the more significant it is to.

Link Analysis, PageRank and Search Engines on the Web

CS345 Data Mining Link Analysis Algorithms Page Rank Anand Rajaraman, Jeffrey D. Ullman.

Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very.

1 COMP4332 Web Data Thanks for Raymond Wong’s slides.

HCC class lecture 22 comments John Canny 4/13/05.

CS246 Link-Based Ranking. Problems of TFIDF Vector  Works well on small controlled corpus, but not on the Web  Top result for “American Airlines” query:

Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.

Google’s PageRank: The Math Behind the Search Engine Author:Rebecca S. Wills, 2006 Instructor: Dr. Yuan Presenter: Wayne.

The PageRank Citation Ranking: Bringing Order to the Web Presented by Aishwarya Rengamannan Instructor: Dr. Gautam Das.

Presentation: Random Walk Betweenness, J. Govorčin Laboratory for Data Technologies, Faculty of Information Studies, Novo mesto – September 22, 2011 Random.

MapReduce and Graph Data Chapter 5 Based on slides from Jimmy Lin’s lecture slides ( (licensed.

CS315 – Link Analysis Three generations of Search Engines Anchor text Link analysis for ranking Pagerank HITS.

Lectures 6 & 7 Centrality Measures Lectures 6 & 7 Centrality Measures February 2, 2009 Monojit Choudhury

COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.

Lecture 5: Mathematics of Networks (Cont) CS 790g: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.

Centrality Spring 2012.

Lecture 13: Network centrality Slides are modified from Lada Adamic.

PageRank. s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31.

Slides are modified from Lada Adamic

Understanding Google’s PageRank™ 1. Review: The Search Engine 2.

How to Analyse Social Network? Social networks can be represented by complex networks.

Informatics tools in network science

Link Analysis Algorithms Page Rank Slides from Stanford CS345, slightly modified.

Ljiljana Rajačić. Page Rank Web as a directed graph  Nodes: Web pages  Edges: Hyperlinks 2 / 25 Ljiljana Rajačić.

Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.

CS 440 Database Management Systems Web Data Management 1.

CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.

Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.

Web Mining Link Analysis Algorithms Page Rank. Ranking web pages  Web pages are not equally “important” v  Inlinks.

Motivation Modern search engines for the World Wide Web use methods that require solving huge problems. Our aim: to develop multiscale techniques that.

Search Engines and Link Analysis on the Web

Department of Computer and IT Engineering University of Kurdistan

Link-Based Ranking Seminar Social Media Mining University UC3M

Network analysis.

Community detection in graphs

I206: Lecture 15: Graphs Marti Hearst Spring 2012.

Centralities (2) Ralucca Gera,

Degree and Eigenvector Centrality

Network Science: A Short Introduction i3 Workshop

Cloud Computing Lecture #4 Graph Algorithms with MapReduce

Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"

Centralities (4) Ralucca Gera,

CS 440 Database Management Systems

Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"

Katz Centrality (directed graphs).

Junghoo “John” Cho UCLA

Graph and Link Mining.

Junghoo “John” Cho UCLA

Adjacency Matrices and PageRank

Detecting Important Nodes to Community Structure

Presentation transcript:

Centrality in Social Networks Kristina Lerman University of Southern California Access to data allows us to ask new questions, empirically measure effects CS 599: Social Media Analysis University of Southern California

Network analysis

4.2

Three views of social network analysis Freeman, L. 1979 “Centrality in Social Networks: Conceptual Clarification”, Social Networks 1, No. 3. Bonacich, P. 1987 “Power and Centrality in Social Networks: a Family of Measures”, American Journal of Sociology Franceschetti, M. 2011 “PageRank: standing on the shoulders of giants” Commun. ACM, Vol. 54, pp. 92-101.

Power in social networks Certain positions within the network give nodes more power Directly affect/influence others Control the flow of information Avoid control of others

Centrality in social networks Centrality encodes the relationship between structure and power in groups Certain positions within the network give nodes more power or importance How do we measure importance? Who can directly affect/influence others? Highest degree nodes are “in the thick of it” Who controls information flow? Nodes that fall on shortest paths between others can disrupt the flow of information between them Who can quickly inform most others? Nodes who are close to other nodes can quickly get information to them

Degree centrality The number of others a node is connected to Node with high degree has high potential communication activity 1 2 3 4 5 node In-degree Out-degree Total degree 1 2 3 5 4 1 2 3 4 5

Mathematical representation and operations on graphs Adjacency matrix A 1 2 3 4 5 1 2 3 4 5 1 1 2 3 4 5 node In-degree Out-degree Total degree 1 2 3 5 4 Out-degree: row sum In-degree: column sum

Betweenness centrality Number of shortest paths (geodesics) connecting all pairs of other nodes that pass through a given node Node with highest betweenness can potentially control or distort communication 1 2 3 4 5 12 123 124 1245 23 24 245 32 34 35 1 2 3 4 5 452 4523 45 52 523 524

Closeness centrality Node that is closest to other nodes can reach other nodes in shortest amount of time Can best avoid being controlled by others Closeness centrality is sum of geodesic distances from a node to all other nodes

Self-consistent measures of centrality Katz (1953): Katz score “not only on how many others a person is connected to, but who he is connected to” One’s status is determined by the status of the people s/he is connected to Bonacich (1972): Eigenvector centrality Node’s centrality is the sum of the centralities of its connections e is the eigenvector of A, and l its associated eigenvalue (largest eigenvalue)

Alpha-centrality (Bonacich, 1987) Similar to eigenvector centrality, but the degree to which a node centrality contributes to the centralities of other nodes depends on parameter a. Mathematical interpretation ci(a) is the expected number of paths activated directly or indirectly by a node i

A closer look at Alpha-Centrality 1 2 3 4 5 Alpha-Centrality matrix 1st term: number of paths of length 1 (edges) between i and j Contribution of this term to ci(a) is SjAij 1

A closer look at Alpha-Centrality 1 2 3 4 5 Alpha-Centrality matrix 2nd term: number of paths of length 2 between i and j 1 1 1 2 x =

A closer look at Alpha-Centrality 1 2 3 4 5 Alpha-Centrality matrix 3rd term: number of paths of length 3 between i and j 1 2 1 1 2 x =

A closer look at Alpha-Centrality Alpha-Centrality matrix Number of paths of diverges as length of the path k grows To keep the infinite sum finite, a < 1/l1, where l1 is the largest eigenvalue of A (also called radius of centrality) Interpretation: Node’s centrality is the sum of paths of any length connecting it to other nodes, exponentially attenuated by length of the path, so that longer paths contribute less than shorter paths

Radius of centrality Parameter a sets the length scale of communication or interactions. For a = 0, only local interactions (with neighbors) are considered Only local structure is important centrality is same as degree centrality

Radius of centrality Parameter a sets the length scale of communication or interactions. As a grows, the length of interaction grows Global structure becomes more important Centrality depends on node’s position within a larger structure, e.g., a community

Radius of centrality Parameter a sets the length scale of communication or interactions. As a  1/l1, length of interactions becomes infinite Global structure is important Centrality is same as eigenvector centrality

Normalized Alpha-Centrality [Ghosh & Lerman 2011] Alpha-Centrality diverges for a > 1/l1 Solution: Normalized Alpha-Centrality Holds for

Multi-scale analysis with Alpha-Centrality Parameter a allows for multi-scale analysis of networks Differentiate between local and global structures Study how rankings change with a Leaders: high influence on group members Nodes with high centrality for small values of a Bridges: mediate communication between groups Nodes with low centrality for small values of a But high centrality for large values of a Peripherals: poorly connected to everyone Nodes with low centrality for any value of a

Karate club network [Zachary, 1977] instructor administrator

Ranking karate club members Centrality scores of nodes vs. a Betweenness centrality of 17 is zero, as expected, but 25 and 26 have scores similar to 31. 23

Florentine families in 15th century Italy

Ranking Florentine families

Summary Network position confers advantages or disadvantages to a node, but how you measure it depends on what you mean by advantage Ability to directly reach many nodes  degree centrality Ability to control information  betweenness centrality Ability to avoid control  closeness centrality Self-consistent definitions of centrality Node’s centrality depends on centrality of those it is connected to, directly or indirectly, but contribution of distant nodes is attenuated by how far they are Attenuation parameter sets the length scale of interactions Can probe structure at different scales by varying this parameter

PageRank: Standing on the Shoulders of Giants [Franceschet] Key insights Analyzes the structure of the web of hyperlinks to determine importance score of web pages A web page is important if it is pointed to by other important pages An algorithm with deep mathematical roots Random walks Social network theory

PageRank and the Random Surfer Starts at arbitrary page I H F B C E L M D A G

PageRank and the Random Surfer Starts at arbitrary page Bounces from page to page by following links randomly I H F B C E L M D A G

PageRank and the Random Surfer Starts at arbitrary page Bounces from page to page by following links randomly PageRank score of a web page is the relative number of time it is visited by the Random Surfer I H F B C E L M D A G

Mathematics of PageRank PageRank is a solution to a random walk on a graph Adjacency matrix of the graph A I H L M G E F B C D A A B C D E F G H I L M 1

Mathematics of PageRank PageRank is a solution to a random walk on a graph I H L M G E F B C D A (Diagonal) Out-degree matrix D A B C D E F G H I L M 1 2 3

Mathematics of PageRank PageRank is a solution to a random walk on a graph hij is probability to go from node i to node j hij=1/di  H=D-1A I H L M G E F B C D A A B C D E F G H I L M 1 .5 .3

Mathematics of PageRank PageRank of page j is defined recursively as pj=Sipihij Or in matrix form p=pH What contributes to PageRank score? Number of links page j receives Cf B and D Number of outgoing links of linking pages Cf E’s effect on F and B’s effect on C PageRank scores of linking pages Cf E and B I H L M G E F B C D A

… but there are problems Random Surfer gets trapped by dangling nodes! (no outlinks) Solution: matrix S replace zero rows in H with u=[0.9,0.9, …, 0.9] From dangling node, surfer jumps to any other node .09 1 .5 .3 I H L M G E F B C D A

Still problems Random Surfer gets trapped in buckets Reachable strongly connected component without outlinks Solution: teleportation matrix E Matrix of u .09 I H L M G E F B C D A

… finally Google matrix G = aS + (1-a) E Interpretation of G Where a is the damping factor Interpretation of G With probability a, Random Surfer follows a hyperlink from a page (selected at random) With probability 1-a, Random Surfer jumps to any page (e.g., by entering a new URL in the browser) PageRank scores are the solution of self-consistent equation p =pG =apS + (1-a)u

PageRank scores I 1.6 H 1.6 F 3.9 B 38.4 C 34.3 E 8.1 L 1.6 M 1.6 D 3.3 G 1.6

Summary Recursive (or self-consistent) nature of PageRank has roots in social network analysis metrics PageRank is fundamentally related to random walks on graphs Lots of research to compute it efficiently Huge economic and social impact!