Lecture 10 Measures and Metrics.



Network Growth Patterns: Network Segmentation, Graph Densification, Diameter Shrinkage

1. Network Segmentation. In evolving networks, segmentation often takes place: the large network decomposes over time into three parts. Giant component: as network connections stabilize, a giant component forms, containing a large proportion of the network's nodes and edges. Stars: isolated parts of the network that form star structures; a star is a tree with one internal node and n leaves. Singletons: orphan nodes disconnected from all other nodes in the network.
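A minimal Python sketch of this idea, assuming networkx; the helper name `segment` and the exact classification tests are illustrative choices, not part of the lecture.

```python
# Illustrative sketch: split one snapshot of an evolving network into a giant
# component, stars, singletons, and everything else, following the
# definitions on this slide.
import networkx as nx

def segment(G):
    components = [G.subgraph(c) for c in nx.connected_components(G)]
    giant = max(components, key=len)                 # largest connected component
    stars, singletons, other = [], [], []
    for comp in components:
        if comp is giant:
            continue
        n = comp.number_of_nodes()
        if n == 1:
            singletons.append(comp)                  # orphan node
        elif comp.number_of_edges() == n - 1 and max(d for _, d in comp.degree()) == n - 1:
            stars.append(comp)                       # tree with one internal node and n-1 leaves
        else:
            other.append(comp)
    return giant, stars, singletons, other
```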

2. Graph Densification. In many evolving networks the number of edges grows superlinearly in the number of nodes, following a densification power law: $e(t) \propto n(t)^{\alpha}$ with $1 < \alpha < 2$, where $e(t)$ and $n(t)$ are the numbers of edges and nodes at time $t$ (Leskovec et al., KDD 2005).

Densification in Real Networks. [Figure: log-log plots of edge count versus node count V(t); the densification exponents are 1.69 for physics citations and 1.66 for patent citations. Source: Leskovec et al., KDD 2005]
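A short numpy sketch of how such an exponent can be estimated with a least-squares fit on a log-log scale; the node/edge counts below are made-up illustrative data, not the data behind these plots.

```python
# Illustrative sketch: estimate the densification exponent alpha in
# e(t) ~ n(t)^alpha by fitting a line to log(e) versus log(n).
import numpy as np

n = np.array([1_000, 2_000, 4_000, 8_000, 16_000])        # nodes over time (made-up)
e = np.array([3_200, 10_300, 33_000, 106_000, 340_000])   # edges over time (made-up)

alpha, log_c = np.polyfit(np.log(n), np.log(e), 1)        # slope = alpha
print(f"densification exponent alpha ~ {alpha:.2f}")
```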

3. Diameter Shrinkage. In many real networks, the diameter shrinks over time. [Figures: diameter over time for the ArXiv citation graph and an affiliation network]

Community Evolution Communities also expand, shrink, or dissolve in dynamic networks

Evaluating the Communities. Two settings: (1) evaluation with ground truth; (2) evaluation without ground truth.

Evaluation with Ground Truth. When ground truth is available, we have partial knowledge of what the communities should look like, or we are given the correct community (clustering) assignments. Measures: precision and recall (or F-measure), purity, and normalized mutual information (NMI). [Venn-diagram figure: TP is the intersection of the two ovals; TN is the rectangle minus the two ovals; FP is the circle minus the blue part; FN is the blue oval minus the circle.] Accuracy = (TP+TN)/(TP+TN+FP+FN); Precision = TP/(TP+FP); Recall = TP/(TP+FN).

Precision and Recall. True positive (TP): similar members are assigned to the same community (a correct decision). True negative (TN): dissimilar members are assigned to different communities (a correct decision). False negative (FN): similar members are assigned to different communities (an incorrect decision). False positive (FP): dissimilar members are assigned to the same community (an incorrect decision).

Precision and Recall: Example. TP + FP = C(6,2) + C(8,2) = 15 + 28 = 43, the number of pairs placed in the same cluster. FP counts the dissimilar pairs within each cluster; FN counts the similar pairs wrongly put into different clusters; TN counts the dissimilar pairs placed in different clusters.
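A pair-counting sketch of these quantities in Python, under the common convention that every pair of points placed in the same community counts as a positive; the label vectors reuse the NMI example that appears later in this lecture.

```python
# Illustrative sketch: pair-counting TP/FP/FN/TN for a clustering against
# ground-truth labels, then precision, recall and F-measure.
from itertools import combinations

def pair_counts(found, truth):
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(found)), 2):
        same_found = found[i] == found[j]
        same_truth = truth[i] == truth[j]
        if same_found and same_truth:
            tp += 1
        elif same_found:
            fp += 1
        elif same_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

found = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
truth = [2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
tp, fp, fn, tn = pair_counts(found, truth)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(tp + fp)                        # C(6,2) + C(8,2) = 43, as on the slide
print(precision, recall, f_measure)
```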

F-Measure. The harmonic mean of precision and recall: $F = \frac{2 \cdot P \cdot R}{P + R}$.

Purity. We can assume the majority of a community represents the community, and use the majority label against the label of each member to evaluate the communities. Purity is the fraction of instances whose labels equal their community's majority label: $\text{Purity} = \frac{1}{N} \sum_{k} \max_{j} |c_k \cap l_j|$, where $N$ is the total number of data points, $c_k$ is the set of members of community $k$, and $l_j$ is the set of instances carrying label $j$. Purity can easily be tampered with, either by making points singleton communities (of size 1) or by forming very large communities.
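A short purity sketch following the majority-label definition above; the `purity` helper is an illustrative name, and the labels reuse the later NMI example.

```python
# Illustrative sketch: purity of detected communities against ground-truth
# labels, using the majority label of each community.
from collections import Counter

def purity(found, truth):
    n = len(found)
    majority_total = 0
    for community in set(found):
        member_labels = [truth[i] for i in range(n) if found[i] == community]
        majority_total += Counter(member_labels).most_common(1)[0][1]
    return majority_total / n

found = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
truth = [2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
print(purity(found, truth))           # (5 + 6) / 14, about 0.79
```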

Mutual Information. Mutual information (MI) is the amount of information two random variables share: knowing one of the variables reduces uncertainty about the other, and MI measures the amount of that reduction: $I(X;Y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$.

Normalizing Mutual Information (NMI). MI is normalized so that it lies between 0 and 1 and can be compared across clusterings; a common choice divides by the geometric mean of the entropies of the two labelings: $NMI(H, L) = \frac{I(H;L)}{\sqrt{H(H)\, H(L)}}$, where $H$ denotes the found communities and $L$ the ground-truth labels.

Normalized Mutual Information. In terms of counts, with $n_{h,l}$ the number of members of found community $h$ that carry label $l$, $n_h$ and $n_l$ the community and label sizes, and $n$ the total number of data points: $NMI = \frac{\sum_{h}\sum_{l} n_{h,l} \log \frac{n\, n_{h,l}}{n_h n_l}}{\sqrt{\left(\sum_{h} n_h \log \frac{n_h}{n}\right)\left(\sum_{l} n_l \log \frac{n_l}{n}\right)}}$.

Normalized Mutual Information. NMI values close to one indicate high similarity between the communities found and the labels; values close to zero indicate high dissimilarity between them.

Normalized Mutual Information: Example. Found communities (H): [1,1,1,1,1,1, 2,2,2,2,2,2,2,2]; actual labels (L): [2,1,1,1,1,1, 2,2,2,2,2,2,1,1]. Community sizes: n_{h=1} = 6, n_{h=2} = 8; label sizes: n_{l=1} = 7, n_{l=2} = 7. Contingency counts n_{h,l}: n_{1,1} = 5, n_{1,2} = 1, n_{2,1} = 2, n_{2,2} = 6.
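A numpy sketch that computes NMI for this example from the contingency table, assuming the geometric-mean normalization given above.

```python
# Illustrative sketch: NMI for the example on this slide.
import numpy as np

H = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])   # found communities
L = np.array([2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1])   # actual labels
n = len(H)

hs, ls = np.unique(H), np.unique(L)
n_hl = np.array([[np.sum((H == h) & (L == l)) for l in ls] for h in hs])
n_h, n_l = n_hl.sum(axis=1), n_hl.sum(axis=0)               # community / label sizes

mi = sum(n_hl[i, j] / n * np.log(n * n_hl[i, j] / (n_h[i] * n_l[j]))
         for i in range(len(hs)) for j in range(len(ls)) if n_hl[i, j] > 0)
ent_h = -sum(p * np.log(p) for p in n_h / n)                # entropy of found communities
ent_l = -sum(p * np.log(p) for p in n_l / n)                # entropy of labels
print(mi / np.sqrt(ent_h * ent_l))                          # about 0.26 for this example
```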

Evaluation without Ground Truth. Evaluation with semantics: a simple way of analyzing detected communities is to examine other attributes of community members (posts, profile information, generated content, etc.) and check whether there is coherency among them. Coherency is often judged by human subjects, sometimes recruited through labor markets such as Amazon Mechanical Turk. Word frequencies can help: by generating a list of frequent keywords for each community, human subjects can decide whether the keywords represent a coherent topic. Evaluation using clustering quality measures: use measures such as SSE, or run two or more community detection algorithms, compare the results, and pick the algorithm with the better quality measure.

Cocitation and Bibliographic Coupling. The cocitation of two vertices i and j is the number of vertices that have outgoing edges to both: $C_{ij} = \sum_{k=1}^{n} A_{ik} A_{jk} = \sum_{k=1}^{n} A_{ik} A^{T}_{kj}$, i.e. $C = A A^{T}$. Bibliographic coupling is the number of vertices to which both i and j point: $B_{ij} = \sum_{k=1}^{n} A_{ki} A_{kj} = \sum_{k=1}^{n} A^{T}_{ik} A_{kj}$, i.e. $B = A^{T} A$.
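A numpy sketch of both matrices, under the adjacency convention implied by the formulas above (A[i, k] = 1 when there is an edge from k to i); the small matrix is made-up.

```python
# Illustrative sketch: cocitation C = A A^T and bibliographic coupling B = A^T A.
import numpy as np

A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 1, 0, 0]])

C = A @ A.T    # C[i, j]: number of vertices with outgoing edges to both i and j
B = A.T @ A    # B[i, j]: number of vertices that both i and j point to
print(C)
print(B)
```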

Independent Paths. Edge-independent paths share no common edge; vertex-independent paths share no common vertex except the start and end vertices. Vertex independence implies edge independence. Independent paths are also called disjoint paths, and a maximal set of them is not necessarily unique. The connectivity of a pair of vertices is the maximal number of independent paths between them; it is used to identify bottlenecks and resiliency to failures.

Cut Sets and Maximum Flow. A minimum cut set is the smallest cut set that disconnects a specified pair of vertices; it need not be unique. Menger's theorem: if there is no cut set of size less than n between a pair of vertices, then there are at least n independent paths between those vertices. This implies that the size of the minimum cut set equals the maximum number of independent paths, for both edge and vertex independence. The maximum flow between a pair of vertices is the number of edge-independent paths times the edge capacity (assuming equal edge capacities).
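A networkx sketch illustrating Menger's theorem on a small made-up graph: the built-in connectivity functions return the numbers of vertex- and edge-independent paths between the two endpoints.

```python
# Illustrative sketch: vertex and edge connectivity between two nodes on a
# 6-cycle, which has exactly two independent paths between opposite nodes.
import networkx as nx

G = nx.cycle_graph(6)
print(nx.node_connectivity(G, 0, 3))   # vertex-independent paths: 2
print(nx.edge_connectivity(G, 0, 3))   # edge-independent paths = size of the minimum cut set: 2
```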

Transitivity. A relation ∘ is said to be transitive if a ∘ b and b ∘ c together imply a ∘ c. Perfect transitivity in a network produces cliques; partial transitivity means that if u knows v and v knows w, then u is likely to know w. $C = \frac{\text{closed paths of length two}}{\text{paths of length two}} = \frac{3 \times \text{triangles}}{\text{connected triples}}$.

Clustering Coefficient and Triples. A triple is an ordered set of three nodes connected by two edges (open triple) or three edges (closed triple). A triangle can miss any of its three edges, so a triangle contains 3 triples. For example, $v_i v_j v_k$ and $v_j v_k v_i$ are different triples: they have the same members, but the first is missing edge $e(v_k, v_i)$ while the second is missing edge $e(v_i, v_j)$. In contrast, $v_i v_j v_k$ and $v_k v_j v_i$ are the same triple.

[Global] Clustering Coefficient. The clustering coefficient measures transitivity in undirected graphs: count paths of length two and check whether the third edge exists, $C = \frac{\text{closed paths of length two}}{\text{paths of length two}}$. When counting triangles instead, note that every triangle contains 6 closed paths of length two, so we can rewrite it as $C = \frac{6 \times \text{triangles}}{\text{paths of length two}} = \frac{3 \times \text{triangles}}{\text{connected triples}}$.
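A networkx sketch computing the global clustering coefficient both from the triangles/triples formula above and with the built-in `nx.transitivity`; the karate-club graph is used purely as a convenient example.

```python
# Illustrative sketch: global clustering coefficient two ways.
import networkx as nx

G = nx.karate_club_graph()                              # convenient example graph

triangles = sum(nx.triangles(G).values()) / 3           # each triangle is counted at its 3 vertices
triples = sum(d * (d - 1) / 2 for _, d in G.degree())   # connected triples
print(3 * triangles / triples)
print(nx.transitivity(G))                               # same value
```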

[Global] Clustering Coefficient: Example

Local Clustering Coefficient. The local clustering coefficient measures transitivity at the node level and is commonly employed for undirected graphs. It computes how strongly the neighbors of a node $v_i$ (nodes adjacent to $v_i$) are themselves connected: $C_i = \frac{\text{number of edges among neighbors of } v_i}{\text{number of pairs of neighbors of } v_i}$. In an undirected graph, the denominator can be rewritten as $\binom{k_i}{2} = \frac{k_i (k_i - 1)}{2}$, where $k_i$ is the degree of $v_i$.

Local Clustering Coefficient: Example. Thin lines depict connections to neighbors; dashed lines are the missing connections among neighbors; solid lines indicate connected neighbors. When all neighbors are connected, $C_i = 1$; when none of the neighbors are connected, $C_i = 0$.
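A networkx sketch of local clustering coefficients and their average (the Watts–Strogatz $C_{WS}$ of the next slide) on a tiny made-up graph.

```python
# Illustrative sketch: local clustering coefficients and their average.
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3)])
print(nx.clustering(G))           # node 1: neighbors {2, 3, 4}, one of three pairs connected -> 1/3
print(nx.average_clustering(G))   # mean of the local coefficients (C_WS)
```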

Structural Metrics: Clustering coefficient

Local Clustering and Redundancy. $C_i = \frac{\text{connected pairs of neighbors of } i}{\text{pairs of neighbors of } i}$, and the average (Watts–Strogatz) clustering coefficient is $C_{WS} = \frac{1}{n} \sum_{i=1}^{n} C_i$. The redundancy $R_i$ is the mean number of connections from a neighbor of $i$ to other neighbors of $i$; it relates to the local clustering coefficient by $C_i = \frac{R_i}{k_i - 1}$, i.e. $R_i = C_i (k_i - 1)$.

Reciprocity. How likely is it that a node you point to also points back to you? $r = \frac{1}{m} \sum_{ij} A_{ij} A_{ji} = \frac{1}{m} \operatorname{Tr} A^2$.
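A sketch computing $\frac{1}{m}\operatorname{Tr} A^2$ with numpy and comparing it with networkx's built-in `nx.reciprocity` on a made-up directed graph.

```python
# Illustrative sketch: reciprocity as Tr(A^2) / m.
import numpy as np
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 2)])
A = nx.to_numpy_array(G)
m = G.number_of_edges()
print(np.trace(A @ A) / m)     # 2 reciprocated edges out of 4 -> 0.5
print(nx.reciprocity(G))       # same value
```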

Reciprocity: “If you become my friend, I'll be yours.” Reciprocity is a simplified version of transitivity: it considers closed loops of length 2. If node $v$ is connected to node $u$, then $u$ exhibits reciprocity by connecting back to $v$.

Reciprocity: Example. Reciprocal nodes: v1, v2.

Signed Edges and Structural Balance. Edges are labeled as friends or enemies. The friend of my friend is my friend; the enemy of my enemy is my friend. Structural balance: every loop must contain an even number of negative links. A structurally balanced network can be partitioned into groups where links within groups are positive and links between groups are negative.

Social Balance Theory. Consistency in friend/foe relationships among individuals: informally, friend/foe relationships are consistent when they follow the rules above. In the network, positive edges denote friendship ($w_{ij} = 1$) and negative edges denote being enemies ($w_{ij} = -1$), where $w_{ij}$ denotes the value of the edge between nodes $i$ and $j$. A triangle of nodes $i$, $j$, and $k$ is balanced if and only if $w_{ij}\, w_{jk}\, w_{ki} \geq 0$, i.e. the product of its edge signs is positive.

Social Balance Theory: Possible Combinations. For any cycle, if the product of the edge values is positive, then the cycle is socially balanced.
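A tiny sketch of this rule, multiplying the edge signs of a cycle; the helper name `is_balanced` is an illustrative choice.

```python
# Illustrative sketch: a signed cycle is balanced iff the product of its
# edge signs (+1 friend, -1 foe) is positive.
import math

def is_balanced(cycle_signs):
    return math.prod(cycle_signs) > 0

print(is_balanced([+1, +1, +1]))   # all friends -> balanced
print(is_balanced([+1, -1, -1]))   # enemy of my enemy is my friend -> balanced
print(is_balanced([+1, +1, -1]))   # two friendships plus one enmity -> unbalanced
```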

Similarity. Structural equivalence: two vertices are similar if they share many of the same neighbors. Jaccard similarity: $\sigma_{ij} = \frac{n_{ij}}{|N(i) \cup N(j)|}$, where $n_{ij}$ is the number of common neighbors of $i$ and $j$. Cosine similarity: $\sigma_{ij} = \frac{n_{ij}}{\sqrt{k_i k_j}}$. Pearson coefficient $r_{ij}$: given the degrees of the two nodes, compares the number of common neighbors with the number expected by chance. Euclidean distance: $d_{ij} = \sum_{k} (A_{ik} - A_{jk})^2$. Regular equivalence: two vertices are similar if their neighbors are themselves similar. Katz-style similarity: $\sigma_{ij} = \alpha \sum_{kl} A_{ik} A_{jl} \sigma_{kl}$, which in simplified matrix form gives $\boldsymbol{\sigma} = \alpha A \boldsymbol{\sigma} + I$.
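A sketch of the two structural-equivalence measures (Jaccard and cosine of neighborhoods) on a made-up graph; the helper functions are illustrative.

```python
# Illustrative sketch: neighborhood Jaccard and cosine similarity.
import math
import networkx as nx

G = nx.Graph([(1, 3), (1, 4), (2, 3), (2, 4), (2, 5)])

def jaccard(G, i, j):
    ni, nj = set(G[i]), set(G[j])
    return len(ni & nj) / len(ni | nj)

def cosine(G, i, j):
    ni, nj = set(G[i]), set(G[j])
    return len(ni & nj) / math.sqrt(len(ni) * len(nj))

print(jaccard(G, 1, 2), cosine(G, 1, 2))   # neighborhoods {3, 4} and {3, 4, 5}
```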

Homophily and Assortative Mixing. Assortativity: the tendency to be linked with nodes that are similar in some way. Humans: age, race, nationality, language, income, education level, etc. Citations: papers tend to cite papers from similar fields. Web pages: language. Disassortativity: the tendency to be linked with nodes that are different in some way; for example, network providers connect mostly to end users rather than to other providers. Assortative mixing can be based on an enumerative characteristic or a scalar characteristic.

Assortativity: An Example. The friendship network in a US high school in 1994. [Figure: colors represent races — white: whites, grey: blacks, light grey: Hispanics, black: others.] There is high assortativity between individuals of the same race.

Assortativity Significance. Significance is the difference between the measured assortativity and the expected assortativity: the higher this difference, the more significant the observed assortativity. Example: in a school where half the population is white and the other half is Hispanic, we expect 50% of connections to be between members of different races. If all connections are between members of different races, that is a significant finding.

Modularity (enumerative). Modularity measures the extent to which nodes connect to like nodes in the network: it is positive if there are more edges between nodes of the same type than expected by chance, and negative otherwise. $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$, where $\delta(c_i, c_j)$ is 1 if nodes $i$ and $j$ are of the same type ($c_i = c_j$) and 0 otherwise. Equivalently, $Q = \sum_{r} \left( e_{rr} - a_r^2 \right)$, where $e_{rr}$ is the fraction of edges that join vertices of type $r$ to vertices of type $r$, and $a_r$ is the fraction of edge ends attached to vertices of type $r$.
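A sketch computing $Q$ for a made-up two-type graph, both directly from the formula above and with networkx's built-in community modularity function.

```python
# Illustrative sketch: modularity of a node partition, two ways.
import numpy as np
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
types = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}

A = nx.to_numpy_array(G)
k = A.sum(axis=1)                       # degrees
m = G.number_of_edges()
nodes = list(G.nodes())
Q = sum((A[i, j] - k[i] * k[j] / (2 * m)) * (types[nodes[i]] == types[nodes[j]])
        for i in range(len(nodes)) for j in range(len(nodes))) / (2 * m)
print(Q)
print(nx.algorithms.community.modularity(G, [{0, 1, 2}, {3, 4, 5}]))   # same value
```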

Assortative coefficient (enumerative). Modularity is almost always less than 1, so it can be normalized by its maximum value $Q_{max}$: $r = \frac{Q}{Q_{max}} = \frac{\sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)}{2m - \sum_{ij} \frac{k_i k_j}{2m} \delta(c_i, c_j)}$.

Assortative coefficient (scalar). $r = \frac{\sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j}{\sum_{ij} \left( k_i \delta_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j}$. r = 1: perfectly assortative; r = −1: perfectly disassortative; r = 0: non-assortative. Usually the node degree is used as the scalar attribute: $r = \frac{\sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) k_i k_j}{\sum_{ij} \left( k_i \delta_{ij} - \frac{k_i k_j}{2m} \right) k_i k_j}$.
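A one-liner sketch of degree assortativity (the scalar coefficient with $x_i = k_i$) via networkx's built-in, on a standard example graph.

```python
# Illustrative sketch: degree assortativity coefficient.
import networkx as nx

G = nx.karate_club_graph()
print(nx.degree_assortativity_coefficient(G))   # negative: hubs connect mostly to low-degree nodes
```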

Modularity: Matrix Form. Let $\Delta \in \{0, 1\}^{n \times k}$ denote the indicator matrix ($\Delta_{ir} = 1$ if node $i$ is of type $r$) and let $k$ denote the number of types. The Kronecker delta function can be reformulated using the indicator matrix: $\delta(c_i, c_j) = \sum_{r} \Delta_{ir} \Delta_{jr} = (\Delta \Delta^T)_{ij}$. Therefore, $Q = \frac{1}{2m} \operatorname{Tr}\!\left( \Delta^T \left( A - \frac{d d^T}{2m} \right) \Delta \right)$.

Normalized Modularity: Matrix Form. Let the modularity matrix be $B = A - \frac{d d^T}{2m}$, where $d \in \mathbb{R}^{n \times 1}$ is the degree vector. Modularity can then be reformulated as $Q = \frac{1}{2m} \operatorname{Tr}(\Delta^T B \Delta)$.
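The same toy modularity example as in the earlier sketch, now computed via the matrix form $Q = \frac{1}{2m}\operatorname{Tr}(\Delta^T B \Delta)$.

```python
# Illustrative sketch: modularity via the modularity matrix B = A - d d^T / (2m).
import numpy as np
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
A = nx.to_numpy_array(G)
d = A.sum(axis=1, keepdims=True)        # degree column vector
m = G.number_of_edges()

Delta = np.zeros((6, 2))                # indicator matrix: one column per type
Delta[[0, 1, 2], 0] = 1
Delta[[3, 4, 5], 1] = 1

B = A - d @ d.T / (2 * m)               # modularity matrix
print(np.trace(Delta.T @ B @ Delta) / (2 * m))   # same Q as the earlier sketch
```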

Modularity Example The number of edges between nodes of the same color is less than the expected number of edges between them

Assortativity Coefficient of Various Networks. [Table of assortativity coefficients for various networks. Source: M.E.J. Newman, “Assortative mixing in networks”]

Measuring Assortativity for Ordinal Attributes. A common measure for analyzing the relationship between ordinal values is covariance, which describes how two variables change together. In our case we have a network, and we are interested in how the values assigned to nodes that are connected (via edges) are correlated.

Covariance Variables. The value assigned to node $v_i$ is $x_i$. We construct two variables, $X_L$ and $X_R$: for any edge $(v_i, v_j)$, we assume that $x_i$ is observed from variable $X_L$ and $x_j$ is observed from variable $X_R$. $X_L$ represents the ordinal values associated with the left node (the first node) of each edge, and $X_R$ represents the values associated with the right node (the second node). We need to compute the covariance between $X_L$ and $X_R$.

Covariance Variables: Example. List of edges: (A, C), (C, A), (C, B), (B, C). $X_L$ = (18, 21, 21, 20); $X_R$ = (21, 18, 20, 21).

Covariance. For the two column variables $X_L$ and $X_R$, the covariance is $\sigma(X_L, X_R) = E\big[(X_L - E(X_L))(X_R - E(X_R))\big] = E(X_L X_R) - E(X_L) E(X_R)$, where $E(X_L)$ is the mean of the variable and $E(X_L X_R)$ is the mean of the element-wise product of $X_L$ and $X_R$.

Covariance

Normalizing Covariance. The Pearson correlation $\rho(X_L, X_R)$ is the normalized version of the covariance: $\rho(X_L, X_R) = \frac{\sigma(X_L, X_R)}{\sigma(X_L)\, \sigma(X_R)}$, where $\sigma(X)^2 = E\big[(X - E(X))^2\big]$ is the variance.
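A numpy sketch computing the covariance and Pearson correlation for the edge variables from the example a few slides back ($X_L$ = (18, 21, 21, 20), $X_R$ = (21, 18, 20, 21)).

```python
# Illustrative sketch: covariance and Pearson correlation of the edge variables.
import numpy as np

XL = np.array([18, 21, 21, 20])
XR = np.array([21, 18, 20, 21])

cov = np.mean(XL * XR) - np.mean(XL) * np.mean(XR)   # E(XL * XR) - E(XL) E(XR)
rho = cov / (XL.std() * XR.std())                    # normalize by the standard deviations
print(cov, rho)                                      # -1.0 and about -0.67
```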

Correlation Example