COVERTNESS CENTRALITY IN NETWORKS
Michael Ovelgönne, UMIACS, University of Maryland
Chanhyun Kang, Anshul Sawant, Computer Science Dept., University of Maryland
VS Subrahmanian, UMIACS & Computer Science Dept., University of Maryland

Motivation
Suppose there is a criminal network in which some henchmen are known, and we want to use them to find the leader of the group. Who is the gang leader of this network? We may want to use centrality measures to identify the important criminals in the network.
[Figure: the criminal network with the henchmen marked]

Motivation
[Figures: the network scored by closeness centrality and by betweenness centrality]
We could take the vertex that stands out as most suspicious under these measures to be the leader of this network. But what if the leader is smart and understands (or knows) the measures?

Motivation
If the leader is sufficiently smart, he may:
- Hide in a crowd of similar actors
- Have enough connections with the henchmen
[Figure annotations: the gang leader would not be like this vertex; the gang leader would be like these vertices]

Motivation
Typically, if we plot centrality values against the percentage of nodes in a graph G, the distribution obeys a power law and has a long tail (closeness centrality is an exception). A vertex that wants to stay "hidden" does not want to stick out in the long tail; it would prefer to sit squarely in the "high percentage" part of the distribution. But to communicate with its own subnetwork with a lower probability of discovery, it needs to be further to the right.
[Figure: distribution of centrality values (x-axis: centrality value, y-axis: % of nodes), marking the part where nodes that want to stay "unnoticed" want to be and the part they want to avoid]

Motivation
[Figures: the network scored by betweenness, eigenvector, degree, and closeness centrality]
But a smart leader may know various centrality measures, so we need to consider a set C of centrality measures to identify the smart leader.

In this paper
- Propose the covertness centrality measure, which has two major components:
  - How "common" a vertex is with regard to a set C of centrality measures
  - How well the vertex can "communicate" with a user-specified set I of vertices
- Develop algorithms to compute covertness centrality (exact and heuristic algorithms)
- Evaluate the measures and the algorithms

Commonness
Measures how well an actor a hides in a crowd of similar actors. CM(C, a) denotes the commonness of an actor a with respect to the given centrality measures C = (C_1, C_2, …, C_k).
[Diagram: betweenness, eigenvector, degree, and closeness centrality feed into CM(C, a), which yields the commonness value of actor a]

Commonness
Instead of giving specific commonness functions, we first identify axioms that all commonness measures should satisfy.
Axioms for Commonness
- Property 1 (Optimal Hiding). If all vertices have the same centrality according to all measures, then all vertices should have a commonness of 1.
- Property 2 (No Hiding). If the centrality of v is sufficiently different from the centrality of all other vertices according to all centrality measures, then v's commonness is 0.
- Property 3. If a centrality measure takes the same value for all vertices, then removing that measure should leave the commonness values of all vertices unchanged.

We suggest two measures to compute CM(C, a):
- CM_1(C, a): determine the actors similar to actor a for each centrality measure separately
- CM_2(C, a): determine the actors similar to actor a with respect to all centrality measures simultaneously

Commonness : Similar actors
We consider the actors similar to actor a w.r.t. one centrality measure C_i: those whose C_i value lies within the interval I_i = [C_i(a) - α σ_i, C_i(a) + α σ_i].
- σ_i : standard deviation of the C_i values
- α : the range of similar values
The probability that a randomly chosen actor, excluding the actor a, has a C_i value within the interval I_i is the fraction of the other |V| - 1 actors whose C_i values lie in I_i.
[Figure: axis of C_i centrality values from low to high, with the interval I_i around C_i(a) and the actors similar to actor a for centrality C_i marked]
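The per-measure similarity probability described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `similar_fraction`, the toy degree values, and the use of the population standard deviation (`statistics.pstdev`) are all assumptions.

```python
from statistics import pstdev


def similar_fraction(values, a, alpha=1.0):
    """Fraction of actors other than `a` whose centrality value lies in
    the interval [values[a] - alpha*sigma, values[a] + alpha*sigma]."""
    # Assumption: sigma is the population standard deviation of all values.
    sigma = pstdev(values.values())
    lo, hi = values[a] - alpha * sigma, values[a] + alpha * sigma
    others = [v for v in values if v != a]
    inside = sum(1 for v in others if lo <= values[v] <= hi)
    return inside / len(others)


# Hypothetical degree centrality values for four actors.
degree = {"a": 2, "b": 2, "c": 3, "d": 9}
p_a = similar_fraction(degree, "a")  # b and c fall inside a's interval
p_d = similar_fraction(degree, "d")  # nobody is close to the outlier d
```

In this toy example the outlier d has no similar actors, while a is similar to two of the three other actors.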

Commonness : CM_1
Define commonness as the sum of the squared distances, computed separately for each centrality measure.
- We compute the probability for each centrality measure
- k : the number of centrality measures in C
- Why not a simple summation of the probabilities? Because even if the sums of the probabilities of two actors are the same, the commonness value of the actor whose probabilities deviate less should be larger than the other's value.
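The exact CM_1 formula is not preserved in this transcript. The sketch below uses one assumed form that matches the description above: one minus the root-mean-square distance of the per-measure probabilities from 1. The function name `cm1_sketch` and the example probabilities are hypothetical, and the slides' actual formula may differ.

```python
from math import sqrt


def cm1_sketch(probs):
    """Assumed aggregation: 1 - sqrt(mean((1 - p_i)^2)) over the k
    per-measure probabilities. Not confirmed by the source."""
    k = len(probs)
    return 1.0 - sqrt(sum((1.0 - p) ** 2 for p in probs) / k)


# Two actors whose probabilities sum to the same value (1.0).
even = cm1_sketch([0.5, 0.5])    # probabilities do not deviate
skewed = cm1_sketch([1.0, 0.0])  # probabilities deviate strongly
```

Under this assumed form, the actor with the evenly spread probabilities scores higher than the skewed one even though both sums are equal, which is the behavior a simple summation cannot provide.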

Commonness : CM_1
- Satisfies Property 1 (Optimal Hiding). If the centrality values of all actors are the same, the number of similar actors is |V| - 1, so the commonness value of every vertex is 1.
- Satisfies Property 2 (No Hiding). If the centrality values of all actors are not similar to each other, the number of similar actors is 0, so the commonness value of every vertex is 0.
- Does not satisfy Property 3. Let's assume C = {C_1, C_2}, the number of actors similar to actor v for C_1 is r, and the number of actors similar to actor v for C_2 is |V| - 1.

Commonness : CM_1
We compute the CM_1 values (α = 1) using the betweenness, closeness, degree, and eigenvector centrality measures for the criminal network. We can find some suspicious people who hide in a crowd, but the picture is not clear. There is a problem.

Commonness : CM_1
If the centrality measures are very different, measuring the similar actors independently for each centrality measure can lead to problems.
[Figure: distributions of the C_1 and C_2 centralities (x-axis: normalized centrality value, y-axis: % of nodes). Some vertices have good commonness values even if their number of similar actors for C_2 is small; others have good commonness values even if their number of similar actors for C_1 is small.]

Commonness : Similar actors
We can also consider the actors similar to a given actor a using all given centrality measures C simultaneously.
- σ_i, σ_j : standard deviations of the C_i values and the C_j values
- α : the range of similar values
[Figure: plane of C_i centrality values (horizontal) and C_j centrality values (vertical). The actors similar to actor a lie inside both the interval I_i = [C_i(a) - α σ_i, C_i(a) + α σ_i] and the interval I_j = [C_j(a) - α σ_j, C_j(a) + α σ_j].]

Commonness : CM_2
Define commonness as the fraction of all actors that are similar to actor a in all considered dimensions.
- The centrality values of similar actors lie within all the intervals generated from the centrality values of actor a
- We compute the probability that a randomly chosen actor, excluding the actor a, has centrality values within all of these intervals
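A minimal Python sketch of CM_2 as described on this slide: the fraction of the other actors that fall inside every interval around actor a. The function name `cm2`, the dictionary layout, the toy values, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import pstdev


def cm2(centralities, a, alpha=1.0):
    """centralities maps a measure name to a dict {vertex: value}.
    Returns the fraction of actors other than `a` that lie inside
    a's interval for every measure at once."""
    intervals = {}
    for name, values in centralities.items():
        sigma = pstdev(values.values())  # assumption: population std. dev.
        intervals[name] = (values[a] - alpha * sigma,
                           values[a] + alpha * sigma)
    vertices = list(next(iter(centralities.values())))
    others = [v for v in vertices if v != a]
    similar = sum(
        1 for v in others
        if all(lo <= centralities[name][v] <= hi
               for name, (lo, hi) in intervals.items())
    )
    return similar / len(others)


# Hypothetical values for two measures on four actors.
measures = {
    "degree": {"a": 2, "b": 2, "c": 3, "d": 9},
    "closeness": {"a": 0.5, "b": 0.9, "c": 0.5, "d": 0.6},
}
cm2_a = cm2(measures, "a")  # only c is similar to a in both dimensions

# Adding a measure that is constant over all vertices changes nothing.
with_constant = dict(measures, constant={v: 1.0 for v in "abcd"})
cm2_a_constant = cm2(with_constant, "a")
```

The last two lines illustrate Property 3 on this toy input: a measure with identical values for every vertex puts all actors inside its interval, so the CM_2 value stays the same.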

Commonness : CM_2
Even if the centrality measures are not correlated, the vertices will have small commonness values.
[Figure: distributions of the C_1 and C_2 centralities (x-axis: normalized centrality value, y-axis: % of nodes), with vertices a and b marked]

Commonness : CM_2
- Satisfies Property 1 (Optimal Hiding). If the centrality values of all actors are the same, the number of similar actors is |V| - 1, so the commonness value of every vertex is 1.
- Satisfies Property 2 (No Hiding). If the centrality values of all actors are not similar to each other, the number of similar actors is 0, so the commonness value of every vertex is 0.
- Satisfies Property 3. Let's assume C = {C_1, C_2}, the interval of actor v for C_1 is I_1, and the values of C_2 are the same for all vertices. Then the intervals of all vertices for C_2 are the same, so the number of actors similar for C_1 alone equals the number of actors similar for both C_1 and C_2.

Commonness : CM_2
We compute the CM_2 values (α = 1) using the betweenness, closeness, degree, and eigenvector centrality measures for the criminal network. Now we can clearly find some suspicious people who hide in a crowd.

Communication Potential
The gang leader has enough connections to communicate with the henchmen to achieve their objective. To measure this communication ability precisely, we need to use a subgraph of the criminal network induced by some of its vertices.
[Figures: the network G and a subgraph of G built from the henchmen]

Communication Potential
We compute CP_1 values using closeness centrality. We can find some people who have good communication ability in the subgraph that contains the henchmen.
[Figures: the network G and the CP_1 values on a subgraph of G built from the henchmen]

Communication Potential
We compute CP_1 values using betweenness centrality. Some people have better communication ability than others in the subgraph that contains the henchmen.
[Figure: the CP_1 values on a subgraph of G built from the henchmen]

Communication Potential
We compute CP_2 values using closeness centrality. We can find some people who have good communication ability in the network.
[Figure: the CP_2 values on the network G]
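The difference between scoring a vertex on a henchmen subgraph and on the whole network can be sketched as follows. Several things here are assumptions, because the defining slide did not survive in this transcript: the subgraph is taken to be the one induced by the henchmen plus the candidate vertex, harmonic closeness is swapped in for plain closeness so that unreachable vertices are handled, and the names `cp1`, `cp2`, and the toy graph are hypothetical.

```python
from collections import deque


def induced_subgraph(adj, keep):
    """Adjacency lists restricted to the vertices in `keep`."""
    keep = set(keep)
    return {v: [w for w in adj[v] if w in keep] for v in keep}


def harmonic_closeness(adj, source):
    """Sum of 1/distance to every reachable vertex, divided by n - 1.
    This is a variant of closeness, not the classic definition."""
    if len(adj) < 2:
        return 0.0
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search for shortest path lengths
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(1.0 / d for d in dist.values() if d > 0) / (len(adj) - 1)


def cp1(adj, a, interest):
    # Assumption: score `a` on the subgraph induced by `interest` plus `a`.
    return harmonic_closeness(induced_subgraph(adj, set(interest) | {a}), a)


def cp2(adj, a):
    # Score `a` on the whole network.
    return harmonic_closeness(adj, a)


# Toy network: x links both henchmen, y only knows h1.
graph = {"h1": ["x", "y"], "h2": ["x"], "x": ["h1", "h2"], "y": ["h1"]}
henchmen = {"h1", "h2"}
cp1_x, cp1_y = cp1(graph, "x", henchmen), cp1(graph, "y", henchmen)
cp2_x, cp2_y = cp2(graph, "x"), cp2(graph, "y")
```

In the toy network, x reaches both henchmen directly while y loses its path to h2 once x is left out of the subgraph, so the subgraph-based score separates the two vertices more sharply than the whole-network score.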

Covertness Centrality
Covertness centrality is a combination of commonness and communication potential. Let's assume CP is normalized to the interval [0,1], like CM.
- λ measures the importance of commonness vs. the importance of communication potential
- τ is a minimum level of commonness set by the user
- If CM < τ, CP is irrelevant to CC
- If τ = 0, CC is a classic trade-off between CM and CP
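The combining formula itself is missing from this transcript, so the following is only a sketch of one form that is consistent with the four bullets above: a λ-weighted sum of CM and CP, with the CP term dropped when CM falls below τ. The function name `covertness_sketch` and the behavior below the threshold are assumptions; the slides' actual definition may differ (for example, it could return 0 below τ).

```python
def covertness_sketch(cm, cp, lam=0.5, tau=0.0):
    """Assumed combination of commonness (cm) and communication
    potential (cp), both taken to be normalized to [0, 1]."""
    if cm < tau:
        # Assumption: below the threshold only commonness contributes,
        # which makes CP irrelevant to the score.
        return lam * cm
    # With tau = 0 this is a plain weighted trade-off between CM and CP.
    return lam * cm + (1.0 - lam) * cp


balanced = covertness_sketch(0.8, 0.4)                # lam = 0.5, tau = 0
below_a = covertness_sketch(0.1, 0.9, tau=0.3)        # CM under threshold
below_b = covertness_sketch(0.1, 0.0, tau=0.3)        # same CM, other CP
```

The two below-threshold calls return the same value regardless of CP, mirroring the statement that CP is irrelevant to CC when CM < τ.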

Covertness Centrality
Who is the gang leader of this network? We compute CC values (λ = 0.5 and τ = 0) using CM_2 (α = 1) and CP_1 (closeness centrality). The most suspicious person, and the likely leader, is the one who:
- Hides in a crowd of similar actors
- Has enough connections to communicate with others, including the henchmen
[Figure: the CC values on the network]

Covertness Centrality
CC values (τ = 0) for varying λ, using CM_2 (α = 1) and CP_2 (closeness centrality):
- The CC values of vertices that have a high CP value decrease as λ increases
- The CC values of vertices that have a high CM value increase as λ increases
[Figures: the CC values on the network for several values of λ, starting at λ = 0.2]

CC COMPUTATION
- Exact computation
- Simple random sampling method: the sample vertices are chosen at random
- Systematic sampling method: order all vertices by degree, then select k vertices by taking every (n/k)-th vertex, starting from a start vertex selected at random among the first n/k vertices
[Figure: vertices ordered from high degree to low degree, showing the first n/k vertices, the start vertex, and every (n/k)-th vertex after it]
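The systematic sampling step can be sketched as below. This is an illustration under assumptions rather than the authors' code: the function name `systematic_sample`, the descending sort order, and the handling of non-integer n/k (fractional steps truncated to indices) are choices made here.

```python
import random


def systematic_sample(degree, k, rng=random):
    """Order vertices by degree, pick a random start among the first
    n/k vertices, then take every (n/k)-th vertex after it."""
    n = len(degree)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of vertices")
    # Assumption: highest degree first; ties keep insertion order.
    ordered = sorted(degree, key=degree.get, reverse=True)
    step = n / k
    start = rng.randrange(max(1, int(step)))
    return [ordered[int(start + i * step)] for i in range(k)]


# Hypothetical degrees for twelve vertices v0..v11.
degrees = {f"v{i}": i for i in range(12)}
sample = systematic_sample(degrees, 4, random.Random(0))
sample_degrees = [degrees[v] for v in sample]
```

Because the sample walks through the degree ordering at a fixed stride, it covers high-degree and low-degree vertices alike, which simple random sampling does not guarantee on a skewed degree distribution.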

Experimental Evaluation
We analyze the properties of the covertness centrality and of the algorithms.
- Datasets: URV, YouTube 40k (friendship network), and YouTube 60k (friendship network)
- Python is used for the CM_1 and CM_2 implementation
- Evaluated on a standard desktop machine

Evaluation : Measures
[Figure: scatter plot of the commonness scores according to CM_1 and CM_2 in relation to closeness centrality, computed from the degree, closeness, betweenness, and eigenvector centralities on the URV dataset]
CM_1 values are high because of the other centrality values.

Evaluation : Measures
[Figure: distribution of CC scores for different λ values; CM_2 with degree, betweenness, closeness, and eigenvector centrality; CP with closeness centrality; URV dataset]
- Commonness is strongly negatively correlated with the base centrality measures

Evaluation : Measures
[Figure: distribution of CC scores for different λ values; CM_2 with degree, betweenness, closeness, and eigenvector centrality; CP with closeness centrality; URV dataset]
- Covertness centrality is similar to the CP values when λ is small

Evaluation : Compute time & Accuracy
The runtime scales linearly with the number of vertices if the centrality values are already computed.

Dataset          URV           YouTube 40k   YouTube 60k
Computing time   0.1 second    2 seconds     3 seconds

[Figure: comparison of the rank correlation between the exact algorithm and the sampling algorithms for the URV dataset]
Very high correlation!

Evaluation : Accuracy
Accuracy of the sampling methods, measured with Kendall's τ rank correlation coefficient. Very high correlation!
[Figures: results for CM_1 and for CM_2, each with 100 runs for the simple sampling method]
- The systematic sampling method is better than the simple sampling method
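To make the accuracy measure concrete, here is a small pure-Python sketch of Kendall's τ in its simplest (tau-a) form, which counts concordant and discordant pairs between two score lists. The score values are made up for illustration, and the slides do not say which variant of the coefficient was used; in practice a library routine such as `scipy.stats.kendalltau` could be used instead.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical scores for five vertices: exact vs. sampling-based.
exact = [0.9, 0.7, 0.5, 0.3, 0.1]
sampled = [0.85, 0.75, 0.35, 0.45, 0.05]
tau = kendall_tau_a(exact, sampled)  # one of ten pairs is swapped
```

A value of 1 means the sampling-based scores rank the vertices exactly like the exact scores; each swapped pair lowers the coefficient.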

Conclusion
- Defined a new concept of covertness centrality combining:
  - Commonness: measures how well an actor hides in a crowd of similar actors w.r.t. a given set of centrality measures. Proposed axioms that any good commonness function should satisfy. Proposed two new commonness measures, CM_1 and CM_2, and showed that CM_2 satisfies all the axioms.
  - Communication Potential: measures the ability to communicate and cooperate to achieve a common objective
- Used sampling methods for computing the covertness centrality
- Evaluated the measure and the sampling methods on YouTube and URV data

Questions

Related works
- R. Lindelauf, P. Borm, and H. Hamers, "The influence of secrecy on the communication structure of covert networks," Social Networks, vol. 31, no. 2, 2009
  - Deals with the optimal communication structure of terrorist organizations when considering the trade-off between secrecy and operational efficiency
  - Determines the optimal communication structure that a covert network should adopt
- J. Baumes, M. Goldberg, M. Magdon-Ismail, and W. Wallace, "Discovering Hidden Groups in Communication Networks," in Intelligence and Security Informatics, 2004, vol. 3073
  - Suggests models and efficient algorithms for detecting hidden groups, that is, groups which attempt to hide their functionality
  - Uses the property that the communications of hidden groups are not random, because they are planned and coordinated
