

Communities & Roles
Two ways of identifying nodes that “go together”:
a) Communities/Groups
   a) Cohesive subgroups literature: start with Freeman
   b) Network operationalization
      a) Graph theoretic
      b) Heuristic algorithms
         a) Graph search & modularity
         b) Cluster analysis
         c) LDA/Principal components
      c) Fundamental limitations
b) Roles/Positions
   a) Literature grounded in structural anthropology & kinship
   b) Roles as relations imply paired sets
   c) Goal is to identify nodes with common patterns
      a) Original is CONCOR
      b) Alternatives based on triads, other clusterings

Social Sub-groups Lin Freeman: The sociological concept of “Group” Focus on collectivities that are: “Relatively small, informal, and involve close personal ties.” What we would call “Primary Groups” What (network) structure characterizes such a group? Goal: Identify (a) non-overlapping groups that allow one to (b) identify internal group structure.

Social Sub-groups Lin Freeman: The sociological concept of “Group” Winship’s Model: 1) Assign people to equivalence classes that are hierarchically nested:

Social Sub-groups Lin Freeman: The sociological concept of “Group” Winship’s Model: In words, this means that whatever metric you define, a person is closer to themselves than to anyone else, the relation is symmetric, and triads are transitive (which, given the symmetry condition, means that they are complete). You can then identify partitions by scaling the proximity so that these three conditions are met.

Social Sub-groups Lin Freeman: The sociological concept of “Group” Winship’s Model: [Figure: nodes A through K, labelled along both axes]

Social Sub-groups Lin Freeman: The sociological concept of “Group” Winship’s Model: [Figure: nested partition tree. The total set splits into {A-G} and {H-K}; {A-G} splits into {A-C} and {D-G}.]

Social Sub-groups Lin Freeman: The sociological concept of “Group” Granovetter’s Model: Proceed exactly as in Winship, but treat intransitivity differently when looking at strong or weak ties. If x and y are strongly connected, and y and z are strongly connected, then x and z should be at least weakly connected.
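The G-transitivity condition can be checked mechanically. The sketch below is only an illustration under assumptions: ties are undirected, they are stored as sets of pairs, and the function name and toy data are hypothetical rather than taken from Freeman or Granovetter.

```python
from itertools import permutations

def g_intransitive_triples(strong, weak, nodes):
    """List triples (x, y, z) where x-y and y-z are strong ties
    but x and z share no tie at all (neither strong nor weak)."""
    def tied(a, b, ties):
        # Ties are treated as undirected (an assumption of this sketch).
        return (a, b) in ties or (b, a) in ties

    violations = []
    for x, y, z in permutations(nodes, 3):
        # x < z reports each unordered pair of end points only once.
        if x < z and tied(x, y, strong) and tied(y, z, strong):
            if not (tied(x, z, strong) or tied(x, z, weak)):
                violations.append((x, y, z))
    return violations

# Hypothetical toy data.
nodes = ["a", "b", "c", "d"]
strong = {("a", "b"), ("b", "c")}
weak = {("a", "c"), ("c", "d")}

print(g_intransitive_triples(strong, weak, nodes))   # the weak a-c tie closes the triple
print(g_intransitive_triples(strong, set(), nodes))  # without it, (a, b, c) is a violation
```

A graph meets the condition when the returned list is empty.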

Social Sub-groups Lin Freeman: The sociological concept of “Group” Granovetter’s Model: An example of a graph fitting the prohibition against G-intransitive relations.

Social Sub-groups The Davis - “Old South” Example

Social Sub-groups The Davis - “Old South” Example: Ties > 2

Social Sub-groups The Davis - “Old South” Example: Ties > 3

Social Sub-groups The Davis - “Old South” Example: Ties > 4 Meets the G-transitivity condition

Social Sub-groups The Davis - “Old South” Example: Ties > 5 Stronger than the G-transitivity condition

Social Sub-groups Lin Freeman: The sociological concept of “Group” Freeman argues that the G-intransitivity model fits the data best for each of the 7 groups he studies. Substantively, the types of groups this model predicts are very similar to those predicted by the general transitivity model, except re-cast as a valued relation. Empirically, if you want to identify groups based on levels like this, you can use PAJEK and walk through the model just as we did with “Old South”, or you can use UCINET (or program it; it’s not hard).
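Walking through tie-strength levels, as in the “Old South” slides, amounts to dropping ties at or below a cutoff and looking at what stays connected. The following is a rough sketch of that idea only; the weights are invented for illustration (they are not the Davis data) and the function name is hypothetical.

```python
def components_above(weights, nodes, cutoff):
    """Connected components using only ties with weight > cutoff."""
    adj = {n: set() for n in nodes}
    for (a, b), w in weights.items():
        if w > cutoff:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first walk from start
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups

# Invented valued ties (e.g. number of shared events).
nodes = ["a", "b", "c", "d", "e"]
weights = {("a", "b"): 5, ("b", "c"): 4, ("c", "d"): 2, ("d", "e"): 6}

for cutoff in (1, 3, 4):
    print(cutoff, components_above(weights, nodes, cutoff))
```

Raising the cutoff splits the single component into progressively smaller groups, which is the nesting the level-based models rely on.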

Methods: How do we identify primary groups in a network?
A) Classic graph theoretical methods: cliques and extensions of cliques
- Cliques
- k-cores
- k-plexes
- Freeman (1992) models
- K-components (we talked about these already)
B) Algorithmic methods: search through a network trying to maximize for a particular pattern (i.e., like Frank & Yasumoto). Adjust the assignment of actors to groups until a particular pattern of ties (block diagonal, usually) is identified. Standard models:
- Factions (UCINET)
- KliqueFinder (Frank)
- RNM/CROWDS/JIGGLE (Moody)
- Principal component analysis (PCA)
- Flow models (MCL)
- Modularity maximization routines
- General distance & clustering methods

Methods: How do we identify primary groups in a network? Graph Theoretical Models. Start with a clique. A clique is defined as a maximal subgraph in which every member of the subgraph is connected to every other member of the subgraph. Cliques are collections of nodes where density = 1.0. Properties of cliques:
- Density: 1.0
- Everyone is connected to n-1 alters
- Distance between every pair is 1
- Ratio of within-group ties to between-group ties is infinite
- All triads are transitive
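To make the clique definition concrete, here is a minimal sketch that enumerates maximal cliques with the basic Bron-Kerbosch recursion (no pivoting) and confirms that density inside each one is 1.0. The graph is a made-up example, and the function names are illustrative, not those of any package mentioned in these slides.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Basic Bron-Kerbosch; adj maps each node to its set of neighbours."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))   # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)

def density(adj, members):
    """Share of possible undirected ties among members that are present."""
    pairs = list(combinations(members, 2))
    present = sum(1 for a, b in pairs if b in adj[a])
    return present / len(pairs)

# Hypothetical graph: a triangle a-b-c with a pendant tie c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

cliques = maximal_cliques(adj)
print(cliques)
print([density(adj, c) for c in cliques])
```

Even in this tiny graph the two cliques overlap at node c, which hints at why complete cliques are awkward to use as groups.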

Methods: How do we identify primary groups in a network? Graph Theoretical Models. In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in their size. Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See the Moody & White paper on cohesion for a discussion of many of these attempts.

Methods: How do we identify primary groups in a network? Graph Theoretical Models. k-cores: Every person is connected to at least k other people in the group. Ideally, they would look something like this (here two 3-cores). However, adding a single tie from A to B would make the whole graph a 3-core.
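The k-core idea, and the fragility noted on this slide, can be sketched in a few lines. The example below assumes two separate four-person cliques (each a 3-core) and then adds one bridging tie; the node labels and helper names are hypothetical.

```python
from itertools import combinations

def k_core(adj, k):
    """Nodes left after repeatedly removing anyone with fewer than k remaining ties."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(alive):
            if len(adj[n] & alive) < k:
                alive.discard(n)
                changed = True
    return alive

def components(adj, nodes):
    """Connected components of the subgraph induced by nodes."""
    nodes = set(nodes)
    seen, out = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend((adj[n] & nodes) - comp)
        seen |= comp
        out.append(sorted(comp))
    return out

def two_cliques():
    """Two disjoint 4-cliques: nodes 1-4 and 5-8."""
    adj = {n: set() for n in range(1, 9)}
    for group in ([1, 2, 3, 4], [5, 6, 7, 8]):
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

adj = two_cliques()
print(components(adj, k_core(adj, 3)))   # two separate 3-cores

adj[4].add(5)
adj[5].add(4)                            # one extra tie between the groups
print(components(adj, k_core(adj, 3)))   # now a single connected 3-core
```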

Methods: How do we identify primary groups in a network? Graph Theoretical Models. Extensions of this idea include:
- k-plex: every member is connected to at least n-k other people in the group (recall that in a clique everyone is connected to n-1, so this relaxes that condition).
- n-clique: every pair of members is connected by a path of length n or less (recall that in a clique the distance = 1).
- n-clan: same as an n-clique, but all paths must be inside the group.
I’ve never had much luck with any of these methods empirically. Real data is usually too messy to work well. You should try them, and gain some intuition for yourself. The place to start is in UCINET.
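As a small illustration of the k-plex relaxation, the check below tests whether a proposed group satisfies “at least n-k ties inside the group”, with n taken as the group size. It is a sketch on a made-up four-node cycle, and the function name is hypothetical.

```python
def is_k_plex(adj, members, k):
    """True if every member is tied to at least n - k others in the group."""
    members = set(members)
    n = len(members)
    return all(len(adj[v] & members) >= n - k for v in members)

# Hypothetical four-node cycle a-b-c-d-a: each node has 2 ties in the group.
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
group = ["a", "b", "c", "d"]

print(is_k_plex(adj, group, 1))  # k = 1 is the clique condition (n - 1 ties each)
print(is_k_plex(adj, group, 2))  # k = 2 tolerates one missing tie per member
```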

Methods: How do we identify primary groups in a network? UCINET will compute all of the best-known graph theoretic treatments for subgroups Graph Theoretical Models.

Methods: How do we identify primary groups in a network? Consider running different methods on a known group structure: Graph Theoretical Models.

Methods: How do we identify primary groups in a network? Graph Theoretical Models.

Methods: How do we identify primary groups in a network? Cliques Graph Theoretical Models.

Methods: How do we identify primary groups in a network? The only way to get something meaningful from this is to analyze the clique overlap matrix, which is what the “Clique by partion” dataset does, using cluster analysis Cliques

Methods: How do we identify primary groups in a network?
Heuristic strategies for identifying primary groups:
- Search: 1) Fit measure: identify a measure of groupness (usually a function of the number of ties that fall within groups compared to the number of ties that fall between groups). 2) Algorithm to maximize fit: once we have the index, we need a clever method for searching through the network to maximize the fit.
- Destroy: Break apart the network in strategic ways, removing the weakest parts first; what’s left are your primary groups. See “edge betweenness” and “MCL”.
- Evade: Don’t look directly; instead find a simpler problem that correlates. Examples: generalized cluster analysis, factor analysis, RNM.

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit. Segregation Index (Freeman, L. C. "Segregation in Social Networks." Sociological Methods and Research). Freeman asked how we could identify segregation in a social network. Theoretically, he argues, if a given attribute (group label) does not matter for social relations, then relations should be distributed randomly with respect to the attribute. Thus, the difference between the number of cross-group ties expected by chance and the number observed measures segregation.

Consider the (hypothetical) network below. There are two attributes in this network: people with Blue eyes and Brown eyes and people who are square or not (they must be hip). Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit
Segregation Index
Mixing matrix (eye color):
          Blue   Brown
Blue         6      17
Brown     [values missing]
Seg = [value missing]
Mixing matrix (hip/square):
          Hip   Square
Hip        20        3
Square      3       30
Seg = 0.78
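The formula behind the segregation value did not survive on this slide, so the following is a hedged reconstruction rather than Freeman's exact statement. It assumes Seg = (expected cross-group ties - observed cross-group ties) / expected cross-group ties, with the expected count taken from the row and column margins of the mixing matrix. Under that assumption the hip/square matrix gives the 0.78 reported above; the function name is illustrative.

```python
def segregation_index(mixing):
    """mixing[i][j] = number of ties sent from group i to group j.

    Assumption of this sketch: the chance expectation for each cell is
    row total * column total / grand total.
    """
    groups = range(len(mixing))
    total = sum(sum(row) for row in mixing)
    row_tot = [sum(row) for row in mixing]
    col_tot = [sum(col) for col in zip(*mixing)]
    observed = sum(mixing[i][j] for i in groups for j in groups if i != j)
    expected = sum(row_tot[i] * col_tot[j] / total
                   for i in groups for j in groups if i != j)
    return (expected - observed) / expected

hip_square = [[20, 3],
              [3, 30]]
print(round(segregation_index(hip_square), 2))  # 0.78
```

A value of 0 means cross-group ties occur as often as chance predicts; 1 means there are none at all.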

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit. Segregation Index: One problem with the segregation index is that it is not ‘margin free.’ That is, if you were to change the distribution of the category of interest (say race) by a constant but not the core association between race and friendship choice, you can get a different segregation level. One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit
Odds Ratios
The odds ratio tells us how much more likely people in the same group are to nominate each other. You calculate the odds ratio based on the number of ties in a group and their relative size, based on the following table:
Member of:     Same Group   Different Group
Friends            A              B
Not Friends        C              D
OR = AD/BC

Observed Odds Ratios
There are 6 hip people and 9 square people in this network. This implies the following numbers of possible ties in the network (diagonal = n_i(n_i - 1), off diagonal = n_i n_j):
          Hip   Square
Hip        30       54
Square     54       72
Cross-classifying possible ties by whether they are observed:
               Group
               Same   Dif
Friend  Yes      50     6
        No       52   102
OR = (50)(102) / (52)(6) = 16.35
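The arithmetic on this slide can be reproduced directly. The sketch below assumes directed friendship nominations (so possible ties within a group number n(n-1)) and takes the 6/9 group sizes and the hip/square mixing counts from the slides; the function name is illustrative.

```python
def same_group_odds_ratio(sizes, mixing):
    """Odds ratio OR = AD / BC for same-group versus cross-group ties.

    sizes[i]     = number of people in group i
    mixing[i][j] = observed ties from group i to group j (directed)
    """
    groups = range(len(sizes))
    same_possible = sum(n * (n - 1) for n in sizes)
    diff_possible = sum(sizes[i] * sizes[j]
                        for i in groups for j in groups if i != j)
    a = sum(mixing[i][i] for i in groups)                            # friends, same group
    b = sum(mixing[i][j] for i in groups for j in groups if i != j)  # friends, different group
    c = same_possible - a                                            # not friends, same group
    d = diff_possible - b                                            # not friends, different group
    return (a * d) / (b * c)

odds = same_group_odds_ratio([6, 9], [[20, 3], [3, 30]])
print(round(odds, 2))  # 16.35
```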

Complete Network Analysis. Network Connections: Social Subgroups. Segregation index compared to the odds ratio: r = .95 [Figure: friendship segregation index plotted against log(same-sex odds ratio)]

The second problem is that the Segregation index has no clear maximum – if every node is assigned to a single group the value can be higher than if everyone is assigned to the “right” group. -- it tends to have a monotonically changing score. This means you can’t just keep adjusting nodes until you see a best fit, but instead have to look for changes in fit. The modularity score solves this problem by re-organizing the expectation in a way that forces the value to 0 if everyone is in a single group. Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit. We can also measure the extent that ties fall within clusters with the modularity score: $M = \sum_s \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right]$ Where: s indexes clusters in the network; l_s is the number of lines in cluster s; d_s is the sum of the degrees of the nodes in s; L is the total number of lines. M has the advantage of going to 0 if there is only 1 group, which means maximizing the score is sensible.

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit. We can also measure the extent that ties fall within clusters with the modularity score: $Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$ Where: m is the number of edges; k is the degree; A_ij is the edge weight between i and j; δ(c_i, c_j) is 1 if i and j are in the same group (0 otherwise); γ is the resolution parameter. Q has the advantage of going to 0 if there is only 1 group, which means maximizing the score is sensible. Note that the resolution parameter means the number of groups is not truly “automatic”.
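A short sketch can confirm the property claimed here, that modularity is 0 when everyone is in one group. It uses the standard per-cluster form, the sum over clusters s of l_s/m - γ(d_s/2m)^2, which is algebraically the same as the pairwise sum with δ(c_i, c_j) for an undirected, unweighted graph. The example graph (two triangles joined by one tie) and the function name are assumptions made for illustration.

```python
def modularity(edges, membership, gamma=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    edges      = list of (a, b) pairs, each tie listed once
    membership = dict mapping node -> cluster label
    gamma      = resolution parameter (1.0 gives the usual score)
    """
    m = len(edges)
    degree, within = {}, {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        if membership[a] == membership[b]:
            s = membership[a]
            within[s] = within.get(s, 0) + 1      # l_s: lines inside cluster s
    degree_sum = {}
    for node, k in degree.items():
        s = membership[node]
        degree_sum[s] = degree_sum.get(s, 0) + k  # d_s: summed degree of cluster s
    return sum(within.get(s, 0) / m - gamma * (d / (2 * m)) ** 2
               for s, d in degree_sum.items())

# Two triangles joined by a single tie (3-4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {n: "all" for n in range(1, 7)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Raising gamma above 1 penalizes large clusters more heavily, which is why the number of groups found depends on that setting.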

Modularity Scores Comparison to Segregation Index – comparing values for known solutions Modularity Score Plotted against Segregation Index for various nets Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit [Figure: axes are number of groups and in-group density]

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit
- Louvain method (Blondel et al.) in PAJEK & R
- Factions in UCINET. Multiple options for the exact factor maximized. I recommend either the density or the correlation function, and I would calculate the distance in each case.
- Frank’s KliqueFinder
- Moody’s CROWDS / JIGGLE
- Generalized blockmodel in PAJEK
- igraph (R) has a couple of routines of this sort (Fast-Greedy is good)

Factions in UCI-NET Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Factions in UCI-NET

Reduced BlockMatrix Fit perfectly

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit. UCINET: The biggest drawbacks of FACTIONS are: a) it is slow; b) you have to specify the number of groups.

PAJEK – Generalized Blockmodel

Fits fine, but it’s slow!

R – “Fast Greedy” This is a direct optimization of Modularity

PAJEK – “Louvain” This is a direct optimization of Modularity

Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that. Cluster analysis: In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are ‘close’ to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other in some degree. A very good reference is the SAS/STAT manual section called “Introduction to clustering procedures.” (See also Wasserman and Faust, though the coverage is spotty.) We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.

Cluster analysis: Imagine a set of objects (say people) arrayed in a two dimensional space. You want to identify groups of people based on their position in that space. How do you do it? [Figure: scatter plot of people; axes are “How Cool you are” and “How Smart you are”]

Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that. Start by choosing a pair of people who are very close to each other (such as 15 & 16) and now treat that pair as one point, with a value equal to the mean position of the two nodes.

Now repeat that process for as long as possible. Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that

This process is captured in the cluster tree (called a dendrogram) Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that
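The merge-and-replace procedure described in these slides can be sketched as a simple agglomerative loop. This version is an assumption-laden illustration: it merges the two closest clusters by Euclidean distance between cluster means and weights the new mean by cluster size (centroid linkage), which may differ in detail from the linkage used to draw the original figures. The points are made up.

```python
import math

def centroid_clustering(points):
    """Repeatedly merge the two closest clusters.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (members_a, pos_a), (members_b, pos_b) = clusters[i], clusters[j]
        na, nb = len(members_a), len(members_b)
        # The merged cluster sits at the size-weighted mean position.
        centre = tuple((na * a + nb * b) / (na + nb) for a, b in zip(pos_a, pos_b))
        history.append((members_a, members_b, d))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((members_a + members_b, centre))
    return history

# Made-up positions: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (10, 0)]
for members_a, members_b, d in centroid_clustering(points):
    print(members_a, "+", members_b, "at distance", round(d, 2))
```

Cutting the history at a chosen distance gives a partition, which is the same operation as cutting the dendrogram at a given height.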

Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that
As with the network cluster algorithms, there are many options for clustering. The three that I use most are:
- Ward’s Minimum Variance: the one I use almost 95% of the time
- Average Distance: the one used in the example above
- Median Distance: very similar
Again, the SAS manual is the best single place I’ve found for information on each of these techniques.
Some things to keep in mind:
- Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.
- This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.

Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that
Cluster analysis
The data in this scatter plot are produced using this code:
data random;
  do i=1 to 20;
    x=rannor(0);
    y=rannor(0);
    output;
  end;
run;

Cluster analysis Resulting dendrogram Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that

Cluster analysis Resulting cluster solution

Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that. Cluster analysis works by building a distance matrix between each pair of points. In the example above, it used the Euclidean distance, which in two dimensions is simply the physical distance between the points in a plot. It can work on any number of dimensions. To use cluster analysis in a network, we base the distance on the path-distance between pairs of people in the network. Consider again the blue-eye hip example:
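A path-distance matrix of the kind described here can be built with one breadth-first search per node. This is a minimal sketch for an unweighted graph; the four-node chain is a made-up example (not the blue-eye network), the function name is hypothetical, and unreachable pairs are marked with None as a convention of this sketch.

```python
from collections import deque

def path_distance_matrix(adj):
    """Shortest path lengths between all pairs; adj maps node -> set of neighbours.

    Returns (nodes, matrix) with rows and columns in sorted node order.
    Unreachable pairs are None.
    """
    nodes = sorted(adj)
    matrix = []
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            n = queue.popleft()
            for neighbour in adj[n]:
                if neighbour not in dist:
                    dist[neighbour] = dist[n] + 1
                    queue.append(neighbour)
        matrix.append([dist.get(target) for target in nodes])
    return nodes, matrix

# Hypothetical chain a-b-c-d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
nodes, dmat = path_distance_matrix(adj)
for label, row in zip(nodes, dmat):
    print(label, row)
```

The resulting matrix can be handed to any clustering routine that accepts precomputed distances, once a finite value is chosen for unreachable pairs.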

Cluster analysis Distance Matrix Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that

The distance matrix implies a space that nodes are embedded within. Using something like MDS, we can represent the space implied by the distance matrix in two dimensions. This is the image of the network you would get if you did that. Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that

Cluster analysis When you use variables, the cluster analysis program generates a distance matrix. We can, instead use the network distance matrix directly. If we do that with this example network, we get the following:

Cluster analysis

In SAS you use two commands to get a cluster analysis. The first does the hierarchical clustering. The second analyzes the cluster output to create the tree.
Example 1. Using variables to define the space (like income and musical taste):
proc cluster data=a method=ave out=clustd std;
  var x y;
  id node;
run;
proc tree data=clustd ncl=5 out=cluvars;
run;

Cluster analysis
Example 2. Using a pre-defined distance matrix to define the space (as in a social network). You first create the distance matrix (in IML), then use it in the cluster program.
proc iml;
  %include 'c:\moody\sas\programs\modules\reach.mod';
  /* blue eye example */
  mat2=j(15,15,0);
  mat2[1,{ }]=1;
  /* lines cut here */
  mat2[15,{ }]=1;
  dmat=reach(mat2);
  mattrib dmat format=1.0;
  print dmat;
  id=1:nrow(dmat);
  id=id`;
  ddat=id||dmat;
  create ddat from ddat;   /* creates the dataset */
  append from ddat;
quit;
data ddat (type=dist);     /* tells SAS it is a distance matrix */
  set ddat;
run;

Cluster analysis
Example 2. Using a pre-defined distance matrix to define the space (as in a social network). Once you have it, the cluster program is just the same.
proc cluster data=ddat method=ward out=clustd;
  id col1;
run;
proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;
proc freq data=netclust;
  tables cluster;
run;
proc print data=netclust;
  var col1 cluster;
run;

Moody’s CROWDS algorithm combines the search approach with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the Segregation index and all of the information from the cluster hierarchy, combining two groups only if it improves the segregation fit for both groups. Methods: How do we identify primary groups in a network? Evade: Find a “cheap” indicator, and cluster/optimize that

The logic behind these algorithms is that you remove some weak links and see what is left. Most popular is the “edge betweenness” algorithm. Methods: How do we identify primary groups in a network? Destroy: Remove lines/nodes until what is left over reveals something of interest

UCINET has the MCL (Markov clustering, based on flow betweenness in a random walk sense) algorithm programmed. Methods: How do we identify primary groups in a network? Destroy: Remove lines/nodes until what is left over reveals something of interest

“Evade” – look for something that correlates with your split Newman’s Leading Eigenvector (in R – this is the “bottom” partition, not the best fit, which aggregates/joins from here)

“Evade” – look for something that correlates with your split
The Recursive Neighborhood Means algorithm creates the variables that are then used in the cluster analysis to identify groups.
- Start by randomly assigning every node a value on k variables.
- Then calculate the average for each variable for the people each person is tied to.
- Repeat this process multiple times.
This results in people who have many ties to each other having similar values on the k random variables. This similarity then gets picked up in a cluster analysis.
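The three steps above translate almost line for line into code. The sketch below is not Moody's implementation; it assumes the average is taken over a node's neighbours only (the node's own value is excluded), uses uniform random starting values, and leaves isolates unchanged. The function name and the two-triangle test network are hypothetical.

```python
import random

def recursive_neighborhood_means(adj, k=2, rounds=5, seed=0):
    """Return {node: [k values]} after repeated neighbourhood averaging.

    adj maps node -> set of neighbours. Isolates keep their current values.
    """
    rng = random.Random(seed)
    values = {n: [rng.random() for _ in range(k)] for n in adj}
    for _ in range(rounds):
        updated = {}
        for n, neighbours in adj.items():
            if neighbours:
                updated[n] = [sum(values[m][v] for m in neighbours) / len(neighbours)
                              for v in range(k)]
            else:
                updated[n] = values[n]
        values = updated          # all nodes update at once
    return values

# Two disconnected triangles: members of each should converge to shared values.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = recursive_neighborhood_means(adj, k=2, rounds=20)
for node in sorted(scores):
    print(node, [round(v, 3) for v in scores[node]])
```

The resulting k columns can then be clustered like any other variables; in a connected network too many rounds would pull every node toward the same value, so the number of rounds is a tuning choice.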

Example of the RNM procedure Time 1 Time 2 Time 3

Example of the RNM procedure

As an example, consider the process run on a network known to be clustered, starting with 2 random variables (k = 2). You get something like this, where the nodes are now placed according to their resulting values on the 2 variables.

The algorithm does a good job uncovering clusters in fake datasets.

Compared to real data: RNM Partition on the Prison data

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data. We often use simple indicators and assume they measure our concepts. [Figure: concepts SES and IQ with indicators Income and Math Score]

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data. But we don’t have to! We can imagine that each latent concept causes our indicators, and build a measurement model. [Figure: latent concepts SES and IQ with indicators Income, Reading Score, Occupation, Highest Degree, House Size, Languages Spoken, Math Score]

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data. But we don’t have to! We can imagine that each latent concept causes our indicators, and build a measurement model.

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data. In a network, we assume that the tie pattern is an imperfect measure of an underlying latent structure that we can explain with similar factors. Instead of lots of “measurements” we have many columns in the adjacency (sim) matrix, and we can summarize that with factor scores. This works best if the similarity matrix has more information, so multiple account data are perfect, or you can transform the data in some way to add more information (e.g., use a distance matrix).

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.
Here is code I used in the PROSPER data:
/* this section builds info on how to weight dyads for in-group, out-group. */
twostp=((adjmat+adjmat`)>0)*adjmat;   /* make it either direction w. the first term */
ttie=adjmat#twostp;                   /* =1 if tie contributes to a transitive triple */
ttie=((ttie+ttie`));
adjraw=adjmat;
adjmat=(adjmat+adjmat`);              /* force it to be symmetric, 1=asym 2=reciped */
adjmat=adjmat-diag(adjmat);           /* remove any self ties */
d2=reachlim((adjmat>0),3);
/* re-weight to bias toward recip ties */
wm_4 = (d2=1)#(adjmat=2)#8;           /* recip direct ties */
wm_2a = (d2=1)#(adjmat=1)#4;          /* unrecip direct ties */
wm_1 = 2*(d2=2);                      /* ties 2-steps out */
wm_p5 = 0*(d2=3);                     /* ties 3-steps out - note it's zeroed out here */
wm=wm_4+wm_2a+wm_1+wm_p5+(3*(ttie/(max(ttie))));   /* transitivity is at the end */
wm=wm-diag(wm);

Strategies for identifying primary groups: Evade Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.
Here is code I used in the PROSPER data:
/* run factor analysis. Note nfactors is a high value, should only take those w. EV > 2, but this gives us room... */
proc factor rotate=varimax min=&minev out=factset data=symmat nfactors=175 outstat=fscores noprint;
run;
quit;

Strategies for identifying primary groups: Evade Result:

Strategies for identifying primary groups: Evade Result: Each column is a person, these are the factor loadings for each person on each retained factor.

Strategies for identifying primary groups: Evade Result: Sociogram for a single school

Strategies for identifying primary groups: Evade Result: Sociogram for a single school. The problem is that there are no necessary connectivity checks: you can get “groups” that are disconnected. Biggest strengths are: a) Really fast b) Allows for overlapping groups c) Gives you “embeddedness” scores based on factor loadings

Strategies for identifying primary groups: Hybrid
The Crowds Algorithm
1. Identify members of network bicomponents; remove people not included.
2. Cluster the reduced network.
   - Identify optimal number of groups (TREEWALK):
   - For each level of the cluster partition tree do (BFS):
     - Move up the tree from smaller to larger groups.
     - If the fit for both groups is improved by joining them, then do so.
     - If not, then identify group at that level.
   - End TREEWALK.
Do until all groups are identified (GLOBAL LOOP):
3. Evaluate node fit. Do until nodes cannot be moved:
   For each identified cluster do (GRPCHECK):
   - Ensure group is a bi-component.
   - Calculate effect on group a of moving node j to group a.
   - Calculate effect on j's present group of removing j.
   - If there is a positive net gain to moving j from own group to a, then do so.
   End.
4. Identify bridging members.
   - If removing j from group a would improve the fit of group a, AND assigning j to any other group would lower the fit for that group, then j is considered a bridge. Place all bridges in a separate class.
5. Group check. Check returns to combining groups.
   - If merging groups would improve the fit of all groups to be merged, then do so.
   - Evaluate bridges, to be sure that they are not bridging two groups that have now merged.
End global loop.

Social Sub-groups
Frank & Yasumoto: Action and Structure
They expect to find evidence of enforceable trust within social subgroups and evidence of reciprocity between such groups. To do so, they must identify primary subgroups within the network. They do so using a density-based criterion. Frank’s algorithm iteratively assigns nodes to subgroups until a parameter that maximizes in-group density is reached. Basic model is: $\operatorname{logit}(Y_{ij}) = \theta_0 + \theta_1\,\mathrm{samegroup}_{ij}$, where $\mathrm{samegroup}_{ij}$ is 1 if i and j are assigned to the same subgroup. Seek to find an assignment of nodes to groups (g) that maximizes fit. This results in a ‘block diagonal’ adjacency matrix, where most of the ties fall along the diagonal.

Relations among the French Financial Elite (as drawn by F&Y). Group-weighted MDS: relations within groups are weighted more heavily than relations between groups to generate this picture:

Return to first question: What is a group?
- The simple notions of a complete clique are difficult to square w. real-world data.
- Density is an indicator, but subject to over-grouping (no connectivity) and star-patterns.
- Groups are likely internally differentiated, with “core” vs. “periphery” members.
- Most sociological theories of groups rest on transitive closure and short distances.
- There’s a sense that members are equal: a tight-knit group.
- The group should be fairly small: face-to-face scale.
- The social processes underlying the group turn on reciprocity, trust, communication, homogeneity of norms & beliefs.
- Almost all require a comparative set: in-group to out-group. It is relational not essential.
- Cross-cutting social circles would lead us to expect overlapping groups, but in practice most methods do not do that, as it’s analytically too cumbersome.
Practically, group detection is hard and most methods will give you (slightly) different results. You can compare results using a Rand statistic (proportion of pairs similarly categorized in two partitions), but for small settings these differences can matter.
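The Rand statistic mentioned above is easy to compute by hand. Here is a minimal sketch in Python, assuming each partition is stored as a list of group labels (one per node); the function name and the toy partitions are made up for illustration.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Proportion of node pairs treated the same way by two partitions.

    part_a and part_b are equal-length lists of group labels, one per node.
    A pair "agrees" if it is together in both partitions or apart in both.
    """
    if len(part_a) != len(part_b):
        raise ValueError("partitions must cover the same nodes")
    pairs = list(combinations(range(len(part_a)), 2))
    if not pairs:
        raise ValueError("need at least two nodes")
    agree = sum(
        (part_a[i] == part_a[j]) == (part_b[i] == part_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical example: the two methods disagree only on whether
# nodes 2 and 3 belong together.
method_1 = [0, 0, 1, 1]
method_2 = [0, 0, 1, 2]
print(rand_index(method_1, method_2))
```

Group labels only need to be consistent within a partition, so two methods that number their groups differently can still be compared directly.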

[Figure: the same network partitioned by six methods. Panels: Fast & Greedy; Louvain; Edge Betweenness; Markov Chain; Leading Eigenvector; RNM (CROWDS)]

Roles & Positions
Overview: Social life can be described (at least in part) through social roles. To the extent that roles can be characterized by regular interaction patterns, we can summarize roles through common relational patterns. Identifying these sets is the goal of block-model analyses.
Nadel: The Coherence of Role Systems. Background ideas for White, Boorman and Breiger. Social life as an interconnected system of roles. Important feature: thinking of roles as connected in a role system = social structure.
White, Boorman and Breiger: Social structure from Multiple Networks I. Blockmodels of Roles and Positions. The key article describing the theoretical and technical elements of block-modeling.

Nadel: The Coherence of Role Systems
Elements of a Role: Rights and obligations with respect to other people or classes of people. Roles require a ‘role complement’: another person with respect to whom the role-occupant acts. Examples: Parent - child, Teacher - student, Lover - lover, Friend - Friend, Husband - Wife, etc. Nadel (following functional anthropologists and sociologists) defines ‘logical’ types of roles, and then examines how they can be linked together.

Nadel: The Coherence of Role Systems
Nadel describes how various roles fit together to form a coherent whole. Roles are collected in people through the ‘summation of roles’.
Necessary: Some roles fit together necessarily. For example, the expected interaction patterns of “son-in-law” are implied through the joint roles of “Husband” and “Spouse-Parent”.
Coincidental: Some roles tend to go together empirically, but they need not (businessman & club member, for example).
Distinguishing the two is a matter of usefulness and judgement, but relates to social substitutability. The distinction reverts to how the system as a whole will be held together in the face of changes in role occupants.

Given that roles can be identified as ‘going together’ is there a logic that underlies their connection? Nadel uses a functional description based on ascription and achievement:

Nadel: The Coherence of Role Systems And he gives an example of a simple role system: Nadel’s task is to make sense of these roles, to identify how they are interconnected to form a system -- a coherent structure. This is a difficult task to do analytically, as the eventual failure of Parsonian functionalism shows.

White et al: From logical role systems to empirical social structures
With the fall of Parsons and functionalism in the late 60s, many of the ideas about social structure and system were also tossed. White et al demonstrate how we can understand social structure as the intercalation of roles, without the a priori logical categories. Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc) between actors. Thus, we might represent a family as:

White et al: From logical role systems to empirical social structures
Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc) between actors. Thus, we might see an exchange network such as:
[Figure: exchange network among H, W and three C nodes, with three relations: Provides food for; Romantic Love; Bickers with]
Which is a summary of a (sort of) family (and there are, of course, many other relations inside the family).

The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect? Block modeling focuses on equivalence positions.
Structural Equivalence: Two actors are structurally equivalent if they have the same types of ties to the same people. That is, they have the exact same ties.
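To make the definition concrete, here is a small Python sketch of a pairwise structural equivalence check on a single 0/1 relation. Ignoring the tie between i and j themselves is one common convention and is an assumption here; the star network is a made-up example.

```python
def structurally_equivalent(adj, i, j):
    """True if i and j have identical ties to and from every other node.

    adj is a square 0/1 list-of-lists; adj[a][b] == 1 means a sends a tie to b.
    Ties between i and j themselves (and self-ties) are ignored, which is one
    common convention, not the only one.
    """
    others = [k for k in range(len(adj)) if k not in (i, j)]
    same_out = all(adj[i][k] == adj[j][k] for k in others)
    same_in = all(adj[k][i] == adj[k][j] for k in others)
    return same_out and same_in


# Hypothetical star network: node 0 is tied to 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(structurally_equivalent(star, 1, 2))  # two leaves
print(structurally_equivalent(star, 0, 1))  # hub vs. leaf
```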

Structural Equivalence A single relation

Structural Equivalence Graph reduced to positions

Alternative notions of equivalence Instead of exact same ties to exact same alters, you look for nodes with similar ties to similar types of alters

Blockmodeling: basic steps In any positional analysis, there are 4 basic steps: 1) Identify a definition of equivalence 2) Measure the degree to which pairs of actors are equivalent 3) Develop a representation of the equivalencies 4) Assess the adequacy of the representation

1) Identify a definition of equivalence Structural Equivalence: Two actors are equivalent if they have the same type of ties to the same people.

1) Identify a definition of equivalence
Automorphic Equivalence: Actors occupy indistinguishable structural locations in the network. That is, they are in isomorphic positions in the network. Two graphs are isomorphic if there is some mapping of nodes to positions that equates the two. For example, all 030T triads are isomorphic. A graph is automorphic if there are patterns internal to the graph that are equated (if the mapping goes from the set of nodes in the graph to other nodes in the graph). In general, automorphically equivalent nodes are equivalent with respect to all graph theoretic properties (i.e. degree, number of people reachable, centrality, etc.) and are structurally indistinguishable. The key difference from structural equivalence is the relaxing of the necessity of being linked to the same nodes.

Automorphic Equivalence:

1) Identify a definition of equivalence
Regular Equivalence: Regular equivalence does not require actors to have identical ties to identical actors or to be structurally indistinguishable. Actors who are regularly equivalent have identical ties to and from equivalent actors. If actors i and j are regularly equivalent, and actor i has a tie to/from some actor, k, then actor j must have the same kind of tie to/from some actor l, and actors k and l must be regularly equivalent. So effectively this is a recursive definition, and not necessarily unique. There may be several ways to assign actors to clusters that satisfy this definition. (This is related to graph colorings: regular equivalence definitions are those where nodes have neighbors of the same color.)

Regular Equivalence: There may be multiple regular equivalence partitions in a network, and thus we tend to want to find the maximal regular equivalence partition, the one with the fewest positions.
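One way to see the "fewest positions" idea is a color-refinement sketch: start with everyone in one class and split classes only when members differ in the set of classes they send ties to or receive ties from. This is a minimal illustration for a single binary directed relation, not REGE or any packaged routine; the function name and the little hierarchy are assumptions.

```python
def regular_equivalence(n, arcs):
    """Coarsest regular-equivalence classes for one directed 0/1 relation.

    n is the number of nodes (labelled 0..n-1); arcs is a list of (sender,
    receiver) pairs. Returns a list of class ids, one per node.
    """
    out_nbrs = [set() for _ in range(n)]
    in_nbrs = [set() for _ in range(n)]
    for u, v in arcs:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)

    classes = [0] * n  # start with a single position
    while True:
        # A node's signature: its current class plus the sets of classes
        # it sends to and receives from.
        signatures = [
            (
                classes[i],
                frozenset(classes[j] for j in out_nbrs[i]),
                frozenset(classes[j] for j in in_nbrs[i]),
            )
            for i in range(n)
        ]
        ids = {}
        refined = [ids.setdefault(sig, len(ids)) for sig in signatures]
        if len(ids) == len(set(classes)):  # no class was split: stable
            return refined
        classes = refined


# Hypothetical hierarchy: 0 supervises 1 and 2; 1 supervises 3 and 4; 2 supervises 5.
arcs = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
print(regular_equivalence(6, arcs))
```

Note that nodes 1 and 2 land in the same position even though they supervise different numbers of people, which is exactly what separates regular from structural equivalence. In an undirected graph with no isolates this procedure returns a single class, so it is mainly informative for directed or multi-relational data.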

Role or Local Equivalence: While most equivalence measures focus on position within the full network, some measures focus only on the patterns within the local tie neighborhood. These have been called ‘local role’ equivalence.
Note that: Structurally equivalent actors are automorphically equivalent. Automorphically equivalent actors are regularly equivalent. Structurally equivalent and automorphically equivalent actors are role equivalent.
In practice, we tend to ignore some of these fine distinctions, as they get blurred quickly once we have to operationalize them in real graphs. It turns out that few people are ever exactly equivalent, and thus we approximate the links between the types. In all cases, the procedure can work over multiple relations simultaneously. The process of identifying positions is called blockmodeling, and requires identifying a measure of similarity among nodes.

Blockmodeling is the process of identifying these types of positions. A block is a section of the adjacency matrix - a “group” of people. Here I have blocked structurally equivalent actors

Once you block the matrix, reduce it, based on the number of ties in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present): Structural equivalence thus generates 6 positions in the network

Once you partition the matrix, reduce it. Regular equivalence: [Figure: reduced graph with positions 1, 2 and 3] (here I placed a one in the image matrix if there were any ties in the ij block)

To get a block model, you have to measure the similarity between each pair. If two actors are structurally equivalent, then they will have exactly similar patterns of ties to other people. Consider the example again: [Table: tie vectors for C and D compared cell by cell; Match Sum: 12] C and D match on 12 other people.

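If the model is going to be based on asymmetric or multiple relations, you simply stack the various relations: [Figure: the H, W, C, C, C family network with relations Provides food for, Romantic Love and Bickers with; the Romance, Feeds and Bicker matrices are combined into one Stacked matrix]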

For the entire matrix, we get: (number of agreements for each ij pair)
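A rough sketch of how such an agreement table could be built in Python. It assumes a single 0/1 relation and counts matches on ties sent to third parties only; for asymmetric or stacked relations you would compare the stacked profile instead. The star network is a made-up example.

```python
def match_matrix(adj):
    """matches[i][j] = number of third parties k on which rows i and j agree."""
    n = len(adj)
    matches = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matches[i][j] = sum(
                    adj[i][k] == adj[j][k]
                    for k in range(n)
                    if k not in (i, j)  # skip the i-j tie and self-ties
                )
    return matches


# Hypothetical star network: node 0 is tied to 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
for row in match_matrix(star):
    print(row)
```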

The metric used to measure structural equivalence by White, Boorman and Breiger is the correlation between each node’s set of ties. For the example, this would be: Another common metric is the Euclidean distance between pairs of actors, which you then use in a standard cluster analysis.
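Both metrics are a few lines of Python. The sketch below treats each node's row of the adjacency matrix as its tie profile, which is an assumption (you could equally stack rows and columns, or drop the diagonal); returning 0.0 for a constant profile is also just a convention chosen here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical star network: node 0 is tied to 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(pearson(star[1], star[2]), euclidean(star[1], star[2]))  # two leaves
print(pearson(star[0], star[1]), euclidean(star[0], star[1]))  # hub vs. leaf
```

Correlation is high for equivalent actors while distance is low, so remember to flip one of them before feeding both into the same clustering routine.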

The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations Concor iteration 1:

Concor iteration 2: The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.

Concor iteration 3: The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.


Repeat the process on the resulting 1-blocks until you have reached structurally equivalent blocks. Because CONCOR splits every sub-group into two groups, you get a partition tree that looks something like this:
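A single CONCOR split can be sketched in plain Python as follows. This is an illustration under simplifying assumptions (one relation, column profiles only, a constant column treated as correlation 0), not the original implementation; the function names and the two-clique network are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation (0.0 if either vector is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlate_columns(matrix):
    """Correlation between every pair of columns of a square matrix."""
    cols = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def concor_split(adj, max_iter=50, tol=1e-9):
    """Iterate correlations until entries are near +1/-1, then split in two."""
    corr = correlate_columns(adj)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate_columns(corr)
    # Nodes positively correlated with node 0 form one block, the rest the other.
    first = [i for i, v in enumerate(corr[0]) if v > 0]
    second = [i for i, v in enumerate(corr[0]) if v <= 0]
    return first, second


# Hypothetical network: two disconnected triangles, {0, 1, 2} and {3, 4, 5}.
two_cliques = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(concor_split(two_cliques))
```

To get the full partition tree you would call the same function again on the sub-matrix for each resulting block and stop when a block is too small or its members are already (nearly) perfectly correlated.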

CONCOR example: Consider a simple senate voting network. The network is dense (every cell has some score) and dynamic (the pattern changes over time). Color by structural equivalence…

Adjust position to collapse SE positions.

And then adjust color, line width, etc. for clarity. While we’ve gone some distance toward identifying relevant information from the mass, how do we account for time?

CONCOR example: Repeat at each wave, linking positions over time

CONCOR example:

Automorphic and Regular equivalence are more difficult to find, and require iteratively searching over possible class assignments for sets that have the same graph theoretic patterns. Usually start with a set of nodes defined as similar on a number of network measures, then look within these classes for automorphic equivalence classes. The classic reference is REGE (White & Reitz 1985), which recursively defines the degree of equivalence between pairs and then adjusts for as many iterations as you specify. A theoretically appealing method for finding structures that are very similar to regular equivalence, role equivalence, uses the triad census. Each node is involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of these triads. These positions are summarized in the following figure:

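Network Sub-Structure: Triads
The 16 directed triad types, grouped by the count shown in parentheses (the number of arcs in the triad):
(0) 003
(1) 012
(2) 102, 021D, 021U, 021C
(3) 111D, 111U, 030T, 030C
(4) 201, 120D, 120U, 120C
(5) 210
(6) 300
[Figure legend: triads are marked as Intransitive, Transitive or Mixed]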

An Example of the triad census
[Table: Type by Number of triads for the 16 triad types. Sum (2 - 16): 63]
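For a feel of how a census is tallied, here is a simplified Python sketch that classifies each triad only by its M-A-N counts (mutual, asymmetric, null dyads). That gives the three digits of the triad label but not the D/U/C/T orientation letter, so it collapses the 16 types into 10; treat it as a starting point, not a full triad census. The arcs are a made-up example.

```python
from collections import Counter
from itertools import combinations


def man_census(n, arcs):
    """Count triads by number of Mutual, Asymmetric and Null dyads."""
    arc_set = set(arcs)
    census = Counter()
    for triple in combinations(range(n), 3):
        mutual = asym = null = 0
        for u, v in combinations(triple, 2):
            forward, back = (u, v) in arc_set, (v, u) in arc_set
            if forward and back:
                mutual += 1
            elif forward or back:
                asym += 1
            else:
                null += 1
        census[f"{mutual}{asym}{null}"] += 1
    return census


def triads_per_node(n):
    """Each node sits in (n-1)(n-2)/2 triads."""
    return (n - 1) * (n - 2) // 2


# Hypothetical 4-node network: 0 and 1 nominate each other, 1 nominates 2.
arcs = [(0, 1), (1, 0), (1, 2)]
print(man_census(4, arcs))
print(triads_per_node(4))
```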

Triadic Position Census: 36 Positions within 16 Directed Triads
[Figure: each triad drawn with one node marked to indicate the position. Position labels: _S, 012_E, 012_I, 102_D, 102_I, 021D_S, 021D_E, 021U_S, 021U_E, 021C_S, 021C_B, 021C_E, 111D_S, 111D_B, 111D_E, 111U_S, 111U_B, 111U_E, 030T_S, 030T_B, 030T_E, 030C, 201_S, 201_B, 120D_S, 120D_E, 120U_S, 120U_E, 120C_S, 120C_B, 120C_E, 210_S, 210_B, 300]

Triadic Position Census: 40 positions when all ties are mutual but there are two types of relations

Triad position vectors for the example network, resulting in 3 positions:

Correlating each person’s triad position vector with every other person’s results in the following table, which clearly shows the positions that are equivalent:

Complete Network Analysis. Network Connections: Role Positions
[Figure: two panels. Jefferson High School: school provides a good boundary for social relations. Sunshine High School: school does not provide a good boundary for social relations.]

Complete Network Analysis. Network Connections: Role Positions
[Figure: image networks for Jefferson High School and Sunshine High School. Width of tie is proportional to the ratio of cell density to mean cell density. Percentages shown: 34%, 32%, 33%, 4%, 43%, 52%]

Once you have decided on a number of blocks, you need to determine what counts as a ‘one’ block or a ‘zero’ block. Usually this is some function of the density of the resulting block. General rules:
“Fat Fit”: Only put a one in blocks with all ones in the adjacency matrix
“Lean Fit”: Put a zero if all the cells are zero, else put a one
“Density fit”: Put a one if the average value of the cell is above a certain cutoff
White, Boorman and Breiger used a ‘lean fit’ (zeroblock) rule for the examples in their paper:
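The three rules can be sketched in Python as a function that reduces a blocked adjacency matrix to an image matrix. Names, the default cutoff of 0.5 and the toy network are assumptions for illustration; diagonal cells (self-ties) are left out of the within-block densities.

```python
def block_density(adj, rows, cols):
    """Share of possible ties present from nodes in rows to nodes in cols."""
    cells = [adj[i][j] for i in rows for j in cols if i != j]
    return sum(cells) / len(cells) if cells else 0.0


def image_matrix(adj, groups, rule="density", cutoff=0.5):
    """Reduce adj to a 0/1 image matrix over the positions in groups."""
    image = []
    for rows in groups:
        image_row = []
        for cols in groups:
            density = block_density(adj, rows, cols)
            if rule == "fat":        # one only if every cell is a one
                one = density == 1.0
            elif rule == "lean":     # zero only if every cell is a zero
                one = density > 0.0
            elif rule == "density":  # one if density exceeds the cutoff
                one = density > cutoff
            else:
                raise ValueError(f"unknown rule: {rule}")
            image_row.append(int(one))
        image.append(image_row)
    return image


# Hypothetical network: 0 and 1 nominate each other, 0 also nominates 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
groups = [[0, 1], [2, 3]]
for rule in ("fat", "lean", "density"):
    print(rule, image_matrix(adj, groups, rule=rule))
```

In this toy case the lone tie from position 1 to position 2 is enough to make a one-block under the lean rule but not under the other two, which is why the choice of rule matters for sparse data.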

An example: White et al, figure 1. Biomedical Specialty data:

White et al, figure 3. Biomedical Specialty data: Key to structure lies in zero blocks

Recent models
Recent work has generalized blockmodels in two directions:
Specific structural hypotheses. Example: Core-periphery models or Structural Hole ideas.
Generalized blockmodeling based on particular relationship types & patterns. Pat Doreian’s recent work with the PAJEK folks.
Connectivity sets. Identifying sets of nodes with some common pattern of connectivity. This is a merge/mingle of community detection & positions. Moody & White would be an example.

Recent models Core-Periphery
To identify a core-periphery structure, we compare an observed block structure to an ideal block structure. An ideal core-periphery network: Borgatti SP and Everett M G (1999) Models of core/periphery structures. Social Networks

Recent models Core-Periphery
To identify a core-periphery structure, we compare an observed block structure to an ideal block structure. [Figure: the observed blocked network beside the ideal CP blocked network] A core-periphery structure exists to the extent that the correlation between the ideal structure and the observed structure is high. We can search for cores by simply proposing a partition (many times) and then selecting the best fitting partition. But that’s silly-slow!
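The brute-force search is short enough to sketch in Python. The ideal pattern used here (a tie is expected whenever at least one endpoint is in the core, and never between two periphery nodes) is one of the discrete ideals discussed by Borgatti & Everett and is an assumption, as are the function names and the toy network. It tries every possible core, so it really is only usable for tiny graphs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation (0.0 if either vector is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cp_fit(adj, core):
    """Correlation between observed ties and the ideal core-periphery pattern."""
    core = set(core)
    observed, ideal = [], []
    for i in range(len(adj)):
        for j in range(len(adj)):
            if i != j:
                observed.append(adj[i][j])
                ideal.append(1 if (i in core or j in core) else 0)
    return pearson(observed, ideal)


def best_core(adj):
    """Exhaustively try every non-empty proper subset of nodes as the core."""
    n = len(adj)
    candidates = (c for size in range(1, n) for c in combinations(range(n), size))
    return max(candidates, key=lambda c: cp_fit(adj, c))


# Hypothetical network: 0 and 1 are tied to everyone; 2, 3 and 4 are tied only to 0 and 1.
n = 5
adj = [[int(i != j and (i < 2 or j < 2)) for j in range(n)] for i in range(n)]
core = best_core(adj)
print(core, cp_fit(adj, core))
```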

Recent models Core-Periphery
A continuous version of “coreness” can be had by generalizing the ideal image seen above. Instead of just 0/1, pairs of “high core” nodes have a very strong tie connecting them, and core-periphery nodes have a very low score. Coreness can thus be defined as a type of centrality, but one that assumes a particular underlying structure to the network. Nodes with high coreness are more likely to be at the center of a core-periphery structure. As it turns out, coreness is essentially eigenvector centrality, and UCINET sorts nodes by eigenvector centrality and builds the “core” until the correlation between ideal/observed drops.

Recent models Core-Periphery

The recent work on generalization focuses on the patterns that determine a block. Instead of focusing on just the density of a block, you can identify a block as any set that has a particular pattern of ties to any other set. This work starts from the observation that types of equivalence limit the observed types of blocks. So, for example, regularly equivalent blocks must be either empty, complete, or 1-covered. The “direct” approach is thus to search for these sorts of coverings. Recent models Generalized Block Models


Recent models Generalized Block Models From Carrington, Scott & Wasserman. Models & Methods in Social Network Analysis

Compound Relations.
“A friend of a friend is a friend”: F x F = F
“The enemy of an enemy is a friend”: E x E = F
We can generalize the balance rule to multitudes of “compound relations”. Use matrices for primary relations and matrix multiplication for compounds.

One of the most powerful tools in role analysis involves looking at role systems through compound relations. A compound relation is formed by combining relations in single dimensions. The best example of compound relations comes from kinship: Sibling x Child of = Nephew/Niece, or S x C = SC.

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”. Consider a system with two sorts of relations. Here, one is hierarchical and the other defines “within class”. We can build a role table with Boolean multiplication of the relations.
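Boolean multiplication is simple to write out. The sketch below composes two 0/1 relations in Python; the four-person organization is a made-up example and is not the W&F data.

```python
def compose(r1, r2):
    """Boolean product: (i, j) is 1 if i has an r1 tie to some k who has an r2 tie to j."""
    n = len(r1)
    return [
        [int(any(r1[i][k] and r2[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical organization: 0 is boss of 1 and 2; 1 is boss of 3;
# 1 and 2 are on the same level.
boss = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
same_level = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

boss_of_boss = compose(boss, boss)           # "boss of my boss"
peer_of_boss = compose(same_level, boss)     # "subordinate of my peer"
for row in boss_of_boss:
    print(row)
```

Comparing the compound matrices to each other and to the primary relations (for example, checking whether boss x boss equals boss) is what fills in the cells of the role table.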

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as” “Boss” X “boss of my boss is my boss”

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as” “On the same level” X


Compound Relations.
Kinship networks form a foundation to social structures. In the west, we have 2 primary relations (Parent of, married to) and one partitioning attribute (male or female). So:
Parent of a Parent = Grandparent
Father’s Father = Paternal Grandfather
Mother’s Father = Maternal Grandfather
Wife’s Mother’s Son = Brother-in-law
Mother’s Mother’s son’s son = Cousin (mom’s side)
Quality: The entire western kinship structure can be decomposed into a set of equations consisting of only Parent, Child, and Gender.
Quantity: Given a fertility rate of 2 kids, the two-step* kinship neighborhood would have 26 people; if the fertility rate were 3 the same count goes up to 46.
*2-steps includes aunts & uncles, but not their spouses.

Compound Relations.
The scientist’s second rule has to be to look for regularity and exploit that for theory. Consider as a good example, Harrison White’s Kinship model:

Compound Relations.
The scientist’s second rule has to be to look for regularity and exploit that for theory. Consider as a good example, Harrison White’s Kinship model: [Figure annotation: Ego connects to any of these]

Compound Relations.
Kinship networks form a foundation to social structures. In China, we have the same 2 primary relations:
Parent of
Married to
But 3 partitioning attributes:
Gender
Relative Age
Relational Order (1st wife, 2nd wife, etc)
This means that compounds we name as equivalent (cousin, uncle) are named differently. But, while westerners largely ignore gender for anything other than final designation (aunt/uncle, niece/nephew), Chinese kinship terms are differentiated by parent’s line (maternal aunt, maternal uncle, etc.). We know this designation, but use it rarely.

Compound Relations. *2-steps includes aunts & uncles, but not their spouses.

Uncles Compound Relations.

The Chinese extended family network – for “normal” relations westerners would recognize – includes 74 unique kinship terms. The same set in the west has 28 different terms. Each of these terms carries a different expected gift exchange system at holidays and mourning attire at death. Compound Relations.

How has this system changed? Consider the effects of the 1-child policy: Source: Population research Bureau With a fertility of 6, 2-step kinship nets would have 166 people; with 2 it’s 26. A full implementation of 1-child removes the “relative age” operator, erasing every kinship term dependent on “older” or “younger” and means that families play either in a maternal or a paternal line, but not both. Compound Relations.

Using Compound Relations theoretically: James Montgomery & Patronage systems


Using Compound Relations theoretically:

Other work on this general topic:


Methods: How to?
The basic block model formation can be done in multiple ways:
1. Apply any of our group-finding algorithms to a role-based similarity matrix
 - Here you’re simply converting the conditions for equivalence to adjacency and solving for modularity. Requires either a community detection algorithm that uses valued ties or a binarization of the similarity matrix.
2. Cluster node-level structural indices (get at regular/automorphic equivalence)
 - This is the “evade” correlate to SE from community detection: cluster on a BUNCH of easy-to-calculate node-level network statistics and this gives you nodes that are equivalent (with respect to the measures you used!)

Methods: How to? The basic block model formation can be done in multiple ways: Role-specific algorithms:


Methods: How to? Triad Structural Equivalence in SAS


Addendum
A new statistic for determining the number of groups in a network. Proc cluster gives you a statistic for the basic “fit” of a cluster solution. This statistic varies depending on the method used, but is usually something like an R². Consider this dendrogram: The SPRSQ and the RSQ are your fit statistics.

Addendum
A new statistic for determining the number of groups in a network. [Plot: SPRSQ and RSQ across cluster solutions] A sharp change in the statistic is your best indicator.

Addendum
A new statistic for determining the number of groups in a network. Modularity:
$M = \sum_{s=1}^{N_m} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right]$
where M is the modularity score, s indexes each group (“module”), $l_s$ is the number of lines in group s, L is the total number of lines, $d_s$ is the sum of the degrees of the nodes in s, and $N_m$ is the number of groups.
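With those symbols, the usual modularity score is M = sum over groups of [ l_s / L - (d_s / 2L)^2 ]. A minimal Python sketch for an undirected, unweighted network follows; the function name and the two-triangle example are assumptions for illustration.

```python
def modularity(lines, membership):
    """Modularity of a partition of an undirected, unweighted network.

    lines is a list of (u, v) pairs; membership maps each node to its group.
    """
    total_lines = len(lines)          # L
    lines_in = {}                     # l_s: lines with both ends in group s
    degree_sum = {}                   # d_s: summed degree of nodes in group s
    for u, v in lines:
        for node in (u, v):
            group = membership[node]
            degree_sum[group] = degree_sum.get(group, 0) + 1
        if membership[u] == membership[v]:
            group = membership[u]
            lines_in[group] = lines_in.get(group, 0) + 1
    return sum(
        lines_in.get(group, 0) / total_lines - (d / (2 * total_lines)) ** 2
        for group, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single line (2-3).
lines = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}
print(modularity(lines, two_groups))
print(modularity(lines, one_group))
```

Evaluating this at each cut of a cluster tree and keeping the cut with the highest score is one way to pick the number of groups.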

Role Positions Identifying positions: Could use the Modularity score at each tree cut…

Role Positions Example positions identified in a single school network (role 7 is a “leading crowd” in the simplest sum-of-in-degree sense)

Role Positions
Repeating this process across all networks generates a population of within-school position profiles. We then pool & cluster these position profiles in a “2nd-order clustering” to identify a set of roles that can be compared across the populations. We settle on a 5-position solution: Outsiders, Aloofs, Friends, Hangers, Central Core. [Figure: the five positions from the second-order clustering]

Role Positions
Uninvolved outsiders (35% of students, 28% of role groups). Largely uninvolved: nominate few and are nominated rarely by others. Includes isolated dyads & small groups; the mixing matrix shows that their few friends tend to be others in the same position.

Role Positions Non-Reciprocated (17% of students, 15% of role groups) Makes nominations, but rarely reciprocated and has low in-degree, targeting highly central nodes with nominations. “Hangers on” position.

Role Positions
Everyday kids: good friends (21% of students, 29% of role groups). Basically average (positive scores largely because the isolates have been removed): liked by some, like others.

Role Positions “Popular Aloof” (9% of students, 9% of role groups) High in-degree but low out-degree, but the few they do nominate tend to reciprocate.

Role Positions Central Core (17.5% of students, 17.8% of role groups) Highly reciprocated ties, active, very central; both high in-degree and reciprocation rates.

Role Positions How stable is occupancy of a school role?
