Social Network Analysis American Sociological Association San Francisco, August 2004 James Moody 1.



Introduction
We live in a connected world:
“To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.” Peter M. Blau, Exchange and Power in Social Life, 1964
"If we ever get to the point of charting a whole city or a whole nation, we would have … a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole." J.L. Moreno, New York Times, April 13,
These patterns of connection form a social space that can be seen in multiple contexts:

Source: Linton Freeman “See you in the funny pages” Connections, 23, 2000, Introduction 3

High Schools as Networks Introduction 4

And yet, standard social science analysis methods do not take this space into account. “For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, …, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it.” Allen Barton, 1968 (Quoted in Freeman 2004) Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuitive understanding. Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that construct social structure. Introduction 7

Why do Networks Matter? Local vision Introduction 8

Why do Networks Matter? Local vision Introduction 9

Why networks matter: Intuitive: “goods” travel through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space. Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior. Introduction 10

Social network analysis is a set of relational methods for systematically understanding and identifying connections among actors. SNA:
- is motivated by a structural intuition based on ties linking social actors
- is grounded in systematic empirical data
- draws heavily on graphic imagery
- relies on the use of mathematical and/or computational models.
Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior.
Introduction 11

1. Introduction
2. Social Network Data
   a. Basic data elements
   b. Collecting network data
   c. Basic data structures
3. Measuring Networks
   a. Flows of goods within networks
      1) Topology
      2) Time
   b. Structure of Social Space
      1) Small Worlds, Scale-Free, Triads
      2) Cohesive Groups
      3) Role Positions
4. Modeling with Networks
   a. Modeling Behaviors with Networks
      1) Peer attribute models
      2) Network Autocorrelation Models
      3) Dyad / QAP Models
   b. Modeling Network Structure
      1) QAP for network structure
      2) Exponential Random Graph Models
5. SNA Computer Programs
12

The units of interest in a network are the combined sets of actors and their relations. We represent actors with points and relations with lines.
Actors are referred to variously as: nodes, vertices, or points.
Relations are referred to variously as: edges, arcs, lines, or ties.
[Figure: example network with five nodes a, b, c, d, e.]
Social Network Data 13

In general, a relation can be:
- Binary or Valued
- Directed or Undirected
[Figure: the example network on nodes a, b, c, d, e drawn four ways: undirected binary; directed binary; undirected valued; directed valued.]
Social Network Data 14

Social Network Data Social network data are substantively divided by the number of modes in the data. 1-mode data represents edges based on direct contact between actors in the network. All the nodes are of the same type (people, organization, ideas, etc). Examples: Communication, friendship, giving orders, sending . 1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which is more common in child development research (Cairns and Cairns). 15

Social Network Data
Social network data are substantively divided by the number of modes in the data. 2-mode data represents nodes from two separate classes, where all ties are across classes.
Examples:
- People as members of groups
- People as authors on papers
- Words used often by people
- Events in the life history of people
The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or groups connected to each other through common membership.
There may be multiple relations of multiple types connecting your nodes.
16
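The duality described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the people, groups, and membership matrix are hypothetical, and the projection is the plain matrix product of the membership matrix with its transpose, which counts shared memberships.

```python
# Hypothetical 2-mode data: rows are people, columns are groups (1 = member).
people = ["ann", "bob", "cat"]
groups = ["chess", "choir"]
membership = [
    [1, 0],  # ann belongs to chess
    [1, 1],  # bob belongs to chess and choir
    [0, 1],  # cat belongs to choir
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def project(matrix):
    """Multiply a matrix by its transpose: cell (i, j) counts the columns shared by rows i and j."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


# People connected to people through joint group membership.
person_by_person = project(membership)
# Groups connected to groups through common members.
group_by_group = project(transpose(membership))
```

The diagonal of each projection holds the number of groups a person belongs to (or the number of members a group has); the off-diagonal cells are the 1-mode ties.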

Social Network Data
We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to (alters). Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of contacts
- Something less than a full account of connections among all pairs of actors in the relevant population
- Example: CDC contact tracing data for STDs
17

Social Network Data
We can examine networks across multiple levels:
3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
- Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom
For the most part, I will be discussing techniques surrounding global networks today, though I will briefly mention some standard uses of ego-network data.
18

Collecting Network Data Data capture any connection between the nodes. Sources include surveys, published accounts, special informants, etc. In general, you can only make conclusions about relations among the set of nodes you have collected, so it is important to observe as much of the network as possible. See W&F, chap 2 on different types of data collection Social Network Data 19

Collecting Network Data
If you use surveys to collect data, some general rules of thumb:
a) Network data collection can be time consuming. It is better (I think) to have breadth over depth. Having detailed information on less than 50% of the sample will make it very difficult to draw conclusions about the general network structure.
b) Question format:
- If you ask people to recall names (an open list format), fatigue will result in under-reporting.
- If you ask people to check off names from a full list, you can often get over-reporting.
c) It is common to limit people to ~5 nominations. This will bias network stats for stars, but is sometimes the best choice to avoid fatigue.
d) Concrete relational indicators (who did you talk to?) are better than attitudes that are harder to define (who do you like?).
Social Network Data 20

Collecting Network Data
Existing Sources of Social Network Data:
1) Check INSNA: the International Network for Social Network Analysis
2) Many secondary sources (particularly for 2-mode data)
3) National Longitudinal Survey of Adolescent Health (Add Health)
Social Network Data 21

Working with pictures. There is no standard way to draw a sociogram; each of these drawings is equivalent. Basic Data Structures Social Network Data 22

In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in, and find that either a hierarchical layout or a force-directed layout works best. Basic Data Structures Social Network Data 23

From pictures to matrices
[Figure: undirected binary and directed binary versions of the example graph, each shown beside its adjacency matrix with rows and columns a, b, c, d, e.]
Basic Data Structures Social Network Data 24

From matrices to lists
[Figure: adjacency matrix with rows and columns a, b, c, d, e.]
Adjacency List:
a: b
b: a c
c: b d e
d: c e
e: c d
Arc List:
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d
Basic Data Structures Social Network Data 25
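The conversion from a matrix to the two list formats can be sketched as follows. The matrix entries here are an assumption: they are the undirected ties implied by the adjacency list on this slide (a-b, b-c, c-d, c-e, d-e), and the function names are illustrative only.

```python
nodes = ["a", "b", "c", "d", "e"]

# Undirected, binary adjacency matrix implied by the slide's adjacency list.
adj_matrix = [
    [0, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0],  # b
    [0, 1, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [0, 0, 1, 1, 0],  # e
]


def to_adjacency_list(nodes, matrix):
    """Map each node to the list of nodes it has a tie to."""
    return {
        nodes[i]: [nodes[j] for j, value in enumerate(row) if value]
        for i, row in enumerate(matrix)
    }


def to_arc_list(nodes, matrix):
    """List one (sender, receiver) pair per nonzero cell."""
    return [
        (nodes[i], nodes[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


adjacency_list = to_adjacency_list(nodes, adj_matrix)
arc_list = to_arc_list(nodes, adj_matrix)
```

For an undirected graph every tie appears twice in the arc list (once in each direction), which is why the slide's arc list has ten rows for five ties.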

“Goods” flow through networks: Measuring Networks: Flow 26

In addition to the simple probability that one actor passes information on to another (p_ij), two factors affect flow through a network:
Topology
- the shape, or form, of the network
- Example: one actor cannot pass information to another unless they are either directly or indirectly connected
Time
- the timing of contact matters
- Example: an actor cannot pass information he has not yet received
Measuring Networks: Flow 27

Two features of the network’s topology are known to be important: connectivity and centrality.
Connectivity refers to how actors in one part of the network are connected to actors in another part of the network.
- Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another.
- Distance: Given that they can be reached, how many steps are they from each other?
- Number of paths: How many different paths connect each pair?
Measuring Networks: Flow 28

Without full network data, you can’t distinguish actors with limited information potential from those more deeply embedded in a setting. [Figure: example networks a, b, c.] Measuring Networks: Flow 29

Reachability
Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them. Paths can be directed, leading to a distinction between “strong” and “weak” components.
[Figure: example graphs on nodes a through f illustrating reachability.]
Measuring Networks: Flow 30

Basic elements in connectivity
A path is a sequence of nodes and edges starting with one node and ending with another, tracing the indirect connection between the two. On a path, you never go backwards or revisit the same node twice. Example: a → b → c → d
A walk is any sequence of nodes and edges, and may go backwards. Example: a → b → c → b → c → d
A cycle is a path that starts and ends with the same node. Example: a → b → c → a
Reachability Measuring Networks: Flow 31

Reachability If you can trace a sequence of relations from one actor to another, then the two are reachable. If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component. Intuitively, a component is the set of people who are all connected by a chain of relations. Reachability Measuring Networks: Flow 32
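One way to find components in an undirected graph is a breadth-first search from each node not yet visited. The sketch below assumes the graph is stored as an adjacency list (a dict of neighbor lists); the small graph is hypothetical.

```python
from collections import deque


def components(adjacency):
    """Return the connected components of an undirected graph as sorted lists of nodes."""
    seen = set()
    result = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result.append(sorted(members))
    return result


# Hypothetical graph: a chain a-b-c, a pair d-e, and an isolate f.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"], "f": []}
parts = components(graph)
```

Two actors are reachable from each other exactly when they fall in the same list. An isolate forms a component of size one.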

This example contains many components. Reachability Measuring Networks: Flow 33

In general, components can be directed or undirected. For a graph with any directed edges, there are two types of components:
- Strong components consist of the set(s) of all nodes that are mutually reachable.
- Weak components consist of the set(s) of all nodes that are connected when the direction of the ties is ignored.
Reachability Measuring Networks: Flow (hidden) 34

There are only 2 strong components with more than 1 person in this network. Reachability Measuring Networks: Flow (hidden) 35

Distance is measured by the (weighted) number of relations separating a pair.
Actor “a” is:
- 1 step from 4 actors
- 2 steps from 5 actors
- 3 steps from 4 actors
- 4 steps from 3 actors
- 5 steps from 1 actor
Distance & number of paths Measuring Networks: Flow 36
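For unweighted ties, the step counts on this slide correspond to breadth-first search distances. A minimal sketch follows; the graph is the five-node example used earlier in the deck (ties a-b, b-c, c-d, c-e, d-e), not the network pictured on this slide.

```python
from collections import Counter, deque


def distances_from(adjacency, source):
    """Return the number of steps from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c", "e"], "e": ["c", "d"]}
dist_a = distances_from(graph, "a")

# Tally how many actors sit at each distance from "a", as in the slide's table.
steps = Counter(d for node, d in dist_a.items() if node != "a")
```

Nodes missing from the returned dict are unreachable from the source.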

Paths are the different routes one can take. Node-independent paths are particularly important. [Figure: network with nodes a and b marked.] There are 2 independent paths connecting a and b. There are many non-independent paths. Distance & number of paths Measuring Networks: Flow 37

Probability of transfer by distance and number of paths, assuming a constant p_ij.
[Figure: probability of transfer (y-axis) by path distance (x-axis), with one curve each for 10 paths, 5 paths, 2 paths, and 1 path.]
Distance & number of paths Measuring Networks: Flow 38
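The shape of these curves can be approximated with a simple calculation. This is a sketch under explicit assumptions, not necessarily the computation behind the slide: each tie transmits independently with the same probability p, all paths have the same length, and the paths share no ties. The value p = 0.5 is hypothetical.

```python
def transfer_probability(p, distance, n_paths):
    """Chance that a good crosses at least one of `n_paths` independent paths of length `distance`."""
    # A single path works only if every one of its ties transmits.
    single_path = p ** distance
    # Transfer fails only if every path fails.
    return 1 - (1 - single_path) ** n_paths


p = 0.5  # hypothetical per-tie transmission probability
curve_one_path = [transfer_probability(p, d, 1) for d in range(1, 5)]
curve_ten_paths = [transfer_probability(p, d, 10) for d in range(1, 5)]
```

Under these assumptions the probability falls geometrically with distance along a single path, while additional independent paths slow the decline.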

Reachability in Colorado Springs (Sexual contact only) (Node size = log of degree)
- High-risk actors over 4 years
- 695 people represented
- Longest path is 17 steps
- Average distance is about 5 steps
- Average person is within 3 steps of 75 other people
- 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths
39

Centrality refers to (one dimension of) location, identifying where an actor resides in a network. For example, we can compare actors at the edge of the network to actors at the center. In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders. Centrality Measuring Networks: Flow 40

At the individual level, one dimension of position in the network can be captured through centrality. Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated, but substantively we often have reason to believe that people at the center are very important.
Three standard centrality measures capture a wide range of “importance” in a network:
- Degree
- Closeness
- Betweenness
Centrality Measuring Networks: Flow 41

The most intuitive notion of centrality focuses on degree. Degree is the number of ties, and the actor with the most ties is the most important: Centrality Measuring Networks: Flow 42

Degree centrality, however, can be deceiving, because it is a purely local measure. Centrality Measuring Networks: Flow (hidden) 43

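If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality:
Simple: variance of the individual centrality scores.
Or, using Freeman’s general formula for centralization (which ranges from 0 to 1): C_A = \sum_i [C_A(n^*) - C_A(n_i)] / \max \sum_i [C_A(n^*) - C_A(n_i)], where C_A(n^*) is the largest centrality score in the graph and the denominator is the largest value the sum could take in any graph of the same size.
Centrality Measuring Networks: Flow 44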
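Both dispersion measures can be sketched for degree centrality. The sketch assumes an undirected graph with at least three nodes, uses (n-1)(n-2) as the maximum possible sum of differences (the value reached by a star), and takes the population variance; the two small graphs are hypothetical.

```python
def degree_centralization(adjacency):
    """Freeman centralization based on degree, for an undirected graph with 3 or more nodes."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    n = len(degrees)
    top = max(degrees)
    # Sum of gaps between the most central actor and everyone else,
    # divided by the largest possible sum, which a star graph achieves.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def degree_variance(adjacency):
    """Population variance of the degree scores."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    mean = sum(degrees) / len(degrees)
    return sum((d - mean) ** 2 for d in degrees) / len(degrees)


# Hypothetical star: one hub tied to four others.
star = {"hub": ["w", "x", "y", "z"], "w": ["hub"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
# Hypothetical ring: every actor has exactly two ties.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
```

The star is maximally centralized and the ring, where all degrees are equal, is not centralized at all.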

Degree Centralization Scores
[Figure: four example graphs with their scores.]
- Freeman: .07, Variance: .20
- Freeman: 1.0, Variance: 3.9
- Freeman: .02, Variance: .17
- Freeman: 0.0, Variance: 0.0
Centrality Measuring Networks: Flow 45

A second measure of centrality is closeness centrality. An actor is considered important if he/she is relatively close to all other actors. Closeness is based on the inverse of the distance of each actor to every other actor in the network.
Closeness Centrality: C_c(n_i) = [\sum_j d(n_i, n_j)]^{-1}
Normalized Closeness Centrality: C'_c(n_i) = (g - 1) C_c(n_i), where g is the number of actors.
Centrality Measuring Networks: Flow 46
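Closeness can be sketched from breadth-first search distances. The sketch assumes a connected, undirected, unweighted graph (closeness is undefined when some actor cannot be reached) and normalizes by the number of other actors; the star graph is hypothetical.

```python
from collections import deque


def distances_from(adjacency, source):
    """Steps from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adjacency, normalized=True):
    """Inverse of each actor's total distance to all others; assumes a connected graph."""
    n = len(adjacency)
    scores = {}
    for node in adjacency:
        total = sum(distances_from(adjacency, node).values())
        # Normalizing multiplies by (n - 1), so an actor adjacent to everyone scores 1.
        scores[node] = (n - 1) / total if normalized else 1 / total
    return scores


star = {"hub": ["w", "x", "y", "z"], "w": ["hub"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
star_closeness = closeness(star)
```

In the star, the hub is one step from everyone and scores 1.0, while each leaf is one step from the hub and two steps from the other three leaves.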

Closeness Centrality in the examples
[Figure: four example graphs with their scores.]
- C = 1.0
- C = 0.0
- C = 0.36
- C = 0.28
Centrality Measuring Networks: Flow 47

Betweenness Centrality: Model based on communication flow: A person who lies on communication paths can control communication flow, and is thus important. Betweenness centrality counts the number of shortest paths between j and k that actor i resides on. [Figure: example network with nodes a through h.] Centrality Measuring Networks: Flow 48

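Betweenness Centrality: C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}, where g_{jk} = the number of geodesics connecting j and k, and g_{jk}(n_i) = the number of those geodesics that actor i is on. Usually normalized by the maximum possible value: C'_B(n_i) = C_B(n_i) / [(g-1)(g-2)/2], where g is the number of actors. Centrality Measuring Networks: Flow 49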
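The geodesic counts can be sketched with breadth-first search. This is a straightforward illustration for small, undirected, unweighted graphs rather than an efficient algorithm: actor i lies on a geodesic between j and k when d(j,i) + d(i,k) = d(j,k), and the number of such geodesics is the product of the shortest-path counts on each side. The graph is the five-node example used earlier in the deck.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Return (distance, number of shortest paths) from `source` to each reachable node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                paths[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                paths[neighbor] += paths[node]
    return dist, paths


def betweenness(adjacency, normalized=True):
    """Share of geodesics between each pair j, k that pass through actor i, summed over pairs."""
    info = {v: bfs_counts(adjacency, v) for v in adjacency}
    n = len(adjacency)
    scores = {v: 0.0 for v in adjacency}
    for j, k in combinations(adjacency, 2):
        dist_j, paths_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adjacency:
            if i == j or i == k or i not in dist_j:
                continue
            dist_i, paths_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += paths_j[i] * paths_i[k] / paths_j[k]
    if normalized:
        scale = (n - 1) * (n - 2) / 2
        scores = {v: s / scale for v, s in scores.items()}
    return scores


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c", "e"], "e": ["c", "d"]}
raw = betweenness(graph, normalized=False)
```

In this graph c sits on the only geodesics linking a and b to d and e, so it has the highest score even though several actors have similar degree.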

Betweenness Centrality:
[Figure: four example graphs with their betweenness centralization scores.]
- Centralization: 1.0
- Centralization: .31
- Centralization: .59
- Centralization: 0
Centrality Measuring Networks: Flow 50

Centralization:.183 Betweenness Centrality: Centrality Measuring Networks: Flow (hidden) 51

Comparing across centrality values
Generally, the 3 centrality types will be positively correlated. When they are not correlated, it probably tells you something interesting about the network.
- High Degree, Low Closeness: Embedded in cluster that is far from the rest of the network
- High Degree, Low Betweenness: Ego's connections are redundant; communication bypasses him/her
- High Closeness, Low Degree: Key player tied to important/active alters
- High Closeness, Low Betweenness: Probably multiple paths in the network, ego is near many people, but so are many others
- High Betweenness, Low Degree: Ego's few ties are crucial for network flow
- High Betweenness, Low Closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
Centrality Measuring Networks: Flow (hidden) 52

(Node size proportional to betweenness centrality ) Actors that appear very different when seen individually, are comparable in the global network. Centrality Measuring Networks: Flow 53

Two factors that affect network flows: Topology - the shape, or form, of the network - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected Time - the timing of contacts matters - simple example: an actor cannot pass information he has not yet received. Time Measuring Networks: Flow 54

Timing in networks
A focus on contact structure has often slighted the importance of network dynamics, though a number of recent pieces are addressing this. Time affects networks in two important ways:
1) The structure itself evolves, in ways that will affect the topology and thus flow.
2) The timing of contact constrains information flow.
Time Measuring Networks: Flow 55

Data on drug users in Colorado Springs, over 5 years Drug Relations, Colorado Springs, Year 1 Time Measuring Networks: Flow 56

Drug Relations, Colorado Springs, Year 2 Current year in red, past relations in gray Time Measuring Networks: Flow 57

Drug Relations, Colorado Springs, Year 3 Current year in red, past relations in gray Time Measuring Networks: Flow 58

Drug Relations, Colorado Springs, Year 4 Current year in red, past relations in gray Time Measuring Networks: Flow 59

Drug Relations, Colorado Springs, Year 5 Current year in red, past relations in gray Time Measuring Networks: Flow 60

[Figure: hypothetical contact network on nodes A, B, C, D, E, F.] Numbers above lines indicate contact periods. What impact does timing have on flow through the network? Time Measuring Networks: Flow 61

[Figure: the path graph for the hypothetical contact network on nodes A, B, C, D, E, F.] While clearly important, timing is not often handled well by current software. Time Measuring Networks: Flow 62
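The effect of timing on reachability can be sketched with a small simulation. Everything here is an assumption made for illustration: the contacts and their periods are hypothetical (they are not the values in the figure), each contact happens in a single period, ties are undirected, and something received in period t can only be passed on in a strictly later period.

```python
def time_ordered_reach(contacts, source):
    """Return the set of actors that `source` can reach through time-respecting paths."""
    # Period in which each actor first holds the good; the source holds it from the start.
    received = {source: float("-inf")}
    for u, v, t in sorted(contacts, key=lambda contact: contact[2]):
        for giver, taker in ((u, v), (v, u)):
            # The giver must already hold the good before this contact period.
            if giver in received and received[giver] < t and taker not in received:
                received[taker] = t
    return set(received) - {source}


# Hypothetical contacts as (actor, actor, period).
contacts = [("A", "B", 2), ("B", "C", 1), ("C", "D", 3)]

reach_from_a = time_ordered_reach(contacts, "A")
reach_from_c = time_ordered_reach(contacts, "C")
reach_from_d = time_ordered_reach(contacts, "D")
```

Ignoring time, these four actors form one connected chain. With time, A cannot reach C because the B-C contact ended before B heard from A, so the path graph is directed even though every tie is undirected.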

Measuring Networks: Structure & Social Space The second broad division for measuring networks steps back to generalized features of the global network. These factors almost always are of interest because of what they imply about how goods move through the network, but have resulted in a distinct line of methods and substantive research. We focus on 3 such factors today: 1) Basic structure of large-scale networks 2) Cohesive Peer Groups 3) Identifying Role positions (blockmodels) 63

Small World Networks
Based on Milgram’s (1967) famous work, the substantive point is that networks are structured such that even when most of our connections are local, any pair of people can be connected by a fairly small number of relational steps.
Works on 2 parameters:
1) The Clustering Coefficient (C) = average proportion of closed triangles
2) The average distance (L) separating nodes in the network
Measuring Networks: Large-Scale Models 64
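The two parameters can be sketched for a small graph. The sketch makes two labeled choices: the clustering coefficient is taken as the average, over actors, of the share of their contact pairs that are themselves tied (actors with fewer than two contacts count as 0), and the average distance is taken over reachable pairs only. The graph is the five-node example used earlier in the deck.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adjacency):
    """Average share of each actor's contact pairs that are tied to each other."""
    total = 0.0
    for node, neighbors in adjacency.items():
        k = len(neighbors)
        if k < 2:
            continue  # no pairs of contacts to close; contributes 0
        closed = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
        total += closed / (k * (k - 1) / 2)
    return total / len(adjacency)


def average_distance(adjacency):
    """Mean shortest-path length over all ordered pairs that can reach each other."""
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c", "e"], "e": ["c", "d"]}
C = clustering_coefficient(graph)
L = average_distance(graph)
```

A small-world graph combines a large C with a small L relative to a random graph of the same size and density.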

Small World Networks
- C is large: high probability that a node’s contacts are connected to each other.
- L is small: small average distance between nodes.
C = Large and L = Small defines small-world (SW) graphs.
Measuring Networks: Large-Scale Models 65

Small World Networks
In a highly clustered, ordered network, a single random connection will create a shortcut that lowers L dramatically.
Watts demonstrates that small world properties can occur in graphs with a surprisingly small number of shortcuts.
Diffusion / flow implications are unclear, but seem similar to those of random graphs where local clusters are reduced to a single point.
Measuring Networks: Large-Scale Models 66

Across a large number of substantive settings, Barabási points out that the distribution of network involvement (degree) is highly and characteristically skewed. Scale-Free Networks Measuring Networks: Large-Scale Models 67

Many large networks are characterized by a highly skewed distribution of the number of partners (degree) Scale Free Networks Measuring Networks: Large-Scale Models 68

Many large networks are characterized by a highly skewed distribution of the number of partners (degree) Scale Free Networks Measuring Networks: Large-Scale Models 69

The scale-free model focuses on the distance-reducing capacity of high-degree nodes: Scale Free Networks Measuring Networks: Large-Scale Models 70

The scale-free model focuses on the distance-reducing capacity of high-degree nodes, as ‘hubs’ create shortcuts that carry network flow. Scale Free Networks Measuring Networks: Large-Scale Models 71

Colorado Springs High-Risk (Sexual contact only) Network is approximately scale-free, with = -1.3 But connectivity does not depend on the hubs. Scale Free Networks Measuring Networks: Large-Scale Models 72

- White, D. R. and F. Harary. "The Cohesiveness of Blocks in Social Networks: Node Connectivity and Conditional Density." Sociological Methodology 31:
- Moody, James and Douglas R. White. “Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups.” American Sociological Review 68:
- White, Douglas R., Jason Owen-Smith, James Moody, & Walter W. Powell (2004). "Networks, Fields, and Organizations: Scale, Topology and Cohesive Embeddings." Computational and Mathematical Organization Theory 10:
- Moody, James. "The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999." American Sociological Review 69:
Social Cohesion Measuring Networks: Large-Scale Models 73

Formal definition of Structural Cohesion:
(a) A group’s structural cohesion is equal to the minimum number of actors who, if removed from the group, would disconnect the group.
Equivalently (by Menger’s Theorem):
(b) A group’s structural cohesion is equal to the minimum number of independent paths linking each pair of actors in the group.
Social Cohesion Measuring Networks: Large-Scale Models 74
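Definition (a) can be checked directly on small graphs by trying every possible set of removed actors. This brute-force sketch is an illustration only: it is exponential in the number of actors and is not the algorithm used in the cited papers. By convention, a complete graph on n actors, which cannot be disconnected, is given cohesion n - 1. The example graphs are hypothetical.

```python
from itertools import combinations


def is_connected(adjacency, removed=frozenset()):
    """True if the graph stays in one piece after dropping the `removed` actors."""
    remaining = [v for v in adjacency if v not in removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(remaining)


def structural_cohesion(adjacency):
    """Smallest number of actors whose removal disconnects the group (brute force)."""
    n = len(adjacency)
    if not is_connected(adjacency):
        return 0
    for k in range(1, n - 1):
        for cut in combinations(adjacency, k):
            if not is_connected(adjacency, set(cut)):
                return k
    return n - 1  # no cut set exists, as in a complete graph


chain_with_triangle = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c", "e"], "e": ["c", "d"]}
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
complete4 = {i: [j for j in range(4) if j != i] for i in range(4)}
```

The ring has cohesion 2, which matches definition (b): every pair of actors in a ring is linked by exactly two node-independent paths.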

Networks are structurally cohesive if they remain connected even when nodes are removed. [Figure: example networks ordered by node connectivity.] Social Cohesion Measuring Networks: Large-Scale Models 75

Structural cohesion gives rise automatically to a clear notion of embeddedness, since cohesive sets nest inside of each other Social Cohesion Measuring Networks: Large-Scale Models 76

3-Component (n=58) Project 90, Sex-only network (n=695) Social Cohesion Measuring Networks: Large-Scale Models 77

IV Drug Sharing: Connected Bicomponents
- Largest BC: 247
- k > 4: 318
- Max k: 12
Structural Cohesion simultaneously gives us a positional and subgroup analysis.
Social Cohesion Measuring Networks: Large-Scale Models 78

A primary interest in Social Network Analysis is the identification of “significant social subgroups” – some smaller collection of nodes in the graph that can be considered, at least in some senses, as a “unit” based on the pattern, strength, or frequency of ties. There are many ways to identify groups. They all insist on a group being in a connected component, but other than that the variation is wide. Measuring Networks: Cohesive Sub Groups 79

Graph Theoretical Models. Start with a clique. A clique is defined as a maximal subgraph in which every member of the subgraph is connected to every other member of the subgraph. Cliques are collections of nodes where density = 1.0.
Properties of cliques:
- Density: 1.0
- Everyone connected to n-1 alters
- Distance between every pair is 1
- Ratio of within-group ties to between-group ties is infinite
- All triads are transitive
Measuring Networks: Cohesive Sub Groups 80

Graph Theoretical Models. In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in their size. Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See Moody & White (2003) for a discussion of these attempts. Measuring Networks: Cohesive Sub Groups 81

Extensions of this idea include:
- k-core: Every person has ties to at least k other people in the set.
- k-plex: Every member is connected to at least n-k other people in the graph (recall that in a clique everyone is connected to n-1, so this relaxes that condition).
- n-clique: Every person is connected by a path of length n or less (recall that a clique has distance = 1).
- n-clan: same as an n-clique, but all paths must be inside the group.
I’ve never had much luck with any of these methods empirically. Real data is usually too messy to work well. Since many of the graph-theoretic options seem not to work well, authors have used optimization techniques that attempt to identify groups iteratively.
Measuring Networks: Cohesive Sub Groups (hidden) 82
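Of these, the k-core is the simplest to compute: repeatedly drop anyone with fewer than k ties to the people still in the set until nobody else needs dropping. A minimal sketch for an undirected graph stored as an adjacency list follows; the graph is the five-node example used earlier in the deck.

```python
def k_core(adjacency, k):
    """Return the set of actors who each have at least k ties to others in the set."""
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            ties_inside = sum(1 for neighbor in adjacency[node] if neighbor in alive)
            if ties_inside < k:
                # Dropping this actor may push others below k, so loop again.
                alive.discard(node)
                changed = True
    return alive


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"], "d": ["c", "e"], "e": ["c", "d"]}
two_core = k_core(graph, 2)
three_core = k_core(graph, 3)
```

Here actor a is dropped first (one tie), which leaves b with only one tie inside the set, so b is dropped as well and the triangle c, d, e remains.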

Identifying Primary groups:
1) Measures of fit
To identify a primary group, we need some measure of how clustered the network is. Usually, this is a function of the ratio of the number of ties that fall within groups to the number of ties that fall between groups.
2) Algorithmic approaches to maximizing (1)
Once we have such an index, we need a method for searching through the network to maximize the fit.
3) Generalized cluster analysis
In addition to maximizing a group function such as (1), we can use the relational distance directly, and look for clusters in the data. We next go over two different styles of cluster analysis.
Measuring Networks: Cohesive Sub Groups 83

Segregation Index (Freeman, L. C. "Segregation in Social Networks." Sociological Methods and Research). Freeman asked how we could identify segregation in a social network. Theoretically, he argues, if a given attribute (group label) does not matter for social relations, then relations should be distributed randomly with respect to the attribute. Thus, the difference between the number of cross-group ties expected by chance and the number observed measures segregation. Measuring Networks: Cohesive Sub Groups 84

Consider the (hypothetical) network below. There are two attributes in this network: people with Blue eyes and Brown eyes and people who are square or not (they must be hip). Measuring Networks: Cohesive Sub Groups 85

Segregation Index
Mixing Matrix:
          Blue  Brown
Blue         6     17
Brown       17     16

          Hip  Square
Hip        20       3
Square      3      30
Seg (Blue/Brown) = -0.25; Seg (Hip/Square) = 0.78. Measuring Networks: Cohesive Sub Groups

Segregation Index (Hidden). To calculate the expected number of ties, we use the standard formula for a contingency table: row marginal * column marginal / total. In matrix form: E(X) = R*C/T. [Tables: observed and expected Blue/Brown mixing matrices] Measuring Networks: Cohesive Sub Groups 87

Segregation Index (Hidden). [Tables: observed and expected Blue/Brown mixing matrices] E(X) = 27.1; X = (17+17) = 34; Seg = (27.1 - 34) / 27.1 = -6.9 / 27.1 = -0.25. Measuring Networks: Cohesive Sub Groups 88

Segregation Index (Hidden). [Tables: observed and expected Hip/Square mixing matrices] E(X) = 27.1; X = (3+3) = 6; Seg = (27.1 - 6) / 27.1 = 21.1 / 27.1 = 0.78. Measuring Networks: Cohesive Sub Groups 89
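The calculation can be sketched as follows, taking Seg = (E(X) - X) / E(X), where X is the observed number of cross-group ties and E(X) is the number expected from the marginals. The function name is illustrative. The Hip/Square matrix is the one given in the slides; the Brown/Brown cell of the eye-color matrix (16) is an assumption, inferred from the reported E(X) = 27.1 and X = 17 + 17.

```python
def segregation_index(mixing):
    """Seg = (expected cross-group ties - observed) / expected."""
    n = len(mixing)
    total = sum(sum(row) for row in mixing)
    row_tot = [sum(row) for row in mixing]
    col_tot = [sum(col) for col in zip(*mixing)]
    # Expected count per cell: row marginal * column marginal / total.
    expected_cross = sum(row_tot[i] * col_tot[j] / total
                         for i in range(n) for j in range(n) if i != j)
    observed_cross = sum(mixing[i][j]
                         for i in range(n) for j in range(n) if i != j)
    return (expected_cross - observed_cross) / expected_cross

hip_square = [[20, 3],
              [3, 30]]
# Assumption: the Brown/Brown cell (16) is inferred, not read from the slide.
eye_color = [[6, 17],
             [17, 16]]

print(round(segregation_index(hip_square), 2))  # 0.78
print(round(segregation_index(eye_color), 2))   # -0.25
```

A positive value means fewer cross-group ties than chance would produce (segregation); a negative value means more.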

Measuring Networks: Cohesive Sub Groups The segregation index is one metric used to identify groups. Others include:
a) The ratio of in-group to out-group ties (NEGOPY, UCINET Factions)
b) Maximizing the probability of in-group contact (CliqueFinder)
c) The Segregation Matrix Index (SMI)
d) The dyadic factor loadings for overlapping groups (akin to a latent class model)
e) Minimizing the within-group distance
Once a metric has been chosen, some algorithm is needed to search through the graph to identify clusters. These algorithms range from very sophisticated “graph-intelligent” algorithms, such as NEGOPY, to simple cluster analysis of distance matrices. In most cases, you have to pre-set the number of groups to use (the exceptions are NEGOPY and CliqueFinder). Moody’s CROWDS algorithm also has automatic stopping criteria, but you have to give it starting values. 90

Measuring Networks: Cohesive Sub Groups In practice, the different algorithms will give different results. Here, I compare the NEGOPY results to the RNM results. NEGOPY returned one large group, while RNM found many smaller, denser groups. It’s usually a good idea to explore multiple solutions and algorithms. 91

Measuring Networks: Cohesive Sub Groups In practice, the different algorithms will give different results. Here, I compare NEGOPY, FACTIONS and RNM. Groups A and B are identical, and C is close. F, E and D differ. It’s usually a good idea to explore multiple solutions and algorithms. Gagnon Prison Network (all solutions constrained to 6 groups) 92

Overview. Social life can be described (at least in part) through social roles. To the extent that roles can be characterized by regular interaction patterns, we can summarize roles through common relational patterns. Identifying these sets is the goal of block-model analyses. Nadel: The Coherence of Role Systems. Background ideas for White, Boorman and Breiger. Social life as an interconnected system of roles. Important feature: thinking of roles as connected in a role system = social structure. White, Harrison C.; Boorman, Scott A.; and Breiger, Ronald L. Social Structure from Multiple Networks I. American Journal of Sociology. 1976. The key article describing the theoretical and technical elements of block-modeling. Measuring Networks: Role Positions 93

Elements of a Role: Rights and obligations with respect to other people or classes of people. Roles require a ‘role complement’: another person with respect to whom the role-occupant acts. Examples: Parent - child, Teacher - student, Lover - lover, Friend - Friend, Husband - Wife, etc. Nadel (following functional anthropologists and sociologists) defines ‘logical’ types of roles, and then examines how they can be linked together. Measuring Networks: Role Positions 94

White et al: From logical role systems to empirical social structures. Start with some basic ideas of what a role is: an exchange of something (support, ideas, commands, etc.) between actors. Thus, we might represent a family as: [Figure: a family of H, W and three C nodes, linked by three relations: Provides food for, Romantic Love, Bickers with] (and there are, of course, many other relations inside a family!) Measuring Networks: Role Positions 95

The key idea is that we can express a role through a relation (or set of relations) and thus a social system by its inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect? Structural Equivalence: Two actors are structurally equivalent if they have the same types of ties to the same people. Measuring Networks: Role Positions 96
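A minimal sketch of this definition for a single directed relation, assuming the network is stored as a dictionary mapping each actor to the set of actors they send ties to. Ignoring ties between the two actors being compared is one common convention, adopted here as an assumption; the names and data are hypothetical.

```python
def structurally_equivalent(out_ties, i, j):
    """True if i and j send ties to, and receive ties from, the same others."""
    others = set(out_ties) - {i, j}
    same_sent = (out_ties[i] & others) == (out_ties[j] & others)
    received_i = {k for k in others if i in out_ties[k]}
    received_j = {k for k in others if j in out_ties[k]}
    return same_sent and received_i == received_j

# Hypothetical "gives orders to" relation.
out_ties = {
    "boss1": {"w1", "w2"},
    "boss2": {"w3"},
    "w1": set(),
    "w2": set(),
    "w3": set(),
}

print(structurally_equivalent(out_ties, "w1", "w2"))        # same boss
print(structurally_equivalent(out_ties, "w1", "w3"))        # different bosses
print(structurally_equivalent(out_ties, "boss1", "boss2"))  # different workers
```

The two bosses play the same kind of role, yet they are not structurally equivalent because their ties go to different people.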

Structural Equivalence A single relation Measuring Networks: Role Positions 97

Structural Equivalence Graph reduced to positions Measuring Networks: Role Positions 98

Blockmodeling: basic steps In any positional analysis, there are 4 basic steps: 1) Identify a definition of equivalence 2) Measure the degree to which pairs of actors are equivalent 3) Develop a representation of the equivalencies 4) Assess the adequacy of the representation Measuring Networks: Role Positions 99
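For step 2, one common choice (an assumption here, not something the slides prescribe) is the Euclidean distance between two actors' tie profiles: the ties they send and the ties they receive, skipping the cells that involve the pair itself. A distance of 0 indicates exact structural equivalence. The matrix below is hypothetical.

```python
import math

def profile_distance(matrix, i, j):
    """Euclidean distance between the tie profiles of actors i and j."""
    total = 0.0
    for k in range(len(matrix)):
        if k in (i, j):
            continue  # skip the pair's own cells
        total += (matrix[i][k] - matrix[j][k]) ** 2  # ties sent to k
        total += (matrix[k][i] - matrix[k][j]) ** 2  # ties received from k
    return math.sqrt(total)

# Hypothetical directed relation; rows send ties to columns.
# Order: boss1, boss2, w1, w2, w3
matrix = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(profile_distance(matrix, 2, 3))  # w1 vs w2: identical profiles
print(profile_distance(matrix, 2, 4))  # w1 vs w3: they answer to different bosses
```

Clustering actors on the resulting distance matrix is one way to carry out step 3.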

1) Identify a definition of equivalence Structural Equivalence: Two actors are equivalent if they have the same type of ties to the same people. Measuring Networks: Role Positions 100

Automorphic Equivalence: Actors occupy indistinguishable structural locations in the network. That is, they are in isomorphic positions in the network. In general, automorphically equivalent nodes are equivalent with respect to all graph theoretic properties (i.e., degree, number of people reachable, centrality, etc.) Measuring Networks: Role Positions 101

Automorphic Equivalence: Measuring Networks: Role Positions 102

Regular Equivalence: Regular equivalence does not require actors to have identical ties to identical actors or to be structurally indistinguishable. Actors who are regularly equivalent have identical ties to and from equivalent actors. If actors i and j are regularly equivalent, then for all relations and for all actors k, if i → k, then there exists some actor l such that j → l and k is regularly equivalent to l. Measuring Networks: Role Positions 103
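The condition can be checked for a proposed assignment of actors to positions. The sketch below assumes a single directed relation stored as a dictionary of out-tie sets; it tests whether a given partition is regular and does not search for one. All names and data are hypothetical.

```python
def is_regular_partition(out_ties, position):
    """True if actors in the same position tie to and from the same positions."""
    def sent(actor):
        return frozenset(position[k] for k in out_ties[actor])

    def received(actor):
        return frozenset(position[k] for k in out_ties if actor in out_ties[k])

    profiles = {}
    for actor in out_ties:
        profile = (sent(actor), received(actor))
        # Everyone in a position must share the first profile seen for it.
        if profiles.setdefault(position[actor], profile) != profile:
            return False
    return True

# Hypothetical "gives orders to" relation.
out_ties = {
    "boss1": {"w1", "w2"},
    "boss2": {"w3"},
    "w1": set(),
    "w2": set(),
    "w3": set(),
}
position = {"boss1": "B", "boss2": "B", "w1": "W", "w2": "W", "w3": "W"}

# Add a "boss" with no workers: the B position is no longer regular.
out_ties_2 = dict(out_ties, boss3=set())
position_2 = dict(position, boss3="B")

print(is_regular_partition(out_ties, position))
print(is_regular_partition(out_ties_2, position_2))
```

Here boss1 and boss2 are regularly equivalent even though they direct different workers, because each sends ties to some member of the worker position.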

Regular Equivalence: There may be multiple regular equivalence partitions in a network, and thus we tend to want to find the maximal regular equivalence partition, the one with the fewest positions. Measuring Networks: Role Positions 104