Missing data in social networks - Problems and prospects for model-based inference. Johan Koskinen, The Social Statistics Discipline Area.


Missing data in social networks - Problems and prospects for model-based inference Johan Koskinen The Social Statistics Discipline Area, School of Social Sciences Mitchell Centre for Network Analysis Tuesday, 20 December

A relational perspective – networks matter. Dr D eats (predominantly) vegetarian food... why? A vegetarian partner? Ethics? Economics? Health? Taste? (Dr Dean Lusher's relational take.)

A relational perspective – networks matter. If someone close to you is unhappy, will you remain unaffected?

A relational perspective – networks matter. Equal opportunities based on our individual qualities...

A relational perspective – networks matter. Some people bowl alone... others bowl in leagues.

Part 1 Network representations

Social networks. We conceive of a network as a relation defined on a collection of individuals: mary relates to paul (... goes to for advice ...).

Social networks. We conceive of a network as a relation defined on a collection of individuals: mary relates to paul (... considers a friend ...).

Social networks. We conceive of a network as a relation defined on a collection of individuals. The relation is generally binary: tie present (on) or tie absent (off).

Network representations A non-directed graph A social network of tertiary students – Kalish (2003)

Network representations

Police training squad: Confiding network (Pane, 2003)

Network representations World Trade in 1992 Plümper, 2003, JOSS

Network representations: attributes. The actors (nodes) in the network are individuals with attitudes, behaviours, and attributes. These may guide them in their choices of partners, or be shaped (influenced) by their partners. The actors may have individual and collective outcomes.

Network representations: attributes A non-directed graph A social network of tertiary students – Kalish (2003)

Network representations: attributes. A non-directed graph: a social network of tertiary students – Kalish (2003). (Node colours: Jewish / Arab.)

Network representations: attributes. High School friendship, Moody, 2001. (Node colours: white / black / other.)

Network representations: attributes Romantic/sexual relationships at a US high school (Bearman, Moody & Stovel, 2004) Guess the blue and pink

Network representations: attributes. Team structures in training squads (Pane, 2003) – friendship network in 12th week of training. (Node colours: detached / team oriented / positive.)

Multiple relations – entailment, exchange, and generalized exchange. Physical violence. Violence & attitudes among school boys (Lusher, 2003).

Social networks. We conceive of the graph as a collection of tie variables {X_ij : i,j in V} on the vertices i, j, k, l (john, pete, mary, paul):

          i     j     k     l
x =   i [  -   x_ij  x_ik  x_il ]
      j [ x_ji   -   x_jk  x_jl ]
      k [ x_ki  x_kj   -   x_kl ]
      l [ x_li  x_lj  x_lk   -  ]

Social networks. The adjacency matrix: the matrix of the collection of tie variables {X_ij : i,j in V}.

Social networks: adjacency matrix. Read's Highland tribes.

Social networks: adjacency matrix. Read's Highland tribes – symmetric for a non-directed network.

Social networks: adjacency matrix. Read's Highland tribes – zeroes along the diagonal: self ties are not permitted.
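These conventions can be checked mechanically. A minimal sketch in Python (numpy) with an illustrative tie list, not the Read data:

```python
import numpy as np

# Hypothetical non-directed network on four actors.
actors = ["mary", "paul", "pete", "john"]
index = {name: i for i, name in enumerate(actors)}

ties = [("mary", "paul"), ("mary", "pete"), ("pete", "john")]

n = len(actors)
x = np.zeros((n, n), dtype=int)
for a, b in ties:
    i, j = index[a], index[b]
    x[i, j] = x[j, i] = 1  # symmetric: non-directed ties

assert np.array_equal(x, x.T)   # symmetric for a non-directed network
assert np.all(np.diag(x) == 0)  # zero diagonal: self ties not permitted
print(x.sum() // 2)             # number of edges
```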

Part 2 Analysing social networks – Putting the building blocks of networks together using ERGM

Do we need to analyse networks? Is the network a unique narrative – should we stick to an ethnography? Possible answers: detecting systematic tendencies; social mechanisms; lifting the description so as to describe the network in generalizable terms.

Networks matter – ERGMs matter. 6018 grade 6 children, 1966. (Node colours: female / male.)

Networks matter – ERGMs matter. 6018 grade 6 children, 1966 – 300 schools, Stockholm.

Networks matter – ERGMs matter. 6018 grade 6 children, 1966 – 200 schools, Stockholm. Koskinen and Stenberg (in press), JEBS.

Do we need to analyse networks? Is the network a unique narrative – should we stick to an ethnography? Possible answers: detecting systematic tendencies; social mechanisms; lifting the description so as to describe the network in generalizable terms. Conceptualising the network as a graph is what enables this.

ERGMs – modelling graphs

ERGMs – modelling graphs: example. Marriage network of Padgett's Florentine families.

ERGMs – modelling graphs: example. Marriage network of Padgett's Florentine families. Model this as a combination of 4 local structures; their importance is measured by their parameters.

ERGMS – modelling graphs: example effectMLES.E. Edge star star Triangle

ERGMS – modelling graphs: example effectMLES.E. Edge star star Triangle

Part 3 Modelling graphs – deriving building blocks out of dependencies

Independence - Deriving the ERGM. Think of each tie variable as a coin flip: heads or tails for the dyad { i,l }, heads or tails for the dyad { i,k }.

Independence - Deriving the ERGM. Two coins, an AUD coin and a SEK coin, each with a fixed probability of heads: knowledge of the AUD outcome does not help us predict the SEK outcome.

Independence - Deriving the ERGM. Likewise, under independence, knowledge of the dyad { i,l } does not help us predict the dyad { i,k }, even though the two dyads have vertex i in common.

Independence - Deriving the ERGM. May we find a model such that knowledge of one outcome does help us predict the other – that is, such that tie variables sharing a vertex are dependent?

Deriving the ERGM: From Markov graph to Dependence graph. john, pete, mary, paul. Consider the tie variables that have Mary in common – how may we make these dependent?

Deriving the ERGM: From Markov graph to Dependence graph. [Figure, built up over several slides: each tie variable – (m,pa), (m,pe), (m,j), (pa,pe), (pa,j), (pe,j) – becomes a node of the dependence graph.]

Deriving the ERGM: From Markov graph to Dependence graph. Tie variables: (m,pa), (pa,pe), (pa,j), (m,pe), (pe,j), (m,j). The probability structure of a Markov graph is described by the cliques of the dependence graph (Hammersley–Clifford)…
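Stated in symbols (standard ERGM theory, not text from the slide): for a Markov graph, Hammersley–Clifford yields a model whose statistics are the counts of the clique configurations of the dependence graph – edges, stars and triangles:

```latex
\Pr(X = x) \;=\; \frac{1}{\kappa(\theta)}
\exp\Bigl\{\, \theta\, L(x) \;+\; \sum_{k \geq 2} \sigma_k S_k(x) \;+\; \tau\, T(x) \Bigr\},
```

where L(x) is the number of edges, S_k(x) the number of k-stars, T(x) the number of triangles, and κ(θ) the normalising constant.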

Deriving the ERGM: From Markov graph to Dependence graph. [Figure, built up over several slides: the cliques of the dependence graph – single tie variables, pairs of tie variables sharing an actor, and larger cliques corresponding to stars and triangles – are highlighted in turn.]

From Markov graph to Dependence graph – all distinct subgraphs? That gives too many statistics (parameters).

The homogeneity assumption: parameters of isomorphic configurations are set equal.

A log-linear model (ERGM) for ties, aggregated to a joint model for the entire adjacency matrix, with interaction terms in the log-linear model for each configuration type.

A log-linear model (ERGM) for ties. By the definition of (in)dependence: e.g. ties i–j and i–k co-occurring more often than is explained by the margins – main effects plus an interaction term.

Part 4 Estimation of ERGM

Likelihood equations for exponential families. Aggregated to a joint model for the entire adjacency matrix X; the normalising constant is a sum over all 2^{n(n-1)/2} graphs. The MLE solves the likelihood equation (cf. Lehmann, 1983):
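The equations on this slide were images in the original; a hedged reconstruction of the standard exponential-family form they refer to:

```latex
\Pr_\theta(X = x) \;=\; \frac{\exp\{\theta^{\top} z(x)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) \;=\; \sum_{y} \exp\{\theta^{\top} z(y)\},
```

with the sum running over all 2^{n(n-1)/2} graphs y, and the MLE solving the moment equation E_θ[z(X)] = z(x_obs).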

Likelihood equations for exponential families. Solving: using the cumulant generating function (Corander, Dahmström, and Dahmström, 1998); stochastic approximation (Snijders, 2002, based on Robbins–Monro, 1951); importance sampling (Handcock, 2003; Hunter and Handcock, 2006, based on Geyer–Thompson, 1992).

Robbins–Monro algorithm. Solving the likelihood equation: Snijders (2002) algorithm – initialisation phase; main estimation; convergence check and calculation of standard errors. MAIN: draw graphs using MCMC.
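A toy sketch of the Robbins–Monro iteration. To keep it self-contained, the MCMC draw in the MAIN step is replaced by exact simulation from a Bernoulli graph model whose only statistic is the edge count; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20
m = n * (n - 1) // 2          # number of tie variables
z_obs = 60                    # observed edge count (hypothetical data)

def simulate_edges(theta):
    """Edge count of a graph drawn exactly from the toy Bernoulli model."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return rng.binomial(m, p)

theta = 0.0                    # the initialisation phase would refine this
D = m / 4.0                    # rough scaling (Fisher information at p = 1/2)
for t in range(1, 3001):
    a_t = 1.0 / t              # step sizes satisfying Robbins-Monro conditions
    z_sim = simulate_edges(theta)
    theta = theta - a_t * (z_sim - z_obs) / D   # main estimation step

# The MLE solves E[z] = z_obs, i.e. theta = logit(60/190) here.
print(round(theta, 2))
```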

Geyer–Thompson. Solving the likelihood equation: Handcock (2003), approximate Fisher scoring. MAIN: the likelihood is approximated using an importance sample from MCMC.

Bayes: dealing with the likelihood. The normalising constant of the posterior is not essential for Bayesian inference – all we need is the posterior up to proportionality… but the likelihood itself contains a sum over all 2^{n(n-1)/2} graphs.

Bayes: MCMC? Consequently, in e.g. Metropolis–Hastings, the acceptance probability of a move to θ* contains the intractable ratio of normalising constants.

Bayes: Linked Importance Sampler Auxiliary Variable MCMC LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): Based on Møller et al. (2006), we define an auxiliary variable And produce draws from the joint posterior using the proposal distributions and

Bayes: alternative auxiliary variable LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): Based on Møller et al. (2006), we define an auxiliary variable Improvement: use exchange algorithm (Murray et al. 2006) Many linked chains: - Computation time - storage (memory and time issues) and Accept θ* with log-probability: Caimo & Friel, 2011

Bayes: implications of using the alternative auxiliary variable. Improvement: use the exchange algorithm (Murray et al., 2006) and accept θ* with the exchange log-probability (Caimo & Friel, 2011). Advantages: storing only parameters; no pre-tuning – no need for good initial values; standard MCMC properties of the sampler; less sensitive to near-degeneracy in estimation; easier than anything else to implement. QUICK and ROBUST.
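A sketch of the exchange-algorithm accept step, again under a toy Bernoulli model (single edge statistic, flat prior assumed) so that the auxiliary graph can be simulated exactly; in a real ERGM the auxiliary draw would come from a long MCMC run:

```python
import numpy as np

rng = np.random.default_rng(1)

n, z_obs = 20, 60
m = n * (n - 1) // 2

def z_sim(theta):
    """Edge count of a graph drawn exactly from the toy model at theta."""
    return rng.binomial(m, 1.0 / (1.0 + np.exp(-theta)))

theta, draws = 0.0, []
for _ in range(5000):
    theta_star = theta + 0.1 * rng.standard_normal()     # symmetric proposal
    z_star = z_sim(theta_star)                           # auxiliary graph at theta*
    # Exchange acceptance: the normalising constants kappa(theta) cancel.
    log_alpha = (theta_star - theta) * (z_obs - z_star)  # flat prior assumed
    if np.log(rng.random()) < log_alpha:
        theta = theta_star
    draws.append(theta)

posterior = np.array(draws[1000:])   # discard burn-in
print(round(posterior.mean(), 2))    # near the MLE logit(60/190) under a flat prior
```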

Part 5 Types of missing data

Sampling in/on networks

Sampling in/on networks. (Figure legend: missing data / observed data.)

Sampling in/on networks. [Figure, built up over several slides: the adjacency matrix is filled in for the sampled nodes; tie variables involving non-sampled nodes remain unknown (?).]

Ignoring non-sampled nodes? [Adjacency matrix with ? entries for ties to non-sampled nodes.]

What about alter–alter ties across ego? [Adjacency matrix with ? entries for the alter–alter tie variables.]

School classes

Multilevel attribute models. If the network is treated like another level: groups; group indicators; networks in groups (scaled); with random intercepts.

Empirical setup. [Model equation shown as an image.]

Problem of boundary specification By design – children do not nominate alters outside of school class

Problem of boundary specification By design – children do not nominate alters outside of school class Out of school To other school class

Multilevel autocorrelation/nef models. [Model equation shown as an image.]

[Figure: matrix with unknown (?) entries.]

Part 6 Estimation of ERGM with missing data

Model-assisted treatment of missing network data. If you don't have a model for what you have observed, how are you going to say something about what you have not observed, using what you have observed? (Figure legend: missing data / observed data.)

Model-assisted treatment of missing network data: importance sampling (Handcock & Gile, 2010; Koskinen, Robins & Pattison, 2010); stochastic approximation and the missing data principle (Orchard & Woodbury, 1972) (Koskinen & Snijders, forthcoming); Bayesian data augmentation (Koskinen, Robins & Pattison, 2010).

What about alter–alter ties across ego? Available case analysis: pretend the missing data do not exist. (Figure legend: missing data / observed data.)

The principled approach in the ERGM framework: we have to simulate the missing data (the complement of the observed data) and pool our inferences. (Figure legend: missing data / observed data.)

Subgraphs of ERGMs are not ERGMs. Dependence in ERGM: ties i–j and i–k may be dependent; we may also have dependence between i–j and k–l. But if node k is not observed, we should include counts of additional configurations. Marginalisation (Snijders, 2010; Koskinen et al., 2010).

Bayesian Data Augmentation. With missing data: simulate parameters and, in each iteration, simulate the missing part of the graph.

Bayesian Data Augmentation. Simulate the most likely missing data given the current parameters.

Bayesian Data Augmentation. Simulate the most likely parameters given the current missing data.

Bayesian Data Augmentation. Alternate the two steps: and so on… until we have draws from the joint posterior.

What does it give us? The distribution of the parameters and the distribution of the missing data. Subtle point: the missing data does not depend on the parameters (we don't have to choose parameters to simulate the missing data).

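The alternating scheme can be sketched for the toy Bernoulli model with some tie variables missing: the imputation step draws the missing ties from their conditional distribution given the current θ, and the θ-step is a Metropolis update on the completed graph (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy Bernoulli graph: of m = 190 tie variables, 40 are missing;
# 45 ties observed among the 150 observed dyads (hypothetical numbers).
m, m_obs, z_obs, m_mis = 190, 150, 45, 40

def loglik(theta, z, m):
    """Bernoulli-graph log-likelihood; normaliser is tractable here."""
    return theta * z - m * np.log1p(np.exp(theta))

theta, draws = 0.0, []
for _ in range(5000):
    # Step 1: impute the missing ties from their conditional given current theta.
    p = 1.0 / (1.0 + np.exp(-theta))
    z_mis = rng.binomial(m_mis, p)
    z_full = z_obs + z_mis
    # Step 2: update theta given the completed graph (Metropolis step).
    theta_star = theta + 0.3 * rng.standard_normal()
    if np.log(rng.random()) < loglik(theta_star, z_full, m) - loglik(theta, z_full, m):
        theta = theta_star
    draws.append(theta)

print(round(np.mean(draws[1000:]), 2))  # centred near logit(45/150)
```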
Part 7 Estimation of ERGM with missing data - Example Missing ties

Bayesian Data Augmentation – Lazega's (2001) Lawyers. Collaboration network among 36 lawyers in a New England law firm (Lazega, 2001). (Node legend: Boston office / Hartford office / Providence office; least senior to most senior.)

Bayesian Data Augmentation – Lazega's (2001) Lawyers. Model with 133 edges; statistics for seniority, practice (main effect, b_i = 1 if i corporate, 0 if litigation), homophily on sex, office, and practice, and GWESP terms t_1, t_2, t_3, etc. [Parameter values shown as images.]

Bayesian Data Augmentation. Lazega's (2001) Lawyers – ERGM posteriors (Koskinen, 2008).

Bayesian Data Augmentation. Cross-validation (Koskinen, Robins & Pattison, 2010): remove 200 of the 630 dyads at random; fit an inhomogeneous Bernoulli model and obtain the posterior predictive tie-probabilities for the missing tie variables; fit an ERGM and obtain the posterior predictive tie-probabilities for the missing tie variables (Koskinen et al., in press); fit Hoff's (2008) latent variable probit model with linear predictor βᵀz(x_ij) + wᵢᵀwⱼ; repeat many times.

Bayesian Data Augmentation ROC curve for predictive probabilities combined over 20 replications (Koskinen et al. 2010)

Part 8 Estimation of ERGM with missing data - Sampled data and covert actors

Bayesian Data Augmentation – Snowball sampling. The snowball sampling design is ignorable for ERGM (Thompson and Frank, 2000; Handcock & Gile, 2010; Koskinen, Robins & Pattison, 2010)... but snowball sampling is rarely used when the population size is known... Using the Sageman (2004) clandestine network as a test-bed for unknown N.

Bayesian Data Augmentation – the Sageman (2004) N = 366 network.

Bayesian Data Augmentation – the Sageman (2004) N = 366 network. Take a seed of size n = 120 and snowball out 1 wave; additional nodes m = 160.
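The design can be sketched as follows; a small random graph stands in for the Sageman network, and the seed set is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative random graph standing in for the clandestine network.
N, p = 50, 0.08
upper = rng.random((N, N)) < p
x = np.triu(upper, 1)
x = (x | x.T).astype(int)          # symmetric, zero diagonal

def snowball(x, seed, waves=1):
    """Nodes reached by snowballing out the given number of waves."""
    sampled = set(seed)
    frontier = set(seed)
    for _ in range(waves):
        neighbours = {int(j) for i in frontier for j in np.flatnonzero(x[i])}
        frontier = neighbours - sampled   # newly reached nodes this wave
        sampled |= frontier
    return sampled, frontier

seed = list(range(10))                    # seed of size n = 10
sampled, wave1 = snowball(x, seed, waves=1)
print(len(seed), len(wave1), len(sampled))
```

With one wave, the sampled set is the seed plus the first-wave nodes, so the population size is bounded below by their sum, mirroring the N ≥ n + m bound on the slides.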

Bayesian Data Augmentation – the Sageman (2004) N = 366 network. Seed n = 120, first wave m = 160, so N ≥ 280.

Bayesian Data Augmentation – the Sageman (2004) N = 366 network. Seed n = 120, first wave m = 160, N ≥ 280. Assume in turn N = 281, 291, 301, 311, …, 391, 396, 399.

Bayesian Data Augmentation – the Sageman (2004) N = 366 network. [Figure: credibility intervals for the parameters across the assumed values of N.]

Bayesian Data Augmentation – the Sageman (2004) N = 366 network. Seed n = 120, first wave m = 160. [Figure: prediction intervals for graph statistics under a Bernoulli model and the ERGM, with observed values marked.]

Bayesian Data Augmentation – Snowball sampling: next steps. We can fit and predict missing data conditional on N. Next: marginalise with respect to N, and estimate N – use a path sampler; take the combinatorics of the zero block into account.

Part 9 Further issues

How large networks can we allow for? Large N: ERGMs do not scale up (cf. the missing data experiments); lots of unobserved data means lots of unobserved covariates; computational issues – time and memory; heterogeneity…

How large networks can we allow for? ERGMs typically assume homogeneity. (A) Block modelling and ERGM (Koskinen, 2009). (B) Latent class ERGM (Schweinberger & Handcock).

Solutions and future directions. Ignoring unknown N: the conditional MLE for a snowball sample does not require knowledge of N (sic!) (Pattison et al., in preparation). Estimating N: Bernoulli assumptions (Frank and Snijders, 1994, JOS); using ERGM and Bayes factors? (Koskinen et al., in preparation); using heuristic GOF, posterior predictive distributions, re-sampling and copulas (?).

Wrap-up. ERGMs: increasingly being used; increasingly being understood; increasingly able to handle imperfect data (also missing link prediction). Methods: plenty of open issues; Bayes is the way of the future. Legitimacy and dissemination: e.g. Lusher, Koskinen, Robins, Exponential Random Graph Models for Social Networks, CUP, 2011.