Tutorial: Analyzing real network data

Presentation transcript:

1) Creating data from survey
You can download all of the needed files from here:

This is (modified) data from one of the Add Health schools. I’ve changed the data somewhat for security reasons. We’ll walk through some of the data coding issues, create measures and figures, and then run peer influence and structural models on the network.

Outline:
- From survey to analysis files
- Exploring the network: visualization
- Network Behavior & Peer Influence Models
  - Network structure as independent variable
  - Peer influence models
  - Dyad similarity models
- Network Structure analyses
  - Clustering for peer groups
  - Block models
- Statistical Models for networks (STATNET)

This is what students filled out in the Add Health in-school survey: one set of nominations for male friends, another for female friends. This is the foundation of our data. The result is a nomination data file that looks something like this (actual numbers changed). We want to turn this file into something PAJEK, UCINET, etc. can read. Open “netcreate.sas” and walk through the logic of the file.

Netcreate.sas uses files from SPAN to create PAJEK files. PAJEK files have a fixed structure that is easy to program for; see the PAJEK support files for details. There are programs that convert Excel or text files to PAJEK format, and UCINET (and STATNET, sort of) can read PAJEK .net files.
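To make the "fixed structure" concrete, here is a minimal sketch in Python (not the tutorial's netcreate.sas, which is not reproduced here) that turns a nomination list into PAJEK .net text. The function name `to_pajek` and the sample IDs are hypothetical; the layout assumed is the basic one, a `*Vertices n` header, one numbered and labelled vertex per line, then `*Arcs` with one "sender receiver weight" line per tie.

```python
def to_pajek(nominations):
    """Return Pajek .net text for a list of (ego_id, alter_id) nominations."""
    # Pajek vertices are numbered 1..n, so map the survey IDs
    # to consecutive integers in sorted order.
    ids = sorted({node for pair in nominations for node in pair})
    index = {node: i for i, node in enumerate(ids, start=1)}

    lines = [f"*Vertices {len(ids)}"]
    # Keep the original survey ID as the vertex label.
    lines += [f'{index[node]} "{node}"' for node in ids]
    lines.append("*Arcs")
    # One directed arc per nomination, each with weight 1.
    lines += [f"{index[ego]} {index[alter]} 1" for ego, alter in nominations]
    return "\n".join(lines) + "\n"


# Hypothetical nominations: student 101 names 102 and 103, and so on.
nominations = [(101, 102), (101, 103), (102, 101), (103, 102)]
net_text = to_pajek(nominations)
print(net_text)
```

Writing `net_text` to a file with a .net extension should give something PAJEK can open; keep the sorted ID list, since it defines the vertex order of everything PAJEK later exports.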

2) Exploring the network graphically
I think it’s extremely useful to simply “play” with the network in various ways and get a sense of its shape. This is perhaps PAJEK’s most useful feature. Load a network and work through good and bad plots.

Once you have a network, how do you create a print-ready image?
a) Screen shots (good for .ppt)
b) Export to .ps or FLASH and edit in Illustrator

3) Network Behavior & Peer Influence
We often want to know how some simple features of network position affect students. These are “network behavior” models, where some indicator of network position is used to predict an outcome. One should think carefully about a theoretical model here, since cause is often very difficult to disentangle. Here we’ll leave those questions aside and simply look for correlates of network position in behavior. We’ll look at:
a) network volume (degree)
b) centrality (closeness)
c) local reciprocity (proportion of ties ego sends that are reciprocated)
We can get most of these from either SAS or PAJEK, though I’m not sure PAJEK can give you node-level reciprocity rates. Paj_nodestatread.sas is the SAS file.
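The three node-level measures listed above can be sketched in a few lines of Python. This is an illustration, not the contents of Paj_nodestatread.sas. It rests on two assumptions that should be checked against whatever software you use: closeness is computed on the symmetrized graph as (nodes reached) / (sum of geodesic distances to them), and reciprocity is undefined (`None`) for students who sent no ties.

```python
from collections import deque


def node_stats(arcs):
    """Degree, closeness, and local reciprocity from directed (ego, alter) arcs."""
    nodes = sorted({n for arc in arcs for n in arc})
    out_nbrs = {n: set() for n in nodes}
    in_nbrs = {n: set() for n in nodes}
    for ego, alter in arcs:
        out_nbrs[ego].add(alter)
        in_nbrs[alter].add(ego)

    stats = {}
    for n in nodes:
        # Breadth-first search on the symmetrized graph gives geodesic distances.
        dist = {n: 0}
        queue = deque([n])
        while queue:
            cur = queue.popleft()
            for nxt in out_nbrs[cur] | in_nbrs[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        reached = len(dist) - 1
        sent = out_nbrs[n]
        stats[n] = {
            "outdeg": len(sent),
            "indeg": len(in_nbrs[n]),
            # Assumed definition: nodes reached divided by total distance.
            "closeness": reached / total if total else 0.0,
            # Share of ego's sent ties that are sent back.
            "recip": len(sent & in_nbrs[n]) / len(sent) if sent else None,
        }
    return stats


# Hypothetical arcs: 1 and 2 name each other, 1 names 3, 3 names 4.
arcs = [(1, 2), (2, 1), (1, 3), (3, 4)]
stats = node_stats(arcs)
print(stats[1])
```

Each row of `stats` can then be merged onto the student-level file by ID and used as a predictor in an ordinary regression.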

Paj_nodestatread.sas is the SAS file. After running some models we get:

Open nodestats1.sas to see how to code these same stats, plus a few more, in SAS.

QAP is an alternative method that doesn’t make as many strong assumptions about the model. To use QAP, we can run it in SAS (but it’s slow and basic), or export to UCINET (which is fast, sophisticated and all that jazz). The “qapstats.sas” file moves the data for us.
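For readers who have not seen QAP before, the core idea is a permutation test on matrices: correlate the off-diagonal cells of two networks, then relabel the nodes of one network (rows and columns together) many times to see how often chance alone gives a correlation that large. The Python sketch below illustrates that bivariate case only. It is not qapstats.sas or UCINET's routine, and the function names, sample matrices, and one-sided p-value are assumptions made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def off_diagonal(m, order):
    """Off-diagonal cells of m after relabeling nodes by `order`."""
    n = len(m)
    return [m[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def qap_correlation(x, y, n_perm=1000, seed=0):
    """Observed matrix correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(x)))
    x_cells = off_diagonal(x, identity)
    observed = pearson(x_cells, off_diagonal(y, identity))
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        # Rows and columns of y move together, so the structure of y is
        # preserved and only its alignment with x is broken.
        if pearson(x_cells, off_diagonal(y, order)) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm


# Hypothetical 5-student matrices: friendship ties and shared club membership.
friends = [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 0, 0],
           [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
clubs = [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
r, p = qap_correlation(friends, clubs, n_perm=500, seed=1)
print(r, p)
```

Because whole rows and columns are permuted, the test respects the dependence among cells that share a node, which is the assumption an ordinary regression on dyads would violate.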

We can also estimate the network autocorrelation model directly. We can get “QAD” estimates just by adding the W*Y term to the base model, which typically performs fairly well. Open peerinfl1.sas to see this routine. Alternatively, UCINET calculates a simple network correlation between any vector (Nx1) and any matrix (NxN) to estimate the bivariate peer effect, and Carter Butts’ LNAM routine in R (part of the sna package) lets you run a full linear network autocorrelation model.
For stats details:
Leenders, T.Th.A.J. (2002) “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1).
Anselin, L. (1988) Spatial Econometrics: Methods and Models. Norwell, MA: Kluwer.
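The W*Y term mentioned above is simply each student's weighted sum of friends' outcomes. A minimal Python sketch, assuming a row-normalized W (each nominated friend gets weight 1/outdegree, so W*Y is the mean outcome among ego's friends) and a value of 0 for students who named no one; whether peerinfl1.sas makes the same choices is not confirmed here, and the names below are hypothetical.

```python
def peer_term(friends, y):
    """W*Y with row-normalized W: mean outcome among each ego's named friends."""
    wy = {}
    for ego, alters in friends.items():
        # An all-zero row of W (no nominations) contributes 0 by assumption.
        wy[ego] = sum(y[a] for a in alters) / len(alters) if alters else 0.0
    return wy


# Hypothetical data: who each student named, and each student's fight score.
friends = {1: [2, 3], 2: [1], 3: []}
fights = {1: 0.0, 2: 4.0, 3: 2.0}
wy = peer_term(friends, fights)
print(wy)
```

Adding `wy` as one more column in the base regression gives the quick estimate; the full model estimates the peer coefficient jointly with the rest, which is what LNAM does.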

To run the R version, we need to export the data. We can get started using the send2r.mac routine and reshape some of the files. The SAS program “sas2r_peerinfl.sas” creates the needed external files, and “lname_example.r” is the R script. Run the example models.

Result of “fights” as Y, friendship as W1, club overlap as W2:
Call: lnam(y = fights, x = cv, W1 = w1, W2 = clbs)
Coefficients (columns: Estimate, Std. Error, Z value, Pr(>|z|); numeric values not recoverable):
FEMALE *
WHITE
S… ***
rho ***
rho

Getting measures from PAJEK. PAJEK output files carry no ID link back to the nodes: they are simply text files, so sort order matters. The basic routine to get any measure in PAJEK is to create the measure using the dropdown menus, then save the files and read them into SAS, SPSS, or whatever stats program you use. Open the PAJEK files and create in-degree, out-degree, closeness centrality, and reciprocity.
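Because the saved files carry no IDs, the merge back to the survey data has to go by position. A minimal Python sketch of that step, assuming a PAJEK vector file with a `*Vertices n` header followed by one value per line; the helper names and the sample values are hypothetical.

```python
def read_vec(text):
    """Parse Pajek vector text: a '*Vertices n' header, then one value per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, values = lines[0], lines[1:]
    if not header.lower().startswith("*vertices"):
        raise ValueError("missing *Vertices header")
    if len(values) != int(header.split()[1]):
        raise ValueError("number of values does not match the header")
    return [float(v) for v in values]


def attach_ids(ids_in_net_order, values):
    """Pair each value with the ID at the same position in the .net file."""
    if len(ids_in_net_order) != len(values):
        raise ValueError("ID list and vector have different lengths")
    return dict(zip(ids_in_net_order, values))


# Hypothetical in-degree vector saved from Pajek, plus the survey IDs
# in the same order used when the .net file was written.
vec_text = "*Vertices 3\n 2.0000\n 0.0000\n 1.0000\n"
indegree = attach_ids([101, 102, 103], read_vec(vec_text))
print(indegree)
```

The length checks are the only protection available: if the ID list is sorted differently from the .net file, the merge will run without error and silently assign values to the wrong students.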

4) Network Structure: Clustering the network
As part of the description, we often want to identify significant clusters in the network. There are lots of ways to do this; we’ll sample a few:
a) Using UCINET’s routines
b) Clustering a distance matrix (SAS)
c) The “Jiggle” routine (SAS, Moody)
d) The “Crowds” algorithm
e) Using PAJEK’s blockmodel routine to fine-tune a peer group model

Clustering in UCINET
- I find it simplest to read PAJEK files. The best “general” routine is then FACTIONS, though it is slow for large nets (100s of nodes). It is very effective for small nets.
- In a pinch, CONCOR will often yield reasonable peer groups, and it’s faster in UCINET.
Clustering in SAS
- We can often get a quick starting point by simply running a hierarchical clustering on the distance matrix. This is a fair place to start for nets with 100s of nodes.
- Two algorithms that work fairly well are “Jiggle” for large nets and “Crowds” for smaller nets. Both work by extending the RNM approach of Moody (2001), but Jiggle is faster for large nets, while Crowds includes more checks for particular structures (like biconnected sets) and thus is slower.
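The "hierarchical clustering on the distance matrix" starting point can be sketched as follows in Python. This is an illustration only, not the tutorial's SAS code: it assumes geodesic distances on the symmetrized network, average linkage, a fixed number of clusters `k`, and the value n as a stand-in distance for unreachable pairs. Jiggle and Crowds are not reproduced here.

```python
from collections import deque


def geodesics(n, edges):
    """All-pairs geodesic distances on an undirected graph with nodes 0..n-1."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # n is larger than any real geodesic, so it marks "unreachable".
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            cur = queue.popleft()
            for nxt in nbrs[cur]:
                if dist[s][nxt] == n:
                    dist[s][nxt] = dist[s][cur] + 1
                    queue.append(nxt)
    return dist


def average_linkage(dist, k):
    """Merge the two closest clusters (by mean distance) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a][b] for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical net: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
groups = average_linkage(geodesics(6, edges), k=2)
print(groups)
```

On this toy net the two triangles come out as the two groups. The pairwise search is slow, which is consistent with treating this as a quick starting point for nets of a few hundred nodes, not a final answer.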

Clustering in PAJEK
PAJEK doesn’t have a dedicated clustering routine for finding peer groups in nets, but you can coerce the blockmodel routine to find block-diagonal structures (slow) or use some of its neighboring partitions. Keep an eye on this, as I bet they implement Newman’s algorithm soon. Let’s try running some of these.

Sample results
This is the resulting graph from a “Jiggle” run on the school net. Note that this is a randomized algorithm, so you will get different results from different runs.

Sample results
This is the resulting graph from a “Crowds” run on the school net. We end up with smaller clusters and a larger “background” set. By construction, the clusters must be bi-connected, so they are “rounder” than in the prior algorithm.

4) Network Structure: Block modeling a network
Sample results
The most commonly used blockmodel routine is CONCOR, which is simple and fast. The result is a set of nested “splits” to some pre-specified depth. Here I apply it to the school net, working to a depth of 3 splits.
Split 1
Split 2: note that the 2nd split in the bottom half captures a “periphery” position.
Split 3
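A single CONCOR split can be sketched in Python as follows. CONCOR repeatedly correlates the actors' tie profiles (the correlation matrix of the correlation matrix, and so on) until every entry is +1 or -1, then splits the actors by sign. The sketch assumes each profile is the actor's row followed by the actor's column of the adjacency matrix, diagonal included; UCINET's handling of the diagonal and its stopping rule may differ, and the function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def concor_split(adj, max_iter=100, tol=1e-9):
    """One CONCOR split: iterate correlations to +/-1, then divide by sign."""
    n = len(adj)
    # Profile of actor i: ties sent (row i) followed by ties received (column i).
    profiles = [adj[i] + [adj[j][i] for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        corr = [[pearson(p, q) for q in profiles] for p in profiles]
        if all(abs(abs(c) - 1.0) < tol for row in corr for c in row):
            break
        # Next round correlates the rows of the current correlation matrix.
        profiles = corr
    first = [i for i in range(n) if corr[0][i] > 0]
    second = [i for i in range(n) if corr[0][i] <= 0]
    return first, second


# Hypothetical net: two separate cliques of three students each.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
block_a, block_b = concor_split(adj)
print(block_a, block_b)
```

The nested splits described above come from applying the same function again to the sub-matrix of each block, stopping at the chosen depth.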

More in keeping with the spirit of the original block modeling papers, “regular equivalence” models are less likely to generate block-diagonal models. A simple positional model is the “core-periphery” model, which searches for a single “core” in the net. Since we know this net is split into two “wings”, we’ll just look within one of them.

Another simple way to get at positions in a network is to compare nodes across a vector of triad positions. In a directed network, the vector giving the count of which positions an actor is part of nicely summarizes the type of role the actor plays in the net.

Figure: Triadic Position Census, 36 positions within 16 directed triads. Position labels shown: …_S 012_E 012_I 102_D 102_I 021D_S 021D_E 021U_S 021U_E 021C_S 021C_B 021C_E 111D_S 111D_B 111D_E 111U_S 111U_B 111U_E 030T_S 030T_B 030T_E 030C 201_S 201_B 120D_S 120D_E 120U_S 120U_E 120C_S 120C_B 120C_E 210_S 210_B 300

5) Statistical Models for Networks
The exponential random graph model (ERGM) class is designed to let you model an observed network as a function of local-network, node, and dyad-level features. These models take the form:
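The equation on the original slide did not survive in this transcript. The standard general form of an ERGM, written here in common notation that may differ from the slide's, is:

```latex
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```

Here $y$ is the observed network, $g(y)$ is a vector of network statistics (for example counts of edges, mutual dyads, or matched-attribute ties), $\theta$ is the coefficient vector, and $\kappa(\theta)$ normalizes over the set $\mathcal{Y}$ of all possible networks on the same nodes.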

From Handcock (2006): Note this is a very simple “dyad independence” model.

From Handcock (2006): The dyad-independence model has been extended to other “node” features.

From Handcock (2006): Lots of other structural features can be included, though not all imply reasonable models.

The STATNET statistical package in R is the best way to estimate these models. We will:
- Walk through exporting our school friendship data from SAS and bringing it into R
- Specify some simple models
- Demonstrate getting goodness-of-fit stats on these models
- Demonstrate simulating from a model
The set of stats one can add to a model is growing quickly. Open “statnet_datawrite.sas” to see how to create data for export.

Results from a model (takes too long to run in real time!):
Summary of model fit
Formula: s_friends ~ edges + mutual + ttriad + nodematch("S3") + nodematch("WHITE") + edgecov(s_clubs, "ovlpec")
Newton-Raphson iterations: 87
MCMC sample of size …
Monte Carlo MLE Results (columns: estimate, s.e., p-value, MCMC s.e.; numeric values not recoverable):
edges (p < 1e…)
mutual (p < 1e…)
ttriad (p < 1e…)
nodematch.S (p < 1e…)
nodematch.WHITE
edgecov.s_clubs.ovlpec
Null Deviance: … on … degrees of freedom
Residual Deviance: … on … degrees of freedom
Deviance: … on 6 degrees of freedom
AIC: … BIC: …