Social Networks Lecture 4: Collection of Network Data & Calculation of Network Characteristics U. Matzat.



Course design Aim: knowledge about concepts in network theory, and being able to apply them, in particular in a context of innovation and alliances Introduction: what are they, why important … Four basic network arguments Kinds of network data (collection) & measurement Small world networks Business networks Assignment 1

Course outlook - today 3. Methods Kinds of network data: collection (Part I) Typical network concepts: calculation, UCINET software, visualisation (Part II) Later: Assignments - complete network analysis - ego-centered network analysis

Part 1 – Collection of Network Data
In traditional surveys:
- a random sample of units (e.g., managers) is interviewed
- properties of individuals are correlated to analyze some phenomenon (e.g., the correlation of age with openness to new ideas)
- the focus is on the distribution of the individuals' qualities, not on their relations
- traditional assumption: sampled units (e.g., managers) are independent of each other and not related to each other
This assumption is inappropriate for SNA, so traditional survey instruments had to be adjusted and new ones had to be developed.

Collection of Network Data: two main approaches within SNA
1.) ego-centered network analysis: the network (of a specific type) from the perspective of a single actor (ego)
2.) complete network analysis: the relations (of a specific type) between all units of a social system are analyzed
The first approach:
- rests on an extension of traditional survey instruments
- can be combined with random sampling
- allows statistical data analyses with standard software (e.g., SPSS)
The second approach:
- is new
- (usually) cannot be combined with random sampling
- is a quantitative case study
- requires statistical data analyses with specialized software (e.g., UCINET)

Ego-centered network data
- random sample: selection of units (e.g., individuals) out of a population
- the inclusion of one individual does not influence whether another one is also included
- the relationship between units is not a criterion of selection
- for a relationship of a certain type (e.g., friendship), the respondent (ego) names other individuals (alteri) to whom he or she is related
- usually the alteri are not within the sample
- the respondent gives additional information about some characteristics of the alteri (age etc.) and about the relations between the alteri
- crucial: specialized items for the generation of alteri, the name generator

Ego-centered network data: the generation of data via name generators
Name generator for the reconstruction of friendship networks in a general population:
First step: "From time to time people discuss questions and personal problems that keep them busy with others. When you think about the last 6 months - who are the persons with whom you did discuss such questions that are of personal importance for you. Please mention only the first name of the individuals." [If the respondent mentions fewer than five names, ask once more: "Anybody else?" Write down only the first five names.]
Second step: characterization of the alteri (gender, age, etc.) and of the relation between ego and alteri (e.g., strength of relation)
Third step: characterization of the relation between the different pairs of alteri (e.g., strength of relation)

Ego-centered network data: example: reconstruction of university-company relationships
- random sample of university researchers
- question of interest: what does the network look like that brings a researcher into contact with business representatives for collaboration?
- reconstruction of four parts of the network from the point of view of the researcher:
  within university, within own faculty
  within university, outside own faculty
  outside university, within business world
  [outside university, personal friends, acquaintances etc.]

example: reconstruction of university-company relationships Questionnaire items Let us suppose that you are convinced that you have an idea, a product or something similar, in which collaboration with a business firm is a sensible and reasonable option. Do you have any contacts that could be of substantial value for bringing you in touch with a business firm? 0 yes 0 no (continue with question xx)

example: reconstruction of university-company relationships From which of the employees within your faculty do you expect that they can make a substantial contribution with respect to getting you in contact with business firms that might become partners? Mention the most important persons, at most four. First name Initial of last name From which of the employees outside your faculty but within your university do you expect that they can make a substantial contribution with respect to getting you in contact with business firms that might become partners? Mention the most important persons, at most four. First name Initial of last name

Example (cont)
You mentioned up to 16 names of persons. Please write down the name of the first person mentioned, the second person mentioned, the third person mentioned, etc., until every name is on this list. Make sure that each name is mentioned once and only once.
[Answer form: numbered blank lines 1 to 18.]
Please carefully check this list. Are any persons missing of whom you feel that – given the questions – they should be included in this list? Persons who are crucial in getting cooperation between you and a business partner going? If yes, please add these persons to the list (at most two extra persons) and briefly describe your relation to this person.

Example (cont): second step
We would like to know how strong your relation with the persons in this list is. A strong relation would be a relation with frequent contact and with a regular exchange of information.
Response options per person: "The relation is strong." / "The relation is distant."
[Answer grid: one row per listed person (1. Jack, 2. Jim, 3. ..., up to 18).]

Example (cont): third step
Finally, we would like to ask you about the relations between the listed persons in your network. Start with the first person in the list. Consider the relation between this person and the other persons in the list. Choose between:
S: strong relation
D: distant relation
0: no relation
Fill in an X if you cannot judge the relationship.
[Answer grid with the listed persons (Jim, Jack, ...) and numbered cells 01 to 18.]

ego-centered network data: data matrix
example: name generator for three best friends (of two respondents)

variable                respondent 1   respondent 2
gender                  1              2
age                     30             40
friend 1 existing?      1              1
friend 2 existing?      1              1
friend 3 existing?      1              0
tie strength 1          0.8            0.7
tie strength 1-2        1              0
gender friend 1         1              2
…


ego-centered network data: data matrix standard data matrix that can be analyzed with the conventional techniques and conventional software (e.g., SPSS, STATA etc) but special type of variables of the data set some variables describe the respondent some variables describe the respondent's contacts some variables describe the relation between the respondent and his contacts some variables describe relations between members of the respondent's (primary) network these variables can be used to construct other variables that describe properties of the respondent’s network (size, density etc) you have to construct these variables: e.g. via “TRANSFORM – COMPUTE” in SPSS
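The kind of variable construction described above (what "TRANSFORM – COMPUTE" would do in SPSS) can be sketched in Python. This is only an illustration under assumptions: the field names `alter_exists` and `alter_ties` are hypothetical, and the alter-alter ties 1-3 and 2-3 are invented for the example; only the "friend existing?" flags and the tie 1-2 follow the data matrix example.

```python
from itertools import combinations

def ego_network_size(alter_exists):
    """Number of alters the respondent actually named (flags coded 1/0)."""
    return sum(1 for flag in alter_exists if flag == 1)

def ego_network_density(alter_exists, alter_ties):
    """Share of possible alter-alter pairs that are tied (ego itself excluded)."""
    named = [k for k, flag in enumerate(alter_exists, start=1) if flag == 1]
    pairs = list(combinations(named, 2))
    if not pairs:
        return None  # density is undefined with fewer than two alters
    present = sum(1 for pair in pairs if alter_ties.get(pair, 0) > 0)
    return present / len(pairs)

# Hypothetical rows: ties (1, 3) and (2, 3) are made up for illustration.
respondent_1 = {"alter_exists": [1, 1, 1],
                "alter_ties": {(1, 2): 1, (1, 3): 0, (2, 3): 1}}
respondent_2 = {"alter_exists": [1, 1, 0],
                "alter_ties": {(1, 2): 0}}

for label, row in (("respondent 1", respondent_1), ("respondent 2", respondent_2)):
    size = ego_network_size(row["alter_exists"])
    dens = ego_network_density(row["alter_exists"], row["alter_ties"])
    print(label, "size:", size, "density:", dens)
```

Each respondent stays one row of an ordinary data matrix; size and density simply become two additional columns.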

ego-centered network data ego-centered network data necessary for testing of typical network theories Example: structural holes hypothesis (ego=company) “Innovating companies tend to profit more from new product ideas the more structural holes they have in their collaboration networks with other companies.“ a test of this hypothesis is impossible with traditional surveys of companies

ego-centered network data: Strengths and weaknesses + random sampling possible + generalization to a well-defined population possible + for the social scientist easy to use techniques of data analysis - restriction to those parts of the network that are directly visible to the respondent: the primary network; other characteristics of the network are not taken into account


complete network data:

Complete network data
Example: network of informal communication between the employees of a project group consisting of 5 persons: Mr Smith, Mr Jackson, Mr White, Mrs Moneypenny, Mrs Brown
Questionnaire item for Mr Smith: "With whom of the following persons do you now and then chat during a normal working day?"
Do you talk with…
Mr Jackson       0 yes   0 no
Mr White         0 yes   0 no
Mrs Moneypenny   0 yes   0 no
Mrs Brown        0 yes   0 no
- the question is presented to all members of the project group
- you need to have a complete list of the names of all units (e.g., individuals) of the social system (e.g., project group) beforehand

Complete network data: sociomatrix the data matrix is different from the traditional data matrix every cell ij in the matrix provides information about the relation between units i and j ("from row i to column j") relation can be symmetric or asymmetric, valued or dichotomous
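To make the "from row i to column j" convention concrete, here is a minimal Python sketch that turns yes/no questionnaire answers into a dichotomous sociomatrix. The answers themselves are hypothetical; only the five names come from the project group example.

```python
names = ["Smith", "Jackson", "White", "Moneypenny", "Brown"]

# Hypothetical answers: for each respondent, the colleagues ticked "yes".
answers = {
    "Smith": ["Jackson", "White"],
    "Jackson": ["Smith", "Moneypenny"],
    "White": ["Smith", "Brown"],
    "Moneypenny": [],
    "Brown": ["White"],
}

def build_sociomatrix(names, answers):
    """Cell [i][j] is 1 if respondent i reports a tie to person j, else 0."""
    index = {name: k for k, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for sender, receivers in answers.items():
        for receiver in receivers:
            matrix[index[sender]][index[receiver]] = 1
    return matrix

def is_symmetric(matrix):
    """True if every reported tie is reciprocated."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

sociomatrix = build_sociomatrix(names, answers)
for name, row in zip(names, sociomatrix):
    print(f"{name:<11}", row)
print("symmetric:", is_symmetric(sociomatrix))
```

In this made-up data Jackson names Moneypenny but not the other way round, so the relation is asymmetric (directed).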

Complete network data: collection of complete network data impossible for large random samples necessary for many hypotheses that make predictions about structural effects: "In groups with a high network density the diffusion of innovations takes place more quickly than in groups with a low density." hypothesis can only be tested with complete network data data matrix of complete network data cannot be analyzed with the conventional data analysis techniques specialized software that offers special techniques is needed (e.g., UCINET) you can calculate network characteristics of actors and of the whole network you can calculate network characteristics (within UCINET) for actors that can be exported and then combined with other data (e.g., SPSS data)

Complete network data: Strengths and weaknesses
+ all aspects of the structure of relationships between all actors in a social system are taken into account
- no random sampling, therefore no generalizations are possible; rather: quantitative case study approach
- other techniques of data analysis necessary


Part II: Calculation & visualisation of network concepts (1): in- and outdegree
For complete, valued, directed network data with N actors, and relations from actor i to actor j valued as $r_{ij}$, varying between 0 and R.
Centrality and power: outdegree (or: outdegree centrality)
For each actor i: the number of (valued) outgoing relations, relative to the maximum possible (valued) outgoing relations.
OUTDEGREE(i) = $\sum_j r_{ij} / (N \cdot R)$
Centrality and power: indegree (or: indegree centrality)
The same, but now consider only the incoming relations.
NOTE1: this is a locally defined measure, that is, a measure that is defined for each actor separately
NOTE2: this gives rise to several global network measures, such as (in/out)degree variance
NOTE3: if your network is not directed, indegree and outdegree are the same and called degree
NOTE4: these measures can be constructed in SPSS; no need for special purpose software. Try this yourself!
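As a rough illustration of the degree measures, the sketch below applies the slide's normalisation (division by N·R) to a small valued, directed matrix. The matrix `ties` and the maximum value `R = 3` are invented for the example. If self-ties are excluded, (N−1)·R would be the strict maximum; the code keeps the slide's N·R.

```python
def outdegree(r, max_value):
    """Row sums relative to N * R, as in OUTDEGREE(i) = sum_j r_ij / (N * R)."""
    n = len(r)
    return [sum(row) / (n * max_value) for row in r]

def indegree(r, max_value):
    """Column sums relative to N * R (incoming instead of outgoing relations)."""
    n = len(r)
    return [sum(r[i][j] for i in range(n)) / (n * max_value) for j in range(n)]

# Hypothetical valued, directed network with N = 3 actors and values 0..3.
ties = [
    [0, 2, 1],
    [3, 0, 3],
    [0, 1, 0],
]
R = 3

print("outdegree:", [round(x, 3) for x in outdegree(ties, R)])
print("indegree: ", [round(x, 3) for x in indegree(ties, R)])
```

Both functions return one value per actor, which matches NOTE1: the measure is defined locally, and global measures such as degree variance can be derived from the resulting list.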

Network measures (2): number of ties of a certain quality
Tie values:
1 = I do not know who this is
2 = I know who it is, but never talked to him/her
3 = I have spoken to this person once or twice
4 = I talk to this person regularly
5 = I talk to this person often
Number of ties: for each network or for each actor, the number of ties above a certain threshold (say, all ties with a value above 3)
Number of weak ties (remember Mark Granovetter?): for each network or for each actor, the number of ties between a lower and an upper threshold (say, only ties with values 2 and 3)
Try creating this one yourself in SPSS (try using ‘recode’)
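The recoding idea can be sketched in a few lines of Python. The ratings below are hypothetical answers on the 1 to 5 scale; the thresholds follow the examples in the text (above 3 for ties, 2 and 3 for weak ties).

```python
def count_ties(row, low, high):
    """Number of ties whose value lies in the closed interval [low, high]."""
    return sum(1 for value in row if low <= value <= high)

# Hypothetical ratings each actor gives to six others (scale 1..5).
ratings = {
    "actor_1": [5, 4, 3, 2, 1, 3],
    "actor_2": [1, 1, 2, 4, 4, 5],
}

ties_above_3 = {actor: count_ties(row, 4, 5) for actor, row in ratings.items()}
weak_ties = {actor: count_ties(row, 2, 3) for actor, row in ratings.items()}

print("ties above 3:", ties_above_3)
print("weak ties:   ", weak_ties)
```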

Network measures (3): closeness
Centrality and power again: closeness = average distance to all others in the network
Note: a shortest path from i to j is called a “geodesic”
Define the distance $D_{ij}$ from i to j as the minimum value of a path from i to j
For every actor i, average distance = $\sum_j D_{ij} / N$
NOTE: THIS IS NOT EASY TO DO ANYMORE IN SPSS!
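For dichotomous, undirected ties the geodesic distances can be found with a breadth-first search. The following is a minimal sketch, not the UCINET routine; the four-node graph is hypothetical, and the average follows the slide's formula (sum of distances divided by N, so the distance of 0 to oneself is included).

```python
from collections import deque

def geodesic_distances(adjacency, source):
    """Shortest path length from source to every node (inf if unreachable)."""
    dist = {node: float("inf") for node in adjacency}
    dist[source] = 0
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if dist[neighbour] == float("inf"):
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist

def average_distance(adjacency, source):
    """Sum of geodesic distances from source, divided by the number of actors N."""
    dist = geodesic_distances(adjacency, source)
    return sum(dist.values()) / len(adjacency)

# Hypothetical undirected graph forming the chain 4 - 2 - 1 - 3.
graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}

for actor in sorted(graph):
    print(actor, average_distance(graph, actor))
```

Actors in the middle of the chain get a smaller average distance than those at its ends; if any actor is unreachable, the average becomes infinite.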

Network measures (4): the most common global network property
Density / network closure (J. Coleman: “Dense networks provide social capital.”)
For each network: the number of (valued) relations, relative to the maximum possible number of (valued) relations.
Density = $\sum_{i,j} r_{ij} / (N (N-1) R)$ (directed, valued ties)
NOTE: normally only of use if your data consist of multiple networks (alliance networks in different sectors or countries / friendship networks in school classes / …)
NOTE: this is still doable in SPSS
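A short Python sketch of the density formula for directed, valued ties follows. Both example networks and their maximum tie values are hypothetical; they only show why density becomes informative when several networks are compared.

```python
def density(r, max_value):
    """Sum of off-diagonal tie values divided by N * (N - 1) * R."""
    n = len(r)
    total = sum(r[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1) * max_value)

# Two hypothetical networks: (sociomatrix, maximum tie value R).
networks = {
    "class_a": ([[0, 2, 1], [3, 0, 3], [0, 1, 0]], 3),
    "class_b": ([[0, 1, 1], [1, 0, 1], [1, 1, 0]], 1),
}

for label, (matrix, max_value) in networks.items():
    print(label, round(density(matrix, max_value), 3))
```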

Network measures (5): Subgroup Models (Cohesion)
- aim: description of cohesive subgroups within the larger network
- general and common idea: a subgroup has a certain degree of cohesiveness (direct ties, strong ties)
- can also be used to make predictions about the diffusion of innovations according to the cohesion model (which pairs of actors influence each other?)
- which companies constitute a subgroup within the network?
- which companies are in many subgroups?
- how many subgroups exist?

Subgroups: Some general terminology you need to know
reachability: if a path exists between two nodes, these nodes are called reachable
path length: the number of lines of a path (dichotomous data); example: the path 4 → 2 → 1 → 3 has length 3
geodesic distance between two nodes: there can be more than one path between two nodes, and the different paths can have different lengths
$d(i,j)$ = length of the shortest path between two nodes i and j
example: with the path 4 → 2 → 1 → 3 of length 3, $d(4,3) = 3$ if there exists no shorter path between the two nodes
$d(i,j) = \infty$ if i and j are not reachable

Subgroups: Terminology (cont.)
completeness of a graph: a graph is complete if all pairs of nodes (i,j) are reachable with $d(i,j) = 1$
connectedness: a graph is connected if $d(i,j) < \infty$ for every pair (i,j)
subgraphs: a subgraph $G_s$ consists of a subset $N_s \subseteq N$ and its lines $L_s \subseteq L$ that connect all $\{i,j\} \in N_s$
maximality: a subgraph is maximal with respect to some property (e.g., maximal with regard to completeness) if that property holds for the subgraph, but no longer holds if any additional node and the lines incident with the node are added
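The two definitions can be checked mechanically for undirected, dichotomous graphs. The sketch below is an illustration with hypothetical example graphs: completeness means every node is adjacent to all others (all distances equal 1), connectedness means every node can be reached from an arbitrary start node (all distances finite).

```python
def is_complete(adjacency):
    """True if every node is directly tied to every other node."""
    nodes = set(adjacency)
    return all(set(adjacency[node]) >= nodes - {node} for node in nodes)

def is_connected(adjacency):
    """True if all nodes are reachable from an arbitrary start node."""
    nodes = list(adjacency)
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)

# Hypothetical graphs: a triangle, a chain, and a chain plus an isolated node.
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
chain = {1: [2], 2: [1, 3], 3: [2]}
with_isolate = {1: [2], 2: [1, 3], 3: [2], 4: []}

for label, g in (("triangle", triangle), ("chain", chain), ("with isolate", with_isolate)):
    print(label, "complete:", is_complete(g), "connected:", is_connected(g))
```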

Subgroups example: maximal completeness
[Figure: graph with nodes 1 to 7.]
Maximal complete subgraph $G_s$: $N_s = \{1,2,3,4,5\}$ and the ties between them

Subgroup Definitions for undirected dichotomous ties
Cliques: a clique is a maximal complete subgraph that consists of at least three nodes
[Figure: graph with nodes 1 to 7.]
Which cliques? {1,2,3}, {1,3,5}, {3,4,5,6}
Cliques can overlap, but a clique cannot be part of a larger clique because of the maximality condition.
Impossible to calculate with SPSS!
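Clique detection is what the specialized software does here. As a hedged illustration, the sketch below uses the basic Bron–Kerbosch recursion (a standard algorithm, not necessarily what UCINET implements). The edge list is an assumption: it contains exactly the ties implied by the three listed cliques, and node 7, whose ties cannot be recovered from the text, is treated as isolated.

```python
def build_adjacency(nodes, edges):
    """Undirected adjacency sets from a list of node pairs."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def maximal_cliques(adjacency, min_size=3):
    """All maximal complete subgraphs with at least min_size nodes."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= min_size:
                cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)

# Assumed ties: only those implied by the cliques listed in the text.
edges = [(1, 2), (1, 3), (2, 3), (1, 5), (3, 5),
         (3, 4), (3, 6), (4, 5), (4, 6), (5, 6)]
graph = build_adjacency(range(1, 8), edges)

print(maximal_cliques(graph))  # [[1, 2, 3], [1, 3, 5], [3, 4, 5, 6]]
```

The `excluded` set is what enforces maximality: a complete subgraph is only reported when no already-processed node could still be added to it.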

Network measures (6): Structural holes (this was covered in the 3rd lecture)
Ron Burt: “Structural holes create value”
[Figure: network with the labels A, B and C, nodes numbered 1 to 8, and the actors James and Robert.]
Robert will do better than James, because of:
- informational benefits
- “tertius gaudens” (entrepreneur)
- autonomy

Network measures (6): Structural holes Burt, R.S. (1995) NOTE: structural holes can be defined on ego-networks! Burt splits his structural holes measure in four separate ones: [1] effective size [2] efficiency (= effective size / total size) [3] constraint (degree to which ego invests in alters who themselves invest in other alters of ego) [4] hierarchy (adjustment of constraint, dealing with the degree to which constraint on ego is concentrated in a single actor)

Structural holes: Effective size & efficiency
[Figure: ego network of G with the alters A to F.]
We calculate effective size and efficiency for actor G (note: because this is an ego-network, everything would be different if we had chosen, for instance, actor A).
Ego = G, Size[G] = 6

             A     B     C     D     E     F
redundancy   3/6   2/6   0/6   1/6   1/6   1/6

Effective size = 4.67, Efficiency = 78%
Or, the same but a bit easier: Effective size = size - average degree of ego’s alters in ego’s network (excluding ties to ego).
Here: 6 - {3 (A) + 2 (B) + 0 (C) + 1 (D) + 1 (E) + 1 (F)}/6 = 6 - 1.33 = 4.67
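The "easier" formula can be reproduced in a few lines of Python. The alter-alter ties used here (A-B, A-E, A-F, B-D) are read off the degrees given above and the investment matrix P shown in the constraint calculation; the function names are illustrative, not UCINET's.

```python
def effective_size(alters, alter_ties):
    """Size minus the average degree of the alters among themselves (ego excluded)."""
    degree = {alter: 0 for alter in alters}
    for a, b in alter_ties:
        degree[a] += 1
        degree[b] += 1
    size = len(alters)
    return size - sum(degree.values()) / size

def efficiency(alters, alter_ties):
    """Effective size divided by total size."""
    return effective_size(alters, alter_ties) / len(alters)

# Ego network of G: six alters and the ties among them (ties to G left out).
alters = ["A", "B", "C", "D", "E", "F"]
alter_ties = [("A", "B"), ("A", "E"), ("A", "F"), ("B", "D")]

print(round(effective_size(alters, alter_ties), 2))  # 4.67
print(round(efficiency(alters, alter_ties), 2))      # 0.78
```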

Defining constraint: actors must divide their attention
[Figure: ego network of G with the alters A to F; labels show investment proportions (0.25, 0.33, 0.0, 1.00, 0.50, 0.17).]
The assumption is that actors can only invest a certain amount of time and energy in their contacts, and must divide the available time and energy across contacts. If not explicitly measured, we assume all contacts are invested in equally.

Constraint
Actor i is constrained in his relation with j to the extent that:
[a] i invests in another contact q who …
[b] invests in i’s contact j
Total investment of i in j = $p_{ij} + \sum_q p_{iq} p_{qj}$
“Since this also equals i’s lack of structural holes, constraint of i in j is taken to equal” $\left( p_{ij} + \sum_q p_{iq} p_{qj} \right)^2$
[Figure: triad of i, j and q with ties labeled $p_{ij}$, $p_{iq}$ and $p_{qj}$.]

Calculating constraint using matrices (1)
Adjacency matrix P (see two slides ago) = all investment from i in j in 1 step

      c1     c2     c3     c4     c5     c6     c7
r1    0      .25    0      0      .25    .25    .25
r2    .333   0      0      .333   0      0      .333
r3    0      0      0      0      0      0      1
r4    0      .5     0      0      0      0      .5
r5    .5     0      0      0      0      0      .5
r6    .5     0      0      0      0      0      .5
r7    .17    .17    .17    .17    .17    .17    0

Matrix product $P^2 = P \cdot P$ = all investments from i in j in 2 steps

      c1       c2       c3       c4       c5       c6       c7
r1    .37575   .0425    .0425    .12575   .0425    .0425    .33325
r2    .05661   .30636   .05661   .05661   .13986   .13986   .24975
r3    .17      .17      .17      .17      .17      .17      0
r4    .2515    .085     .085     .2515    .085     .085     .1665
r5    .085     .21      .085     .085     .21      .21      .125
r6    .085     .21      .085     .085     .21      .21      .125
r7    .22661   .1275    0        .05661   .0425    .0425    .52411

Calculating constraint using matrices (2)
$P + P^2$ = all investments from i to j in 1 or 2 steps, i.e. $p_{ij} + \sum_q p_{iq} p_{qj}$

      c1    c2    c3    c4    c5    c6    c7
r1    .37   .29   .04   .12   .29   .29   .58
r2    .38   .30   .05   .38   .13   .13   .58
r3    .17   .17   .17   .17   .17   .17   1
r4    .25   .58   .08   .25   .08   .08   .66
r5    .58   .21   .08   .08   .21   .21   .62
r6    .58   .21   .08   .08   .21   .21   .62
r7    .39   .29   .17   .22   .21   .21   .52

Hadamard matrix product $(P + P^2)^{\circ 2}$ = $P + P^2$ squared element-wise; for example, entry (r4, c7): $(0.666)^2 = 0.444$, etc.

      c1     c2     c3     c4     c5     c6     c7
r1    .141   .085   .002   .015   .085   .085   .340
r2    .151   .093   .003   .151   .019   .019   .339
r3    .028   .028   .028   .028   .028   .028   1
r4    .063   .342   .007   .063   .007   .007   .444
r5    .342   .044   .007   .007   .044   .044   .390
r6    .342   .044   .007   .007   .044   .044   .390
r7    .157   .088   .028   .051   .045   .045   .274

Constraint(i,j) can be read from this matrix.

Calculating constraint using matrices (3)
Total constraint for actor i = sum of all constraints $C_{ij}$ with $j \neq i$

      c1     c2     c3     c4     c5     c6     c7       Total
r1    .141   .085   .002   .015   .085   .085   .340     0.755 <- Constraint(1)
r2    .151   .093   .003   .151   .019   .019   .339     0.779 <- Constraint(2)
r3    .028   .028   .028   .028   .028   .028   1        1.173 <- Constraint(3)
r4    .063   .342   .007   .063   .007   .007   .444     0.934 <- Constraint(4)
r5    .342   .044   .007   .007   .044   .044   .390     0.879 <- Constraint(5)
r6    .342   .044   .007   .007   .044   .044   .390     0.879 <- Constraint(6)
r7    .157   .088   .028   .051   .045   .045   .274     0.691 <- Constraint(7)

Note: the printed totals are full row sums, so they include the diagonal entry $C_{ii}$; restricting the sum to $j \neq i$, as in the definition, gives smaller values.
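The three matrix steps (P, then P + P², then the element-wise square and the row sums) can be retraced with a small pure-Python sketch. It assumes equal investment in all contacts and uses the ties of the example network (actors 1 to 7 correspond to A to G). Because it works with exact fractions (1/3, 1/6) instead of the rounded .333 and .17, its results differ slightly from the printed tables. The flag `include_diagonal` is an addition of this sketch: set to True it gives full row sums, the default follows the definition with j ≠ i.

```python
def proportional_ties(adjacency):
    """Row-normalise a 0/1 matrix: each actor divides investment equally."""
    return [[value / sum(row) if sum(row) else 0.0 for value in row]
            for row in adjacency]

def mat_mul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def constraint(adjacency, include_diagonal=False):
    """Row sums of (P + P^2) squared element-wise."""
    p = proportional_ties(adjacency)
    p2 = mat_mul(p, p)
    n = len(p)
    c = [[(p[i][j] + p2[i][j]) ** 2 for j in range(n)] for i in range(n)]
    return [sum(c[i][j] for j in range(n) if include_diagonal or j != i)
            for i in range(n)]

# Ties of the example network, rows/columns in the order A, B, C, D, E, F, G.
adjacency = [
    [0, 1, 0, 0, 1, 1, 1],  # A
    [1, 0, 0, 1, 0, 0, 1],  # B
    [0, 0, 0, 0, 0, 0, 1],  # C
    [0, 1, 0, 0, 0, 0, 1],  # D
    [1, 0, 0, 0, 0, 0, 1],  # E
    [1, 0, 0, 0, 0, 0, 1],  # F
    [1, 1, 1, 1, 1, 1, 0],  # G
]

for label, value in zip("ABCDEFG", constraint(adjacency)):
    print(label, round(value, 3))
```

Writing the matrix product by hand keeps the sketch free of dependencies; with a numerical library the same three steps would be one line each.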

Hierarchy = degree to which constraint is concentrated in a single actor Cij = constraint from j on i (as on previous pages) N = number of contacts in i’s network C = sum of constraints across all N relationships Hierarchy (i) Minimum = 0 (all i’s constraints are the same) Maximum = 1 (all i’s constraint is concentrated in a single contact)

Network concepts: Ucinet Software


To Do:
- Read chapters 6, 9 and 10-11 of Hanneman & Riddle on network techniques
- Download/install Ucinet and the talk.dl data
- Try it out!
- (Install SPSS and brush up your SPSS knowledge!)