1
Network Models and Data Analysis
Stephen E. Fienberg
Department of Statistics / Machine Learning Department
Machine Learning 10-701/15-781, Fall 2008
2
Making Pretty Pictures — Visualizing Networks — Is Easy
3
4
Example 1: 9/11 Terrorists
5
Lots of Probabilistic/Statistical Models
Types of models:
– Descriptive vs. generative.
– Static vs. dynamic.
Origin of social network models in the 1930s, integrated with graph representations in the 1950s.
Erdős-Rényi random graph models.
– Generalized random graph models.
– Stochastic process reinterpretations.
Sociometric models such as p1 and ERGMs.
Machine learning / latent-variable models:
– Stochastic block models with mixed membership.
6
Applications Galore
Small world studies
Social networks:
– Sampson's monks
– Classroom friendships
Organization theory:
– Branch banks
Homeland security
Politics:
– Voting behavior
– Bill co-sponsorship
Public health:
– Needle sharing
– Spread of AIDS
– Obesity
Computer science:
– Email networks (Enron)
– Internet
– WWW routing systems
Biology:
– Protein-protein interactions
– Zebras
7
But Doing Careful Statistical Analysis Is Difficult
Claims for network behavior are often based on casual empiricism:
– Power laws are everywhere, yet nowhere once we look closely at the data.
Inferential issues are usually buried:
– Algorithms, simulations, and "experiments" are not substitutes for formal statistical representation and theory.
8
9
Power Laws & Internet Graph
10
Framework for Networks Evolving over Time
Our representation for a network will be a graph: G_t = {N_t; E_t}.
– Nodes and edges can be created and can die.
– Edges can be directed or undirected.
– Data are available to be observed beginning at time t_0.
There exists a stochastic process, evolving over time, which, combined with initial conditions, describes the network structure and evolution.
– May involve more than dyadic relationships.
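As a minimal illustration of this representation, the Python sketch below stores one snapshot G_t = {N_t; E_t} and applies node/edge births and deaths to produce the next snapshot. The class and function names are hypothetical choices for exposition, not part of the original slides.

```python
from dataclasses import dataclass, field

@dataclass
class GraphSnapshot:
    """State of the network G_t = {N_t; E_t} at one observation time."""
    t: float                                   # observation time (t >= t0)
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)    # directed pairs (i, j)

def evolve(snapshot, born_nodes=(), dead_nodes=(), new_edges=(), dead_edges=(), t=None):
    """Return the next snapshot after node and edge births and deaths."""
    nodes = (snapshot.nodes | set(born_nodes)) - set(dead_nodes)
    # drop edges whose endpoints died, then apply edge births and deaths
    edges = {e for e in snapshot.edges if e[0] in nodes and e[1] in nodes}
    edges = (edges | set(new_edges)) - set(dead_edges)
    return GraphSnapshot(t=snapshot.t + 1 if t is None else t, nodes=nodes, edges=edges)

g0 = GraphSnapshot(t=0, nodes={1, 2, 3}, edges={(1, 2)})
g1 = evolve(g0, born_nodes=[4], new_edges=[(2, 3), (3, 4)])
```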
11
Forms of Network Data
1. Observe the formation (or removal) of each edge, with a time stamp indicating when it occurs. We can see how the entire network or a sub-network changes with each transaction.
2. Observe the status of the network or a sub-network at T epochs. These represent snapshots of the network, corresponding to information on the incidence of links and on relationships.
3. Observe cumulative network links over time. A "prevalence" approach.
12
Example 3: Enron E-mail Database
Node attributes (including the organization chart!) and full text of all e-mail messages.
Multiple addressees and cc's, so observations produce structure different from dyadic edges.
Messages contain time stamps, so we are in situation 3.
Question: Who was party to fraudulent transactions, and when?
13
Enron: Threshold 5 (151 employees)
14
Enron: Threshold 30 (151 employees)
15
Example 4: The Framingham "Obesity" Study
Original Framingham "sample" cohort with an offspring cohort of N_0 = 5,124 individuals, measured beginning in 1971 for T = 7 epochs centered at 1971, 1981, 1985, 1989, 1992, 1997, and 1999.
Link information on family members and one "close friend."
Total number of individuals on whom we have obesity measures is N = 12,067.
NEJM, July 2007.
16
17
Animation
18
Erdős-Rényi Random Graph Model
Two versions:
– In the G(n, M) model, the graph is chosen uniformly at random from the collection of all graphs with n nodes and M edges.
– In the G(n, p) model, each edge is included in the graph with probability p, with the presence or absence of distinct edges being independent.
As p increases from 0 to 1, the model becomes more and more likely to include graphs with more edges.
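A minimal sketch of the two versions, using only the Python standard library (the function names are mine, not from the slides):

```python
import random

def gnp(n, p, seed=None):
    """Sample an undirected Erdos-Renyi G(n, p) graph: each of the
    C(n, 2) possible edges is present independently with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def gnm(n, m, seed=None):
    """Sample G(n, M): choose M edges uniformly at random from all C(n, 2) pairs."""
    rng = random.Random(seed)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return set(rng.sample(all_pairs, m))

edges = gnp(100, 0.05, seed=1)
print(len(edges))   # close to C(100, 2) * 0.05 = 247.5 on average
```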
19
Erdős-Rényi Random Graph Model
G(n, p) has on average C(n, 2)·p edges, and the degree of any node follows a Binomial(n−1, p) distribution.
– If np < 1, G(n, p) will almost surely have no connected component of size larger than O(log n).
– If np = 1, G(n, p) will almost surely have a largest component whose size is of order n^(2/3).
– If np tends to a constant c > 1, G(n, p) will almost surely have a unique "giant" component containing a positive fraction of the nodes. No other component will contain more than O(log n) nodes.
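These regimes can be checked empirically. A rough, self-contained simulation sketch (one draw per value of np; a proper study would average over many draws):

```python
import random
from collections import defaultdict

def largest_component_fraction(n, p, seed=0):
    """Fraction of nodes in the largest connected component of one G(n, p) draw."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen, best = set(), 0
    for start in range(n):
        if start in seen:
            continue
        stack, size = [start], 0          # iterative DFS over one component
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / n

n = 2000
for c in (0.5, 1.0, 2.0):                 # np below, at, and above the threshold
    print(c, largest_component_fraction(n, c / n))
```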
20
21
Preferential Attachment Model
Encourages the formation of hubs in the graph. The degree distribution follows a power law:
– The fraction of nodes having k edges to other nodes behaves, for large values of k, as P(k) ~ k^(−γ).
– Linear on a log-log scale.
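A minimal sketch in the spirit of the Barabási-Albert construction (the parameter names and the small seed clique are my choices, not the slides'):

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph by preferential attachment: each new node attaches m edges,
    choosing targets with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m) for j in range(i + 1, m)]  # small seed graph
    # list of edge endpoints: a node appears once per incident edge, so sampling
    # uniformly from this list is sampling proportional to degree
    ends = [v for e in edges for v in e]
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = preferential_attachment(10000, m=2)
deg = Counter(v for e in edges for v in e)
dist = Counter(deg.values())
for k in sorted(dist)[:10]:
    print(k, dist[k])   # heavy tail: roughly P(k) ~ k^(-3) for large k
```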
22
Small World Model
Designed to produce local clustering and triadic closures by interpolating between an ER graph and a regular ring lattice.
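A rough Watts-Strogatz-style sketch, assuming "interpolating" means rewiring each lattice edge to a random target with probability beta (beta = 0 keeps the clustered ring lattice, beta = 1 approaches an ER-like graph); edge bookkeeping is simplified for brevity:

```python
import random

def watts_strogatz(n, k, beta, seed=0):
    """Small-world sketch: start from a ring lattice where each node links to its
    k nearest neighbours on each side, then rewire each edge with probability beta."""
    rng = random.Random(seed)
    lattice = {(i, (i + j) % n) for i in range(n) for j in range(1, k + 1)}
    rewired = set()
    for (u, v) in lattice:
        if rng.random() < beta:
            w = rng.randrange(n)
            while w == u or (u, w) in rewired or (w, u) in rewired:
                w = rng.randrange(n)
            rewired.add((u, w))       # long-range shortcut
        else:
            rewired.add((u, v))       # keep the local (clustered) edge
    return rewired

g = watts_strogatz(100, k=2, beta=0.1)
print(len(g))
```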
23
Example 5: Monks in a Monastery
18 novices observed over two years.
– Network data gathered at 4 time points, and on multiple relationships, e.g., friendship.
– Airoldi et al. (2007, 2008)
24
25
Holland-Leinhardt's p1 Model
n nodes; the occurrence of "directed" links is random. Consider the dyads D_ij = (X_ij, X_ji) to be independent, with
– Pr(D_ij = (1,1)) = m_ij, i < j
– Pr(D_ij = (1,0)) = a_ij, i ≠ j
– Pr(D_ij = (0,0)) = n_ij, i < j
where m_ij + a_ij + a_ji + n_ij = 1 for all i < j.
26
p1 Model
If we let
– ρ_ij = log{ m_ij n_ij / (a_ij a_ji) }, i < j
– θ_ij = log{ a_ij / n_ij }, i ≠ j
then p1 assumes the probability of observing x is:
p1(x) = Pr(X = x) = K exp[ Σ_{i<j} ρ_ij X_ij X_ji + Σ_{i≠j} θ_ij X_ij ]
– K = Π_{i<j} 1/k_ij
– k_ij({θ_ij}, {ρ_ij}) is the normalizing constant for D_ij.
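To make the parametrization concrete, the sketch below inverts (ρ_ij, θ_ij, θ_ji) back to the four dyad probabilities via the normalizing constant k_ij and evaluates the p1 log-likelihood of an observed adjacency matrix. It is an illustrative sketch only; rho and theta are assumed here to be full n×n nested lists.

```python
import math
from itertools import combinations

def dyad_probs(rho, theta_ij, theta_ji):
    """Invert the p1 parametrization for one dyad: probabilities of the
    four dyad states (0,0), (1,0), (0,1), (1,1)."""
    w = [1.0,                                   # n_ij : no tie
         math.exp(theta_ij),                    # a_ij : i -> j only
         math.exp(theta_ji),                    # a_ji : j -> i only
         math.exp(rho + theta_ij + theta_ji)]   # m_ij : mutual tie
    k = sum(w)                                  # normalizing constant k_ij
    return [wi / k for wi in w]

def p1_loglik(x, rho, theta):
    """Log-likelihood of an adjacency matrix x (n x n, entries in {0, 1})
    under p1, treating dyads as independent."""
    n = len(x)
    ll = 0.0
    for i, j in combinations(range(n), 2):
        p00, p10, p01, p11 = dyad_probs(rho[i][j], theta[i][j], theta[j][i])
        state = (x[i][j], x[j][i])
        ll += math.log({(0, 0): p00, (1, 0): p10, (0, 1): p01, (1, 1): p11}[state])
    return ll
```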
27
Three Common Forms of p1
If we add the restrictions
θ_ij = θ + α_i + β_j, i ≠ j,
(i) ρ_ij = 0, (ii) ρ_ij = ρ, and (iii) ρ_ij = ρ + ρ_i + ρ_j,
then for case (ii):
p1(x) = K exp[ ρM + θL + Σ_i α_i X_i+ + Σ_j β_j X_+j ]
with ρM capturing reciprocity, θL density, the α_i expansiveness, and the β_j popularity.
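The sufficient statistics named above are simple functions of the adjacency matrix; a small sketch, assuming x is an n×n 0/1 list-of-lists with a zero diagonal:

```python
def p1_sufficient_stats(x):
    """Sufficient statistics for the common p1 model (rho_ij = rho):
    M = number of mutual dyads, L = total number of directed links,
    out-degrees x_{i+} (expansiveness) and in-degrees x_{+j} (popularity)."""
    n = len(x)
    M = sum(x[i][j] * x[j][i] for i in range(n) for j in range(i + 1, n))
    L = sum(x[i][j] for i in range(n) for j in range(n) if i != j)
    out_deg = [sum(x[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(x[i][j] for i in range(n) if i != j) for j in range(n)]
    return M, L, out_deg, in_deg
```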
28
Estimation for p1
Exponential family form.
– Set the minimal sufficient statistics (MSSs) equal to their expectations.
– Iterate.
Holland and Leinhardt explored goodness of fit of p1:
– Comparing ρ_ij = 0 vs. ρ_ij = ρ.
– The usual chi-square results don't apply.
– How to test ρ_ij = ρ against a more complex model?
29
p1 as a Log-Linear Model
p1 is expressible as a log-linear model on an "incomplete" 4-way contingency table:
– Y_ijkl = 1 if X_ij = k and X_ji = l, and 0 otherwise.
p1 with ρ_ij = ρ corresponds to the log-linear model on Y with all two-way interactions: [12][13][14][23][24][34].
p1 with ρ_ij = ρ + ρ_i + ρ_j corresponds to [12][134][234].
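A tiny sketch of building the indicator array Y from an observed adjacency matrix x (the indexing and data layout are my choices for illustration):

```python
from itertools import combinations

def dyad_indicator_table(x):
    """Indicator array for the log-linear representation of p1:
    Y[i][j][k][l] = 1 if x[i][j] = k and x[j][i] = l, for i < j; all other cells are 0."""
    n = len(x)
    Y = [[[[0, 0], [0, 0]] for _ in range(n)] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        Y[i][j][x[i][j]][x[j][i]] = 1
    return Y
```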
30
Example 5: Monks in a Monastery
18 novices observed over two years.
– Network data gathered at 4 time points, and on multiple relationships, e.g., friendship.
– Airoldi et al. (2007, 2008)
31
p1 Analysis of Monk Data
32
p1 Analysis of Monk Data
33
Sampson's Monks: 3 Blocks?
34
K=3 SBMM for Friendship
Friendship relations among the novices measured at 3 successive times.
K=3 stochastic blocks + mixed membership:
35
Example 6: MIPS-Curated PPI in Yeast
871 proteins participate in 15 high-level functions.
2,119 functional annotations (binary).
36
The Data: Interaction Graphs
M = 871 nodes; M^2 ≈ 750K entries.
M proteins in a graph (nodes); M^2 observations on pairs of proteins.
– Edges are random quantities, Y[n, m].
– Interactions are not independent.
– Interacting proteins form a protein complex.
T graphs on the same set of proteins.
Partial annotations for each protein, X[n].
37
Modeling Ideas
Hierarchical Bayes:
– Latent variables encode semantic elements.
– Assume structure on observable-latent elements.
1. Models of mixed membership + 2. Network models (block models) = stochastic block models with mixed membership.
38
Graphical Model Representation (panels: Stochastic Blocks; Mixed Membership)
39
Hierarchical Likelihood
Interactions y_ij (observed), mixed-membership vectors π_i, π_j (latent), and group-to-group patterns B (latent), with
Pr(y_ij = 1 | π_i, π_j, B) = π_i^T B π_j.
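A sketch of this likelihood and of the corresponding generative step for the mixed-membership stochastic block model. The B and Pi values below are illustrative placeholders, not the monk or yeast data.

```python
import numpy as np

def mmsb_edge_prob(pi_i, pi_j, B):
    """Marginal edge probability: Pr(y_ij = 1 | pi_i, pi_j, B) = pi_i^T B pi_j,
    where pi_i, pi_j are membership vectors and B holds group-to-group tie rates."""
    return float(pi_i @ B @ pi_j)

def mmsb_sample(Pi, B, rng=np.random.default_rng(0)):
    """Generative sketch: for each ordered pair (i, j), node i draws a sender role
    z ~ Multinomial(pi_i), node j draws a receiver role w ~ Multinomial(pi_j),
    and the edge is Bernoulli(B[z, w])."""
    n, K = Pi.shape
    Y = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            z = rng.choice(K, p=Pi[i])
            w = rng.choice(K, p=Pi[j])
            Y[i, j] = rng.binomial(1, B[z, w])
    return Y

# Hypothetical 3-block example
B = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
Pi = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.2, 0.6]])
print(mmsb_edge_prob(Pi[0], Pi[1], B))
```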
40
Interactions in Yeast (MIPS)
Do PPI contain information about functions?
(Example shown: binary functional annotation vector for protein YLD014W across the 15 function categories.)
41
Results: Functional Annotations
42
Results: Stochastic Block Model
43
Some Results
K = 50 blocks works well using 5-fold cross-validation, and the blocks are consistent with the 15 functional categories.
Our predictions of functional annotations are superior to others in the literature on the same database.
Lots of technical details.
44
Example 7: Social Network of Zebras
45
Dynamical Representation
What is the stochastic model for group formation and change?
Groups of females and shifting males who are mating?
46
Summary
Lots of networks and their graph representations.
Erdős-Rényi random graph models, preferential attachment models, small world models.
p1 and log-linear models.
– Generalization to exponential random graph models (ERGMs).
Stochastic block models with mixed membership.
47
Some References
Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76:33-65.
Fienberg, S. E. and Wasserman, S. S. (1981). Categorical data analysis of single sociometric relations. Sociological Methodology, 156-192.
Fienberg, S. E., Meyer, M. M., and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association, 80:51-67.
Airoldi, E. M., Blei, D. M., Fienberg, S. E., Goldenberg, A., Xing, E., and Zheng, A., eds. (2007). Statistical Network Analysis: Models, Issues and New Directions. LNCS 4503, Springer-Verlag, Berlin.
Newman, M., Barabási, A. L., and Watts, D. J., eds. (2006). The Structure and Dynamics of Networks. Princeton University Press.
Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014.