Exponential Random Graph Models Under Measurement Error Zoe Rehnberg with Dr. Nan Lin Washington University in St. Louis ARTU 2014.

Examples of Social Networks
- Wikipedia pages: two pages are connected when one contains a reference to the other.
- Article authorship: two statisticians are connected when they co-author a paper.
- Friendship: two high school students are connected when each indicates that they are friends with the other.

Network Data
- Nodes: individuals in a mock high school
- Edges: mutual friendships
- Adjacency matrix W (n × n), where n = number of nodes and
  $w_{i,j} = \begin{cases} 1, & \text{if edge present} \\ 0, & \text{if edge absent} \end{cases}$
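A minimal sketch of building such an adjacency matrix in base R, assuming an undirected network. The five-node edge list is a hypothetical toy example, not the high school data.

```r
# Hypothetical toy edge list for n = 5 nodes (not from the study data).
n <- 5
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5))

# Start with no edges, then set w[i, j] = w[j, i] = 1 for each listed edge.
W <- matrix(0L, nrow = n, ncol = n)
W[edges] <- 1L
W[edges[, 2:1]] <- 1L   # mirror each edge so the matrix is symmetric

print(W)
```

Mutual friendships give a symmetric matrix with a zero diagonal, which is why each edge is written in both directions.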

Exponential Random Graph Models
The combination of nodes and edges in an adjacency matrix is random. Exponential random graph models (ERGMs) give the probability that a specific configuration of edges occurs. The model is written in terms of:
- w: a given set of edges in an adjacency matrix
- θ: a vector of model coefficients
- g(w): a vector of statistics for the given adjacency matrix
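For reference, the usual way of writing an ERGM with these symbols is sketched below. The normalizing constant $\kappa(\theta)$ is notation assumed here; it is not defined in the transcript.

```latex
P_{\theta}(W = w) = \frac{\exp\{\theta^{\top} g(w)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{w'} \exp\{\theta^{\top} g(w')\}
```

The sum in $\kappa(\theta)$ runs over every possible adjacency matrix $w'$ on the same n nodes, so it cannot be computed directly for networks of realistic size.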

Descriptive Statistics
These statistics summarize how nodes are related to each other within the graph as a whole. They are used to form the ERG model for a given adjacency matrix.
Examples:
1. Degree
2. Degree centrality
3. Triangles
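The three example statistics can be computed directly from an adjacency matrix. The sketch below uses base R on a hypothetical four-node network, and it assumes degree centrality means degree divided by its maximum possible value, n - 1 (one common convention; the slides do not say which definition was used).

```r
# Hypothetical toy network: a triangle on nodes 1-3 plus one extra edge 3-4.
W <- matrix(0, 4, 4)
W[rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))] <- 1
W <- W + t(W)   # make the matrix symmetric (undirected network)

n <- nrow(W)
degree <- rowSums(W)                    # number of edges at each node
degree_centrality <- degree / (n - 1)   # assumed normalization by n - 1

# The diagonal of W^3 counts closed walks of length 3; each triangle
# contributes 6 of them (3 starting nodes x 2 directions).
triangles <- sum(diag(W %*% W %*% W)) / 6

print(degree)
print(degree_centrality)
print(triangles)
```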

Estimating Model Coefficients
The ergm() function in the statnet package for R uses a maximum likelihood approach to estimate θ, the vector of model coefficients.

    library("statnet")
    data(faux.mesa.high)
    dat <- faux.mesa.high

    # fit the original ERG model
    orig.model <- ergm(
      dat ~ edges + nodematch("Grade") + nodematch("Race") +
        nodematch("Sex") + gwesp(0.4, fixed = TRUE),
      control = control.ergm(MCMC.samplesize = 1e+5, seed = 123)
    )

    # simulate from the original model
    sim.net <- simulate(orig.model, seed = 1534)

Possible Measurement Error
Measurement error describes how far an observed network departs from the true network. We focused on missing (false negative) and spurious (false positive) edges in the network.
Possible sources of error:
- Mistakes in collecting or coding data
- Differences in perception

Goal of Our Study
Goal: understand ERGMs under measurement error
Method: study by simulation
1. Model and simulate a friendship network, with g(w) made up of edges, assortative mixing, and shared partners
2. Imitate measurement error
   - Adding probability: q = 0.001, 0.005, 0.01, 0.05
   - Removing probability: p = 0.01, 0.02, …
3. Estimate ERGM coefficients and statistics
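Step 2 can be sketched in base R as follows. The function name `perturb_network` is hypothetical, and the sketch assumes an undirected 0/1 network in which each dyad is perturbed independently: an existing edge is removed with probability p and an absent edge is added with probability q.

```r
# Perturb an undirected 0/1 adjacency matrix W (hypothetical helper):
#   each existing edge is removed with probability p (missing edge),
#   each absent edge is added with probability q (spurious edge).
perturb_network <- function(W, p, q) {
  upper <- upper.tri(W)          # work on each dyad once
  w <- W[upper]
  u <- runif(length(w))
  # keep an existing edge when u >= p; add an absent edge when u < q
  w_new <- ifelse(w == 1, as.numeric(u >= p), as.numeric(u < q))
  out <- matrix(0, nrow(W), ncol(W))
  out[upper] <- w_new
  out + t(out)                   # mirror to restore symmetry
}

# Toy example on a random 6-node network (not the study data).
set.seed(1)
W <- matrix(0, 6, 6)
W[upper.tri(W)] <- rbinom(15, 1, 0.4)
W <- W + t(W)
W_obs <- perturb_network(W, p = 0.05, q = 0.01)
```

Drawing one uniform number per dyad keeps the perturbed matrix symmetric, which matches the mutual-friendship definition of an edge.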

- Simulated measurement error by perturbing 100 networks at each probability combination
- Calculated the root mean square error of the perturbed ERGM coefficients
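A minimal sketch of the root mean square error calculation, assuming the coefficients fitted to the perturbed networks are compared against a reference vector such as the coefficients of the unperturbed fit. The function name `rmse` and all numbers are hypothetical and are not results from the study.

```r
# RMSE of each coefficient across perturbed fits (hypothetical helper).
# est:   matrix with one row per perturbed network, one column per coefficient
# truth: vector of reference coefficients, one per column of est
rmse <- function(est, truth) {
  # subtract truth[j] from column j, then average the squared errors
  sqrt(colMeans(sweep(est, 2, truth)^2))
}

# Made-up numbers for illustration only.
truth <- c(edges = -6.0, gwesp = 1.5)
est <- rbind(c(-6.2, 1.4),
             c(-5.8, 1.6),
             c(-6.1, 1.5))
colnames(est) <- names(truth)
print(rmse(est, truth))
```

In the study design described above, `est` would have 100 rows for each (p, q) combination.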

Method of Spectral Denoising
- p = probability of a missing edge
- q = probability of a spurious edge
- Naïve estimator: the statistic computed on the observed network as is
- Empirical estimator: the statistic computed on a denoised adjacency matrix, created through spectral decomposition of the observed adjacency matrix
- There is a continuity requirement for the statistics.
P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through nonparametric network denoising. Annals of Statistics, arXiv: v3.
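The general idea of denoising through a spectral decomposition can be sketched in base R. This is a generic low-rank truncation and is not claimed to be the exact estimator of Balachandran et al.; the function name `spectral_denoise` and the rank `k` are assumptions made for illustration.

```r
# Rank-k spectral approximation of a symmetric observed adjacency matrix Y.
# Generic low-rank truncation (an assumption, not the paper's exact method).
spectral_denoise <- function(Y, k) {
  eig <- eigen(Y, symmetric = TRUE)
  # keep the k eigenvalues that are largest in absolute value
  keep <- order(abs(eig$values), decreasing = TRUE)[seq_len(k)]
  V <- eig$vectors[, keep, drop = FALSE]
  V %*% diag(eig$values[keep], nrow = k) %*% t(V)
}

# Toy example on a random 8-node network (not the study data).
set.seed(42)
Y <- matrix(0, 8, 8)
Y[upper.tri(Y)] <- rbinom(28, 1, 0.3)
Y <- Y + t(Y)

W_hat <- spectral_denoise(Y, k = 2)
print(range(W_hat))
```

The entries of `W_hat` are generally not 0 or 1, so a matrix produced this way cannot be treated as an ordinary network without a further step.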

Challenges and Future Work
The estimated adjacency matrices have non-integer values, which causes practical computational problems.
- The R function ergm() only accepts adjacency matrices with 0/1 entries.
Instead of obtaining a single estimate of the true adjacency matrix, we want to simulate from its conditional distribution and fit an ERGM to each simulated network.
- The final estimate of the model coefficients θ will then be based on this simulated distribution.
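One simple way to turn a non-integer estimate into 0/1 networks is sketched below in base R. It assumes, purely for illustration, that each entry clipped to [0, 1] can be treated as an independent edge probability. The slides do not specify the conditional distribution, so this is a stand-in, and `draw_binary_network` and `W_hat` are hypothetical names.

```r
# Draw one symmetric 0/1 network from a matrix of fractional entries,
# treating each clipped entry as an independent edge probability.
# The independence assumption is a simplification for illustration.
draw_binary_network <- function(W_hat) {
  prob <- pmin(pmax(W_hat, 0), 1)        # clip entries to [0, 1]
  upper <- upper.tri(W_hat)              # draw each dyad once
  out <- matrix(0, nrow(W_hat), ncol(W_hat))
  out[upper] <- rbinom(sum(upper), 1, prob[upper])
  out + t(out)                           # mirror to restore symmetry
}

# Made-up fractional matrix for 3 nodes.
set.seed(7)
W_hat <- matrix(c( 0.0, 0.9, -0.1,
                   0.9, 0.0,  0.4,
                  -0.1, 0.4,  0.0), nrow = 3, byrow = TRUE)
draws <- replicate(5, draw_binary_network(W_hat), simplify = FALSE)
```

Each element of `draws` is a 0/1 matrix that could be converted to a network object, for example with `network::as.network(draws[[1]], directed = FALSE)`, and then passed to `ergm()`.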

References
[1] A. Caimo and N. Friel. Bayesian inference for exponential random graph models. Social Networks.
[2] R. A. Hanneman and M. Riddle. Introduction to social network methods. Riverside, CA: University of California, Riverside.
[3] P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through nonparametric network denoising. Annals of Statistics, arXiv: v3.
[4] D. J. Wang, et al. Measurement error in network data: A reclassification. Social Networks (2012), doi: /j.socnet