Influence and Correlation in Social Networks
Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian

Preliminaries
- Correlations exist in users' behaviors
- Representation: individuals are nodes of a social graph G; every node is either "active" or "inactive"
- Formally, correlation = if u and v are adjacent in G, the event that u becomes active is correlated with the event that v becomes active
- We want to distinguish between different sources of social correlation

Models of Social Correlation
- Homophily = the tendency of individuals to choose friends with similar characteristics or preferences
- Confounding = external influence from elements in the environment (confounding factors)
- Influence = the action of one individual induces another individual to act in a similar way

Motivation
- Useful to know when social influence is the source of correlation
- Viral marketing -> want to target select individuals
- Influence behavior -> create "role models" (e.g. in fashion)
- We want to identify situations when such techniques can be applied.
- Also useful for analysis (predicting the future state of the network)

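Modeling Influence
1. The graph G is drawn according to some distribution.
2. In each of the time steps 1, ..., T, each non-active agent decides whether to become active.
3. An agent becomes active with probability p(a), a function of the number a of its neighbors that are already active:

$$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$$

or, alternatively,

$$\ln \frac{p(a)}{1 - p(a)} = \alpha \ln(a+1) + \beta$$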
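As a minimal sketch of the activation rule, assuming the logistic form in which the log-odds of activation equal α ln(a+1) + β, the probability p(a) could be computed as follows. The function name and the parameter values are illustrative and do not come from the slides.

```python
import math


def activation_probability(a, alpha, beta):
    """Probability that an inactive node with `a` active neighbors activates.

    Assumes the logistic form: log-odds = alpha * ln(a + 1) + beta.
    """
    z = alpha * math.log(a + 1) + beta
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameter values, chosen only for illustration.
for a in range(4):
    print(a, round(activation_probability(a, alpha=1.0, beta=-3.0), 4))
```

With a positive α, the probability grows with the number of active neighbors; with α = 0 it is the same for every node regardless of its neighborhood.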

Some remarks...
- The coefficient α measures social correlation.
- Since actions are stored, a represents the number of neighboring users that became active at any earlier time step.
- This model is relatively simplistic: the probability does not vary between nodes, nor as time passes.
- However, these simplifying assumptions are practical.
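To make the role of a concrete, here is a hedged sketch of how the observations could be tallied from stored actions. It assumes a static graph given as a dictionary of neighbor lists and a dictionary of activation times; the function name `count_outcomes` and both data layouts are assumptions made for illustration.

```python
from collections import Counter


def count_outcomes(neighbors, activation_time, num_steps):
    """Tally, per number of active neighbors a, how many inactive users
    did (Y[a]) or did not (N[a]) become active at a time step.

    neighbors: dict mapping each node to an iterable of its neighbors.
    activation_time: dict mapping node -> step (1..num_steps) at which it
        became active; nodes that never activate are simply absent.
    """
    Y, N = Counter(), Counter()
    for t in range(1, num_steps + 1):
        for u in neighbors:
            t_u = activation_time.get(u)
            if t_u is not None and t_u < t:
                continue  # u was already active before step t
            # a = neighbors of u that became active at any earlier step
            a = sum(1 for v in neighbors[u]
                    if activation_time.get(v) is not None
                    and activation_time[v] < t)
            if t_u == t:
                Y[a] += 1
            else:
                N[a] += 1
    return Y, N
```

Each (user, time step) pair in which the user is still inactive contributes one observation, so a user who stays inactive for many steps is counted once per step.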

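Estimating α, β
- α and β can be estimated using maximum likelihood logistic regression
- Maximize the expression

$$\prod_a p(a)^{Y_a} \, (1 - p(a))^{N_a}, \qquad Y_a = \sum_t Y_{a,t}, \quad N_a = \sum_t N_{a,t}$$

where $Y_{a,t}$ is the number of users who at the beginning of time t had a active friends and became active at time t, and $N_{a,t}$ is the number of users who at the beginning of time t had a active friends but did not become active at time t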
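The maximization can be sketched with a small Newton iteration on the log-likelihood. This is an illustration under stated assumptions, not the authors' implementation: it assumes p(a) is the logistic function of α ln(a+1) + β, and that the counts are given as dictionaries keyed by a (activated and not activated). The name `fit_alpha_beta` is hypothetical.

```python
import math


def fit_alpha_beta(Y, N, max_iter=100, tol=1e-10):
    """Maximize prod_a p(a)^Y[a] * (1 - p(a))^N[a] with Newton's method,
    where p(a) = sigmoid(alpha * ln(a + 1) + beta).

    Y, N: dicts mapping a (number of active friends) to counts.
    Returns the estimates (alpha, beta).
    """
    rows = []
    for a in sorted(set(Y) | set(N)):
        y, n = Y.get(a, 0), Y.get(a, 0) + N.get(a, 0)
        if n > 0:
            rows.append((math.log(a + 1), y, n))
    if len(rows) < 2:
        raise ValueError("need observations for at least two values of a")

    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = s_xx = s_x = s_1 = 0.0
        for x, y, n in rows:
            p = 1.0 / (1.0 + math.exp(-(alpha * x + beta)))
            r = y - n * p          # observed minus expected activations
            w = n * p * (1.0 - p)  # curvature weight
            g_a += r * x
            g_b += r
            s_xx += w * x * x
            s_x += w * x
            s_1 += w
        # Solve the 2x2 Newton system [[s_xx, s_x], [s_x, s_1]] d = g.
        det = s_xx * s_1 - s_x * s_x
        d_alpha = (s_1 * g_a - s_x * g_b) / det
        d_beta = (s_xx * g_b - s_x * g_a) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta
```

The sketch does no step damping, so it is only suitable for data in which both outcomes occur; with no activations at all the maximum likelihood estimate does not exist and the iteration would diverge. In practice a library logistic regression routine would likely be used instead.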

The Shuffle Test
- Idea: if influence does not play a role, then the timing of activations amongst users should be independent of each other: Pr(a active before b) = Pr(b active before a)

The Shuffle Test
1. Estimate α for the initial graph.
2. Randomly permute the order in which active nodes have been activated: set the activation time of node i to t_π(i), where t_i is the original activation time of node i and π is a random permutation of the active nodes.
3. Estimate α' for this configuration.
4. If the values of α and α' are close to each other, the model exhibits little or no social influence.
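The steps above can be sketched as follows. The estimator is passed in as a caller-supplied function, since the slides do not fix an implementation; `estimate_alpha`, the dictionary layout of the activation times, and the seed parameter are all assumptions for illustration.

```python
import random


def shuffle_activation_times(activation_time, rng):
    """Return a copy in which the observed activation times are randomly
    reassigned among the nodes that became active."""
    nodes = list(activation_time)
    times = [activation_time[u] for u in nodes]
    rng.shuffle(times)
    return dict(zip(nodes, times))


def shuffle_test(neighbors, activation_time, estimate_alpha, seed=0):
    """Compare alpha on the original data with alpha' on shuffled data.

    estimate_alpha is a caller-supplied (hypothetical) function
    (neighbors, activation_time) -> float, e.g. a logistic-regression fit.
    """
    rng = random.Random(seed)
    alpha = estimate_alpha(neighbors, activation_time)
    shuffled = shuffle_activation_times(activation_time, rng)
    alpha_shuffled = estimate_alpha(neighbors, shuffled)
    return alpha, alpha_shuffled
```

The shuffle keeps the set of active nodes and the multiset of activation times unchanged; only the assignment of times to nodes is randomized. Repeating the test over several seeds would give a sense of how much α' varies by chance.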

The Edge-reversal Test
1. Reverse the direction of all the edges.
2. Run the same logistic regression on the data using the new graph.
If correlation is not due to influence, then α should not change.
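A hedged sketch of the two steps, assuming the directed graph is stored as a dictionary from each node to the set of nodes it points to. The direction convention, the function names, and the caller-supplied `estimate_alpha` are assumptions, not details from the slides.

```python
def reverse_edges(out_neighbors):
    """Return the directed graph with every edge u -> v replaced by v -> u.

    out_neighbors: dict mapping each node to the set of nodes it points to.
    """
    reversed_graph = {u: set() for u in out_neighbors}
    for u, targets in out_neighbors.items():
        for v in targets:
            reversed_graph.setdefault(v, set()).add(u)
    return reversed_graph


def edge_reversal_test(out_neighbors, activation_time, estimate_alpha):
    """Estimate alpha on the original and on the reversed graph.

    estimate_alpha is a caller-supplied (hypothetical) fitting function
    (graph, activation_time) -> float.
    """
    alpha = estimate_alpha(out_neighbors, activation_time)
    alpha_reversed = estimate_alpha(reverse_edges(out_neighbors), activation_time)
    return alpha, alpha_reversed
```

The test is only informative when a meaningful share of the edges is not mutual, because reversing a mutual edge leaves the graph unchanged.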

Generative Models
- No Correlation
- Influence
- Correlation, no influence

Generative Models: No Correlation
- the network grows just as in the real data
- at every step, randomly pick n nodes and make them active

Influence Model
- the network grows just as in the real data
- at every step, every inactive node flips a coin and becomes active with probability p(a), where a is its number of active neighbors
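A minimal simulation sketch of this generative model follows. It makes two simplifying assumptions that are not in the slides: the graph is fixed rather than growing, and the coin's bias is the logistic function of α ln(a+1) + β. The function name and the seed parameter are illustrative.

```python
import math
import random


def simulate_influence(neighbors, num_steps, alpha, beta, seed=0):
    """Simulate activations on a fixed graph; returns node -> activation step.

    Assumes p(a) = sigmoid(alpha * ln(a + 1) + beta), where a counts the
    neighbors that were active before the current step.
    """
    rng = random.Random(seed)
    activation_time = {}
    for t in range(1, num_steps + 1):
        newly_active = []
        for u in neighbors:
            if u in activation_time:
                continue
            a = sum(1 for v in neighbors[u] if v in activation_time)
            p = 1.0 / (1.0 + math.exp(-(alpha * math.log(a + 1) + beta)))
            if rng.random() < p:
                newly_active.append(u)
        # Apply updates after the sweep so all coins in step t see the same state.
        for u in newly_active:
            activation_time[u] = t
    return activation_time
```

The returned dictionary has the same shape as stored real-world actions, so synthetic data produced this way can be fed to the same estimation procedure as real data.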

Correlation, No Influence Model
- the network grows just as in the real data
- pick a subset S of G:
  - randomly pick centers, and add the ball of radius 2 around each to S
  - do this until |S| reaches the parameter L
- pick the nodes to become active uniformly at random from S
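The construction of S can be sketched as below, assuming an undirected graph stored as a dictionary of neighbor collections with sortable node labels. The function names, the `num_active` parameter, and the choice to use each center at most once are assumptions for illustration.

```python
import random


def ball_of_radius_2(neighbors, center):
    """All nodes within graph distance 2 of `center` (including itself)."""
    ball = {center}
    for v in neighbors[center]:
        ball.add(v)
        ball.update(neighbors[v])
    return ball


def pick_correlated_activations(neighbors, set_size, num_active, seed=0):
    """Build S from radius-2 balls around random centers until |S| >= set_size,
    then choose num_active nodes uniformly at random from S."""
    rng = random.Random(seed)
    centers = list(neighbors)
    rng.shuffle(centers)  # random centers, each used at most once
    S = set()
    for c in centers:
        if len(S) >= set_size:
            break
        S |= ball_of_radius_2(neighbors, c)
    chosen = rng.sample(sorted(S), min(num_active, len(S)))
    return S, chosen
```

Because S is made of neighborhoods, active nodes tend to be adjacent to each other even though no node reacts to its neighbors' actions, which is the sense in which this model has correlation without influence.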

Distinguishing Influence: Shuffle Test
[Figure with two panels: Influence, Correlation]

Distinguishing Influence: Edge Reversal
[Figure with two panels: Correlation, Influence]

Real Data: the Flickr Dataset
- analyzed 800K users over 16 months
  - about 340K exhibited tagging behavior
  - size of giant component: 160K
  - 2.8M directed edges, 28.5% not mutual
- analyzed 1,700 tags independently
  - various types (event, color, object, etc.)
  - various numbers of users
  - various growth patterns (bursty, smooth, periodic)

Distinguishing Influence in Flickr: Shuffle test
[Figure]

Distinguishing Influence in Flickr: Edge reversal test
[Figure]

Some Influence
- can discover traces of influence by looking at similar tags
- for the tag "graffiti", the difference between the α values was 0
- however, for the misspelling "grafitti", the difference was slightly larger
- with the even less common misspelling "graffitti", the difference increased even more

Conclusions
- distinguishing between correlation and causation is difficult
- timing information can help answer the question (shuffle)
- knowledge of asymmetric social ties is also useful (edge-reversal)

Further research directions
- formal verification of results? (controlled experiments)
- quantification of the strength of influence?
- identify which nodes influence others
- what if social ties are symmetric?
- distinguishing between other forms of correlation
- distinguishing between different forms of social influence

Questions?