Influence and Homophily. Social Media Mining.

Presentation transcript:

Influence and Homophily Social Media Mining

2 Social Media Mining. Measures and Metrics. Influence and Homophily.
Social Forces
Social forces connect individuals in different ways.
Among connected individuals, one often observes high social similarity, or assortativity.
– In networks with assortativity, similar nodes are connected to one another more often than dissimilar nodes.
– In social networks, a high similarity between friends is observed.
– This similarity is exhibited by similar behavior, similar interests, similar activities, and shared attributes such as language, among others.
Friendship networks are examples of assortative networks.

3 Why Are Connected People Similar?
Influence
– Influence is the process by which an individual (the influential) affects another individual such that the influenced individual becomes more similar to the influential figure.
– Example: if most of one's friends switch to a mobile company, he might be influenced by his friends and switch to the company as well.
Homophily
– Homophily is realized when similar individuals become friends because of their high similarity.
– Example: two musicians are more likely to become friends.
Confounding
– Confounding is the environment's effect on making individuals similar.
– Example: two individuals living in the same city are more likely to become friends than two random individuals.

4 Influence, Homophily, and Confounding

5 Source of Assortativity in Networks
Both influence and homophily generate similarity in social networks, but in different ways:
– Homophily selects similar nodes and links them together.
– Influence makes connected nodes similar to each other.

6 Assortativity: An Example
The city's draft tobacco control strategy says more than 60% of under-16s in Plymouth smoke regularly.

7 Smoking Behavior in a Group of Friends: Why Is It Happening?
– Influence: smoker friends influence their non-smoker friends.
– Homophily: smokers become friends.
– Confounding: there are lots of places where people can smoke.

8 Our Goals in This Chapter
How can we measure assortativity?
How can we measure influence or homophily?
How can we model influence or homophily?
How can we distinguish the two?

9 Measuring Assortativity

10 Assortativity: An Example
The friendship network in a US high school in 1994.
Colors represent races:
– Whites are white
– Blacks are grey
– Hispanics are light grey
– Others are black
There is high assortativity between individuals of the same race.

11 Measuring Assortativity for Nominal Attributes
Where nominal attributes (e.g., race) are assigned to nodes, we can use the edges between nodes of the same type (i.e., attribute value) to measure the assortativity of the network.
– Node attributes could be nationality, race, sex, etc.
Notation: the Kronecker delta function compares two types, and $t(v_i)$ denotes the type of vertex $v_i$.

12 Assortativity Significance
Assortativity significance measures the difference between the measured assortativity and the expected assortativity.
– The higher this value, the more significant the assortativity observed.
Example
– Consider a school where half the population is white and half the population is Hispanic. We expect 50% of the connections to be between members of different races. If all connections in this school were between members of different races, then we have a significant finding.

13 Assortativity Significance: Measuring
The measure subtracts the expected assortativity in the whole graph from the measured assortativity.
This measure is called modularity.
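The slide's formula did not survive extraction. A minimal Python sketch of the idea, assuming the standard modularity form $Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\delta(t(v_i), t(v_j))$ for an undirected simple graph, is given below. The function name `modularity`, the 4-cycle, and its colors are illustrative assumptions, not taken from the deck.

```python
from collections import defaultdict


def modularity(edges, node_type):
    """Modularity of an undirected simple graph for a nominal node attribute.

    edges: list of (u, v) pairs, each undirected edge listed once.
    node_type: dict mapping every node to its type (attribute value).
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Measured assortativity: fraction of edges whose endpoints share a type.
    same_type = sum(1 for u, v in edges if node_type[u] == node_type[v])
    measured = same_type / m

    # Expected assortativity: sum over same-type ordered pairs of d_i*d_j/(2m),
    # divided by 2m. Grouping by type gives (sum of degrees in the type)^2.
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[node_type[node]] += d
    expected = sum(s * s for s in degree_sum.values()) / (2 * m) ** 2

    return measured - expected


# Hypothetical 4-cycle in which every edge joins nodes of different colors.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
colors = {"a": "white", "c": "white", "b": "grey", "d": "grey"}
q = modularity(cycle, colors)  # negative: fewer same-color edges than expected
```

A negative value means same-type nodes are connected less often than chance would suggest; a positive value indicates assortative mixing.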

14 Normalized Modularity
The maximum happens when all vertices of the same type are connected to one another.

15 Modularity: Matrix Form
Let $\Delta \in \mathbb{R}^{n \times k}$ denote the indicator matrix and let $k$ denote the number of types.
The Kronecker delta function can be reformulated using the indicator matrix: $\delta(t(v_i), t(v_j)) = \sum_{k} \Delta_{v_i,k}\,\Delta_{v_j,k}$.
Therefore, $(\Delta\Delta^T)_{i,j} = \delta(t(v_i), t(v_j))$.

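16 Normalized Modularity: Matrix Form
Let the modularity matrix be $B = A - \frac{d\,d^T}{2m}$, where $A$ is the adjacency matrix, $m$ is the number of edges, and $d$ is the degree vector.
Then, with $\Delta$ the node-by-type indicator matrix, modularity can be reformulated as $Q = \frac{1}{2m}\,\mathrm{Tr}(\Delta^T B \Delta)$.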

17 Modularity Example
The number of edges between nodes of the same color is less than the expected number of edges between them.

18 Measuring Assortativity for Ordinal Attributes
A common measure for analyzing the relationship between ordinal values is covariance. It describes how two variables change together.
In our case, we are interested in how the values of nodes that are connected via edges are correlated.

19 Covariance Variables
We construct two variables $X_L$ and $X_R$, where for any edge $(v_i, v_j)$ we assume that $x_i$ is observed from variable $X_L$ and $x_j$ is observed from variable $X_R$.
In other words, $X_L$ represents the ordinal values associated with the left node of the edges and $X_R$ represents the values associated with the right node of the edges.
Our problem is therefore reduced to computing the covariance between variables $X_L$ and $X_R$.

20 Covariance Variables: Example
List of edges: ((A, C), (C, A), (C, B), (B, C))
$X_L$: (18, 21, 21, 20)
$X_R$: (21, 18, 20, 21)

21 Covariance
For two given column variables $X_L$ and $X_R$, the covariance is $\sigma(X_L, X_R) = E(X_L X_R) - E(X_L)\,E(X_R)$.
$E(X_L)$ is the mean of the variable and $E(X_L X_R)$ is the mean of the element-wise product of $X_L$ and $X_R$.

22 Covariance

23 Normalizing Covariance
Pearson correlation $\rho(X, Y)$ is the normalized version of covariance: $\rho(X, Y) = \frac{\sigma(X, Y)}{\sigma(X)\,\sigma(Y)}$.
In our case: $\rho(X_L, X_R) = \frac{\sigma(X_L, X_R)}{\sigma(X_L)\,\sigma(X_R)}$.

24 Correlation Example
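The worked example on this slide was an image and is not recoverable. As a stand-in, the sketch below applies covariance and Pearson correlation to the edge list from slide 20. The node values A = 18, B = 20, C = 21 are inferred from that slide's $X_L$ and $X_R$ vectors; the function name is an assumption for illustration.

```python
import math


def edge_covariance_and_correlation(edges, value):
    """Covariance and Pearson correlation of node values across edge endpoints.

    edges: undirected edges, each listed once; both directions are counted.
    value: dict mapping each node to its ordinal value.
    """
    directed = list(edges) + [(v, u) for u, v in edges]
    x_left = [value[u] for u, _ in directed]
    x_right = [value[v] for _, v in directed]
    n = len(directed)

    mean_left = sum(x_left) / n
    mean_right = sum(x_right) / n
    # sigma(X_L, X_R) = E(X_L X_R) - E(X_L) E(X_R)
    covariance = sum(a * b for a, b in zip(x_left, x_right)) / n - mean_left * mean_right
    var_left = sum(a * a for a in x_left) / n - mean_left ** 2
    var_right = sum(b * b for b in x_right) / n - mean_right ** 2
    # Undefined (division by zero) if all endpoint values are identical.
    correlation = covariance / math.sqrt(var_left * var_right)
    return covariance, correlation


ages = {"A": 18, "B": 20, "C": 21}
cov, rho = edge_covariance_and_correlation([("A", "C"), ("C", "B")], ages)
```

For this small graph the covariance is negative, so connected nodes tend to have dissimilar values (a disassortative network).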

25 Social Influence: Measuring Influence, Modeling Influence

26 Social Influence: Definition
The act or power of producing an effect without apparent exertion of force or direct exercise of command.

27 Measuring the Influence

28 Measuring Influence
Measuring influence means assigning to each node a number that represents the influential power of that node.
Influence can be measured based on either prediction or observation.

29 Prediction-based Measurement
In prediction-based measurement, we assume that an individual's attributes or the way she is situated in the network predict how influential she will be.
For instance, we can assume that the gregariousness (e.g., number of friends) of an individual is correlated with how influential she will be. Therefore, it is natural to use any of the centrality measures discussed in Chapter 3 for prediction-based influence measurement.
An example:
– On Twitter, in-degree (number of followers) is a commonly used benchmark for measuring influence.

30 Observation-based Measurement
In observation-based measurement, we quantify the influence of an individual by measuring the amount of influence attributed to that individual.
– When an individual is a role model. Influence measure: the size of the audience that has been influenced.
– When an individual spreads information. Influence measure: the size of the cascade, the population affected, or the rate at which the population gets influenced.
– When an individual increases values. Influence measure: the increase (or rate of increase) in the value of an item or action.

31 Case Studies for Measuring Influence in Social Media
Measuring Social Influence on the Blogosphere
Measuring Social Influence on Twitter

32 Measuring Social Influence on the Blogosphere
The goal of measuring influence in the blogosphere is to identify the most influential bloggers.
Because an individual has limited time, following the influentials is often a good heuristic for filtering out what is uninteresting.
One common measure for quantifying the influence of bloggers is in-degree centrality.
Because in-links are sparse, more detailed analysis is required to measure influence in the blogosphere.

33 iFinder: A System to Measure Influence on the Blogosphere

34 Social Gestures
Recognition
– Recognition for a blogpost is the number of links that point to the blogpost (in-links). Let $I_p$ denote the set of in-links that point to blogpost $p$.
Activity Generation
– Activity generated by a blogpost is the number of comments that it receives. $c_p$ denotes the number of comments that blogpost $p$ receives.
Novelty
– A blogpost's novelty is inversely correlated with the number of references it employs. In particular, the more citations a blogpost has, the less novel it is considered. $O_p$ denotes the set of out-links for blogpost $p$.
Eloquence
– Eloquence is estimated by the length of the blogpost. Given the informal nature of blogs and bloggers' tendency to write short blogposts, longer blogposts are believed to be more eloquent. So the length of a blogpost, $l_p$, can be employed as a measure of eloquence.

35 Influence Flow
Influence flow describes a measure that accounts for in-links (recognition) and out-links (novelty).
$I(\cdot)$ denotes the influence of a blogpost, and $w_{in}$ and $w_{out}$ are the weights that adjust the contribution of in-links and out-links, respectively.
$p_m$ is the number of blogposts that point to blogpost $p$, and $p_n$ is the number of blogposts referred to in $p$.

36 Blogpost Influence
$w_{length}$ is the weight for the length of the blogpost.
$w_{comment}$ describes how the number of comments is weighted in the influence computation.
The weights $w_{in}$, $w_{out}$, $w_{comment}$, and $w_{length}$ can be tuned to make the model suitable for different domains.
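The formula images for influence flow and blogpost influence are missing from the transcript. One form that is consistent with the symbols defined on these two slides is sketched below. It is an assumed reconstruction, not confirmed by the surviving text. In particular, it treats $p_m$ and $p_n$ as the individual blogposts that link to $p$ and are linked from $p$, respectively, which is an interpretation.

```latex
\begin{align}
\mathrm{InfluenceFlow}(p) &= w_{in} \sum_{m=1}^{|I_p|} I(p_m) \;-\; w_{out} \sum_{n=1}^{|O_p|} I(p_n) \\
I(p) &= w_{length}\, l_p \,\bigl( w_{comment}\, c_p + \mathrm{InfluenceFlow}(p) \bigr)
\end{align}
```

Under this reading, in-links add to a blogpost's influence (recognition), out-links subtract from it (less novelty), and the comment count and length scale the result.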

37 Measuring Social Influence on Twitter
On Twitter, users have the option of following individuals, which allows them to receive tweets from the person being followed.
Intuitively, one can think of the number of followers as a measure of influence (in-degree centrality).

38 Measuring Social Influence on Twitter: Measures
In-degree
– The number of users following a person on Twitter.
– In-degree denotes the "audience size" of an individual.
Number of Mentions
– The number of times an individual is mentioned in a tweet.
– The number of mentions suggests the "ability in engaging others in conversation".
Number of Retweets
– Twitter users have the opportunity to forward tweets to a broader audience via the retweet capability.
– The number of retweets indicates an individual's ability to generate content that is worth being passed on.

39 Measuring Social Influence on Twitter: Measures
Each one of these measures can be used by itself to identify influential users on Twitter.
This can be performed by computing the measure for each individual and then ranking individuals based on their measured influence value.
Contrary to public belief, the number of followers is considered an inaccurate measure compared to the other two.
We can rank individuals on Twitter independently based on these three measures. To see whether the measures are correlated or redundant, we can compare the ranks of individuals across the three measures using rank correlation measures.

40 Comparing Ranks Across Three Measures
To compare ranks across more than one measure (say, in-degree and mentions), we can use Spearman's rank correlation coefficient.

41 Spearman's rank correlation is the Pearson correlation coefficient for ordinal variables that represent ranks (i.e., take values between 1 and n); hence, its value is in the range [-1, 1].
Popular users (users with high in-degree) do not necessarily have high ranks in terms of number of retweets or mentions.
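As a rough illustration of the rank comparison, the sketch below uses the common closed form $\rho = 1 - \frac{6\sum_i (m_i^1 - m_i^2)^2}{n^3 - n}$, where $m_i^1$ and $m_i^2$ are the ranks of user $i$ under the two measures. This form assumes there are no tied ranks. The follower and retweet counts are made up for the example.

```python
def to_ranks(scores):
    """Rank users by score, 1 = highest. Assumes all scores are distinct."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same users."""
    n = len(ranks_a)
    squared_diff = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * squared_diff / (n ** 3 - n)


# Hypothetical counts for four users, listed in the same user order.
followers = [1000, 50, 300, 10]
retweets = [5, 80, 20, 1]
rho_followers_retweets = spearman_rho(to_ranks(followers), to_ranks(retweets))
```

A value near 1 would mean the two measures rank users almost identically (redundant measures); a value near 0 means the rankings are largely unrelated.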

42 Influence Modeling

43 Influence Modeling
At time stamp $t_1$, node v is activated and node u is not activated.
Node u becomes activated at time stamp $t_2$ as an effect of the influence.
Each node starts as either active or inactive.
A node, once activated, will activate its neighboring nodes.
Once a node is activated, it cannot be deactivated.

44 Influence Modeling: Assumptions
In general, we can assume that the influence process takes place in a network of connected individuals.
Sometimes this network is observable (an explicit network) and sometimes it is not (an implicit network).
– In the observable case, we can resort to threshold models such as the linear threshold model.
– In the case of implicit networks, we can employ methods such as the Linear Influence Model (LIM), which take as input the number of individuals who get influenced at different times, e.g., the number of buyers per week.

45 Threshold Models
Threshold models are simple yet effective methods for modeling influence in explicit networks.
In a threshold model, actors make a decision based on the number or the fraction (the threshold) of their neighbors who have already made the same decision.
Using a threshold model, Schelling demonstrated that minor preferences for having neighbors of the same color lead to complete racial segregation.

46 Linear Threshold Model (LTM)
An actor takes an action if the number of his friends who have taken the action reaches or exceeds a certain threshold.
Each node $i$ chooses a threshold $\theta_i$ randomly from a uniform distribution on the interval between 0 and 1.
In each discrete step, all nodes that were active in the previous step remain active.
Inactive nodes that satisfy the threshold condition become active.

47 Linear Threshold Model: An Example (thresholds are shown on top of the nodes)
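The example figure is not part of the transcript. The sketch below simulates a linear threshold process on a small made-up graph. It assumes the usual weighted activation condition, in which node $i$ becomes active once the summed weights $w_{j,i}$ of its active in-neighbors reach $\theta_i$. The edge weights, the fixed thresholds (used instead of random draws so the run is reproducible), and the function name are all illustrative assumptions.

```python
def linear_threshold(in_weights, thresholds, seeds):
    """Run a linear threshold process until no new node activates.

    in_weights: dict mapping node i to {in-neighbor j: weight w_ji}.
    thresholds: dict mapping node i to its threshold theta_i in [0, 1].
    seeds: iterable of initially active nodes.
    """
    active = set(seeds)
    while True:
        # Evaluate every inactive node against the nodes active in the previous step.
        newly_active = {
            i
            for i in in_weights
            if i not in active
            and sum(w for j, w in in_weights[i].items() if j in active) >= thresholds[i]
        }
        if not newly_active:
            return active
        active |= newly_active  # active nodes never deactivate


# Hypothetical graph: a -> b -> c, plus weak ties between c and d.
in_weights = {"a": {}, "b": {"a": 1.0}, "c": {"b": 0.6, "d": 0.4}, "d": {"c": 0.3}}
thresholds = {"a": 0.2, "b": 0.5, "c": 0.5, "d": 0.5}
final_active = linear_threshold(in_weights, thresholds, seeds={"a"})
```

Here the activation spreads from a to b to c, while d stays inactive because the weight it receives from c is below its threshold.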

48 Influence in Implicit Networks
An implicit network is one where the influence spreads over nodes in the network.
Unlike in the threshold model, one cannot observe the individuals who are responsible for influencing others (the influentials), but only those who get influenced.
The information available is:
– The set of influenced individuals at any time, $P(t)$
– The time $t_u$ at which each individual $u$ is initially influenced (activated)

49 Influence in Implicit Networks
Assume that any influenced individual $u$ can influence $I(u, t)$ non-influenced individuals at time $t$.
Assuming discrete time steps, we can formulate the size of the influenced population as $|P(t)| = \sum_{u \in P(t)} I(u, t - t_u)$, where $t_u$ is the time at which $u$ was activated.

50 The Size of the Influenced Population
The size of the influenced population is a summation of the individuals influenced by activated individuals.
Individuals u, v, and w are activated at time steps $t_u$, $t_v$, and $t_w$, respectively.
At time $t$, the total number of influenced individuals is the summation of the influence functions $I_u$, $I_v$, and $I_w$ at time steps $t - t_u$, $t - t_v$, and $t - t_w$, respectively.
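The summation described on this slide can be sketched in a few lines of Python. The influence functions are represented here as lookup tables indexed by the time elapsed since activation. The tables, activation times, and the function name `influenced_population` are hypothetical values chosen for illustration.

```python
def influenced_population(activation_time, influence_table, t):
    """Sum each activated individual's influence at the time elapsed since activation.

    activation_time: dict mapping individual u to activation time t_u.
    influence_table: dict mapping u to a list where entry k is I_u at elapsed time k.
    t: the (discrete) time at which the population size is evaluated.
    """
    total = 0
    for u, t_u in activation_time.items():
        elapsed = t - t_u
        if elapsed < 0:
            continue  # u has not been activated yet at time t
        curve = influence_table[u]
        if elapsed < len(curve):
            total += curve[elapsed]  # beyond the table, influence is treated as 0
    return total


# Hypothetical data: u activates at time 0, v at time 1.
activation_time = {"u": 0, "v": 1}
influence_table = {"u": [0, 3, 2, 1], "v": [0, 2, 1]}
size_at_2 = influenced_population(activation_time, influence_table, t=2)
```

At time 2, u contributes its influence at elapsed time 2 and v contributes its influence at elapsed time 1, and the two are added.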

51 Size of the Activated Nodes
The goal is to estimate $I(\cdot,\cdot)$ given the activation times and the number of influenced individuals at any time.
Two approaches:
– Parametric estimation
– Non-parametric estimation

52 Parametric Estimation
Use some distribution to estimate the function $I$.
Assume that all users influence others in the same parametric form.
For instance, one can use a power-law distribution to model influence; the coefficients of that distribution then need to be estimated.

53 Non-Parametric Estimation
Assume that nodes can get deactivated over time and can then no longer influence others.
– $A(u, t) = 1$ denotes that node $u$ is active at time $t$.
– $A(u, t) = 0$ denotes that $u$ is either deactivated or not yet influenced.
– $|V|$ is the total size of the population and $T$ is the last time stamp.
The resulting estimation problem can be solved using non-negative least squares methods.

54 Homophily: "Birds of a feather flock together"

55 Homophily: Definition
Homophily (i.e., "love of the same") is the tendency of individuals to associate and bond with similar others.
People interact more often with people who are "like them" than with people who are dissimilar.
What leads to homophily?
Race and ethnicity, sex and gender, age, religion, education, occupation and social class, network positions, behavior, attitudes, abilities, beliefs, and aspirations.

56 Measuring Homophily: Idea
To measure homophily, one can measure how the assortativity of the network changes over time.
– Consider two snapshots of a network, $G_t(V, E)$ and $G_{t'}(V, E')$, at times $t$ and $t'$, respectively, where $t' > t$.
– Assume that the number of nodes stays fixed and that the edges connecting them are added or removed over time.

57 Measuring Homophily
For nominal attributes, the homophily index is defined as the change in modularity between the two snapshots.
For ordinal attributes, the homophily index can be defined as the change in Pearson correlation.

58 Modeling Homophily
We model homophily using a variation of the Independent Cascade Model (ICM).
At each time step, a single node gets activated.
– A node, once activated, will remain activated.
$p_{v,w}$ in the ICM is replaced with the similarity between nodes v and w, sim(v, w).
When a node v is activated, we generate a random tolerance value $\theta_v$ for the node, between 0 and 1.
– The tolerance value defines the minimum similarity that node v tolerates or requires for being connected to other nodes.
Then, for any edge (v, w) that is still not in the edge set, if the similarity sim(v, w) > $\theta_v$, the edge (v, w) is added.
This continues until all vertices are visited.
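The procedure on this slide can be sketched as a short simulation. The slide does not say in which order nodes are activated, so the sketch simply visits them in the order given, which is an assumption. The function name and the toy similarity functions are also illustrative.

```python
import random


def homophily_model(nodes, similarity, rng=None):
    """Grow an edge set by activating one node per step and linking it to similar nodes.

    nodes: list of nodes, visited in the given order (an assumption of this sketch).
    similarity: function (v, w) -> similarity in [0, 1].
    rng: optional random.Random instance for reproducible tolerance values.
    """
    rng = rng or random.Random()
    edges = set()
    for v in nodes:
        tolerance = rng.random()  # theta_v, drawn once when v is activated
        for w in nodes:
            if w == v:
                continue
            edge = frozenset((v, w))
            # Add the edge only if it is new and w is similar enough for v.
            if edge not in edges and similarity(v, w) > tolerance:
                edges.add(edge)
    return edges


nodes = ["u", "v", "w", "x"]
# Two extreme toy similarity functions: everyone identical, everyone dissimilar.
all_similar = homophily_model(nodes, lambda a, b: 1.0, random.Random(1))
none_similar = homophily_model(nodes, lambda a, b: 0.0, random.Random(1))
```

With identical nodes every pair gets linked, and with fully dissimilar nodes no edge forms; realistic similarity functions fall between these extremes.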

59 Distinguishing Influence and Homophily

60 Distinguishing Influence and Homophily
We are often interested in understanding which social force (influence or homophily) resulted in an assortative network.
To distinguish influence-based assortativity from homophily-based assortativity, statistical tests can be used.
Note that all these tests assume that several temporal snapshots of the dataset are available (as in the LIM model), so that we know exactly when each node is activated, when edges are formed, and when attributes are changed.

61 Shuffle Test
Idea: the shuffle test builds on the fact that influence is temporal. In other words, when u influences v, then v should have been activated after u.
So, in the shuffle test, we define a temporal assortativity measure.
We assume that if there is no influence, then shuffling the activation timestamps should not affect the temporal assortativity measurement.

62 Shuffle Test
Assume that the probability of activation of node v depends on a, the number of already-active friends of v.
We assume that this probability can be estimated using a logistic function of a, where:
– a is the number of active friends,
– α is the social correlation coefficient (a variable), and
– β (a variable) is a constant that explains the innate bias for activation.

63 Activation Likelihood
Suppose that at one time point t, $y_{a,t}$ users with a active friends become active, and $n_{a,t}$ users who also have a active friends stay inactive.
The likelihood function is defined in terms of these counts.
Given the users' activity log, we can compute the correlation coefficient α and the bias β that maximize this likelihood (optionally, using an iterative maximum likelihood method).
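The slide's formulas are missing from the transcript. The sketch below assumes the logistic form $p(a) = \frac{e^{\alpha a + \beta}}{1 + e^{\alpha a + \beta}}$ and a likelihood that multiplies $p(a)^{y_a}(1 - p(a))^{n_a}$ over all values of a, with counts pooled over time. It fits α and β by plain gradient ascent, which is one possible iterative method and not necessarily the one the authors had in mind. The counts are made up.

```python
import math


def activation_probability(a, alpha, beta):
    """Logistic activation probability for a user with `a` active friends."""
    return 1.0 / (1.0 + math.exp(-(alpha * a + beta)))


def log_likelihood(counts, alpha, beta):
    """counts maps a -> (y_a, n_a): users who did / did not activate."""
    total = 0.0
    for a, (y, n) in counts.items():
        p = activation_probability(a, alpha, beta)
        total += y * math.log(p) + n * math.log(1.0 - p)
    return total


def fit_alpha_beta(counts, learning_rate=0.05, steps=5000):
    """Maximize the log-likelihood by plain gradient ascent from (0, 0)."""
    alpha, beta = 0.0, 0.0
    for _ in range(steps):
        grad_alpha = grad_beta = 0.0
        for a, (y, n) in counts.items():
            p = activation_probability(a, alpha, beta)
            residual = y - (y + n) * p  # observed minus expected activations
            grad_alpha += a * residual
            grad_beta += residual
        alpha += learning_rate * grad_alpha
        beta += learning_rate * grad_beta
    return alpha, beta


# Made-up counts: 1 of 10 users with no active friends activates, 5 of 10 with one.
counts = {0: (1, 9), 1: (5, 5)}
alpha, beta = fit_alpha_beta(counts)
```

A positive fitted α indicates that having more active friends goes along with a higher chance of activation; the step size and iteration count are tuned for this toy data only and would need adjusting for larger counts.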

64 Shuffle Test
The key idea of the shuffle test is that if influence does not play a role, the timing of activations should be independent of users. Thus, even if we randomly shuffle the timestamps of user activities, we should obtain a similar α value.
Test of influence: after we shuffle the timestamps of user activities, if the new estimate of social correlation is significantly different from the estimate based on the users' activity log, there is evidence of influence.
Original timestamps:
User  A  B  C
Time  1  2  3
Shuffled timestamps:
User  A  B  C
Time  2  3  1
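A sketch of the data preparation behind the shuffle test follows. It counts, for each number a of already-active friends, how many users activate and how many stay inactive, and it reassigns the observed timestamps at random among the same users. The friendship graph is hypothetical; only the three users and their times mirror the table above. Estimating α from the two sets of counts and comparing the estimates is left out.

```python
import random


def activation_counts(friends, activation_time, timesteps):
    """Return {a: [y_a, n_a]} pooled over all time steps.

    friends: dict mapping each user to the set of their friends.
    activation_time: dict mapping activated users to their activation time.
    timesteps: iterable of discrete time points to scan.
    """
    counts = {}
    for t in timesteps:
        for user, nbrs in friends.items():
            t_user = activation_time.get(user)
            if t_user is not None and t_user < t:
                continue  # already active before t, so no longer a candidate
            a = sum(1 for w in nbrs if w in activation_time and activation_time[w] < t)
            pair = counts.setdefault(a, [0, 0])
            if t_user == t:
                pair[0] += 1  # became active at t
            else:
                pair[1] += 1  # stayed inactive at t
    return counts


def shuffle_timestamps(activation_time, rng):
    """Randomly reassign the observed activation times among the same users."""
    users = list(activation_time)
    times = [activation_time[u] for u in users]
    rng.shuffle(times)
    return dict(zip(users, times))


friends = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}  # hypothetical friendships
observed = {"A": 1, "B": 2, "C": 3}
observed_counts = activation_counts(friends, observed, range(1, 4))
shuffled = shuffle_timestamps(observed, random.Random(0))
shuffled_counts = activation_counts(friends, shuffled, range(1, 4))
```

In practice the shuffle would be repeated many times to see how far the α estimated from the observed log sits from the distribution of α values estimated from shuffled logs.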

65 The Edge-reversal Test
If influence resulted in activation, then the direction of edges should be important (who influenced whom).
Reverse the directions of all the edges.
Run the same logistic regression on the data using the new graph.
If the correlation is not due to influence, then α should not change.
[Figure: nodes A, B, C before and after edge reversal]