Discovering Leaders from Community Actions

Presentation transcript:

Discovering Leaders from Community Actions. Amit Goyal (1), Francesco Bonchi (2), Laks V.S. Lakshmanan (1). (1) University of British Columbia, (2) Yahoo! Research. Oct 27, 2008. Hi! Good afternoon everyone. I am Amit Goyal, a Master's student at UBC. The topic of my talk is “Dis..”. This is joint work with my supervisor, Prof. Laks, and Francesco, who works at Yahoo! Research, Barcelona.

Context & Motivations: Viral Marketing. Let's first talk about the context and motivation, which is viral marketing.

Word of Mouth and Viral Marketing We are more influenced by our friends than strangers 68% of consumers consult friends and family before purchasing home electronics (Burke 2003) We are .. than strangers because we trust our friends. According to a study by Burke, …. This idea has often been exploited in advertising domains and this kind of advertising is known as viral marketing. Amit Goyal (University of British Columbia) http://cs.ubc.ca/~goyal/

Viral Marketing. Also known as Target Advertising. Exponential growth by spreading the word. Low investments, maximum gain. It is also known as target advertising. By identifying and targeting influential users in a community, it is possible to spread the desire to perform certain actions by initiating a chain reaction through the word-of-mouth effect. This is one of the oldest and most effective forms of advertising. Companies often use it to achieve maximum gains through low investments.

Viral Marketing as an Optimization Problem. Given: a network with influence probabilities. Problem: select top-k leaders such that by targeting them, the spread of influence is maximized (Domingos et al 2001, Richardson et al 2002, Kempe et al 2003). How to calculate true influence probabilities? One of the attractive ways to study it is as an optimization problem. The problem is defined as follows. The input is a network with influence probabilities, e.g. …. and the problem asks to select top-k leaders such that by targ… Some of the classical papers which talk about it are .. But they leave open the issue of calculating influence probabilities. In real-world datasets, these probabilities are not available.

A pattern mining approach. We propose a completely different approach based on frequent pattern mining. We focus on the actions performed by users: joining a community (as in flickr/facebook community); rating a song or a movie (as in Y! Music, Y! Movie). Importance of the time at which actions are performed. Assumption: users can see their friends' actions. We propose a completely different approach based on frequent pattern mining. As most of you know, frequent pattern mining is a method for discovering interesting relations between items of transaction datasets. We focus on the actions performed by users .. which can be .. Why do we focus on actions? .. Because this is what is available in real-world datasets. We give importance to the time at which actions are performed and influence propagates. It makes sense to consider time as one of the parameters, as advertisers often want to estimate the time in which they get returns. Our assumption is … feeds

Our Contributions. Formally define the notion of leaders and its various flavors. Efficient algorithms for extracting these leaders. Demonstrate the utility and scalability of our algorithms via an extensive set of experiments on a real-world dataset: Yahoo! Messenger (social graph) and Yahoo! Movies ratings (actions log).

Rest of the talk. Framework definition: influence propagation on the social network; various notions of leaders. Algorithms. Experiments. Related Work. Conclusion.

Framework Definition

Input Data (1). A social network, i.e., an undirected graph G=(V,E) where nodes are users and edges represent social ties. Users declare their friends, e.g. Facebook, Yahoo! Messenger, etc. We consider two types of input data; the first is a social network ..

Input Data (2). An actions log sorted in chronological order, i.e., a relation Actions(User, Action, Time). Example: Jack joined the Yoga community at time 5. Assumption: users can see their friends' actions (feeds).

Action Propagation. The action is “Joining the Yoga community”. Jack and Jill are friends. Jack and Mary are friends. Jack joined the Yoga community at time 5, Jill at time 8 (3 time units later) and Mary at time 1000 (995 time units later). The action propagated from Jack to Jill. The action propagated from Jack to Mary.
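The propagation rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `propagation_graphs`, the `frozenset` encoding of friendships and the requirement that the follower acts strictly later than the friend are assumptions made for the sketch.

```python
from collections import defaultdict


def propagation_graphs(friends, actions_log):
    """Build one propagation graph per action.

    friends: set of frozenset({u, v}) pairs (undirected social ties).
    actions_log: iterable of (user, action, time) tuples in chronological order.
    Returns {action: [(u, v, elapsed), ...]}, one edge per propagation u -> v.
    """
    performed = defaultdict(dict)   # action -> {user: time of the action}
    edges = defaultdict(list)
    for user, action, time in actions_log:
        for earlier_user, earlier_time in performed[action].items():
            # Assumption: propagation needs a social tie and a strictly earlier action.
            if earlier_time < time and frozenset((earlier_user, user)) in friends:
                edges[action].append((earlier_user, user, time - earlier_time))
        performed[action][user] = time
    return dict(edges)


# The example from the slide: Jack-Jill and Jack-Mary are friends.
friends = {frozenset(("Jack", "Jill")), frozenset(("Jack", "Mary"))}
log = [("Jack", "yoga", 5), ("Jill", "yoga", 8), ("Mary", "yoga", 1000)]
graphs = propagation_graphs(friends, log)
print(graphs["yoga"])
```

With the slide's data this yields an edge from Jack to Jill after 3 time units and an edge from Jack to Mary after 995 time units.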

Propagation Graph. Jack joined the Yoga community at time 5, Jill at time 8, Joey at time 12, Ben at time 15 and Mary at time 1000. Can we say Mary got influenced by Jack? No.

User Influence Graph. When an action propagates from user u to user v, we may think of v as being influenced by u. Influence should decay in time. Size of influence graph << size of propagation graph (PG). [Figure: the propagation graph and the user influence graph for Jack]
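One simple way to read “influence should decay in time” is a hard cut-off: a user counts as influenced only if reachable in the propagation graph within π time units of the leader's action. The sketch below follows that reading. The function name, the inclusive comparison and the edge list (the slide residue does not show which pairs are connected) are all assumptions for illustration.

```python
from collections import defaultdict


def influenced_within(edges, times, leader, pi):
    """Users reachable from `leader` in the propagation graph within `pi` time units.

    edges: iterable of (u, v) propagation edges for one action.
    times: {user: time at which the user performed the action}.
    """
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    influenced, stack = set(), [leader]
    while stack:
        current = stack.pop()
        for follower in successors[current]:
            # Times only grow along edges, so a follower that is too late
            # cannot lead to anyone who is still inside the time limit.
            if follower not in influenced and times[follower] - times[leader] <= pi:
                influenced.add(follower)
                stack.append(follower)
    return influenced


# Join times from the slides; the edges are hypothetical.
times = {"Jack": 5, "Jill": 8, "Joey": 12, "Ben": 15, "Mary": 1000}
edges = [("Jack", "Jill"), ("Jack", "Mary"), ("Jill", "Joey"),
         ("Jill", "Ben"), ("Joey", "Ben")]
print(influenced_within(edges, times, "Jack", pi=15))
```

Under these assumed edges Mary is excluded for π = 15 because she acted 995 time units after Jack, which is the intuition behind the “NO” on the previous slide.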

Leaders: first definition. Who should be a leader? For an action, a leader should influence a sufficiently large number of users ( >ψ ). For an action, a leader should influence these users in a reasonable amount of time ( <π ). A leader should act as a leader in a sufficiently large number of actions ( >σ ). [Figure: propagation graph for joining the Yoga community, with Jack at time 5, Jill at time 8, Joey at time 12, Ben at time 15 and Mary at time 1000; edge labels (elapsed time units) include 3, 4, 7 and 995] If ψ = 2, π = 15, σ = 1, then both Jack and Jill are leaders.
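The three thresholds can be applied mechanically once the number of users influenced within time π is known for every (user, action) pair. The sketch below assumes those counts are given as a dictionary. The counts are hypothetical, and the thresholds are treated as inclusive, which is an assumption: the slide writes “>ψ” and “>σ”.

```python
from collections import Counter


def find_leaders(influence_counts, psi, sigma):
    """Users who influence at least `psi` users in at least `sigma` actions.

    influence_counts: {(user, action): number of users influenced within time pi}.
    """
    actions_led = Counter(
        user for (user, _action), count in influence_counts.items() if count >= psi
    )
    return {user for user, led in actions_led.items() if led >= sigma}


# Hypothetical counts for the Yoga example, already restricted to pi = 15.
counts = {
    ("Jack", "yoga"): 3,
    ("Jill", "yoga"): 2,
    ("Joey", "yoga"): 1,
    ("Ben", "yoga"): 0,
    ("Mary", "yoga"): 0,
}
print(find_leaders(counts, psi=2, sigma=1))
```

Raising ψ to 3 would leave only Jack, and raising σ to 2 would leave nobody, since this example contains a single action.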

Tribe Leader. A leader may influence different users for different actions. What if a leader leads a fixed set of users for different actions? We call these leaders Tribe Leaders. Tribes can be considered as small communities. [Figure: Jack leading users for actions A1, A2 and A3, where A1, A2 and A3 are 3 different actions]

Additional Constraint: Genuineness. It may happen that one user acts as a leader but in practice is always a follower of other leaders. We want to avoid this kind of fake leader. gen(Jill) = 1/3. Another constraint: confidence. [Figure: Jack, Tom and Jill with actions A1, A2 and A3, where A1, A2 and A3 are 3 different actions]

Algorithms but how will I discover the leaders??

Algorithms: Overview. Assumptions: the social graph is huge (millions of nodes); the actions log is huge (millions of tuples); for an action, size of user Influence Graph << size of Propagation Graph, for all users. Our algorithms are able to extract the patterns (leaders and tribe leaders) in no more than one scan of the action log table.

Algorithms: Overview. Scan the action log table by means of a window of size π backward in time, i.e., starting from the most recent timestamp (bottom of the table if we assume tuples to be ordered by time). Efficiently compute the influence matrix, i.e., a Users x Actions matrix in which IMπ(u, a) represents the number of users influenced by u w.r.t. action a within time π. Compute leaders from IM. Example: IM10(Jack, “joining yoga community”) = 3.

Computing Influence Matrix (1). We use a bit vector to track which users are influenced by a given user. Locking mechanism using another bit vector: 0 => free bit; 1 => occupied bit. Node-to-bit-index mapping stored in a queue. Bits must be dynamically allocated.
Queue: (V,2) (W,1) (T,4) (S,6) (R,0) (queue head marked in the figure)
Node  InfVec
R     01010111
S     01000110
T     00010110
W     00000110
V     00000100
Lock bit vector: 01010111
[Figure: time window on the propagation graph, containing nodes R, S, T, W and V]
To compute IM, we use influence bit vectors, which keep track of which nodes influenced which nodes in the current time window. As the time window moves, some nodes will drop off the window and some nodes will appear. Hence, the influence bit vectors should be computed incrementally. To do that, we make use of two operations called update and propagate, which we will look into in the next slides. As nodes enter and drop out of the time window, we need to manage which nodes are represented by which bits of the influence bit vector. To do that, we make use of a locking mechanism through a lock bit vector, which serves as an index. In a lock bit vector, 0 means a free bit and 1 means the bit has been occupied by some node.

Computing Influence Matrix (2). Slide up the current window: delete node V. Delete the entry from the queue. Update the lock. Update the influence vectors.
Queue: (V,2) (W,1) (T,4) (S,6) (R,0), with the entry (V,2) being deleted
Node  InfVec before  InfVec after
R     01010111       01010011
S     01000110       01000010
T     00010110       00010010
W     00000110       00000010
V     00000100       00000100 (node deleted)
Lock bit vector: 01010111 before, 01010011 after
[Figure: time window on the propagation graph, containing nodes R, S, T and W, with V dropping out]

Computing Influence Matrix (3). New node P added. Issue a lock and add an entry to the queue. Compute its influence vector by propagation. Number of followers of P = 4, so IM(P,a) = 4.
Queue before: (W,1) (T,4) (S,6) (R,0)
Queue after: (W,1) (T,4) (S,6) (R,0) (P,2)
Node  InfVec before  InfVec after
P     (not present)  01010111
R     01010011       01010011
S     01000010       01000010
T     00010010       00010010
W     00000010       00000010
Lock bit vector: 01010011 before, 01010111 after
[Figure: time window on the propagation graph, containing nodes P, R, S, T and W]
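The three slides on the influence matrix can be replayed with a small Python sketch that uses integers as bit vectors. It is a reconstruction under assumptions, not the authors' implementation: the class and method names are invented, the window is fixed at 8 bits, a new node takes the lowest free bit, entries leave the queue from the left, and the follower lists are inferred from the bit vectors shown on the slides.

```python
from collections import deque


class InfluenceWindow:
    """Influence bit vectors for the nodes inside the current time window."""

    def __init__(self, width=8):
        self.width = width      # number of bits available (assumed fixed here)
        self.lock = 0           # lock bit vector: 1 => occupied, 0 => free
        self.queue = deque()    # (node, bit) pairs; the leftmost entry leaves first
        self.infvec = {}        # node -> influence bit vector

    def _issue_lock(self, bit=None):
        """Reserve a bit; by default the lowest free one (an assumption)."""
        if bit is None:
            free = [b for b in range(self.width) if not (self.lock >> b) & 1]
            if not free:
                raise RuntimeError("no free bit in the lock vector")
            bit = free[0]
        elif (self.lock >> bit) & 1:
            raise ValueError(f"bit {bit} is already occupied")
        self.lock |= 1 << bit
        return bit

    def add(self, node, followers=(), bit=None):
        """Add a node, propagate its followers' vectors, return its follower count."""
        own_bit = self._issue_lock(bit)
        vector = 1 << own_bit
        for follower in followers:
            # Followers that already left the window contribute nothing.
            vector |= self.infvec.get(follower, 0)
        self.infvec[node] = vector
        self.queue.append((node, own_bit))
        return bin(vector).count("1") - 1   # do not count the node's own bit

    def drop_oldest(self):
        """Remove the node leaving the window and clear its bit everywhere."""
        node, bit = self.queue.popleft()
        keep = ~(1 << bit)
        self.lock &= keep
        del self.infvec[node]
        for other in self.infvec:
            self.infvec[other] &= keep
        return node


# Rebuild the state of slide (1); the bit positions are taken from the slide.
window = InfluenceWindow()
window.add("V", bit=2)
window.add("W", ["V"], bit=1)
window.add("T", ["W"], bit=4)
window.add("S", ["W"], bit=6)
window.add("R", ["S", "T"], bit=0)
vector_r_before = window.infvec["R"]

dropped = window.drop_oldest()          # slide (2): V leaves the window
followers_of_p = window.add("P", ["R"])  # slide (3): P enters and reuses bit 2
print(format(window.infvec["P"], "08b"), followers_of_p)
```

Reusing freed bits keeps the vectors as short as the largest window rather than as long as the whole user set, which is where the n << N argument in the final comments comes from.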

Mining Tribe Leaders. The Influence Matrix is not enough. We use an influence cube: Users x Actions x Users. ICπ(u,a,v) = 1 when user v is influenced by user u for action a within time π. We do not explicitly compute the whole cube due to sparsity. The problem is the same as discovering the existence of frequent itemsets of size larger than a given threshold.
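To make the frequent-itemset connection concrete, each action of a candidate leader can be seen as a transaction whose items are the users influenced for that action. A tribe then exists if some set of at least ψ users appears in at least σ transactions. The brute-force sketch below checks this by intersecting groups of σ transactions. It is not the optimized method referenced in the talk, and the function name, the inclusive thresholds and the sample data are assumptions.

```python
from itertools import combinations


def largest_tribe(follower_sets, psi, sigma):
    """Largest set of >= psi users followed in >= sigma actions, or None.

    follower_sets: one set of influenced users per action of a single leader
    (the non-zero entries of one slice of the influence cube).
    """
    if sigma < 1:
        raise ValueError("sigma must be at least 1")
    best = None
    # A user set is contained in sigma transactions exactly when it is
    # contained in their intersection, so checking intersections suffices.
    for group in combinations(follower_sets, sigma):
        common = set.intersection(*group)
        if len(common) >= psi and (best is None or len(common) > len(best)):
            best = common
    return best


# Hypothetical followers of one leader for three actions A1, A2 and A3.
followers = [
    {"Jill", "Joey", "Ben"},   # A1
    {"Jill", "Ben", "Tom"},    # A2
    {"Jill", "Ben"},           # A3
]
print(largest_tribe(followers, psi=2, sigma=3))
```

The number of groups grows combinatorially with the number of actions, so this only illustrates the definition and would not scale to a real actions log.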

Algorithms - Final Comments. The only truly mandatory threshold is π (time threshold). Influence Matrix: O(TAn^2) in bit-level operations, where T = total number of tuples in the action log, A = total number of distinct actions, and n = maximum number of nodes visible in any position of the time window; n << N, where N is the total number of users. Tribe Leaders: Influence Cube: O(TAn^2). Finding existence of frequent itemsets: exponential in the number of followers, but very fast due to optimizations (Bonchi 2003).

Experiments enough talking, show me the results dude!!

Data Preparation. Social graph: Yahoo! Instant Messenger. Actions log: Yahoo! Movies. Action = user u rated movie m at time t. The two datasets were joined through common user identifiers. Started from the Yahoo! Instant Messenger subgraph of “most active” users (110M nodes) and 21M ratings from Yahoo! Movies. Ended with 217.5K nodes, 221.4K edges and 1.8M ratings.

Data characteristics: connected components. Total of 46,650 connected components. Giant component: 94K users (43.2% of connected users).

Leaders vs. Tribe leaders. [Figure legend: π = threshold on time; σ = threshold on number of actions; ψ = threshold on number of influenced users]

Number of leaders found. [Figure legend: π = threshold on time; σ = threshold on number of actions; ψ = threshold on number of influenced users]

Run-time. [Figure legend: π = threshold on time; σ = threshold on number of actions; ψ = threshold on number of influenced users]

Genuineness: an almost binary concept!

Top-10 tribe leaders w.r.t. tribe size. Tribe leaders exhibit high confidence. Tribe leaders with low genuineness were found to be dominated by other tribe leaders present in the tables. We found many users acting as leaders in many actions without being tribe leaders.

Related Work (1). Identifying influential users: Domingos et al 2001, Richardson et al 2002, Kempe et al 2005. Identifying influential bloggers: Agarwal et al 2008. Identifying communities in social networks: Hopcroft et al 2003, Kumar et al 2006, Backstrom et al 2006, Tantipathananandh et al 2007, Huang et al 2008, Friedland et al 2007.

Related Work (2). Influence and correlation in social networks: Aris Anagnostopoulos et al 2008. Revenue maximization: Hartline et al 2008. Near-optimal sensor placement for outbreak detection: Leskovec et al 2007.

Conclusions. Proposed a framework based on frequent pattern mining for discovering leaders in social networks. Formally defined the problem of extracting leaders from a social graph and an actions log. Various notions of leader and tribe leader, with their confidence and genuineness variants. Efficient algorithms for extracting leaders of various flavors, with just one pass over the actions log table. Demonstrated the utility and scalability of our algorithms via an extensive set of experiments on a real-world dataset: Yahoo! Messenger (social graph) and Yahoo! Movies ratings (actions log).

Ongoing/Future Work. Gurumine: Pattern Mining System for Discovering Leaders and Tribes (demo paper to appear in ICDE 2009). Leadership Cube: what kind of leaders attract what kind of followers for what kind of actions? Viral Marketing. Stronger notions of influence?

Thanks!

Backup

Number of leaders found. [Figure legend: π = threshold on time; σ = threshold on number of actions; ψ = threshold on number of influenced users]

Additional constraint: confidence. Similarly to association rules, we can have a confidence measure for leaders. Leadership confidence = (# actions in which the user is a leader) / (# actions performed). Example: let's say Jack performed 10 actions, and in 7 of them he acted as a leader (i.e. more than ψ users followed in a short time); then conf(Jack) = 7/10.
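The confidence measure is a plain ratio, so a short sketch is enough to reproduce the example. The function name is hypothetical, and using an exact fraction rather than a float is just a convenience for comparing against 7/10.

```python
from fractions import Fraction


def leadership_confidence(actions_as_leader, actions_performed):
    """Fraction of a user's actions in which the user acted as a leader."""
    if actions_performed <= 0:
        raise ValueError("the user must have performed at least one action")
    if not 0 <= actions_as_leader <= actions_performed:
        raise ValueError("leader actions must be between 0 and actions performed")
    return Fraction(actions_as_leader, actions_performed)


# Jack performed 10 actions and acted as a leader in 7 of them.
conf_jack = leadership_confidence(7, 10)
print(conf_jack)
```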