Machine learning for Dynamic Social Network Analysis


Machine learning for Dynamic Social Network Analysis. Applications: Control. Manuel Gomez Rodriguez, Max Planck Institute for Software Systems. University of Sydney, January 2017.

Outline of the Seminar
Representation: Temporal Point Processes - 1. Intensity function, 2. Basic building blocks, 3. Superposition, 4. Marks and SDEs with jumps
Applications: Models - 1. Information propagation, 2. Opinion dynamics, 3. Information reliability, 4. Knowledge acquisition
Applications: Control - 1. Influence maximization, 2. Activity shaping, 3. When-to-post
Slides/references: learning.mpi-sws.org/sydney-seminar

Outline of tutorial. Applications: Control - 1. Influence maximization, 2. Activity shaping, 3. When-to-post

Influence maximization Can we find the most influential group of users? Why this goal?

Influence of a set of sources. The influence of a set of source nodes A (active at tA = 0) is the average number of nodes that follow up by time T, i.e., the sum over nodes n of the probability that node n follows up at some time tn ≤ T. This follow-up probability depends on the node's intensity (e.g., a terminating intensity, as in idea adoption).
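
To make the definition concrete, here is a minimal Monte Carlo sketch of influence estimation in a continuous-time diffusion model. It assumes exponential pairwise transmission likelihoods with rates alpha[i, j] (a hypothetical parameterization; other transmission likelihoods are possible) and uses the fact that, under such models, a node's follow-up time equals the shortest-path length from the sources under sampled edge delays.

```python
import numpy as np

def simulate_cascade(alpha, sources, T, rng):
    """Sample one cascade: alpha[i, j] > 0 is the (assumed) exponential
    transmission rate from node i to node j; sources become active at t = 0.
    Returns each node's follow-up time (np.inf if it never follows up)."""
    n = alpha.shape[0]
    t = np.full(n, np.inf)
    t[list(sources)] = 0.0
    settled = np.zeros(n, dtype=bool)
    while True:
        # Dijkstra-style: settle the earliest unsettled node
        u = int(np.argmin(np.where(settled, np.inf, t)))
        if settled[u] or not np.isfinite(t[u]) or t[u] > T:
            break
        settled[u] = True
        for v in np.nonzero(alpha[u] > 0)[0]:
            if not settled[v]:
                t[v] = min(t[v], t[u] + rng.exponential(1.0 / alpha[u, v]))
    return t

def estimate_influence(alpha, sources, T, n_samples=1000, seed=0):
    """Monte Carlo estimate of the influence of `sources`:
    the average number of nodes that follow up by time T."""
    rng = np.random.default_rng(seed)
    return float(np.mean([np.sum(simulate_cascade(alpha, sources, T, rng) <= T)
                          for _ in range(n_samples)]))
```

Exact computation would replace this Monte Carlo average with the exact follow-up probabilities, which is what the next slide argues is not scalable.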

Influence estimation: exact vs. approximate. Computing the exact follow-up probability of a node n can be exponential in the network size, so it is not scalable! Key idea of approximate influence estimation: neighborhood estimation, i.e., estimating the size of the set of nodes reachable from the sources (tA = 0) within time T.

How good are approximate methods? They not only come with theoretical guarantees, but also work well in practice. [Du et al., NIPS 2013]

Maximizing the influence Once we have defined influence, what about finding the set of source nodes that maximizes influence? Theorem. For a wide variety of influence models, the influence maximization problem is NP-hard.

NP-hardness of influence maximization. Influence maximization can be shown NP-hard by a reduction from Set Cover, a well-known NP-hard problem.

Submodularity of influence maximization. The influence function F satisfies a natural diminishing-returns property: submodularity! Adding a source s to a smaller set A yields at least as much marginal gain as adding it to a larger set B containing A: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). Consequence: the greedy algorithm comes with a (1 − 1/e) ≈ 63% provable guarantee. In practice, the greedy solution is often (very close to) the optimal solution.
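
A greedy selection sketch matching the guarantee mentioned above; `influence` can be any estimator of the influence function (for instance, a Monte Carlo estimate like the one sketched earlier), and the names are illustrative rather than the seminar's actual implementation.

```python
def greedy_influence_maximization(candidates, influence, k):
    """Greedy maximization of a monotone submodular set function: repeatedly
    add the node with the largest marginal gain. Enjoys the (1 - 1/e) ~ 63%
    approximation guarantee mentioned on the slide."""
    selected = set()
    for _ in range(k):
        best_node, best_gain = None, 0.0
        base = influence(selected)
        for v in candidates:
            if v in selected:
                continue
            gain = influence(selected | {v}) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break
        selected.add(best_node)
    return selected
```

In practice, lazy evaluation (CELF-style) is typically used to avoid recomputing marginal gains for every candidate at every step.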

Outline of tutorial. Applications: Control - 1. Influence maximization, 2. Activity shaping, 3. When-to-post

Activity shaping. Can we steer users' activity in a social network in general? Why this goal?

Activity shaping vs. influence maximization. Activity shaping is a generalization of the influence maximization problem. Influence maximization: the same piece of information is propagated one time, the incentive is fixed, and the goal is only about maximizing adoption. Activity shaping: multiple pieces of information are propagated multiple times (recurrent!), the incentive is variable, and there are many different activity shaping tasks.

Event representation (Lecture 2). We represent messages using nonterminating temporal point processes: one counting process Ni(t) per user, recording her recurrent events over time. [Farajtabar et al., NIPS 2014]

Events intensity (Lecture 2). Each user's intensity is modeled as a Hawkes process: λu(t) = μu(t) + Σ over past events (ti, ui) of a(u, ui) · g(t − ti), where μu(t) is the exogenous activity (messages on her own initiative), a(u, ui) is the influence from user ui on user u, and g(·) is the memory kernel. [Farajtabar et al., NIPS 2014]
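
For concreteness, a sketch of this intensity under an exponential memory kernel g(t) = exp(-ω t), an assumption consistent with the closed-form expressions used later; variable names are illustrative.

```python
import numpy as np

def hawkes_intensity(t, u, events, mu, A, omega):
    """Multivariate Hawkes intensity of user u at time t:
    lambda_u(t) = mu_u + sum over past events (t_i, u_i) of
                  A[u, u_i] * exp(-omega * (t - t_i)),
    where mu_u is the exogenous rate (messages on her own initiative)
    and A[u, u_i] is the influence of user u_i on user u."""
    lam = mu[u]
    for t_i, u_i in events:
        if t_i < t:
            lam += A[u, u_i] * np.exp(-omega * (t - t_i))
    return lam
```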

Activity shaping… how? Incentivize a few users (additional exogenous activity) to produce a given level of overall users' activity (exogenous plus endogenous activity).

Activity shaping… what is it? Activity shaping: find the exogenous activity that results in a desired average overall activity at a given time, where the average is taken with respect to the history of events up to t!

Exogenous intensity and average overall intensity: how do they relate? Surprisingly… linearly, through a convolution: the average overall intensity equals Ψ(t) ⋆ μ(t), where μ(t) is the exogenous intensity and Ψ(t) is a matrix that depends on the influence matrix and the nonnegative memory kernel.

Exact relation: if the memory kernel g(t) is exponential, the relation can be written in closed form using matrix exponentials. Corollary: when the exogenous intensity is constant, the average overall intensity at time t is a linear map of that constant exogenous intensity.

Does it really work in practice?

Activity shaping optimization framework. Once we know how the average overall intensity depends linearly on the exogenous intensity, we can find the exogenous intensity that satisfies many different goals. Activity shaping problem: maximize a utility (the goal) of the average overall intensity, subject to a budget constraint on the cost of incentivizing users. We can solve this problem efficiently for a large family of utilities!

Capped activity maximization (CAM): if our goal is maximizing the overall number of events across a social network, subject to a maximum feasible activity per user.

Minimax activity shaping (MMASH): if our goal is to make the user with the minimum activity as active as possible.

Least-squares activity shaping (LSASH) If our goal is to achieve a pre-specified level of activity for each user or group of users:

Solving the activity shaping problem. For any activity shaping problem, we need to: 1. Compute the average overall intensity, which naively can be cubic in the network size (a large matrix exponential and the inverse of a large matrix). 2. Solve the convex problem; the standard approach is projected gradient descent.
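
Since the slide only names the method, here is a hedged sketch of a projected-gradient step for a budget-constrained, nonnegative exogenous intensity; `grad_utility` is a hypothetical callable returning the gradient of the chosen utility, and the projection shown is a simple clip-and-rescale stand-in for an exact projection onto the feasible set.

```python
import numpy as np

def projected_gradient_ascent(grad_utility, cost, budget, n_users,
                              step=0.1, n_iters=200):
    """Maximize a concave utility of the average overall intensity over
    exogenous rates mu >= 0 subject to cost @ mu <= budget (sketch).
    grad_utility(mu) is assumed to return dU/dmu; it only ever needs
    products of Psi(T) with a vector, never Psi(T) itself."""
    mu = np.zeros(n_users)
    for _ in range(n_iters):
        mu = mu + step * grad_utility(mu)
        mu = np.maximum(mu, 0.0)          # nonnegativity constraint
        c = float(cost @ mu)
        if c > budget:                    # crude projection onto the budget
            mu *= budget / c
    return mu
```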

Computing the average overall intensity. The explicit computation of Ψ(t) quickly becomes intractable for large networks (large sparse influence matrix A). Key property: we do not need Ψ(t) itself, only products of Ψ(t) with a vector, which can be obtained via 1. matrix-exponential–vector products [Al-Mohy et al., 2011] and 2. sparse linear systems of equations [GMRES method].
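
A sketch of how those two ingredients combine under the exponential-kernel assumption g(t) = exp(-ω t) and a constant exogenous rate μ, for which the convolution relation above gives the closed form μ + A(A − ωI)^(-1)(e^((A − ωI)t) − I)μ; the exact closed form is my derivation under those assumptions, while the SciPy calls are the standard matrix-exponential–vector product and GMRES routines.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply, gmres

def average_overall_intensity(A, mu, omega, t):
    """eta(t) = Psi(t) mu without ever forming Psi(t): one matrix-exponential-
    vector product [Al-Mohy et al., 2011] plus one sparse linear solve (GMRES).
    A is the (scipy.sparse) influence matrix, mu the constant exogenous rates."""
    n = A.shape[0]
    M = (A - omega * sp.identity(n)).tocsc()
    v = expm_multiply(M * t, mu)          # e^{(A - omega I) t} mu
    x, info = gmres(M, v - mu)            # solve (A - omega I) x = (e^{Mt} - I) mu
    if info != 0:
        raise RuntimeError("GMRES did not converge")
    return mu + A @ x
```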

Capped activity maximization: results. For 2,000 Twitter users, +34,000 more events per month than the best heuristic; +10% more events than the best heuristic.

Outline of tutorial. Applications: Control - 1. Influence maximization, 2. Activity shaping, 3. When-to-post

Social media as a broadcasting platform. Everybody can build, reach, and broadcast information to their own audience: broadcasted content on one side, audience reaction on the other.

Attention is scarce. Social media users follow many broadcasters, so new posts quickly push older posts down their Twitter and Instagram feeds.

What are the best times to post? Can we design an algorithm that tells us when to post to achieve high visibility?

Representation of broadcasters and feeds. Each broadcaster's posts are modeled as a counting process Ni(t); each user's feed is a sum of counting processes, M(t) = Aᵀ N(t).
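
A toy illustration of the relation M(t) = Aᵀ N(t); the indexing convention for A (rows index broadcasters, columns index the users who follow them) is my assumption for the example.

```python
import numpy as np

# Two broadcasters, three users; A[i, j] = 1 if user j follows broadcaster i.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
N_t = np.array([4, 7])      # posts by each broadcaster up to time t
M_t = A.T @ N_t             # posts accumulated in each user's feed: [4, 11, 7]
```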

Broadcasting and feed intensities. Given a broadcaster i and her followers, the broadcaster has an intensity function (tweets/hour) and each follower's feed has an intensity function (tweets/hour), part of which is the feed due to other broadcasters.

Definition of visibility function. The visibility of broadcaster i at follower j, rij(t), is the position of the highest ranked tweet by broadcaster i in follower j's feed (e.g., rij(t) = 0 at the top, rij(t') = 4 after other broadcasters' posts, rij(t'') = 0 again after a new post). In general, the visibility depends on the feed ranking mechanism!

Optimal control of temporal point processes. We formulate the when-to-post problem as a novel stochastic optimal control problem (of independent interest): visibility and feed dynamics are modeled as a system of stochastic equations with jumps, optimizing visibility becomes optimal control of jumps, and experiments are carried out on Twitter.

Visibility dynamics in a FIFO feed (I). In a feed in reverse chronological order, the rank at t + dt follows three cases: if other broadcasters post a story and broadcaster i does not, the rank goes down one position (e.g., rij(t) = 2 → rij(t+dt) = 3); if broadcaster i posts a story and the others do not, the rank resets to the top (rij(t) = 2 → rij(t+dt) = 0); if nobody posts a story, the rank stays the same (rij(t) = 2 → rij(t+dt) = 2).

Visibility dynamics in a FIFO feed (II). By a zero-one law (at most one event per infinitesimal interval), the rank satisfies a stochastic differential equation (SDE) with jumps: drij(t) = −rij(t) dNi(t) + dMj\i(t), where dNi(t) jumps when broadcaster i posts a story and dMj\i(t) jumps when other broadcasters post a story. Our goal: optimize rij(t) over time, so that it is small, by controlling dNi(t) through the intensity μi(t).
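
The jump dynamics above are simple to simulate; this sketch replays the FIFO rule on given event times (sorted-input assumption, illustrative names).

```python
def simulate_rank(own_post_times, other_post_times):
    """Replay drij(t) = -rij(t) dNi(t) + dMj\\i(t): posts by other
    broadcasters push the rank down by one, own posts reset it to zero.
    Returns the piecewise-constant rank trajectory as (time, rank) pairs."""
    events = sorted([(t, "own") for t in own_post_times] +
                    [(t, "other") for t in other_post_times])
    rank, trajectory = 0, []
    for t, kind in events:
        rank = 0 if kind == "own" else rank + 1
        trajectory.append((t, rank))
    return trajectory
```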

Feed dynamics. We consider a general feed intensity, combining a deterministic arbitrary component with stochastic self-excitation (covering, e.g., Hawkes and inhomogeneous Poisson processes), described by a jump stochastic differential equation (SDE).

The when-to-post problem. Optimization problem: minimize the expected sum of a terminal penalty on the final visibility and a nondecreasing loss on visibility and posting intensity accumulated over time, subject to dynamics defined by jump SDEs.

Bellman's Principle of Optimality. Lemma: the optimal cost-to-go J satisfies Bellman's Principle of Optimality, which yields the Hamilton-Jacobi-Bellman (HJB) equation, a partial differential equation in J (with respect to r, λ and t).

Solving the HJB equation. Consider a quadratic loss in which a time-varying weight s(t) favors some periods of time (e.g., times in which the follower is online) and a parameter q trades off visibility against the number of broadcasted posts. We propose a parametric form for the cost-to-go and then show that the optimal intensity is u*(t) = (s(t)/q)^(1/2) r(t): it only depends on the current visibility!

The RedQueen algorithm. Consider s(t) = s, so that u*(t) = (s/q)^(1/2) r(t). How do we sample the next posting time? By the superposition principle: for each event ti in the follower's feed, sample Δi ~ Exp((s/q)^(1/2)) and post at mini (ti + Δi). It only requires sampling M(tf) times!
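
A sketch of the sampling step described above (constant significance s, trade-off q; names are illustrative). In an online implementation one would keep a running minimum as feed events arrive, which is what keeps the total number of samples at M(tf).

```python
import numpy as np

def redqueen_next_post_time(feed_event_times, s, q, rng=None):
    """Next posting time under u*(t) = sqrt(s / q) * r(t): by superposition,
    each feed event t_i that raised the current rank contributes an
    independent candidate t_i + Delta_i with Delta_i ~ Exp(sqrt(s / q));
    post at the earliest candidate."""
    rng = rng or np.random.default_rng()
    rate = np.sqrt(s / q)
    return min((t_i + rng.exponential(1.0 / rate) for t_i in feed_event_times),
               default=np.inf)
```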

When-to-post for multiple followers. Consider n followers and a quadratic loss, again with s(t) favoring some periods of time (e.g., times in which the followers are online) and q trading off visibility against the number of broadcasted posts. Then, we can show that the optimal intensity only depends on the current visibilities, and we can easily adapt the efficient sampling algorithm to multiple followers!

Case study: one broadcaster. [Figure: significance, measured as followers' retweets per weekday; average position over time for the broadcaster's true posts vs. RedQueen.] RedQueen achieves a 40% lower average position than the broadcaster's true posts.

Evaluation metrics. We measure position over time and time at the top on the follower's wall. Example: with r(t1) = 0, r(t2) = 1, r(t3) = 0, r(t4) = 1, r(t5) = 2, r(t6) = 0, position over time = 0·(t2 − t1) + 1·(t3 − t2) + 0·(t4 − t3) + 1·(t5 − t4) + 2·(t6 − t5), and time at the top = (t2 − t1) + 0 + (t4 − t3) + 0 + 0.
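
The two metrics are easy to compute from a piecewise-constant rank trajectory (for example, the output of the FIFO simulation sketched earlier); a minimal helper:

```python
def evaluation_metrics(times, ranks):
    """Position over time = integral of r(t); time at the top = total time
    with r(t) = 0. ranks[k] is the rank on the interval [times[k], times[k+1])."""
    position_over_time = sum(r * (t_next - t)
                             for r, t, t_next in zip(ranks, times, times[1:]))
    time_at_top = sum(t_next - t
                      for r, t, t_next in zip(ranks, times, times[1:]) if r == 0)
    return position_over_time, time_at_top
```

For the slide's example, times = [t1, ..., t6] and ranks = [0, 1, 0, 1, 2] reproduce the two sums shown above.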

Position over time (average across users, lower is better): RedQueen achieves (i) 0.28x lower average position, on average, than the broadcasters' true posts and (ii) a lower average position for 100% of the users.

Time at the top (average across users, higher is better): RedQueen achieves (i) 3.5x higher time at the top, on average, than the broadcasters' true posts and (ii) a higher time at the top for 99.1% of the users.

Representation: Temporal Point Processes - 1. Intensity function, 2. Basic building blocks, 3. Superposition, 4. Marks and SDEs with jumps
Applications: Models - 1. Information propagation, 2. Opinion dynamics, 3. Information reliability, 4. Knowledge acquisition
Applications: Control - 1. Influence maximization, 2. Activity shaping, 3. When-to-post
Slides/references: learning.mpi-sws.org/sydney-seminar

Interested? Internships/PhDs/postdoc positions at learning.mpi-sws.org