A History Sensitive Cascade Model in Diffusion Networks
Stephen Foster¹, Walt Potter¹, Jiang Wu², Bin Hu², Yu Zhang³
¹ Southwestern University, ² Huazhong University of Science and Technology, ³ Trinity University
ADS'09, San Diego, CA, March 23, 2009
Trinity University | Laboratory for Distributed Intelligent Agent Systems
Outline
Introduction
History Sensitive Cascade Model (HSCM)
A Polynomial Solution of HSCM in Tree Graphs
A Markov Solution of HSCM in General Graphs
Experiment
Conclusion
Diffusion
Diffusion is a process by which information, viruses, ideas, or new behaviors spread over social networks.
Two Basic Diffusion Models
Linear Threshold Model: a node becomes active if a predetermined fraction of the node's neighbors are active.
Independent Cascade Model: an active node gets a one-time chance to activate each of its neighboring nodes with some probability.
(Figure annotations: Threshold = 60%; P = 50%; Progressive; History-Insensitive.)
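To make the contrast between the two models concrete, here is a minimal Python sketch of one synchronous step of each. The data layout (dicts of neighbor lists), the function names, and the use of a seeded `random.Random` are our own assumptions, not part of the slides; the threshold is a fraction such as 0.6 and p a probability such as 0.5, matching the figure annotations.

```python
import random

def lt_step(in_neighbors, active, threshold):
    """One synchronous Linear Threshold step.

    in_neighbors: dict mapping each node to the list of nodes that influence it.
    An inactive node becomes active once the fraction of its active
    in-neighbors reaches `threshold` (e.g. 0.6 for a 60% threshold).
    """
    new_active = set(active)
    for node, nbrs in in_neighbors.items():
        if node in active or not nbrs:
            continue
        fraction = sum(1 for n in nbrs if n in active) / len(nbrs)
        if fraction >= threshold:
            new_active.add(node)
    return new_active


def ic_step(out_neighbors, active, newly_active, p, rng):
    """One Independent Cascade step.

    Only nodes activated in the previous step (`newly_active`) try to
    activate their inactive out-neighbors, each with probability p; they
    never try again afterwards (the one-time chance).
    """
    next_new = set()
    for v in newly_active:
        for u in out_neighbors.get(v, ()):
            if u not in active and u not in next_new and rng.random() < p:
                next_new.add(u)
    return active | next_new, next_new
```

Both functions only ever add nodes, so both models are progressive; in `ic_step` a node's chance is tied to the single step right after its own activation, which is what makes the model history-insensitive.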
Influence Maximization Problem
Given some value k and a diffusion network with a set of nodes N, the goal is to select an initially active k-node subset of N such that the expected number of nodes in N that eventually become active is maximized.
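Written as an optimization problem, with σ(S) (our notation) denoting the expected number of nodes that are eventually active when exactly the nodes in S start active:

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq N,\; |S| = k} \sigma(S)
```

Under a stochastic model such as the Independent Cascade model, σ(S) is an expectation over the random activation attempts, so it generally has to be computed or estimated rather than read off a single run.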
Existing Results
The influence maximization problem is NP-hard; existing approaches are general heuristics:
Greedy [Kempe, Kleinberg and Tardos 2003]: the influence function is submodular, so a greedy strategy obtains a solution that is provably within a factor of 1 − 1/e ≈ 63% of the optimal solution.
Hill Climbing [Rolfe 2004]
Simulated Annealing [Jackson Mo and Yariv 2005]
Cost-Effective Heuristic [Leskovec, Krause and Guestrin 2007]. The bound is ≥ 63%.
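A minimal sketch of the greedy strategy, assuming the Independent Cascade model with one uniform edge probability p and Monte Carlo estimation of the expected spread. The helper names and the `runs` parameter are hypothetical; because spread is estimated rather than computed exactly, the 1 − 1/e guarantee holds only approximately for this sketch.

```python
import random

def simulate_ic(out_neighbors, seeds, p, rng):
    """One Independent Cascade run; returns the number of finally active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in out_neighbors.get(v, ()):
                if u not in active and rng.random() < p:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return len(active)


def expected_spread(out_neighbors, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of eventually active nodes."""
    return sum(simulate_ic(out_neighbors, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(out_neighbors, nodes, k, p, runs=200, seed=0):
    """Greedy selection: repeatedly add the node whose inclusion gives the
    largest estimated expected spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            val = expected_spread(out_neighbors, chosen + [v], p, runs, rng)
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen
```

Each greedy round costs one spread estimate per candidate, which is why cheaper variants such as the cost-effective heuristic matter on large networks.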
Our Diffusion Model: History Sensitive Cascade Model
(Figure: an example network of six people, Alice, Bobby, Cathy, Donald, Ethan and Francine, whose edges carry spreading probabilities of either 0.99 or 0.001.)
The HSCM Algorithm
function HSCM(G = (V, E), W, k)
Inputs: G = (V, E), where V is a set of vertices and E is a set of edges, with some initially active vertices; W(e_{v,u}), the spreading probability that, if v is active at time step t, u will be active at time step t+1; k, the number of time steps.
    for time step t = 1 to k
        A' = A    (vertices activated during step t only act from step t+1)
        for each vertex v in V
            if A(v) = true
                for each vertex u in targets(v)
                    random = a random number between 0 and 1
                    if random < W(e_{v,u})
                        set A'(u) = true
        A = A'
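A minimal runnable Python version of the procedure, under our own assumptions: `targets` is a dict of lists, `W` is a dict keyed by `(v, u)` pairs, and randomness comes from a seeded `random.Random`. The function name `hscm` is hypothetical. Draws are skipped for targets that are already active, which does not change the outcome.

```python
import random

def hscm(targets, W, initially_active, k, rng):
    """Run the History Sensitive Cascade Model for k time steps.

    targets: dict mapping vertex v to the vertices u with an edge e_{v,u}.
    W: dict mapping (v, u) to the spreading probability W(e_{v,u}).
    Unlike the Independent Cascade model, every active vertex tries again
    at every step, so a vertex's chance of activation grows with how long
    its neighbors have been active.
    """
    active = set(initially_active)
    for _ in range(k):
        newly = set()
        for v in active:
            for u in targets.get(v, ()):
                if u not in active and rng.random() < W[(v, u)]:
                    newly.add(u)
        active |= newly  # vertices activated at step t fire from step t+1
    return active
```

Collecting new activations in a separate set mirrors the A' copy in the pseudocode: without it, a vertex activated early in the loop could spread within the same step.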
Activation Probability Problem
Given some time step k and some vertex v, what is the probability that v will be active at t = k?
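A simple baseline for this question (our addition, not a method from the slides) is Monte Carlo estimation: repeat the cascade many times and count how often v is active at t = k. The function name and the `runs`/`seed` parameters are hypothetical. With a single edge of probability 0.5 from an initially active vertex, the exact answer at k = 4 is 1 − 0.5⁴ = 0.9375, which the estimate should approach.

```python
import random

def estimate_activation_probability(targets, W, initially_active, v, k,
                                    runs=10000, seed=0):
    """Monte Carlo estimate of Pr[v is active at t = k] under HSCM.

    targets: dict vertex -> list of target vertices; W: dict (v, u) -> W(e_{v,u}).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        active = set(initially_active)
        for _ in range(k):
            # every active vertex retries each inactive target at every step
            newly = {u for x in active for u in targets.get(x, ())
                     if u not in active and rng.random() < W[(x, u)]}
            active |= newly
        hits += v in active
    return hits / runs
```

The estimate's standard error shrinks only as 1/sqrt(runs), which motivates the exact methods on the following slides.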
A Polynomial Solution in Tree Graphs
Let G = (V, E) be a graph without cycles in which no two edges $e_{w,x}$ and $e_{y,z}$ in E satisfy x = z, i.e., every vertex has at most one incoming edge. We use the function inf(u) to denote the vertex v such that $u \in targets(v)$.
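The slide defines inf(u) but does not spell out the update rule. The following derivation is our reconstruction; it reproduces the values in the example that follows, assuming u is not initially active and a fresh random number is drawn at every step.

```latex
% Notation (ours): p = inf(u), w = W(e_{p,u}), P(x_t) = Pr[x is active at step t]
\begin{aligned}
P(u_{t+1}) &= P(u_t) + \Pr[\neg u_t \wedge p_t]\, w
  && \text{($u$ can only be activated through $p$)} \\
\Pr[\neg u_t \wedge p_t] &= P(p_t) - P(u_t)
  && \text{($u_t \Rightarrow p_t$, since active vertices stay active)} \\
\Rightarrow\quad P(u_{t+1}) &= P(u_t) + \bigl(P(p_t) - P(u_t)\bigr)\, w
\end{aligned}
```

Evaluating this for every vertex and every t = 0, …, k − 1 takes O(|V| k) time, which is polynomial.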
An Example
Initially only node 1 is active. (Figure: a tree on nodes 1, 2, 3, 4, 5.)
Activation probability P(v_t) of each node over the first 4 time steps:
V | t=0 | t=1 | t=2 | t=3 | t=4
1 | 1.0 | | | |
2 | 0.0 | 0.5 | 0.75 | 0.875 | 0.9375
3 | | | 0.25 | | 0.6875
4 | | | | |
5 | | | | |
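A Python sketch of the tree computation, using the update P(u_{t+1}) = P(u_t) + (P(inf(u)_t) − P(u_t)) · W(e_{inf(u),u}). The tree layout in the figure is not recoverable, so we assume a chain 1 → 2 → 3 with every W = 0.5 (a hypothetical choice); under that assumption it reproduces the printed values for node 2 and the 0.25 and 0.6875 printed for node 3, and fills in the cells the slide leaves blank for node 3. The function name is ours.

```python
def tree_activation_probabilities(inf, W, initially_active, vertices, k):
    """Exact P(v active at t) for t = 0..k when each vertex has at most one
    influencer inf[u]; W[(p, u)] is the spreading probability of edge e_{p,u}.

    All values at step t are computed before any at step t + 1, so the
    vertex order does not matter and the cost is O(|V| k).
    """
    P = {v: [1.0 if v in initially_active else 0.0] for v in vertices}
    for t in range(k):
        for u in vertices:
            p_t = P[u][t]
            parent = inf.get(u)
            if parent is None or u in initially_active:
                P[u].append(p_t)  # no influencer, or already active: unchanged
            else:
                P[u].append(p_t + (P[parent][t] - p_t) * W[(parent, u)])
    return P


# Assumed chain 1 -> 2 -> 3 with W = 0.5 on both edges.
P = tree_activation_probabilities({2: 1, 3: 2}, {(1, 2): 0.5, (2, 3): 0.5},
                                  {1}, [1, 2, 3], 4)
```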
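Problem with Loops
For a single edge from v to u (u inactive at t = k):
t = k: $P(v_k)$
t = k+1: $P(u_{k+1}) = P(v_k) \times W(e_{v,u}) < P(v_k)$
Adding the edge $e_{u,v}$ closes a loop, and applying the same update to v gives:
t = k+2: $P(v_{k+2}) = (P(u_{k+1}) - P(v_k)) \times W(e_{u,v}) + P(v_k)$
Because $P(u_{k+1}) < P(v_k)$, the first term is negative, so $P(v_{k+2}) < P(v_k)$. This is impossible, since active vertices stay active; the tree solution therefore does not extend to graphs with cycles.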
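Plugging illustrative numbers (our choice, not from the slides) into the loop formulas shows the contradiction concretely:

```latex
% Illustrative values: P(v_k) = 0.5, P(u_k) = 0, W(e_{v,u}) = W(e_{u,v}) = 0.5
\begin{aligned}
P(u_{k+1}) &= 0.5 \times 0.5 = 0.25 \\
P(v_{k+2}) &= (0.25 - 0.5) \times 0.5 + 0.5 = 0.375 < 0.5 = P(v_k)
\end{aligned}
```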
A Markov Solution in General Graphs
Consider the graph to be a finite state system, where a "state" is understood as some combination of activated vertices in V. Define the system by a state transition matrix $A_{N \times N}$, where $N = |PowerSet(V)| = 2^{|V|}$. Update A at each time step.
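A Python sketch of the Markov solution for small graphs, under our assumptions: vertices are numbered 0..n−1, `W` is a dict keyed by `(v, u)`, and the per-step update is multiplication of the state distribution by a fixed A. Within one step, the activations of different inactive vertices depend on disjoint sets of edges and are therefore independent. Function names are hypothetical; the cost grows as 4^|V|, so this only illustrates the idea.

```python
import itertools

def transition_matrix(n, W):
    """States are frozensets of active vertices; A[i][j] = Pr[state i -> j]."""
    states = [frozenset(c) for r in range(n + 1)
              for c in itertools.combinations(range(n), r)]
    index = {s: i for i, s in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        # q[u]: probability that inactive u is activated by at least one active v
        q = {}
        for u in range(n):
            if u not in s:
                miss = 1.0
                for v in s:
                    miss *= 1.0 - W.get((v, u), 0.0)
                q[u] = 1.0 - miss
        inactive = list(q)
        # enumerate which inactive vertices become active in this step
        for r in range(len(inactive) + 1):
            for newly in itertools.combinations(inactive, r):
                prob = 1.0
                for u in inactive:
                    prob *= q[u] if u in newly else 1.0 - q[u]
                A[i][index[s | frozenset(newly)]] += prob
    return states, A


def activation_probability(n, W, initially_active, v, k):
    """Pr[v is active at t = k], by propagating the state distribution k times."""
    states, A = transition_matrix(n, W)
    start = frozenset(initially_active)
    dist = [1.0 if s == start else 0.0 for s in states]
    for _ in range(k):
        dist = [sum(dist[i] * A[i][j] for i in range(len(states)))
                for j in range(len(states))]
    return sum(p for s, p in zip(states, dist) if v in s)
```

With vertex i of the example matrix renamed to i − 1, the edge probabilities implied by that matrix (rows {1} and {2} give W(e_{1,3}) = 0.2 and W(e_{2,3}) = 0.5; row {3} gives W(e_{3,1}) = 0.2 and W(e_{3,2}) = 0.5) reproduce its entries.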
An Example
A[i, j] is defined as the probability that the network will move from state i to state j.
i \ j | {} | {1} | {2} | {1,2} | {3} | {3,1} | {3,2} | {3,1,2}
{} | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
{1} | 0.0 | 0.8 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0
{2} | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0
{1,2} | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.6
{3} | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.1 | 0.4 | 0.1
{3,1} | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.5
{3,2} | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.8 | 0.2
{3,1,2} | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0
The principle of inclusion/exclusion gives the entry A[{1,2}, {3,1,2}]: 0.2 + 0.5 − 0.2 × 0.5 = 0.6.
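The 0.6 entry is the probability that vertex 3 is activated by at least one of two independent attempts (probabilities 0.2 and 0.5, read off rows {1} and {2} of the matrix). A small Python sketch (function names ours) computes this both by inclusion/exclusion and by the equivalent complement rule 1 − Π(1 − p), which needs linear rather than exponential work:

```python
import itertools

def prob_any_by_inclusion_exclusion(ps):
    """Pr[at least one of several independent events occurs]:
    sum over all non-empty subsets, with alternating signs."""
    total = 0.0
    for r in range(1, len(ps) + 1):
        for subset in itertools.combinations(ps, r):
            term = 1.0
            for p in subset:
                term *= p
            total += (-1) ** (r + 1) * term
    return total


def prob_any_by_complement(ps):
    """Same quantity as 1 - Pr[no event occurs]."""
    miss = 1.0
    for p in ps:
        miss *= 1.0 - p
    return 1.0 - miss
```

For [0.2, 0.5] both give 0.6 up to floating-point rounding.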
Experiment
Input | Value | Definition
Nodesize | 100 | The total number of nodes in the network
W(e_{v,u}) | 0.6, 0.58, 0.56, 0.54, …, 0.4 | The spreading probability; uniform distribution W ∼ U(0,1)
Selected-Number | 1, 2, 3, …, 9 | The initial number of selected nodes targeted in the cascade
Density | 0.02, 0.03, 0.04, 0.06 |
Each run lasts 100 time steps. All results are the average of 100 runs.
Scale-Free Network $S_{N,K}$
The degree distribution P(K) decays as a power law, $P(K) \sim K^{-\gamma}$. Large networks can self-organize into a scale-free state, independent of the agents.
N = 100, K = 6, γ = 2.5
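The slides do not say how the test networks were generated. One common way to obtain a network with P(K) ∼ K^(−γ) (not necessarily the authors' method) is to sample a power-law degree sequence and wire it with the configuration model. In this Python sketch, `k_min` and `k_max` are hypothetical knobs, and the average degree K = 6 from the slide is not enforced.

```python
import random

def powerlaw_degree_sequence(n, gamma, k_min, k_max, rng):
    """Sample n degrees with P(k) proportional to k**(-gamma), k_min <= k <= k_max."""
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:  # stubs are paired, so the total degree must be even
        i = degrees.index(max(degrees))
        degrees[i] -= 1
    return degrees


def configuration_model(degrees, rng):
    """Pair degree 'stubs' uniformly at random (configuration model).

    Self-loops and duplicate edges are dropped, so realized degrees can be
    slightly lower than the sampled ones.
    """
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges


rng = random.Random(42)
degrees = powerlaw_degree_sequence(100, 2.5, 2, 50, rng)  # N = 100, gamma = 2.5
edges = configuration_model(degrees, rng)
```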
Dynamics vs. Node-Size and Density
(Figure: Selected-Number = 9, Nodesize = 100, Threshold = 0.5, Timestep = 5.)
Cascade Time per Node Over Selected-Number and Threshold
Dynamics vs. Initial Selected Number
(Figure: Nodesize = 100, Threshold = 0.5, Density = 0.2.)
Dynamics vs. Diffusion Threshold
(Figure: Nodesize = 100, Density = 0.2.)
Conclusion
The History Sensitive Cascade Model (HSCM) allows activated nodes more than a one-time chance to activate their neighbors.
For HSCM, we provide a polynomial algorithm for calculating the probability that any given node is active at any given time step in tree graphs, and a Markov model for calculating this probability in general graphs.
Computing these probabilities under HSCM remains intractable for most general graphs, since the Markov model has $2^{|V|}$ states.
In the future, we will study the influence maximization problem under different time constraints for HSCM.
San Antonio: A Case Study
With one filing for every 143 households, San Antonio ranks 21st among the 25 U.S. cities with the highest foreclosure rates. In the first quarter of 2008 alone, 3,830 foreclosures were filed in the city. (Data provided by RealtyTrac, August 2008.)
Acknowledgements
NSF grants IIS 0755405 and CNS 0821585.
Collaborators: Dr. Christine Drennon, Sociology, Trinity; Dr. David Wong, Geography, George Mason; Dr. Roger McCain, Economics, Drexel.
Students: Lucy Elder, Stephen Foster, Jason Leezer, Patricia Perez, Will Thornton, Hudson Thrift, Aaron Welch