Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity University CS REU, 05.June.2009
Motivation Network Diffusion: the process by which some nodes in a network influence other, neighboring, nodes and change their state Applications Brand recognition Diffusion in other domains Infectious diseases Ideas New technologies
Modeling Network Diffusion Common Models Linear Threshold Model: node activates when a certain (weighted) fraction of its neighbors is active Independent Cascade Model: active node has a one-time chance of activating a neighbor and succeeds with certain probability
Modeling Network Diffusion New Model History Sensitive Cascade Model (HSCM) Main idea: Allows nodes to try to activate neighbors multiple times Benefit: More plausible as in reality people have multiple interactions with each other
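To make the "multiple attempts" idea concrete, here is a minimal simulation sketch. It assumes details the slides do not spell out: activations are permanent, time advances in synchronous steps, and on every step each active node independently tries each still-inactive out-neighbor with the edge's probability. The name simulate_hscm and the edge dictionary layout are hypothetical.

```python
import random

def simulate_hscm(edges, seeds, k, rng):
    """Run one random trajectory for k time steps and return the active set.

    edges: dict mapping (source, target) -> activation probability per time step
    seeds: iterable of initially active vertices
    rng:   a random.Random instance, passed in so runs are reproducible
    """
    active = set(seeds)
    for _ in range(k):
        newly_active = set()
        for (source, target), p in edges.items():
            # Unlike the Independent Cascade Model, the source retries on
            # every step for as long as the target is still inactive.
            if source in active and target not in active and rng.random() < p:
                newly_active.add(target)
        # Apply all activations at once so that a vertex activated in this
        # step only starts influencing its neighbors in the next step.
        active |= newly_active
    return active

# Hypothetical three-vertex example with deterministic edges.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 0.0}
print(sorted(simulate_hscm(edges, {"a"}, 2, random.Random(0))))  # ['a', 'b', 'c']
```

With probabilities strictly between 0 and 1, repeated calls give different trajectories; averaging over many of them estimates activation probabilities.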
History Sensitive Cascade Model Application: A company releases a new product -- what should the advertising target audience be? Consumers with the highest willingness to pay? More influential consumers? Model consumers as nodes that have both “intrinsic” value and “network” value.
History Sensitive Cascade Model Application: A company releases a new product -- what should the advertising target audience be? Consumer with low intrinsic value worth marketing to just because of her network value Marketing to a profitable consumer may be redundant if network effect already makes her likely to buy
History Sensitive Cascade Model Problems Given a node, what is the probability of this node becoming active by a given time? (Vertex Activation Problem) What is the best subset of nodes to activate initially so as to maximize the number of active nodes after a given amount of time for interaction? (Optimization Problem)
History Sensitive Cascade Model Problems Current algorithm implementing HSCM runs in exponential time We hope to invent an approximation algorithm running in polynomial time
3. Problem Definition The problems we are trying to solve
Outline Vertex Activation Problem – Approximating it Optimization Problems – Time Minimization – Activation Maximization – Approximating them
Vertex Activation Problem Given a directed, weighted graph G – Each edge weight is the probability of that edge’s source activating its target in one time step. – What is the probability that a certain vertex v is active by the kth time step?
Vertex Activation Approximation Problem Given a directed, weighted graph G, a vertex v, and a time step k Suppose we have a program P that takes (G,k,v) and returns the exact probability of v being active by the kth time step Create a program A such that – |P(G,k,v)-A(G,k,v)|≤ε – 0<ε<1 – The error bound ε is guaranteed to hold for all G,k,v.
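One candidate for the program A is plain Monte Carlo sampling. It is not the algorithm proposed here, and it gives a weaker guarantee than the one asked for: by Hoeffding's inequality, n ≥ ln(2/δ) / (2ε²) independent runs put the estimate within ε of the true probability with probability at least 1 - δ, rather than with certainty. The sketch below assumes an explicit set of initially active vertices, permanent activations, and synchronous steps; all names are hypothetical.

```python
import math
import random

def hoeffding_samples(eps, delta):
    """Number of runs so that |estimate - truth| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_activation(edges, seeds, v, k, eps, delta, rng):
    """Estimate the probability that v is active by time step k.

    edges: dict mapping (source, target) -> activation probability per time step
    seeds: iterable of initially active vertices (an assumption of this sketch)
    """
    runs = hoeffding_samples(eps, delta)
    hits = 0
    for _ in range(runs):
        active = set(seeds)
        for _step in range(k):
            # Every active source retries each inactive target on every step.
            new = {t for (s, t), p in edges.items()
                   if s in active and t not in active and rng.random() < p}
            active |= new
        if v in active:
            hits += 1
    return hits / runs

print(hoeffding_samples(0.1, 0.05))  # 185
```

The running time is polynomial in the graph size, k, 1/ε, and log(1/δ), which is what makes sampling attractive next to the exact exponential-time computation.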
Possible Problems With the Approximation We may not be able to create a polynomial time approximation algorithm for general graphs for any ε<1 because of the complexity of the HSCM model – We will explore this, and if we can’t do it, then we’ll do it for restricted classes of graphs. – A polynomial time solution for tree graphs was created during last year’s REU.
What we can do with a Vertex Activation Solver Use the concept of Θ-Certitude – We are Θ-certain that a particular vertex v is active by the kth time step if P(G,k,v)≥Θ Determine whether we are Θ-certain that a subset U of V is active by time step k – We simply check that P(G,k,u)≥Θ for all u in U. We use Θ-Certitude to define two optimization problems.
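A small worked example of the Θ-certitude check. It assumes the simplest possible graph: each vertex has a single in-neighbor that is active from the start and retries on every step, so its probability of being active by step k is 1 - (1 - p)^k. The function names and the vertex labels u1, u2 are hypothetical.

```python
def single_edge_activation(p, k):
    """P(target active by step k) when its only in-neighbor is active from step 0."""
    # Each of the k attempts fails independently with probability 1 - p.
    return 1.0 - (1.0 - p) ** k

def is_theta_certain(activation_probability, subset, theta):
    """True if every vertex in `subset` is active with probability at least theta."""
    return all(activation_probability[u] >= theta for u in subset)

# Two target vertices with incoming edge probabilities 0.5 and 0.9.
by_step_2 = {"u1": single_edge_activation(0.5, 2), "u2": single_edge_activation(0.9, 2)}
by_step_3 = {"u1": single_edge_activation(0.5, 3), "u2": single_edge_activation(0.9, 3)}

print(is_theta_certain(by_step_2, ["u1", "u2"], 0.8))  # False: u1 is only at 0.75
print(is_theta_certain(by_step_3, ["u1", "u2"], 0.8))  # True: u1 reaches 0.875
```

In this toy case k = 3 is the smallest time step at which the whole subset is 0.8-certain, which is the quantity the Time Minimization Problem asks to minimize.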
Time Minimization Problem Given G, and a number m<|V| – Which subset U of V, with |U|≤m, should be selected – So that k is minimized, where k is the first time step by which all v in V are activated with Θ-Certitude.
Activation Maximization Problem Given G, and m<|V| – Which subset U of V, with |U|≤m, should be selected such that at the kth time step – The size of the set of nodes activated with Θ-Certitude, |A_Θ|, is maximized. Both optimization problems are NP-complete, so in order to work with large data sets, we need to create approximations.
Approximating the Activation Maximization Problem Given G, and m<|V|, which subset U of V should be selected such that – At time step k, the set of vertices activated with Θ-certitude has size |A_Θ| ≥ ε|A_Θ^*| – 0<ε<1 – |A_Θ^*| denotes the size of the set of vertices activated with Θ-certitude if the optimal U is chosen.
4. Proposed Solution The strategies we expect to use to solve our problems
Solving the Vertex Activation Problem Building on the work of last year’s REU, we have created and implemented an algorithm Uses Markov chains to calculate the probability of a vertex being activated by the kth time step Involves multiplying by a state transition matrix; since there are 2^|V| states the graph can take, this matrix has 2^|V| × 2^|V| = 2^(2|V|) entries Each multiplication is polynomial in the size of the matrix, but that size forces the algorithm overall to run in time exponential in |V|.
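A minimal sketch of the Markov chain computation, not the REU implementation. It assumes details the slides do not state: vertices are numbered 0..n-1, a state is a bitmask of active vertices, activations are permanent, and within one time step every active vertex tries each inactive out-neighbor independently. The function names are hypothetical.

```python
def transition_matrix(n, edges):
    """Build the 2^n x 2^n matrix of one-step transition probabilities.

    edges: dict mapping (source, target) -> activation probability per time step
    """
    size = 1 << n
    matrix = [[0.0] * size for _ in range(size)]
    for s in range(size):
        # q[w]: probability that vertex w is hit by at least one active in-neighbor.
        q = []
        for w in range(n):
            all_fail = 1.0
            for (u, target), p in edges.items():
                if target == w and (s >> u) & 1:
                    all_fail *= 1.0 - p
            q.append(1.0 - all_fail)
        for t in range(size):
            if s & ~t:
                continue  # t drops a vertex that is active in s: impossible
            prob = 1.0
            for w in range(n):
                if not (s >> w) & 1:  # only inactive vertices can change
                    prob *= q[w] if (t >> w) & 1 else 1.0 - q[w]
            matrix[s][t] = prob
    return matrix

def activation_probability(n, edges, seeds, v, k):
    """Exact probability that vertex v is active by time step k."""
    matrix = transition_matrix(n, edges)
    size = 1 << n
    dist = [0.0] * size
    dist[sum(1 << u for u in set(seeds))] = 1.0
    for _ in range(k):
        # One step of the chain: row vector times transition matrix.
        dist = [sum(dist[s] * matrix[s][t] for s in range(size)) for t in range(size)]
    return sum(pr for state, pr in enumerate(dist) if (state >> v) & 1)

# Path 0 -> 1 -> 2 with probability 0.5 per edge, vertex 0 initially active.
path = {(0, 1): 0.5, (1, 2): 0.5}
print(activation_probability(3, path, {0}, 2, 3))  # 0.5
```

Each step costs about 4^n multiplications, so the sketch is only usable for very small graphs, which is the exponential blow-up described above.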
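A Graph and the State Transition Matrix [Figure: an example graph on vertices 0, 1, 2 and its 8 × 8 state transition matrix; rows and columns are both indexed by the set of active vertices, in the order [], [0], [1], [0, 1], [2], [0, 2], [1, 2], [0, 1, 2]]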
Empirical Evidence of Intractability
Wrapping Up the Vertex Activation Problem Provide a rigorous analysis of the space and time complexities Optimize the matrix calculation and matrix multiplication – It’s easy to determine that the graph cannot go from some states to others, and that it cannot leave some states at all. – Take advantage of the fact that the matrix is upper-triangular.
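One way the impossible transitions could be exploited, sketched under two assumptions that the slides only imply: states are encoded as bitmasks of active vertices, and an active vertex never deactivates. Under those assumptions a state can only move to a superset of itself, which is also why the matrix is upper-triangular in bitmask order. The helper names are hypothetical.

```python
def supersets(state, n):
    """Yield every n-bit state containing all active vertices of `state`, ascending."""
    limit = 1 << n
    t = state
    while t < limit:
        yield t
        # Adding 1 moves to the next integer; OR-ing with `state` restores
        # any of its bits that the carry cleared.
        t = (t + 1) | state

def count_reachable_entries(n):
    """Count (from, to) pairs that can be nonzero in the transition matrix."""
    return sum(1 for s in range(1 << n) for _ in supersets(s, n))

print(list(supersets(0b010, 3)))   # [2, 3, 6, 7]
print(count_reachable_entries(3))  # 27 of the 64 entries
```

Only 3^n of the 4^n entries can be nonzero, so building and multiplying over supersets alone saves work, although the cost stays exponential in n.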
Some (unexplored) ideas for approximating the Vertex Activation Problem Instead of using the Vertex Activation Problem in order to decide how good a set U is, heuristically determine a set of the most influential nodes in the graph – This might be done using standard graph search, path, or spanning tree algorithms. Simulate the History Sensitive Cascade Model, without paying too much attention to the cyclical nature of the graph Use Bayesian Networks to solve the Vertex Activation Problem, and determine whether they are easier to simulate.
Approximating the Optimization Problems The solutions we have in mind depend on us being able to determine how good some proposed solution U is (U is a subset of V). – Hopefully we will be able to do this with our approximation to the Vertex Activation Problem, otherwise we might use a heuristic as described before. Given this, we hope to explore several strategies for calculating U: – Algorithms that greedily add vertices to U – Hill-Climbing and Simulated-Annealing algorithms – A Genetic Algorithm
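The greedy strategy can be sketched independently of how a candidate set U is scored. Below, score is any function from a set of vertices to a number; in the proposal it would come from the Vertex Activation solver or its approximation, while the toy score used here (counting vertices covered by hypothetical reach sets) is only a stand-in. No approximation guarantee is claimed for this sketch.

```python
def greedy_seed_set(vertices, m, score):
    """Build U one vertex at a time, always adding the vertex that helps most.

    vertices: list of candidate vertices (list order breaks ties)
    m:        maximum size of U
    score:    callable taking a set of vertices and returning a number
    """
    chosen = set()
    for _ in range(m):
        candidates = [v for v in vertices if v not in chosen]
        if not candidates:
            break
        # Evaluate each one-vertex extension of the current set.
        best = max(candidates, key=lambda v: score(chosen | {v}))
        chosen.add(best)
    return chosen

# Stand-in score: number of vertices covered by the hypothetical reach sets.
reach = {0: {0, 1, 2}, 1: {2, 3}, 2: {3}, 3: {3, 4}}

def coverage(subset):
    covered = set()
    for v in subset:
        covered |= reach[v]
    return len(covered)

print(sorted(greedy_seed_set([0, 1, 2, 3], 2, coverage)))  # [0, 3]
```

Each round calls score once per remaining vertex, so the total is at most m·|V| evaluations; the cost of a single evaluation is what the Vertex Activation approximation has to keep small.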
Proposed Experiment Domain Difficult to test Need two datasets: Feed the initial state of the network to the algorithm and compare its output against the final state Vertex Activation Problem is NP-Complete: The approximation algorithm will not fully reflect the expressive power of the model
Proposed Experiment Domain Simulation Test approximations against optimal predictions (Kempe et al., Maximizing the Spread of Influence through a Social Network)
Proposed Experiment Domain Comparison of HSCM with collected data The arXiv database Contains citations between scientific papers Probability of a certain author being cited at a given point, depending on the set of all others he cited and who cited him. A Keyboard The keys you press influence which keys you will press next Interesting optimization problems: Dvorak vs QWERTY, etc.
Timeline End of next week: Whole system up and running; using the exponential-time algorithm In three weeks: Approximation of the Vertex Activation Problem In four weeks: Genetic algorithm to approximate the Optimization Problem In five weeks: Other ways to approximate the Optimization Problem
Conclusion Novel research We understand the problem But maybe not in its whole complexity? Venture into algorithm design Haven’t had much experience in this Learn a lot Even if goal fails Algorithms + AI: approximation techniques + applications of model (future work)
Thank you for listening We will take questions