DM-MEETING Bijaya Adhikari
OUTLINE From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics (Yu et al.); Graph Summarization with Quality Guarantees (Riondato et al.)
FROM MICRO TO MACRO: UNCOVERING AND PREDICTING INFORMATION CASCADING PROCESS WITH BEHAVIORAL DYNAMICS
MOTIVATION Can we predict cascades in a network? Are they predictable? If so, given the early stage of an information cascade, can we predict its cumulative size at any later time?
KEY IDEA When a node is involved in a cascade, so are some of its offspring. If the dynamic process of these node-level sub-cascades can be accurately modeled, then the whole cascade process can be predicted by an additive function of these local sub-cascades. The approach looks into the micro mechanism of a cascade by decomposing it into multiple local (one-hop) sub-cascades and uses them to predict the cascading process.
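The additive decomposition can be illustrated with a minimal sketch. The toy cascade, the event format (node, parent, infection time), and the function names are assumptions made for this example only. The sketch shows that the cumulative cascade size equals the root plus the sum of the one-hop sub-cascade sizes, because every non-root node has exactly one parent.

```python
from collections import defaultdict

# Hypothetical toy cascade: (node, parent, infection_time); the root has parent None.
cascade = [
    ("a", None, 0.0),
    ("b", "a", 1.0),
    ("c", "a", 2.5),
    ("d", "b", 3.0),
    ("e", "d", 4.0),
]

def sub_cascade_sizes(events, t):
    """Count, for each node, its direct (one-hop) offspring infected by time t."""
    sizes = defaultdict(int)
    for _node, parent, time in events:
        if parent is not None and time <= t:
            sizes[parent] += 1
    return dict(sizes)

def cascade_size(events, t):
    """Cumulative cascade size at time t, counted directly from the events."""
    return sum(1 for _node, _parent, time in events if time <= t)

t = 3.0
local = sub_cascade_sizes(cascade, t)
# One root plus the sum of all local sub-cascades (assumes the root is infected by time t).
total_from_local = 1 + sum(local.values())
```

Predicting each local count and summing the predictions is then one plausible way to obtain a prediction for the whole cascade.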
ILLUSTRATION
EXAMPLE Comparison of predictions made from observations at various times against the true cascade (red)
BEHAVIORAL DYNAMICS The behavioral dynamics of a node capture the cumulative number of its infected descendants once it gets infected. Because the cumulative size varies from cascade to cascade, the survival rate is used instead.
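A minimal sketch of an empirical survival rate follows. It assumes the survival rate at time t is the fraction of a node's eventual offspring that are not yet infected t time units after the node itself; the delay values and the function name are hypothetical. Normalizing by the final count is what makes curves from cascades of different sizes comparable.

```python
def empirical_survival(delays, t):
    """Fraction of a node's eventual offspring not yet infected t time units after the node."""
    if not delays:
        raise ValueError("node has no offspring, so the survival rate is undefined")
    infected_by_t = sum(1 for d in delays if d <= t)
    return 1.0 - infected_by_t / len(delays)

# Hypothetical delays between a node's infection and each offspring's infection.
delays = [1.0, 2.0, 2.0, 5.0]
curve = [(t, empirical_survival(delays, t)) for t in (0.0, 1.0, 2.0, 5.0)]
```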
PARAMETERIZING BEHAVIORAL DYNAMICS The KS statistic shows that the Weibull distribution is the most adequate for parameterizing behavioral dynamics. (Table: PDF, survival function, and hazard function of the Weibull distribution.) Source:
COVARIATES OF BEHAVIORAL FEATURES Some nodes have few or no sub-cascades, and the parameters learned from data are difficult to interpret (Twitter-like data)
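The three Weibull functions and a KS comparison can be sketched as follows. The scale/shape parameterization is the standard one; the synthetic samples, the parameter values, and the choice of an exponential alternative (a Weibull with shape 1) are assumptions for illustration, not the data or baselines used by the authors.

```python
import math
import random

def weibull_pdf(t, scale, shape):
    """Density f(t) = (k / lam) * (t / lam)^(k - 1) * exp(-(t / lam)^k)."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

def weibull_survival(t, scale, shape):
    """Survival S(t) = exp(-(t / lam)^k)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, scale, shape):
    """Hazard h(t) = f(t) / S(t) = (k / lam) * (t / lam)^(k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of the samples and a candidate CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap

# Synthetic survival times drawn from a Weibull with scale 2.0 and shape 1.5.
rng = random.Random(0)
samples = [rng.weibullvariate(2.0, 1.5) for _ in range(2000)]

# A smaller KS statistic means the candidate distribution fits the samples better.
d_weibull = ks_statistic(samples, lambda t: 1.0 - weibull_survival(t, 2.0, 1.5))
d_exponential = ks_statistic(samples, lambda t: 1.0 - weibull_survival(t, 2.0, 1.0))
```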
WHY CAN WE INFER CASCADES FROM EARLY STAGES? Minor Dominance and Early Stage Dominance
FORMAL STATEMENT
SURVIVAL ANALYSIS
NETWORKED WEIBULL REGRESSION (NEWER) MODEL Fit a Weibull distribution to the survival times of node i
REGULARIZED NLL FOR NEWER Optimize the regularized negative log-likelihood F by coordinate descent
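As a rough illustration of fitting a Weibull distribution to one node's survival times, the sketch below computes a plain maximum-likelihood fit. It is not the NEWER model itself: it ignores the behavioral covariates and any regularization, and the function name, search bounds, and synthetic data are assumptions. It uses the standard fact that, for a fixed shape k, the likelihood is maximized by scale^k = mean(t^k), which leaves a one-dimensional equation in k that is solved here by bisection.

```python
import math
import random

def fit_weibull(times, lo=1e-3, hi=50.0, iters=60):
    """Maximum-likelihood Weibull fit: bisection on the shape, closed form for the scale."""
    ts = [t for t in times if t > 0]  # log(t) needs strictly positive times
    n = len(ts)
    mean_log = sum(math.log(t) for t in ts) / n

    def score(shape):
        # Profile-likelihood equation in the shape; it is increasing, and its root is the MLE.
        powered = [t ** shape for t in ts]
        weighted_log = sum(p * math.log(t) for p, t in zip(powered, ts)) / sum(powered)
        return weighted_log - 1.0 / shape - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid  # root lies at or above mid
    shape = 0.5 * (lo + hi)
    scale = (sum(t ** shape for t in ts) / n) ** (1.0 / shape)
    return scale, shape

# Synthetic survival times for a single hypothetical node (true scale 2.0, true shape 1.5).
rng = random.Random(1)
survival_times = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
scale_hat, shape_hat = fit_weibull(survival_times)
```

With many observations per node, the estimates should land close to the generating parameters; with few observations they become unstable, which is the situation the covariates slide describes.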
EFFICIENT CASCADE PREDICTION
SAMPLING MODEL Estimate the cascade dynamically so that changes are monitored. The sub-cascade generated by a node is zero if no other node is involved. The temporal size counter and the final death rate do not change, but the death rate increases over time. This causes a relative error rate of [expression missing]. Therefore, the cascade size can be dynamically estimated within some error bound.
EXPERIMENTS: CASCADE SIZE PREDICTION
EXPERIMENTS: OUTBREAK TIME PREDICTION
GRAPH SUMMARIZATION WITH QUALITY GUARANTEES
MOTIVATION As graphs grow, analyzing, visualizing, and mining them becomes computationally challenging. Large networks do not fit in memory, and accessing disk makes computation even slower. Can we find a lossy, concise representation of a large graph that fits into main memory?
DEFINITION Given a graph G = (V, E) and an integer k, a k-summary S of G is a complete weighted undirected graph. The vertices of S are called supernodes, each supernode V_i is a set of vertices of G, and there are superedges between them. Each superedge is weighted by the density of edges between V_i and V_j, computed from A_G, the adjacency matrix of the original graph.
DEFINITION Density matrix: The density matrix can be lifted to an n × n matrix, where s(v) denotes the supernode in S that contains vertex v of the original graph.
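A minimal sketch of the superedge weights follows, assuming the density between two supernodes is the sum of the adjacency entries between them divided by the product of their sizes. The handling of the diagonal (a supernode paired with itself, divided by its size squared), the example graph, and the function name are assumptions made for illustration.

```python
def density_matrix(adj, parts):
    """k x k matrix whose (i, j) entry is the edge density between parts[i] and parts[j]."""
    k = len(parts)
    dens = [[0.0] * k for _ in range(k)]
    for i, vi in enumerate(parts):
        for j, vj in enumerate(parts):
            edges = sum(adj[u][v] for u in vi for v in vj)
            dens[i][j] = edges / (len(vi) * len(vj))
    return dens

# Hypothetical path graph 0 - 1 - 2 - 3, summarized with k = 2 supernodes.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
parts = [[0, 1], [2, 3]]
dens = density_matrix(adj, parts)
```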
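Lifting can be sketched in a few lines: entry (v, w) of the n × n matrix is the density between the supernodes s(v) and s(w). The 2 × 2 density matrix, the assignment list, and the function name below are hypothetical.

```python
def lift(dens, supernode_of):
    """Expand a k x k density matrix to n x n: entry (v, w) is dens[s(v)][s(w)]."""
    n = len(supernode_of)
    return [[dens[supernode_of[v]][supernode_of[w]] for w in range(n)] for v in range(n)]

# Hypothetical summary with two supernodes; supernode_of[v] plays the role of s(v).
dens = [[0.5, 0.25], [0.25, 0.5]]
supernode_of = [0, 0, 1, 1]
lifted = lift(dens, supernode_of)
```

The lifted matrix has the same shape as the adjacency matrix of the original graph, so the two can be compared entry by entry.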
EXAMPLE
PROBLEM DEFINITION
L_p RECONSTRUCTION ERROR
THE BEST MATRIX FOR A GIVEN PARTITION Given a k-partition P, we say that an n × n matrix M is P-constant if the S_i × S_j submatrix of M is constant for all i and j between 1 and k. It is shown that finding a P-constant matrix that represents the graph with guaranteed quality reduces to the k-means problem under the l_2 metric (k-median under the l_1 metric).
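For a fixed partition, the link to means and medians can be checked on a small example. The sketch assumes the error is the entry-wise l_p distance between the adjacency matrix and the P-constant matrix; the example graph, partition, and function names are hypothetical. It relies on two standard facts: the mean of a block minimizes the sum of squared deviations, and the median minimizes the sum of absolute deviations.

```python
import statistics

def p_constant(adj, parts, reducer):
    """n x n matrix that is constant on every block parts[i] x parts[j]."""
    n = len(adj)
    out = [[0.0] * n for _ in range(n)]
    for vi in parts:
        for vj in parts:
            value = reducer([adj[u][v] for u in vi for v in vj])
            for u in vi:
                for v in vj:
                    out[u][v] = value
    return out

def error(adj, m, p):
    """Entry-wise l_p distance between adj and m (assumed form of the error)."""
    total = sum(abs(a - b) ** p for ra, rb in zip(adj, m) for a, b in zip(ra, rb))
    return total ** (1.0 / p)

# Hypothetical path graph 0 - 1 - 2 - 3 with the partition {0, 1}, {2, 3}.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
parts = [[0, 1], [2, 3]]

by_mean = p_constant(adj, parts, statistics.mean)      # block means
by_median = p_constant(adj, parts, statistics.median)  # block medians

l2_mean, l2_median = error(adj, by_mean, 2), error(adj, by_median, 2)
l1_mean, l1_median = error(adj, by_mean, 1), error(adj, by_median, 1)
```

On this example the block means give the smaller l_2 error and the block medians give the smaller l_1 error, which matches the pairing of k-means with l_2 and k-median with l_1.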
EXPERIMENTS: RECONSTRUCTION ERROR
EXPERIMENTS: SUMMARIZATION