CMU SCS, KDD 2006, Leskovec & Faloutsos
Sampling from Large Graphs (poster #305)
Jurij (Jure) Leskovec, Christos Faloutsos
Carnegie Mellon University
Problems and recommendations
Q: How to sample from a large graph?
A: Forest Fire (FF) or random nodes (RN)
Q: Which properties to preserve?
A: At least the 13 we list
Q: How to measure success/similarity?
A: The Kolmogorov-Smirnov (K-S) D-statistic; aim towards the 'back-in-time' version
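The K-S answer above can be made concrete with a small sketch. It computes the two-sample Kolmogorov-Smirnov D-statistic, the largest vertical gap between two empirical CDFs, for a property of the original graph and of a sample. The function name `ks_d_statistic` and the degree sequences are illustrative assumptions, not taken from the slides.

```python
from bisect import bisect_right


def ks_d_statistic(sample_a, sample_b):
    """Two-sample K-S D: the maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical degree sequences: the full graph and a sampled subgraph.
original_degrees = [1, 1, 1, 2, 2, 3, 5, 8]
sampled_degrees = [1, 1, 2, 3]
print(ks_d_statistic(original_degrees, sampled_degrees))
```

D ranges from 0 (identical distributions) to 1 (no overlap), so a smaller value means the sample matches the original more closely on that property.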
Criteria
STATIC
  in-degree; out-degree distribution
  distr. of weakly (WCC) and strongly (SCC) connected component sizes
  hop-plot; hop-plot for WCC
  distr. of first left singular vector values
  scree plot
  distr. of clustering coefficient
TEMPORAL
  densification power law (DPL)
  shrinking diameter
  normalized size of largest connected component
  first eigenvalue
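As an illustration of one temporal criterion, the densification power law says that the number of edges grows as a power of the number of nodes, e(t) ∝ n(t)^a. A minimal sketch, assuming we have node and edge counts for several snapshots, estimates the exponent a as the least-squares slope on a log-log scale. The function name `fit_dpl_exponent` and the snapshot counts are hypothetical.

```python
import math


def fit_dpl_exponent(node_counts, edge_counts):
    """Least-squares slope of log(edges) against log(nodes)."""
    if len(node_counts) != len(edge_counts) or len(node_counts) < 2:
        raise ValueError("need at least two (nodes, edges) snapshots")
    xs = [math.log(n) for n in node_counts]
    ys = [math.log(e) for e in edge_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic snapshots built to follow e = n^1.5 exactly (illustration only).
nodes = [100, 1000, 10000]
edges = [n ** 1.5 for n in nodes]
print(fit_dpl_exponent(nodes, edges))
```

Fitting the exponent on the sampled graph's snapshots and on the original's gives two numbers that can be compared directly.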
Targets
scale-down (= fewer nodes; same diameter, same degree, etc.)
back-in-time (match an earlier, real, smaller version of the graph)
Sampling Methods
RN: random nodes
RPN: random nodes, PageRank-biased
RDN: random nodes, degree-biased
RE: random edges
RNE
HYB: hybrid
RNN
RJ: random jump
RW: random walk
FF: forest fire
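Two of the methods listed above, RN and FF, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a dict mapping every node to a list of its out-neighbors, the forest fire burns a geometrically distributed number of unburned out-links per node (mean p/(1-p)), and it restarts from a fresh random seed whenever the fire dies out. The names `random_node_sample`, `forest_fire_sample`, `induced_subgraph`, and `p_forward`, and the value 0.7, are illustrative.

```python
import random
from collections import deque


def induced_subgraph(adj, nodes):
    """Keep only the chosen nodes and the edges between them."""
    return {u: [v for v in adj[u] if v in nodes] for u in nodes}


def random_node_sample(adj, k, rng):
    """RN: choose k nodes uniformly at random."""
    return induced_subgraph(adj, set(rng.sample(list(adj), k)))


def forest_fire_sample(adj, k, p_forward, rng):
    """FF sketch: spread from random seeds until k nodes are burned."""
    if not 0 <= p_forward < 1:
        raise ValueError("p_forward must be in [0, 1)")
    if k > len(adj):
        raise ValueError("k exceeds the number of nodes")
    nodes = list(adj)
    burned = set()
    while len(burned) < k:
        seed = rng.choice(nodes)  # (re)start the fire at a random node
        if seed in burned:
            continue
        burned.add(seed)
        frontier = deque([seed])
        while frontier and len(burned) < k:
            u = frontier.popleft()
            # Geometric draw: number of successes before the first failure.
            n_burn = 0
            while rng.random() < p_forward:
                n_burn += 1
            candidates = [v for v in adj[u] if v not in burned]
            rng.shuffle(candidates)
            for v in candidates[:n_burn]:
                if len(burned) >= k:
                    break
                burned.add(v)
                frontier.append(v)
    return induced_subgraph(adj, burned)


# Hypothetical 20-node directed graph: each node points to two others.
graph = {i: [(i + 1) % 20, (i + 5) % 20] for i in range(20)}
rng = random.Random(0)
rn_sample = random_node_sample(graph, 5, rng)
ff_sample = forest_fire_sample(graph, 5, 0.7, rng)
print(len(rn_sample), len(ff_sample))
```

RN ignores the graph structure entirely, while FF follows links outward from each seed, so FF samples tend to stay connected where RN samples of the same size may fragment.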
4 Datasets
arXiv (author-paper)
Citation (HEP-TH, HEP-PH)
A.S. (autonomous systems)
epinions.com
26K to 500K edges
[Plots: diameter vs. N; clustering coefficient (CC) vs. degree]
[Plots: degree distribution; average clustering coefficient (CC) vs. N]
[Plots: diameter; DPL]
[Plots: D-statistic vs. sample size, one panel each for scale-down and back-in-time; lower D is better]
Conclusions
random nodes + a little exploration -> FF (RN, RJ are close)
a 15% sample seems enough
the back-in-time concept