Presentation transcript:

Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams
Kijung Shin (1), Mohammad Hammoud (1), Euiwoong Lee (1), Jinoh Oh (2), Christos Faloutsos (1)
(1) Carnegie Mellon University, (2) Adobe Systems

Triangles in a Graph
- Graphs are everywhere: social networks, the web, citation networks.
- Triangles are a fundamental primitive: 3 nodes connected to each other.
- Counting triangles has many applications: community detection, anomaly detection, query optimization.
Distributed Estimation of Global and Local Triangle Counts in Graph Streams (by Kijung Shin)

Application: Anomaly Detection
[Figure: two scatter plots of # incident triangles (y-axis) versus degree (x-axis), from [KMF11] and [LJK18]; one annotated point is labeled "Telemarketer".]

Remaining Challenges
- Counting triangles in real-world graphs, such as online social networks, the Web, citation networks, and call networks.
- Real-world graphs are:
  - Large: not fitting in main memory
  - Dynamic: growing with new nodes and edges

Previous Approaches
- Distributed algorithms [SV11] [PC13] [PPK18]
  - pros: utilize multiple machines
  - cons: inapplicable to dynamic graphs
- Streaming algorithms [DERU16] [Shi17] [LJK18]
  - pros: applicable to dynamic graphs
  - cons: limited to a single machine

Our Approach and Goal
- Can we have the best of both worlds: an algorithm for dynamic graphs that runs on multiple machines?
- We design a distributed streaming algorithm that is:
  - Fast and Accurate: outperforming competitors
  - Scalable: with linear data scalability
  - Theoretically Sound: with unbiased estimates

Road Map
- Problem Definition
- Algorithm: Tri-Fly
- Theoretical Analyses
- Experiments
- Conclusion

Problem Definition
- Given: a graph stream, i.e., a sequence of new edges in a dynamic graph
- Estimate: counts of global and local triangles
  - Global triangles: all triangles in the graph
  - Local triangles: the triangles incident to each node
- Using: multiple machines with limited memory (up to 𝑘 edges can be stored in each machine)
- To minimize: estimation error
[Figure: example graph annotated with global and local triangle counts.]
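To make the two target quantities concrete, the following is a minimal sketch of the exact (non-streaming, unlimited-memory) counts that Tri-Fly estimates. The function name `count_triangles` and the five-edge example graph are illustrative assumptions, not taken from the slides; the sketch assumes a simple undirected graph with comparable node ids.

```python
from collections import defaultdict


def count_triangles(edges):
    """Exact global and local triangle counts of a simple undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored and
    duplicate edges are counted once. Node ids must be comparable.
    """
    # Normalise each edge to (smaller, larger) and drop duplicates.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    adj = defaultdict(set)
    for u, v in unique_edges:
        adj[u].add(v)
        adj[v].add(u)

    global_count = 0
    local_count = {node: 0 for node in adj}
    for u, v in unique_edges:          # here u < v
        for w in adj[u] & adj[v]:      # common neighbours close a triangle
            if w > v:                  # count each triangle once (u < v < w)
                global_count += 1
                for node in (u, v, w):
                    local_count[node] += 1
    return global_count, local_count


# Hypothetical example graph with two triangles: {1, 2, 3} and {2, 3, 4}.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
g, l = count_triangles(example_edges)
print(g, l)
```

Here the global count is the number of distinct triangles, and the local count of a node is the number of triangles that contain it, so the local counts always sum to three times the global count.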

Road Map
- Problem Definition
- Algorithm: Tri-Fly <<
- Theoretical Analyses
- Experiments
- Conclusion

Overview of Tri-Fly
- Inputs: new edges streamed from source(s)
- Outputs: estimated counts of global and local triangles
- Components: source(s) → master(s) → worker(s) → aggregator(s)
  - worker(s): discover triangles with limited memory
  - aggregator(s): aggregate estimates
- Processes each new edge when it arrives
- Updates estimated counts for each edge

Overview of Tri-Fly (cont.)
- source(s) → master(s): each new edge is sent by unicast
- master(s) → worker(s): the new edge is broadcast
- worker(s): count new triangles using local memory (discover triangles with limited memory)
- worker(s) → aggregator(s): counts are shuffled by ℎ(node)
- aggregator(s): aggregate counts and update outputs
[Figure: data-flow diagram of source(s), master(s), worker(s), and aggregator(s).]

Challenge: Limited Memory
- How should we 'count' and 'aggregate' for accurate estimation when each machine has limited memory?
- Our solution adapts Triest-IMPR [DERU16].
[Figure: data-flow diagram of source(s), master(s), worker(s), and aggregator(s), highlighting the counting step in the workers and the aggregation step in the aggregators.]

Workers in Detail
Each worker runs three steps for each received edge:
- (a) Edge arrival
- (b) Discovering
- (c) Sampling
[Figure: the edges stored in a worker's memory across the three steps for a new edge 𝑢−𝑣.]

Workers in Detail (cont.)
(a) Edge arrival step
- receives a new edge (in the example, 𝑢−𝑣)
[Figure: memory holding the edges 𝑢−𝑥, 𝑢−𝑦, 𝑣−𝑥, 𝑣−𝑦 when the new edge 𝑢−𝑣 arrives.]

Workers in Detail (cont.)
(b) Discovering step
- discovers new triangles in its local memory
- sends updates to the aggregators, with 𝛿 := 1 / (discovering probability of the triangle)
- Example: with 𝑢−𝑥, 𝑢−𝑦, 𝑣−𝑥, 𝑣−𝑦 in memory, the new edge 𝑢−𝑣 closes two triangles.
  - For triangle (𝑢, 𝑣, 𝑥): send (𝑢, 𝛿) to aggregator ℎ(𝑢), (𝑣, 𝛿) to aggregator ℎ(𝑣), (𝑥, 𝛿) to aggregator ℎ(𝑥), and (∗, 𝛿) to aggregator ℎ(∗), where ∗ denotes the key of the global count.
  - For triangle (𝑢, 𝑣, 𝑦): send (𝑢, 𝛿) to aggregator ℎ(𝑢), (𝑣, 𝛿) to aggregator ℎ(𝑣), (𝑦, 𝛿) to aggregator ℎ(𝑦), and (∗, 𝛿) to aggregator ℎ(∗).

Workers in Detail (cont.)
(c) Sampling step
- stores or discards the new edge
- follows the standard reservoir sampling
[Figure: in the example, the memory changes from 𝑢−𝑥, 𝑢−𝑦, 𝑣−𝑥, 𝑣−𝑦 to 𝑢−𝑥, 𝑢−𝑣, 𝑣−𝑥, 𝑣−𝑦, i.e., the new edge 𝑢−𝑣 replaces 𝑢−𝑦.]
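The three worker steps can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the class name `Worker`, the key `GLOBAL`, and the closed form used for 𝛿 are assumptions. The 𝛿 formula, max(1, (t−1)(t−2) / (k(k−1))), is the inverse of the probability that both earlier edges of a triangle are still in a size-k reservoir when the t-th edge arrives, as in Triest-IMPR [DERU16]. The sketch also assumes a stream with no duplicate edges or self-loops and k ≥ 2.

```python
import random

GLOBAL = "*"  # hypothetical key standing for the global triangle count


class Worker:
    """One worker: discovers triangles in a bounded sample of the stream."""

    def __init__(self, k, seed=None):
        assert k >= 2, "the delta formula below needs k >= 2"
        self.k = k                # memory budget (max. number of stored edges)
        self.t = 0                # number of edges received so far
        self.edges = []           # sampled edges
        self.adj = {}             # node -> set of neighbours in the sample
        self.rng = random.Random(seed)

    def _add(self, u, v):
        self.edges.append((u, v))
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def _evict_random(self):
        # Swap a uniformly random edge to the end and remove it.
        i = self.rng.randrange(len(self.edges))
        self.edges[i], self.edges[-1] = self.edges[-1], self.edges[i]
        a, b = self.edges.pop()
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def process(self, u, v):
        """Handle one new edge u-v; return the updates for the aggregators."""
        # (a) Edge arrival
        self.t += 1
        t, k = self.t, self.k

        # (b) Discovering: delta = 1 / (discovering probability)
        delta = max(1.0, (t - 1) * (t - 2) / (k * (k - 1)))
        updates = []
        for x in self.adj.get(u, set()) & self.adj.get(v, set()):
            updates += [(u, delta), (v, delta), (x, delta), (GLOBAL, delta)]

        # (c) Sampling: standard reservoir sampling with capacity k
        if len(self.edges) < k:
            self._add(u, v)
        elif self.rng.random() < k / t:
            self._evict_random()
            self._add(u, v)
        return updates
```

When k is at least the stream length, every edge is kept and 𝛿 stays at 1, so the worker's updates sum to the exact counts; with a smaller k, 𝛿 grows over time to compensate for triangles that are missed.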

Aggregators in Detail
- Maintain estimates:
  - 𝒈 in aggregator ℎ(∗) for the global triangle count, where ∗ denotes the key of the global count
  - 𝒍[𝒖] in aggregator ℎ(𝑢) for the local triangle count of node 𝑢
- Update estimates:
  - for each update (∗, 𝜹), increase 𝒈 by 𝛿 / (number of workers)
  - for each update (𝒖, 𝜹), increase 𝒍[𝒖] by 𝛿 / (number of workers)
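A minimal sketch of the aggregation rule, assuming updates arrive as (key, 𝛿) pairs. The names `Aggregator`, `route`, `h`, and `GLOBAL` are hypothetical, and the CRC32-based hash is only one possible choice for ℎ; the slides do not specify it.

```python
import zlib
from collections import defaultdict

GLOBAL = "*"  # hypothetical key standing for the global triangle count


def h(key, num_aggregators):
    """Deterministic hash that assigns each key to one aggregator."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_aggregators


class Aggregator:
    """Holds the estimates for the keys that hash to this aggregator."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.g = 0.0                  # global triangle count estimate
        self.l = defaultdict(float)   # node -> local triangle count estimate

    def update(self, key, delta):
        # Each worker contributes 1/num_workers of its own estimate, so the
        # maintained value is the average over all workers.
        share = delta / self.num_workers
        if key == GLOBAL:
            self.g += share
        else:
            self.l[key] += share


def route(updates, aggregators):
    """Shuffle step: send each (key, delta) update to aggregator h(key)."""
    for key, delta in updates:
        aggregators[h(key, len(aggregators))].update(key, delta)
```

Because each key is always routed to the same aggregator, the estimate for a node (or for the global count) lives in exactly one place and no cross-aggregator merging is needed.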

Summary of Tri-Fly
- source(s) → master(s): each new edge is sent by unicast
- master(s) → worker(s): the new edge is broadcast
- worker(s): count new triangles in their local memory (discover triangles with limited memory)
- worker(s) → aggregator(s): counts are shuffled by ℎ(node)
- aggregator(s): aggregate counts and update outputs

Road Map
- Problem Definition
- Algorithm: Tri-Fly
- Theoretical Analyses <<
- Experiments
- Conclusion

THM1: Unbiasedness
Tri-Fly maintains estimates satisfying the following:
- $\mathbb{E}[g] = \text{true global triangle count}$
- For each node $u$, $\mathbb{E}[l[u]] = \text{true local triangle count of } u$
[Figure: frequency of the estimates (y-axis: Frequency, x-axis: Estimates), with the true count marked.]

THM2: Linear Drop of Variance
Tri-Fly maintains estimates satisfying the following:
- $\mathrm{Var}[g] \propto 1 / (\text{number of workers})$
- For each node $u$, $\mathrm{Var}[l[u]] \propto 1 / (\text{number of workers})$
[Figure: log(Variance) versus log(# Workers).]
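One way to see why the variance can drop linearly is the following sketch. It assumes, beyond what the slides state, that each of the $n$ workers produces its own unbiased estimate $g_i$ of the true count $T$, that these estimates are independent with a common variance $\sigma^2$, and that the aggregator averages them. The symbols $n$, $g_i$, $T$, and $\sigma^2$ are introduced here for illustration only.

```latex
\begin{align*}
g &= \frac{1}{n}\sum_{i=1}^{n} g_i, \\
\mathbb{E}[g] &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[g_i] = T, \\
\operatorname{Var}[g] &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}[g_i] = \frac{\sigma^{2}}{n}.
\end{align*}
```

Under these assumptions, $\log \operatorname{Var}[g] = \log \sigma^2 - \log n$, which is a line of slope $-1$ on a log-log plot of variance against the number of workers. The same argument applies to each local estimate $l[u]$.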

THM3: Linear Scalability
With a fixed per-worker memory budget $k$,
$\text{running time of Tri-Fly} \propto \text{number of edges in the input stream}$
[Figure: Running Time versus # Edges.]

Properties of Tri-Fly
- Fast and accurate: outperforming competitors
- Scalable: with linear data scalability (THM 3)
- Theoretically sound: with unbiased estimates (THM 1)

Road Map
- Problem Definition
- Algorithm: Tri-Fly
- Theoretical Analyses
- Experiments <<
- Conclusion

Experimental Settings
- Competitors: MASCOT [LJK18] and Triest-IMPR [DERU16], state-of-the-art single-machine streaming algorithms for both global and local triangle counts
- Implementations: C++ and MPICH (asynchronous communication)
  - 1 master, 1 aggregator, and up to 40 workers
- Datasets:
  - ER Synthetic (100B)
  - Social (1.8B+)
  - Social (22M+)
  - Patent citation (16M+)
  - Web (6M+)

EXP1. Bias Analysis
"Does Tri-Fly give unbiased estimates?" (THM 1)
[Figure: distributions of estimates for Tri-Fly with 1 worker, 5 workers, and 10 workers, compared against the true count.]
- 𝑘: 5% of edges
- Dataset:

EXP2. Variance Analysis
"How rapidly does the variance decrease w.r.t. the number of workers?" (THM 2)
[Figure: variance versus number of workers for MASCOT, Triest-IMPR, and Tri-Fly; the Tri-Fly line is annotated with slope = −1.0.]
- 𝑘: 5% of edges
- Dataset:

EXP3. Speed and Accuracy
"Does Tri-Fly outperform single-machine baselines?"
[Figure: speed and accuracy (Root Mean Square Error) of Tri-Fly and the single-machine baselines.]
- 30 workers
- 𝑘: {2%, 5%, 40%} of edges
- Dataset:

EXP4. Scalability
"Does Tri-Fly scale linearly with the size of the input stream?" (THM 3)
[Figure: running time of Tri-Fly versus stream size, up to 100B edges (800GB), with a reference line for linear increase (slope = 1).]
- 30 workers
- 𝑘: $10^7$
- Dataset: ER

Properties of Tri-Fly
- Fast and accurate: outperforming competitors (EXP 3)
- Scalable: with linear data scalability (EXP 4)
- Theoretically sound: with unbiased estimates (EXP 1)

Road Map
- Problem Definition
- Algorithm: Tri-Fly
- Theoretical Analyses
- Experiments
- Conclusion <<

Conclusion
- We propose Tri-Fly, the first distributed streaming algorithm for counting global and local triangles.
- Tri-Fly is:
  - Fast & Accurate
  - Scalable
  - Theoretically Sound
- Code and datasets: https://github.com/kijungs/trifly

References
[SV11] Siddharth Suri, Sergei Vassilvitskii, "Counting Triangles and the Curse of the Last Reducer", WWW 2011
[KMF11] U Kang, Brendan Meeder, Christos Faloutsos, "Spectral Analysis for Billion-Scale Graphs: Discoveries and Implementation", PAKDD 2011
[PC13] Ha-Myung Park, Chin-Wan Chung, "An Efficient MapReduce Algorithm for Counting Triangles in a Very Large Graph", CIKM 2013
[DERU16] Lorenzo De Stefani et al., "TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size", KDD 2016
[Shi17] Kijung Shin, "WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams", ICDM 2017
[LJK18] Yongsub Lim, Minsoo Jung, U Kang, "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs", TKDD 2018
[PPK18] Ha-Myung Park, Chiwan Park, U Kang, "PegasusN: A Scalable and Versatile Graph Mining System", AAAI 2018