Published by Drusilla Johns. Modified over 6 years ago.
1
GRAPHALYTICS A Big Data Benchmark for Graph-Processing Platforms
GRAPHALYTICS was made possible by a generous contribution from Oracle.
Alexandru Iosup, Tim Hegeman, Yong Guo, Ana Lucia Varbanescu, Wing Lung Ngai, Stijn Heldens, Mihai Capotă, Arnau Prat, Peter Boncz, Hassan Chafi
Graphalytics: Benchmarking Graph-Processing Platforms
LDBC TUC Meeting, UPC Barcelona, March 2016
2
Graphs at the Core of Our Society: The LinkedIn Example: Data Deluge
[Figure: The State of LinkedIn, Apr 2014 and Nov 2015; surviving data labels: 400, 400.]
Sources: Vincenzo Cosenza, The State of LinkedIn, via Christopher Penn,
3
LinkedIn Is Not Unique: Data Deluge
company/day: posts, 1,000+ comments
IBM: 280k employee users, 2.6M followers
Twitter: 270M MAU, 200+ avg followers, >54B edges (Beevolve 2012 estimates avg # of followers at 208)
Facebook: 1.2B MAU, 0.8B DAU, 200+ avg followers, >240B edges
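The edge counts on this slide are consistent with a simple product of monthly active users and average followers per user. The sketch below reproduces that arithmetic; the function name `estimate_edges` is hypothetical, and the multiplication itself is an assumption about how the slide's lower bounds were derived, not something the slide states.

```python
def estimate_edges(monthly_active_users: int, avg_followers: int) -> int:
    """Rough lower bound on edges: one edge per (user, follower) relationship.

    Assumption: the slide's ">54B" and ">240B" figures come from MAU * avg followers.
    """
    return monthly_active_users * avg_followers


# Figures taken from the slide: 270M MAU and 1.2B MAU, each with 200+ avg followers.
edges_270m = estimate_edges(270_000_000, 200)
edges_1200m = estimate_edges(1_200_000_000, 200)

print(f"270M MAU x 200 followers: >{edges_270m / 1e9:.0f}B edges")
print(f"1.2B MAU x 200 followers: >{edges_1200m / 1e9:.0f}B edges")
```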
4
Graph Processing @large
A Graph Processing Platform: ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform; Algorithm.
Interactive processing not considered in this presentation. Streaming not considered in this presentation.
5
Graph Processing @large
A Graph Processing Platform: ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform; Algorithm.
Ideally, N cores/disks: Nx faster.
Interactive processing not considered in this presentation. Streaming not considered in this presentation.
6
Graph Processing @large
A Graph Processing Platform: ETL; Active Storage (filtering, compression, replication, caching); Distribution to processing platform; Algorithm.
Ideally, N cores/disks: Nx faster.
Data-intensive workload: 10x graph size, 100x–1,000x slower.
Compute-intensive workload: different/more complex analysis, ?x slower.
Dataset-dependent workload: unfriendly graphs, ??x slower.
Interactive processing not considered in this presentation. Streaming not considered in this presentation.
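The "N cores/disks, Nx faster" ideal is the usual notion of linear speedup. As a minimal sketch, assuming speedup is defined as baseline runtime divided by parallel runtime and efficiency as speedup divided by the resource count (both definitions are standard but not spelled out on the slide), the gap between ideal and measured scaling can be computed as follows. The function names and the runtimes are illustrative only.

```python
def speedup(baseline_runtime_s: float, parallel_runtime_s: float) -> float:
    """Speedup relative to the single-resource baseline run."""
    return baseline_runtime_s / parallel_runtime_s


def efficiency(observed_speedup: float, n_resources: int) -> float:
    """Fraction of the ideal N-fold speedup actually achieved (1.0 = ideal)."""
    return observed_speedup / n_resources


# Illustrative numbers only: a job that takes 100 s on 1 core and 25 s on 8 cores.
s = speedup(100.0, 25.0)
e = efficiency(s, 8)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
```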
7
Graph-Processing Platforms
Platform: the combined hardware, software, and programming system that is being used to complete a graph-processing task.
Which to choose? What to tune?
Platforms pictured include Trinity.
8
Graphalytics, in a nutshell
An LDBC benchmark*
Advanced benchmarking harness
Diverse real and synthetic datasets
Many classes of algorithms
Granula for manual choke-point analysis
Modern software engineering practices
Supports many platforms
Enables comparison of community-driven and industrial systems
9
Benchmarking Harness
Iosup et al. LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms (submitted).
10
Graphalytics = Representative Classes of Algorithms and Datasets
2-stage selection process of algorithms and datasets:
1. Literature survey of 2009–2013, 120+ articles in 10 top conferences, including SIGMOD, VLDB, HPDC
2. Expert selection

Class | Examples | %
Graph Statistics | Diameter, Local Clust. Coeff., PageRank | 20
Graph Traversal | BFS, SSSP, DFS | 50
Connected Comp. | Reachability, BiCC, Weakly CC | 10
Community Detection | Clustering, Nearest Neighbor, Community Detection w Label Propagation | 5
Other | Sampling, Partitioning | <15

+ weighted graphs: Single-Source Shortest Paths (~35%)

Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS’14.
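Graph traversal is the most common class in the survey, with BFS as its first example. The following is a minimal single-machine sketch of BFS that returns the depth of every vertex reachable from a source. It only illustrates the kind of kernel being benchmarked; it is not the Graphalytics reference implementation, and the adjacency-dictionary representation and the name `bfs_depths` are assumptions made for this example.

```python
from collections import deque


def bfs_depths(adjacency: dict, source) -> dict:
    """Return {vertex: hop count from source} for all reachable vertices.

    `adjacency` maps each vertex to an iterable of its out-neighbours
    (a hypothetical in-memory format chosen for this sketch).
    """
    depths = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in depths:  # first visit gives the shortest hop count
                depths[neighbour] = depths[vertex] + 1
                queue.append(neighbour)
    return depths


# Small directed example graph; vertex 5 cannot be reached from vertex 1.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
print(bfs_depths(graph, 1))
```

Unreachable vertices are simply absent from the result; a benchmark specification may instead require a sentinel value for them.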
11
Graphalytics = Distributed Graph Generation w DATAGEN
Rich set of configurations
More diverse degree distribution than Graph500
Realistic clustering coefficient and assortativity
[Figure: DATAGEN stages by level of detail: Person Generation, Edge Generation, Activity Generation, “Knows” graph serialization, Activity serialization; the stages used by Graphalytics are marked.]
12
Graphalytics = Portable Perf. Analysis w Granula
[Figure: Granula workflow. Components: Graph Processing System, Logging Patch, Performance Analyzer, Archiver, Granula Performance Model, Granula Archive. Activities: Modeling, Monitoring, Archiving, Sharing. Data flows: logs, rules.]
Minimal code invasion + automated data collection at runtime + portable archive (+ web UI): portable bottleneck analysis
13
Graphalytics = Diverse Set of Automated Experiments
Category | Experiment | Algo. | Data | Nodes/Threads | Metrics
Baseline | Dataset variety | BFS, PR | All | 1 | Run, norm.
Baseline | Algorithm variety | | R4(S), D300(L) | | Runtime
Scalability | Vertical vs. horiz. | BFS, PR | D300(L), D1000(XL) | 1–16/1–32 | Runtime, S
Scalability | Weak vs. strong | | G22(S)–G26(XL) | |
Robustness | Stress test | BFS | | | SLA met
Robustness | Variability | | D300(L), D1000(L) | 1/16 | CV
Self-Test | Time to run/part | -- | Datagen | 1–16 |
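The variability experiment reports CV, the coefficient of variation of repeated runs. A minimal sketch, assuming CV is the sample standard deviation of the runtimes divided by their mean (the slide does not say whether the sample or population form is used), could look like this. The function name and the runtimes are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(runtimes_s: list) -> float:
    """CV = sample standard deviation / mean of repeated runtimes.

    Assumption: the sample (n - 1) form is used; needs at least two runs.
    """
    return stdev(runtimes_s) / mean(runtimes_s)


# Illustrative runtimes (seconds) for three repetitions of the same job.
runs = [9.0, 10.0, 11.0]
print(f"CV = {coefficient_of_variation(runs):.2f}")
```

A lower CV means the platform's runtime is more repeatable on that algorithm and dataset.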
14
Implementation status
Legend: G = validated, on GitHub; V = validation stage; -- = no entry
Platforms: MapReduce2, Giraph, GraphX, PowerGraph, GraphLab, Neo4j, PGX.D, GraphMat, OpenG, TOTEM, MapGraph, Medusa
Algorithms: LCC, BFS, WCC, CDLP, P’Rank, SSSP
[Table: implementation status per platform and algorithm.]
Let me show you a demo run of our system, and maybe even some actual results.
15
Implementation status
Legend: G = validated, on GitHub; V = validation stage; -- = no entry
Platforms: MapReduce2, Giraph, GraphX, PowerGraph, GraphLab, Neo4j, PGX.D, GraphMat, OpenG, TOTEM, MapGraph, Medusa
Algorithms: LCC, BFS, WCC, CDLP, P’Rank, SSSP
[Table: implementation status per platform and algorithm.]
Benchmarking and tuning performed by vendors.
Let me show you a demo run of our system, and maybe even some actual results.
16
Graphalytics Capabilities: An Example
Graphalytics enables deep comparison of many systems at once, through diverse experiments and metrics.
Diverse datasets. Diverse algorithms. Diverse metrics. Your system here!
17
Processing time (s) + Edges[+Vertices]/s
Which system is the best? It depends on Algorithm + Dataset + Metric.
OK, but why is this system better for this workload for this metric?
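The slide title names two throughput-style metrics next to processing time: edges per second and edges plus vertices per second. A minimal sketch of both, assuming each is simply the graph size divided by the processing time, is shown below. The function names and the example figures are hypothetical.

```python
def edges_per_second(num_edges: int, processing_time_s: float) -> float:
    """Edges processed per second of processing time."""
    return num_edges / processing_time_s


def edges_and_vertices_per_second(
    num_edges: int, num_vertices: int, processing_time_s: float
) -> float:
    """(Edges + vertices) processed per second of processing time."""
    return (num_edges + num_vertices) / processing_time_s


# Illustrative graph: 1,000,000 edges and 100,000 vertices processed in 2.0 s.
eps = edges_per_second(1_000_000, 2.0)
evps = edges_and_vertices_per_second(1_000_000, 100_000, 2.0)
print(f"{eps:,.0f} edges/s, {evps:,.0f} (edges+vertices)/s")
```

Normalizing by graph size is what lets runs on datasets of different scale be compared, which is why the ranking of systems can change with the metric chosen.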
18
Granula Visualizer
Portable choke-point analysis for everyone!
19
Graphalytics = Modern Software Engineering Process
Graphalytics code reviews
Internal release to LDBC partners (first, Feb 2015; last, Feb 2016)
Public release, announced first through LDBC (Apr 2015)
First full benchmark specification, LDBC criteria (Q1 2016)
Jenkins continuous integration server
SonarQube software quality analyzer
20
Graphalytics, in the future
github.com/tudelft-atlarge/graphalytics/
An LDBC benchmark*
Advanced benchmarking harness
Diverse real and synthetic datasets
Many classes of algorithms
Granula for manual choke-point analysis
Modern software engineering practices
Supports many platforms
Enables comparison of community-driven and industrial systems
+ more data generation
+ deeper performance metrics
+ choke-point analysis
21
PELGA: Performance Engineering for Large-scale Graph Analytics, a workshop held with Euro-Par 2016