LDBC & The Social Network Benchmark Peter Boncz Database Architectures CWI Special chair “Large-Scale Data VU event.cwi.nl/lsde2015.

Presentation transcript:

LDBC & The Social Network Benchmark Peter Boncz Database Architectures CWI Special chair “Large-Scale Data VU event.cwi.nl/lsde2015 LDBC & The Social Network Benchmark - Scientific Meeting

Engines for Data Analysis
Inaugural Lecture, October 2014

The Start-Up Company Experience

the relational industry has been reshaped...


Benchmarking?
- a benchmark is a standard test that measures efficiency
Goal: quantification
- make competing systems comparable
- important tool in experimental science
- accelerate progress, make technology viable
- social goal, influence a research field

Graph data management
Many Big Data problems revolve around graphs
- Social network data
- AI methods that build/discover relationships
Wave of new systems (/research):
- Graph database systems
  - e.g. Neo4j: graph & paths “first class citizens”
- RDF / SPARQL systems
- Graph extensions to relational systems
  - Extensions: e.g. recursive queries, traversals
- Graph Programming Frameworks
  - leveraging cluster computing for graph algorithms
  - e.g. GraphLab: distributed AI algorithms
  - Giraph: “think like a vertex”
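
The “think like a vertex” idea can be illustrated with a minimal single-machine sketch. This is plain Python and not the Giraph API: the function name `vertex_centric_components` and the toy graph are assumptions made for illustration. Each vertex keeps only its own state (a label), sends it to its neighbors, and updates itself from the messages it receives, repeating in supersteps until no vertex changes.

```python
def vertex_centric_components(vertices, edges):
    """Label every vertex with the smallest vertex id in its connected component."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    label = {v: v for v in vertices}   # per-vertex state
    active = set(vertices)             # vertices whose label changed last superstep
    while active:
        # Message phase: every active vertex sends its label to its neighbors.
        inbox = {}
        for v in active:
            for n in neighbors[v]:
                inbox.setdefault(n, []).append(label[v])
        # Compute phase: a vertex sees only its own label and its inbox.
        active = set()
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)
    return label


# Hypothetical toy graph: two components, {1, 2, 3} and {4, 5}.
print(vertex_centric_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

A distributed framework would partition the vertices over a cluster and exchange the messages over the network; the per-vertex logic stays the same.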

SNB (Social Network Benchmark) schema

SNB Workloads
- Interactive: tests a system's throughput with relatively simple queries with concurrent updates
  - For one person, recommend a friend based on shared friends and interests
- Business Intelligence: consists of complex structured queries for analyzing online behavior
  - Who are the influential people on the topic of open source development?
- Graph Analytics: tests the functionality and scalability on most of the data as a single operation
  - PageRank, Shortest Paths, Community Detection
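
To give a feel for the Interactive example query, here is a minimal sketch that ranks friend candidates by the number of shared friends. It is not the official SNB query definition: the function `recommend_friends`, the adjacency-dict representation and the toy data are assumptions, and shared interests are left out for brevity.

```python
from collections import Counter


def recommend_friends(friends, person, k=3):
    """Return up to k (candidate, shared_friend_count) pairs for `person`."""
    direct = friends.get(person, set())
    shared = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the person themselves and people who are already friends.
            if candidate != person and candidate not in direct:
                shared[candidate] += 1
    # Most shared friends first; ties broken by name for a stable result.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy friendship graph (symmetric adjacency sets).
friends = {
    "Anna": {"Laura", "Bob"},
    "Laura": {"Anna", "Carl", "Dana"},
    "Bob": {"Anna", "Carl"},
    "Carl": {"Laura", "Bob"},
    "Dana": {"Laura"},
}
print(recommend_friends(friends, "Anna"))
# [('Carl', 2), ('Dana', 1)]
```

The query touches only the two-hop neighborhood of one person, which is why this workload stresses throughput on many small traversals rather than whole-graph computation.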

Social Networks
- correlation between property values and network structure

SNB datagen: correlated graph structure
[Figure: person nodes P1 to P5 annotated with property values, including Student, “Anna”, “Laura”, “University of Leipzig”, “University of Amsterdam”, “Germany”, “Netherlands”, “1990”]

SNB datagen: correlated graph structure
[Figure: person nodes P1 to P5 with their property values and candidate edges marked “?”; plot of connection probability, falling from highly similar to less similar]
- Compute similarity of two nodes based on their (correlated) properties.
- Use a probability density function w.r.t. this similarity for connecting nodes.
Danger: this is very expensive to compute on a large graph! (quadratic, random access)
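
The cost warning on this slide is easiest to see in a naive sketch that scores every pair of nodes. This is not the SNB datagen code: the similarity measure (fraction of equal property values), the parameter `base_prob` and the toy persons are assumptions chosen for illustration.

```python
import itertools
import random


def similarity(a, b):
    """Fraction of shared property keys on which two persons agree (0.0 to 1.0)."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def connect_all_pairs(persons, base_prob=0.9, rng=None):
    """Examine all n*(n-1)/2 pairs; connect each with probability base_prob * similarity."""
    rng = rng or random.Random(0)
    edges = []
    for (id_a, a), (id_b, b) in itertools.combinations(persons.items(), 2):
        if rng.random() < base_prob * similarity(a, b):
            edges.append((id_a, id_b))
    return edges


# Hypothetical toy persons; the property values are illustrative only.
persons = {
    "P1": {"university": "University of Leipzig", "birth_year": 1990},
    "P2": {"university": "University of Amsterdam", "birth_year": 1985},
    "P3": {"university": "University of Leipzig", "birth_year": 1990},
}
# With base_prob=1.0, identical persons always connect and fully different ones never do.
print(connect_all_pairs(persons, base_prob=1.0))
# [('P1', 'P3')]
```

The double loop over all pairs is the quadratic part: for n persons it performs n*(n-1)/2 similarity computations, each touching two arbitrary records.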

SNB datagen: correlated graph structure
[Figure: person nodes P1 to P5 with their property values; plot of connection probability, falling from highly similar to less similar]
- Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to probability distr.)
- Window Trick: disregard nodes with too large similarity distance (only connect nodes in a similarity window)
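
One way to read the Window Trick is sketched below, under stated assumptions: persons are sorted on a similarity key so that similar persons end up adjacent, and each person is only compared with the next `window` persons in that order, with a connection probability that decays with distance. The function `connect_in_window`, its parameters (`window`, `base_prob`, `decay`) and the toy data are hypothetical and not taken from the SNB datagen source.

```python
import random


def connect_in_window(persons, sort_key, window=2, base_prob=0.8, decay=0.5, rng=None):
    """Connect persons only to neighbors within `window` positions in similarity order."""
    rng = rng or random.Random(0)
    # Sorting places similar persons next to each other (sequential access).
    ordered = sorted(persons, key=lambda pid: sort_key(persons[pid]))
    edges = []
    for pos, a in enumerate(ordered):
        for dist in range(1, window + 1):
            if pos + dist >= len(ordered):
                break
            b = ordered[pos + dist]
            # Connection probability shrinks as the distance in the ordering grows.
            if rng.random() < base_prob * decay ** (dist - 1):
                edges.append((a, b))
    return edges


# Hypothetical toy persons; the property values are illustrative only.
persons = {
    "P1": {"university": "University of Leipzig", "birth_year": 1990},
    "P2": {"university": "University of Amsterdam", "birth_year": 1985},
    "P3": {"university": "University of Leipzig", "birth_year": 1990},
    "P4": {"university": "University of Leipzig", "birth_year": 1991},
}
by_university_and_year = lambda p: (p["university"], p["birth_year"])
# With probability 1.0 and window=1, every person connects to its successor only.
print(connect_in_window(persons, by_university_and_year,
                        window=1, base_prob=1.0, decay=1.0))
# [('P2', 'P1'), ('P1', 'P3'), ('P3', 'P4')]
```

The number of candidate pairs drops from n*(n-1)/2 to at most n*window, and after the sort the data is scanned sequentially, which addresses both the quadratic cost and the random access mentioned on the previous slide.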

SNB datagen: MapReduce approach

SNB datagen: temporal effects

SNB datagen: friend degree distribution
- Based on “Anatomy of Facebook” blogpost (2013)
- Diameter increases logarithmically with dataset scale factor

SNB datagen: how realistic is it?
GRADES 2014: “How community-like is the structure of synthetically generated graphs”, Arnau Prat (UPC); David Domínguez-Sal (Sparsity Technologies)
[Figure panels: Livejournal; LFR3 (synthetic); SNB datagen]

ldbcouncil.org
github/ldbc

Industry Membership

Summary
- LDBC
  - Graph and RDF benchmark council
  - Choke-point driven benchmark design (user + system expert involvement)
- Social Network Benchmark (SNB)
  - Advanced social network generator (scale-free, power laws, clustering, correlations)
  - Real data distributions from DBpedia
SIGMOD 2015 publication (to appear)

Working with Industry increases impact
- Jim Gray: ACM Turing Award 1998
- Michael Stonebraker: IEEE Von Neumann Medal 2004, ACM Turing Award 2015
Designing Engines for Data Analysis - Inaugural Lecture - 14/10/2014