David Ostrovsky | Couchbase

Presentation transcript:

Who’s afraid of graphs?
David Ostrovsky | Couchbase

The Seven Bridges of Königsberg Problem
Leonhard Euler
Devise a route through the city that crosses each bridge exactly once. Euler's paper, published in 1736, is regarded as the first paper on graph theory. Königsberg, Prussia is Kaliningrad, Russia today.
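Euler's argument can be checked in a few lines. The sketch below assumes the usual labelling of the four land masses as A, B, C and D, with the seven bridges as edges. The labels and function names are illustrative and not taken from the slides. A connected graph has a route that uses every edge exactly once only if zero or two nodes have an odd number of edges.

```python
from collections import Counter

# The seven bridges as edges between four land masses (labels are illustrative).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges, sorted by name."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def has_euler_path(edges):
    """Euler's condition for a connected graph: zero or two odd-degree nodes."""
    return len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(BRIDGES))
print(has_euler_path(BRIDGES))
```

All four land masses have odd degree, so no such route exists.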

Graph Databases
Use nodes, edges and properties to store data. Important to note that a graph database has:
Native graph storage: the engine is built to handle graph data.
Native graph processing capability, including index-free adjacency to facilitate traversals.
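Index-free adjacency can be pictured with a small in-memory model. This is a minimal sketch, not how any particular engine stores data. The `Node`, `Edge`, `connect` and `neighbors` names are hypothetical. Each node holds direct references to its edges, so a traversal follows pointers instead of looking neighbours up in a global index.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    properties: dict
    out_edges: list = field(default_factory=list)  # direct references, no index lookup

@dataclass
class Edge:
    label: str
    target: Node
    properties: dict = field(default_factory=dict)

def connect(source, label, target, **properties):
    """Attach an outgoing edge directly to the source node."""
    edge = Edge(label, target, properties)
    source.out_edges.append(edge)
    return edge

def neighbors(node, label):
    """Follow outgoing edges with the given label to their target nodes."""
    return [edge.target for edge in node.out_edges if edge.label == label]

alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})
connect(alice, "FRIEND", bob, since=2014)
connect(bob, "FRIEND", carol)

# Two hops from Alice: each hop only touches the edges stored on the current node.
friends_of_friends = [
    n.properties["name"]
    for friend in neighbors(alice, "FRIEND")
    for n in neighbors(friend, "FRIEND")
]
print(friends_of_friends)
```

The cost of each hop depends on the number of edges on the current node rather than on the total size of the graph, which is the point of index-free adjacency.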

Use Cases for Graph Databases
Social networks, of course.
Recommendation systems, either as a logical extension of the social graph or stand-alone: find all customers who bought a book that X customers liked, then find all books similar to that one, and so on.
Managing interconnected datasets: networks, organization hierarchies, ACLs, in-game economies, etc.
Geo-location and routing (think Waze or network routing).
Use cases for migrating from an RDBMS:
Problems with JOIN performance.
A continually evolving dataset or open-ended business requirements.
A domain that is naturally suited to graph representation.
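One possible reading of the recommendation use case is a two-hop traversal: from a customer to the books they bought, across to other customers who bought the same books, and on to those customers' other books. The sketch below assumes that reading. The purchase data and the `recommend` function are made up for illustration.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of books bought.
PURCHASES = {
    "ann": {"Dune", "Neuromancer"},
    "ben": {"Dune", "Hyperion"},
    "cat": {"Dune", "Hyperion", "Foundation"},
    "dan": {"Emma"},
}

def recommend(purchases, customer):
    """Rank books bought by customers who share at least one purchase."""
    owned = purchases[customer]
    scores = Counter()
    for other, books in purchases.items():
        if other == customer or not (owned & books):
            continue  # skip the customer and anyone with no book in common
        for book in books - owned:
            scores[book] += 1
    # Highest score first, ties broken alphabetically for a stable order.
    return [book for book, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

print(recommend(PURCHASES, "ann"))
```

In a relational schema the same question typically needs several self-joins on a purchases table, which is where the JOIN performance problems mentioned above tend to appear.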

Meet the Players
For comparison: MongoDB has a score of 330.47, Cassandra 124.21.

Databases vs Frameworks
Databases: real-time queries; smaller datasets; standard NoSQL features (scaling, HA, etc.).
Frameworks: offline/batch processing; larger datasets; rely on a big data platform (usually Hadoop).
Frameworks:
Giraph: Apache project, used by Facebook to power its graph search and process trillions of connections.
GraphX: integrated with Apache Spark, has a library of built-in algorithms and ETL functionality. Doesn’t perform as well as Giraph.
Faunus (from the same team as Titan).
GraphLab: open source graph toolkit.
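The batch style of the frameworks can be illustrated with a vertex-centric computation that runs over the whole graph in rounds ("supersteps"). The sketch below is plain Python and does not use the API of Giraph or any other framework. The function name and sample graph are hypothetical. It finds connected components by letting every vertex repeatedly adopt the smallest label among itself and its neighbours.

```python
def connected_components(adjacency):
    """Label each vertex with the smallest vertex id in its component."""
    labels = {v: v for v in adjacency}
    changed = True
    while changed:  # one loop iteration corresponds to one superstep
        changed = False
        new_labels = dict(labels)
        for vertex, neighbours in adjacency.items():
            # Each vertex only looks at labels from the previous superstep.
            best = min([labels[vertex]] + [labels[n] for n in neighbours])
            if best < new_labels[vertex]:
                new_labels[vertex] = best
                changed = True
        labels = new_labels
    return labels

# Undirected graph given as an adjacency list (illustrative data).
GRAPH = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(connected_components(GRAPH))
```

Every superstep touches every vertex, which suits offline processing of large datasets but not the low-latency point queries a graph database is built for.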

Querying and Traversal

Cypher (Neo4j)
(a)-[:FRIEND]->(b)
Diagram: node a connected to node b by a FRIEND relationship.
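To make the pattern concrete, the sketch below shows in plain Python what matching `(a)-[:FRIEND]->(b)` means: find every pair of nodes joined by a directed relationship of type FRIEND. The edge list and the `match` function are illustrative and are not part of Neo4j or its drivers.

```python
# Directed relationships as (start node, relationship type, end node) triples.
EDGES = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("alice", "WORKS_WITH", "carol"),
]

def match(edges, rel_type):
    """Return every (a, b) pair connected by a relationship of the given type."""
    return [(a, b) for a, rel, b in edges if rel == rel_type]

print(match(EDGES, "FRIEND"))
```

The query describes the shape of the data to find rather than the steps to find it, which is what makes pattern-based languages such as Cypher declarative.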

SQL-Derivatives (OrientDB)

g.v(1).outE('friend').inV.name
// Starting with vertex 1,
// find outgoing edges 'friend',
// follow to the next vertex,
// and return the property 'name'.
Gremlin is the graph traversal language of Apache TinkerPop, which in turn is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
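The traversal above is a pipeline: each step takes the current set of elements and produces the next. The sketch below imitates that chaining in plain Python. It is not the TinkerPop API. The `Traversal` class, its method names and the sample graph are hypothetical.

```python
class Traversal:
    """A tiny chainable traversal over an in-memory graph (illustrative only)."""

    def __init__(self, graph, items):
        self.graph = graph
        self.items = list(items)  # current vertices or edges in the pipeline

    def out_e(self, label):
        """From the current vertices, step to their outgoing edges with this label."""
        edges = [
            e for v in self.items
            for e in self.graph["edges"]
            if e["out"] == v and e["label"] == label
        ]
        return Traversal(self.graph, edges)

    def in_v(self):
        """From the current edges, step to the vertices they point to."""
        return Traversal(self.graph, [e["in"] for e in self.items])

    def values(self, key):
        """Return one property value for each current vertex."""
        return [self.graph["vertices"][v][key] for v in self.items]

GRAPH = {
    "vertices": {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}},
    "edges": [
        {"out": 1, "label": "friend", "in": 2},
        {"out": 1, "label": "friend", "in": 3},
        {"out": 2, "label": "friend", "in": 1},
    ],
}

# Same shape as the Gremlin query: vertex 1 -> 'friend' edges -> target vertex -> name.
names = Traversal(GRAPH, [1]).out_e("friend").in_v().values("name")
print(names)
```

This sketch scans the full edge list at every step for brevity. A real engine would follow the edges stored with each vertex instead.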

Scaling Graphs is Hard
Most graph partitioning problems fall into the NP-hard category, a set of problems that are at least as hard as the hardest problems in NP. Some specialized graph partitioning problems are NP-complete. So unless P = NP, graph partitioning solutions will continue to rely on approximations and various heuristic and statistical approaches.
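As an illustration of what relying on approximations looks like in practice, the sketch below compares naive hash partitioning with a simple greedy heuristic that places each vertex on the partition already holding most of its neighbours, subject to a capacity limit. The heuristic, function names and sample graph are assumptions chosen for illustration. The result depends on the order in which vertices arrive and is not guaranteed to be optimal.

```python
def edge_cut(edges, part):
    """Count edges whose endpoints live on different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def greedy_partition(vertices, edges, k, capacity):
    """Assign vertices one by one to the partition with the most placed neighbours."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    part, sizes = {}, [0] * k
    for v in vertices:
        candidates = [p for p in range(k) if sizes[p] < capacity]

        def score(p):
            placed = sum(1 for n in neighbours[v] if part.get(n) == p)
            return (placed, -sizes[p])  # prefer more neighbours, then emptier partitions

        best = max(candidates, key=score)
        part[v] = best
        sizes[best] += 1
    return part

# Two triangles joined by a single edge (illustrative data).
VERTICES = [1, 2, 3, 4, 5, 6]
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

hashed = {v: v % 2 for v in VERTICES}
greedy = greedy_partition(VERTICES, EDGES, k=2, capacity=3)
print(edge_cut(EDGES, hashed), edge_cut(EDGES, greedy))
```

On this small graph the greedy placement cuts one edge while hash placement cuts five. Every cut edge is a traversal that has to cross machines, which is why partition quality matters for a distributed graph.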

Neo4j Clustering Architecture

Polyglot Persistence To the Rescue