Published by Paulina Avice Tyler. Modified over 9 years ago.
1
Survey of Graph Database Models
Byoung Ju Yang
2011. 04. 01.
IDS Lab., Seoul National University
2
Copyright 2008 by CEBT
Table of contents
- Survey of Graph Database Models
  - Renzo Angles, Claudio Gutierrez. ACM Computing Surveys, Vol. 40, No. 1, Article 1 (2008)
  - Data structures, query languages, and integrity constraints
  - 1. Introduction
  - 2. Graph Data Modeling
  - 3. Graph Database Models (~2002)
- The latest Graph Database Models
  - Neo4j, FlockDB
  - Blueprints
  - Sharding
3
1. Introduction
4
2-1. What is a Graph Data Model?
- Data structure (schema)
  - Represented by a graph, or by a data structure that generalizes the notion of a graph (hypergraph)
  - (Un)labeled, (un)directed
  - Schema and data are separated in most cases
- Data manipulation (query languages)
  - Expressed by graph transformations, or by operations whose main primitives work on graph features such as paths, neighborhoods, subgraphs, graph patterns, connectivity, and graph statistics
- Integrity constraints
  - Enforce graph data consistency
5
2-2. Why a Graph Data Model?
- It allows a more natural modeling of data
  - All the information about an entity can be kept in a single node, with related information shown by the arcs connected to it
- Queries can refer directly to the graph structure
  - For example, finding shortest paths or determining certain subgraphs
- For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations
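To make "queries can refer directly to the graph structure" concrete, here is a minimal sketch of a shortest-path query over an adjacency-list graph. The node names and the `shortest_path` helper are illustrative assumptions, not taken from the survey or from any particular graph database.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return one path with the fewest arcs from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start node.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical directed graph: each key lists the nodes its arcs point to.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
path = shortest_path(graph, "alice", "erin")
```

The breadth-first search only follows arcs, so the query is phrased in terms of the graph itself rather than in terms of joins over tables.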
6
2-3. Comparison with other DB Models
- Physical DB models
  - Hierarchical (1976) and network (1976) models
  - Lack a good abstraction level
- Relational DB models
  - Introduced a separation between the physical and logical levels
  - Landmark development (mathematical foundation)
  - Geared toward simple record-type data (schema is known)
  - Not easy to integrate different schemas
  - The query language cannot explore the underlying graph of relationships among the data (paths, neighborhoods, patterns)
7
2-3. Comparison with other DB Models (continued)
- Semantic DB models
  - The DB designer can represent objects and their relations in a natural and clear manner by using high-level abstraction concepts (E-R)
  - Relevant to graph DBs (graph-like structures)
- Object-oriented DB models
  - For data-intensive domains (knowledge bases, engineering applications)
  - Permit much richer structures but still require a predefined schema
  - Related to graph DBs (use graph structures in their definitions)
- Semi-structured DB models
  - Irregular, implicit, and partial structures
8
2-4. Motivations and Applications
- Motivations
  - Real-life applications where component interconnectivity is a key feature
- Applications
  - Classical applications
  - Complex networks
    - Social networks (people, groups)
    - Information networks (citation, word thesaurus)
    - Technological networks (spatial and geographical)
    - Biological networks (genomics)
9
3-1. Brief historical overview
10
3-2. Data Structures
- Hypernode
  - A simple flat graph is not good at presenting information to the user
  - The hypernode provides inherent support (nested graphs)
- Hypergraph
  - Generalization of a graph
  - A 2-uniform hypergraph is a graph
[Figure: example with Person1, Person2, and Person3 nodes, each with a name attribute]
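The statement that a 2-uniform hypergraph is a graph can be illustrated with a small sketch. The `Hypergraph` class and its method names are hypothetical; the only assumption is the usual definition that a hyperedge is a set of nodes and that a k-uniform hypergraph has only hyperedges of size k.

```python
class Hypergraph:
    """A hypergraph whose named hyperedges are arbitrary sets of nodes."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}  # edge name -> frozenset of member nodes

    def add_edge(self, name, members):
        members = frozenset(members)
        self.nodes |= members
        self.edges[name] = members

    def is_uniform(self, k):
        """True when every hyperedge connects exactly k nodes."""
        return all(len(members) == k for members in self.edges.values())

    def as_graph_edges(self):
        """View a 2-uniform hypergraph as a set of undirected node pairs."""
        if not self.is_uniform(2):
            raise ValueError("only a 2-uniform hypergraph is an ordinary graph")
        return {tuple(sorted(members)) for members in self.edges.values()}

# A hyperedge may connect three nodes at once, so this is not a plain graph.
general = Hypergraph()
general.add_edge("team", {"Person1", "Person2", "Person3"})
general.add_edge("friends", {"Person1", "Person2"})

# Every hyperedge here has exactly two members, so it reduces to a graph.
pairs_only = Hypergraph()
pairs_only.add_edge("e1", {"Person1", "Person2"})
pairs_only.add_edge("e2", {"Person2", "Person3"})
graph_edges = pairs_only.as_graph_edges()
```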
11
3-3. Integrity Constraints
- Schema-instance consistency
  - The instance should contain only concrete entities and relations from the entity types and relations defined in the schema
- Schema-instance separation
  - In most models there is a separation
  - An exception is the hypernode model (dynamic DB)
- Constraints concentrate on creating consistent instances and on correctly identifying and referencing entities
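A schema-instance consistency check can be sketched as follows. The schema layout, the type names, and the `check_instance` function are assumptions made for illustration; real graph database models define their schemas in their own ways.

```python
def check_instance(schema, nodes, edges):
    """Return a list of violations; an empty list means the instance is consistent.

    schema: {"node_types": set of type names,
             "edge_types": {label: (source type, target type)}}
    nodes:  {node id: type name}
    edges:  iterable of (source id, label, target id)
    """
    violations = []
    for node_id, node_type in nodes.items():
        if node_type not in schema["node_types"]:
            violations.append(f"node {node_id}: unknown type {node_type}")
    for source, label, target in edges:
        where = f"edge {source}-{label}->{target}"
        if label not in schema["edge_types"]:
            violations.append(f"{where}: unknown relation")
            continue
        if source not in nodes or target not in nodes:
            violations.append(f"{where}: dangling reference")
            continue
        expected = schema["edge_types"][label]
        actual = (nodes[source], nodes[target])
        if actual != expected:
            violations.append(f"{where}: expected {expected}, got {actual}")
    return violations

# Hypothetical schema and instance.
schema = {
    "node_types": {"Person", "Company"},
    "edge_types": {
        "KNOWS": ("Person", "Person"),
        "WORKS_AT": ("Person", "Company"),
    },
}
nodes = {"p1": "Person", "p2": "Person", "c1": "Company"}
good_edges = [("p1", "KNOWS", "p2"), ("p1", "WORKS_AT", "c1")]
bad_edges = [("c1", "KNOWS", "p1"), ("p1", "OWNS", "c1"), ("p2", "KNOWS", "p9")]
```

The three bad edges cover a wrong endpoint type, a relation missing from the schema, and a reference to an entity that does not exist.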
12
3-4. Query and Manipulation Languages
- There is substantial work on query languages, the problem of querying graphs, the visual presentation of results, and graphical query languages
- Some graph-oriented object models regard database transformations as graph transformations based on graph-pattern matching
  - GOOD, GOAL, etc.
13
3. Summary
14
NoSQL Databases
- Schema-less
- Shared-nothing architecture
  - Each server uses only its own local storage (faster)
- Elasticity
  - Able to add servers without downtime
- Sharding
- Asynchronous replication
- BASE instead of ACID
15
NoSQL Database Models
16
Graph Database Models
- Scalability
- ACID vs. BASE
- Complexity
- Relational
  - No redundancy or information loss (normalization), powerful SQL, optimization by the RDBMS
  - Performance problems in deep queries (many joins), no schema evolution, etc.
- Graph
  - Property graph model
17
The latest Graph Database Models
- AllegroGraph RDFStore
- HyperGraphDB
- InfoGrid
- Neo4j
- FlockDB
- Sones
- Virtuoso
18
The latest Graph Database Models (continued)
- License
- Distribution
  - The only truly distributed solution is HyperGraphDB
- Indexing
  - In Neo4j, indexing is not the default behavior (indexing via Lucene, Solr)
- Storage system
  - General vs. special
  - HyperGraphDB uses Berkeley DB
- APIs
  - Most of them provide Java and web APIs
19
Neo4j
- Fully ACID-compliant transactional graph DB written in Java
- High performance
  - Handles several billion nodes, relationships, and properties
  - 1~2 million traversals per second
  - Constant time (independent of total size)
- Example code
  - Node creation
  - Find friend
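The example code from this slide did not survive extraction. As a stand-in, the sketch below shows node creation and a "find friend" lookup on a tiny in-memory property graph. It is not the Neo4j API: the `PropertyGraph` class, its methods, and the `KNOWS` relationship type are all hypothetical names chosen for illustration.

```python
import itertools

class PropertyGraph:
    """Minimal in-memory property graph (illustrative only, not Neo4j)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.nodes = {}          # node id -> property dict
        self.relationships = []  # (source id, type, target id, property dict)

    def create_node(self, **properties):
        node_id = next(self._ids)
        self.nodes[node_id] = dict(properties)
        return node_id

    def create_relationship(self, source, rel_type, target, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.relationships.append((source, rel_type, target, dict(properties)))

    def neighbors(self, node_id, rel_type):
        """Nodes connected to node_id by rel_type, in either direction."""
        found = []
        for source, kind, target, _ in self.relationships:
            if kind != rel_type:
                continue
            if source == node_id:
                found.append(target)
            elif target == node_id:
                found.append(source)
        return found

# Node creation
g = PropertyGraph()
young = g.create_node(name="Young")
sang = g.create_node(name="Sang")
yong = g.create_node(name="Yong")
g.create_relationship(young, "KNOWS", sang, since=2009)
g.create_relationship(yong, "KNOWS", young)

# Find friend: everyone connected to Young by a KNOWS relationship
friend_names = sorted(g.nodes[n]["name"] for n in g.neighbors(young, "KNOWS"))
```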
20
Neo4j
- Example code
  - Traversal
  - Indexing
21
Neo4j
22
FlockDB
- Goals
  - High rate of add/update/remove operations
  - Complex set arithmetic queries
  - Paging through query result sets containing millions of entries
  - Ability to ‘archive’ and later restore archived edges
  - Horizontal scaling, including replication
- Non-goals
  - Multi-hop queries (or graph-walking queries)
  - Automatic shard migrations
- Characteristics
  - Optimized for very large adjacency lists (no traversal)
23
FlockDB - Twitter
- Previous models (could not do both)
  - Relational tables: handling write operations
  - Key-value storage: paging through giant result sets
- Implementation goals
  - Write the simplest possible thing that could work
  - Use off-the-shelf MySQL as the storage engine
  - Allow horizontal partitioning
  - Allow write operations to arrive out of order or be processed more than once (allow redundant work rather than lost work)
- Twitter (April 2010)
  - More than 13 billion edges, 20k writes/second, 100k reads/second
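One way to tolerate writes that arrive out of order or more than once is to make each write idempotent and commutative, for example by letting the write with the latest timestamp win. The sketch below shows that idea under stated assumptions: the `apply_write` function and the tuple layout are hypothetical, timestamps are assumed to be unique per edge, and this is not FlockDB's actual code.

```python
import itertools

def apply_write(store, edge, state, timestamp):
    """Apply one write; the write with the latest timestamp wins.

    Replaying a write, or applying writes in a different order, leaves the
    store in the same final state (assuming unique timestamps per edge).
    """
    current = store.get(edge)
    if current is None or timestamp > current[1]:
        store[edge] = (state, timestamp)

def replay(writes):
    store = {}
    for edge, state, timestamp in writes:
        apply_write(store, edge, state, timestamp)
    return store

# Hypothetical write log: (edge, new state, timestamp)
writes = [
    ((1, 2), "normal", 10),
    ((1, 2), "removed", 20),
    ((1, 2), "normal", 30),
    ((3, 4), "archived", 5),
]

in_order = replay(writes)
every_order = [replay(order) for order in itertools.permutations(writes)]
with_duplicates = replay(writes + writes[:2])
```

Because redundant work is harmless here, a failed or delayed job can simply be retried, which matches the slide's preference for redundant work over lost work.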
24
FlockDB - Twitter
- Stores graphs as sets of edges
- Primary key: a compound key of the source ID, state, and position
- When an edge is deleted, the row is just marked ‘removed’ without being deleted from MySQL
- Keeps only a compound primary key and a secondary index for each row, and answers all queries from a single index
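The row layout described above can be sketched with an in-memory table. This is an illustration under assumptions, not FlockDB's schema or API: the `EdgeTable` class, its method names, and the state strings are hypothetical, and the index is rebuilt by sorting on every call only to keep the sketch short.

```python
import bisect

NORMAL, REMOVED = "normal", "removed"

class EdgeTable:
    """Edges as rows ordered by (source id, state, position)."""

    def __init__(self):
        self.rows = {}  # (source id, destination id) -> [state, position]

    def add(self, source_id, destination_id, position):
        self.rows[(source_id, destination_id)] = [NORMAL, position]

    def remove(self, source_id, destination_id):
        # Soft delete: the row stays, only its state changes.
        row = self.rows.get((source_id, destination_id))
        if row is not None:
            row[0] = REMOVED

    def _index(self):
        return sorted((s, state, pos, d) for (s, d), (state, pos) in self.rows.items())

    def page(self, source_id, after_position=None, limit=2, state=NORMAL):
        """Return up to `limit` (position, destination) pairs after a cursor."""
        index = self._index()
        if after_position is None:
            start = bisect.bisect_left(index, (source_id, state))
        else:
            start = bisect.bisect_right(
                index, (source_id, state, after_position, float("inf")))
        result = []
        for s, st, pos, dest in index[start:]:
            if (s, st) != (source_id, state) or len(result) == limit:
                break
            result.append((pos, dest))
        return result

    def destinations(self, source_id, state=NORMAL):
        return {d for (s, d), (st, _) in self.rows.items()
                if s == source_id and st == state}

t = EdgeTable()
t.add(1, 10, 100)
t.add(1, 11, 101)
t.add(1, 12, 102)
t.add(2, 11, 200)
t.add(2, 12, 201)
t.remove(1, 12)

first_page = t.page(1, limit=1)
second_page = t.page(1, after_position=100, limit=5)
common = t.destinations(1) & t.destinations(2)  # set arithmetic query
```

Paging uses the position as a cursor, so each page starts at a known point in the ordered index rather than skipping over earlier rows.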
25
Sharding in Graph DB
- Especially hard in graph DBs because of traversal
  - Unless we store the entire graph on a single machine, we are forced to query across machine boundaries (expensive)
- Neo4j provides a master/slave structure (still has limits)
- FlockDB (Twitter) does not address this (it is only interested in one-level relations)
26
How to shard?
- A proposal: gravity
  - Localizing data leads to greater performance (like a cache)
  - Shard graph data based on gravity
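The slide does not define how gravity is computed, so the sketch below does not implement that proposal. It only illustrates why locality matters: it counts how many edges cross shard boundaries under a hash-style placement versus a placement that keeps connected nodes together. The graph, the two placements, and the `cross_shard_edges` helper are all hypothetical.

```python
def cross_shard_edges(edges, shard_of):
    """Count edges whose endpoints live on different shards."""
    return sum(1 for a, b in edges if shard_of[a] != shard_of[b])

# Two tightly connected triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

# Placement by node id modulo the shard count ignores graph structure.
hashed = {node: node % 2 for node in range(1, 7)}

# Placement that keeps each triangle on one shard.
localized = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}

hashed_cut = cross_shard_edges(edges, hashed)
localized_cut = cross_shard_edges(edges, localized)
```

Every crossing edge is a potential network hop during a traversal, so a placement with fewer crossing edges keeps more of each traversal on one machine.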
27
Blueprints
- A collection of interfaces, etc. for the property graph DB model
- Analogous to JDBC, but for graph DBs
- Provides a common set of interfaces that lets developers plug and play their graph DB backend (Pipes, Gremlin, Rexster)
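The plug-and-play idea can be sketched as application code written against an abstract interface, with the backend swapped in behind it. Blueprints itself is a Java library; the Python names below (`GraphBackend`, `InMemoryBackend`, `friends_of_friends`) are hypothetical and do not mirror its actual interfaces.

```python
from abc import ABC, abstractmethod

class GraphBackend(ABC):
    """Common interface that any graph store could implement."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties): ...

    @abstractmethod
    def add_edge(self, source, label, target): ...

    @abstractmethod
    def out_neighbors(self, vertex_id, label): ...

class InMemoryBackend(GraphBackend):
    """One possible backend; a real one would wrap a graph database."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = dict(properties)

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def out_neighbors(self, vertex_id, label):
        return [t for s, l, t in self.edges if s == vertex_id and l == label]

def friends_of_friends(backend, person):
    """Application code that depends only on the interface."""
    direct = set(backend.out_neighbors(person, "knows"))
    indirect = set()
    for friend in direct:
        indirect.update(backend.out_neighbors(friend, "knows"))
    return indirect - direct - {person}

backend = InMemoryBackend()
for name in ("a", "b", "c"):
    backend.add_vertex(name)
backend.add_edge("a", "knows", "b")
backend.add_edge("b", "knows", "c")
backend.add_edge("b", "knows", "a")
result = friends_of_friends(backend, "a")
```

Since `friends_of_friends` never touches `InMemoryBackend` directly, another class implementing the same three methods could replace it without changing the query code.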