Survey of Graph Database Models Byoung Ju Yang 2011. 04. 01. IDS Lab., Seoul National University.



Copyright © 2008 by CEBT

Table of contents
• Survey of Graph Database Models
  Renzo Angles, Claudio Gutierrez. ACM Computing Surveys, Vol. 40, No. 1, Article 1 (2008)
  Data structures, query languages, and integrity constraints
  1. Introduction
  2. Graph Data Modeling
  3. Graph Database Models (~2002)
• The latest Graph Database Models
  - Neo4j, FlockDB
  - Blueprints
  - Sharding

1. Introduction

2-1. What is a Graph Data Model?
• Data structure (schema)
  - Represented by a graph, or by a data structure generalizing the notion of a graph (hypergraph)
  - (Un)labeled, (un)directed
  - Separation between schema and data in most cases
• Data manipulation (query languages)
  - Expressed by graph transformations, or by operations whose main primitives are on graph features like paths, neighborhoods, subgraphs, graph patterns, connectivity, and graph statistics
• Integrity constraints
  - Enforce graph data consistency
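
As a rough illustration of the "data structure" and "data manipulation" bullets, here is a minimal Python sketch of a labeled, directed graph with one neighborhood primitive. The class and method names (`LabeledDigraph`, `add_edge`, `neighbors`) are made up for this example and do not come from the survey.

```python
from collections import defaultdict


class LabeledDigraph:
    """A tiny directed graph whose edges carry labels."""

    def __init__(self):
        # Maps a source node to its list of (label, target) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        """Targets one step away from node, optionally filtered by edge label."""
        return [
            target
            for edge_label, target in self._out.get(node, [])
            if label is None or edge_label == label
        ]


g = LabeledDigraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("alice", "works_at", "acme")
g.add_edge("bob", "knows", "carol")

print(g.neighbors("alice"))           # all out-neighbors of alice
print(g.neighbors("alice", "knows"))  # only those reached by a "knows" edge
```

Paths, subgraphs, and pattern queries can be built on top of a neighborhood primitive like this one.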

2-2. Why a Graph Data Model?
• It allows for a more natural modeling of data
  - All the information about an entity can be kept in a single node, with related information shown by arcs connected to it
• Queries can refer directly to this graph structure
  - Such as finding shortest paths, determining certain subgraphs, and so forth
• For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations
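
A shortest-path query is one example of a query that refers directly to the graph structure. The sketch below uses plain breadth-first search over an adjacency dictionary; the data and the function name `shortest_path` are illustrative assumptions, and edges are taken to be unweighted.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


friends = {
    "ann": ["bob", "cho"],
    "bob": ["dan"],
    "cho": ["dan"],
    "dan": ["eve"],
}

print(shortest_path(friends, "ann", "eve"))
```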

2-3. Comparison with other DB Models
• Physical DB models
  - Hierarchical (1976), network (1976) models
  - Lack a good abstraction level
• Relational DB models
  - Introduced a separation between physical and logical levels
  - Landmark development (mathematical foundation)
  - Geared toward simple record-type data (schema is known)
  - Not easy to integrate different schemas
  - Query language cannot explore the underlying graph of relationships among the data (paths, neighborhoods, patterns)

2-3. Comparison with other DB Models
• Semantic DB models
  - The DB designer can represent objects and their relations in a natural and clear manner by using high-level abstraction concepts (E-R)
  - Relevant to graph DBs (graph-like structures)
• Object-oriented DB models
  - For data-intensive domains (knowledge bases, engineering applications)
  - Permit much richer structures but still require a predefined schema
  - Related to graph DBs (use graph structures in definitions)
• Semi-structured DB models
  - Irregular, implicit, and partial structures

2-4. Motivations and Applications
• Motivations
  - Real-life applications where component interconnectivity is a key feature
• Applications
  - Classical applications
  - Complex networks
    - Social networks (people, groups)
    - Information networks (citation, word thesaurus)
    - Technological networks (spatial and geographical)
    - Biological networks (genomics)

3-1. Brief historical overview

3-2. Data Structures
• Hypernode
  - A simple flat graph is not good at presenting information to the user
  - The hypernode model provides inherent support for nesting (nested graphs)
• Hypergraph
  - Generalization of a graph
  - A 2-uniform hypergraph is a graph
[Figure: example nodes Person1, Person2, and Person3, each with a name attribute]
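
The remark that a 2-uniform hypergraph is a graph can be checked mechanically: a hypergraph is k-uniform when every hyperedge joins exactly k nodes. The following Python sketch represents hyperedges as frozensets; the helper name `is_k_uniform` and the sample data are assumptions for illustration.

```python
def is_k_uniform(hyperedges, k):
    """True when every hyperedge connects exactly k distinct nodes."""
    return all(len(edge) == k for edge in hyperedges)


# A hyperedge may join any number of nodes.
hyperedges = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]

# When every hyperedge has exactly two nodes, we are back to an ordinary graph.
graph_edges = [frozenset({"a", "b"}), frozenset({"b", "c"})]

print(is_k_uniform(hyperedges, 2))
print(is_k_uniform(graph_edges, 2))
```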

3-3. Integrity Constraints
• Schema-instance consistency
  - The instance should contain only concrete entities and relations from the entity types and relation types defined in the schema
• Schema-instance separation
  - Most models separate schema from instance
  - An exception is the hypernode model (dynamic DB)
• Constraints concentrate on the creation of consistent instances and the correct identification and reference of entities
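
Schema-instance consistency can be pictured as a validation pass over the instance. The sketch below is a minimal Python version under assumed representations: the schema is a set of node types plus a set of allowed (source type, label, target type) triples, and the instance is a dictionary of typed nodes plus a list of edges. None of these names come from the survey.

```python
schema = {
    "node_types": {"Person", "Company"},
    "edge_types": {
        ("Person", "works_at", "Company"),
        ("Person", "knows", "Person"),
    },
}


def violations(schema, nodes, edges):
    """nodes: {node_id: type}; edges: list of (source_id, label, target_id)."""
    problems = []
    for node_id, node_type in nodes.items():
        if node_type not in schema["node_types"]:
            problems.append(f"node {node_id}: unknown type {node_type}")
    for src, label, dst in edges:
        # Every edge must reference entities that actually exist.
        if src not in nodes or dst not in nodes:
            problems.append(f"edge {src}-{label}->{dst}: dangling endpoint")
            continue
        if (nodes[src], label, nodes[dst]) not in schema["edge_types"]:
            problems.append(f"edge {src}-{label}->{dst}: not allowed by schema")
    return problems


good_nodes = {1: "Person", 2: "Company"}
good_edges = [(1, "works_at", 2)]

bad_nodes = {1: "Person", 2: "Robot"}
bad_edges = [(2, "knows", 1), (1, "knows", 3)]

print(violations(schema, good_nodes, good_edges))
print(violations(schema, bad_nodes, bad_edges))
```

The dangling-endpoint check corresponds to the "correct identification and reference of entities" point.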

3-4. Query and Manipulation Languages
• There is substantial work focused on query languages, the problem of querying graphs, the visual presentation of results, and graphical query languages
• Some graph-oriented object models regard database transformations as graph transformations based on graph-pattern matching
  - GOOD, GOAL, etc.

3. Summary

NoSQL Databases
• Schema-less
• Shared-nothing architecture
  - Each server uses only its own local storage (faster)
• Elasticity
  - Able to add servers without downtime
• Sharding
• Asynchronous replication
• BASE instead of ACID

NoSQL Database Models

Graph Database Models
• Scalability
  - ACID vs. BASE
• Complexity
  - Relational
    - No redundancy or information loss (normalization); powerful SQL; optimization by the RDBMS
    - Performance problems in deep queries (many joins); no schema evolution, etc.
  - Graph
    - Property graph model
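
To make the property graph model concrete, here is a small in-memory Python sketch: nodes and relationships both carry key-value properties, and a multi-hop query follows adjacency lists hop by hop instead of joining tables. All class and method names are hypothetical and are not the API of any graph database product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}     # node_id -> Node
        self.outgoing = {}  # node_id -> list of Relationship

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.outgoing.setdefault(node_id, [])
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.outgoing[start].append(rel)
        return rel

    def within_hops(self, start, rel_type, hops):
        """Ids reachable from start in 1..hops steps over rel_type edges."""
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            # Each hop is an adjacency-list lookup, not a table join.
            frontier = {
                rel.end
                for node_id in frontier
                for rel in self.outgoing[node_id]
                if rel.rel_type == rel_type and rel.end not in seen
            }
            seen |= frontier
        return seen - {start}


pg = PropertyGraph()
for node_id, name in enumerate(["Ann", "Bob", "Cho", "Dan"], start=1):
    pg.add_node(node_id, name=name)
pg.relate(1, 2, "KNOWS", since=2009)
pg.relate(2, 3, "KNOWS", since=2010)
pg.relate(3, 4, "KNOWS", since=2011)

print(pg.within_hops(1, "KNOWS", 2))
```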

The latest Graph Database Models
• AllegroGraph RDFStore
• HyperGraphDB
• InfoGrid
• Neo4j
• FlockDB
• Sones
• Virtuoso

The latest Graph Database Models
• License
• Distribution
  - The only truly distributed solution is HyperGraphDB
• Indexing
  - In Neo4j, indexing is not the default behavior (indexing via Lucene, Solr)
• Storage system
  - General vs. special
  - HyperGraphDB uses Berkeley DB
• APIs
  - Most of them provide Java and web APIs

Neo4j
• Fully ACID-compliant transactional graph DB written in Java
• High performance
  - Handles several billion nodes, relationships, and properties
  - 1 to 2 million traversals per second
    - Constant time (independent of total size)
• Example code
  - Node creation
  - Find friend

Neo4j
• Example code
  - Traversal
  - Indexing

Neo4j

FlockDB
• Goals
  - High rate of add/update/remove operations
  - Complex set arithmetic queries
  - Paging through query result sets containing millions of entries
  - Ability to 'archive' and later restore archived edges
  - Horizontal scaling including replication
• Non-goals
  - Multi-hop queries (or graph-walking queries)
  - Automatic shard migrations
• Characteristics
  - Optimized for very large adjacency lists (no traversal)
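
The "set arithmetic" and "paging" goals can be illustrated with sorted adjacency lists. The Python sketch below is not FlockDB's query API; the data, the function names, and the cursor convention (the last id already returned, with positive ids assumed) are all hypothetical.

```python
from bisect import bisect_right

# Sorted lists of the user ids each user follows (sample data).
following = {
    "alice": [2, 3, 5, 8],
    "bob": [3, 4, 5, 9],
}


def intersect(a, b):
    """Ids present in both adjacency lists."""
    return sorted(set(a) & set(b))


def difference(a, b):
    """Ids present in a but not in b."""
    return sorted(set(a) - set(b))


def page(sorted_ids, cursor, count):
    """Return up to count ids greater than cursor, plus the next cursor."""
    if count <= 0:
        raise ValueError("count must be positive")
    start = bisect_right(sorted_ids, cursor)
    chunk = sorted_ids[start:start + count]
    # None signals that there is nothing left after this page.
    next_cursor = chunk[-1] if start + count < len(sorted_ids) else None
    return chunk, next_cursor


common = intersect(following["alice"], following["bob"])
first_page, cursor = page(following["alice"], 0, 2)
second_page, end = page(following["alice"], cursor, 2)
print(common, first_page, second_page)
```

Paging by cursor rather than by offset means each page starts from a known position, so a client never has to rescan entries it has already seen.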

FlockDB - Twitter
• Previous models (could not have both)
  - Relational tables – handling write operations
  - Key-value storage – paging through giant result sets
• Implementation goals
  - Write the simplest possible thing that could work
  - Use off-the-shelf MySQL as the storage engine
  - Allow horizontal partitioning
  - Allow write operations to arrive out of order or be processed more than once (allow redundant work rather than lost work)
• Twitter (April 2010)
  - More than 13 billion edges, 20k writes/second, 100k reads/second
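
One common way to tolerate writes that arrive out of order or more than once is to make each write idempotent and commutative, for example by keeping only the entry with the largest timestamp. The Python sketch below shows that last-write-wins technique; the slide does not say how FlockDB implements this goal, so the operation format, the state names, and the tie-break on state are assumptions.

```python
from itertools import permutations


def apply_write(store, op):
    """Apply one edge write; safe to call repeatedly and in any order."""
    key = (op["source"], op["dest"])
    incoming = (op["ts"], op["state"])
    current = store.get(key)
    # Keep whichever (timestamp, state) pair is larger, so the result
    # does not depend on arrival order or on duplicate deliveries.
    if current is None or incoming > current:
        store[key] = incoming


def final_state(ops):
    store = {}
    for op in ops:
        apply_write(store, op)
    return store


ops = [
    {"source": 1, "dest": 2, "state": "normal", "ts": 100},
    {"source": 1, "dest": 2, "state": "removed", "ts": 200},
    {"source": 1, "dest": 2, "state": "normal", "ts": 100},  # duplicate delivery
]

# Every arrival order converges to the same stored state.
results = [final_state(order) for order in permutations(ops)]
print(results[0])
```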

FlockDB - Twitter
• Stores graphs as sets of edges
  - Primary key: a compound key of the source ID, state, and position
  - When an edge is deleted, the row is just marked 'removed' without being deleted from MySQL
  - Keeps only a compound primary key and a secondary index for each row, and answers all queries from a single index
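
The edge layout described above can be approximated with a single table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the column names, the numeric state codes, and the choice of secondary index are assumptions, not FlockDB's actual schema.

```python
import sqlite3

NORMAL, REMOVED = 0, 1  # hypothetical state codes

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edges (
        source_id INTEGER NOT NULL,
        state     INTEGER NOT NULL,
        position  INTEGER NOT NULL,
        dest_id   INTEGER NOT NULL,
        PRIMARY KEY (source_id, state, position)
    )
    """
)
# Secondary index used to locate one specific edge.
conn.execute("CREATE UNIQUE INDEX edges_by_pair ON edges (source_id, dest_id)")


def add_edge(source_id, dest_id, position):
    conn.execute(
        "INSERT INTO edges (source_id, state, position, dest_id) VALUES (?, ?, ?, ?)",
        (source_id, NORMAL, position, dest_id),
    )


def remove_edge(source_id, dest_id):
    # Soft delete: flip the state instead of deleting the row.
    conn.execute(
        "UPDATE edges SET state = ? WHERE source_id = ? AND dest_id = ?",
        (REMOVED, source_id, dest_id),
    )


def active_destinations(source_id):
    # The (source_id, state, position) key order matches this filter and sort.
    rows = conn.execute(
        "SELECT dest_id FROM edges WHERE source_id = ? AND state = ? ORDER BY position",
        (source_id, NORMAL),
    )
    return [dest_id for (dest_id,) in rows]


add_edge(1, 2, position=10)
add_edge(1, 3, position=20)
remove_edge(1, 2)
print(active_destinations(1))
```

Because the removed row stays in the table, restoring the edge later is just another state update.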

Sharding in Graph DB
• Especially hard in graph DBs because of traversal
  - Unless the entire graph is stored on a single machine, queries must cross machine boundaries (expensive)
  - Neo4j provides a master/slave structure (still limited)
  - FlockDB (Twitter) does not address this (it is interested only in 1-level relations)

How to shard?
• A proposal: gravity
  - Localizing data leads to greater performance (like a cache)
  - Shard graph data based on gravity
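
The cost of a placement can be estimated by counting edges whose endpoints land on different shards, since each such edge forces a traversal to cross a machine boundary. The Python sketch below compares a modulo placement with a hand-written locality-aware placement. The slide does not define how "gravity" is computed, so the second placement is only an assumed stand-in for any scheme that keeps connected nodes together.

```python
def cross_shard_edges(edges, placement):
    """Number of edges whose two endpoints live on different shards."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


# Two tightly connected triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
nodes = range(1, 7)

# Placement by id modulo the shard count ignores graph structure.
modulo_placement = {node: node % 2 for node in nodes}

# Locality-aware placement keeps each triangle on one shard.
local_placement = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}

print(cross_shard_edges(edges, modulo_placement))
print(cross_shard_edges(edges, local_placement))
```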

Blueprints
• A collection of interfaces, etc. for the property graph DB model
  - Analogous to JDBC, but for graph DBs
  - Provides a common set of interfaces that lets developers plug and play their graph DB backend (Pipes, Gremlin, Rexster)
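
The plug-and-play idea can be sketched as application code written against an abstract interface, with the concrete backend supplied separately. The Python example below is only an analogy: Blueprints itself is a Java library, and the names `GraphBackend`, `InMemoryBackend`, and `friends_of_friends` are hypothetical, not part of its API.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical common interface; not the Blueprints API itself."""

    @abstractmethod
    def add_vertex(self, vertex_id):
        ...

    @abstractmethod
    def add_edge(self, out_id, label, in_id):
        ...

    @abstractmethod
    def out_vertices(self, vertex_id, label):
        ...


class InMemoryBackend(GraphBackend):
    """One interchangeable backend; others could wrap a real database."""

    def __init__(self):
        self._edges = {}  # vertex_id -> list of (label, target id)

    def add_vertex(self, vertex_id):
        self._edges.setdefault(vertex_id, [])

    def add_edge(self, out_id, label, in_id):
        self.add_vertex(out_id)
        self.add_vertex(in_id)
        self._edges[out_id].append((label, in_id))

    def out_vertices(self, vertex_id, label):
        return [v for (edge_label, v) in self._edges.get(vertex_id, []) if edge_label == label]


def friends_of_friends(backend: GraphBackend, person):
    """Application logic that depends only on the interface."""
    direct = set(backend.out_vertices(person, "knows"))
    second = {
        fof
        for friend in direct
        for fof in backend.out_vertices(friend, "knows")
    }
    return second - direct - {person}


backend = InMemoryBackend()
backend.add_edge("a", "knows", "b")
backend.add_edge("a", "knows", "d")
backend.add_edge("b", "knows", "c")
backend.add_edge("b", "knows", "a")
backend.add_edge("d", "knows", "c")

print(friends_of_friends(backend, "a"))
```

Swapping in another `GraphBackend` subclass would leave `friends_of_friends` unchanged.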