Presentation transcript:

Aules d’Empresa 2011: DEX

Contents
- Graph database
- Motivation
- DEX
- Experiments

Graph database
What is a graph database?
- Data and schema are represented by graphs: nodes, edges, and properties.
- Data manipulation is expressed as graph operations.
- Integrity constraints enforce graph consistency.

Motivation
Trends in current data sets:
- A higher degree of connectivity among entities.
- A higher degree of complexity in data models.
- Decentralized data generation: users provide the content.
Requirements:
- Queries of different kinds: structural queries (not based on the schema) and link analysis.
- Management of unstructured data.
- Flexible schemas.

Scenarios
- Social networks: MySpace, Facebook, Flickr …
- Information networks: bibliographic databases (DBLP, Scopus …) and on-line encyclopedias (Wikipedia …)
- Technological networks: electric power grids, airline routes, telephone networks …
- Biological networks: genomics, chemical structures …

Why not RDBMS?
Classical relational model:
- Inefficient for unstructured data or flexible schemas: the schema is fixed in advance and based on relations (tables).
- Inefficient for structural queries: they require intensive use of join operations.

DEX, a graph database
DEX is a programming library for managing a graph database. It focuses on:
- Very large datasets.
- High-performance query processing.

Basic concepts
Programming library for managing persistent and temporary graphs.
Data model: typed and attributed directed multigraph.
- Node and edge instances belong to a type (label).
- Node and edge instances have attribute values.
- Edges can be directed or undirected.
- Multiple edges are allowed between two nodes.
Types of edges:
- Materialized: directed or undirected.
- Virtual: constrained by the values of two attributes (foreign keys); used just for navigation.
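The data model on this slide can be illustrated with a small in-memory structure. This is only a sketch under stated assumptions: the class `Graph` and its methods `add_node`, `add_edge`, `neighbors` and `virtual_neighbors` are hypothetical names chosen for illustration and are not the DEX API; the sample node types and attributes are invented as well.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy typed, attributed multigraph (illustrative only, not the DEX API)."""
    node_type: dict = field(default_factory=dict)   # node id -> type label
    node_attrs: dict = field(default_factory=dict)  # node id -> {attribute: value}
    edges: list = field(default_factory=list)       # (type, src, dst, directed, attrs)
    out: dict = field(default_factory=lambda: defaultdict(list))  # node id -> edge ids

    def add_node(self, label, **attrs):
        nid = len(self.node_type)
        self.node_type[nid] = label
        self.node_attrs[nid] = attrs
        return nid

    def add_edge(self, label, src, dst, directed=True, **attrs):
        # Multigraph: every call appends a new edge, even between the same nodes.
        eid = len(self.edges)
        self.edges.append((label, src, dst, directed, attrs))
        self.out[src].append(eid)
        if not directed and dst != src:
            self.out[dst].append(eid)  # undirected edges are navigable both ways
        return eid

    def neighbors(self, nid, label):
        """Nodes reachable from nid over materialized edges of one type."""
        result = []
        for eid in self.out[nid]:
            etype, src, dst, _directed, _attrs = self.edges[eid]
            if etype == label:
                result.append(dst if src == nid else src)
        return result

    def virtual_neighbors(self, nid, attr_here, attr_there):
        """Virtual edge: match two attribute values at query time; nothing is stored."""
        key = self.node_attrs[nid].get(attr_here)
        if key is None:
            return []
        return [other for other, attrs in self.node_attrs.items()
                if other != nid and attrs.get(attr_there) == key]


g = Graph()
alice = g.add_node("PERSON", name="Alice", city_id=1)
bob = g.add_node("PERSON", name="Bob", city_id=2)
bcn = g.add_node("CITY", id=1, name="Barcelona")
g.add_edge("KNOWS", alice, bob, directed=False)  # undirected materialized edge
g.add_edge("EMAILED", alice, bob)                # two parallel directed edges
g.add_edge("EMAILED", alice, bob)
```

Here `g.neighbors(bob, "KNOWS")` follows the undirected edge back to Alice, while `g.virtual_neighbors(alice, "city_id", "id")` navigates a foreign-key style relationship without any stored edge.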

A graph model

Software architecture

Software architecture
- Java library: jdex.jar (public API)
- Native library: libjdex.so (Linux), jdex.dll (Windows)
System requirements:
- Java Runtime Environment, v1.5 or higher.
- Operating system: Windows (32 bits), Linux (32 and 64 bits).

Application architecture
Layers: Presentation, Network, Application Logic, Data.
- Desktop application: a Java Swing application that uses the DEX load and query API over graphs built from data sources.
- Web application: a browser (HTML + JavaScript) connected over the Internet to a query servlet that uses the DEX API over graphs built from data sources.

Experiments
Five categories:
- Bulk load performance.
- Core operations performance and memory usage.
- Scalability.
- Comparison with other approaches: relational (MySQL) and OIM.
- Query performance analysis.
Different datasets:
- Wikipedia.
- IMDb, the Internet Movie Database.
- XMark, a standard and scalable benchmark for XML.
- LUBM, a benchmark to evaluate the performance of RDF repositories.
- R-MAT, a synthetic scale-free network.

Load performance
Datasets: IMDb, Wikipedia, XMark, LUBM.
Metrics per dataset: DbGraph size (GB), ratio DbGraph/raw data, objects (millions), time (hours), speed (objects/sec), memory (%). Only the memory split is preserved:

Memory (%)   IMDb     Wikipedia   XMark    LUBM
Bitmaps      39.58    39.12       33.32    34.11
Maps         60.42    60.88       66.68    65.89

Test platform: single CPU with 4096 KB of cache, 2 GB of RAM and 80 GB of disk. Operating system: Linux Debian etch 4.0. DEX buffer pool: 1.5 GB max.

Operations performance and memory usage
Benchmark: Wikipedia, with more than 200 million nodes and edges.
Columns: query, time (s), results, bitmaps (64K pages, operations), maps (64K pages, operations). Values not preserved.
Queries:
- Q1: count
- Q2: scan
- Q3: select
- Q4: projection
- Q5: combine
- Q6: explode
- Q7: values

Scalability
XMark over 5 different scale factors (SF), ranging from 0.1 (110 MB) to 25 (2.78 GB): SF=0.1, SF=1, SF=5, SF=10, SF=25.
Metrics per scale factor: graph size (MB), I/O (MB), objects (millions), load (secs.), optimize (secs.), total (secs.). Values not preserved.

R-MAT scalability
Columns: scale, nodes, edges, load (sec), edges/sec, GB, Q1, %visited, traversals, trav/sec. Preserved values:

Scale   Nodes   Edges    Trav/sec
25      29M     268M     361K
26      58M     536M     337K
27      116M    1073M    307K
28      230M    2147M    295K
29      457M    4294M

Comparison with other approaches
Comparison with a relational database (MySQL) and with an Oriented Incidence Matrix (OIM).
Query times compared for MySQL, OIM and DEX (values not preserved):
- Q1: count
- Q2: scan
- Q3: select
- Q4: projection
- Q5: combine
- Q6: explode
- Q7: values
- Q8: hub (a time of > 3 hours is reported)
Storage compared for MySQL, OIM and DEX: data (GB), overhead ratio, load time (secs).

Comparison with Neo4j
Systems: Neo4j and DEX 4.0.
Metrics: size (GB), load time (h), and times in seconds for Q1 to Q6 (a time of > 1 week is reported for Q6). Values not preserved.
Queries:
- Query 1: max-outdegree + SPT
- Query 2: paper recommender (2 hops)
- Query 3: pattern matching
- Query 4: for each language, number of papers and images
- Query 5: for each paper, materialize the number of images
- Query 6: delete papers with no images

Another comparison with an RDBMS
Datasets:
- D1: synthetic data generated with R-MAT, scale factor 16 (524K edges).
- D2: synthetic data generated with R-MAT, scale factor 18 (2M edges).
- D1 and D2 contain just nodes and edges, no attributes.
- R-MAT generates scale-free networks.
Queries:
- Q1: 3 hops from a given node.
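For readers unfamiliar with R-MAT, the following is a minimal sketch of the recursive-matrix idea: each edge is placed by repeatedly choosing one quadrant of the adjacency matrix. The function name `rmat_edges` is hypothetical, and the quadrant probabilities (0.57, 0.19, 0.19, 0.05) are an assumption, since the slides do not state the parameters used. The edge factor of 8 is inferred from the reported sizes (8 × 2^16 = 524,288 and 8 × 2^18 = 2,097,152 edges).

```python
import random


def rmat_edges(scale, edge_factor=8, probs=(0.57, 0.19, 0.19, 0.05), seed=0):
    """Return edge_factor * 2**scale directed edges over node ids in [0, 2**scale)."""
    a, b, c, _d = probs  # assumed probabilities of the four adjacency-matrix quadrants
    rng = random.Random(seed)
    edges = []
    for _ in range(edge_factor * (1 << scale)):
        src = dst = 0
        for _level in range(scale):
            r = rng.random()
            # Quadrants: a = (0,0), b = (0,1), c = (1,0), d = (1,1).
            src_bit = 1 if r >= a + b else 0                          # c or d
            dst_bit = 1 if (a <= r < a + b) or r >= a + b + c else 0  # b or d
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges


# Small example: scale 10 gives 1024 node ids and 8 * 1024 edges.
edges = rmat_edges(scale=10)
```

Because the first quadrant is chosen far more often than the others, low node ids accumulate many more edges than high ones, which yields the skewed degree distribution the slide refers to.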

Another comparison with an RDBMS
Test: execute Q1 for 5 specific nodes. These query nodes have a significant number of outgoing edges:
- Scale factor 16: some tens.
- Scale factor 18: some hundreds.
Results:
- Scale factor 16: reached about 160K nodes.
- Scale factor 18: reached about 600K nodes.

Another comparison with an RDBMS
Schema:
    CREATE TABLE `edges` (
      `src` int(11) NOT NULL,
      `dst` int(11) NOT NULL,
      INDEX `srcI` (`src`) USING BTREE,
      INDEX `dstI` (`dst`) USING BTREE
    ) ENGINE=InnoDB;
Query:
    SELECT DISTINCT c.dst
    FROM edges AS a, edges AS b, edges AS c
    WHERE (a.dst = b.src AND b.dst = c.src AND a.src = node);
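To make the contrast concrete, the sketch below runs the same three-way self-join next to a navigational traversal of an adjacency list and checks that both return the same nodes. It is an illustration under assumptions: it uses Python's built-in SQLite instead of MySQL (so the `USING BTREE` and `ENGINE=InnoDB` clauses are dropped), the tiny edge list is invented, and `three_hops` is a hypothetical helper, not a DEX call. Note that the query returns the end nodes of walks of exactly three edges.

```python
import sqlite3
from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 1)]  # invented sample data
start = 1

# Relational version: the three-way self-join from the slide, on in-memory SQLite.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
db.execute("CREATE INDEX srcI ON edges (src)")
db.execute("CREATE INDEX dstI ON edges (dst)")
db.executemany("INSERT INTO edges VALUES (?, ?)", edges)
rows = db.execute(
    "SELECT DISTINCT c.dst FROM edges AS a, edges AS b, edges AS c "
    "WHERE a.dst = b.src AND b.dst = c.src AND a.src = ?",
    (start,),
)
sql_result = {dst for (dst,) in rows}

# Navigational version: expand the frontier one hop at a time.
adjacency = defaultdict(set)
for src, dst in edges:
    adjacency[src].add(dst)


def three_hops(node):
    """End nodes of all walks of exactly three edges starting at node."""
    frontier = {node}
    for _ in range(3):
        frontier = {nxt for cur in frontier for nxt in adjacency.get(cur, ())}
    return frontier


graph_result = three_hops(start)
```

The traversal touches only the edges adjacent to the current frontier, whereas the relational plan has to join the edge table with itself twice.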

Results
Test platform: MacBook, 2.4 GHz Intel Core 2 Duo (Mac OS X 10.6). Up to 1 GB of memory for the MySQL buffer pool.
Results for test T1:

Dataset   MySQL      DEX
D1        1m 57s     9s
D2        13m 36s    34s
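The reported times can be turned into speedup factors with simple arithmetic. The short sketch below does the conversion; the helper name `seconds` is hypothetical, and the figures are taken directly from the table above.

```python
def seconds(minutes, secs):
    """Convert a 'Xm Ys' timing into seconds."""
    return 60 * minutes + secs


# (MySQL time, DEX time) in seconds, as reported for test T1.
timings = {
    "D1": (seconds(1, 57), seconds(0, 9)),
    "D2": (seconds(13, 36), seconds(0, 34)),
}
speedup = {name: mysql / dex for name, (mysql, dex) in timings.items()}
```

On these figures DEX is 13 times faster than MySQL on D1 (117 s vs 9 s) and 24 times faster on D2 (816 s vs 34 s), so the gap widens as the graph grows.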

Any questions?
DAMA Group Web Site:
Sparsity Web Site: