Published by Dana Blankenship. Modified over 9 years ago.
1
Aules d’Empresa 2011
DEX
2
Contents
- Graph database
- Motivation
- DEX
- Experiments
3
Graph database
What is a graph database?
- Data and schema are represented by graphs: nodes, edges, and properties.
- Data manipulation is expressed as graph operations.
- Integrity constraints enforce graph consistency.
4
Motivation
Trends in current data sets:
- A higher degree of connectivity among entities.
- A higher degree of complexity of data models.
- Decentralization of data generation: users provide the content.
Requirements:
- Queries with different flavors:
  - Structural queries (not based on the schema).
  - Link analysis.
- Manage unstructured data.
- Flexible schemas.
5
Scenarios
- Social networks: MySpace, Facebook, Flickr …
- Information networks:
  - Bibliographic databases: DBLP, Scopus …
  - On-line encyclopedias: Wikipedia …
- Technological networks: electric power grids, airline routes, telephone networks …
- Biological networks: genomics, chemical structures …
6
Why not RDBMS?
Classical relational model:
- Inefficient for unstructured data or flexible schemas: the schema is fixed in advance and based on relations (tables).
- Inefficient for structural queries: they require intensive use of join operations.
7
DEX, a graph database
DEX is a programming library for managing a graph database. It focuses on:
- Very large datasets.
- High-performance query processing.
8
Basic concepts
- Programming library for persistent and temporary graph management.
- Data model: typed and attributed directed multigraph.
  - Node and edge instances belong to a type (label).
  - Node and edge instances have attribute values.
  - Edges can be directed or undirected.
  - Multiple edges are allowed between two nodes.
- Types of edges:
  - Materialized: directed and undirected.
  - Virtual: constrained by the values of two attributes (foreign keys); used just for navigation.
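The data model above can be illustrated with a small in-memory sketch. This is a toy model written for this transcript, not the DEX API: the class `TypedMultigraph`, its methods, and the example node and edge types are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Edge:
    etype: str                      # edge type (label)
    src: int
    dst: int
    directed: bool = True
    attrs: Dict[str, Any] = field(default_factory=dict)


class TypedMultigraph:
    """Toy typed, attributed multigraph (hypothetical; not the DEX API)."""

    def __init__(self) -> None:
        self.node_type: Dict[int, str] = {}
        self.node_attrs: Dict[int, Dict[str, Any]] = {}
        self.edges: List[Edge] = []  # a list, so parallel edges are kept

    def add_node(self, ntype: str, **attrs: Any) -> int:
        oid = len(self.node_type)    # object identifier
        self.node_type[oid] = ntype
        self.node_attrs[oid] = attrs
        return oid

    def add_edge(self, etype: str, src: int, dst: int,
                 directed: bool = True, **attrs: Any) -> None:
        self.edges.append(Edge(etype, src, dst, directed, attrs))

    def neighbors(self, oid: int, etype: str) -> Set[int]:
        """Nodes one materialized edge of type etype away from oid."""
        out: Set[int] = set()
        for e in self.edges:
            if e.etype != etype:
                continue
            if e.src == oid:
                out.add(e.dst)
            elif not e.directed and e.dst == oid:
                out.add(e.src)       # undirected edges work both ways
        return out

    def virtual_neighbors(self, oid: int, attr: str,
                          other_type: str, other_attr: str) -> Set[int]:
        """Virtual edge: match oid's attr against other_attr of other_type nodes."""
        if attr not in self.node_attrs[oid]:
            return set()
        value = self.node_attrs[oid][attr]
        return {n for n, t in self.node_type.items()
                if t == other_type and n != oid
                and other_attr in self.node_attrs[n]
                and self.node_attrs[n][other_attr] == value}


g = TypedMultigraph()
alice = g.add_node("person", name="Alice", city_id=1)
bob = g.add_node("person", name="Bob", city_id=2)
bcn = g.add_node("city", id=1, name="Barcelona")
g.add_edge("knows", alice, bob, directed=False)
g.add_edge("mail", alice, bob)
g.add_edge("mail", alice, bob)       # second parallel edge between the same nodes
```

In this sketch the virtual edge is never stored: `g.virtual_neighbors(alice, "city_id", "city", "id")` computes it on demand from the two attribute values, which is one plausible reading of "just for navigation".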
9
A graph model
10
Software architecture
11
Software architecture
- Java library: jdex.jar (public API)
- Native library:
  - Linux: libjdex.so
  - Windows: jdex.dll
- System requirements:
  - Java Runtime Environment, v1.5 or higher.
  - Operating system: Windows (32 bits), Linux (32 and 64 bits).
12
Application architecture
[Diagram: two deployments drawn across the layers Presentation, Network, Application Logic, and Data]
- Desktop application: Java Swing application, DEX, graphs, data sources.
- Web application: browser (HTML + JavaScript), Internet, query servlet, DEX, graphs, data sources.
- DEX API: load and query.
13
Experiments
Five categories:
- Bulk load performance.
- Core operations performance and memory usage.
- Scalability.
- Comparison with other approaches: relational (MySQL) and OIM.
- Query performance analysis.
Different datasets:
- Wikipedia.
- IMDb, the Internet Movie Database.
- XMark, a standard and scalable benchmark for XML.
- LUBM, a benchmark to evaluate the performance of RDF repositories.
- R-MAT, a synthetic scale-free network.
14
Load performance
                        IMDb     Wikipedia  XMark    LUBM
DbGraph (GB)            0.40     9.69       12.19    6.13
Ratio DbGraph/raw data  2.64     3.86       4.38     1.14
Objects (millions)      14.65    486.45     343.77   215.06
Time (hours)            0.08     26.55      2.61     4.22
Speed (objs/sec)        50349    4885       36521    10074
Memory, bitmaps (%)     39.58    39.12      33.32    34.11
Memory, maps (%)        60.42    60.88      66.68    65.89
Platform: single CPU with 4096 KB of cache, 2 GB of RAM and 80 GB of disk. Operating system: Linux Debian etch 4.0. DEX buffer pool: 1.5 GB max.
15
Operations performance and memory usage
Benchmark: Wikipedia, with more than 200 million nodes and edges.
Columns: Query, Time (s), Results, Bitmaps (64K pages, operations), Maps (64K pages, operations)
Q1 – count       0.0029169864291141
Q2 – scan        3.200016986429661698643043
Q3 – select      0.8000458329420458329583
Q4 – projection  33.20004583294204583295215618333175
Q5 – combine     0.00501222386
Q6 – explode     0.0057462246318929
Q7 – values      0.0110253122537781
16
Scalability
XMark over 5 different scale factors (SF), ranging from 0.1 (110 MB) to 25 (2.78 GB).
                    SF=0.1   SF=1     SF=5     SF=10    SF=25
Graph size (MB)     63.9     546.3    2596.9   5093.9   12480.4
I/O (MB)            0.0               40.5     185.7    890.0
Objects (millions)  1.38     13.71    68.75    137.47   343.77
Load (secs.)        16.84    172.15   928.6    1934.14  5121.17
Optimize (secs.)    1.74     22.54    243.54   807.44   4292.38
Total (secs.)       18.58    194.69            2741.58  9413.55
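As a quick worked check of the scalability figures, the sketch below verifies that Total equals Load plus Optimize for the scale factors where a total is reported, and derives an overall throughput in objects per second. The values are copied from the table; the variable names and the derived throughput figure are illustrative and not part of the original slides.

```python
# (objects in millions, load secs, optimize secs, total secs) per scale factor.
# Only rows with a reported Total are included.
rows = {
    "SF=0.1": (1.38, 16.84, 1.74, 18.58),
    "SF=1": (13.71, 172.15, 22.54, 194.69),
    "SF=10": (137.47, 1934.14, 807.44, 2741.58),
    "SF=25": (343.77, 5121.17, 4292.38, 9413.55),
}

consistent = {}   # does Load + Optimize match Total (to rounding)?
throughput = {}   # objects per second over the whole load, derived here

for sf, (objects_m, load_s, optimize_s, total_s) in rows.items():
    consistent[sf] = abs(load_s + optimize_s - total_s) < 0.005
    throughput[sf] = objects_m * 1e6 / total_s

for sf in rows:
    print(f"{sf:7s} total ok: {consistent[sf]}  objs/sec: {throughput[sf]:.0f}")
```

Under this reading, throughput falls from roughly 74,000 objects per second at SF=0.1 to roughly 36,500 at SF=25, so load time grows somewhat faster than linearly with the number of objects.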
17
R-MAT scalability
Scale  Nodes  Edges  Load (sec)  Edges/sec  GB    Q1        %visited  Traversals  Trav/sec
25     29M    268M   4372        61398.78   11    1465.82   85        529M        361K
26     58M    536M   9499        56518.68   21    3134      85        1058M       337K
27     116M   1073M  20336       52800.05   41.0  6888.90   85.01     2118M       307K
28     230M   2147M  54146       39660.98   83    14323.90  84.62     4236M       295K
29     457M   4294M  225202      19071.62   162
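For readers unfamiliar with R-MAT, the following is a minimal sketch of the recursive-matrix idea behind the generator: each edge is placed by choosing one of four quadrants of the adjacency matrix once per bit of the node identifiers. The slides do not give the generator settings, so the quadrant probabilities (0.57, 0.19, 0.19, 0.05) are commonly used defaults assumed here, and the edge factor of 8 is inferred from the table (268M edges at scale 25 is 8 × 2^25). The function name `rmat_edges` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def rmat_edges(scale: int, edge_factor: int = 8,
               a: float = 0.57, b: float = 0.19, c: float = 0.19,
               seed: int = 0) -> List[Tuple[int, int]]:
    """Return edge_factor * 2**scale directed edges over node ids in [0, 2**scale)."""
    rng = random.Random(seed)
    edges = []
    for _edge in range(edge_factor * (1 << scale)):
        src = dst = 0
        for _bit in range(scale):       # one quadrant choice per id bit
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:                   # top-left quadrant: both bits stay 0
                pass
            elif r < a + b:             # top-right: destination bit set
                dst |= 1
            elif r < a + b + c:         # bottom-left: source bit set
                src |= 1
            else:                       # bottom-right, probability 1 - a - b - c
                src |= 1
                dst |= 1
        edges.append((src, dst))
    return edges


edges = rmat_edges(scale=10)            # 1024 node ids, 8192 edges
out_degree = Counter(src for src, _dst in edges)
max_out_degree = max(out_degree.values())
```

Because the top-left quadrant is favored at every level, a few low-numbered nodes collect far more edges than the average of 8, which is the skewed degree distribution the slides refer to as scale-free. Duplicate edges and self-loops are not filtered in this sketch.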
18
Comparison with Other Approaches
Comparison with a relational database (MySQL) and with an Oriented Incidence Matrix (OIM).
Query            MySQL      OIM      DEX
Q1 – count       20.380     17.347   0.001
Q2 – scan        32.760     174.635  3.137
Q3 – select      7.340      5.430    0.837
Q4 – projection  17.340     43.699   33.192
Q5 – combine     0.740      2.612    0.005
Q6 – explode     0.070      202.070  0.006
Q7 – values      12.1280    20.774   0.011
Q8 – hub         > 3 hours           624.681

                  MySQL   OIM    DEX
Data (GB)         27.36   54     9.69
Ratio overhead    10.9    21.51  3.96
Load time (secs)  5289117453     95579
19
Comparison with Neo4j
               Neo4j     DEX 4.0
Size (GB)      82        16.98
Load time (h)  8.22      2.25
Q1 (s)         32230.00  118.93
Q2 (s)         24832.00  205.97
Q3 (s)         2045.00   10.68
Q4 (s)         34882.00  146.77
Q5 (s)         32539.00  141.06
Q6 (s)         > 1 week  7518.06
Queries:
- Query 1: max-outdegree + SPT
- Query 2: paper recommender (2-hops)
- Query 3: pattern matching
- Query 4: for each language: number of papers and images
- Query 5: for each paper: materialize number of images
- Query 6: delete papers with no images
20
Another comparison with an RDBMS
Datasets:
- D1: synthetic data generated with R-MAT, scale factor = 16 (524K edges).
- D2: synthetic data generated with R-MAT, scale factor = 18 (2M edges).
- D1 and D2 contain only nodes and edges, no attributes.
- R-MAT generates scale-free networks.
Queries:
- Q1: 3 hops from a given node.
21
Another comparison with an RDBMS
Test:
- Execute Q1 for 5 specific nodes.
- These query nodes have a significant number of outgoing edges:
  - Scale factor 16: some tens.
  - Scale factor 18: some hundreds.
Results:
- Scale factor 16: about 160K nodes reached.
- Scale factor 18: about 600K nodes reached.
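The 3-hop query can be sketched as a repeated frontier expansion over an adjacency structure. This is an illustrative sketch only, not how DEX implements Q1: the function `k_hops` and the toy graph are hypothetical, and the sketch assumes that "3 hops" means the endpoints of walks of exactly three outgoing edges. A "within 3 hops" reading would instead take the union of all intermediate frontiers.

```python
from typing import Dict, Iterable, Set


def k_hops(adj: Dict[int, Iterable[int]], start: int, k: int = 3) -> Set[int]:
    """Nodes that end a walk of exactly k outgoing edges from start."""
    frontier = {start}
    for _ in range(k):
        # Replace the frontier by the set of all successors of its nodes.
        frontier = {dst for src in frontier for dst in adj.get(src, ())}
    return frontier


# Toy directed graph as adjacency lists (hypothetical data).
adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5, 0], 5: []}
result = k_hops(adj, 0)
```

Each step touches only the edges leaving the current frontier, so the cost follows the number of nodes actually reached rather than the size of the whole edge set.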
22
Another comparison with an RDBMS
Schema:
    CREATE TABLE `edges` (
      `src` int(11) NOT NULL,
      `dst` int(11) NOT NULL,
      INDEX `srcI` (`src`) USING BTREE,
      INDEX `dstI` (`dst`) USING BTREE
    ) ENGINE=InnoDB;
Query:
    SELECT DISTINCT c.dst
    FROM edges AS a, edges AS b, edges AS c
    WHERE (a.dst = b.src AND b.dst = c.src AND a.src = node);
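To see what the three-way self-join computes, the query can be tried on a tiny edge table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made only so the example is self-contained: the index definitions are rewritten in SQLite syntax, the InnoDB engine clause is dropped, and the `node` placeholder is bound as a query parameter. The edge data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same two-column edge table, with one index per column.
conn.executescript("""
    CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL);
    CREATE INDEX srcI ON edges (src);
    CREATE INDEX dstI ON edges (dst);
""")

# Toy directed graph (hypothetical data).
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5), (4, 0)],
)

# Each alias (a, b, c) contributes one hop; DISTINCT removes nodes
# that are reached along more than one path.
rows = conn.execute(
    """
    SELECT DISTINCT c.dst
    FROM edges AS a, edges AS b, edges AS c
    WHERE a.dst = b.src AND b.dst = c.src AND a.src = ?
    """,
    (0,),
).fetchall()
three_hop = sorted(r[0] for r in rows)
conn.close()
```

On this toy graph the query returns nodes 0 and 5. Every additional hop adds another self-join of the edge table, which is the join-heavy pattern the earlier slide on relational databases points to.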
23
Results
Test platform:
- MacBook, 2.4 GHz Intel Core 2 Duo (Mac OS X 10.6).
- Up to 1 GB of memory for the MySQL buffer pool.
Results (test T1):
            MySQL    DEX
Dataset D1  1m 57s   9s
Dataset D2  13m 36s  34s
24
Any questions?
DAMA Group web site: www.dama.upc.edu
Sparsity web site: www.sparsity-technologies.com