1
Storage Engine for Semantic Web
2
Assertion
A storage engine for the Semantic Web has requirements similar to those of e-commerce applications.
Draw upon results and lessons from:
– R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.
3
Typical E-Commerce Data Characteristics
An experimental e-marketplace for computer components:
– Nearly 2 million components
– More than 2000 leaf-level categories
– Large number of attributes (5000)
– Constantly evolving schema
– Sparsely populated data (about 50-100 attributes/component)
4
Alternative Physical Representations
– Horizontal: one N-ary relation
– Binary: N 2-ary relations
– Vertical: one 3-ary relation
5
Conventional horizontal representation (n-ary relation)
Name | Monitor | Height | Recharge | Output | Playback | Smooth scan | Progressive Scan
PAN DVD-L75 | 7 inch | - | Built-in | Digital | - | - | -
KLH DVD221 | - | 3.75 | - | S-Video | - | - | No
SONY S-7000 | - | - | - | - | - | - | -
SONY S-560D | - | - | - | - | Cinema Sound | Yes | -
… | … | … | … | … | … | … | …
– DB catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)
– Storage overhead of NULL values
– NULLs increase the index size, and they sort high in the DB2 B+ tree index
– Hard to load/update
– Schema evolution is expensive
– Querying is straightforward
6
Binary Representation (N 2-ary relations)
– Dense representation
– Manageability is hard because of the large number of tables
– Schema evolution is expensive
Related work:
– Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]
– Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]
– Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]
Table Monitor:
Name | Val
PAN DVD-L75 | 7 inch
Table Height:
Name | Val
KLH DVD221 | 3.75
Table Output:
Name | Val
PAN DVD-L75 | Digital
KLH DVD221 | S-Video
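The decomposition into per-attribute tables can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `decompose` and the in-memory dictionary layout are hypothetical and not part of the original system; the sample rows are the first two products from the slides.

```python
from collections import defaultdict

# Sparse rows from the slide example; absent attributes are simply omitted.
rows = [
    {"Name": "PAN DVD-L75", "Monitor": "7 inch", "Recharge": "Built-in", "Output": "Digital"},
    {"Name": "KLH DVD221", "Height": "3.75", "Output": "S-Video", "Progressive Scan": "No"},
]

def decompose(rows, key="Name"):
    """Hypothetical helper: build one 2-ary relation (Name, Val) per attribute."""
    tables = defaultdict(list)
    for row in rows:
        for attr, val in row.items():
            if attr != key:
                tables[attr].append((row[key], val))
    return dict(tables)

binary = decompose(rows)
print(binary["Output"])   # the Output table shown on the slide
print(sorted(binary))     # one table per attribute that actually occurs
```

Each table is dense because it holds only the objects that have that attribute, but the number of tables grows with the number of attributes, which is the manageability problem the slide points out.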
7
Vertical representation (one 3-ary relation)
Columns: Oid (object identifier), Key (attribute name), Val (attribute value)
– Objects can have a large number of attributes
– Handles sparseness well
– Schema evolution is easy
Oid | Key | Val
0 | 'Name' | 'PAN DVD-L75'
0 | 'Monitor' | '7 inch'
0 | 'Recharge' | 'Built-in'
0 | 'Output' | 'Digital'
1 | 'Name' | 'KLH DVD221'
1 | 'Height' | '3.75'
1 | 'Output' | 'S-Video'
1 | 'Progressive Scan' | 'No'
2 | 'Name' | 'SONY S-7000'
… | … | …
Related work:
– Implementation of SchemaSQL [LSS 99]
– Edge Approach for storing XML Data [FK 99]
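As a rough illustration of how sparse objects map onto the 3-ary relation, the sketch below flattens dictionaries into (Oid, Key, Val) triples. The function name `to_vertical` and the choice to assign Oids by list position are assumptions made for this example only.

```python
# Sparse objects from the slide example; missing attributes are omitted.
objects = [
    {"Name": "PAN DVD-L75", "Monitor": "7 inch", "Recharge": "Built-in", "Output": "Digital"},
    {"Name": "KLH DVD221", "Height": "3.75", "Output": "S-Video", "Progressive Scan": "No"},
    {"Name": "SONY S-7000"},
]

def to_vertical(objects):
    """Hypothetical helper: emit one (Oid, Key, Val) triple per non-missing attribute."""
    triples = []
    for oid, obj in enumerate(objects):   # assumption: Oid is the list position
        for key, val in obj.items():
            if val is not None:
                triples.append((oid, key, val))
    return triples

vertical = to_vertical(objects)
for triple in vertical:
    print(triple)
```

Adding a new attribute to an object only adds a triple, so no schema change is needed, and an object with few attributes costs only as many rows as it has values.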
8
Querying over Vertical Representation is Complex
A simple query on a horizontal scheme:
    SELECT Monitor
    FROM H
    WHERE Output = 'Digital'
becomes quite complex on the vertical table:
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
Writing applications becomes much harder. What can we do?
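The self-join query can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory SQLite database stands in for the DBMS used in the study; the table contents are the sample triples from the vertical-representation slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
        (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
        (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
        (2, "Name", "SONY S-7000"),
    ],
)

# One copy of vtable per attribute mentioned in the query, joined on Oid.
vertical_query = """
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
"""
result = conn.execute(vertical_query).fetchall()
print(result)
```

Every additional attribute referenced by a query adds another copy of `vtable` to the join, which is why hand-writing such queries quickly becomes unwieldy.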
9
Solution
– Provide a horizontal view of the vertical table
– A translation layer automatically maps operations on H to operations on V
[Diagram: Horizontal view (H) with columns Attr1, Attr2, …, Attrk, … on top of a Query Mapping Layer, on top of the Vertical table (V) with columns Oid, Key, Val]
10
Transformation Algebra
Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation.
Two key operators:
– v2h: converts from vertical to horizontal
– h2v: converts from horizontal to vertical
11
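Sample Algebraic Transforms
v2h operation (convert from vertical to horizontal), where ⟕ denotes left outer join on Oid:
$\mathrm{v2h}_k(V) = [\pi_{Oid}(V)] \mathbin{⟕} [\mathop{⟕}_{i=1}^{k} \pi_{Oid,Val}(\sigma_{Key='A_i'}(V))]$
h2v operation (convert from horizontal to vertical), where $\bot$ denotes a missing value:
$\mathrm{h2v}_k(H) = [\bigcup_{i=1}^{k} \pi_{Oid,'A_i',A_i}(\sigma_{A_i \neq \bot}(H))] \cup [\bigcup_{i=1}^{k} \pi_{Oid,'A_i',A_i}(\sigma_{\bigwedge_{i=1}^{k} A_i = \bot}(H))]$
Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98], respectively.
Complete transforms are given in the VLDB-2001 paper.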
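To make the vertical-to-horizontal transform concrete, here is a small in-memory sketch in Python. It is not the paper's implementation: the function name `v2h`, the use of `None` for a missing value, and the tuple layout are assumptions chosen for illustration. It follows the outer-join reading of the transform, so an object that has none of the requested attributes still yields a row.

```python
def v2h(triples, attrs):
    """Pivot (Oid, Key, Val) triples into rows (Oid, A1, ..., Ak).

    Starting from every Oid in the table and filling in one attribute at a
    time mimics a chain of left outer joins: missing values stay None.
    """
    oids = sorted({oid for oid, _, _ in triples})        # projection on Oid
    rows = {oid: [None] * len(attrs) for oid in oids}
    for i, attr in enumerate(attrs):
        for oid, key, val in triples:
            if key == attr:                              # selection Key = 'Ai'
                rows[oid][i] = val
    return [(oid, *rows[oid]) for oid in oids]

triples = [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
    (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
    (2, "Name", "SONY S-7000"),
]

horizontal = v2h(triples, ["Monitor", "Output"])
print(horizontal)
```

Object 2 has neither Monitor nor Output, yet it appears in the result with both columns empty; an inner join would have dropped it.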
12
From the Algebra to SQL
Equivalent SQL transforms for algebraic transforms:
– Select, Project
– Joins (self, two verticals, a horizontal and a vertical)
– Cartesian Product
– Union, Intersection, Set difference
– Aggregation
Extend DDL to provide the horizontal view:
    CREATE HORIZONTAL VIEW hview
    ON VERTICAL TABLE vtable
    USING COLUMNS (Attr1, Attr2, … Attrk, …)
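`CREATE HORIZONTAL VIEW` is a proposed DDL extension, so it will not run on a stock database. One way to approximate it with standard SQL is to generate an ordinary view built from left outer joins, as in the sketch below. The helper `horizontal_view_sql` is hypothetical, SQLite is used only as a convenient stand-in, and the string interpolation assumes trusted table and attribute names.

```python
import sqlite3

def horizontal_view_sql(view, vtable, attrs):
    """Hypothetical helper: emit a standard CREATE VIEW with one left outer join per attribute."""
    cols = ", ".join(f'a{i}.Val AS "{a}"' for i, a in enumerate(attrs))
    joins = " ".join(
        f"LEFT OUTER JOIN {vtable} a{i} ON a{i}.Oid = o.Oid AND a{i}.Key = '{a}'"
        for i, a in enumerate(attrs)
    )
    return (
        f"CREATE VIEW {view} AS SELECT o.Oid, {cols} "
        f"FROM (SELECT DISTINCT Oid FROM {vtable}) o {joins}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Output", "S-Video"),
        (2, "Name", "SONY S-7000"),
    ],
)
conn.execute(horizontal_view_sql("hview", "vtable", ["Monitor", "Output"]))

# The simple horizontal query now runs unchanged against the view.
monitors = conn.execute("SELECT Monitor FROM hview WHERE Output = 'Digital'").fetchall()
all_rows = conn.execute("SELECT * FROM hview ORDER BY Oid").fetchall()
print(monitors)
print(all_rows)
```

This only covers reads; the deck's mapping layer also translates joins, set operations, and aggregation, which a plain view does not address.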
13
Alternative Implementation Strategies
– VerticalSQL: uses only SQL-92 level capabilities
– VerticalUDF: exploits User Defined Functions and Table Functions to provide a direct implementation
– Binary (hand-coded queries): 2-ary representation with one relation per attribute (using only SQL-92 transforms)
14
Data Organization Matters: Clustering by Key significantly outperforms clustering by Oid
[Chart: Join. Execution time (seconds) vs. join selectivity (0.1%, 1%, 5%); density = 10%, 1000 cols x 20K rows; series: VerticalSQL_oid, VerticalSQL_key]
15
VerticalSQL is comparable to Binary and outperforms Horizontal
[Chart: Projection of 10 columns. Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K); density = 10%; series: HorizontalSQL, VerticalSQL, Binary]
16
VerticalUDF is the best approach
[Chart: Projection of 10 columns. Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K); density = 10%; series: VerticalUDF, VerticalSQL, Binary]
17
Summary
Criterion | Horizontal | Vertical (w/ Mapping) | Binary (w/ Mapping)
Manageability | + | + | -
Flexibility | - | + | -
Querying | + | + | +
Performance | - | + | +
18
Remarks
The lessons of this study apply directly to building a storage engine for the Semantic Web.
Performance of the vertical representation can be further improved by:
– Enhanced table functions
– First-class treatment of table functions
– Native support for v2h and h2v operations
– Partial indices