Storage Engine for Semantic Web. Assertion: a storage engine for the Semantic Web has requirements similar to those of e-commerce applications. Draw upon results.

Storage Engine for Semantic Web

Assertion: a storage engine for the Semantic Web has requirements similar to those of e-commerce applications.
Draw upon results and lessons from:
– R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.

Typical E-Commerce Data Characteristics
An experimental e-marketplace for computer components:
– Nearly 2 million components
– More than 2000 leaf-level categories
– Large number of attributes (5000)
– Constantly evolving schema
– Sparsely populated data (about 50-100 attributes/component)

Alternative Physical Representations
– Horizontal: one N-ary relation
– Binary: N 2-ary relations
– Vertical: one 3-ary relation

Conventional horizontal representation (n-ary relation)

Name        | Monitor | Height | Recharge | Output  | Playback     | Smooth scan | Progressive Scan
PAN DVD-L75 | 7 inch  | -      | Built-in | Digital | -            | -           | -
KLH DVD221  | -       | 3.75   | -        | S-Video | -            | -           | No
SONY S-7000 | -       | -      | -        | -       | -            | -           | -
SONY S-560D | -       | -      | -        | -       | Cinema Sound | Yes         | -
…           | …       | …      | …        | …       | …            | …           | …

– DB catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)
– Storage overhead of NULL values: nulls increase the index size, and they sort high in the DB2 B+ tree index
– Hard to load/update
– Schema evolution is expensive
– Querying is straightforward

Binary Representation (N 2-ary relations)
– Dense representation
– Manageability is hard because of the large number of tables
– Schema evolution is expensive

Related work:
– Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]
– Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]
– Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]

Monitor
Name        | Val
PAN DVD-L75 | 7 inch

Height
Name        | Val
KLH DVD221  | 3.75

Output
Name        | Val
PAN DVD-L75 | Digital
KLH DVD221  | S-Video
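The decomposition into one 2-ary relation per attribute can be sketched in a few lines of Python. This is only an illustration using the sample rows from the slide; the names `OBJECTS`, `to_binary_tables`, and `BINARY` are made up for the example and do not come from the cited systems.

```python
from collections import defaultdict

# Sample objects from the slide, keyed by Name; absent attributes are simply missing.
OBJECTS = {
    "PAN DVD-L75": {"Monitor": "7 inch", "Output": "Digital"},
    "KLH DVD221": {"Height": "3.75", "Output": "S-Video"},
}


def to_binary_tables(objects):
    """Split sparse records into one (Name, Val) table per attribute."""
    tables = defaultdict(list)
    for name, attrs in objects.items():
        for attr, val in attrs.items():
            # Only present values are stored, so each table is dense.
            tables[attr].append((name, val))
    return dict(tables)


# One table per attribute: Monitor, Output, Height.
BINARY = to_binary_tables(OBJECTS)
```

Each new attribute adds another table, which is the manageability cost the slide points out.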

Vertical representation (One 3-ary relation)
Columns: Oid (object identifier), Key (attribute name), Val (attribute value)
– Objects can have a large number of attributes
– Handles sparseness well
– Schema evolution is easy

Oid | Key                | Val
0   | 'Name'             | 'PAN DVD-L75'
0   | 'Monitor'          | '7 inch'
0   | 'Recharge'         | 'Built-in'
0   | 'Output'           | 'Digital'
1   | 'Name'             | 'KLH DVD221'
1   | 'Height'           | '3.75'
1   | 'Output'           | 'S-Video'
1   | 'Progressive Scan' | 'No'
2   | 'Name'             | 'SONY S-7000'
…   | …                  | …

Related work:
– Implementation of SchemaSQL [LSS 99]
– Edge Approach for storing XML Data [FK 99]
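A minimal sketch of the 3-ary table, using Python's built-in sqlite3 module as a stand-in database (the slides refer to DB2, so the engine is an assumption). The rows are the ones shown on the slide; the `Region` attribute is hypothetical and only illustrates that a new attribute needs no schema change.

```python
import sqlite3

# (Oid, Key, Val) rows copied from the slide.
ROWS = [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
    (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
    (2, "Name", "SONY S-7000"),
]


def build_vtable(rows):
    """Create an in-memory vertical table and load the given triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
    conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", rows)
    return conn


conn = build_vtable(ROWS)

# Schema evolution: a previously unseen attribute (hypothetical 'Region')
# is just one more row, with no ALTER TABLE.
conn.execute("INSERT INTO vtable VALUES (?, ?, ?)", (2, "Region", "1"))

# Sparseness: each object stores only the attributes it actually has.
attrs_per_object = dict(
    conn.execute("SELECT Oid, COUNT(*) FROM vtable GROUP BY Oid").fetchall()
)
```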

Querying over Vertical Representation is Complex
A simple query on a horizontal scheme:
    SELECT Monitor FROM H WHERE Output = 'Digital'
becomes quite complex:
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
Writing applications becomes much harder. What can we do?
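The two queries can be run side by side. The sketch below uses Python's sqlite3 module and a two-object sample; the table contents and variable names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Horizontal table H: one column per attribute, NULL where a value is absent.
conn.execute("CREATE TABLE H (Oid INTEGER, Name TEXT, Monitor TEXT, Output TEXT)")
conn.executemany(
    "INSERT INTO H VALUES (?, ?, ?, ?)",
    [(0, "PAN DVD-L75", "7 inch", "Digital"), (1, "KLH DVD221", None, "S-Video")],
)

# Vertical table: the same facts as (Oid, Key, Val) triples.
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
        (1, "Name", "KLH DVD221"), (1, "Output", "S-Video"),
    ],
)

# One-table query on the horizontal scheme.
horizontal = conn.execute("SELECT Monitor FROM H WHERE Output = 'Digital'").fetchall()

# Equivalent self-join on the vertical scheme: one alias per attribute used.
vertical = conn.execute(
    """
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
    """
).fetchall()
```

On this sample both queries return the same single row. The two forms diverge when a matching object has no Monitor value: the horizontal query returns a NULL row, while the inner self-join returns nothing.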

Solution
– Provide a horizontal view of the vertical table
– A translation layer automatically maps operations on H to operations on V
[Diagram: queries on the horizontal view H (Attr1, Attr2, …, Attrk, …) pass through the Query Mapping Layer to the vertical table V (Oid, Key, Val)]

Transformation Algebra
Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation.
Two key operators:
– v2h: vertical to horizontal
– h2v: horizontal to vertical

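Sample Algebraic Transforms
v2h operation: convert from vertical to horizontal (⟕ denotes left outer join)
$$\mathrm{v2h}_k(V) = \left[\pi_{Oid}(V)\right] \mathbin{⟕} \left[\mathop{⟕}_{i=1,k} \pi_{Oid,Val}\left(\sigma_{Key='A_i'}(V)\right)\right]$$
h2v operation: convert from horizontal to vertical ('⊥' denotes a null value)
$$\mathrm{h2v}_k(H) = \left[\bigcup_{i=1,k} \pi_{Oid,\,'A_i',\,A_i}\left(\sigma_{A_i \neq '\bot'}(H)\right)\right] \cup \left[\bigcup_{i=1,k} \pi_{Oid,\,'A_i',\,A_i}\left(\sigma_{\bigwedge_{i=1,k} A_i = '\bot'}(H)\right)\right]$$
Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98] respectively.
Complete transforms are in the VLDB-2001 paper.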
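The two conversions can be sketched over plain Python tuples. This is an illustration of the idea rather than the paper's definition: `None` stands in for the null value, and the function bodies are assumptions based on the slide text, including the reading that a row whose attributes are all null still keeps its Oid.

```python
def v2h(vrows, attrs):
    """Vertical (Oid, Key, Val) triples -> horizontal rows (Oid, A1, ..., Ak)."""
    by_oid = {}
    for oid, key, val in vrows:
        # Every Oid seen in V produces a row, even if none of its keys are in attrs.
        slot = by_oid.setdefault(oid, {})
        if key in attrs:
            slot[key] = val
    # Missing attributes become None, like the nulls produced by an outer join.
    return [(oid, *(by_oid[oid].get(a) for a in attrs)) for oid in sorted(by_oid)]


def h2v(hrows, attrs):
    """Horizontal rows (Oid, A1, ..., Ak) -> vertical (Oid, Key, Val) triples."""
    out = []
    for oid, *vals in hrows:
        present = [(oid, a, v) for a, v in zip(attrs, vals) if v is not None]
        # An all-null row keeps its Oid by emitting one null triple per attribute.
        out.extend(present if present else [(oid, a, None) for a in attrs])
    return out


ATTRS = ["Monitor", "Output"]
V = [(0, "Monitor", "7 inch"), (0, "Output", "Digital"), (1, "Output", "S-Video")]
H = v2h(V, ATTRS)          # pivot to one row per object
ROUND_TRIP = h2v(H, ATTRS)  # unpivot back to triples
```

On this sample, unpivoting the pivoted rows gives back the original triples.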

From the Algebra to SQL
Equivalent SQL transforms for algebraic transforms:
– Select, Project
– Joins (self, two verticals, a horizontal and a vertical)
– Cartesian Product
– Union, Intersection, Set difference
– Aggregation
Extend DDL to provide the horizontal view:
    CREATE HORIZONTAL VIEW hview
    ON VERTICAL TABLE vtable
    USING COLUMNS (Attr1, Attr2, … Attrk, …)
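`CREATE HORIZONTAL VIEW` is the DDL extension proposed on the slide, not standard SQL. A rough stand-in can be built from an ordinary view with one left outer join per attribute. The sketch below uses Python's sqlite3 module; the helper `horizontal_view_sql` is hypothetical, and it assumes trusted attribute names and at most one value per (Oid, Key) pair.

```python
import sqlite3


def horizontal_view_sql(view, vtable, attrs):
    """Build a plain CREATE VIEW that pivots `vtable` into one column per attribute."""
    cols = ", ".join(f'v{i}.Val AS "{a}"' for i, a in enumerate(attrs))
    joins = " ".join(
        f"LEFT OUTER JOIN {vtable} v{i} ON v{i}.Oid = o.Oid AND v{i}.Key = '{a}'"
        for i, a in enumerate(attrs)
    )
    # The DISTINCT Oid subquery guarantees one output row per object.
    return (
        f"CREATE VIEW {view} AS SELECT o.Oid, {cols} "
        f"FROM (SELECT DISTINCT Oid FROM {vtable}) o {joins}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
        (1, "Height", "3.75"), (1, "Output", "S-Video"),
    ],
)
conn.execute(horizontal_view_sql("hview", "vtable", ["Monitor", "Output"]))

# The application can now issue the simple horizontal query against the view.
result = conn.execute("SELECT Monitor FROM hview WHERE Output = 'Digital'").fetchall()
all_rows = conn.execute("SELECT Oid, Monitor, Output FROM hview ORDER BY Oid").fetchall()
```

Queries are written against `hview` as if it were a horizontal table, and the database expands them into joins over `vtable`.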

Alternative Implementation Strategies
– VerticalSQL: uses only SQL-92 level capabilities
– VerticalUDF: exploits User Defined Functions and Table Functions to provide a direct implementation
– Binary (hand-coded queries): 2-ary representation with one relation per attribute (using only SQL-92 transforms)

Data Organization Matters: Clustering by Key significantly outperforms clustering by Oid
[Figure: Join. Execution time (seconds) vs. join selectivity (0.1%, 1%, 5%) for VerticalSQL_oid and VerticalSQL_key; density = 10%, 1000 cols x 20K rows]
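The two physical orderings can be illustrated with SQLite `WITHOUT ROWID` tables, which store rows in primary-key order. This is a sketch of the layout choice only: the experiments on the slide were not run on SQLite, the table names are made up, and no timing claim is implied.

```python
import sqlite3

ROWS = [
    (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
    (1, "Height", "3.75"), (1, "Output", "S-Video"),
]

conn = sqlite3.connect(":memory:")

# Clustered by Oid: all attributes of one object are stored next to each other.
conn.execute(
    "CREATE TABLE v_by_oid (Oid INTEGER, Key TEXT, Val TEXT, "
    "PRIMARY KEY (Oid, Key)) WITHOUT ROWID"
)
# Clustered by Key: all values of one attribute are stored next to each other.
conn.execute(
    "CREATE TABLE v_by_key (Oid INTEGER, Key TEXT, Val TEXT, "
    "PRIMARY KEY (Key, Oid)) WITHOUT ROWID"
)
for table in ("v_by_oid", "v_by_key"):
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", ROWS)


def values_for(table, key):
    """Fetch one attribute 'column'; only the physical layout differs per table."""
    query = f"SELECT Oid, Val FROM {table} WHERE Key = ? ORDER BY Oid"
    return conn.execute(query, (key,)).fetchall()
```

Both tables answer the query identically. With key clustering, the rows for one attribute form a contiguous range, which is the access pattern that per-attribute selections and joins rely on.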

Projection of 10 columns: VerticalSQL is comparable to Binary and outperforms Horizontal
[Figure: Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for HorizontalSQL, VerticalSQL, and Binary; density = 10%]

VerticalUDF is the best approach
[Figure: Projection of 10 columns. Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for VerticalUDF, VerticalSQL, and Binary; density = 10%]

Summary

              | Horizontal | Vertical (w/ Mapping) | Binary (w/ Mapping)
Manageability | +          | +                     | -
Flexibility   | -          | +                     | -
Querying      | +          | +                     | +
Performance   | -          | +                     | +

Remarks
The lessons of this study apply directly to building a storage engine for the Semantic Web.
Performance of the vertical representation can be further improved by:
– Enhanced table functions
– First-class treatment of table functions
– Native support for v2h and h2v operations
– Partial indices
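As one illustration of the last item, SQLite supports partial indexes, which cover only the rows that match a predicate. Whether this is the exact form of partial index the slide has in mind is an assumption; the index name and sample data below are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany(
    "INSERT INTO vtable VALUES (?, ?, ?)",
    [
        (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
        (1, "Height", "3.75"), (1, "Output", "S-Video"),
    ],
)

# Partial index: only the rows of the 'Output' attribute are indexed,
# so the index stays small even when the table holds thousands of attributes.
conn.execute("CREATE INDEX idx_output_val ON vtable (Val, Oid) WHERE Key = 'Output'")

# A query whose predicate implies Key = 'Output' is eligible to use the index.
digital_oids = [
    row[0]
    for row in conn.execute(
        "SELECT Oid FROM vtable WHERE Key = 'Output' AND Val = 'Digital'"
    )
]

index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```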
