Download presentation
Presentation is loading. Please wait.
Published byEmerald Adams Modified over 9 years ago
1
Storage and Retrieval of E-Commerce Data R. Agrawal, A. Somani, Y. Xu: VLDB-2001
2
Overview E-Commerce Data Characteristics Alternative Physical Representations Querying Mapping Layer Performance evaluation of various approaches Conclusions
3
Typical E-Commerce Data Characteristics Nearly 2 Million components More than 2000 leaf-level categories Large number of Attributes (5000) An Experimental E-marketplace for Computer components Constantly evolving schema Sparsely populated data (about 50-100 attributes/component)
4
Outline E-Commerce Data Characteristics Alternative Physical Representations Horizontal - One n-ary relation Binary - N 2-ary relations Vertical - One 3-ary relation Query Mapping Layer Performance evaluation Conclusion
5
Conventional horizontal representation (n-ary relation) NameMonitorHeightRechargeOutputplaybackSmooth scanProgressive Scan PAN DVD-L757 inch-Built-inDigital--- KLH DVD221-3.75-S-Video--No SONY S-7000------- SONY S-560D----Cinema SoundYes- …………………… DB Catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns) Storage overhead of NULL values Nulls increase the index size and they sort high in DB2 B+ tree index Hard to load/update Schema evolution is expensive Querying is straightforward
6
Binary Representation (N 2-ary relations) Dense representation Manageability is hard because of large number of tables Schema evolution expensive Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87] Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99] Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99] Val 7 inch Name PAN DVD-L75 Monitor ValName KLH DVD221 Height 3.75 ValName PAN DVD-L75 Output Digital S-VideoKLH DVD221
7
Vertical representation (One 3-ary relation) Oid (object identifier)Key (attribute name)Val (attribute value) Objects can have large number of attributes Handles sparseness well Schema evolution is easy OidKeyVal 0‘Name’‘PAN DVD- L75’ 0‘Monitor’‘7 inch’ 0‘Recharge’‘Built-in’ 0‘Output’‘Digital’ 1‘Name’‘KLH DVD221’ 1‘Height’‘3.75’ 1‘Output’‘S-Video’ 1‘Progressiv e Scan’ ‘No’ 2‘Name’‘SONY S-7000’ ……… Implementation of SchemaSQL [LSS 99] Edge Approach for storing XML Data [FK 99]
8
Querying over Vertical Representation Simple query on a Horizontal scheme SELECT MONITOR FROM H WHERE OUTPUT=‘Digital’ Becomes quite complex: SELECT v1.Val FROM vtable v1, vtable v2 WHERE v1.Key = ‘Monitor’ AND v2.Key = ‘Output’ AND v2.Val = ‘Digital’ AND v1.Oid = v2.Oid Writing applications becomes much harder. What can we do ?
9
Solution : Query Mapping on Vertical Representation Simplify querying of the data – Provide a horizontal view of the vertical table – Automatically transform queries on the horizontal view to the vertical table
10
Solution (Continued) Translation layer maps relational algebraic operations on H to operations on V …Attrk…Attr2Attr1 Query Mapping Layer ValKeyOid Horizontal view (H) Vertical table (V)
11
Can we define a translation algebra ? Standard database view mechanism does not work attribute values become column names (need higher order views) Can the vertical representation support fast querying ? What are the new requirements from the database engines? Key Issues
12
Outline E-Commerce data characteristics Alternative Approaches Query Mapping Layer Transformation Algebra Implementation Strategies Performance evaluation Conclusion
13
Transformation Algebra Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation. Two key operators: – v2h (Vertical to Horizontal) – h2v (Horizontal to Vertical)
14
Notation Select Project ( ) Join ( ) Left Outerjoin ( Right Outerjoin ( ) Union ( ) Intersection ( ) Difference (-) Aggregation ( ) Null Value ( )
15
Sample Algebraic Transforms v2h ( Operation – Convert from vertical to horizontal k (V) = [ Oid (V)] [ i=1,k Oid,Val ( Key=‘Ai’ (V))] h2V ( Operation – Convert from horizontal to vertical k (H) = i=1,k Oid,’Ai’Ai ( Ai ‘ ’ (V))] i=1,k Oid,’Ai’Ai ( i=1,k Ai=‘ ’ (V)) Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98] respectively Complete transforms in VLDB-2001 Paper
16
From the Algebra to SQL Equivalent SQL transforms for algebraic transforms – Select, Project – Joins (self, two verticals, a horizontal and a vertical) – Cartesian Product – Union, Intersection, Set difference – Aggregation Extend DDL to provide the Horizontal View Four implementation strategies to provide Horizontal View CREATE HORIZONTAL VIEW hview ON VERTICAL TABLE vtable USING COLUMNS (Attr1, Attr2, … Attrk, …)
17
Implementation Strategies on Vertical Representation VerticalSQL – Uses only SQL-92 level capabilities – Used XQGM code to represented parsed SQL queries VerticalUDF – Exploits User Defined Functions and Table Functions to provide a direct implementation
18
Alternative Implementation Strategies Binary (hand-coded queries) – 2-ary representation with one relation per attribute (using only SQL-92 transforms) SchemaSQL – Addresses a more general problem – Performed 2-3X worse than Vertical representation because of temporary tables
19
Transformation strategy: VerticalSQL Attr1Attr2……… Horizontal view SELECT Attr1, Attr2 FROM hview SELECT t7.Attr1, t7.Attr2 FROM ( SELECT t4.Oid, t4.Attr1, t6.Attr2 FROM (SELECT t0.Oid, t3.Attr1 FROM (SELECT DISTINCT t0.Oid FROM vtable AS t0 ) AS t1(Oid) LEFT OUTER JOIN (SELECT t2.Oid, t2.Val FROM vtable AS t2 WHERE t2.Key = ‘Attr1’ ) AS t3(Oid, Attr1) ON t1.Oid = t3.Oid ) AS t4(Oid, Attr1) LEFT OUTER JOIN (SELECT t5.Oid, t5.Val FROM vtable AS t5 WHERE t5.Key = ‘Attr2’ ) AS t6(Oid, Attr2) ON t4.Oid = t6.Oid ) AS t7(Oid, Attr1, Attr2) OidKeyVal Vertical table Query transformation
20
Transformation strategy: VerticalUDF Attr1Attr2……… Horizontal view SELECT Attr1, Attr2 FROM hview SELECT t1.Attr1, t1.Attr2 FROM vtable AS t0, TABLE(v2h(t0.Oid, t0.Key, t0.Val)) AS t1(Oid, Attr1, Attr2) WHERE t0.Key = ‘Attr1’ OR t0.Key = ‘Attr2’ OidKeyVal 0‘Attr1’… Vertical table Query transformation The v2h table function reads tuples of vertical table sorted on Oid and outputs a horizontal tuple for each oid Severely penalized because of lack of engine support
21
Transformation strategy: Binary Attr1Attr2……… Horizontal view SELECT Attr1, Attr2 FROM hview ValOid Binary Relations Query transformation ValOid ATTR1 ATTR2 Oid ALLPROD SELECT t2.Attr1, ATTR2.val FROM ( SELECT t1.Oid, ATTR1.Val FROM ( SELECT Oid FROM ALLPROD ) AS t1(Oid) LEFT OUTER JOIN ATTR1 ON t1.Oid = Attr1.Oid ) AS t2(Oid, Attr1) LEFT OUTER JOIN ATTR2 ON t2.Oid = ATTR2.Oid
22
Story So Far … E-Commerce Data – High-Arity, Sparse and Constantly Evolving Three Approaches But what about Performance ? + +Querying +-Flexibility ++Manageability Vertical (w/ Mapping) Horizontal - - Binary (w/ Mapping) +
23
Performance: Experimental setup 600 MHz dual-processor Intel Pentium machine 512 MB RAM Windows NT 4.0 Database IBM DB2 UDB 7.1 Two 30GB IDE Drives Buffer Pool Size – 50 MB All numbers reported are cold numbers
24
Experimental Setup (Continued) Parameters used for Synthetic Data – Number of columns – Number of rows – Non-null density – Selectivity of a predicate for a column For example, 200 X 100K & =10% implies 200 columns, 100K rows and non-null density = 10% Complete results in technical report
25
Clustering by Key outperforms clustering by Oid Clustering by Key outperforms clustering by Oid density = 10%, 1000 cols x 20K rows 0 5 10 15 20 25 0.1%1%5% Join selectivity Execution time (seconds) VerticalSQL_oid VerticalSQL_key Join
26
Projection of 10 columns VerticalSQL comparable to Binary and outperforms Horizontal 0 10 20 30 40 50 60 200x100K400x50K800x25K1000x20K Table (#colsx #rows) Execution time (seconds) density = 10% HorizontalSQL VerticalSQL Binary
27
VerticalUDF is the best approach 0 10 20 30 200x100K400x50K800x25K1000x20K Table (#cols x #rows) Execution time (seconds) density = 10% VerticalUDF VerticalSQL Binary Projection of 10 columns
28
Wish List from Database Engines – VerticalUDF approach showed need for – Enhanced table functions – First class treatment of table functions – Native support for v2h and h2v operations – Partial indices
29
Summary + - +-Flexibility ++Manageability Vertical (w/ Mapping) Horizontal - - Binary (w/ Mapping) + Performance Querying+ + +
30
Conclusions Vertical Representation attractive for dealing with E- Commerce Data High-arity, sparse and constantly evolving schema Get the best of both horizontal and vertical representations Translation layer => Querying as easy as horizontal representation Great flexibility and manageability Performance – VerticalUDF best approach – VerticalSQL comparable to Binary and outperforms Horizontal Full Report @ http://www.almaden.ibm.com/cs/quest/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.