Published by Johnathan Neal. Modified over 8 years ago.
Database Technologies For Electronic Commerce
Rakesh Agrawal, Ramakrishnan Srikant, Yirong Xu

Storage & Querying of eCommerce Data

1. Problem with Conventional Schema
– Large number of columns
– Sparsity
– Constant schema evolution
– Performance

2. Advantages of Vertical Schema
– Objects can have a large number of attributes
– Handles sparseness well
– Easy schema evolution
But … writing SQL is painful.

3. Solution: Query Mapping Layer
– Hides complexity of vertical representation
– Fast performance

Conventional storage: eCommerce applications query a horizontal table (H) directly.
    SELECT name, output FROM H

    name      monitor   recharge   output    scan   …
    PANL75    7 inch    Built-in   -         -      …
    KLH 221   -         -          S-Video   -      …

Vertical storage: eCommerce applications query a Horizontal View (HV); the Query Mapping Layer (query parsing, transformation, optimized operator implementation) rewrites the query against the Vertical Table (V).
    SELECT name, output FROM HV

    oid   key        val
    0     name       PANL75
    0     monitor    7 inch
    0     Recharge   Built-in
    0     Output     Digital
    …     …          …
    1     name       KLH 221
    1     Output     S-Video

Pure SQL-92 Transform:
    SELECT V1.val, V2.val
    FROM V V1, V V2
    WHERE V1.key = ‘name’ AND V2.key = ‘output’ AND V1.oid = V2.oid

Recommendations for Database Vendors:
✓ Partial Indices
✓ Enhanced Table Functions (TF)
✓ First Class treatment of TF
✓ Native Support for v2h operation

Other Applications: Stores for XML, RDF, LDAP and Data Mining

R. Agrawal, A. Somani and Y. Xu, “Storage and Querying of E-Commerce Data”, VLDB 2001

Catalog Integration

B2B electronics portal: 2000 categories, 200K datasheets

[Figure: Master Catalog (ICs: DSP, Mem., Logic; documents a, b, c, d, e, f), New Catalog (ICs: Cat1, Cat2; documents x, y, z, w), and the catalog after integration (ICs: DSP, Mem., Logic; documents a, b, c, d, e, f, x, y, z, w)]

Goal
Use affinity information in the new catalog.
– Products in the same category are similar.
Accuracy boost depends on the match between the two categorizations.

Problem Statement
Given
– master categorization M: categories C_1, C_2, …, C_n; set of documents in each category
– new categorization N: categories S_1, S_2, …, S_n; set of documents in each category

Standard Alg: Compute Pr(C_i | d)
Enhanced Alg: Compute Pr(C_i | d, S)

Enhanced Naïve Bayes classifier
Use a tuning set to determine w.
– Defaults to standard Naïve Bayes if w = 0.
Only affects classification of borderline documents.

R. Agrawal and R. Srikant, “On Integrating Catalogs”, WWW 2001

Searching with Numbers

[Figure: product descriptions (IBM Thinkpad: 750 MHz Pentium 3, 196 MB DRAM, …; Dell Computer: 700 MHz Celeron, 256 MB SDRAM, …), the catalog database entries IBM Thinkpad (750 MHz, 196 MB) … and Dell (700 MHz, 256 MB), and number-only queries such as "800 200 3 lb" and "800 200"]

Reflectivity
If we get a close match on numbers, how likely is it that we have correctly matched attribute names?
– Likelihood

Non-reflectivity (of data)
Let
– D: dataset; n_i: co-ordinates of point x_i
– reflections(x_i): permutations of n_i
– [symbol lost in extraction](n_i): number of points within distance r of n_i
– [symbol lost in extraction](n_i): number of reflections within distance r of n_i

Empirical Results
Non-overlapping attributes: non-reflective.
– Memory: 64 - 512 MB, Disk: 10 - 40 GB
Correlations or clustering: low reflectivity.
– Memory: 64 - 512 MB, Disk: 10 - 100 GB

R. Agrawal and R. Srikant, “Searching with Numbers”, WWW 2002
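The Pure SQL-92 transform on the Query Mapping Layer slide can be sketched as a small query rewriter. This is only an illustration under stated assumptions, not the authors' implementation: the function name `horizontal_to_vertical_sql` is hypothetical, the keys are lower-cased for simplicity, and SQLite stands in for the database. It joins one alias of V per requested attribute on oid, as in the slide's example.

```python
import sqlite3


def horizontal_to_vertical_sql(attrs, vertical_table="V"):
    """Rewrite SELECT <attrs> FROM HV as a self-join over a vertical
    (oid, key, val) table. Hypothetical helper, for illustration only."""
    if not attrs:
        raise ValueError("need at least one attribute")
    # One alias of the vertical table per requested attribute: V1, V2, ...
    aliases = [f"V{i}" for i in range(1, len(attrs) + 1)]
    select = ", ".join(f"{a}.val" for a in aliases)
    from_clause = ", ".join(f"{vertical_table} {a}" for a in aliases)
    # Attribute names are bound as parameters rather than spliced into the SQL.
    conds = [f"{a}.key = ?" for a in aliases]
    # Tie every alias to the same object id as the first alias.
    conds += [f"{aliases[0]}.oid = {a}.oid" for a in aliases[1:]]
    sql = f"SELECT {select} FROM {from_clause} WHERE " + " AND ".join(conds)
    return sql, list(attrs)


# Illustrative data modelled on the slide's vertical table (keys lower-cased).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (oid INTEGER, key TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [
        (0, "name", "PANL75"),
        (0, "monitor", "7 inch"),
        (0, "recharge", "Built-in"),
        (0, "output", "Digital"),
        (1, "name", "KLH 221"),
        (1, "output", "S-Video"),
    ],
)

sql, params = horizontal_to_vertical_sql(["name", "output"])
rows = sorted(conn.execute(sql, params).fetchall())
print(sql)
print(rows)
```

Because this form uses inner joins, an object that lacks any one of the requested attributes drops out of the result, whereas a horizontal table would return a row with a null in that column.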
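The slides on the enhanced Naïve Bayes classifier state that Pr(C_i | d, S) reduces to the standard classifier when w = 0 and that only borderline documents change category, but they do not give the formula. The sketch below assumes one form with that property: the usual size-based prior for C_i is multiplied by (number of documents of S already predicted into C_i, plus 1) raised to the power w. The prior's form, the +1 smoothing, and the function name are all assumptions made for illustration.

```python
import math


def enhanced_nb_classify(log_likelihoods, category_sizes, predicted_in_s, w):
    """Pick the master category for one document d from new category S.

    log_likelihoods[i] : log Pr(d | C_i) from a standard Naive Bayes model
    category_sizes[i]  : number of documents in master category C_i
    predicted_in_s[i]  : documents of S that the standard classifier put in C_i
    w                  : weight of the new catalog's categorization (w = 0
                         gives the standard size-based prior)
    Assumed prior: Pr(C_i | S) proportional to |C_i| * (predicted_in_s[i] + 1) ** w.
    """
    scores = []
    for ll, size, n_pred in zip(log_likelihoods, category_sizes, predicted_in_s):
        # The +1 is an assumed smoothing so the log stays finite for zero counts.
        log_prior = math.log(size) + w * math.log(n_pred + 1)
        scores.append(log_prior + ll)
    # Normalisation is skipped: it does not change which category wins.
    return max(range(len(scores)), key=scores.__getitem__)


# A borderline document: the two likelihoods are close, and most of S
# has been predicted into category 1.
borderline = [-10.0, -10.2]
clear_cut = [-5.0, -20.0]
sizes = [100, 100]
s_counts = [2, 18]

print(enhanced_nb_classify(borderline, sizes, s_counts, w=0.0))
print(enhanced_nb_classify(borderline, sizes, s_counts, w=1.0))
print(enhanced_nb_classify(clear_cut, sizes, s_counts, w=1.0))
```

In this toy data the borderline document moves to the category favoured by its siblings in S once w > 0, while the clear-cut document keeps its category. In practice w would be chosen on a tuning set, as the slide notes.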