Catalog Integration. B2B electronics portal: 2000 categories, 200K datasheets.

Presentation transcript:

Catalog Integration

B2B electronics portal: 2000 categories, 200K datasheets.

[Figure: Master Catalog (ICs: DSP, Mem., Logic; documents a, b, c, d, e, f), New Catalog (ICs: Cat1, Cat2; documents x, y, z, w), and the result after integration (ICs: DSP, Mem., Logic; documents a, b, c, d, e, f, x, y, z, w).]

Goal
- Use affinity information in new catalog.
  - Products in same category are similar.
- Accuracy boost depends on match between two categorizations.

Problem Statement
Given
- master categorization M: categories C_1, C_2, …, C_n, with a set of documents in each category
- new categorization N: categories S_1, S_2, …, S_n, with a set of documents in each category
Standard Alg: compute Pr(C_i | d)
Enhanced Alg: compute Pr(C_i | d, S)

Enhanced Naïve Bayes classifier
- Use tuning set to determine w.
  - Defaults to standard Naïve Bayes if w = 0.
- Only affects classification of borderline documents.

Searching with Numbers

Empirical Results

Reflectivity
- If we get a close match on numbers, how likely is it that we have correctly matched attribute names?
  - Likelihood [symbol lost] non-reflectivity (of data)
- Let
  - D: dataset; n_i: co-ordinates of point x_i
  - reflections(x_i): permutations of n_i
  - [symbol lost](n_i): # of points within distance r of n_i
  - [symbol lost](n_i): # of reflections within distance r of n_i
- Non-overlapping attributes => non-reflective.
  - Memory: Mb, Disk: Gb
- Correlations or clustering => low reflectivity.
  - Memory: Mb, Disk: Gb

Database Technologies for Electronic Commerce
Rakesh Agrawal, Ramakrishnan Srikant, Yirong Xu

[Figure: product descriptions (IBM Thinkpad: 750 MHz Pentium 3, 196 MB DRAM, …; Dell Computer: 700 MHz Celeron, 256 MB SDRAM, …) mapped into a Catalog Database (IBM Thinkpad (750 MHz, 196 MB) …; Dell (700 MHz, 256 MB) …); a stray label "lb" also survives from the figure.]

R. Agrawal and R. Srikant, "Searching with Numbers", WWW 2002
R. Agrawal and R. Srikant, "On Integrating Catalogs", WWW 2001

Storage & Querying of eCommerce Data

1. Problem with Conventional Schema
- Large number of columns
- Sparsity
- Constant schema evolution
- Performance

2. Advantages of Vertical Schema
- Objects can have large number of attributes
- Handles sparseness well
- Easy schema evolution
- But … writing SQL is painful

3. Solution: Query Mapping Layer
- Hides complexity of vertical representation
- Fast performance

Architecture (top to bottom)
- eCommerce Applications: data appears stored in the conventional way.
    SELECT name, output FROM H
- Query Mapping Layer: query parsing and transformation, with two options:
  - Pure SQL-92 transform:
      SELECT V1.val, V2.val
      FROM V V1, V V2
      WHERE V1.key = 'name' AND V2.key = 'output' AND V1.oid = V2.oid
  - Optimized operator implementation
- Vertical Table (V)

Horizontal View (HV), as seen by eCommerce applications:
    SELECT name, output FROM HV

    name     monitor  recharge  output   scan  …
    PANL75   7 inch   Built-in  -        -     …
    KLH 221  -        -         S-Video  -     …

Vertical Table (V):

    oid  key       val
    0    name      PANL75
    0    monitor   7 inch
    0    Recharge  Built-in
    0    Output    Digital
    …    …         …
    1    name      KLH 221
    1    Output    S-Video

Recommendations for Database Vendors:
- Partial indices
- Enhanced Table Functions (TF)
- First-class treatment of TF
- Native support for v2h operation

Other Applications: stores for XML, RDF, LDAP and Data Mining

R. Agrawal, A. Somani and Y. Xu, "Storage and Querying of E-Commerce Data", VLDB 2001
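
The pure SQL-92 transform from the query mapping layer can be tried out with a small sketch. This is only an illustration under assumptions, not the authors' implementation: it uses Python's standard sqlite3 module, the helper name map_horizontal_query is hypothetical, and the attribute keys from the slide's vertical table are lowercased here so that they match the keys used in the slide's query.

```python
import sqlite3

def map_horizontal_query(attrs, table="V"):
    """Rewrite 'SELECT attrs FROM H' as a self-join over the vertical table.

    Hypothetical helper: one alias of the vertical table per requested
    attribute, joined on oid. The table name is interpolated directly,
    so it must come from trusted code, not user input.
    """
    if not attrs:
        raise ValueError("at least one attribute is required")
    aliases = [f"V{i + 1}" for i in range(len(attrs))]
    select = ", ".join(f"{a}.val" for a in aliases)
    from_ = ", ".join(f"{table} {a}" for a in aliases)
    key_preds = [f"{a}.key = ?" for a in aliases]
    join_preds = [f"{aliases[0]}.oid = {a}.oid" for a in aliases[1:]]
    where = " AND ".join(key_preds + join_preds)
    return f"SELECT {select} FROM {from_} WHERE {where}", list(attrs)

# Rows taken from the slide's vertical table (keys lowercased, an assumption).
rows = [
    (0, "name", "PANL75"), (0, "monitor", "7 inch"),
    (0, "recharge", "Built-in"), (0, "output", "Digital"),
    (1, "name", "KLH 221"), (1, "output", "S-Video"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (oid INTEGER, key TEXT, val TEXT)")
conn.executemany("INSERT INTO V VALUES (?, ?, ?)", rows)

# Equivalent of: SELECT name, output FROM H
sql, params = map_horizontal_query(["name", "output"])
result = sorted(conn.execute(sql, params).fetchall())
print(sql)
print(result)
```

Because this rewrite is an inner self-join, an object that lacks one of the requested attributes drops out of the result instead of appearing with an empty value, which is one way the plain rewrite differs from a true horizontal view.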
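
For the enhanced Naïve Bayes classifier in the catalog-integration slides, the effect of the weight w can be sketched numerically. The transcript does not give the formula for Pr(C_i | d, S), so the prior used below is an assumption: it scales the size of C_i by the number of documents in S that the standard classifier assigned to C_i, raised to the power w. This choice reduces to the standard class prior when w = 0, matching the slide. The function name and all numbers are hypothetical.

```python
import math

def enhanced_posterior(log_lik, cat_sizes, s_votes, w):
    """Return normalized Pr(C_i | d, S) for each master category C_i.

    Assumed prior (not stated in the transcript): |C_i| * votes_i ** w,
    where votes_i counts documents of S that standard Naive Bayes put
    in C_i. log_lik[i] is log Pr(d | C_i).
    """
    scores = []
    for ll, size, votes in zip(log_lik, cat_sizes, s_votes):
        if size == 0 or (w > 0 and votes == 0):
            scores.append(float("-inf"))  # category ruled out
        else:
            log_prior = math.log(size)
            if w > 0:
                log_prior += w * math.log(votes)
            scores.append(log_prior + ll)
    top = max(scores)
    if top == float("-inf"):
        raise ValueError("every category was ruled out")
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

cat_sizes = [100, 100]   # |C_1|, |C_2| (made-up numbers)
s_votes = [9, 1]         # most of S was classified into C_1

# Borderline document: likelihoods 0.4 vs 0.6.
borderline = [math.log(0.4), math.log(0.6)]
p_standard = enhanced_posterior(borderline, cat_sizes, s_votes, w=0)
p_enhanced = enhanced_posterior(borderline, cat_sizes, s_votes, w=1)

# Confident document: likelihoods 0.001 vs 0.999.
confident = [math.log(0.001), math.log(0.999)]
p_confident = enhanced_posterior(confident, cat_sizes, s_votes, w=1)

print(p_standard, p_enhanced, p_confident)
```

In this toy setup the borderline document flips from the second category to the first once the affinity information is weighted in, while the confidently classified document stays where it was, which is the behavior the slide summarizes as affecting only borderline documents. In practice w would be chosen on a tuning set, as the slide says.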