S. B. Roy, S. Amer-Yahia, A. Chawla, G. Das, and C. Yu. Constructing and Exploring Composite Items. SIGMOD 2010. 2011/4/14

Presentation transcript:

Outline
- Motivation
- Three challenges
- Maximal package construction
- Summarization
- Visual Effect
- Experiments
- Conclusion

Motivation
Nowadays, online shopping has become a daily activity. While many online sites are still centered around facilitating a user's interaction with individual items, increasing emphasis is being put on helping users explore composite items.
Figure labels: central item, budget, satellite item

Three challenges
The goal of this work is to develop a principled approach for constructing composite items and helping users explore them efficiently and effectively.
1. Identify all valid and maximal satellite packages for a central item.
2. Summarize the packages associated with a central item into k representative packages.
3. Efficiently identify an ordering of the k packages that maximizes the visual effect of diversity.

Valid Packages

Valid Packages (Cont.)
Compatible:

Example
Consider a user shopping for an iPhone for less than $350.

Maximal Packages
Central item            Satellite items                                                                     Total cost
iPhone 3G/8GB ($99)     S4case ($39.95), S2charger ($99), S1kit ($24.95), S4screen ($66), S1pen ($19.95)    $
iPhone 3GS/8GB ($199)   S2speaker ($149)                                                                    $348

Summarization
The set of maximal packages can still become very large in practice. Different maximal packages associated with the same central item may overlap significantly in their satellite items. Hence, the paper further proposes to summarize the maximal packages into a smaller summary set, Ic, containing k representative packages.
Central item      Satellite items
iPhone 3G/16GB    S2case, S4charger, S3cable, S3speaker
iPhone 3G/16GB    S2case, S4charger, S3speaker, S3screen, S1pen

Visual Effect
After obtaining the k summary packages, the question is how to present them to the user effectively. The approach uses diversity to rank the summary packages, so that the user is not shown a package that is too similar to the one just seen. The paper introduces the notion of satellite type prioritization:
- A user looking for an iPhone may prefer seeing variety in chargers over variety in speakers.
- Another user may prefer variety in protective screens over variety in cables.
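The idea of ordering packages by prioritized diversity can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the packages, the numeric priorities, and the scoring function (`diversity_score` is a stand-in, not the pv measure used in the slides). It simply scores an ordering by how much consecutive packages differ on high-priority satellite types and picks the best ordering by brute force.

```python
from itertools import permutations

# Hypothetical priorities: a higher weight means variety in that type matters more.
PRIORITY = {"charger": 3, "case": 2, "speaker": 1}

# Hypothetical summary packages, keyed by satellite type.
PACKAGES = {
    "p1": {"charger": "S2charger", "case": "S2case", "speaker": "S3speaker"},
    "p2": {"charger": "S2charger", "case": "S4case", "speaker": "S3speaker"},
    "p3": {"charger": "S4charger", "case": "S4case", "speaker": "S2speaker"},
    "p4": {"charger": "S4charger", "case": "S2case"},
}


def diversity_score(p, q, priority):
    """Sum the priorities of the satellite types on which two packages differ."""
    return sum(w for t, w in priority.items() if p.get(t) != q.get(t))


def ordering_score(order, packages, priority):
    """Total diversity between consecutive packages in a display order."""
    return sum(
        diversity_score(packages[a], packages[b], priority)
        for a, b in zip(order, order[1:])
    )


def best_ordering(packages, priority):
    """Brute force over all orderings; only practical for a small k."""
    return max(
        permutations(sorted(packages)),
        key=lambda order: ordering_score(order, packages, priority),
    )


if __name__ == "__main__":
    best = best_ordering(PACKAGES, PRIORITY)
    print(best, ordering_score(best, PACKAGES, PRIORITY))
```

Brute force tries k! orderings, so it only serves to make the objective concrete; the slides call for an efficient way to find the ordering.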

Visual Effect (Cont.)
pv(p1, p2) =
pv(p2, p3) =
pv(p3, p4) =
The first ordering: pv(p1, p2, p3, p4) =

Maximal package construction
Central item: iPhone 3G/16GB → $199
Budget: $300 → the budget for the satellite package is $101
Assume there are 5 satellite items: S1kit ($24.95), S3cable ($34.95), S3speaker ($64.95), S4screen ($66), S2pen ($9.95)
1. {S3cable} → $34.95 < $101
2. {S3cable, S3speaker} → $34.95 + $64.95 = $99.90 < $101
3. {S3cable, S3speaker, S2pen} → $34.95 + $64.95 + $9.95 = $109.85 > $101
4. {S3cable, S3speaker} is a maximal package.
5. Check whether {S3cable, S3speaker} already exists in Mc.
6. If it does not exist, set count({S3cable, S3speaker}) = 1; if it exists, increment count({S3cable, S3speaker}).
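The walk-through above can be sketched in code. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names are hypothetical, prices are stored in cents to avoid floating-point rounding, and each step picks uniformly among the items that still fit the budget (the slides instead try an item and stop once the budget is exceeded). The `counts` dictionary plays the role of Mc with its per-package counters.

```python
import random

# Prices in cents, taken from the worked example above.
PRICES = {
    "S1kit": 2495,
    "S3cable": 3495,
    "S3speaker": 6495,
    "S4screen": 6600,
    "S2pen": 995,
}
SATELLITE_BUDGET = 30000 - 19900  # $300 budget minus the $199 central item


def is_maximal(package, prices, budget):
    """A package is maximal if it fits the budget and no further item fits."""
    total = sum(prices[s] for s in package)
    can_grow = any(
        total + prices[s] <= budget for s in prices if s not in package
    )
    return total <= budget and not can_grow


def random_walk(prices, budget, rng):
    """Add random affordable items until nothing else fits (assumed variant)."""
    package, total = set(), 0
    while True:
        affordable = sorted(
            s for s in prices
            if s not in package and total + prices[s] <= budget
        )
        if not affordable:
            return frozenset(package)
        item = rng.choice(affordable)
        package.add(item)
        total += prices[item]


def sample_maximal_packages(prices, budget, walks, seed=0):
    """Run several walks and count how often each maximal package is found."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(walks):
        package = random_walk(prices, budget, rng)
        counts[package] = counts.get(package, 0) + 1
    return counts


if __name__ == "__main__":
    for package, count in sample_maximal_packages(PRICES, SATELLITE_BUDGET, 50).items():
        print(sorted(package), count)
```

With these prices, {S3cable, S3speaker} leaves only $1.10 of the $101 budget, less than the cheapest remaining item, which is why the example stops there.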

Summarization
The goal of summarization is to compute a set Ic of k representative maximal packages such that Coverage(Ic) is maximized.
Selecting p1 and p (2^3 - 1) = 279

Summarization (Cont.)
Baseline greedy algorithm, assuming k = 2:
1. Ic = {}
2. Add p1 to Ic.
3. Compute the coverage of p2, p3, and p4, each combined with p1.
4. The argmax is p3.
5. Add p3 to Ic.
6. Return Ic.
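The greedy steps above can be sketched as follows. Two things are assumptions here: the four packages are made up for illustration, and coverage is taken to be the number of distinct non-empty sub-packages contained in the chosen packages (the slides do not spell out the definition). Sub-packages are enumerated explicitly, which is only feasible for small packages.

```python
from itertools import combinations

# Hypothetical maximal packages for one central item.
MAXIMAL = {
    "p1": frozenset({"case", "charger", "cable", "speaker"}),
    "p2": frozenset({"case", "charger", "cable", "pen"}),
    "p3": frozenset({"screen", "pen", "kit", "cable"}),
    "p4": frozenset({"case", "speaker", "pen"}),
}


def sub_packages(package):
    """All non-empty subsets of a package."""
    items = sorted(package)
    return {
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    }


def coverage(packages):
    """Assumed definition: number of distinct sub-packages covered."""
    covered = set()
    for package in packages:
        covered |= sub_packages(package)
    return len(covered)


def greedy_summary(maximal, k):
    """Repeatedly add the package that yields the largest total coverage."""
    chosen, remaining = [], list(maximal)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda name: coverage([maximal[n] for n in chosen + [name]]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    summary = greedy_summary(MAXIMAL, 2)
    print(summary, coverage([MAXIMAL[n] for n in summary]))
```

With this made-up data the sketch happens to pick p1 and then p3, mirroring the order in the steps above; ties are broken in favor of the package listed first.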

Summarization (Cont.)
Because the coverage of multiple sets must be computed at each iteration, the baseline greedy algorithm can still be quite expensive in practice. The paper proposes the FastGreedy algorithm, which improves performance while maintaining the same approximation bound.
Key idea: Bonferroni upper and lower bounding techniques.
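To make the bounding idea concrete, the sketch below computes the first two Bonferroni bounds on the size of a union and compares them with the exact value. It assumes that coverage counts distinct non-empty sub-packages, so a package with n items covers 2^n - 1 of them and two packages share 2^m - 1, where m is the size of their intersection. The packages are hypothetical, and how FastGreedy actually uses such bounds is not shown in the slides.

```python
from itertools import combinations

# Hypothetical maximal packages.
PACKAGES = [
    frozenset({"case", "charger", "cable", "speaker"}),
    frozenset({"case", "charger", "cable", "pen"}),
    frozenset({"screen", "pen", "kit", "cable"}),
]


def bonferroni_bounds(packages):
    """Return (lower, upper) bounds on the number of covered sub-packages.

    upper = sum of individual sizes
    lower = upper minus the sum of pairwise intersection sizes
    """
    singles = sum(2 ** len(p) - 1 for p in packages)
    pairs = sum(2 ** len(p & q) - 1 for p, q in combinations(packages, 2))
    return singles - pairs, singles


def exact_coverage(packages):
    """Exact count by enumerating every non-empty sub-package."""
    covered = set()
    for package in packages:
        items = sorted(package)
        for r in range(1, len(items) + 1):
            covered.update(frozenset(c) for c in combinations(items, r))
    return len(covered)


if __name__ == "__main__":
    lower, upper = bonferroni_bounds(PACKAGES)
    print(lower, exact_coverage(PACKAGES), upper)
```

The bounds need only package sizes and pairwise intersections, which is much cheaper than enumerating sub-packages; a candidate whose upper bound falls below another candidate's lower bound can be discarded without an exact computation.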

Summarization (Cont.)
In practice, the number of maximal packages can be large, which limits how fast the summary can be generated. The paper describes a randomized algorithm that produces k representative packages directly from the set of compatible satellite items. It makes random walks similar to those used to generate the set of maximal packages, with two differences:
- It stops as soon as k packages are generated.
- Each random walk invoked from within Algorithm 4 is designed to generate a package that is as different as possible from the packages already discovered by the previous random walks.

Summarization (Cont.)
Algorithm 4 discovers the maximal satellite package p1 = {S1kit, S3speaker, S2pen} in the first iteration. In the second iteration, the probabilities of the items that appear in p1 are reduced: S1kit has a 16% probability of being chosen in the second iteration, compared with 20% in the first iteration.
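The effect of lowering the selection probability of already-used items can be sketched as below. The weighting rule is an assumption: each item starts with weight 1 and is multiplied by a hypothetical `penalty` factor for every discovered package that contains it. The slides do not give the actual rule, so this sketch reproduces the uniform 20% starting point but not necessarily the 16% figure.

```python
# Satellite items from the running example.
ITEMS = ["S1kit", "S3cable", "S3speaker", "S4screen", "S2pen"]


def selection_probabilities(items, discovered, penalty=0.5):
    """Probability of picking each item, down-weighting items already used.

    `penalty` is a hypothetical factor in (0, 1]; 1 means no down-weighting.
    """
    weights = {s: 1.0 for s in items}
    for package in discovered:
        for s in package:
            weights[s] *= penalty
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    first = selection_probabilities(ITEMS, [])
    second = selection_probabilities(ITEMS, [{"S1kit", "S3speaker", "S2pen"}])
    print(first["S1kit"], second["S1kit"], second["S3cable"])
```

Items outside p1 gain probability as items inside p1 lose it, which steers the next random walk toward a package that differs from p1.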

Visual Effect

Experiments
The number of maximal packages grows quickly:
- as the price budget goes up
- as the number of compatible satellite items increases

Conclusion
The paper designs and implements efficient algorithms to address three challenges:
1. Identify all valid and maximal satellite packages for a central item.
2. Summarize the packages associated with a central item into k representative packages.
3. Efficiently identify an ordering of the k packages that maximizes the visual effect of diversity.