Mining E-Commerce Query Relations using Customer Interaction Networks

Bijaya Adhikari*, Parikshit Sondhi+, Wenke Zhang+, Mohit Sharma+ and B. Aditya Prakash*
*Department of Computer Science, Virginia Tech; +Walmart Labs
WWW, Lyon, April 26th, 2018

User Engagement
Engagement begins with a query submission (e.g., “sweater”), which produces a list of results. Based on the results, the user may click, purchase, or reformulate the query, either by specialization (e.g., “red sweater”) or by generalization.

Engagement data
The engagement log is organized by sessions. Each session consists of queries and the associated engagement, sorted by time. Engagement data captures user intent, a measure of satisfaction, and the evolution of intent, so the engagement log has many possible applications.

Application 1: Intent-Based Query Clustering
Grouping queries with similar intents is important because it improves search results. A natural question: how do we group queries with the same intent? For example, “Phone” and “Mobile” share an intent, as do “TV” and “Television”.

Application 2: Item Recommendation
Many queries have little or no engagement data (“rare” queries). Problem: how do we recommend items for such queries? Example: “unicorn”.

Application 3: Critical Queries
Queries impact each other: engagement data from one query can be used for another (e.g., “iPhone” and “Phone”). A natural question: which queries have the highest cumulative impact on others? These critical queries are meaningful representatives, can be used to improve other queries, and measure the state of the search system.

Challenges
A1: Intent-based query clustering. Naive methods such as string matching do not work well, and metadata such as the query category is not available.
A2: Item recommendation. The target queries have no engagement data.
A3: Mining critical queries. Impact and importance must first be characterized.

Main Idea
All three problems rely on the query/item relations recorded in the engagement log. Our idea: build graphs that capture these relations, then leverage the graphs to solve the problems.

Goal of our work
Generate: build meaningful Customer Interaction Networks (CINs) from engagement data.
Understand: develop insights into the nature of the generated networks.
Exploit: leverage the new insights for query mining tasks such as intent-based query clustering, item recommendation, and mining critical queries.

Outline
Motivation
Customer Interaction Networks
Application 1: Intent-Based Query Clustering
Application 2: Item Recommendation
Application 3: Critical Queries
Conclusion

Networks
From the engagement log we generate the following CINs: the Query Reformulation Network (QRN), the Item Click Network (ICN), the Composite Click Network (CCN), and the Cover Network (CN).

QRN: Query Reformulation Network
Captures query-to-query relations. Nodes are queries; edges are reformulations. Construction: consecutive queries in a session are linked, keeping only queries that meet a minimum appearance threshold and reformulations that meet a minimum reformulation threshold.
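
The construction above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function name `build_qrn`, the input shape (a list of time-ordered query lists, one per session), the choice to skip immediate resubmissions, and the default threshold values are all hypothetical.

```python
from collections import Counter

def build_qrn(sessions, min_appearance=2, min_reformulation=2):
    """Build a directed Query Reformulation Network from session logs.

    sessions: list of sessions, each a time-ordered list of query strings.
    min_appearance / min_reformulation: hypothetical threshold values.
    Returns {source query: {target query: reformulation count}}.
    """
    appearances = Counter(q for session in sessions for q in session)
    transitions = Counter()
    for session in sessions:
        # Every consecutive pair of queries in a session is one reformulation.
        for prev, nxt in zip(session, session[1:]):
            if prev != nxt:  # assumption: ignore resubmitting the same query
                transitions[(prev, nxt)] += 1

    qrn = {}
    for (src, dst), count in transitions.items():
        # Keep an edge only if both thresholds are met.
        if (count >= min_reformulation
                and appearances[src] >= min_appearance
                and appearances[dst] >= min_appearance):
            qrn.setdefault(src, {})[dst] = count
    return qrn
```

With three sessions where "sweater" is reformulated twice to "red sweater" and once to "blue sweater", only the "sweater" to "red sweater" edge survives the default thresholds.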

ICN: Item Click Network
Captures query-to-item relations. Nodes are queries and items; edges are clicks. Construction: edges come from the clicks observed in the log, keeping only queries that meet a minimum appearance threshold and query-item pairs that meet a minimum click threshold.
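
A comparable sketch for the ICN, again illustrative only. It assumes a hypothetical log format of one `(query, clicked_item)` record per query submission, with `None` when nothing was clicked; the function name and default thresholds are also assumptions.

```python
from collections import Counter

def build_icn(records, min_appearance=2, min_clicks=2):
    """Build a bipartite Item Click Network.

    records: iterable of (query, clicked_item) pairs, one per submission;
             clicked_item is None when the submission had no click.
    min_appearance / min_clicks: hypothetical threshold values.
    Returns {query: {item: click count}}.
    """
    records = list(records)
    appearances = Counter(q for q, _ in records)
    clicks = Counter((q, item) for q, item in records if item is not None)

    icn = {}
    for (query, item), count in clicks.items():
        # Keep a query-item edge only if both thresholds are met.
        if count >= min_clicks and appearances[query] >= min_appearance:
            icn.setdefault(query, {})[item] = count
    return icn
```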

Other CINs
CCN: Composite Click Network = Item Click Network + Query Reformulation Network.
CN: Cover Network.
We do not consider other networks that use content.

Empirical properties and implications
Network  # Nodes  # Edges  Degree Dist.  Assortativity  Diameter  Average CC  GCC
QRN      2.11 M   2.14 M   PL/LN         None           94        0.05        No
ICN      5.4 M    18.4 M   LN                           37        0.12
CCN      6.3 M    20.5 M   PL                           36        0.17
CN       785 K    71 M                   Positive       13        0.76        Yes
(PL = power law, LN = log-normal, CC = clustering coefficient, GCC = giant connected component)
Observations: low density, heavy-tailed degree distributions, positive assortativity, long diameter, low clustering, no GCC.
Implications: the networks are sparse and skewed; popular queries cluster together; queries with different intents are separated.

Empirical properties: summary
QRN: sparse, star-like, low clustering, long diameter.
ICN: sparse, low clustering, long diameter.
CCN: sparse, low clustering, long diameter.
Cover Network: denser, high clustering, shorter diameter; more useful.
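
Two of the tabulated properties, the average clustering coefficient and the size of the largest connected component, can be computed with plain Python. This is a minimal sketch that assumes an undirected adjacency dict without self-loops; the function names are hypothetical, and the slides do not say what fraction of nodes the authors required before calling a component "giant".

```python
from collections import deque

def average_clustering(adj):
    """adj: dict node -> set of neighbours (undirected, no self-loops).
    Nodes with degree < 2 contribute a clustering coefficient of 0."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges among the neighbours of `node`.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbr_list[j] in adj[nbr_list[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (BFS)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best / len(adj) if adj else 0.0
```

For a triangle plus a separate two-node edge, both functions return 0.6.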

Reminder: Intent-Based Query Clustering
Grouping queries with similar intents is important for improving search results, but query categories are not available for all queries. A natural question: how do we group queries with the same intent (e.g., “Phone” with “Mobile”, “TV” with “Television”)?

Informal problem on the QRN
Given: a Query Reformulation Network and an integer k.
Find: a k-partition of the network such that each partition contains queries with the same intent.
Why use the QRN? Network nature: similar queries are connected by an edge. Sparsity: queries are well separated.

Problem formulation
We want partitions that are homogeneous and that each have a different intent.
Homogeneity via shortest paths: each edge is a significant reformulation, so a short distance means a closer relationship and is a good candidate for intent similarity.
Different intents via high in-degree nodes: these tend to be general queries, because in-edges are generalizations, so they are good candidates to represent the intent of a broad region.

Putting it together
We want communities that are homogeneous and that each have a different intent. The quality of a partition therefore rewards a short distance to its center and a high in-degree center.

Problem Statement
Given a Query Reformulation Network G(V, E) and an integer k, find a set S of k general query nodes and a set C of k partitions of G such that each partition contains exactly one element of S and the partitioning minimizes the objective.

Our Method: Hub-Query-Expansion
Main idea: leverage the newly discovered properties of the QRN.
1. Detect communities in each connected component (no GCC).
2. Assign "hubs" to each community (skewed degree distribution).
3. Use BFS to expand the communities (sparsity).
Bottom line: linear time and space complexity, so the method is well suited for large networks.
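
The hub-and-BFS idea can be illustrated with a deliberately simplified stand-in, which is not the authors' Hub-Query-Expansion: it skips the per-component community detection step, picks the k highest in-degree queries globally as hubs, and runs one multi-source BFS on the undirected view of the QRN so every reachable query joins the hub that reaches it first. The function name and the tie-breaking rules are assumptions.

```python
from collections import deque

def hub_query_expansion(edges, k):
    """Simplified hub-and-BFS partitioning (hypothetical sketch).

    edges: list of (src, dst) reformulation pairs (directed).
    Returns {query: hub} for every query reachable from some hub.
    """
    in_degree, undirected = {}, {}
    for src, dst in edges:
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)
        undirected.setdefault(src, set()).add(dst)
        undirected.setdefault(dst, set()).add(src)

    # Hubs: the k highest in-degree queries (ties broken by name).
    hubs = sorted(in_degree, key=lambda n: (-in_degree[n], n))[:k]
    assignment = {hub: hub for hub in hubs}
    queue = deque(hubs)
    # Multi-source BFS: each query joins the hub whose frontier reaches it first.
    while queue:
        node = queue.popleft()
        for nbr in sorted(undirected[node]):  # sorted for deterministic ties
            if nbr not in assignment:
                assignment[nbr] = assignment[node]
                queue.append(nbr)
    return assignment
```

Apart from the hub sort, each node and edge is processed a constant number of times, which mirrors the linear-time claim on the slide.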

Baseline Methods
Some of the methods we considered:
BigClam [Yang+, 2013]: leverages latent relevance to find overlapping clusters.
Louvain [Blondel+, 2008]: modularity-based community detection.
LouvSmall: a modification of Louvain that generates smaller communities.
Star: generates star-shaped communities.

Category-Based Evaluation
Challenge: how do we evaluate with only partial ground truth (categories)?
AIH (Average Intent Homogeneity): intuitively measures category-based precision.
AIS (Average Inverse Spread): intuitively measures category-based recall.
F1: the harmonic mean of AIH and AIS.
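
The slides give only intuitive definitions, so the sketch below assumes concrete ones for illustration: a cluster's intent homogeneity is the share of its categorized queries that carry the cluster's majority category (AIH averages this over clusters), and a category's inverse spread is 1 divided by the number of clusters its queries fall into (AIS averages this over categories). These definitions and the function name are hypothetical; only the F1 step, the harmonic mean, follows the slide directly.

```python
from collections import Counter

def category_scores(clusters, category):
    """clusters: list of lists of queries.
    category: dict query -> category label, for the labelled subset only.
    Returns (AIH, AIS, F1) under the assumed definitions described above."""
    homogeneities = []
    for cluster in clusters:
        labels = [category[q] for q in cluster if q in category]
        if labels:
            # Share of labelled queries that carry the majority category.
            homogeneities.append(Counter(labels).most_common(1)[0][1] / len(labels))
    aih = sum(homogeneities) / len(homogeneities) if homogeneities else 0.0

    spread = {}
    for idx, cluster in enumerate(clusters):
        for q in cluster:
            if q in category:
                spread.setdefault(category[q], set()).add(idx)
    # Inverse spread: 1 / number of clusters a category is scattered across.
    inverse = [1.0 / len(ids) for ids in spread.values()]
    ais = sum(inverse) / len(inverse) if inverse else 0.0

    f1 = 2 * aih * ais / (aih + ais) if aih + ais > 0 else 0.0
    return aih, ais, f1
```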

Performance
HubQExpansion outperforms the baselines. More results are in the paper.

Qualitative results
Some example query clusters:
Cluster 1: zebra animal; zebra stuffed animal; plush zebra stuffed animal; stuffed animals zebra; zebra stuffed toy; stuffed zebras; zebra stuff animal; stuffed zebra; zebra plush
Cluster 2: 10 key for laptop; 10 key; 10key; 10 key calculator with paper; 10 key calculator with tape; 10 key calculator; 10 key printing calculator; 10 key pad; 10-key; 10 key keypad; 10 key usb keypad

Reminder: Item Recommendation
Many “rare” queries have little or no engagement data. How do we recommend items for such queries (e.g., “unicorn”)?

Item Recommendation: Problem
For queries with little or no engagement data, how do we recommend items? Example on the Composite Click Network: how do we recommend items for query 4 (Q4)?

Our Main Idea (details in the paper)
On the Composite Click Network, leverage random walks [Craswell+, 2007]: treat a query (e.g., Q4) as the source node and items as sink nodes, start multiple random walks from Q4, and recommend the products where most of the random walks end.
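
A minimal Monte Carlo sketch of this idea follows. Several assumptions apply: the CCN is an adjacency dict in which queries link to other queries and to clicked items, item nodes are absorbing sinks, neighbours are chosen uniformly rather than weighted by click or reformulation counts, and each walk is capped at a hypothetical `max_steps`. The function name and parameters are illustrative, and the paper's exact transition probabilities are not reproduced here.

```python
import random
from collections import Counter

def recommend_items(graph, items, source, num_walks=1000, max_steps=10, seed=0):
    """Rank items by how many random walks from `source` end on them.

    graph: dict node -> list of neighbours in the composite click network.
    items: set of item nodes, treated as absorbing sinks.
    """
    rng = random.Random(seed)
    endings = Counter()
    for _ in range(num_walks):
        node = source
        for _ in range(max_steps):
            neighbours = graph.get(node)
            if not neighbours:
                break  # dead end: this walk stops without reaching an item
            node = rng.choice(neighbours)
            if node in items:
                endings[node] += 1  # absorbed at an item
                break
    return [item for item, _ in endings.most_common()]
```

In a toy graph where Q4 has no clicks but reformulates to Q1 (clicked item I1) and Q2 (clicked items I1 and I2), I1 collects the most walk endings and ranks first.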

Experiments and Results
The method was implemented, deployed, and evaluated with an A/B test. Result: a 34% improvement in NDCG over the current Walmart.com search engine.
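
For reference, NDCG (normalized discounted cumulative gain) divides the DCG of the produced ranking by the DCG of the ideal ordering. The sketch below uses the common linear-gain, log2-discount form and computes the ideal ordering from the same list of judged results; the slides do not say which NDCG variant or cutoff the A/B test used, so both are assumptions.

```python
import math

def dcg(relevances):
    # Linear gain with a log2 position discount: position 1 is divided by log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    """relevances: graded relevance of the returned results, in ranked order."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

For example, ranking the only relevant result third out of three gives an NDCG@3 of 0.5.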

Reminder: Critical Queries
Queries impact each other: engagement data from one query can be used for another (e.g., “iPhone” and “Phone”). Which queries have the highest cumulative impact on others? Such critical queries are meaningful representatives, can be used to improve other queries, and measure the state of the search system.

Informal Problem
Given a Query Reformulation Network G(V, E) and an integer k, find a set T of k nodes such that the nodes in T have the highest cumulative impact on other queries. Key question: how do we model the impact of queries?

Our Idea: Model User Interaction
RUN (randomized user navigation) model: a simulated user starts from an arbitrary node and moves along reformulations. Example reformulation log: Q1, Q2, Q3, Q5.

Formal Problem
Find a set T of k nodes that maximizes φ(T), the expected number of times nodes in T appear in a reformulation path under the RUN model. The problem is NP-hard.
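
φ(T) can be estimated by simulating RUN paths. The sketch below fills in details the slides leave open, all of them assumptions: the start node is drawn uniformly, the next query is drawn in proportion to reformulation counts, and the user stops with a hypothetical probability `stop_prob` after each query (or at a query with no outgoing reformulations). The paper's exact RUN model may differ.

```python
import random

def sample_run_path(qrn, nodes, rng, stop_prob=0.3, max_len=50):
    """Simulate one reformulation path (assumed RUN dynamics).

    qrn: {query: {next query: reformulation count}}; nodes: list of queries.
    """
    node = rng.choice(nodes)  # assumption: uniform random start
    path = [node]
    while len(path) < max_len:
        out = qrn.get(node)
        if not out or rng.random() < stop_prob:
            break  # dead end or the simulated user stops
        targets = list(out)
        node = rng.choices(targets, weights=[out[t] for t in targets])[0]
        path.append(node)
    return path

def estimate_phi(qrn, nodes, T, num_paths=10000, seed=0):
    """Monte Carlo estimate of phi(T): expected number of appearances of
    nodes in T on a simulated path."""
    rng = random.Random(seed)
    T = set(T)
    total = 0
    for _ in range(num_paths):
        total += sum(1 for q in sample_run_path(qrn, nodes, rng) if q in T)
    return total / num_paths
```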

Our Method: CriticalQueries
Lemma 1: φ(T) is submodular and monotone, i.e., φ(A) ≤ φ(B) whenever A ⊆ B.
Consequently, the greedy algorithm that at each step adds the element with the largest marginal gain achieves a (1 − 1/e) approximation [Nemhauser+, 1978].
Speed-up: a sampling technique.
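
The greedy rule can be written generically for any monotone submodular set function. In the usage below, the objective is a hypothetical stand-in for φ(T), namely the number of sampled paths that touch at least one node of T (a coverage function, which is monotone and submodular); it only illustrates how sampled paths can replace exact expectations and is not the paper's objective.

```python
def greedy_max(candidates, k, objective):
    """Greedy maximisation of a monotone submodular set function:
    at each step, add the candidate with the largest marginal gain."""
    chosen = []
    current = objective(chosen)
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            gain = objective(chosen + [c]) - current
            if gain > best_gain:  # strict '>' keeps the first candidate on ties
                best, best_gain = c, gain
        chosen.append(best)
        current += best_gain
    return chosen

def path_coverage(paths):
    """Hypothetical stand-in objective: number of sampled paths touching T."""
    def objective(T):
        T = set(T)
        return sum(1 for p in paths if T.intersection(p))
    return objective
```

With sampled paths `[["a", "b"], ["b", "c"], ["c"], ["d"]]` and k = 2, greedy first picks "b" (covers two paths), then "c" (covers one new path).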

Baseline Methods
Many methods and definitions can be leveraged as baselines, including: the most frequent queries; the queries appearing in the most sessions; the queries with the highest PageRank; and the queries with the highest eigenvector centrality.
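
The PageRank baseline can be sketched with plain power iteration. This assumes the QRN is given as a dict of out-neighbour lists in which every node appears as a key, redistributes the rank of dangling nodes uniformly, and uses the conventional damping factor of 0.85; none of these settings come from the slides.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    graph: dict node -> list of out-neighbours (every node must be a key).
    Dangling nodes spread their rank uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:  # converged
            break
    return rank
```

The top-k queries by this score form the PageRank baseline set.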

Usability Evaluation Metrics
INFluenced Queries (InfQ): the number of related queries within some radius.
Sum of related Items (SumI): the number of related items shared with other queries within some radius.
Worked example on the slide (figure not recoverable): InfQ = 2, 3, 2; SumI = 5.
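
InfQ for a single query can be computed with a BFS that stops at the radius. This sketch treats the QRN as undirected and counts every other query within `radius` hops; the slides do not specify edge direction, the radius value, or how InfQ is aggregated over the set T, so those are assumptions, and the function name is hypothetical.

```python
from collections import deque

def influenced_queries(adj, query, radius):
    """Count queries other than `query` within `radius` hops.

    adj: dict node -> iterable of neighbours (undirected view of the QRN).
    """
    dist = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return len(dist) - 1  # exclude the query itself
```

On the chain a-b-c-d, query "a" influences one query at radius 1 and two at radius 2.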

Results: Objective
Our method achieves the highest value of the objective function φ(T) (higher is better).

Results: Usability
Our method has the best usability scores (higher is better).

Take-Aways
Customer Interaction Networks (e.g., the Query Reformulation Network, Item Click Network, and Cover Network) are useful: they have useful properties.
Applications: A1: Intent-Based Query Clustering; A2: Item Recommendation; A3: Critical Queries. The formulations and methods exploit the properties of CINs and are scalable.
Future work: add content; add query attributes; explore other applications such as type-ahead and query curation.

Any questions?
Slides: http://people.cs.vt.edu/~bijaya/
Walmart Labs is hiring @ job fair!