Personalized Query Classification Bin Cao, Qiang Yang, Derek Hao Hu, et al. Computer Science and Engineering Hong Kong UST.

Query Classification and Online Advertisement

QC as Machine Learning
Inspired by the KDDCUP’05 competition
– Classify a query into a ranked list of categories
– Queries are collected from real search engines
– Target categories are organized in a tree with each node being a category

Our QC Demo

Personalization
The aim of Personalized Query Classification is to classify a user query Q into a ranked list of predefined categories, where the ranking can differ for different users
Queries          Categories
golf             Car, Sports, Places
bass             Entertainment/Music, Living/Fishing
Michael Jordan   Information/Research, Sports/Basketball, Shopping

PQC: Personalized Query Classification
Classify a user query Q into a ranked list of categories for different users
Queries          Categories
golf             Car, Sports, Places
bass             Entertainment/Music, Living/Fishing
Michael Jordan   Information/Research, Sports/Basketball, Shopping

Question: Can we personalize search without user registration info? Profile based PQC Context based PQC Conclusion

Difficulties
Web queries are
– Short, sparse: “adi”, “cs”, “ps”
– Noisy: “contnt”, “gogle”
– New words are emerging all the time: “windows7”
Training data are hard for humans to label
– Experts may understand the same ambiguous query differently, e.g. “Apple”, “Office”, etc.

Method 1: Profile Based
Profile (U) = { } in the past
– Profile based Personalized Query Classification
[Figure: repeated lists of the user's past entries marked √ or -, leading to the query “Michael Jordan”]

Method 2: Context Based
Context = a session of user submitted queries
Example session: Graphical Model, Machine Learning, UCB, Michael Jordan

Outline Introduction Profile based PQC Context based PQC Conclusion

How to construct a user profile?
To achieve personalized query classification, under an independence assumption:
p(c|q,u) ∝ p(q|c)p(u|c)p(c)
ACM KDDCUP 2005 solution: estimating p(q|c)
Focus: estimating p(u|c) for personalization
Difficulty: sparseness
– Too many possible categories
– Limited information for each user
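The factorization on this slide can be sketched in a few lines. This is only an illustration: the function name `rank_categories` and every probability value below are hypothetical, chosen to show how a user term p(u|c) can reorder the categories for the same query.

```python
from typing import Dict, List, Tuple


def rank_categories(
    p_q_given_c: Dict[str, float],
    p_u_given_c: Dict[str, float],
    p_c: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank categories by p(c|q,u), proportional to p(q|c) * p(u|c) * p(c)."""
    scores = {
        c: p_q_given_c.get(c, 0.0) * p_u_given_c.get(c, 0.0) * prior
        for c, prior in p_c.items()
    }
    total = sum(scores.values())
    if total > 0:
        # Normalize so the scores form a distribution over categories.
        scores = {c: s / total for c, s in scores.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical estimates for the query "Michael Jordan".
p_q_given_c = {"Sports/Basketball": 0.6, "Information/Research": 0.3, "Shopping": 0.1}
# Hypothetical profile of a user who mostly issues research queries.
p_u_given_c = {"Sports/Basketball": 0.1, "Information/Research": 0.8, "Shopping": 0.1}
# Uniform prior over the three categories.
p_c = {c: 1.0 / 3.0 for c in p_q_given_c}

ranking = rank_categories(p_q_given_c, p_u_given_c, p_c)
```

With these made-up numbers the query term alone would favor Sports/Basketball, while the user term moves Information/Research to the top.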

Categorized Clickthrough Data: Too Few!
[Figure: Clickthrough Data, Search Engines]

Collaborative Classification
Leverage information from similar users: user-class matrix
         C1  C2  C3  C4  C5
User A   √   X   √   ?   X
User B   √   √   ?   X   √
User C   X   X   √   ?   X
User D   √   ?   √   √   X
√ interested in; X not interested in
An entry can also be a value indicating the degree of interest
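As a rough illustration of leveraging similar users, the sketch below fills a "?" entry of the user-class matrix with a similarity-weighted vote. The agreement-based similarity and the helper names `similarity` and `predict` are assumptions made for this example; the slides do not specify them, and this is a simple memory-based scheme rather than the collaborative ranking model proposed later.

```python
from typing import Dict, List, Optional

Row = List[Optional[int]]  # 1 = interested, 0 = not interested, None = unknown

# The user-class matrix from the slide (columns C1..C5).
matrix: Dict[str, Row] = {
    "A": [1, 0, 1, None, 0],
    "B": [1, 1, None, 0, 1],
    "C": [0, 0, 1, None, 0],
    "D": [1, None, 1, 1, 0],
}


def similarity(a: Row, b: Row) -> float:
    """Fraction of classes, known for both users, on which they agree."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(1 for x, y in shared if x == y) / len(shared)


def predict(data: Dict[str, Row], user: str, col: int) -> Optional[float]:
    """Similarity-weighted average of other users' known values in one column."""
    num = den = 0.0
    for other, row in data.items():
        if other == user or row[col] is None:
            continue
        weight = similarity(data[user], row)
        num += weight * row[col]
        den += weight
    return num / den if den > 0 else None


# Estimated degree of interest of User A in class C4 (column index 3).
interest_a_c4 = predict(matrix, "A", 3)
```

User D agrees with User A on every shared class, so D's interest in C4 dominates the estimate for A.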

Extending Collaborative Filtering (CF) Model to Ranking (Liu and Yang, SIGIR 2008)
Previous methods for CF:
– Memory based approach: finding users with similar interests to help predict missing values
– Model based approach: estimating probabilities based on a new user’s known values
We propose a collaborative ranking model to improve the model based approach
– Using preferences or rankings instead of rating values is better at estimating users’ preferences

Collaborative Ranking Framework
[Figure: boxes labeled Rating Database, Active User Ratings, Rating Prediction, Predicted Ratings, Sort, Ranking, and Item List (1. Item y2, 2. Item y3)]
Nathan Liu and Qiang Yang. EigenRank: Collaborative Filtering via Rank Aggregation. In ACM SIGIR Conference (ACM SIGIR 08), Singapore, 2008
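The difference between sorting predicted ratings and ranking directly from preferences can be sketched as follows. The win-minus-loss aggregation is a deliberately simple stand-in chosen for this example and is not the EigenRank algorithm; the item names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_rating(predicted: Dict[str, float]) -> List[str]:
    """Rating-oriented route: predict a rating per item, then sort."""
    return sorted(predicted, key=lambda item: predicted[item], reverse=True)


def rank_by_preferences(items: List[str], prefs: List[Tuple[str, str]]) -> List[str]:
    """Ranking-oriented route: aggregate pairwise preferences (preferred, other).

    Each item scores +1 per pair it wins and -1 per pair it loses.
    """
    score = {item: 0 for item in items}
    for preferred, other in prefs:
        score[preferred] += 1
        score[other] -= 1
    return sorted(items, key=lambda item: score[item], reverse=True)


# Hypothetical predicted ratings for the active user.
predicted = {"y1": 3.1, "y2": 4.6, "y3": 4.2}
# Hypothetical observed preferences: y2 over y1, y3 over y1, y2 over y3.
prefs = [("y2", "y1"), ("y3", "y1"), ("y2", "y3")]

by_rating = rank_by_rating(predicted)
by_prefs = rank_by_preferences(["y1", "y2", "y3"], prefs)
```

The second route never needs calibrated rating values, only relative order, which is the kind of signal clickthrough data tends to provide.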

Collaborative Ranking for Intention Mining
Input: Preference Matrix, with dimensions labeled |User| and |Preference={(URL1<URL2)}|
Output: Interest Score Matrix P(U|C), with dimensions labeled |user, or user group| and |Category| (|Intention category|)
Our objective is to uncover the interest probability P(U|C) consistent with the given observed preference for each query

Solution: Automatically Generate Labeled Data (to assist human labelers)
Clickthrough
– Connects queries and URLs
– Contains users’ personal interpretation of the query
[Figure: User A and User B, the same Query, clicked url a and url b, categories C1 and C2]
We need the category information for URLs …

Experimental Results: F1 metric

How to enlarge training set?
– A few human labeled data
– A HUGE number of clickthrough logs without labels
– Online Knowledge Bases, such as ODP, Wikipedia

Online Knowledge Base such as Wikipedia
– Plentiful Documents
– Links
– Meaningful Ontology

“Label” Retrieval from Online KB
Taking Online Commercial Intention as an example
Labels on result pages:
– Shopping: Commercial
– Sports: non-Commercial
– Video Games: Commercial
– Research: non-Commercial
Use labeled result pages as “Seeds” to retrieve the most relevant documents as training data
[Figure: Wikipedia Concept Graph]

Obtain “Pseudo-Relevance” Data
[Figure: a few human labeled data next to a HUGE number of clickthrough logs]
– We learn a classifier using the retrieved “labeled” documents
– We apply the classifier to “label” the HUGE clickthrough log
– We can use the HUGE “labeled” clickthrough log for evaluation
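The learn-then-label steps above can be sketched with a tiny multinomial Naive Bayes classifier. The choice of Naive Bayes, the whitespace tokenizer, the `page_text` field and all of the sample data are assumptions made for illustration; the slides do not say which classifier or features were used.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def train_nb(docs: Iterable[Tuple[str, str]]) -> Model:
    """Collect per-label word counts from (text, label) pairs."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    label_counts: Counter = Counter()
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model: Model, text: str) -> str:
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical documents retrieved from the knowledge base, labeled via the seeds.
retrieved = [
    ("buy cheap shoes online store", "commercial"),
    ("discount price order shipping", "commercial"),
    ("history of basketball league", "non-commercial"),
    ("lecture notes on graphical models", "non-commercial"),
]
# Hypothetical unlabeled clickthrough entries with the clicked page's text.
click_log = [
    {"query": "iverson shoes", "page_text": "order shoes with discount shipping"},
    {"query": "michael jordan", "page_text": "graphical models lecture history"},
]

model = train_nb(retrieved)
pseudo_labels: List[str] = [classify(model, entry["page_text"]) for entry in click_log]
```

The resulting pseudo-labels are only as good as the retrieved training documents, which is why the slides treat them as an aid rather than ground truth.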

Preliminary results on F(URL) → C
We evaluated the performance of the classifier trained with the relevant documents retrieved from Wikipedia
AOL query data set, 10,000 held out for test
F1 for 18 classes on AOL Query Classification task
Number of labeled seed queries   Training queries enriched by search snippets   Training documents retrieved from Wikipedia
100                              12%                                            28% (5,000 instances)
200                              21%                                            36% (10,000 instances)
400                              31%                                            38% (15,000 instances)

Outline Introduction Profile based PQC Context based PQC (Hao Hu, Huanhuan Cao, et al., SIGIR 2009, ACML) Conclusion

Context based PQC for Online Commercial Intention
The commercial intention of the same query can be identified given its context information
Example session: Allen Iverson, shoes, T-shirt, Michael Jordan
Commercial! Offer ads!

Context based PQC for Online Commercial Intention [Cao et al., SIGIR’09]
The commercial intention of the same query can be identified given its context information
Example session: Graphical Model, Machine Learning, UCB, Michael Jordan
Non-Commercial! Redirect to scholar search!

Two questions: How do we model query context? How do we detect whether two queries are semantically similar? Feature Generation/Enrichment Graphical Models

Conditional Random Field
Motivation: model the query logs as a conditional random field. Therefore, the relationships between consecutive and even skip queries can be modeled.
Question: How do we decide whether two “skip queries” (non-consecutive queries) are related and should be linked?

Semantic Relationship between queries
Given Query A and Query B, how do we determine how closely the two queries are related at the semantic level?
– Send queries to search engines
– Obtain search results
– Determine distance between search results
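One possible way to turn the last step into code is to compare bag-of-words vectors built from each query's result snippets. The cosine measure, the tokenizer and the snippets below are assumptions for illustration; the slides do not state which distance was used, and the snippets stand in for results that would come from a real search engine.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(snippets: List[str]) -> Counter:
    """Term-frequency vector over all result snippets of one query."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(snippet.lower().split())
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical result snippets for three queries.
results_a = ["michael jordan nba basketball chicago bulls", "jordan basketball shoes"]
results_b = ["allen iverson nba basketball philadelphia"]
results_c = ["graphical model machine learning berkeley"]

vec_a, vec_b, vec_c = (bag_of_words(r) for r in (results_a, results_b, results_c))
sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
# A distance can be derived as 1 - similarity.
dist_ab = 1.0 - sim_ab
```

Queries whose result pages share vocabulary end up close together even when the query strings themselves have no words in common.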

Evaluation Using context information Vs Not using context information

Preliminary Experimental Results of PQC for Online Commercial Intention
Dataset
– AOL Query Log data
– About 20M Web queries
– About 650K Web users
– Data is sorted by anonymous UserID and sequentially arranged
Each item of clickthrough log data contains
– {AnonID, Query, QueryTime, ItemRank, ClickURL}
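A record with the five fields listed above could be read and grouped into sessions roughly as follows. The tab-separated layout, the timestamp format, the 30-minute session gap and the sample lines are all assumptions made for this sketch; the slides only name the fields and state that the data is sorted by user.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class LogRecord(NamedTuple):
    anon_id: str
    query: str
    query_time: datetime
    item_rank: Optional[int]  # None when the query had no click
    click_url: Optional[str]


def parse_line(line: str) -> LogRecord:
    """Parse one tab-separated line; ItemRank and ClickURL may be missing."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))
    anon_id, query, query_time, item_rank, click_url = fields[:5]
    return LogRecord(
        anon_id,
        query,
        datetime.strptime(query_time, "%Y-%m-%d %H:%M:%S"),
        int(item_rank) if item_rank else None,
        click_url or None,
    )


def split_sessions(
    records: List[LogRecord], gap: timedelta = timedelta(minutes=30)
) -> List[List[LogRecord]]:
    """Group records (already sorted by user, then time) into sessions."""
    sessions: List[List[LogRecord]] = []
    for rec in records:
        last = sessions[-1][-1] if sessions else None
        if last and last.anon_id == rec.anon_id and rec.query_time - last.query_time <= gap:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions


# Hypothetical log lines in the assumed layout.
lines = [
    "42\tgraphical model\t2006-03-01 10:00:00\t\t",
    "42\tmichael jordan\t2006-03-01 10:05:00\t1\thttp://example.edu",
    "42\tgolf\t2006-03-02 09:00:00",
]
sessions = split_sessions([parse_line(line) for line in lines])
```

Each resulting session is the "context" that the context based method conditions on when classifying its queries.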

Preliminary Results
In our preliminary experimental studies, we annotated the clickthrough logs of four users with their OCI (commercial / non-commercial) status. Larger-scale experimental studies will follow.
Evaluation Metric: Standard F1-measure
Baseline classifier: the classifier in Dai’s WWW 2006 work
F1 for users on AOL Data
Model                    User 1  User 2  User 3  User 4
Baseline (non-context)   83.4%   82.3%   84.0%   83.1%
Context based PQC        92.7%   94.2%   91.3%   92.6%
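For reference, the standard F1-measure is the harmonic mean of precision and recall. The sketch below computes it for binary commercial (1) versus non-commercial (0) labels; the label vectors are made up and are unrelated to the numbers reported in the table.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations (gold) and classifier outputs for five queries.
gold = [1, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```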

Preliminary Results The parameter we tune is the threshold we use to determine whether we add the “skip edges” in the CRF model or not.
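The role of this threshold can be sketched as follows: consecutive queries in a session are always linked, while a non-consecutive pair receives a skip edge only if its similarity reaches the threshold. The function `build_edges` and the similarity values are hypothetical; in the slides the similarity would come from comparing search results.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]


def build_edges(
    n_queries: int,
    similarity: Callable[[int, int], float],
    threshold: float,
) -> Tuple[List[Edge], List[Edge]]:
    """Return (consecutive edges, skip edges) over query positions 0..n-1."""
    consecutive = [(i, i + 1) for i in range(n_queries - 1)]
    skip = [
        (i, j)
        for i in range(n_queries)
        for j in range(i + 2, n_queries)  # only non-consecutive pairs
        if similarity(i, j) >= threshold
    ]
    return consecutive, skip


session = ["graphical model", "machine learning", "ucb", "michael jordan"]
# Hypothetical pairwise similarities between non-consecutive queries.
sim_table: Dict[Edge, float] = {(0, 2): 0.2, (0, 3): 0.7, (1, 3): 0.6}


def lookup(i: int, j: int) -> float:
    return sim_table.get((i, j), 0.0)


consecutive, skip_loose = build_edges(len(session), lookup, threshold=0.5)
_, skip_strict = build_edges(len(session), lookup, threshold=0.65)
```

Raising the threshold prunes skip edges and makes the model closer to a linear chain; lowering it adds more long-range links at a higher inference cost.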

Ongoing work: Personalized Query Classification Efficiency More ground truth data for evaluation

PQC and Personalized Search
Similar input:
– Query Log, Clickthrough Data, IP Address, etc.
Different output:
– Personalized Search: ranked results
– PQC: discrete intention categories; application: advertisements, etc.

Conclusions: PQC Have user profile information? Profile = Output=Class Method = Collaborative Ranking Have query stream information? Context = Output=Class Method = CRF-based method

Q & A