1 Pengjie Ren, Zhumin Chen and Jun Ma. Information Retrieval Lab, Shandong University. Presenter: Pengjie Ren, November 18, 2013. Understanding Temporal Intent of User Query.

Presentation transcript:

1 Understanding Temporal Intent of User Query based on Time-based Query Classification. Pengjie Ren, Zhumin Chen and Jun Ma. Information Retrieval Lab, Shandong University. Presenter: Pengjie Ren, November 18, 2013.

2 Outline: Why Temporal Intent Detection?; Query Temporal Pattern Taxonomy; Query Pattern Detection Framework; Experiment Results; Application; Conclusion and Future Work

4 Why Temporal Intent Detection?
• Richard McCreadie (SIGIR 2013): users tend to prefer rankings that integrate tweets or newswire articles soon after an event breaks, while blogs and Wikipedia pages become more useful over time.
• Hideo Joho (WWW): [...]% of users seek information about the same day on which they perform the search; 32.7% look for past information; 8.1% look for future information; 10.9% say that their information needs have no specific temporal attributes.
Automatic temporal intent detection is therefore important for time-sensitive information retrieval, temporal diversity, and related tasks.

6 Different Temporal Patterns Imply Different Temporal Intents. Kulkarni et al. (WSDM 2011) found several temporal patterns of queries by mining query logs, but they did not propose methods to identify those patterns automatically. In this paper, we propose an approach to identify the different temporal patterns automatically. [Figure: query frequency curves from Google Trends]

7 Query Temporal Pattern Taxonomy. [Figure: frequency curves for the example queries "Java JDK", "Haiti Earthquake", "Christmas Present", and "Earthquake"] Clearly, we can use spikes to detect query temporal patterns.

8 What is a Spike? A spike is a set of contiguous points on the query frequency curve that burst together. Generally, it represents an event. Spikes are hard to detect effectively and precisely. In particular, we found that learning one cutoff line to identify all spikes is not effective. [Figure: spikes on the example curve labeled Southeast Asia earthquake, Pakistan earthquake, China earthquake, Haiti earthquake, Japan earthquake, Virginia earthquake]

10 Query Pattern Detection Framework. [Diagram: Query Classification System, with components Query Log, Query frequency curves, Preprocess, Feature Extraction, Training Set, Query Classifier (SVM), Query Pattern]

11 (1) Preprocess. According to time series analysis, any curve contains three components: a Trend Component, a Seasonal Component, and a Random Component. The seasonal and random components are what we care about in this paper, so we should remove the Trend Component. We use polynomial regression to model the Trend Component.
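A minimal sketch of the detrending idea on slide 11, assuming an ordinary least-squares polynomial fit (the next slide replaces the Gaussian assumption with a Student-t likelihood). The function names `fit_polynomial` and `detrend`, the default degree, and the rescaling of time to [0, 1] are illustrative choices, not taken from the slides.

```python
def fit_polynomial(ys, degree):
    """Least-squares polynomial fit to the points (t, ys[t]).

    Returns (coeffs, xs): coefficients lowest power first, and the
    rescaled time axis. Requires len(ys) > degree.
    """
    n = degree + 1
    # Rescale time to [0, 1] so that high powers stay well conditioned.
    xs = [t / max(len(ys) - 1, 1) for t in range(len(ys))]
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs, xs


def detrend(ys, degree=2):
    """Return (trend, residual); the residual keeps seasonal + random parts."""
    coeffs, xs = fit_polynomial(ys, degree)
    trend = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    residual = [y - m for y, m in zip(ys, trend)]
    return trend, residual
```

With this sketch, a curve that is a pure polynomial of the chosen degree leaves a residual of approximately zero, while a burst on top of a slow drift survives as the largest residual.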

12 (1) Preprocess. We use a Student-t distribution instead of a Gaussian distribution because we do not have exact training pairs (X, m_t) and have to use (X, F) instead. Thus, the S_t and Y_t components act as noise during training, and the Student-t distribution is more robust to noise than the Gaussian distribution. The loss function is the log likelihood. [Figure from PRML: Student-t and Gaussian fits with and without noise; without noise, both work well]
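To illustrate why a Student-t likelihood tolerates outliers better, the sketch below compares the per-point negative log-likelihood of a residual r under the two distributions. The scale `sigma` and degrees of freedom `nu` are assumed values for illustration only; the slides do not state the settings actually used.

```python
import math


def gaussian_nll(r, sigma=1.0):
    """Negative log-likelihood of residual r under N(0, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + r ** 2 / (2 * sigma ** 2)


def student_t_nll(r, sigma=1.0, nu=3.0):
    """Negative log-likelihood of r under a Student-t with scale sigma, nu d.o.f."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * sigma ** 2))
    return -log_norm + (nu + 1) / 2 * math.log1p(r ** 2 / (nu * sigma ** 2))


# The Gaussian penalty grows quadratically in r, the Student-t penalty only
# logarithmically, so a few large residuals (spikes) pull the fit far less.
for r in (0.5, 2.0, 10.0):
    print(r, round(gaussian_nll(r), 3), round(student_t_nll(r), 3))
```

In the setting of slide 12, the spikes and seasonal swings are the large residuals: under a Gaussian loss they would drag the fitted trend toward them, whereas the heavier tails of the Student-t leave the trend estimate closer to the slow-moving baseline.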

13 (1) Preprocess. [Figure panels: Original Query Curve; Trend Component; Seasonal & Random Component]

14 (2) Feature Extraction. For preprocessed query frequency curves, we define the following features.
• Basic Features: Mean, Standard Deviation, MR (Max Rate), SR (Spike Rate)
• Curve Distance Features: D_QoT, D_OQ, D_AMQ, D_PMQ
• Regression Features: Cutoff, Spikes, PD (Periodic Deviation)

15 MR (Max Rate)

16 SR (Spike Rate). [Figure panels: MQ, OQ, QoT] m is half the period of a spike.

17 SR (Spike Rate). How to determine the value of m?

18 Distance between Two Curves. F_i(q): the time series F_i shifted by q time units. ||·||: the l2 norm. This measure finds the optimal alignment (translation q) and the scaling coefficient α for matching the shapes of the two time series. The exact optimum is difficult to find; in practice, we try all possible shifts q to obtain an approximate solution. Reference: Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.
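The formula itself did not survive in the transcript. The sketch below assumes the shift- and scale-invariant distance from the cited Yang and Leskovec paper, d(x, y) = min over α and q of ||x − α·y(q)|| / ||x||, where for a fixed shift q the best α has the closed form x·y(q) / ||y(q)||². Zero padding when shifting, the `max_shift` parameter, and the function names are assumptions made for illustration.

```python
import math


def shift(series, q):
    """Shift right by q steps (left if q < 0), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def curve_distance(x, y, max_shift=None):
    """Approximate min over (alpha, q) of ||x - alpha * y_(q)|| / ||x||.

    x and y must have equal length. Every shift q in [-max_shift, max_shift]
    is tried, and alpha is solved in closed form for each one.
    """
    n = len(x)
    if max_shift is None:
        max_shift = n - 1
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        # Least-squares optimal scaling for this particular shift.
        alpha = sum(a * b for a, b in zip(x, yq)) / yy if yy > 0 else 0.0
        dist = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq))) / norm_x
        best = min(best, dist)
    return best
```

Two curves with the same shape but different height and timing get a distance near zero, which is the property that makes the measure useful for comparing query frequency curves whose bursts occur at different dates and volumes.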

19 Distance between Two Curves. Reference: Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.

20 D_QoT, D_OQ, D_AMQ, D_PMQ. D_QoT: average distance from annotated QoT curves. D_OQ: average distance from annotated OQ curves. D_AMQ: average distance from annotated AMQ curves. D_PMQ: average distance from annotated PMQ curves. Similar to KNN, but costs much less time.
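The four distance features can be sketched as one average per annotated class. The Euclidean `euclidean` helper below is only a stand-in so the example runs on its own; the slides use the shift- and scale-invariant curve distance instead. The function name and the shape of the `annotated` dictionary are assumptions.

```python
import math


def euclidean(a, b):
    """Stand-in distance between two equal-length curves (not the slides' measure)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def class_distance_features(curve, annotated, distance=euclidean):
    """Average distance from `curve` to the annotated curves of each class.

    `annotated` maps a class label (e.g. "QoT") to a non-empty list of curves.
    Returns a dict such as {"D_QoT": ..., "D_OQ": ...}.
    """
    return {
        "D_" + label: sum(distance(curve, ref) for ref in refs) / len(refs)
        for label, refs in annotated.items()
    }


features = class_distance_features(
    [0.0, 4.0, 0.0],
    {"QoT": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "OQ": [[0.0, 5.0, 0.0]]},
)
```

Unlike KNN, no neighbor search or sorting is needed: each feature is a single pass over one class, and the classifier decides how to weigh the four averages.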

21 Cutoff, Spikes, PD. Cutoff: the 8 features above are combined to learn a cutoff line. What about training data? The (F, Cutoff) pairs are not known, so we use the annotated pairs (F, Pattern Category) to approximate (F, Cutoff). For the example curve, because we annotate it as MQ, the cutoff value lies in the pink area. Spikes: number of spikes …… PD: measures periodicity ……
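Once a cutoff value is available for a curve, the Spikes feature can be sketched as counting the maximal runs of contiguous points above the cutoff, following the spike definition on slide 8. How the authors actually delimit spikes is not preserved in the transcript, so the function below is an assumption.

```python
def find_spikes(curve, cutoff):
    """Return (start, end) index pairs of maximal runs with values above cutoff."""
    spikes = []
    start = None
    for i, value in enumerate(curve):
        if value > cutoff and start is None:
            start = i                      # a new run begins
        elif value <= cutoff and start is not None:
            spikes.append((start, i - 1))  # the run ended at the previous point
            start = None
    if start is not None:                  # run that reaches the end of the curve
        spikes.append((start, len(curve) - 1))
    return spikes


spikes = find_spikes([1, 2, 9, 8, 1, 1, 7, 1, 6], cutoff=5)
num_spikes = len(spikes)
```

The example also shows why the cutoff matters: raising it above 7 would drop two of the three runs, which is consistent with learning the cutoff per curve rather than fixing one line for all queries.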

23 Experiment Results
• 5,000 queries from the Query Track of TREC.
• Corresponding query frequency files from Google Trends.
• Categories of these queries annotated manually according to their frequency curves.
• 5-fold cross-validation.
[Table: Classification Performance Comparison for Different Query Categories. Columns: QoT, OQ, AMQ, PMQ, average. Rows: P, R, F1. Values not preserved.]
[Figure panels: AMQ, PMQ, QoT, OQ]
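The P, R and F1 rows of the results table are per-class precision, recall and F1. A minimal sketch of how such scores are computed from true and predicted labels follows; the function name and the toy labels are illustrative, and no figures from the original table are reproduced.

```python
def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


scores = per_class_prf(["QoT", "QoT", "OQ", "OQ"], ["QoT", "OQ", "OQ", "OQ"])
```

Under 5-fold cross-validation, these scores would be computed on each held-out fold and then averaged.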

24 Feature Effectiveness Analysis

26 Application – Temporal Diversity. Because the temporal intent of a user query is uncertain, we should diversify the search results along the time dimension in order to cover more of the important time units of the query. Criteria: Temporal Intent Coverage, Subtopic Coverage, Novelty.

27 Application – Temporal Diversity. Compared methods: MMR (SIGIR’98), xQuAD (WWW’10), IA-Select (WSDM’09), LM+T+D (SIGIR’13), RM+T+S+D (our method).

29 Conclusion
• We cast temporal intent detection as a classification problem.
• We propose effective features for detecting temporal intents.
• We apply the temporal intent results to temporal diversity and achieve high performance.

30 Future Work: more effective features; the data sparsity problem for long queries.

31 Thanks a lot for your attention!