

Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification
Wen Chan 1, Jintao Du 1, Weidong Yang 1, Jinhui Tang 2, Xiangdong Zhou 1
1 School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University
2 Nanjing University of Science and Technology

CIKM’14 Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification
Outline
1 The Motivation
2 Existing Solutions & Related Work
3 Our Approach
4 Experimental Analysis
5 Conclusion

cQAs
cQA: community question answering services
- Yahoo! Answers (YA): multilingual, mainly English
- Baidu Zhidao: Chinese
YA now contains more than 10,000,000,000 resolved questions!

Community Question Retrieval
Since cQA repositories have accumulated many question/answer pairs, one can search the archived questions to find relevant ones and reuse the corresponding answers.
Figure labels: a new question; possible answers; ‘resolved’ questions and their answers; semantically equivalent; question archives; semantic gap.

Our first motivation
Existing retrieval models usually consider only the query terms’ global statistics, so all terms of the input query are treated equally. They thus neglect the “local importance” of a term for a specific query.
Figure labels: query q: t1, t2, …, tn; archived question d; question collections.

Query term’s local importance
For a particular question to search, some essential meaning is hidden in its local context.
Q: What are some of the best asian buffets restaurants in the ChicagoLand area?
The terms ‘buffets’ and ‘ChicagoLand’, which reflect the ‘essence’ of this query, should be more informative than the other, peripheral terms.

How do we measure the local importance?
In cQAs, a question usually corresponds to a predefined category, which contains questions on the same topic. Terms that are significant in classification are also helpful in retrieval. Thus we obtain a term’s local importance (weight) from the dynamic classification process.
Example query: good thin crust pizza delivery in chicago?

Related work
Previous research has shown that question retrieval can be improved by category information (Cao et al., 2010; Ming et al., 2010; Cao et al., 2012). However, these methods typically use the category information of the archived questions, not the real-time classification of the query question.

The second motivation
The textual similarity gives $\mathrm{Sim}(q, d_1) > \mathrm{Sim}(q, d_2)$. However, $q$ and $d_2$ share the same category information, so $d_2$ is more relevant to the query.

Question Reranking
We propose a reranking method for post-processing the retrieval scores. It ensures that semantically related questions receive similar scores, which further improves retrieval performance.
No other research deals with question reranking in cQA retrieval!

Workflow of Our Approach
1 Hierarchical classification
2 Selection of informative query terms
3 Result reranking

The Contributions of Our Paper
- We investigate techniques of term selection and weighting for question retrieval based on a novel question classification approach.
- We propose a general cQA question retrieval model that integrates both global statistics and local context information.
- We are the first to propose a reranking method for question retrieval.

The hierarchical classification method
Some notation:
- $A(i)$: ancestors of node $i$; $A^{+}(i) = A(i) \cup \{i\}$
- $C(i)$: children of node $i$
- $D(i)$: descendants of node $i$
- $S(i)$: siblings of node $i$
Input $x \in X \subseteq \mathbb{R}^d$, output $y \in Y = \{1, 2, \dots, m\}$.
Hierarchical SVM: learn $m$ classifiers, one classifier $w_i \in \mathbb{R}^d$ for each category node $i$, $i = 1, 2, \dots, m$.
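
The notation above can be illustrated with a small sketch. The tree below is only an assumed reading of the example category hierarchy shown later in the slides, and the `PARENT` table and function names are hypothetical, introduced purely for illustration.

```python
# Hypothetical toy hierarchy (an assumed reading of the slides' example figure):
# each node maps to its parent, and the root has no parent.
PARENT = {
    "root": None,
    "Dining Out": "root",
    "Sports": "root",
    "United States": "Dining Out",
    "Great Britain": "Dining Out",
    "Chicago": "United States",
    "New York": "United States",
    "London": "Great Britain",
}


def ancestors(i):
    """A(i): all proper ancestors of node i, nearest first."""
    result = []
    node = PARENT[i]
    while node is not None:
        result.append(node)
        node = PARENT[node]
    return result


def ancestors_plus(i):
    """A+(i) = A(i) ∪ {i}."""
    return [i] + ancestors(i)


def children(i):
    """C(i): direct children of node i, in sorted order."""
    return sorted(n for n, p in PARENT.items() if p == i)


def descendants(i):
    """D(i): all nodes strictly below node i."""
    result = []
    for child in children(i):
        result.append(child)
        result.extend(descendants(child))
    return result


def siblings(i):
    """S(i): nodes sharing a parent with i, excluding i itself."""
    parent = PARENT[i]
    if parent is None:
        return []
    return [n for n in children(parent) if n != i]
```

Under this assumed tree, `ancestors_plus("Chicago")` walks from the leaf up to the root, which is the category label path used later for term weighting.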

Hierarchical SVM
For hierarchical SVM, we solve the following problem: [Equation (1): not preserved in the transcript]
Slack variables: greater than 1 for the linearly inseparable cases.

The hybrid regularization
- Sparse (the first term): use a small, compact subset of the features that contribute most to classification.
- Orthogonal (the second and third terms): make category nodes on different hierarchy levels use distinct features, to enhance the discriminative power of the classifiers.

Term Selection and Weighting
The query is automatically classified to the “Dining Out -> United States -> Chicago” category label path.
Query: good thin crust pizza delivery in chicago
Figure labels (category hierarchy): root; Dining Out; United States; Chicago; New York; Great Britain; London; Sports
Predicted path: root; Dining Out; United States; Chicago
Term weights (from the hyperplane normal vectors): pizza: 0.5; chicago: 0.7; ……
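
A minimal sketch of how term weights might be read off the classifiers along the predicted path. The slide does not say how the per-node weights are combined, so summing the positive components is an assumption made here. The `CLASSIFIERS` values are invented so that the totals match the slide's example weights, and `raw_term_weights` is a hypothetical name.

```python
# Hypothetical sparse classifier weights (hyperplane normal vectors),
# one term -> weight dict per category node. The values are invented.
CLASSIFIERS = {
    "Dining Out": {"pizza": 0.3, "restaurant": 0.4},
    "United States": {"chicago": 0.2},
    "Chicago": {"chicago": 0.5, "pizza": 0.2},
}


def raw_term_weights(query_terms, path, classifiers):
    """Sum each query term's positive weight over the nodes on the path.

    Assumption: weights are combined by summation. Terms that no
    classifier on the path uses get no entry, which acts as term selection.
    """
    weights = {}
    for term in query_terms:
        total = sum(
            max(classifiers.get(node, {}).get(term, 0.0), 0.0) for node in path
        )
        if total > 0.0:
            weights[term] = total
    return weights


query = "good thin crust pizza delivery in chicago".split()
path = ["Dining Out", "United States", "Chicago"]
selected = raw_term_weights(query, path, CLASSIFIERS)
```

With these invented values, only "pizza" and "chicago" survive, with weights of about 0.5 and 0.7. Because the classifiers are sparse, most query terms drop out on their own.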

The ranking score
We use a Gibbs-like function to transform the raw weight ψ into a probability distribution over the local weights of the informative query terms.
The ranking score of archived question d for query q combines the following factors [formula not preserved in the transcript]:
- Local importance
- Term frequency
- Global statistic, which can be obtained from the Okapi, LM, or TrLM model
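
A minimal sketch of the weighting step, assuming the "Gibbs-like function" is a softmax over the raw weights and that the three factors are multiplied per term and summed. Neither assumption is confirmed by the transcript, and `gibbs_weights`, `ranking_score`, and the `temperature` parameter are hypothetical.

```python
import math


def gibbs_weights(raw, temperature=1.0):
    """Map raw term weights to a probability distribution (softmax).

    Assumption: the slide's Gibbs-like function has this exponential form.
    """
    peak = max(raw.values())  # subtract the maximum for numerical stability
    exps = {t: math.exp((w - peak) / temperature) for t, w in raw.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def ranking_score(query_tf, local, global_score):
    """Sum of local importance * query term frequency * global statistic.

    `global_score(term)` stands in for a per-term score of one archived
    question from a model such as Okapi, LM, or TrLM.
    """
    return sum(local[t] * query_tf.get(t, 0) * global_score(t) for t in local)


local = gibbs_weights({"pizza": 0.5, "chicago": 0.7})
score = ranking_score({"pizza": 1, "chicago": 1}, local, lambda term: 2.0)
```

The local weights sum to one, so when every term has the same global score the result equals that global score. The local weights only change the ranking when the per-term global scores differ.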

Question Reranking
Hypothesis: given the same request, closely related documents should have similar scores.
Let $y = \{y_1, \dots, y_n\}$ be the initial retrieval scores and $f = \{f_1, \dots, f_n\}$ the regularized scores. We then minimize an objective with two parts:
- $S(f)$: inter-document (question) consistency. We linearly combine the two similarities to compute $S(f)$.
- $\varepsilon(f)$: consistency between the new scores and the old scores.
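
The transcript does not preserve the exact objective, so the following is only a sketch under an assumed standard form: $S(f) = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$ and $\varepsilon(f) = \sum_i (f_i - y_i)^2$, with the sum $S(f) + \mu\,\varepsilon(f)$ minimized. The symmetric similarity matrix `W`, the trade-off `mu`, and the function name are all hypothetical.

```python
def rerank_scores(y, W, mu=1.0, iterations=200):
    """Smooth initial scores y over a symmetric similarity graph W.

    Jacobi iteration on the stationarity condition of the assumed objective:
        f_i = (mu * y_i + sum_j W[i][j] * f_j) / (mu + sum_j W[i][j])
    """
    n = len(y)
    f = list(y)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            neighbor_sum = sum(W[i][j] * f[j] for j in range(n) if j != i)
            degree = sum(W[i][j] for j in range(n) if j != i)
            updated.append((mu * y[i] + neighbor_sum) / (mu + degree))
        f = updated
    return f


# Questions 0 and 1 are closely related; question 2 is unrelated to both.
initial = [1.0, 0.2, 0.9]
similarity = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = rerank_scores(initial, similarity)
```

In this toy case the two related questions are pulled toward each other while the unrelated one keeps its initial score. This matches the hypothesis that closely related documents should receive similar scores.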

Experimental Setup
Question archives: > 1,000,000 questions from Yahoo! Answers.
We select 120 questions as queries:
– 20 validation queries for tuning parameters,
– the remaining 100 queries for evaluating the retrieval performance.
The relevance of the 4147 questions returned by the different models is judged by volunteers; every result is labeled by at least two people.
We use the whole dataset to train the classifiers and tune for the best performance via cross-validation.

Experimental Results
Evaluating the effect of the different components of our model.

Experimental Results
Comparison with methods that also utilize category information:
- LS [7]: uses the leaf category for query term smoothing.
- DE [23]: weights terms by exploring three domain evidences of archived questions.
- QC [9]: multiplies the initial score of historical questions by the probabilities of these categories.

Different Similarities for Inter-Question Consistency
In the result reranking method, the consistency between question documents i and j is [formula not preserved in the transcript], where:
- VecSim denotes document vector similarity;
- CatSim denotes category-based similarity.
Using the combined similarity is significantly better than using either of them alone.
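
A minimal sketch of a linearly combined similarity. The slides name VecSim and CatSim but the transcript does not preserve their definitions, so two assumptions are made here: VecSim is taken to be cosine similarity over sparse term vectors, and CatSim the fraction of the category path that two questions share. The mixing weight `lam` and all function names are hypothetical.

```python
import math


def vec_sim(a, b):
    """Cosine similarity between sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cat_sim(path_a, path_b):
    """Assumed category similarity: shared path prefix / longer path length."""
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    longest = max(len(path_a), len(path_b))
    return shared / longest if longest else 0.0


def combined_sim(a, b, path_a, path_b, lam=0.5):
    """Linear combination of the two similarities."""
    return lam * vec_sim(a, b) + (1.0 - lam) * cat_sim(path_a, path_b)
```

Two questions with no words in common can still get a non-zero combined similarity if they sit in the same category, which is the case the second motivation describes.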

The Effect of Sparse Regularization
Figure: MAP for the L2 regularization and the proposed hybrid regularization in the classification framework.
Using the hybrid regularization term, we get sparse classifier parameters (a small subset of genuinely helpful features). Thus we can focus on highlighting the really informative query terms to clarify the ‘essence’ of the search.

Conclusion
- In this paper, we proposed a sparse hierarchical classification method that mimics the manual labeling process to clarify the local importance of query terms.
- We presented a general question retrieval framework for improving retrieval performance.
- We are the first to propose a reranking method for smoothing the related questions of the initial results.
