
Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification
Wen Chan¹, Jintao Du¹, Weidong Yang¹, Jinhui Tang², Xiangdong Zhou¹
¹ School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University
² Nanjing University of Science and Technology

CIKM’14 Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification
Outline
1. The Motivation
2. Existing Solutions & Related Work
3. Our Approach
4. Experimental Analysis
5. Conclusion

cQAs
cQAs: community Question Answering services
– Yahoo! Answers: multilingual, mainly English
– Baidu Zhidao: Chinese
Yahoo! Answers (YA) now contains more than 10,000,000,000 resolved questions!

Community Question Retrieval
Since cQA repositories have accumulated many question/answer pairs, one can search the archived questions to find relevant ones and reuse their answers.
[Figure: a new question is matched against “resolved” questions in the question archives; the answers of semantically equivalent archived questions serve as possible answers, but a semantic gap separates the new question from the archived ones.]

Our First Motivation
Existing models usually consider only the global statistics of the query terms, and all terms of the input query are treated equally. They therefore neglect the “local importance” of a term for a specific query.
[Figure: a query q = t1, t2, …, tn is matched against an archived question d from the question collection.]

Query Term’s Local Importance
For a particular question being searched, some essential meanings are hidden in its local context.
Q: What are some of the best asian buffets restaurants in the ChicagoLand area?
The terms ‘buffets’ and ‘ChicagoLand’, which reflect the ‘essence’ of this query, should be more informative than the other, peripheral terms.

How Do We Measure the Local Importance?
In cQAs, a question usually belongs to a predefined category that contains questions on the same topic. Terms that are significant for classification are also helpful for retrieval, so we obtain each term’s local importance (weight) from the dynamic classification of the query.
Example query: good thin crust pizza dillivery in chicago?

Related Work
Previous research has shown that question retrieval can be improved with category information (Cao et al., 2010; Ming et al., 2010; Cao et al., 2012). However, these methods usually use the category information of the archived questions rather than a real-time classification of the query question.

The Second Motivation
By textual similarity, Sim(q, d1) > Sim(q, d2). However, d2 shares the same category as q and is in fact more relevant to the query.

Question Reranking
We propose a reranking method that post-processes the retrieval scores so that semantically related questions receive similar scores, further improving retrieval performance. No other research has addressed question reranking in cQA retrieval!

Workflow of Our Approach
Hierarchical classification -> selection of informative query terms -> result reranking

The Contributions of Our Paper
– We investigate techniques for term selection and weighting in question retrieval, based on a novel question classification approach;
– We propose a general cQA question retrieval model that integrates both global statistics and local context information;
– We are the first to propose a reranking method for question retrieval.

The Hierarchical Classification Method
Some notation:
– $A(i)$: ancestors of node $i$; $A^+(i) = A(i) \cup \{i\}$
– $C(i)$: children of node $i$
– $D(i)$: descendants of node $i$
– $S(i)$: siblings of node $i$
– Input $x \in \mathcal{X} \subseteq \mathbb{R}^d$; output $y \in \mathcal{Y} = \{1, 2, \dots, m\}$
Hierarchical SVM: learn $m$ classifiers, one classifier $w_i \in \mathbb{R}^d$ for each category node $i$, $i = 1, 2, \dots, m$.
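To make the tree notation concrete, here is a minimal Python sketch of A(i), A⁺(i), C(i), D(i) and S(i) over a category tree stored as a child-to-parent map. The tree itself is a hypothetical example assembled from the category labels that appear in the term-weighting example (root, Dining Out, Sports, United States, …), and the helper names are ours, not the paper's.

```python
from typing import Dict, Optional, Set

# Hypothetical category tree (child -> parent), assembled from the example labels.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Dining Out": "root",
    "Sports": "root",
    "United States": "Dining Out",
    "Great Britain": "Dining Out",
    "Chicago": "United States",
    "New York": "United States",
    "London": "Great Britain",
}


def ancestors(i: str) -> Set[str]:
    """A(i): every proper ancestor of node i, up to the root."""
    result: Set[str] = set()
    p = PARENT[i]
    while p is not None:
        result.add(p)
        p = PARENT[p]
    return result


def ancestors_plus(i: str) -> Set[str]:
    """A+(i) = A(i) ∪ {i}."""
    return ancestors(i) | {i}


def children(i: str) -> Set[str]:
    """C(i): direct children of node i."""
    return {c for c, p in PARENT.items() if p == i}


def descendants(i: str) -> Set[str]:
    """D(i): every node in the subtree below i (i itself excluded)."""
    result: Set[str] = set()
    stack = list(children(i))
    while stack:
        c = stack.pop()
        result.add(c)
        stack.extend(children(c))
    return result


def siblings(i: str) -> Set[str]:
    """S(i): nodes that share i's parent, excluding i; the root has none."""
    p = PARENT[i]
    return set() if p is None else children(p) - {i}


print(siblings("Chicago"))  # prints {'New York'}
```

A parent map keeps every relation derivable from one table; for a large hierarchy one would precompute the children lists instead of rescanning the map on each call.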

Hierarchical SVM
For the hierarchical SVM, we solve the following problem:
[Equation (1): the hierarchical SVM optimization problem]
where the slack variables exceed 1 for the linearly inseparable cases.

The Hybrid Regularization
– Sparse (the first term): uses a small, compact subset of the features that contribute most to classification;
– Orthogonal (the second and third terms): makes category nodes at different hierarchy levels use distinct features, enhancing the discriminative power of the classifiers.
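The slides name the three regularization terms but do not spell them out. The sketch below evaluates one plausible hybrid regularizer under stated assumptions: an L1 norm for the sparse term, and, for the two orthogonal terms, penalties on |w_i · w_j| over ancestor pairs and over sibling pairs respectively (in the spirit of orthogonal-transfer regularizers). The pairing choices and the names `hybrid_regularizer`, `lam_sparse`, `lam_anc`, `lam_sib` are hypothetical, not the paper's formulation.

```python
from typing import Dict, List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_regularizer(
    w: Dict[str, Vector],
    parent: Dict[str, Optional[str]],
    lam_sparse: float,
    lam_anc: float,
    lam_sib: float,
) -> float:
    """Value of an assumed hybrid regularizer over per-node weight vectors w[i]."""
    # Term 1 (sparse): L1 norm of every node's classifier, pushing most weights to 0.
    sparse = sum(abs(x) for vec in w.values() for x in vec)

    # Term 2 (assumed orthogonal penalty): |w_i . w_j| for each node i and each ancestor j,
    # discouraging a node from reusing the features of the nodes above it.
    anc = 0.0
    for i in w:
        j = parent[i]
        while j is not None:
            anc += abs(dot(w[i], w[j]))
            j = parent[j]

    # Term 3 (assumed orthogonal penalty): |w_i . w_j| for each unordered sibling pair.
    sib = 0.0
    nodes = list(w)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            if parent[i] is not None and parent[i] == parent[j]:
                sib += abs(dot(w[i], w[j]))

    return lam_sparse * sparse + lam_anc * anc + lam_sib * sib
```

When every node's weight vector uses disjoint features, both orthogonal terms vanish and only the L1 cost remains, which is the behavior the slide describes as nodes "using distinct features".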

Term Selection and Weighting
The query “good thin crust pizza dilivery in chicago” was automatically classified into the “Dining Out -> United States -> Chicago” category label path.
[Figure: category tree with root -> Dining Out and Sports; Dining Out -> United States and Great Britain; United States -> Chicago and New York; Great Britain -> London.]
Hyperplane normal vectors (weights of the terms “pizza” and “chicago”) for the nodes on the label path:
root: pizza 0.1, chicago 0.2, …
Dining Out: pizza 0.5, chicago 0.3, …
United States: pizza 0.2, chicago 0.4, …
Chicago: pizza 0.3, chicago 0.7, …
Resulting term weights: pizza: 0.5; chicago: 0.7; …
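The example's numbers are consistent with taking, for each query term, the largest hyperplane-normal weight over the nodes on the predicted label path (pizza peaks at Dining Out with 0.5, chicago at Chicago with 0.7). The sketch below assumes that max rule, which the slide does not state outright, and treats terms with no weight on any path node as not selected. `term_weights` and `NORMALS` are hypothetical names; the weights are copied from the slide.

```python
from typing import Dict, List

# Weights copied from the slide's example (only the two terms shown there).
NORMALS: Dict[str, Dict[str, float]] = {
    "root": {"pizza": 0.1, "chicago": 0.2},
    "Dining Out": {"pizza": 0.5, "chicago": 0.3},
    "United States": {"pizza": 0.2, "chicago": 0.4},
    "Chicago": {"pizza": 0.3, "chicago": 0.7},
}


def term_weights(
    path: List[str],
    normals: Dict[str, Dict[str, float]],
    query_terms: List[str],
) -> Dict[str, float]:
    """Assumed rule: a term's raw weight is its largest classifier weight on the label path.

    Terms with no weight on any node of the path are treated as not selected.
    """
    weights: Dict[str, float] = {}
    for t in query_terms:
        vals = [normals[node][t] for node in path if t in normals[node]]
        if vals:
            weights[t] = max(vals)
    return weights


path = ["root", "Dining Out", "United States", "Chicago"]
query = "good thin crust pizza dilivery in chicago".split()
print(term_weights(path, NORMALS, query))  # prints {'pizza': 0.5, 'chicago': 0.7}
```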

The Ranking Score
We use a Gibbs-like function to transform the raw weights ψ into a probability distribution that gives the local weights of the informative query terms. The ranking score of an archived question d for query q then combines three factors: the local importance of each query term, its term frequency, and a global statistic that can be obtained from the Okapi, LM, or TrLM model.
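A Gibbs-like transform is commonly realized as a softmax, so the sketch below uses one, with a hypothetical `temperature` parameter. The slide only names the three factors of the score, so the product form local weight × term frequency × global statistic in `ranking_score` is an assumption; `global_stat` stands in for a per-term score that the caller would obtain from Okapi, LM, or TrLM.

```python
import math
from typing import Dict


def gibbs_weights(psi: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Softmax of the raw weights psi; returns local weights that sum to 1."""
    m = max(psi.values())  # subtract the max so exp() cannot overflow
    exps = {t: math.exp((v - m) / temperature) for t, v in psi.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def ranking_score(
    local: Dict[str, float],
    tf: Dict[str, float],
    global_stat: Dict[str, float],
) -> float:
    """Assumed form: sum over selected query terms of local weight * tf * global statistic."""
    return sum(local[t] * tf.get(t, 0.0) * global_stat.get(t, 0.0) for t in local)


# Local weights for the raw weights from the term-weighting example.
local = gibbs_weights({"pizza": 0.5, "chicago": 0.7})
```

A lower temperature sharpens the distribution toward the single most informative term; a higher one flattens it toward uniform weighting.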

Question Reranking
Hypothesis: given the same request, closely related documents should have similar scores.
Let $y = \{y_1, \dots, y_n\}$ be the initial retrieval scores and $f = \{f_1, \dots, f_n\}$ the regularized scores. We then minimize an objective with two parts:
– $S(f)$: inter-document (question) consistency, computed from a linear combination of two similarities;
– $\varepsilon(f)$: consistency between the new scores and the old scores.
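One common way to instantiate this kind of score regularization, assumed here rather than taken from the slides, is S(f) = Σ_{i<j} W_ij (f_i − f_j)² with a symmetric, nonnegative similarity matrix W, and ε(f) = μ Σ_i (f_i − y_i)². Setting the gradient to zero gives (d_i + μ) f_i − Σ_{j≠i} W_ij f_j = μ y_i with d_i = Σ_{j≠i} W_ij. The sketch solves that system by Jacobi iteration, which converges because the system is strictly diagonally dominant whenever μ > 0. `regularize_scores` and `mu` are hypothetical names.

```python
from typing import List


def regularize_scores(
    y: List[float],
    W: List[List[float]],
    mu: float = 1.0,
    max_iter: int = 1000,
    tol: float = 1e-12,
) -> List[float]:
    """Minimize sum_{i<j} W[i][j] (f_i - f_j)^2 + mu * sum_i (f_i - y_i)^2.

    W must be symmetric and nonnegative, and mu > 0. Solves the optimality
    conditions (d_i + mu) f_i - sum_{j != i} W[i][j] f_j = mu * y_i by Jacobi iteration.
    """
    n = len(y)
    # Degree of each question, ignoring any self-similarity on the diagonal.
    deg = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    f = list(y)  # start from the initial retrieval scores
    for _ in range(max_iter):
        new_f = [
            (mu * y[i] + sum(W[i][j] * f[j] for j in range(n) if j != i)) / (mu + deg[i])
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f


# Two strongly related questions: their scores are pulled toward each other.
print(regularize_scores([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]))
```

Larger μ keeps the reranked scores closer to the initial ones; smaller μ lets the similarity graph dominate.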

Experimental Setup
Question archives: more than 1,000,000 questions from Yahoo! Answers.
We select 120 questions as queries:
– 20 validation queries for tuning parameters;
– the remaining 100 queries for evaluating retrieval performance.
The relevance of the 4,147 questions returned by the different models is judged by volunteers; every result is labeled by at least two people.
We use the whole dataset to train the classifiers and tune them for the best performance via cross-validation.

Experimental Results
We evaluate the effect of the different components of our model.

Experimental Results
Comparison with methods that also use category information:
– LS [7]: uses the leaf category for query term smoothing;
– DE [23]: weights terms by exploring three kinds of domain evidence from the archived questions;
– QC [9]: multiplies the initial score of each historical question by the probability of its category.

Different Similarities for Inter-Question Consistency
In the result reranking method, the consistency between questions i and j combines two similarities:
– VecSim: document vector similarity;
– CatSim: category-based similarity.
Using the combined similarity is significantly better than using either one alone.
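A sketch of a linearly combined inter-question similarity. The concrete choices are illustrative assumptions, not the paper's definitions: cosine similarity over sparse term-weight vectors for VecSim, the shared-prefix fraction of two category paths for CatSim, and a hypothetical mixing weight `alpha`.

```python
import math
from typing import Dict, List


def vec_sim(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def cat_sim(p: List[str], q: List[str]) -> float:
    """Fraction of the longer category path that the two paths share as a common prefix."""
    shared = 0
    for a, b in zip(p, q):
        if a != b:
            break
        shared += 1
    longest = max(len(p), len(q))
    return shared / longest if longest > 0 else 0.0


def combined_sim(
    u: Dict[str, float],
    v: Dict[str, float],
    p: List[str],
    q: List[str],
    alpha: float = 0.5,
) -> float:
    """Linear combination alpha * VecSim + (1 - alpha) * CatSim."""
    return alpha * vec_sim(u, v) + (1.0 - alpha) * cat_sim(p, q)
```

CatSim lets two questions with little word overlap still count as related when they sit in the same branch of the hierarchy, which is the situation described by the second motivation.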

The Effect of Sparse Regularization
MAP for L2 regularization versus the proposed hybrid regularization in the classification framework. With the hybrid regularization term, we obtain sparse classifier parameters (a small subset of genuinely helpful features), so we can focus on highlighting the truly informative query terms that clarify the ‘essence’ of the search.

Conclusion
In this paper, we proposed a sparse hierarchical classification method that mimics the manual labeling process to clarify the local importance of query terms. We presented a general question retrieval framework for improving retrieval performance. We are the first to propose a reranking method that smooths the scores of related questions among the initial results.
