Multi-Aspect Query Summarization by Composite Query
Date: 2013/03/11  Author: Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, Ji-Rong Wen  Source: SIGIR '12  Advisor: Jia-ling Koh  Speaker: Jiun Jia, Chiou

Outline
▪ Introduction
▪ Query Aspect
▪ Query Summarization in Multi-Aspects
  - Information Gathering
  - Summary Generation
▪ Experiment
▪ Conclusion

Introduction
Current web search engines usually provide a ranked list of URLs to answer a query. This is becoming increasingly insufficient for queries with vague or complex information needs: users have to visit many pages and try to find the relevant parts within long pages, and the information may be scattered across documents.

Introduction
The search result of the original query may not contain enough information for all the aspects that users care about, because the search engine returns documents only by judging whether a document is relevant to the query keywords. Users may want to learn about a topic from multiple aspects. This requires (1) understanding user intents and (2) providing relevant information directly to satisfy searchers' needs, as opposed to relevant pages. Organizing the web content relevant to a query according to user intents would benefit user exploration.

For better aspect-oriented exploration, the information shown for different query aspects should be as orthogonal (non-overlapping) as possible.

Introduction
Multi-aspect oriented query summarization provides specific and informative content to users directly and helps further exploration by generating a summary for each query aspect. Given (1) a query and (2) a set of aspects, the approach works in two phases:
- Information Gathering: leverages the search engine to proactively gather related information, using the search results of the original query and of composite queries to collect aspect-related information.
- Summary Generation: selects sentences from the gathered results to form a summary for each aspect.

Query Aspect
An aspect is defined as a query qualifier: keywords that are added to an original query to form a specific user intent (e.g., "reviews" or "actors" for a movie query). Aspects can be collected from what search engines already provide: (1) query suggestions and (2) related searches, as in the sketch below.
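A minimal sketch of how aspect qualifiers could be derived from query suggestions or related searches, assuming the suggestion strings are already available (the helper name and the example suggestions are illustrative, not from the paper):

```python
def extract_aspects(query, suggestions):
    """Turn query suggestions / related searches into aspect qualifiers
    by stripping the original query keywords from each suggestion."""
    query_terms = set(query.lower().split())
    aspects = []
    for s in suggestions:
        qualifier = " ".join(t for t in s.lower().split() if t not in query_terms)
        if qualifier and qualifier not in aspects:
            aspects.append(qualifier)
    return aspects

# Illustrative usage (the suggestion strings are made up for the example):
print(extract_aspects(
    "saving private ryan",
    ["saving private ryan actors", "saving private ryan reviews",
     "saving private ryan plot"]))
# -> ['actors', 'reviews', 'plot']
```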

Query Summarization in Multi-Aspects
The authors expect to generate a specific and informative summary for each aspect, instead of a set of documents, so that users can get relevant information directly. For the query "Saving Private Ryan" and the aspect "actors", compare three candidate sentences:
(i) "A movie page covers information about new Steven Spielberg movie 'Saving Private Ryan' including actors, film makers and behind the scenes." - general information, not specific to the desired aspect.
(ii) "Saving Private Ryan cast are listed here including the Saving Private Ryan actresses and actors featured in the film." - specific to the aspect, but it does not provide the relevant information directly.
(iii) "The actors of Saving Private Ryan are Tom Hanks as Captain and several men Edward Burns, Barry Pepper..." - both specific and informative: it provides direct answers (the names of the actors) for the desired aspect.

Information Gathering
Collect more aspect-related data, which may not be contained in the original query's search result.
Original query (Q): Saving Private Ryan
Aspect (A_k): actors
Composite query (Q+A_k): Saving Private Ryan actors
- The search result of Q (C_Q) provides overall information about the query.
- The search result of Q+A_k (C_(Q+Ak)) provides information about aspect A_k of query Q.
- The search result of A_k (C_(Ak)) provides information about the aspect itself, which is query independent.
A sketch of this gathering step follows.
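A minimal sketch of the gathering step, assuming a hypothetical `search(query, k)` function that returns the top-k result documents (the function and its signature are placeholders, not an API from the paper):

```python
def gather(search, query, aspects, k=50):
    """Collect the three result collections used by the method:
    C_Q (original query), C_(Q+Ak) (composite query), C_(Ak) (aspect only)."""
    c_q = search(query, k)                      # overall information about the query
    collections = {}
    for aspect in aspects:
        composite = f"{query} {aspect}"         # composite query Q + Ak
        collections[aspect] = {
            "C_Q": c_q,
            "C_Q+Ak": search(composite, k),     # aspect-specific information about Q
            "C_Ak": search(aspect, k),          # query-independent aspect information
        }
    return collections
```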

Summary Generation
The words in the search results can be divided into three categories, each modeled by a word distribution:
- Query common words (θ_B): occur frequently across multiple aspects, e.g. "movie", "TV", "IMDB" for "Saving Private Ryan".
- Query aspect words (θ_k): provide information for a specific aspect, e.g. "cast", "list" and "Tom Hanks" for the aspect "actors".
- Global background words (θ_G): distribute heavily over the whole Web, e.g. stop words.

Summary Generation
The generative process of a word in C_(Q+Ak) can be seen as two steps:
1. Decide whether the word is drawn from the global background model θ_G.
2. Otherwise, decide whether it comes from the query common model θ_B or the query aspect model θ_k.
The model parameters are estimated by maximizing the log-likelihood of the collection C_(Q+Ak), written out below.
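Written out, the two-step process corresponds to a word-level mixture. A hedged reconstruction of the objective (the mixing weights λ_G and λ_B are illustrative names; the exact parameterization in the paper may differ):

```latex
\log L(C_{Q+A_k}) \;=\; \sum_{w \in C_{Q+A_k}} c(w, C_{Q+A_k})\,
\log\Big[ \lambda_G\, p(w \mid \theta_G)
        + (1-\lambda_G)\big(\lambda_B\, p(w \mid \theta_B)
        + (1-\lambda_B)\, p(w \mid \theta_k)\big) \Big]
```

where c(w, C_{Q+A_k}) is the count of word w in the composite-query search results.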

Example (training the background model): given a collection of documents d1-d5, let d_i^t = 1 if term t occurs in document d_i and 0 otherwise; the maximum likelihood estimate of a term's probability is its relative frequency over the collection, p_t = (Σ_i d_i^t) / N. The global background model P(w|θ_G) is trained this way on a large document collection.

[Figure: example term counts (e.g. w1: 3, w2: 2, w3: 2 in one search result collection vs. w1: 0, w2: 0, w3: 1 in another), illustrating how word frequencies differ across the gathered collections.]

The Expectation-Maximization (EM) algorithm is used to train the query aspect word model θ_k and find the optimal parameters, distinguishing the query aspect words from the query common words. The words with high probabilities in θ_k best represent the specific query aspect. A sketch is given below.
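A minimal EM sketch under the mixture written above, assuming θ_G and θ_B are fixed distributions and only θ_k is re-estimated; the mixing weights and the smoothing constant are illustrative choices, not the paper's exact settings:

```python
from collections import Counter

def train_aspect_model(docs, theta_G, theta_B, lam_G=0.5, lam_B=0.5,
                       iters=30, eps=1e-12):
    """EM estimation of the aspect word model theta_k from the tokenized
    search results of the composite query Q+Ak, with theta_G and theta_B fixed."""
    counts = Counter(w for d in docs for w in d)        # c(w, C_(Q+Ak))
    vocab = list(counts)
    theta_k = {w: 1.0 / len(vocab) for w in vocab}      # uniform initialization

    for _ in range(iters):
        # E-step: posterior probability that each word was generated by theta_k
        resp = {}
        for w in vocab:
            g = lam_G * theta_G.get(w, eps)
            b = (1 - lam_G) * lam_B * theta_B.get(w, eps)
            k = (1 - lam_G) * (1 - lam_B) * theta_k[w]
            resp[w] = k / (g + b + k)
        # M-step: re-estimate theta_k from the expected aspect-word counts
        total = sum(counts[w] * resp[w] for w in vocab) + eps
        theta_k = {w: counts[w] * resp[w] / total for w in vocab}
    return theta_k
```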

Summary Generation
The pipeline has three steps: modeling the search results, sentence selection, and summary generation.

Candidate sentence ranking considers three factors: (1) specificity, (2) redundancy, and (3) informativeness. The top ranked sentences are used as the summary for the desired aspect.
Specificity: a sentence such as "The actors of 'Saving Private Ryan' include Tom Hanks, Edward Burns" is scored by comparing its likelihood under the three models, i.e. p(t1|θ_k) * p(t2|θ_k) * p(t3|θ_k) * ... against p(t1|θ_B) * p(t2|θ_B) * ... and p(t1|θ_G) * p(t2|θ_G) * ..., so that the selected candidate sentences are more specific to the desired aspect (see the sketch below).
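A minimal sketch of specificity scoring, assuming the three trained word distributions are available; using log probabilities and normalizing by sentence length are illustrative choices, not necessarily the paper's exact formula:

```python
import math

def specificity(sentence_terms, theta_k, theta_B, theta_G, eps=1e-12):
    """Score how specific a sentence is to the aspect: likelihood under the
    aspect model theta_k relative to the common (theta_B) and global
    background (theta_G) models."""
    score = 0.0
    for t in sentence_terms:
        p_k = theta_k.get(t, eps)
        p_other = max(theta_B.get(t, eps), theta_G.get(t, eps))
        score += math.log(p_k) - math.log(p_other)
    return score / max(len(sentence_terms), 1)   # length-normalized log-ratio
```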

Redundancy: candidate sentences are clustered so that redundant ones are not all selected.
1. Each single sentence is initialized as its own cluster.
2. If two clusters are close enough, they are merged.
3. This procedure repeats until the smallest distance between all remaining clusters is larger than a threshold (0.7).
Sentence similarity is based on edit distance. For example, the edit distance between "kitten" and "sitting" is 3 (kitten → sitten, substituting 's' for 'k'; sitten → sittin, substituting 'i' for 'e'; sittin → sitting, inserting 'g' at the end), giving a similarity of 1/distance = 0.33. The slide's example shows clusters S1-S7 with U(S1).size = 4 and U(S7).size = 3. A clustering sketch follows.
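A minimal single-link agglomerative clustering sketch over candidate sentences, using 1/(1 + edit distance) as an illustrative similarity; the exact distance normalization and the interpretation of the 0.7 threshold are assumptions here, not the paper's stated settings:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # deletion
                        dp[j - 1] + 1,                    # insertion
                        prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return dp[-1]

def cluster_sentences(sentences, threshold=0.7):
    """Greedy agglomerative clustering: repeatedly merge the two most
    similar clusters until no pair exceeds the similarity threshold."""
    clusters = [[s] for s in sentences]
    def sim(c1, c2):   # single-link similarity between two clusters
        return max(1.0 / (1 + edit_distance(a, b)) for a in c1 for b in c2)
    while len(clusters) > 1:
        pairs = [(sim(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)
    return clusters
```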

Informativeness: favors sentences that contain query-dependent aspect words (QDW), defined as QDW = {term | term ∈ C_(Q+Ak) and term ∉ C_(Ak)}. For the aspect "actors" of "Saving Private Ryan", "Tom Hanks" and "Edward Burns" are query-dependent aspect words (✔), whereas "actor", "actress" and "cast" also occur in the aspect-only results C_(Ak) and are therefore not query dependent (✘). The parameter β is set to 0. A small sketch of the QDW computation follows.
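A minimal sketch of extracting the query-dependent aspect words and using them as an informativeness signal; the scoring function is an illustrative count-based choice, not the paper's exact formula:

```python
def query_dependent_words(c_q_ak_terms, c_ak_terms):
    """QDW = terms that appear in the composite-query results C_(Q+Ak)
    but not in the aspect-only results C_(Ak)."""
    return set(c_q_ak_terms) - set(c_ak_terms)

def informativeness(sentence_terms, qdw):
    """Fraction of sentence terms that are query-dependent aspect words."""
    if not sentence_terms:
        return 0.0
    return sum(t in qdw for t in sentence_terms) / len(sentence_terms)

# Illustrative usage with made-up term sets:
qdw = query_dependent_words(
    {"tom", "hanks", "edward", "burns", "cast", "actor"},
    {"cast", "actor", "actress"})
print(informativeness(["tom", "hanks", "plays", "captain"], qdw))  # 0.5
```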

Experiment
Data sets:
- Wikipedia (January 2011): the page title is used as the [query] and the list of sub-sections (subheadings) as the [query aspects], e.g. "Saving Private Ryan" with "Plot", "Cast", "Product"; a sample of 1,000 pages (queries) is used.
- TREC: 50 topics, each with 3 to 8 subtopics.

Experiment
Compared systems for evaluating multi-aspect oriented query summarization:
- Proposed method: Q-Composite
- Baseline 1: Ling-2008
- Baseline 2: Snippet
Both the original query and the composite queries are sent to the search engine to obtain the result documents and snippets. On the Wikipedia data, the ROUGE-1 tool is used for evaluation, where a higher value indicates higher consistency with human judgements. For the TREC data, the summaries need to be indexed and a small search engine built; the quality of the generated sentences is judged manually for being specific and informative.

Labelers were asked to label the generated sentences for 50 queries: for each system and each query aspect, they evaluated the top 3 ranked sentences. The normalized Discounted Cumulative Gain (nDCG) is used to evaluate the performance; a small sketch of the metric is given below.
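For reference, a minimal nDCG@k computation over graded labels; the gain 2^rel - 1 and the log2(rank + 1) discount are the standard choices, and the paper may use a different variant:

```python
import math

def dcg(labels, k):
    """Discounted cumulative gain of a ranked list of graded labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(labels[:k]))

def ndcg(labels, k=3):
    """DCG normalized by the ideal DCG (labels sorted best-first)."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0

# Illustrative usage: graded labels of the top-3 sentences for one aspect
print(round(ndcg([2, 0, 1], k=3), 3))  # -> 0.964
```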

[Figure: experimental results plotted against the summary length limit (number of words).]

Q-Composite: β = 0; Q-Composite-AVG: β = 0.5. Results when generating only the top 1 sentence.

Conclusion
▪ The multi-aspect oriented query summarization task aims to summarize a query from multiple aspects that are aligned with user intents.
▪ Information gathering phase: a composite-query-based strategy is proposed, which proactively gathers information through the search engine.
▪ Summary generation phase: specificity, informativeness and redundancy are taken into consideration for sentence selection.
▪ The proposed method attempts to directly provide well-organized and relevant information to users, as opposed to relevant documents.