Improving Search Results Quality by Customizing Summary Lengths
Michael Kaisser (University of Edinburgh), Marti Hearst (UC Berkeley), and John B. Lowe (Powerset, Inc.)
ACL-08: HLT

Talk Outline
• How best to display search results?
• Experiment 1: Is there a correlation between response type and response length?
• Experiment 2: Can humans predict the best response length?
• Summary and Outlook

Motivation
• Web search result listings today are largely standardized: they display a document's surrogate (Marchionini et al., 2008).
• Typically: one header line, two lines of text fragments, one line for the URL (example source: Yahoo!).
• But: is this the best way to present search results? In particular, is this the optimal length for every query?

Experiment 1 – Research Question
Do different types of queries require responses of different lengths? (And if so, does the preferred response length depend on the expected semantic response type?)

Experiment 1 – Setup
Data used:
• 12,790 queries from Powerset's query database
• Contains search engines' query logs and hand-crafted queries
• Disproportionally large number of natural language queries

Experiment 1 – Setup
Disproportionally large number of natural language queries. Examples:
• "date of next US election"
• Hip Hop
• A synonym for material
• highest volcano
• What problems do federal regulations cause?
• I want to make my own candles
• industrial music

Excursus – Mechanical Turk
• Amazon web services API for computers to integrate "artificial artificial intelligence"
• Requesters can upload Human Intelligence Tasks (HITs)
• Workers work on these HITs and are paid small sums of money
• Examples: Can you see a person in the photo? Is the document relevant to a query? Is the review of this product positive or negative?
• Mechanical Turk can also be seen as a platform for online experiments
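For readers unfamiliar with the API, below is a minimal sketch of posting a HIT programmatically. It uses today's boto3 MTurk client rather than the 2008-era API the authors would have used; the title, reward, and question form are illustrative assumptions, not the experiment's actual HIT.

```python
# Hedged sketch: posting a HIT via the modern boto3 MTurk client.
# Title, description, reward, and question XML are invented examples.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# A single free-text question in the MTurk QuestionForm XML schema.
question_xml = """<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>q1</QuestionIdentifier>
    <QuestionContent><Text>Is the document relevant to the query?</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Judge search result relevance",
    Description="Rate whether a document answers a query.",
    Reward="0.01",                   # small payment per judgment
    MaxAssignments=3,                # three workers per task
    LifetimeInSeconds=86400,         # HIT visible for one day
    AssignmentDurationInSeconds=600, # ten minutes per assignment
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```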

Experiment 1
Turkers are asked to classify queries by:
• Expected response type
• Best response length
Each query is judged by three different subjects.

Experiment 1 – Results
• The distribution of length categories differs across the expected response categories.
• Some results are intuitive: queries for numbers want short results; advice queries want longer results.
• Some results are more surprising: different length distributions for Person vs. Organization queries.
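The slide does not spell out a statistical test, but one standard way to check that length-category distributions differ across expected response types is a chi-square test on the contingency table of counts; the numbers below are invented for illustration.

```python
# Sketch: chi-square test of independence between expected response type
# and preferred length category. All counts are invented.
from scipy.stats import chi2_contingency

#          phrase  sentence  paragraph
counts = [
    [40, 10, 5],   # Number queries (invented)
    [5, 15, 35],   # Advice queries (invented)
    [20, 20, 15],  # Person queries (invented)
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4g}")
```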

Experiment 2 – Research Question
Can human judges correctly predict the preferred result length?

Experiment 2 – Setup
• Experiment 1 produced 1,099 high-confidence queries (where all three Turkers agreed on semantic category and length).
• For 170 of these, Turkers manually created snippets from Wikipedia of different lengths:
  - Phrase
  - Sentence
  - Paragraph
  - Section
  - Article (in this case a link to the article was displayed)
Note: categories differ slightly from the first experiment.
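A minimal sketch of the high-confidence filter described above: keep only queries for which all three annotators agree on both labels. The data structure, field names, and example labels are hypothetical.

```python
# Sketch: keep queries where all three annotators gave identical
# (semantic category, length category) labels. Data is invented.
from collections import defaultdict

# One (query, semantic_category, length_category) tuple per judgment.
judgments = [
    ("highest volcano", "Number/Fact", "phrase"),
    ("highest volcano", "Number/Fact", "phrase"),
    ("highest volcano", "Number/Fact", "phrase"),
    ("industrial music", "Topic", "paragraph"),
    ("industrial music", "Topic", "article"),
    ("industrial music", "General", "paragraph"),
]

by_query = defaultdict(list)
for query, category, length in judgments:
    by_query[query].append((category, length))

# Unanimous agreement on the (category, length) pair => high confidence.
high_confidence = {
    q: labels[0] for q, labels in by_query.items()
    if len(labels) == 3 and len(set(labels)) == 1
}
print(high_confidence)  # {'highest volcano': ('Number/Fact', 'phrase')}
```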

Experiment 2 – Setup
Manually created snippets from Wikipedia of different lengths (examples shown on the slide).

Experiment 2 – Setup
Displayed:
• Instructions
• Query
• One response from one length category
• Rating scale
Each HIT was shown to ten Turkers.

Experiment 2 – Setup
Instructions: "Below you see a search engine query and a possible response. We would like you to give us your opinion about the response. We are especially interested in the length of the response. Is it suitable for the query? Is there too much or not enough information? Please rate the response on a scale from 0 (very bad response) to 10 (very good response)."

Experiment 2 – Significance

Group       Slope   Std. Error   p-value
Phrase        …         …         < …
Sentence      …         …         < …
Paragraph     …         …         < …
Article       …         …         < …

Significance results of unweighted linear regression on the data for the second experiment, which was separated into four groups based on the predicted preferred length.
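The table's figures come from per-group unweighted linear regressions of judges' ratings against response length. A sketch of how one such regression could be run with SciPy follows; the ratings are invented, and the real analysis used the experiment's 0-10 judgments.

```python
# Sketch: unweighted linear regression of rating on response length for
# one predicted-preferred-length group. All ratings below are invented.
from scipy.stats import linregress

# Length categories encoded ordinally: phrase=1 ... article=5.
lengths = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
# Invented 0-10 ratings for queries whose predicted preference was "phrase":
ratings = [9, 7, 5, 4, 2, 8, 8, 6, 3, 3]

result = linregress(lengths, ratings)
print(f"slope={result.slope:.2f}, stderr={result.stderr:.2f}, "
      f"p-value={result.pvalue:.4f}")
```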

Experiment 2 – Details
• 146 queries × 5 length categories per query × 10 judgments per query = 7,300 judgments
• 124 judges; 16 judges did more than 146 HITs, and 2 of these 16 were excluded (scammers)
• $0.01 per judgment: $73 paid to judges, plus $73 in Amazon fees, i.e. $146 for Experiment 2 (excluding snippet generation)

Experiment 2 – Results
• Human judges can predict the preferred result lengths (at least for a subset of especially clear queries).
• Standard results listings are often too short (and sometimes too long).

Outlook
Can queries be automatically classified according to their predicted result length?
Initial experiment:
• Unigram word counts
• 805 training queries, 286 test queries
• Three length bins (long, short, other)
• Weka NaiveBayesMultinomial
Initial result: 78% of queries correctly classified.
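As a rough illustration of this initial experiment, the sketch below trains a unigram multinomial Naive Bayes classifier. scikit-learn's MultinomialNB stands in for Weka's NaiveBayesMultinomial, and the tiny training set is invented; the real experiment used 805 training queries.

```python
# Sketch: classify queries into preferred-result-length bins using
# unigram word counts and multinomial Naive Bayes. The labeled queries
# below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_queries = [
    "date of next US election",                    # factoid -> short
    "highest volcano",                             # factoid -> short
    "what problems do federal regulations cause",  # discussion -> long
    "I want to make my own candles",               # advice -> long
    "industrial music",                            # topic browse -> other
    "hip hop",                                     # topic browse -> other
]
train_labels = ["short", "short", "long", "long", "other", "other"]

# Unigram counts feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 1)), MultinomialNB())
model.fit(train_queries, train_labels)

print(model.predict(["who invented the telephone"]))   # expect "short"
print(model.predict(["how do I fix a leaky faucet"]))  # expect "long"
```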

Thank you!

MT Demographics – Age (survey, data and graphs from Panos Ipeirotis' blog)

MT Demographics – Gender (survey, data and graphs from Panos Ipeirotis' blog)

MT Demographics – Education (survey, data and graphs from Panos Ipeirotis' blog)

MT Demographics – Income (survey, data and graphs from Panos Ipeirotis' blog)

MT Demographics – Purpose (survey, data and graphs from Panos Ipeirotis' blog)

Excursus – Mechanical Turk
Example HIT (not ours).