Improving Search Results Quality by Customizing Summary Lengths
Michael Kaisser★, Marti Hearst and John B. Lowe★
University of Edinburgh, UC Berkeley, Powerset, Inc.


1 Improving Search Results Quality by Customizing Summary Lengths
Michael Kaisser★, Marti Hearst and John B. Lowe★
University of Edinburgh, UC Berkeley, Powerset, Inc.
ACL-08: HLT

2 Talk Outline
- How best to display search results?
- Experiment 1: Is there a correlation between response type and response length?
- Experiment 2: Can humans predict the best response length?
- Summary and Outlook

3 Motivation
- Web search result listings today are largely standardized: they display a document's surrogate (Marchionini et al., 2008).
- Typically: one header line, two lines of text fragments, and one line for the URL. (Screenshot source: Yahoo!)
- But: Is this the best way to present search results? In particular, is this the optimal length for every query?

4 Experiment 1 – Research Question
Do different types of queries require responses of different lengths? (And if so, does the preferred response length depend on the expected semantic response type?)

5 Experiment 1 – Setup
Data used:
- 12,790 queries from Powerset's query database
  - contains search engines' query logs and hand-crafted queries
  - disproportionately large number of natural language queries

6 Experiment 1 – Setup
Disproportionately large number of natural language queries. Examples:
- “date of next US election”
- Hip Hop
- A synonym for material
- highest volcano
- What problems do federal regulations cause?
- I want to make my own candles
- industrial music

7 Excursus – Mechanical Turk
- Amazon Web Services API for computers to integrate "artificial artificial intelligence"
- Requesters can upload Human Intelligence Tasks (HITs)
- Workers work on these HITs and are paid small sums of money
- Examples:
  - Can you see a person in the photo?
  - Is the document relevant to a query?
  - Is the review of this product positive or negative?

8 Excursus – Mechanical Turk
- Amazon Web Services API for computers to integrate "artificial artificial intelligence"
- Requesters can upload Human Intelligence Tasks (HITs)
- Workers work on these HITs and are paid small sums of money
- Mechanical Turk can also be seen as a platform for online experiments

9 Experiment 1
Turkers are asked to classify queries by:
- Expected response type
- Best response length
Each query is classified by three different subjects.

11 Experiment 1 – Results
- The distribution of length categories differs across individual expected response categories.
- Some results are intuitive:
  - Queries for numbers want short results
  - Advice queries want longer results
- Some results are more surprising:
  - Different length distributions for Person vs. Organization
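A minimal Python sketch of how per-category length distributions like these could be tabulated from the turkers' labels. The category names ("Number", "Advice"), the length labels ("short", "medium", "long"), and the judgments themselves are hypothetical toy values, not data or label sets from the study.

```python
from collections import Counter, defaultdict

# Hypothetical toy judgments: (expected response type, preferred length).
# These are illustrative placeholders, NOT data from the study.
judgments = [
    ("Number", "short"), ("Number", "short"), ("Number", "medium"),
    ("Advice", "long"), ("Advice", "long"), ("Advice", "medium"),
]

def length_distributions(pairs):
    """Map each response type to the relative frequency of each length label."""
    counts = defaultdict(Counter)
    for response_type, length in pairs:
        counts[response_type][length] += 1
    return {
        response_type: {length: n / sum(c.values()) for length, n in c.items()}
        for response_type, c in counts.items()
    }

dist = length_distributions(judgments)
for response_type, shares in dist.items():
    print(response_type, shares)
```

Comparing these per-category dictionaries (for example with a chi-squared test of independence) is one way to check whether length preference really varies with response type.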

12 Experiment 2 – Research Question
Can human judges correctly predict the preferred result length?

13 Experiment 2 – Setup
- Experiment 1 produced 1,099 high-confidence queries (those where all three turkers agreed on both semantic category and length)
- For 170 of these, snippets of different lengths were manually created from Wikipedia:
  - Phrase
  - Sentence
  - Paragraph
  - Section
  - Article (in this case, a link to the article was displayed)
- Note: The categories differ slightly from those in the first experiment
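The high-confidence filter described above (all three turkers agree on both semantic category and length) can be sketched as follows. The record layout, the query IDs, and the labels are hypothetical; only the unanimity rule comes from the slide.

```python
# Hypothetical layout: query ID -> one (semantic category, length) label per turker.
labels = {
    "q1": [("Number", "short")] * 3,                                        # unanimous
    "q2": [("Advice", "long"), ("Advice", "long"), ("Advice", "medium")],  # length differs
    "q3": [("Person", "short"), ("Person", "short")],                      # only two labels
}

def high_confidence(labels_by_query, turkers=3):
    """Return IDs of queries where all `turkers` judges gave the identical label pair."""
    return [
        query for query, labs in labels_by_query.items()
        if len(labs) == turkers and len(set(labs)) == 1
    ]

print(high_confidence(labels))  # ['q1']
```

Requiring agreement on the (category, length) pair, rather than on each field separately, is what makes q2 drop out even though its category is unanimous.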

14 Experiment 2 – Setup
Snippets of different lengths, manually created from Wikipedia:

15 Experiment 2 – Setup
Displayed:
- Instructions
- Query
- One response from one length category
- Rating scale
Each HIT was shown to ten turkers.

16 Experiment 2 – Setup
Instructions: Below you see a search engine query and a possible response. We would like you to give us your opinion about the response. We are especially interested in the length of the response. Is it suitable for the query? Is there too much or not enough information? Please rate the response on a scale from 0 (very bad response) to 10 (very good response).

18 Experiment 2 – Significance

Group (predicted length)   Slope    Std. Error   p-value
Phrase                     -0.850   0.044        <0.0001
Sentence                   -0.550   0.050        <0.0001
Paragraph                   0.328   0.049        <0.0001
Article                     0.856   0.053        <0.0001

Significance results of unweighted linear regression on the data for the second experiment, which was separated into four groups based on the predicted preferred length.
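To make the regression concrete, here is a sketch of an unweighted least-squares fit of rating against response length within one group, returning the slope and its standard error. Coding the length categories as integers 0 to 4 (Phrase to Article) is an assumption, and the ratings are invented toy values. A p-value would then come from Student's t distribution with n - 2 degrees of freedom (for example via scipy.stats.t), which this stdlib-only sketch omits.

```python
import math

def ols_slope(xs, ys):
    """Unweighted least-squares fit y = a + b*x; return (slope b, standard error of b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; a two-parameter fit leaves n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err

# Hypothetical coding: 0 = Phrase, 1 = Sentence, 2 = Paragraph, 3 = Section, 4 = Article.
# With real data, pass every individual (length, rating) judgment of the group.
lengths = [0, 1, 2, 3, 4]
# Invented ratings on the 0-10 scale for one group; not study data.
ratings = [9, 8, 5, 3, 1]

slope, std_err = ols_slope(lengths, ratings)
print(f"slope={slope:.3f} se={std_err:.3f} t={slope / std_err:.2f}")
```

Read this way, a negative slope (as in the Phrase and Sentence groups) means ratings fall as responses get longer, and a positive slope (Paragraph, Article) means they rise.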

19 Experiment 2 – Details
- 146 queries
- 5 length categories per query
- 10 judgments per query and length category
- = 7,300 judgments in total
- 124 judges
- 16 judges did more than 146 HITs
- 2 of these 16 were excluded (scammers)
- $0.01 per judgment
- $73 paid to judges, plus $73 Amazon fees
- $146 for Experiment 2 (excluding snippet generation)
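The totals on this slide can be checked with a few lines of arithmetic. Amounts are kept in integer cents to avoid floating-point rounding; the Amazon fee is entered as the $73 the slide reports rather than derived from any fee schedule.

```python
# Figures from the slide; the Amazon fee is the reported amount, not computed.
QUERIES = 146
LENGTH_CATEGORIES = 5
JUDGMENTS_PER_HIT = 10          # each HIT (one query, one length) shown to ten turkers
PAY_PER_JUDGMENT_CENTS = 1      # $0.01 per judgment
AMAZON_FEES_CENTS = 7300        # $73, as reported

judgments = QUERIES * LENGTH_CATEGORIES * JUDGMENTS_PER_HIT
worker_pay_cents = judgments * PAY_PER_JUDGMENT_CENTS
total_cents = worker_pay_cents + AMAZON_FEES_CENTS

print(judgments, worker_pay_cents / 100, total_cents / 100)  # 7300 73.0 146.0
```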

20 Experiment 2 – Results
Results:
- Human judges can predict the preferred result lengths (at least for a subset of especially clear queries)

21 Experiment 2 – Results
Results:
- Human judges can predict the preferred result lengths (at least for a subset of especially clear queries)
- Standard result listings are often too short (and sometimes too long)

22 Outlook
Can queries be automatically classified according to their predicted result length?
Initial experiment:
- Unigram word counts
- 805 training queries, 286 test queries
- Three length bins (long, short, other)
- Weka NaiveBayesMultinomial
Initial result:
- 78% of queries correctly classified
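The slide names Weka's NaiveBayesMultinomial over unigram counts. As a rough, stdlib-only illustration of the same technique (not the authors' code and not Weka's implementation), a multinomial Naive Bayes classifier might look like this. Add-one smoothing, the lowercase whitespace tokenizer, and the training queries and their labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (query, label). Returns class counts, per-class unigram counts, vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()      # assumed tokenizer: lowercase + whitespace
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log prior + summed log unigram likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)   # add-one smoothing
        for tok in text.lower().split():
            if tok in vocab:               # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training queries with hypothetical length labels (not study data).
train_docs = [
    ("how tall is mount everest", "short"),
    ("when was the eiffel tower built", "short"),
    ("how do i make candles at home", "long"),
    ("how do i learn to cook", "long"),
    ("jazz music", "other"),
    ("rock bands", "other"),
]

model = train(train_docs)
for query in ["how do i fix a bike", "when was the tower built", "jazz bands"]:
    print(query, "->", predict(model, query))
```

Accuracy on a held-out set, as in the 805/286 split above, is then just the fraction of test queries for which predict returns the gold label.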

23 Thank you!

24 MT Demographics - Age
Survey, data and graphs from Panos Ipeirotis’ blog: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html

25 MT Demographics - Gender

26 MT Demographics - Education

27 MT Demographics - Income

28 MT Demographics - Purpose

30 Excursus – Mechanical Turk
Example HIT (not ours):