An Experimental Comparison of Click Position-Bias Models Nick Craswell Onno Zoeter Michael Taylor Bill Ramsey Microsoft Research.

Presentation transcript:

An Experimental Comparison of Click Position-Bias Models
Nick Craswell, Onno Zoeter, Michael Taylor, Bill Ramsey
Microsoft Research

Position Bias
Top-ranked search results get more clicks.
This position bias occurs because:
– ...users sometimes blindly click on early results?
– ...users are less likely to view lower ranks?
– ...users click the first relevant thing they see?
A model for position bias allows:
– List data → Debiased evaluation of a result
– Per-result data → Evaluate a list

Summary
A. Four alternate hypotheses for explaining position bias
– Including a ‘cascade’ model
B. A large-scale data gathering effort
C. Evaluation: Which model best explains data?
– Which models fail and how
– Cascade model succeeds, at early ranks
D. Conclusions

A. HYPOTHESES

Hypothesis 1: No Bias
Our baseline
– c_di is P(Click=True | Document=d, Position=i)
– r_d is P(Click=True | Document=d)
– Under no bias, c_di = r_d at every position i
Why this baseline?
– We know that r_d is part of the explanation
– Perhaps, for ranks 9 vs 10, it’s the main explanation
– It is a bad explanation at rank 1, e.g. eye tracking
Attractiveness of summary ~= Relevance of result

Hypothesis 2: Blind Clicks
There are two types of user/interaction:
1. Click based on relevance
2. Click based on rank (blindly)
A.k.a. the OR model:
– Clicks arise from relevance OR position

Hypothesis 3: Examination
Users are less likely to look at lower ranks, therefore less likely to click.
This is the AND model:
– Clicks arise from relevance AND examination
– Probability of examination does not depend on what else is in the list

Hypothesis 4: Cascade
Users examine the results in rank order.
At each document d:
– Click with probability r_d
– Or continue with probability (1 - r_d)
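The four hypotheses can be sketched as click-probability functions. This is a minimal illustration, not the authors' code: it assumes the OR model is additive (relevance plus a per-rank blind-click term `b_i`) and the AND model is multiplicative (relevance times a per-rank examination probability `x_i`). The function and parameter names are hypothetical.

```python
from typing import Sequence


def no_bias(r_d: float) -> float:
    """Hypothesis 1: click probability is the same at every rank."""
    return r_d


def blind_click(r_d: float, b_i: float) -> float:
    """Hypothesis 2 (OR model), assumed additive: relevance plus a
    rank-dependent blind-click term b_i."""
    return r_d + b_i


def examination(r_d: float, x_i: float) -> float:
    """Hypothesis 3 (AND model), assumed multiplicative: relevance times
    the probability x_i that rank i is examined."""
    return r_d * x_i


def cascade(r_above: Sequence[float], r_d: float) -> float:
    """Hypothesis 4: the user reaches d only if every result ranked
    above it was examined and not clicked, then clicks with r_d."""
    p_reach = 1.0
    for r in r_above:
        p_reach *= 1.0 - r
    return p_reach * r_d
```

Only the cascade function takes the other results in the list as input. The examination function depends on the rank alone, matching the slide's statement that examination does not depend on what else is in the list.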

Cascade Model Example
500 users typed a query:
– 0 click on result A in rank 1
– 100 click on result B in rank 2
– 100 click on result C in rank 3
Cascade (with no smoothing) says:
– 0 of 500 clicked A → r_A = 0
– 100 of 500 clicked B → r_B = 0.2
– 100 of remaining 400 clicked C → r_C = 0.25
This may seem different from the formulation on the previous slide, but is precisely equivalent.
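The unsmoothed estimate in this example can be sketched in a few lines. The click counts of 0, 100 and 100 are assumptions, chosen because they are what 500 users, 400 remaining at rank 3, and r_C = 0.25 imply. The sketch also assumes each user clicks at most once, as the cascade model does. The function name is hypothetical.

```python
def cascade_estimates(n_users, clicks_by_rank):
    """Estimate r for each rank as clicks divided by the number of users
    who reached that rank (those who clicked nothing above it)."""
    remaining = n_users
    estimates = []
    for clicks in clicks_by_rank:
        estimates.append(clicks / remaining if remaining > 0 else 0.0)
        remaining -= clicks  # users who clicked stop examining
    return estimates


r_a, r_b, r_c = cascade_estimates(500, [0, 100, 100])

# Equivalence check: the observed click rate at rank 3 (100 of 500)
# matches the product form (1 - r_a) * (1 - r_b) * r_c.
observed_rank3 = 100 / 500
predicted_rank3 = (1 - r_a) * (1 - r_b) * r_c
```

Dividing by the users who remain is the same as dividing the observed click rate by the probability of reaching that rank, which is why the two formulations agree.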

B. DATA COLLECTION

Flipping Adjacent Results
Do adjacent flips in the top 10
– 9 types of flip: 1-2, 2-3, ..., 9-10
An “experiment”: query, URL A, URL B, rank m
– A&B originate from m & m+1, though maybe not in that order
– Equally likely to show AB and BA
Controlled experiment: We only vary the position
108 thousand experiments with real users
– Because it’s real users, we use only adjacent flips
Our experiment requires flips, but our models do not

Our Dataset
logodds(p) = log(p / (1 - p))
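The log-odds transform on this slide is easy to compute directly. The sketch below assumes a natural logarithm, which the slide does not specify. The inverse function is added for illustration only.

```python
import math


def logodds(p: float) -> float:
    """log(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logodds(z: float) -> float:
    """Map a log-odds value back to a probability (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))
```

A click probability of 0.5 maps to 0, and probabilities near 0 or 1 map to large negative or positive values.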

Blind-Click & Examination Hypotheses Are “Broken”
Blind-Click: Rank 1 might have 0 clicks
Examination: Rank 2 might have 100% clicks
Learn our parameters to stay within bounds:
– Blind-Click: makes no adjustment
– Examination: 2 → 1 is 3.5%, while 4 → 3 is 9.0%. Something in rank 2 had c_d2 = 0.966
→ Need some other way to stay within bounds
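One possible reading of the 3.5% figure is sketched below. It is an interpretation, not something the slide states. Suppose the examination hypothesis is multiplicative, so that moving a document from rank 2 to rank 1 scales its click probability by a fixed ratio. To keep every prediction at or below 1, that ratio cannot exceed 1 divided by the largest click rate observed at rank 2. The function name is hypothetical.

```python
def max_examination_ratio(observed_click_rates):
    """Largest multiplicative adjustment that keeps every predicted
    click probability at or below 1."""
    return 1.0 / max(observed_click_rates)


# With a rank-2 click rate of 0.966 (the value quoted on the slide),
# the adjustment from rank 2 to rank 1 is capped at roughly 3.5%.
cap = max_examination_ratio([0.966])
increase_percent = (cap - 1.0) * 100.0
```

Under this reading, a single highly clicked document constrains the parameter for every document at that rank, which is one way to see why the slide asks for some other way to stay within bounds.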

Non-Hypothesis: “Logistic”
The shape of the data suggests a Logistic model
This is related to logistic regression

Measurement
Given click information for AB, predict clicks in order BA:
– 4 events: Click B, click A, click both, click neither
10-fold cross validation
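The measurement setup can be sketched as two small helpers. One turns a list of impressions into an empirical distribution over the four events. The other splits experiments into 10 folds. The data layout (a pair of booleans per impression) and the round-robin split are assumptions for illustration. The slide does not say how folds were drawn or which score compares predicted and observed distributions.

```python
from collections import Counter

EVENTS = ("A", "B", "both", "neither")


def event(click_a: bool, click_b: bool) -> str:
    """Name the one of the four events that an impression falls into."""
    if click_a and click_b:
        return "both"
    if click_a:
        return "A"
    if click_b:
        return "B"
    return "neither"


def event_distribution(impressions):
    """Empirical probability of each event over (click_a, click_b) pairs."""
    counts = Counter(event(a, b) for a, b in impressions)
    total = len(impressions)
    return {name: counts[name] / total for name in EVENTS}


def k_fold_indices(n_items: int, k: int = 10):
    """Round-robin split of item indices into k disjoint test folds."""
    return [list(range(i, n_items, k)) for i in range(k)]
```

For each fold, a model would be fitted on the other nine folds and asked to predict the event distribution for the BA ordering of the held-out experiments.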

C. RESULTS

Main Results
Best possible: Given the true click counts for ordering BA

Results by Rank

Cascade Errors
Predictions are closer to diagonal, with less spread
Not perfect

D. Conclusions + Future Work
Surprisingly, we reject the simple AND/OR
– Users do not click randomly on rank 1
– Users do not have a fixed examination curve
Cascade model works well
– Particularly for 1-2 and 2-3 flips
Cascade model is basic. In future could model:
– Users who click multiple results
– Users who abandon their search
– Different types of user or search?

THANK YOU