David Kellen, Henrik Singmann, Sharon Chen, and Samuel Winiger

Presentation transcript:

Assumption Violations in Forced-Choice Recognition Judgments: Implications from the Area Theorem
David Kellen, Henrik Singmann, Sharon Chen, and Samuel Winiger

One challenge in recognition-memory research is disentangling discriminability and response biases:
Discriminability: How well can one distinguish between old and new items?
Response bias: What is one’s overall tendency to answer «old»?
[Figure: study list and test items, labelled "Old item" and "New item" (examples: lion, sea)]

Signal Detection Theory (SDT) provides a solution to this problem.
[Figure: latent strength / familiarity distributions and the old-new ROC]

In order to sidestep the issue of response bias, researchers often rely on a two-alternative forced choice (2-AFC) paradigm.
[Figure: example 2-AFC trial; left item old]

SDT assumes that individuals evaluate the difference between the two options. Traditional assumption: individuals choose the most familiar option (MAX rule).
[Figure: right-left difference; left item old]

Old/new and 2-AFC judgments are connected by the Area Theorem: Proportion of correct 2-AFC judgments (when applying MAX rule) corresponds to area under old-new ROC. This prediction is non-parametric.
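The Area Theorem can be checked numerically. The sketch below is an illustration only: it assumes a Gaussian SDT model (the theorem itself is non-parametric), and the function names and parameter values (d' = 1.5, old-item standard deviation 1.25) are hypothetical, not taken from the talk. It compares the area under the old-new ROC with the simulated proportion correct of a MAX-rule observer in 2-AFC.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def roc_area(d_prime, sigma_old=1.0, n_grid=4001):
    """Area under the old-new ROC, by sweeping the response criterion.

    New items ~ N(0, 1); old items ~ N(d_prime, sigma_old) (assumed values).
    The grid is wide enough for moderate d_prime and sigma_old only.
    """
    # Go from strict to lenient criteria so the false-alarm rate increases.
    criteria = np.linspace(-10.0, 10.0 + d_prime, n_grid)[::-1]
    fa = 1.0 - norm_cdf(criteria)
    hit = 1.0 - norm_cdf((criteria - d_prime) / sigma_old)
    # Trapezoidal rule over the (false alarm, hit) points.
    return float(np.sum(np.diff(fa) * (hit[1:] + hit[:-1]) / 2.0))

def simulate_2afc(d_prime, sigma_old=1.0, n_trials=200_000, seed=0):
    """Proportion correct in 2-AFC when the more familiar item is chosen (MAX rule)."""
    rng = np.random.default_rng(seed)
    old = rng.normal(d_prime, sigma_old, n_trials)
    new = rng.normal(0.0, 1.0, n_trials)
    return float(np.mean(old > new))

if __name__ == "__main__":
    print("Area under old-new ROC:", roc_area(1.5, 1.25))
    print("Simulated 2-AFC accuracy:", simulate_2afc(1.5, 1.25))
```

Under these Gaussian assumptions both quantities should agree, up to simulation error, with the closed form Phi(d' / sqrt(1 + sigma_old^2)).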

Some researchers question empirical accuracy of Area Theorem (Hockley, 1984; Jou et al., 2016; Starns et al., 2017). They argue that in some 2-AFC trials, individuals make their response based on the first item they process (most often the left one). If this is generally true, then Area Theorem should be violated. This situation would upend SDT implementations in many domains such as eyewitness testimony.

Goal of present work is to check whether one can observe violations of Area Theorem. To do this, we will analyze 63 participants from Jang, Wixted, and Huber (2009).

Individual-level data (i.e., no pooling approach) analyzed with 3 SDT models:
Baseline model: Gaussian SDT model jointly fitted to both old/new and 2-AFC judgments. Always assumes MAX rule in the latter case.
Extended Model 1: For a proportion ω of 2-AFC trials, individuals evaluate the left item separately. For those items, the response is made based on the same kind of information and confidence criteria used in O/N judgments.
Extended Model 2: For a proportion ω of 2-AFC trials, individuals evaluate the left item separately. If the left item surpasses the old/new response criterion, then an "old" response is made based on the same kind of information and confidence criteria used in O/N judgments. If the left item does NOT surpass the old/new response criterion, then the response is based on the MAX rule.

Contamination of 2-AFC judgments with old/new judgments is expected to reduce 2-AFC performance and distort the 2-AFC ROC.
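The expected drop in 2-AFC performance can be illustrated with a small simulation. This is a simplified sketch in the spirit of Extended Model 1, not the model fitted in the talk: it assumes equal-variance Gaussian SDT, binary choices without confidence ratings, and hypothetical values for d' and the old/new criterion. On a proportion omega of trials the simulated observer judges only the left item against the old/new criterion; otherwise the MAX rule applies.

```python
import numpy as np

def simulate_contaminated_2afc(omega, d_prime=1.5, criterion=0.75,
                               n_trials=200_000, seed=1):
    """2-AFC accuracy when a proportion `omega` of trials uses a left-item shortcut.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    left_is_old = rng.random(n_trials) < 0.5
    # Familiarity of each item: old ~ N(d_prime, 1), new ~ N(0, 1).
    left = rng.normal(np.where(left_is_old, d_prime, 0.0), 1.0)
    right = rng.normal(np.where(left_is_old, 0.0, d_prime), 1.0)

    # MAX rule: pick the more familiar item.
    max_rule_picks_left = left > right
    # Shortcut: pick left if it passes the old/new criterion, else pick right.
    shortcut_picks_left = left > criterion

    use_shortcut = rng.random(n_trials) < omega
    picks_left = np.where(use_shortcut, shortcut_picks_left, max_rule_picks_left)
    return float(np.mean(picks_left == left_is_old))

if __name__ == "__main__":
    for omega in (0.0, 0.2, 0.5, 1.0):
        print(f"omega = {omega:.1f}: accuracy = {simulate_contaminated_2afc(omega):.3f}")
```

In this toy setting accuracy falls as omega grows, because a single-item old/new judgment is less accurate than a comparison of both items.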

All models were fit to individual-level data using maximum likelihood. Difference in summed misfit was not significant (both p > .43).

Baseline model gave a good account of 2-AFC performance.

In fact, baseline model was able to make very good predictions regarding 2-AFC performance based on old/new judgments alone.
[Figure: line = predicted 2-AFC ROC from fit to old/new data; points = observed 2-AFC data]

We conducted a power analysis using semi-parametric bootstrap:
1 – generate bootstrap sample from individual data
2 – fit with extended model
3 – set ω = .20 and generate new data
4 – compute difference in fit between baseline and extended models
5 – repeat steps 1 to 4 for each participant and compute summed difference in fit (and p-value)
6 – repeat steps 1 to 5 1000 times
For both models, statistical power (for α = .05) was .97 and .99.
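The logic of such a power analysis can be sketched with a much simpler stand-in. The code below is not the semi-parametric bootstrap described above: it is a fully parametric simulation that assumes every participant shares the same equal-variance Gaussian parameters, uses only 2-AFC accuracy (no ROC or confidence data), and picks a hypothetical 100 trials per participant. It generates data with omega = .20, computes the summed likelihood-ratio statistic G² between a baseline (omega = 0) and an extended (omega free) binomial model, and compares it with a simulated null distribution.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant term (it cancels in G^2)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

def summed_g2(k, n, p_max, p_single):
    """Summed misfit difference between baseline and extended model.

    Baseline: accuracy fixed at p_max (pure MAX rule).
    Extended: accuracy = (1 - omega) * p_max + omega * p_single, omega in [0, 1],
    so the maximum-likelihood accuracy is k / n clipped to [p_single, p_max].
    """
    p_hat = np.clip(k / n, p_single, p_max)
    return float(np.sum(2.0 * (binom_loglik(k, n, p_hat) - binom_loglik(k, n, p_max))))

def estimate_power(omega=0.20, n_participants=63, n_trials=100, d_prime=1.5,
                   criterion=0.75, n_sims=1000, alpha=0.05, seed=0):
    """Simulated power; n_trials, d_prime and criterion are assumed values."""
    rng = np.random.default_rng(seed)
    p_max = norm_cdf(d_prime / math.sqrt(2.0))           # MAX-rule accuracy
    hit = 1.0 - norm_cdf(criterion - d_prime)            # left item old
    correct_rejection = norm_cdf(criterion)              # left item new
    p_single = 0.5 * hit + 0.5 * correct_rejection       # left-item shortcut accuracy
    p_alt = (1.0 - omega) * p_max + omega * p_single

    def simulate(p):
        k = rng.binomial(n_trials, p, n_participants)
        return summed_g2(k, n_trials, p_max, p_single)

    null_stats = np.array([simulate(p_max) for _ in range(n_sims)])
    alt_stats = np.array([simulate(p_alt) for _ in range(n_sims)])
    critical_value = np.quantile(null_stats, 1.0 - alpha)
    return float(np.mean(alt_stats > critical_value))

if __name__ == "__main__":
    print("Estimated power:", estimate_power())
```

The resulting number depends entirely on the assumed parameters and should not be read as a reproduction of the reported values.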

Previous work (e.g., Hockley, 1984; Jou et al., 2016; Starns et al., 2017) reported evidence suggesting that the SDT assumption of the MAX rule is violated. However, none of these studies fitted a model that directly captured assumption violations. Present results show that baseline SDT model provides good account of data, providing no support for old/new contamination of 2-AFC data.

We collected new data in which we tested the Generalized Area Theorem. According to this theorem, the proportion of correct judgments in an m-AFC task is an estimate of the (m-1)th moment of the old-new ROC. This means that we can construct the old-new ROC based on forced-choice judgments alone and compare it with old/new judgments.
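One way to read the Generalized Area Theorem is that m-AFC accuracy under the MAX rule equals the integral of (1 - FA)^(m-1) with respect to the hit rate along the old-new ROC. The sketch below checks this numerically for m = 2 to 8. It assumes an equal-variance Gaussian SDT model with a hypothetical d' = 1.5 purely for illustration; the function names are not from the talk.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def predicted_mafc(m, d_prime, n_grid=4001):
    """(m-1)th ROC moment: integral of (1 - FA)^(m-1) dH over the old-new ROC."""
    # Strict to lenient criteria, so the hit rate H runs from 0 up to 1.
    criteria = np.linspace(-10.0, 10.0 + d_prime, n_grid)[::-1]
    fa = 1.0 - norm_cdf(criteria)
    hit = 1.0 - norm_cdf(criteria - d_prime)
    integrand = (1.0 - fa) ** (m - 1)
    return float(np.sum(np.diff(hit) * (integrand[1:] + integrand[:-1]) / 2.0))

def simulate_mafc(m, d_prime, n_trials=100_000, seed=0):
    """Proportion of m-AFC trials in which the old item is the most familiar one."""
    rng = np.random.default_rng(seed)
    old = rng.normal(d_prime, 1.0, n_trials)
    new = rng.normal(0.0, 1.0, (n_trials, m - 1))
    return float(np.mean(old > new.max(axis=1)))

if __name__ == "__main__":
    for m in range(2, 9):
        print(m, round(predicted_mafc(m, 1.5), 3), round(simulate_mafc(m, 1.5), 3))
```

For m = 2 the integral reduces to the area under the ROC, which is the ordinary Area Theorem.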

We collected data from 103 participants online (via Figure-Eight). After the study phase, each participant responded to old/new trials and m-AFC trials, from m=2 to m=8 (5 trials each).

Thank you.

Other Studies
Starns et al., 2017: Eye-tracking study with words relatively far apart. Cost for participants (time) for considering both words in each trial.
Jou et al., 2016: Semantically related distractors, not base paradigm.