Signal Detection Theory
October 10, 2013
Some Psychometrics!
Response data from a perception experiment is usually organized in the form of a confusion matrix.
[Figure: confusion matrix; data from Peterson & Barney (1952).]
Each row corresponds to a stimulus category; each column corresponds to a response category.
Detection
In a detection task (as opposed to an identification task), listeners are asked to determine whether or not a signal was present in a stimulus. For example: do the following clips contain release bursts?
Potential response categories:

                      Signal (in stimulus)   Response
  Hit:                Present                “Present”
  Miss:               Present                “Absent”
  False Alarm:        Absent                 “Present”
  Correct Rejection:  Absent                 “Absent”
Confusion, Simplified
For a detection task, the confusion matrix boils down to just two stimulus types and two response options:

                       Response: “Present”   Response: “Absent”
  Stimulus: Present    Hit                   Miss
  Stimulus: Absent     False Alarm           Correct Rejection

Notice that a bias towards “present” responses will increase the totals of both hits and false alarms. Likewise, a bias towards “absent” responses will increase the number of both misses and correct rejections.
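As a quick sketch (not from the text), the same 2x2 layout can be built in R; the counts used here are the session-1 data from the next slide:

  # Detection confusion matrix: rows = stimulus, columns = response
  conf <- matrix(c(82, 18, 46, 54), nrow = 2, byrow = TRUE,
                 dimnames = list(stimulus = c("present", "absent"),
                                 response = c("present", "absent")))
  conf["present", "present"]  # hits
  conf["present", "absent"]   # misses
  conf["absent", "present"]   # false alarms
  conf["absent", "absent"]    # correct rejections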
Canned Examples
From the text: in session 1, listeners are rewarded for “hits”. The resultant confusion matrix looks like this:

             “Present”   “Absent”
  Present       82          18
  Absent        46          54

The “correct” responses (hits + correct rejections) = 82 + 54 = 136
Canned Examples
In session 2, the listeners are rewarded for “correct rejections”…

             “Present”   “Absent”
  Present       55          45
  Absent        19          81

The “correct” responses (hits + correct rejections) = 55 + 81 = 136
Moral of the story: simply counting the number of “correct” responses does not satisfactorily tell you what the listener is doing, because response bias is not determined by what they can or cannot perceive in the signal.
Detection Theory
Signal Detection Theory: a “parametric” model that predicts when and why listeners respond with each of the four different response types in a detection task.
“Parametric” = response proportions are derived from underlying parameters.
Assumption #1: listeners base response decisions on the amount of evidence they perceive in the stimulus for the presence of a signal. Evidence is a gradient variable (the “perceptual evidence” scale).
The Criterion
Assumption #2: listeners respond positively when the amount of perceptual evidence exceeds some internal criterion measure.
[Figure: the perceptual evidence scale divided by the criterion: “absent” responses below it, “present” responses above it.]
evidence > criterion → “present” response
evidence < criterion → “absent” response
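A minimal sketch of this decision rule in R (the evidence values and criterion here are invented for illustration):

  evidence  <- c(0.3, 1.7, 0.9, 2.4)   # perceived evidence on each trial
  criterion <- 1.0                     # internal criterion
  ifelse(evidence > criterion, "present", "absent")
  # [1] "absent"  "present" "absent"  "present"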
The Distribution
Assumption #3: the amount of perceived evidence for a particular stimulus includes random variation… and the variation is distributed normally.
[Figure: frequency distributions over the perceptual evidence scale.]
The categorization of a particular stimulus will therefore vary between trials.
Normal Facts
The normal distribution is defined by two parameters:
mean (= “average”) (μ)
standard deviation (σ)
The mean = the center point of values in the distribution.
The standard deviation = the “spread” of values around the mean in the distribution.
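In R, these two parameters appear directly in the normal-distribution functions; a quick sketch with arbitrary values:

  x <- rnorm(100000, mean = 5, sd = 2)  # draw samples from a normal distribution
  mean(x)  # ~5: the center point of the values
  sd(x)    # ~2: the spread of values around the mean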
Comparisons
Assumption #4: responses to both “absent” and “present” stimuli in a detection task will be distributed normally.
Generally speaking, the mean of the “present” distribution will be higher on the evidence scale than that of the “absent” distribution.
Assumption #5: both the “absent” and “present” distributions will have the same standard deviation. (This is the simplest version of the model.)
Interpretation
[Figure: the “absent” and “present” distributions on the perceptual evidence scale, with a single criterion line dividing correct rejections from false alarms and misses from hits.]
Important: the criterion level is the same for both types of stimuli…
…but the means of the two distributions differ.
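Assumptions #3–#5 can be simulated directly; in this sketch the means, standard deviation, and criterion are all invented for illustration:

  set.seed(1)
  absent  <- rnorm(10000, mean = 0, sd = 1)  # evidence on "absent" trials
  present <- rnorm(10000, mean = 1, sd = 1)  # evidence on "present" trials (same SD)
  criterion <- 0.5                           # one criterion for both stimulus types
  mean(present > criterion)  # proportion of hits
  mean(absent > criterion)   # proportion of false alarms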
Sensitivity
The distance (on the perceptual evidence scale) between the means of the two distributions reflects the listener’s sensitivity to the distinction.
Q: How can we estimate this distance?
A: We measure the distance of the criterion from each mean. We can use z-scores to standardize our distance measures!
In a normal distribution, this distance determines the proportion of responses on either side of the criterion.
Z-Scores
Example 1: criterion at the mean.
Z-score = 0 → 50% hits, 50% misses.
Z-Scores
Example 2: criterion one standard deviation below the mean.
Z-score = 1 → 84.1% hits, 15.9% misses.
Z-Scores
Note: P(Hits) = 1 - P(Misses)
z(P(Hits)) = z(1 - P(Misses)) = -z(P(Misses))
In this case: z(84.1%) = -z(15.9%) = 1
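These z-score facts can be checked with R’s pnorm() (proportion below a z-score) and qnorm() (z-score for a proportion):

  pnorm(0)      # criterion at the mean: 0.5 (50% hits)
  pnorm(1)      # criterion 1 SD below the mean: 0.841 (84.1% hits)
  qnorm(0.841)  # ~1: back from proportion to z-score
  qnorm(0.159)  # ~-1: z(P(Misses)) = -z(P(Hits))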
D-Prime
D-prime (d’) is a measure of sensitivity: the perceptual distance between the means of the “present” (signal) and “absent” (noise) distributions, expressed in terms of z-scores.
D-Prime
d’ combines the z-score for the percentage of hits with the z-score for the percentage of false alarms:

d’ = z(P(H)) - z(P(FA))
D-Prime Examples
1.           “Present”   “Absent”
  Present       82          18
  Absent        46          54

d’ = z(P(H)) - z(P(FA)) = z(.82) - z(.46) = .915 - (-.100) = 1.015

2.           “Present”   “Absent”
  Present       55          45
  Absent        19          81

d’ = z(P(H)) - z(P(FA)) = z(.55) - z(.19) = .126 - (-.878) = 1.004

Note: there is no absolute meaning to the value of d-prime.
Also: NORMSINV() is the Excel function that converts proportions to z-scores. (qnorm() works in R.)
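Putting the formula into R with qnorm() (the function name dprime is my own):

  dprime <- function(p_hit, p_fa) qnorm(p_hit) - qnorm(p_fa)
  dprime(.82, .46)  # session 1: ~1.015
  dprime(.55, .19)  # session 2: ~1.004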
Near Zero Correction
Note: the z-score is undefined at 100% and 0%.
Fix: replace perfect scores with a minimal deviation from the limit (e.g., .5% or 99.5%).

             “Present”   “Absent”
  Present      100           0
  Absent        72          28

d’ = z(P(H)) - z(P(FA)) = z(.995) - z(.72) = 2.576 - .583 = 1.99
Near Zero Correction
Also note that the sets of responses in our experimental data do not normally total to 100! Here’s another example of the “fix”, in which perfect scores are replaced with scores just half a response unit below the maximum (or above the minimum):

             “Present”   “Absent”
  Present       20           0
  Absent         6          14

Replace 20 with 19.5, so P(H) = 19.5/20 = .975
d’ = z(P(H)) - z(P(FA)) = z(.975) - z(.3) = 1.96 - (-.524) = 2.48
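A sketch of the half-response-unit correction in R (correct_p is a hypothetical helper; dprime is defined above):

  correct_p <- function(count, total) {
    # pull counts of 0 and perfect counts in by half a response unit
    count <- pmin(pmax(count, 0.5), total - 0.5)
    count / total
  }
  dprime(correct_p(20, 20), correct_p(6, 20))  # .975 vs. .3 -> ~2.48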
Calculating Bias
An unbiased criterion would fall halfway between the means of the two distributions.
No bias (λu): P(Hits) = P(Correct Rejections)
Bias (λb): P(Hits) ≠ P(Correct Rejections)
Calculating Bias
Bias = the distance (in z-scores) between the ideal (unbiased) criterion and the actual criterion.

Bias (λ) = -1/2 * (z(P(H)) + z(P(FA)))
For Instance
Let’s say d’ = 2. An unbiased criterion would then be one standard deviation from both means:

z(P(H)) = 1  →  P(H) = 84.1%
z(P(FA)) = -1  →  P(FA) = 15.9%

Bias (λ) = -1/2 * (z(P(H)) + z(P(FA))) = -1/2 * (1 + (-1)) = -1/2 * 0 = 0
Wink Wink, Nudge Nudge
Now let’s move the criterion down by half a standard deviation:

z(P(H)) = 1.5  →  P(H) = 93.3% (cf. 84.1%)
z(P(FA)) = -.5  →  P(FA) = 30.9% (cf. 15.9%)

Bias (λ) = -1/2 * (z(P(H)) + z(P(FA))) = -1/2 * (1.5 + (-.5)) = -1/2 * 1 = -.5
Calculating Bias: Examples
1.           “Present”   “Absent”
  Present       82          18
  Absent        46          54

λ = -1/2 * (z(P(H)) + z(P(FA))) = -1/2 * (z(.82) + z(.46)) = -1/2 * (.915 + (-.100)) = -.407

2.           “Present”   “Absent”
  Present       55          45
  Absent        19          81

λ = -1/2 * (z(P(H)) + z(P(FA))) = -1/2 * (z(.55) + z(.19)) = -1/2 * (.126 + (-.878)) = .376

The higher the criterion is set, the more positive this number will be.
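The bias formula in R, applied to the same two sessions (the function name bias is my own):

  bias <- function(p_hit, p_fa) -0.5 * (qnorm(p_hit) + qnorm(p_fa))
  bias(.82, .46)  # session 1: ~-.407 (criterion below the unbiased point)
  bias(.55, .19)  # session 2:  ~.376 (criterion above the unbiased point)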
Peach Colo(u)rs
Listeners could replay stimuli as many times as they liked. The order of pictures was counterbalanced across presentations.
Target identification was significantly better than chance (p < .001).
The difference in accuracy between IDS and ADS utterances was nearly significant (p = .056).
In terms of sensitivity (d’): sensitivity was significantly greater in IDS utterances! (p = .003)
The properties of infant-directed speech provide cues to syntactic disambiguation.
In terms of bias (λ): IDS utterances induced a significantly greater bias towards NV responses (p = .032).
Why? Perhaps duration differences between utterance types provide a clue…