Bringing the crowdsourcing revolution to research in communication disorders
Tara McAllister Byun, PhD, CCC-SLP
Suzanne M. Adlof, PhD
Michelle W. Moore, PhD, CCC-SLP
2014 ASHA Convention, Orlando, Florida
Disclosure
The individuals presenting this information are involved in recruiting individuals to complete tasks through AMT or other online platforms. This session may focus on one specific approach, with limited coverage of alternative approaches. Portions of the research were supported by funding from IES. No other conflicts to disclose.
Crowdsourcing for CSD research
Case study 2: Obtaining speech ratings
Tara McAllister Byun
Challenges of obtaining speech ratings
A large proportion of speech research, particularly on interventions for speech disorders, involves the collection of blinded listeners' ratings of speech accuracy or intelligibility.
This is a multistep process:
- Identify potential raters
- Provide training and/or administer an eligibility test
- Collect ratings
- Compare raters against each other to establish reliability
The process can be lengthy, frustrating, and expensive.
Questions about AMT for speech research
- IRB issues? Must consider the rights of the patients/participants whose speech samples will be shared for rating, as well as of the AMT workers acting as raters.
- Can't control playback volume, headphone quality, or background noise.
- Listeners are nonexperts. But previous research suggests that with enough raters, crowdsourced responses will converge with experts'.
This study: What is the level of agreement between crowdsourced ratings of speech and ratings obtained from more experienced listeners?
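As a loose illustration of the convergence point above (not part of the original presentation), the sketch below simulates binary ratings from nonexpert listeners who each agree with an expert label with some probability above chance, and shows how often the modal (majority) rating matches the expert label as the number of raters grows. The agreement probability, rater counts, and item count are arbitrary assumptions, not values from this study.

```python
# Simulation sketch, assuming each nonexpert rater independently agrees with
# the expert label with probability p_agree > 0.5 (an assumption, not data).
import random

def majority_vote_accuracy(p_agree=0.7, n_raters=25, n_items=10000, seed=1):
    """Fraction of simulated items where the majority vote matches the expert label."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_items):
        votes = sum(rng.random() < p_agree for _ in range(n_raters))
        if votes > n_raters / 2:  # modal rating agrees with the expert label
            correct += 1
    return correct / n_items

for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n_raters=n), 3))
```

With these assumed parameters, the match rate rises steadily with the number of raters, which is the intuition behind aggregating many nonexpert judgments.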
Protocol
- Stimuli: 100 /r/ words collected from 15 children with /r/ misarticulation over the course of treatment. Roughly half were rated correct based on the mode across 3 SLP listeners.
- External HIT developed and hosted on Experigen (Becker & Levine, 2010).
- Training: 20 items with feedback.
- Task: 100 WAV files presented in random order.
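The sketch below shows one way a trial list like this could be assembled: a fixed block of training items followed by the test items shuffled per listener. The file names and data structures are hypothetical; this is not the actual Experigen configuration used in the study.

```python
# Illustrative sketch only: 20 training trials with feedback, then 100 test
# WAV files in a fresh random order for each listener. Names are hypothetical.
import random

def build_trial_list(training_items, test_items, seed=None):
    """Return an ordered list of (phase, wav_file) trials for one listener."""
    rng = random.Random(seed)
    test_order = list(test_items)
    rng.shuffle(test_order)  # each listener hears the 100 items in a new order
    trials = [("training", wav) for wav in training_items]
    trials += [("test", wav) for wav in test_order]
    return trials

training = [f"train_{i:02d}.wav" for i in range(1, 21)]   # hypothetical file names
test = [f"item_{i:03d}.wav" for i in range(1, 101)]       # hypothetical file names
trials = build_trial_list(training, test, seed=42)
```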
Raters
AMT listeners: 203 listeners with US IP addresses, self-reported native speakers of American English. Each received $0.75 for rating the 100-word sample. Ratings were completed within 23 hours. 50 listeners were discarded for failure to pass attentional catch trials; final n = 153.
Trained listeners: 26 listeners, self-reported native speakers of American English, recruited through listservs, social media, and conference announcements. All had previous training in CSD; 21/26 reported an MS or higher. Each was entered in a drawing for a $25 gift card. Responses were collected over 3 months. 1 listener failed to pass quality control measures; final n = 25.
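A minimal sketch of the kind of catch-trial screening described above, assuming responses are stored per rater as item-to-rating dictionaries; the pass threshold and data layout are assumptions, not the study's exact rule.

```python
# Hedged sketch: drop raters whose accuracy on "catch" items (items with an
# unambiguous expected answer) falls below an assumed threshold.
def passes_catch_trials(responses, catch_keys, min_accuracy=0.8):
    """responses: {item_id: 0/1 rating}; catch_keys: {item_id: expected 0/1}."""
    hits = sum(responses.get(item) == expected
               for item, expected in catch_keys.items())
    return hits / len(catch_keys) >= min_accuracy

def filter_raters(all_raters, catch_keys, min_accuracy=0.8):
    """Keep only raters who meet the catch-trial accuracy threshold."""
    return {rater_id: resp for rater_id, resp in all_raters.items()
            if passes_catch_trials(resp, catch_keys, min_accuracy)}
```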
Results
- Strong correlation between the percentage of experienced listeners and the percentage of AMT raters scoring a given item as correct (r = .98).
- The modal rating across raters differed between the two groups for only 7 items.
- Both groups showed poor agreement for some items.
- AMT listeners were slightly more lenient than experienced listeners.
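The sketch below illustrates how these group-level comparisons could be computed from raw binary ratings: per-item proportion rated correct in each group, the Pearson correlation between those proportions, and the count of items whose modal rating differs across groups. The data structures are assumed, and this is not the study's analysis code.

```python
# Analysis sketch (assumed data layout): ratings_by_item maps each item_id to
# a list of 0/1 ratings from one rater group.
from statistics import correlation  # Pearson's r; requires Python 3.10+

def percent_correct(ratings_by_item):
    """{item_id: [0/1 ratings]} -> {item_id: proportion of raters scoring 1}."""
    return {item: sum(r) / len(r) for item, r in ratings_by_item.items()}

def compare_groups(amt_ratings, expert_ratings):
    """Return (Pearson r between groups, number of items with differing modes)."""
    items = sorted(amt_ratings)
    amt_pct = percent_correct(amt_ratings)
    exp_pct = percent_correct(expert_ratings)
    r = correlation([amt_pct[i] for i in items], [exp_pct[i] for i in items])
    # With binary ratings, the mode is "correct" when the proportion exceeds 0.5;
    # exact ties would need an explicit rule.
    mode_mismatch = sum((amt_pct[i] > 0.5) != (exp_pct[i] > 0.5) for i in items)
    return r, mode_mismatch
```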
Conclusions
- In a binary rating task, the mode across a large group of AMT listeners yielded the same response as the mode across a smaller group of experienced listeners for 93/100 items.
- It is possible that untrained listeners' judgments are more naturalistic and functional than trained listeners'.
- We advocate for further evaluation and awareness of crowdsourcing as a means of rating speech data.
Questions? Interested in trying AMT?
tara.byun@nyu.edu
SADLOF@mailbox.sc.edu
mimoore@mail.wvu.edu