Pranav Anand, Caroline Andrews, Matthew Wagers Assessing the pragmatics of experiments with crowdsourcing: The case of scalar implicature University of California, Santa Cruz

Experiments & Pragmatic Processing

Each of the critics reviewed some of the movies. (→ but not all?)

Depending on the study:
– evidence for EIs (embedded implicatures), with different response choices
– no evidence of EIs

Worry: How much do methodologies themselves influence judgements?
Worry: Are we adequately testing the influence of methodologies on our data?

Case study: (embedded) implicatures. Previous limitation: lack of subjects and money; crowdsourcing addresses both problems.

Pragmatics of Experimental Situations

The experiment itself is part of the pragmatic context.
– Teleological Curiosity – subjects hypothesize the expected behavior and try to match an ideal
– Evaluation Apprehension – subjects know they are being judged
See Rosenthal & Rosnow (1975), The Volunteer Subject.

Elements of Experimental Context

– Response Structure – the response choices available to the subject (e.g. True/False, Yes/No, a 1–7 scale)
– Prompt – the question; directions for using the Response Structure
– Protocol – social context / task specification
– Immediate Linguistic/Visual Context

Our goal: explore variations of these elements in a systematic way.
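The "systematic exploration" above amounts to crossing these elements into a grid of conditions. A minimal sketch of enumerating such a parameter space, assuming illustrative element values rather than the talk's exact conditions:

```python
from itertools import product

# Hypothetical design space: keys are the elements of experimental
# context named above; the value lists are illustrative placeholders.
design_space = {
    "response_structure": ["True/False", "Yes/No", "1-7 scale"],
    "prompt": ["accuracy", "informativity"],
    "protocol": ["experiment", "annotation"],
    "linguistic_context": ["all-relevant", "all-irrelevant", "none"],
}

# One dict per cell of the fully crossed design.
conditions = [dict(zip(design_space, combo))
              for combo in product(*design_space.values())]

print(len(conditions))  # 3 * 2 * 2 * 3 = 36 cells
```

Crowdsourcing makes running even a grid this size feasible, which is the point of the slide.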

Experimental Design

Prompt: Is this an accurate description? Target sentence: Some of the spices have red lids.
Linguistic Contexts – All Relevant, All Irrelevant, No Context
Protocol:
– Experimental – normal experiment instructions
– Annotation – checking the work of unaffiliated annotators
Materials: 4 implicature targets, 6 some/all controls, 20 fillers.
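Assembling one presentation list from the materials counts above can be sketched as follows; the item names are placeholders, not the actual stimuli:

```python
import random

# Placeholder items matching the counts in the design:
# 4 implicature targets, 6 some/all controls, 20 fillers.
targets  = [f"target-{i}"  for i in range(1, 5)]
controls = [f"control-{i}" for i in range(1, 7)]
fillers  = [f"filler-{i}"  for i in range(1, 21)]

def make_list(seed=0):
    """Return one randomized presentation order over all 30 items."""
    items = targets + controls + fillers
    rng = random.Random(seed)   # seeded so each list is reproducible
    rng.shuffle(items)
    return items

trial_list = make_list()
print(len(trial_list))  # 30
```

A real experiment would also counterbalance context and protocol assignment across lists; this only shows the per-list shuffle.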

Experiment 1: Social Context

Focus on Protocol: Annotation vs. Experiment. Population: undergraduates.
Accuracy Prompt – Is this an accurate description?
Response Categories – Yes, No, Don't Know
[Chart: response rates by Linguistic Context (All-Irrelevant, No Story, All-Relevant) and Protocol (Experiment, Annotation)]

Experiment 1: Social Context

Finding: social context matters even when linguistic context does not.
– Linguistic Context: no effect
– Lower SI rate for Annotation (p < 0.05)

Experiment 2: Prompt Type

Accuracy Prompt – Is this an accurate description?
Response Categories – Yes, No, Don't Know
Informativity Prompt – How informative is this sentence?
Response Categories – Not Informative Enough, Informative Enough, Too Much Information, False
Population: Mechanical Turk workers, with a systematic debriefing survey.

Experiment 2: Prompt Type

Effect for Prompt (p < 0.001)
Effect for Context (p < 0.001)
Weak interaction: Prompt × Context (p < 0.06)
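The slides report the effects without naming the analysis; one standard way to test whether response rates depend on prompt type is a chi-square test of independence. A self-contained sketch of the statistic on hypothetical counts (these numbers are invented for illustration, not the experiment's data):

```python
# Pearson chi-square statistic for a 2D contingency table,
# computed from scratch so no stats library is needed.
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = prompt type, cols = (SI, no SI).
table = [[30, 70],   # Accuracy prompt
         [55, 45]]   # Informativity prompt
print(round(chi_square(table), 2))  # 12.79
```

With the statistic in hand, the p-value comes from the chi-square distribution with (rows−1)(cols−1) degrees of freedom, e.g. via `scipy.stats.chi2_contingency` in practice.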

Experiment 2: Prompt Type

No effect for Protocol.

Experiment 2: Prompt Type

Low SI rates overall, but the debriefing survey indicates that roughly 70% of participants were aware of the some/all contrast.

Populations

Turkers – more sensitive to Linguistic Context; less sensitive to changes in social context / evaluation apprehension.
Undergraduates – more sensitive to Protocol.

Take-Home Points

Methodological variables should be explored alongside conventional linguistic variables.
– Ideal: models of these processes (cf. Schütze 1996)
– Crowdsourcing allows for cheap/fast exploration of parameter spaces.
New normal: don't guess, test.
– Controls, norming, confounding … all testable online.

A Potential Check on Exuberance

Undergraduates may be WEIRD*, but crowdsourcing engenders its own weirdness:
– high evaluation apprehension
– uncontrolled backgrounds, skillsets, focus levels
– unknown motivations
Ignorance does not necessarily mean diversity; this requires study if we rely on such participants more.
* Henrich et al. (2010), "The Weirdest People in the World?" BBS.

Acknowledgments

Thanks to Jaye Padgett and the attendees of two Semantics Lab presentations and the XPRAG conference for their comments, to the HUGRA committee for their generous award and support, and to Rosie Wilson-Briggs for stimuli construction.