1 Are all questions created equal?: Factors that influence cloze question difficulty. Brooke Soden Hensler Carnegie Mellon University (starting graduate.

Presentation transcript:

1 Are all questions created equal?: Factors that influence cloze question difficulty. Brooke Soden Hensler Carnegie Mellon University (starting graduate school at Florida Center for Reading Research this Fall) Joseph E. Beck Carnegie Mellon University Funding: National Science Foundation Society for the Scientific Study of Reading – July 2006

2 Why Look at Multiple Choice Cloze Questions? Multiple-choice cloze questions are widely used assessments of comprehension. Problem: the outcome measure is typically binary (little information about the student). Goal: use multiple-choice cloze questions to more accurately assess students, track student reading development, and better understand what makes cloze questions hard.

3 Project LISTEN’s Computer Reading Tutor (Mostow & Aist, 2001): Automated. Students use it throughout the year. Accompanying paper standardized test scores (pre & post).

4 Student is reading a story aloud to the Reading Tutor…

5 A question appears… *Reading Tutor reads both Question and Response Choices. (Mostow, et al., 2004)

6 Student resumes reading story aloud to the Reading Tutor…

7 Reading Tutor Advantages: Well-specified & unbiased question construction (randomly generated). Questions automatically administered, scored, & recorded. Longitudinal collection over school year. Large N (students & questions).

8 How many Q’s from Whom? Data Description: 81,175 questions. 1,042 students. Median number of questions answered = 11 (many students were infrequent users of the tutor). & School years. Diverse population in Pittsburgh area.

9 Research Questions Is a particular part of speech (e.g., nouns, verbs, etc.) more difficult for students? If nouns are learned first (Gentner, 1982; Golinkoff, et al., 2000), might students be more proficient at answering noun questions? Which factors influence question difficulty? How can we better assess students using multiple choice cloze questions? Vocabulary researchers have given partial credit for correct part of speech (e.g., Schwanenflugel, et al., 1997)

10 Approach: Build a logistic regression model to predict individual question performance. Terms in model: student identity, part of speech of answer, properties of question (e.g., question length). Advantages of modeling approach: it simultaneously estimates the impact of question properties and student proficiency on question performance, and it makes use of all ~80k questions.
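
As an illustration of how such a model combines its terms, the sketch below adds a per-student term, a part-of-speech term, and weighted question covariates on the log-odds scale and converts the sum to a probability of a correct answer. The function names, covariate names, and every coefficient value are hypothetical; the slides do not report fitted coefficients, and this is not the authors' implementation.

```python
import math
from typing import Mapping


def sigmoid(z: float) -> float:
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_correct(student_beta: float,
                    pos_effect: float,
                    covariate_weights: Mapping[str, float],
                    covariates: Mapping[str, float]) -> float:
    """Probability of a correct answer under an additive logistic model.

    student_beta: per-student proficiency term (hypothetical value).
    pos_effect: term for the answer's part of speech (hypothetical value).
    covariate_weights / covariates: question properties and their weights.
    """
    log_odds = student_beta + pos_effect
    for name, weight in covariate_weights.items():
        log_odds += weight * covariates[name]
    return sigmoid(log_odds)


# Hypothetical weights, chosen only so that a longer question lowers the
# predicted probability and a later blank raises it.
weights = {"question_length": -0.01, "deletion_location": 0.5}
short_late = {"question_length": 40, "deletion_location": 0.9}
long_early = {"question_length": 120, "deletion_location": 0.2}

p_short_late = predict_correct(0.3, 0.0, weights, short_late)
p_long_early = predict_correct(0.3, 0.0, weights, long_early)
```

Because the student term and the question terms enter the same sum, fitting such a model estimates student proficiency and question difficulty factors at the same time.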

11 Effect of Parts of Speech [Figure: Nouns, Verbs, Adverbs, Adjectives compared pairwise ("<"); significance labels: p < 0.001, p < 0.05]

12 Effect of Parts of Speech [Figure: Nouns, Verbs, Adverbs, Adjectives compared pairwise ("<") along an axis from easier to harder; significance labels: p < 0.001, p < 0.001, p < 0.05]

13 Impact of other Part of Speech terms. Terms (Difficulty, Significance): Most Common Part of Speech (p < 0.01); # of Choices with Answer’s POS (p < ). Examples: “Sally had to _______ her lips when she heard the news.” (cloud, purse, holds, magnificent); “Henry read his _______ under the tree.” (cup, dog, book, hair)

14 Impact of other Part of Speech terms. Terms (Difficulty, Significance): Most Common Part of Speech (p < 0.01); # of Choices with Answer’s POS (p < ). Examples: “Henry read his _______ under the tree.” (cup, dog, book, hair); “Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent). Most Common Part of Speech: less common POS = harder; more common POS = easier.

15 Impact of other Part of Speech terms. Terms (Difficulty, Significance): Most Common Part of Speech (p < 0.01); # of Choices with Answer’s POS (p < ). Examples: “Henry read his _______ under the tree.” (cup, dog, book, hair); “Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent). # of Choices with Answer’s POS: fewer choices with correct POS = harder (verb); more choices with correct POS = easier (noun).

16 Impact of other terms. Terms (Difficulty, Significance): Question Length (p < ); Deletion Location (p < ). Examples: “We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see); “They rode their _______.” (farmer, bikes, play, blue)

17 Impact of other terms. Terms (Difficulty, Significance): Question Length (p < ); Deletion Location (p < ). Examples: “We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see); “They rode their _______.” (farmer, bikes, play, blue). Question Length: longer = harder; shorter = easier.

18 Impact of other terms. Terms (Difficulty, Significance): Question Length (p < ); Deletion Location (p < ). Examples: “We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see); “They rode their _______.” (farmer, bikes, play, blue). Deletion Location: blank earlier = harder; blank later = easier.

19 Using model to assess student reading comprehension. The model estimates a Beta parameter for each student. Beta represents how well the student did at answering cloze questions (controlling for difficulty factors), so it should correlate with an external comprehension measure. Compare Beta vs. percent correct for predicting WRMT comprehension composite*: Student Beta: r = .644, p < .001. Percent correct: r = .507, p < .001. Reliability of difference in correlations: p < .01. This also provides a check on the validity of the regression model. *N = 465; 1 extreme outlier was eliminated from analyses.
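
To make the comparison concrete, the sketch below shows how a Pearson correlation of this kind can be computed and converts the two reported correlations into proportions of variance explained. The helper name `pearson_r` is an illustrative choice, and the only data used are the two r values from the slide; the significance test on the difference between the correlations is not reproduced here, since it needs information the slides do not give.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Correlations with the WRMT comprehension composite, as reported on the slide.
reported_r = {"student_beta": 0.644, "percent_correct": 0.507}

# Squaring r gives the proportion of variance in the external measure
# that each predictor accounts for.
variance_explained = {name: r ** 2 for name, r in reported_r.items()}
```

Under this reading, the student Beta accounts for roughly 41% of the variance in the external comprehension score, against roughly 26% for raw percent correct.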

20 Conclusions Length of question, location of deleted word, and part of speech of correct answer affect question difficulty. Logistic regression is a strong choice for analyzing cloze data. Multiple-choice cloze questions can assess a student at a more accurate level than current practice.

21 Questions? Nominated for Best Paper Award: Soden Hensler, B., Beck, J. E. (2006). Better student assessing by finding difficulty factors in a fully automated comprehension measure. Intelligent Tutoring Systems. Brooke Soden Hensler Joseph E. Beck Project LISTEN & The Reading Tutor

22 References
Gentner, D. (1981). Some interesting differences between verbs and nouns. Cognition and Brain Theory, 4(2).
Golinkoff, R. M., Hirsh-Pasek, K., Bloom, L., Smith, L. B., Woodward, A. L., Akhtar, N., Tomasello, M., & Hollich, G. (2000). Becoming a word learner: A debate on lexical acquisition. New York: Oxford University Press.
Mostow, J. & Aist, G. (2001). Evaluating tutors that listen: An overview of Project LISTEN. In K. Forbus & P. Feltovich (Eds.), Smart Machines in Education ( ) Menlo Park, CA: MIT/AAAI Press.
Mostow, J., Beck, J. E., Bey, J., Cuneo, A., Sison, J., Tobin, B. & Valeri, J. (2004). Using automated questions to assess reading comprehension, vocabulary, and effects of tutorial interventions. Technology, Instruction, Cognition and Learning, 2, p
Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29(4).

23 Additional Slides

24 Terms in Model
Factors:
Part of Speech: Simplified part of speech classification of the correct answer as Noun, Verb, Adjective, Adverb, or Function Word.
Most Common Part of Speech: Whether or not the correct answer’s POS is the most common POS the word could take on.
POS Confusability: The number of POS the correct answer can take on.
Level of Difficulty: 4 levels of difficulty based on frequency in English or special annotation.
Student Identity: Unique identification for each student.
Covariates:
Question Length: Number of characters of the cloze question and the corresponding response choices.
Deletion Location: Proportion of the sentence that is before the blank (location of word deletion).
# Choices with Answer's POS: Probability that the student could have answered the question using only part of speech information.
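
The two simplest covariates can be sketched directly from their descriptions. The version below assumes that the blank is written as seven underscores (as in the slide examples), that question length counts the blank's characters, and that deletion location is measured in characters with the blank itself excluded from the sentence length. None of these details is stated in the slides, and the function names are illustrative.

```python
from typing import Sequence

BLANK = "_______"  # assumed blank marker, copied from the slide examples


def question_length(question: str, choices: Sequence[str]) -> int:
    """Characters in the cloze question plus all of its response choices."""
    return len(question) + sum(len(choice) for choice in choices)


def deletion_location(question: str) -> float:
    """Proportion of the sentence (excluding the blank) that precedes the blank."""
    index = question.find(BLANK)
    if index < 0:
        raise ValueError("question contains no blank marker")
    sentence_chars = len(question) - len(BLANK)
    return index / sentence_chars if sentence_chars else 0.0


# The two example questions from the "Impact of other terms" slides.
early_blank = f"We can {BLANK} the stars in the sky despite the bright city lights around us."
late_blank = f"They rode their {BLANK}."

loc_early = deletion_location(early_blank)
loc_late = deletion_location(late_blank)
len_late = question_length(late_blank, ["farmer", "bikes", "play", "blue"])
```

With these definitions the first example has a blank near the start of a long sentence and the second has a blank at the end of a short one, which matches how the slides contrast them.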

25 Developmental Trends in Learning Parts of Speech

26 Developmental Trends in Learning Parts of Speech [Figure; significance labels: p < .001, p = .71, p = .99, p = .52, p = .64]

27 Syntactic Awareness [Figure; significance labels: p = .48, p = .73, p = .01, p = .02, p < .001]

28 Effect of Part of Speech. Part of Speech: Noun < Verb < Adjective < Adverb < Function Words. Beta: (comparison point). Significance: p <. *Interpretation: positive Beta means the student is more likely to answer the question correctly.