Jack Mostow* and Joseph Beck Project LISTEN (

Presentation transcript:

Slide 1: Refined Micro-analysis of Fluency Gains in a Reading Tutor that Listens
Jack Mostow* and Joseph Beck
Project LISTEN (www.cs.cmu.edu/~listen), Carnegie Mellon University
* Consultant and Scientific Advisory Board Chair, Soliloquy Learning
Society for the Scientific Study of Reading, 13th Annual Meeting, July 2006
Funding: National Science Foundation, Heinz Endowments
Speaker note: Add SL affiliation.

Slide 2: Research questions and approach
- Guided oral reading builds fluency [NRP 00]
  - Typically repeated oral reading
- … but how do its benefits vary?
  - How good is repeated vs. wide reading?
  - How good is massed vs. spaced practice?
  - How do the answers vary with student proficiency?
- Approach: micro-analyze oral reading data
  - Massive: hundreds of children
  - Longitudinal: entire school year
  - Fine-grained: word by word

Slide 3: Project LISTEN’s Reading Tutor: rich source of guided oral reading data
- Massive: 650 students, age 5-14, mostly grades 1-4
- Longitudinal: 2003-2004 school year, 55,000 sessions
- Fine-grained: 6.9 million words “heard” by recognizer
- Video at www.cs.cmu.edu/~listen
Speaker note: Get hours from Joe; say less.

Slide 4: Reading speeds up with practice: example
- Initial encounter of "muttered": I’ll have to mop up all this (5630 ms) muttered Dennis to himself but how
- 5 weeks later (different word pair in different sentence): Dennis (110 ms) muttered oh I forgot to ask him for the money
- Word reading time = latency + production time ≈ 1/fluency
- How does word reading time change in general?

Slide 5: Learning curve for mean reading time of first 20 encounters, excluding top 50 words
- Do some types of encounters help more than others?

Slide 6: Four types of word encounters
Encounter                                 New context?   First time today?
1. Read "muttered" in a new story.        Wide           Spaced
2. Read "muttered" in another sentence.   Wide           Massed
3. On a later day, reread sentence 1.     Reread         Spaced
4. Then reread sentence 2.                Reread         Massed
Predict reading time for 770,858 type 1 encounters from prior encounters of all 4 types.
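The two questions on this slide ("New context?" and "First time today?") can be read as a two-way classification of each word encounter. The following is a minimal sketch under that reading; the function name and its boolean arguments are hypothetical and are not taken from the Reading Tutor's own code.

```python
from typing import Tuple


def classify_encounter(seen_sentence_before: bool,
                       seen_word_earlier_today: bool) -> Tuple[str, str]:
    """Label one word encounter along the slide's two dimensions.

    Assumption: "wide" means the word appears in a context the student
    has not read before, and "spaced" means this is the student's first
    encounter with the word today.
    """
    context = "reread" if seen_sentence_before else "wide"
    spacing = "massed" if seen_word_earlier_today else "spaced"
    return context, spacing


# Type 1 on the slide: a new story, first time the word is seen today.
print(classify_encounter(False, False))
```

Counting encounters per (context, spacing) pair for each student and word would give the four counts that the slide uses as predictors.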

Slide 7: Predictor variables
- Number of word encounters so far of each type
  - Wide vs. reread
  - Spaced vs. massed
- Word difficulty
  - # of letters
  - # of past help requests (controls for difficulty for that student)
- Student proficiency
  - WRMT Word Identification grade-equivalent score, e.g. 2.3
  - Interpolated for each encounter from pre- and post-test scores
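The slide says proficiency is interpolated for each encounter from pre- and post-test scores but does not say how. A minimal sketch, assuming simple linear interpolation over calendar days (the function name, the dates, and the scores below are illustrative only):

```python
from datetime import date


def interpolate_proficiency(pre_date: date, pre_score: float,
                            post_date: date, post_score: float,
                            encounter_date: date) -> float:
    """Estimate a grade-equivalent score on the day of an encounter.

    Assumption: proficiency changes linearly with calendar time
    between the pre-test and the post-test.
    """
    total_days = (post_date - pre_date).days
    if total_days <= 0:
        raise ValueError("post_date must be after pre_date")
    fraction = (encounter_date - pre_date).days / total_days
    return pre_score + fraction * (post_score - pre_score)


# Illustrative values: halfway between the two tests.
score = interpolate_proficiency(date(2004, 1, 1), 2.0,
                                date(2004, 1, 11), 3.0,
                                date(2004, 1, 6))
print(score)
```

The sketch does not clamp encounters that fall outside the test dates; whether to extrapolate or clamp there is a separate design choice that the slides do not address.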

Slide 8: Exponential model of word reading time
$$\text{reading time} = L \cdot \#\text{letters} + (P \cdot \text{proficiency} + A)\, e^{-B \cdot \text{Exposure}}$$
where A is a constant and B is the learning rate.
Define weights for each type of encounter:
- r for rereading vs. 1 for wide reading
- m for massed vs. 1 for spaced
- h for help requests
Exposure = weighted sum of # of word encounters so far:
$$\text{Exposure} = 1 \cdot \#(\text{wide, spaced}) + r \cdot \#(\text{reread, spaced}) + m \cdot \#(\text{wide, massed}) + r\,m \cdot \#(\text{reread, massed}) + h \cdot \#(\text{help requests})$$
[Beck, J. Using learning decomposition to analyze student fluency development. ITS2006 Educational Data Mining Workshop, Taiwan.]
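The model on this slide can be written out as two small functions. This is a sketch of the formula as stated, not the authors' SPSS setup; the function names are hypothetical and the parameter values in the usage line are made up for illustration.

```python
import math


def exposure(wide_spaced: int, reread_spaced: int, wide_massed: int,
             reread_massed: int, help_requests: int,
             r: float, m: float, h: float) -> float:
    """Weighted sum of prior encounters, as defined on the slide."""
    return (1.0 * wide_spaced
            + r * reread_spaced
            + m * wide_massed
            + r * m * reread_massed
            + h * help_requests)


def predicted_reading_time(num_letters: int, proficiency: float,
                           exposure_value: float,
                           L: float, P: float, A: float, B: float) -> float:
    """L * #letters + (P * proficiency + A) * exp(-B * Exposure)."""
    return L * num_letters + (P * proficiency + A) * math.exp(-B * exposure_value)


# Illustrative (not fitted) parameter values.
e = exposure(3, 1, 2, 0, 1, r=0.7, m=0.67, h=0.5)
print(predicted_reading_time(8, 2.3, e, L=0.02, P=-0.1, A=1.5, B=0.05))
```

With r = m = 1 and h = 0, Exposure reduces to a plain count of prior encounters, which is why the slides test whether r and m differ from 1.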

Slide 9: Analysis
- Use SPSS non-linear regression to fit parameters A, L, P, B, h, r, m
- Caveat: 770,858 trials are not independent
- So be conservative:
  - Split 650 students into 10 groups
  - Fit r, m, … for each group
  - From the 10 estimates of each parameter, compute:
    - Mean ± standard error
    - Differs significantly from 1?
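The summary step (mean ± standard error across the per-group estimates, then a comparison against 1) can be sketched as follows. The slides do not name the significance test, so the one-sample t statistic here is an assumption, and the estimates in the usage line are invented for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_estimates(estimates: Sequence[float],
                        null_value: float = 1.0) -> Tuple[float, float, float]:
    """Return (mean, standard error, t statistic against null_value).

    Each estimate is assumed to come from fitting the model to one
    disjoint group of students, so the estimates are treated as
    independent even though individual trials are not.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two group estimates")
    mean = statistics.mean(estimates)
    std_error = statistics.stdev(estimates) / math.sqrt(n)
    t_stat = (mean - null_value) / std_error
    return mean, std_error, t_stat


# Invented per-group estimates of a weight such as m.
mean, se, t = summarize_estimates([0.5, 0.7, 0.9])
print(f"{mean:.2f} ± {se:.2f}, t = {t:.2f}")
```

Under this assumption the t statistic would be compared against a t distribution with (number of groups − 1) degrees of freedom, i.e. 9 for the 10 groups described on the slide.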

Slide 10: Overall results
- Wide reading beats rereading
  - r < 1 (p = .007)
  - 2 new stories ≈ 3 old stories
- Spaced beats massed practice
  - m = .67 ± .13
  - m < 1 (p = .007)
  - 2 spaced encounters ≈ 3 massed encounters
- Do these results vary by proficiency?
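One way to read the "2 ≈ 3" rule of thumb, assuming each massed encounter contributes m spaced-encounter equivalents to Exposure: with the reported m = .67, three massed encounters are worth about two spaced ones.

$$3 \times 0.67 = 2.01 \approx 2$$

The slide gives no point estimate for r, so the same arithmetic is not repeated here for rereading.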

Slide 11: Effects of proficiency
              Bottom third    Middle third    Top third
Word ID GE    1.8 (0-2.3)     2.7 (2.3-3.1)   4.5 (3.1-10.2)
Reread (r)    .93 ± .23       .99 ± .23       .79 ± .25
Massed (m)    1.73 ± .46 *    .41 ± .08 **    .41 ± .21 **
- When does wide reading beat rereading? Maybe only for high readers?
- Seeing a word again the same day:
  - May help low readers more than waiting (p = .058)
  - Helps higher readers less than seeing it later (p < .01)
Speaker notes: Say significantly different from 1. The lowest group had a WRMT Word ID score of 0 to 2.3, with a mean of 1.8 and SD of 0.47. The middle group ranged from 2.3 to 3.1, with a mean of 2.7 and SD of 0.21. The high-proficiency group ranged from 3.1 to 10.2, with a mean of 4.5 and SD of 1.38.

Slide 12: Conclusion: type of practice matters!
- Wide reading beats rereading
  - At least for higher readers
- Advantage of spaced practice varies with proficiency
  - Low readers: seeing a word again the same day may help more
  - Higher readers: better to wait
- Fluency growth is slow (learning curve is gradual)
  - So differences in practice quality are hard to detect
  - But possible by micro-analysis of massive, longitudinal, fine-grained data
- Future work
  - Clarify interaction with proficiency
  - Refine model of fluency practice
  - Test correlational results experimentally
Thank you! Questions? See papers & videos at www.cs.cmu.edu/~listen
Speaker notes (to do):
1. Vet revised conclusions. The results are more intuitive now, but how much should we believe them given the shift due to averaging differently?
2. The by-proficiency results contradict the overall result on rereading. What to say?
3. Thanks for "compare asr with transcription.xls". I was disappointed that the correlation was so low and error was so high. How long did the queries take, especially joining lex_view with aligned_word? I'm curious to compare to text-space false alarm rates.
4. lex_view for 2005_2006? Joe doesn't know, but will email when done.

Slide 13: Predictive models of word reading in text
             SSSR2005                                SSSR2006
Predict      Growth from word encounter i to i+1     Performance at encounter i+1
Outcome      Reading time speedup                    Reading time, errors, help requests
Predictor    Encounter i of word                     Encounters 1..i
Model        Linear                                  Exponential

Slide 14: Outcome variable
- Combine reading time, errors, help requests
  - Cap reading time at 3 seconds (0.1% of data)
  - Treat error as 3 seconds
  - Treat help request as 3 seconds
Speaker notes: The following words were excluded:
1. The 50 most frequent words in English.
2. Encounters after the first 20 with a word. There were multiple reasons to exclude these data, including running out of memory (even being clever about which columns were really needed) and altering the shape of the learning curve due to a poor model of prior encounters.
3. The first word of a sentence. Since we define reading time = latency + production, and latency is undefined for the first word of the sentence, we do not model first-word performance.
4. One-character "words", as they might be spelling.
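The outcome definition and the exclusion rules above can be sketched as two helper functions. The names, argument conventions (1-based encounter index, 0-based position in sentence), and the caller-supplied set of frequent words are all hypothetical; only the 3-second cap and the four exclusion criteria come from the slide.

```python
from typing import Set

CAP_SECONDS = 3.0  # cap stated on the slide


def outcome_seconds(reading_time_s: float, is_error: bool,
                    requested_help: bool) -> float:
    """Combine reading time, errors, and help requests into one value."""
    if is_error or requested_help:
        return CAP_SECONDS
    return min(reading_time_s, CAP_SECONDS)


def is_excluded(word: str, encounter_index: int, position_in_sentence: int,
                frequent_words: Set[str]) -> bool:
    """Apply the four exclusion rules listed in the speaker notes.

    frequent_words is assumed to hold the 50 most frequent English
    words in lowercase; the slides do not list them, so the caller
    must supply the set.
    """
    return (word.lower() in frequent_words
            or encounter_index > 20
            or position_in_sentence == 0
            or len(word) == 1)


print(outcome_seconds(5.63, False, False))            # capped at 3.0
print(is_excluded("muttered", 3, 7, {"the", "of"}))   # kept for modeling
```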