Carnegie Mellon Mostow 12/7/2015, p. 1 The Sounds of Silence: Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens Jack Mostow.


1 The Sounds of Silence: Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens
Jack Mostow and Gregory Aist
Project LISTEN, Carnegie Mellon University
http://www.cs.cmu.edu/~listen

2 Pilot study in urban elementary school
Goals:
  Analyze extended use of Reading Tutor
  Identify opportunities for improvement
Protocol:
  Principal chose 8 lowest third-grade readers
  Aide took each kid daily to use Reading Tutor in small room
  Kid chose text to read (Weekly Reader, poems, …)
Milestones:
  Oct. 96: deployed Pentium, trained users, refined design
  Nov. 96: school pre-tested individually
  June 97: school post-tested individually

3 User-Tutor interaction (11/7/96 version used in pilot study)
User may: click Back, click Help, click Go, click word, read
Tutor may: go on, read word, recue word, read phrase

4 Data recorded by Reading Tutor
Sessions from Nov. 96 to May 97 (excluding outliers)
  29 to 57 sessions per kid, averaging 14 minutes
  Not used during vacations, downtime, absences
6 gigabytes of data
  .WAV files of kids’ spoken utterances
  .SEG files of time-aligned speech recognizer output
  .LOG files of Reading Tutor events

5 What to evaluate?
Usability (can kids use it?)
  1993 Wizard of Oz experiments
  Lab and in-school user tests of successive versions
Assistiveness (do kids perform better with than without?)
  1994 Reading Coach boosted comprehension by ~20%
  But: evaluation obtrusive, costly, sparse, subjective, noisy
Learning (do kids improve over time?)
  Within tutor: this talk
  On unassisted reading: pre-/post-test by school
  More than with alternatives: future studies

6 How should the Reading Tutor evaluate learning?
Evaluation should be
  Ecologically valid -- based on normal system use
  Authentic -- student chooses material
  Unobtrusive -- invisible to student
  Automatic -- objective, cheap
  Fast -- computable in real-time on PC
  Robust -- to student, recognizer, and tutor behavior
  Data-rich -- based on many observations
  Sensitive -- detect subtle effects
So estimate improvement in assisted performance

7 How to estimate performance?
Accuracy = % of text words matched by recognizer output
  Coarse-grained
  Sensitive to missed words
  Doesn’t penalize requests for help
Inter-word latency = time interval between aligned text words
  Finer-grained
  Sensitive to hesitations, insertions
  Robust to many speech recognizer errors
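A minimal sketch of the accuracy measure, assuming the alignment between text and recognizer output is a longest-common-subsequence match over lowercased words. The slides do not say which alignment method the Reading Tutor uses, so the functions `tokenize`, `matched_text_words`, and `accuracy` are hypothetical names for illustration only.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def matched_text_words(text_words, heard_words):
    """Indices of text words matched by an LCS alignment (an assumed method)."""
    n, m = len(text_words), len(heard_words)
    # lcs[i][j] = length of the LCS of text_words[i:] and heard_words[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if text_words[i] == heard_words[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table forward to recover which text words were matched.
    matched, i, j = [], 0, 0
    while i < n and j < m:
        if text_words[i] == heard_words[j]:
            matched.append(i)
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1  # skip a text word the recognizer did not hear
        else:
            j += 1  # skip an inserted or misrecognized word
    return matched

def accuracy(text, heard):
    """Fraction of text words matched by the recognizer output."""
    text_words = tokenize(text)
    return len(matched_text_words(text_words, tokenize(heard))) / len(text_words)
```

Under this assumption, a skipped text word lowers accuracy, while an extra recognized word does not, which fits the "sensitive to missed words" property above.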

8 Estimation of accuracy and latency (Nov. 96 example from video)
Text: If the computer thinks you need help, it talks to you.
Student said: if the computer...takes your name...help it...take...s to you
Recognizer heard: IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU
Tutor estimated 81% accuracy; inter-word latencies:
Word:          If  the  computer  thinks  you  need  help  it  talks  to   you
Latency (cs):  ?   43   39        1       60   41    226   7   1      242  1
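A rough sketch of how inter-word latencies might be derived from time-aligned recognizer output. It assumes each aligned word carries a start and end time in centiseconds and that latency is the gap between the end of one aligned word and the start of the next; the slides do not define the endpoints or the .SEG layout, so the `Segment` tuple and the timings below are invented for illustration. The first word has no predecessor, which would correspond to the "?" in the slide.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (word, start_cs, end_cs). Not the real .SEG format.
Segment = Tuple[str, int, int]

def inter_word_latencies(segments: List[Segment]) -> List[Optional[int]]:
    """Gap in centiseconds before each aligned word; None for the first word."""
    latencies: List[Optional[int]] = []
    prev_end: Optional[int] = None
    for _word, start, end in segments:
        latencies.append(None if prev_end is None else start - prev_end)
        prev_end = end
    return latencies

# Invented timings, chosen only so the gaps are easy to check by hand.
example = [("if", 0, 20), ("the", 63, 75), ("computer", 114, 170), ("thinks", 171, 210)]
print(inter_word_latencies(example))
```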

9 Improvement in accuracy and latency (same kid reads “help” in May 97)
Text: When some kids jump rope, they help other people too.
Student said: when some kids jump rope they help other people too
Recognizer heard: WHEN SOME KIDS JUMP ROPE THEY HELP OTHER PEOPLE TOO
Tutor estimated 100% accuracy; inter-word latencies:
Word:          When  some  kids  jump  rope  they  help  other  people  too
Latency (cs):  ?     1     10    34    19    77    9     1      34      1
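As a worked check on the two examples, the latency before "help" can be read off by pairing each text word with the latency in the same position. This assumes the latencies line up one per text word in order, which matches the counts on the slides (11 values for 11 words, 10 for 10). The helper `latency_of` is a hypothetical name.

```python
# Words and latencies (cs) copied from the Nov. 96 and May 97 slides.
nov_words = "If the computer thinks you need help it talks to you".split()
nov_cs = [None, 43, 39, 1, 60, 41, 226, 7, 1, 242, 1]
may_words = "When some kids jump rope they help other people too".split()
may_cs = [None, 1, 10, 34, 19, 77, 9, 1, 34, 1]

def latency_of(word, words, latencies):
    """Latency paired with the first occurrence of `word` (case-insensitive)."""
    return latencies[[w.lower() for w in words].index(word.lower())]

nov_help = latency_of("help", nov_words, nov_cs)
may_help = latency_of("help", may_words, may_cs)
relative_drop = (nov_help - may_help) / nov_help
print(nov_help, may_help, round(relative_drop, 2))
```

This is a single word for a single child, so it illustrates the measurement rather than the aggregate effect.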

10 Which performance improvements count?
Echoing the sentence doesn’t count.
  So look only at the first try.
Picking stories with easier words doesn’t count.
  So look at changes on the same word.
Memorizing the story doesn’t count.
  So look only at encounters of words in new contexts.
Remembering recent words doesn’t count.
  So look only at the first time a word is seen that day.
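The four rules above can be sketched as a filter over a chronological log of word encounters. The `Encounter` record, its field names, and the sample dates and sentences are assumptions made for illustration; the slides do not describe the actual log format. "New context" is taken here to mean a sentence in which the student has not previously met that word.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Encounter:          # hypothetical record, not the Reading Tutor's format
    word: str
    day: str              # calendar day of the session
    sentence: str         # identifier of the sentence the word appeared in
    attempt: int          # 1 = first try at this sentence, 2+ = retries
    latency_cs: int

def countable_encounters(encounters):
    """Group the encounters that pass all four rules by word (same-word rule)."""
    seen_contexts = defaultdict(set)  # word -> sentences already read
    seen_days = defaultdict(set)      # word -> days on which it was already seen
    kept = defaultdict(list)
    for e in encounters:              # assumed to be in chronological order
        new_context = e.sentence not in seen_contexts[e.word]
        first_today = e.day not in seen_days[e.word]
        # Record the exposure even if this encounter is filtered out.
        seen_contexts[e.word].add(e.sentence)
        seen_days[e.word].add(e.day)
        if e.attempt == 1 and new_context and first_today:
            kept[e.word].append(e)
    return dict(kept)

# Invented log: only the first and last rows should survive the filter.
log = [
    Encounter("help", "1996-11-12", "s1", 1, 226),
    Encounter("help", "1996-11-12", "s1", 2, 40),   # retry (echo)
    Encounter("help", "1996-11-12", "s2", 1, 30),   # already seen that day
    Encounter("help", "1996-11-13", "s1", 1, 20),   # old context
    Encounter("help", "1997-05-06", "s3", 1, 9),
]
kept = countable_encounters(log)
```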

11 Accuracy increased 16% on same word from first to last day seen in new context
[Chart: first vs. last]

12 Latency decreased 35% on same word from first to last day read in new context
[Chart: first vs. last]
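One way a first-to-last change like this could be computed is to compare the mean value on each word's first day with the mean on its last day, using only words seen on at least two days. The slides do not state how the 16% and 35% figures were aggregated, so this is only a sketch of one plausible method; the function name and the toy numbers are invented and do not reproduce the reported results.

```python
from statistics import mean

def relative_change(per_word):
    """Relative change from mean first-day value to mean last-day value.

    per_word maps a word to a chronological list of (day, value) pairs.
    Words observed only once are skipped, since they have no first/last pair.
    """
    firsts, lasts = [], []
    for observations in per_word.values():
        if len(observations) < 2:
            continue
        firsts.append(observations[0][1])
        lasts.append(observations[-1][1])
    if not firsts:
        raise ValueError("no word was observed on two or more days")
    return (mean(lasts) - mean(firsts)) / mean(firsts)

# Invented latencies in centiseconds, for illustration only.
toy = {
    "help": [("d1", 80), ("d9", 40)],
    "rope": [("d2", 50), ("d5", 48), ("d7", 45)],
    "jump": [("d3", 20), ("d8", 20)],
    "kids": [("d4", 33)],            # seen once, so it is excluded
}
change = relative_change(toy)        # negative means latency decreased
```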

13 Is accuracy and latency estimation...
Ecologically valid? Reading Tutor used in school
Authentic? kids choose stories
Unobtrusive? evaluate assisted reading invisibly
Automatic? align recognizer output against text
Fast? real-time on Pentium
Robust? to much student, recognizer, and tutor behavior
Data-rich? 10498 utterances, 139133 aligned words
Sensitive? detects significant but subtle effects (< 0.1 sec)

14 Conclusion
Does the Reading Tutor help?
  Yes, with assisted reading
  Transfers to unassisted reading!
Research questions:
  Who benefits how much, when, and why?
  How should we improve the Tutor?
For more information: http://www.cs.cmu.edu/~listen

