

2 A Bayesian Network Classifier for Word-level Reading Assessment
Joseph Tepperman 1, Matthew Black 1, Patti Price 2, Sungbok Lee 1, Abe Kazemzadeh 1, Matteo Gerosa 1, Margaret Heritage 3, Abeer Alwan 4, and Shrikanth Narayanan 1
1 Signal Analysis and Interpretation Laboratory, USC
2 PPrice Speech and Language Technology
3 Center for Research on Evaluation, Standards, and Student Testing, UCLA
4 Speech Processing and Auditory Perception Laboratory, UCLA
This work was supported by the National Science Foundation, IERI award number 0326214.

3 Did this kid read the word correctly?
“lawn” - /laʊn/
–What if we know his first language is Spanish?
–What if we can hear the word in context? “mop”, “lot”, “frog”, “lawn” = /mɑp/, /lɑt/, /fɹɑg/, /laʊn/
–“dis…trust?”
–Reading assessment: not strictly a question of pronunciation!

4 Traditional Pronunciation Verification
$$P(M_t \mid O_k) = \frac{P(O_k \mid M_t)\,P(M_t)}{\sum_{n=1}^{N} P(O_k \mid M_n)\,P(M_n)}$$
where $O_k$ is the set of speech observation vectors for word $k$, $M_t$ is the target pronunciation model, and there are $N$ models in all. Usually we approximate this posterior with a likelihood ratio:
$$\log P(O_k \mid M_t) - \log \max_{n \ne t} P(O_k \mid M_n)$$
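A minimal Python sketch of this likelihood-ratio score, assuming per-model log-likelihoods are already available from HMM decoding; the model names and values below are hypothetical, not from the paper.

```python
def likelihood_ratio(log_liks, target):
    """log P(O_k | M_t) minus the log-likelihood of the best competing model."""
    best_competitor = max(ll for name, ll in log_liks.items() if name != target)
    return log_liks[target] - best_competitor

# Hypothetical log-likelihoods of one word's observations under N = 3 models
log_liks = {"M_t": -412.7, "M_1": -420.3, "M_2": -418.9}
print(likelihood_ratio(log_liks, "M_t"))  # 6.2: positive favors the target
```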

5 But for reading assessment…
We need to model several pronunciation categories:
–TA: expected variants of the target made by native speakers, e.g. “can” = /kæn/ or /kɛn/
–L1: variants expected based on the child’s first language, e.g. Mexican Spanish: “can” = /kan/
–RD: common pronunciations linked to reading mistakes, e.g. making a vowel “say its name”: “can” = /keɪn/
–SIL: a silence model
It is not always clear how these combined likelihoods can determine a reading assessment score, and other factors besides pronunciation (e.g. demographics) need to be considered.

6 With HMMs, we aren’t limited to likelihoods
Recognition results give additional features (see the sketch below):
–4 binary features over all categories, allowing for the possibility of overlap in pronunciations
–Each category’s proportion in the n-best list, e.g. 80% TA, 15% RD, 5% L1, 0% SIL
–Speaker n-best proportions over all K words in a reading test, indicating, e.g., general L1 influence for a child
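As an illustration, here is a sketch of how the per-word and speaker-level n-best proportion features could be computed; the category labels match the slide, but the n-best lists themselves are hypothetical.

```python
from collections import Counter

CATEGORIES = ("TA", "L1", "RD", "SIL")

def nbest_proportions(nbest):
    """Fraction of n-best hypotheses falling into each pronunciation category."""
    counts = Counter(nbest)
    return {c: counts[c] / len(nbest) for c in CATEGORIES}

def speaker_proportions(word_nbests):
    """Mean category proportions over all K words in a speaker's reading test."""
    per_word = [nbest_proportions(n) for n in word_nbests]
    return {c: sum(p[c] for p in per_word) / len(per_word) for c in CATEGORIES}

# Hypothetical 20-entry n-best list for one word: 80% TA, 15% RD, 5% L1
nbest = ["TA"] * 16 + ["RD"] * 3 + ["L1"]
print(nbest_proportions(nbest))
```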

7 Why use Bayes Nets?
–They can model “generative” relationships among features, which is necessary for the reading assessment task.
–There is high correlation among features (e.g. the L1 likelihood and the L1 recognition result); these are redundant unless their dependencies are trained into the model.
–We need to calculate a “soft” reading assessment score, which is not really possible with decision trees (previous work).

8 Bayes Net Classifier Basics
$$P(Q \mid X_1, \ldots, X_F) \propto P(Q) \prod_{f=1}^{F} P\big(X_f \mid Q, \mathrm{Pa}(X_f)\big)$$
where Q is a binary class variable (correct/incorrect reading of one word); X_1, X_2, …, X_F is the set of features, obtained from HMM alignment/recognition of that one word; and Pa(X_f) denotes the “parents” of X_f, the other features that influence its distribution (independence is assumed otherwise).
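A sketch of how such a classifier turns per-feature conditionals into a class posterior, following the factorization above; the prior and the toy log-probability values are hypothetical, and in practice the conditionals P(X_f | Q, Pa(X_f)) would come from the trained node distributions.

```python
import math

def posterior_correct(prior_correct, cond_log_probs):
    """P(Q = correct | X_1..X_F) via P(Q) * prod_f P(X_f | Q, Pa(X_f)),
    normalized over both classes; cond_log_probs[q] holds the per-feature
    log-conditionals log P(X_f | Q = q, Pa(X_f))."""
    log_joint = {
        "correct": math.log(prior_correct) + sum(cond_log_probs["correct"]),
        "incorrect": math.log(1 - prior_correct) + sum(cond_log_probs["incorrect"]),
    }
    m = max(log_joint.values())  # log-sum-exp trick for numerical stability
    norm = sum(math.exp(v - m) for v in log_joint.values())
    return math.exp(log_joint["correct"] - m) / norm

# Hypothetical per-feature conditional log-probabilities for one word
cond = {"correct": [-0.2, -1.1, -0.4], "incorrect": [-1.6, -0.7, -2.3]}
print(posterior_correct(0.5, cond))  # a soft word-level score, here ~0.95
```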

9 Our Proposed Network Structure:

10 [Network diagram: the binary class node Q is the root; its children are, for each of TA/L1/RD/SIL, the best-hypothesis result (discrete), the likelihood (continuous), the n-best list percentage for word k, and the n-best list percentage over all K words, plus child demographics and item info nodes.]

11 Conditional Node Distributions
–Linear Gaussian: μ is a weighted sum of the parents’ values (linear regression); σ is fixed.
–Table of Gaussians: separate μ and σ defined for all combinations of the parents’ discrete values (0, 1, …).
–Multinomial Logistic: used in Neural Nets; acts like a soft decision threshold between 0 and 1; parameters iteratively estimated (pseudo-EM training).
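Sketches of the three conditional forms under hypothetical parameters; the weights, table entries, and parent values below are illustrative only.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def linear_gaussian(x, parent_values, weights, bias, sigma):
    """Linear Gaussian: mu is a weighted sum of the parents' values; sigma fixed."""
    mu = bias + sum(w * v for w, v in zip(weights, parent_values))
    return gaussian_logpdf(x, mu, sigma)

# Table of Gaussians: a separate (mu, sigma) for each combination of
# the discrete parents' values (0, 1, ...)
TABLE = {(0, 0): (-420.0, 12.0), (0, 1): (-415.0, 9.0),
         (1, 0): (-405.0, 8.0), (1, 1): (-398.0, 6.0)}

def table_gaussian(x, discrete_parents):
    mu, sigma = TABLE[discrete_parents]
    return gaussian_logpdf(x, mu, sigma)

def multinomial_logistic(parent_values, weight_rows, biases):
    """Softmax over linear scores: acts as a soft decision threshold."""
    scores = [b + sum(w * v for w, v in zip(row, parent_values))
              for row, b in zip(weight_rows, biases)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return [e / sum(exps) for e in exps]

print(linear_gaussian(0.8, [0.7, 0.1], weights=[0.9, -0.4], bias=0.2, sigma=0.3))
print(multinomial_logistic([0.7, 0.1], [[2.0, -1.0], [-2.0, 1.0]], [0.0, 0.0]))
```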

12 Corpus
–Collected by us at Los Angeles elementary schools
–Isolated words elicited by an animated GUI
–Real classroom conditions (background noise)
–Training set: 19 hours, both native and nonnative speakers, Kindergarten through 2nd Grade
–Test set: 29 students, ~15 words each (11 native, 11 nonnative, 7 no response)

13 Human Evaluations
Evaluators judged each word as acceptable/unacceptable (item level: 0 or 1) on a subset of 13 students; speaker-level scores range from 0 to 100. Mean agreement by group:

                            teachers   non-teachers    all
# of evaluators                 5            9          14
Kappa agreement               0.847        0.753       0.788
Correlation: % acceptable     0.951        0.923       0.934

The 0.951 inter-teacher correlation serves as an upper bound.
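For reference, the kappa agreement statistic for two raters can be computed as follows; this is a generic Cohen's kappa sketch with hypothetical judgment lists, not the study's evaluation code.

```python
def cohens_kappa(a, b):
    """(p_o - p_e) / (1 - p_e): observed agreement corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(v) / n) * (b.count(v) / n) for v in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical acceptable (1) / unacceptable (0) judgments from two evaluators
rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater2 = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
print(cohens_kappa(rater1, rater2))  # ~0.52
```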

14 Experiments
–Triphone HMMs trained (3 states, 16 mixtures per state)
–Alignment and recognition features put into the proposed Bayes Net:
–Accept/reject each word by choosing the class q that maximizes P(Q = q | X_1, …, X_F)
–Estimate P(Q = correct | X_1, …, X_F): the word-level reading score
–Ten-fold cross-validation (see the split sketch below)
For comparison:
–Naïve Bayes (no parents other than the class)
–C4.5 Decision Tree
–Refined Bayes Net (the worst features disconnected from the root node)
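A minimal sketch of a ten-fold split, under the assumption of a simple interleaved partition (the slide does not specify how the folds were drawn):

```python
def ten_fold_splits(n, folds=10):
    """Yield (train_indices, test_indices) pairs covering each item once."""
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        yield train, test

for train, test in ten_fold_splits(50):
    pass  # train the Bayes Net on `train`, score the held-out `test` words
```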

15 Results

                     Kappa agreement   Score correlation
C4.5                      0.535             0.752
Naïve Bayes               0.617             0.841
Full Bayes Net            0.641             0.844
Refined Bayes Net         0.681             0.921

Kappa agreement is based on the accept/reject decision for each word; score correlation is computed on the mean of P(Q = correct | X_1, …, X_F) over all words by a speaker (vs. the 0.951 mean inter-teacher correlation). In the Refined Bayes Net, the SIL pronunciation features and the child demographics were disconnected from the root; C4.5 and Naïve Bayes also improve without them.
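The score correlation column is computed on speaker-level scores; a sketch of the aggregation, with hypothetical word posteriors (scaling to the evaluators' 0-100 range is an assumption):

```python
def speaker_score(word_posteriors):
    """Mean of P(Q = correct | features) over all K words read by one speaker,
    scaled to the human evaluators' 0-100 speaker-level range (assumed)."""
    return 100.0 * sum(word_posteriors) / len(word_posteriors)

print(speaker_score([0.95, 0.88, 0.12, 0.99, 0.71]))  # 73.0
```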

16 Results, cont’d.
[Figure: classifier performance with feature groups removed from the full set: none, item info, demographics, pronunciation categories, and tiers.]

17 In Conclusion
–We presented a Bayes Net that can be used to achieve close to inter-expert correlation in overall speaker scores.
–It outperforms the C4.5 decision tree by 17% in score correlation and the Naïve Bayes classifier by 8%.
–It can help teachers plan individualized instruction.
–It can be used for tasks besides reading assessment, e.g. speaker/language ID.

