A Bayesian Network Classifier for Word-level Reading Assessment
Joseph Tepperman, Matthew Black, Patti Price, Sungbok Lee, Abe Kazemzadeh, Matteo Gerosa, Margaret Heritage, Abeer Alwan, and Shrikanth Narayanan

Presentation transcript:

A Bayesian Network Classifier for Word-level Reading Assessment
Joseph Tepperman 1, Matthew Black 1, Patti Price 2, Sungbok Lee 1, Abe Kazemzadeh 1, Matteo Gerosa 1, Margaret Heritage 3, Abeer Alwan 4, and Shrikanth Narayanan 1
1 Signal Analysis and Interpretation Laboratory, USC
2 PPrice Speech and Language Technology
3 Center for Research on Evaluation, Standards, and Student Testing, UCLA
4 Speech Processing and Auditory Perception Laboratory, UCLA
This work was supported by the National Science Foundation, IERI award number

Did this kid read the word correctly?
"lawn" - /laʊn/
– What if we know his first language is Spanish?
– What if we can hear the word in context?
  "mop," "lot", "frog", "lawn": /mɑp/, /lɑt/, /fɹɑg/, /laʊn/
  "dis…trust?"
– Reading assessment: not strictly a question of pronunciation!

Traditional Pronunciation Verification
where O_k is the set of speech observation vectors for word k, M_t is the target pronunciation model, and there are N models in all
Usually we approximate this with a likelihood ratio
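The equations on this slide did not survive extraction. A standard verification formulation consistent with the variable definitions above (a reconstruction, not necessarily the exact form on the original slide) is:

```latex
% Posterior of the target model given the observations for word k,
% expanded over all N candidate pronunciation models M_1 ... M_N:
\[
P(M_t \mid O_k) \;=\;
\frac{P(O_k \mid M_t)\,P(M_t)}
     {\sum_{i=1}^{N} P(O_k \mid M_i)\,P(M_i)}
\]
% With equal model priors, this is commonly approximated by a likelihood
% ratio against the best competing model:
\[
\mathrm{score}(O_k) \;\approx\;
\frac{P(O_k \mid M_t)}{\max_{1 \le i \le N} P(O_k \mid M_i)}
\]
```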

But for reading assessment…
We need to model several pronunciation categories (a sketch of one word's variants follows below):
– TA: expected variants of the target made by native speakers
  "can" = /kæn/ or /kɛn/
– L1: variants expected based on the child's first language
  Mexican Spanish: "can" = /kan/
– RD: common pronunciations linked to reading mistakes
  e.g. make a vowel say its name: "can" = /keɪn/
– SIL: a silence model
Not always clear how these combined likelihoods can determine a reading assessment score
Other factors besides pronunciation (e.g. demographics) need to be considered
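As a concrete illustration of how category-specific variants for one word might be organized, a minimal sketch; the dictionary name, layout, and phone notation are assumptions for illustration, not the original system's lexicon:

```python
# Minimal sketch (hypothetical names and phone notation) of category-specific
# pronunciation variants for one target word, following the slide's
# categories: TA, L1, RD, SIL.
PRONUNCIATION_MODELS = {
    "can": {
        "TA":  ["k ae n", "k eh n"],  # native-speaker variants of the target
        "L1":  ["k a n"],             # Mexican Spanish-influenced variant
        "RD":  ["k ey n"],            # reading-mistake variant ("vowel says its name")
        "SIL": ["sil"],               # silence / no response
    },
}


def variants_for(word: str, category: str) -> list[str]:
    """Return the pronunciation variants modeled for a word under one category."""
    return PRONUNCIATION_MODELS.get(word, {}).get(category, [])


if __name__ == "__main__":
    print(variants_for("can", "RD"))  # ['k ey n']
```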

With HMMs, we aren't limited to likelihoods
Recognition results (a sketch of the proportion features follows below):
– 4 binary features over all categories
  Possibility of overlap in pronunciations
– Each category's proportion in the n-best list
  e.g. 80% TA, 15% RD, 5% L1, 0% SIL
– Speaker n-best proportions over all K words in a reading test
  Indicates, e.g., general L1 influence for a child
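A minimal sketch of how the n-best proportion features might be computed from a recognizer's n-best hypotheses; the data layout and function names are assumptions for illustration, not the original implementation:

```python
from collections import Counter

CATEGORIES = ("TA", "L1", "RD", "SIL")


def nbest_proportions(nbest_categories):
    """Fraction of n-best hypotheses that fall into each pronunciation category.

    nbest_categories: one category label per n-best hypothesis for a word,
    e.g. ["TA", "TA", "RD", "TA", "L1"].
    """
    counts = Counter(nbest_categories)
    total = len(nbest_categories) or 1
    return {cat: counts.get(cat, 0) / total for cat in CATEGORIES}


def speaker_proportions(per_word_nbest):
    """Average the per-word proportions over all K words read by one speaker."""
    k = len(per_word_nbest) or 1
    summed = dict.fromkeys(CATEGORIES, 0.0)
    for nbest in per_word_nbest:
        for cat, p in nbest_proportions(nbest).items():
            summed[cat] += p
    return {cat: summed[cat] / k for cat in CATEGORIES}
```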

Why use Bayes Nets?
– Can model "generative" relationships among features
  Necessary for the reading assessment task
– High correlation among features
  e.g. L1 likelihood and L1 recognition result
  Redundant unless dependencies are trained in the model
– Need to calculate a "soft" reading assessment score
  Not really possible with decision trees (previous work)

Bayes Net Classifier Basics (the factorized posterior is sketched below)
– Q is a binary class variable
  Correct/incorrect reading of one word
– X_1, X_2, …, X_F is the set of features
  Obtained from HMM alignment/recognition of that one word
– Pa(X_f) denotes the "parents" of X_f
  Other features that influence its distribution
  Assume independence otherwise
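The classifier equation on this slide is not visible in the transcript. A standard Bayesian-network factorization consistent with the definitions above (a reconstruction, not necessarily the slide's exact notation) is:

```latex
% Posterior of the class variable Q given the word-level features,
% factorized over each feature's parents Pa(X_f) in the network
% (Q itself may be among a feature's parents):
\[
P(Q \mid X_1, \ldots, X_F) \;\propto\;
P(Q) \prod_{f=1}^{F} P\bigl(X_f \mid \mathrm{Pa}(X_f)\bigr)
\]
```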

Our Proposed Network Structure:

[Network diagram: the class variable Q at the root, with child nodes for the best recognition hypothesis (discrete, over TA/L1/RD/SIL), the TA/L1/RD/SIL likelihoods (continuous), the TA/L1/RD/SIL n-best list percentages for word k and over all K words, plus demographics and item-info nodes.]

Conditional Node Distributions (standard forms sketched below)
– Linear Gaussian
  μ is a weighted sum of parents' values (linear regression), σ is fixed
– Table of Gaussians
  Separate μ and σ defined for all combinations of parents
– Multinomial Logistic
  Used in Neural Nets
  Acts like a soft decision threshold
  Parameters iteratively estimated (pseudo-EM training)
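For reference, the standard textbook forms of these three conditional distributions, as I reconstruct what the slide describes (with u_1 … u_J denoting a node's parent values; the exact parameterization on the original slide may differ):

```latex
% Linear Gaussian: mean is a linear function of the parents, variance fixed
\[
p(x \mid u_1, \ldots, u_J) \;=\;
\mathcal{N}\!\bigl(x \;;\; w_0 + \textstyle\sum_{j=1}^{J} w_j u_j,\; \sigma^2\bigr)
\]
% Table of Gaussians: a separate mean and variance for each configuration u
% of the (discrete) parents
\[
p\bigl(x \mid \mathrm{Pa}(X) = u\bigr) \;=\; \mathcal{N}\!\bigl(x \;;\; \mu_u,\; \sigma_u^2\bigr)
\]
% Multinomial logistic (softmax over classes c): the soft decision threshold
% used for discrete nodes with continuous parents
\[
P(X = c \mid u_1, \ldots, u_J) \;=\;
\frac{\exp\!\bigl(w_{c0} + \sum_{j} w_{cj} u_j\bigr)}
     {\sum_{c'} \exp\!\bigl(w_{c'0} + \sum_{j} w_{c'j} u_j\bigr)}
\]
```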

Corpus
– Collected by us at Los Angeles elementary schools
– Isolated words elicited by an animated GUI
– Real classroom conditions
  Background noise
– Training set: 19 hours
  Both native and nonnative
  Kindergarten through 2nd Grade
– Test set: 29 students, ~15 words each
  11 native, 11 nonnative, 7 no response

Human Evaluations
– Judge each word as acceptable/unacceptable
– Subset of 13 students
– Mean Kappa agreement by group: teachers (5 evaluators), non-teachers (9), all (14)
– Correlation: % acceptable
  Item level: 0 or 1
  Speaker level: 0 to 100
– Human agreement here serves as an upper bound
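A minimal sketch of how the agreement statistics named on this slide are typically computed, using standard scikit-learn and SciPy routines on fabricated toy data; this is illustrative only, not the evaluation code used in the study:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical item-level judgments (1 = acceptable, 0 = unacceptable)
# from two evaluators over the same 12 read words (fabricated toy data).
evaluator_a = np.array([1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
evaluator_b = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1])

# Item-level agreement beyond chance, on the 0/1 judgments.
kappa = cohen_kappa_score(evaluator_a, evaluator_b)

# Speaker-level score: % of words judged acceptable per speaker (0 to 100),
# here with 4 toy speakers of 3 words each, correlated across evaluators.
per_speaker_a = evaluator_a.reshape(4, 3).mean(axis=1) * 100
per_speaker_b = evaluator_b.reshape(4, 3).mean(axis=1) * 100
corr, _ = pearsonr(per_speaker_a, per_speaker_b)

print(f"kappa = {kappa:.2f}, speaker-level correlation = {corr:.2f}")
```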

Experiments
– Triphone HMMs trained
  3 states, 16 mixtures per state
– Alignment and recognition features put into the proposed Bayes Net:
  Accept/reject words (decision rule sketched below)
  Estimate the word-level reading score (sketched below)
  Ten-fold crossvalidation
– For comparison:
  Naïve Bayes (no parents other than class)
  C4.5 Decision Tree
  Refined Bayes Net (disconnect worst features from the root node)
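The accept/reject rule and the word-level score on this slide were given as equations that are not visible in the transcript; a reconstruction consistent with the Bayes-net classifier described earlier (my notation, not necessarily the slide's) is:

```latex
% Accept/reject one word: pick the most probable class under the network
\[
\hat{q}_k \;=\; \arg\max_{q \in \{\text{correct},\,\text{incorrect}\}}
P\bigl(Q = q \mid X_1^{(k)}, \ldots, X_F^{(k)}\bigr)
\]
% Word-level reading score: the "soft" posterior of a correct reading
\[
s_k \;=\; P\bigl(Q = \text{correct} \mid X_1^{(k)}, \ldots, X_F^{(k)}\bigr)
\]
```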

Results
– Classifiers compared, each evaluated on Kappa agreement and score correlation: C4.5, Naïve Bayes, Full Bayes Net, Refined Bayes Net
– Kappa agreement is based on the accept/reject decision for each word
– The speaker score is the mean of the word-level score over all words by a speaker; its correlation is compared against the mean inter-teacher correlation
– Refined Bayes Net: SIL pronunciation features and child demographics disconnected from the root
  C4.5 and Naïve Bayes improve without them

Results, cont'd.
[Chart: performance as feature groups are removed from the full set: none, item info, demographics, pronunciation categories, tiers.]

In Conclusion
– A Bayes Net that can be used to achieve close to inter-expert correlation in overall speaker scores
– It outperforms the C4.5 decision tree by 17% correlation and the Naïve Bayes classifier by 8%
– Helps teachers plan individualized instruction
– Can be used for tasks besides reading assessment
  e.g. Speaker/Language ID