Published by James Ramsey. Modified over 9 years ago.
1
Diagnostic Assessment of Childhood Apraxia of Speech Using Techniques from Automatic Speech Recognition (ASR)
  John-Paul Hosom (1), Lawrence D. Shriberg (2), Jordan R. Green (3)
  (1) Center for Spoken Language Understanding, Oregon Health & Science University
  (2) Waisman Center, University of Wisconsin - Madison
  (3) Department of Special Education & Communication Disorders, University of Nebraska - Lincoln
  This research is supported by NIDCD grants DC000496 and DC006722.
2
Outline of Talk
  Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin
  Diagnostic Markers for suspected Apraxia of Speech (sAOS)
  Overview of Automatic Speech Recognition (ASR)
  Applying ASR to the Lexical Stress Ratio (LSR)
  Applying ASR to Coefficient of Variation Ratio (CVR)
  Summary, Current and Future Work
3
Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin
  [Figure: five-level model]
  Levels: I. Etiological Processes; II. Explanatory Processes; III. Nosological Entity; IV. Trait Markers (phenotypes, endophenotypes); V. Diagnostic Markers
  Risk and protective factors: Genetic, Environmental
  Processes: Cognitive-Linguistic, Auditory-Perceptual, Speech Motor Control, Psychosocial; Phonological Attunement
  Subtypes: Speech Delay – Genetic (SD-GEN); Speech Delay – Otitis Media with Effusion (SD-OME); Speech Delay – Speech Motor Involvement (SD-SMI); Speech Delay – Developmental Psychosocial Involvement (SD-DPI); Speech Errors (SE)
  Entity labels: SD-GEN, SD-OME, SD-AOS, SD-DYS, SD-DPI, SE-/s/, SE-/r/
  Marker entries shown: > Omissions, < Distortions, < Backing; > M1 values; < F3–F2; > I-S Gap; > Backing; > Severity; 8 speech markers; Lex. Stress Ratio; > Coeff. Var. Ratio
  *Shriberg, Austin, et al. (1997)
4
Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin
  [Figure: the same model, with the marker cells labeled SDCS*]
  *Shriberg, Austin, et al. (1997)
8
Diagnostic Markers for suspected Apraxia of Speech (sAOS)
  Childhood Apraxia of Speech is a controversial disorder because there is no consensus on the features that define it or on its underlying causes (Guyette & Diedrich, 1981; Shriberg et al., 1997).
  “suspected Apraxia of Speech” (sAOS) has been proposed as an interim term (Shriberg et al., 1997).
  Two proposed markers for sAOS:
    Lexical Stress Ratio (LSR) (Shriberg et al., 2003a)
    Coefficient of Variation Ratio (CVR) (Shriberg et al., 2003b)
  This work: a pilot study toward complete automation of these markers using techniques from automatic speech recognition (ASR), to address inherent human variability. The aim was to replicate the results of prior work.
10-12
Overview of Automatic Speech Recognition
  Automatic Speech Recognition (ASR) is a mapping from a recorded speech signal to words.
  Words are represented as sequences of phonemes.
  It is not known in advance where phonemes begin or end, so:
    (1) break the signal into short (10-msec) units,
    (2) compute the probability of each phoneme at each unit,
    (3) find the most likely phoneme sequence.
  Example probabilities at one unit: p(E)=.4, p(s)=.0, p(^)=.2, p(i)=.1, …
  [Figure: speech signal with phoneme labels]
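Step (3) above is commonly carried out with a Viterbi search over the per-unit probabilities. The following is a minimal sketch under stated assumptions, not the system used in the talk: the function names, the two-phoneme inventory, the uniform starting probabilities, and the single "stay" transition probability are all hypothetical choices made for illustration.

```python
import math

def viterbi(frame_probs, states, stay=0.8):
    """Most likely state sequence given per-frame phoneme probabilities.

    frame_probs: list of dicts mapping each state to its probability at that frame.
    states: list of at least two phoneme labels (hypothetical inventory).
    stay: assumed probability of remaining in the same phoneme between frames.
    """
    switch = (1.0 - stay) / (len(states) - 1)

    def log_p(p):
        # Floor at a tiny value so zero probabilities do not break log().
        return math.log(max(p, 1e-12))

    # Uniform start probabilities (an assumption).
    score = {s: log_p(1.0 / len(states)) + log_p(frame_probs[0][s]) for s in states}
    backpointers = []
    for probs in frame_probs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(
                states,
                key=lambda r: score[r] + log_p(stay if r == s else switch),
            )
            new_score[s] = (
                score[best_prev]
                + log_p(stay if best_prev == s else switch)
                + log_p(probs[s])
            )
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def to_segments(path, frame_ms=10):
    """Collapse a frame-level path into (label, start_ms, end_ms) segments."""
    segments = []
    for i, label in enumerate(path):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], (i + 1) * frame_ms)
        else:
            segments.append((label, i * frame_ms, (i + 1) * frame_ms))
    return segments

frames = [
    {"E": 0.9, "s": 0.1},
    {"E": 0.6, "s": 0.4},
    {"E": 0.2, "s": 0.8},
    {"E": 0.1, "s": 0.9},
]
path = viterbi(frames, ["E", "s"])
segments = to_segments(path)
```

The segment list is the part that matters for the later slides: it gives phoneme locations as well as labels.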
13
Overview of Automatic Speech Recognition
  [Figure from Encyclopedia of Information Systems, H. Bidgoli (editor), vol. 4, pp. 155-169, 2003.]
14
Overview of Automatic Speech Recognition
  A Gaussian Mixture Model (GMM) is a way of estimating probabilities given a feature value.
  [Figure: p(x) plotted against feature value x; each curve is one Gaussian (Normal) distribution with mean µ and standard deviation σ.]
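As a rough illustration of how a GMM turns a feature value into a probability density, the sketch below evaluates a weighted sum of one-dimensional Gaussians. The component weights, means, and standard deviations are made-up values, and real ASR systems use multi-dimensional features; this only shows the shape of the computation.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of one Gaussian (Normal) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, components):
    """Density of a mixture; components is a list of (weight, mean, std).

    The weights are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

# Hypothetical two-component mixture for a single feature value.
mixture = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
density = gmm_pdf(0.5, mixture)
```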
15
Overview of Automatic Speech Recognition
  [Figure from Encyclopedia of Information Systems, H. Bidgoli (editor), vol. 4, pp. 155-169, 2003.]
16
Overview of Automatic Speech Recognition
  [Figure from Encyclopedia of Information Systems, H. Bidgoli (editor), vol. 4, pp. 155-169, 2003.]
17
Overview of Automatic Speech Recognition
  Better estimation of phoneme probabilities at each time t results in more accurate ASR performance (correct words).
  Estimation of probabilities depends on training a phoneme classifier on large amounts of speech data.
  If the type of data used in training differs from the type of data seen in testing, probabilities will be low and accuracy will be poor.
  It is important to match training and testing conditions as closely as possible.
  ASR yields two results:
    (1) the most likely word or word sequence
    (2) the location of each phoneme in the recognized word
18
Outline of Talk
  Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin
  Diagnostic Markers for suspected Apraxia of Speech (sAOS)
  Overview of Automatic Speech Recognition (ASR)
  Applying ASR to the Lexical Stress Ratio (LSR)
    The Lexical Stress Ratio
    Measuring Fundamental Frequency
    Computing Probability of Lexical Stress
    Results
  Applying ASR to Coefficient of Variation Ratio (CVR)
  Summary, Current and Future Work
19
Applying ASR to the Lexical Stress Ratio: The Lexical Stress Ratio
  LSR (Shriberg et al., 2003a) measures the “inappropriate lexical stress” observed in children with sAOS.
  Inappropriate lexical stress: excessive stress on a syllable, or lack of stress on a syllable that is normally stressed.
  Three factors are used to measure lexical stress: F0, amplitude, and duration of the first and second vowels in trochaic words (stress on the first syllable).
  Because of problems reliably extracting duration, the initial focus of automation is only on the ratio of F0 in the first and second vowels.
  Either high or low F0 ratios may be associated with sAOS.
  Examples: “dishes,” reduced stress; “chicken,” excessive stress; “puppy,” excessive stress.
20
Applying ASR to the Lexical Stress Ratio: Speech Data
  Data from Shriberg et al.’s 2003a study (LSR corpus):
    24 children with speech delay (control data)
    11 children with sAOS
  Recordings of elicited samples of 8 trochaic words
  Average age: 6 yrs, 4 mo. for children with speech delay; 7 yrs, 1 mo. for children with sAOS.
21
Applying ASR to the Lexical Stress Ratio: Measuring F0
  Fundamental frequency (F0) is measured by locating the peak of a histogram of “strong” outputs from 32 narrow-band filters.
  Comparison with Kay Elemetrics’ CSL algorithm on LSR data:
    CSL: 30 cases of F0 error > 30 Hz
    new: 8 cases of F0 error > 30 Hz
22
Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress
  Histogram of normalized counts (probabilities) of F0 ratios of SD subjects and sAOS subjects
  [Figure: histogram; x-axis: ratio of F0s in first and second vowel; y-axis: probability given F0 ratio; legend: sAOS, SD]
23
Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress
  Probability Distribution Functions (PDFs) of F0 ratios of SD subjects and sAOS subjects, modeled using the Gamma distribution
  [Figure: curves labeled p(SD|F0(w)) and p(sAOS|F0(w))]
24
Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress
  Probability of Lexical Stress Characteristic of sAOS:
  Use one formulation of Bayes’ Rule (only two choices):
    [Equation]
  where w is an individual word spoken by a subject.
  Decision criterion: sAOS if p(sAOS) > 0.5
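The slide's equation did not survive in this transcript. A standard two-class form of Bayes' Rule that fits the description is p(sAOS|x) = p(x|sAOS)P(sAOS) / [p(x|sAOS)P(sAOS) + p(x|SD)P(SD)], where x is the F0 ratio of a word. The sketch below assumes that form, with Gamma likelihoods as on the preceding slide. The Gamma shape and scale parameters and the equal priors are invented for illustration and are not the fitted values from the study.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density, computed in the log domain for numerical stability."""
    if x <= 0:
        return 0.0
    log_pdf = (
        (shape - 1) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)

def posterior_saos(f0_ratio, saos_params, sd_params, prior_saos=0.5):
    """Two-class Bayes' Rule: p(sAOS | F0 ratio of one word)."""
    joint_saos = gamma_pdf(f0_ratio, *saos_params) * prior_saos
    joint_sd = gamma_pdf(f0_ratio, *sd_params) * (1.0 - prior_saos)
    total = joint_saos + joint_sd
    # Fall back to the prior if both likelihoods underflow to zero.
    return joint_saos / total if total > 0 else prior_saos

# Hypothetical (shape, scale) pairs: both classes centered near 1.2,
# with the sAOS distribution broader than the SD distribution.
SAOS_PARAMS = (20.0, 0.06)
SD_PARAMS = (100.0, 0.012)

def classify(f0_ratio):
    """Apply the slide's decision criterion: sAOS if p(sAOS) > 0.5."""
    p = posterior_saos(f0_ratio, SAOS_PARAMS, SD_PARAMS)
    return "sAOS" if p > 0.5 else "SD"
```

With a broader sAOS distribution, as assumed here, both unusually high and unusually low F0 ratios are classified as sAOS, while ratios near the center are classified as SD.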
25
Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress
  Probability of Lexical Stress:
  Example of 4 observations, equal probabilities:
    [Equation]
  Example of 3 observations, different probabilities:
    [Equation]
26
Applying ASR to the Lexical Stress Ratio: Results
  Evaluation of method on data used to build models:
    Sensitivity/Specificity: 64% / 88%
    PPV/NPV: 70% / 84%
  Evaluation of method on new data: essentially chance performance
  Conclusions:
    Large difference between characteristics of training and testing data
    Need more data to develop better models
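For reference, the four reported figures follow from a 2x2 confusion table. The slides do not give the table itself; the counts below are an inference, chosen because they are consistent with both the cohort sizes (11 sAOS, 24 speech delay) and the reported percentages. Treat them as hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard screening metrics from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # sAOS children correctly flagged
        "specificity": tn / (tn + fp),  # control children correctly cleared
        "ppv": tp / (tp + fp),          # flagged children who have sAOS
        "npv": tn / (tn + fn),          # cleared children who do not
    }

# Inferred, not reported: 7 of 11 sAOS detected, 21 of 24 controls cleared.
metrics = diagnostic_metrics(tp=7, fn=4, tn=21, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

These counts give about 63.6%, 87.5%, 70.0%, and 84.0%, which round to the 64% / 88% and 70% / 84% on the slide.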
27
Outline of Talk
  Complex Disease Model…
  Diagnostic Markers for suspected Apraxia of Speech (sAOS)
  Overview of Automatic Speech Recognition (ASR)
  Applying ASR to the Lexical Stress Ratio (LSR)
  Applying ASR to Coefficient of Variation Ratio (CVR)
    The Coefficient of Variation Ratio
    Identifying Speech/Pause Regions Using ASR
    Computing the CVR
    Results
  Summary, Current and Future Work
28
Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
  CVR (Shriberg et al., 2003b) measures the reduction in normal temporal variation of speech observed in children with sAOS.
  Measurement of CVR depends on the durations of speech events and of pause events:
    CVR = (σ_p / µ_p) / (σ_s / µ_s)
    σ_p = standard deviation of pause events
    µ_p = mean duration of pause events
    σ_s = standard deviation of speech events
    µ_s = mean duration of speech events
  Because of the reduced variability of speech-event durations in children with sAOS, these children have higher CVR values relative to the control group.
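A small sketch of the computation, assuming the CVR is the coefficient of variation (standard deviation divided by mean) of pause-event durations divided by that of speech-event durations. This reading is consistent with the slide's statement that reduced speech-event variability raises the CVR. The use of the sample standard deviation and the example durations are assumptions.

```python
import statistics

def coefficient_of_variation(durations):
    """Standard deviation divided by mean (sample standard deviation assumed)."""
    return statistics.stdev(durations) / statistics.mean(durations)

def cvr(pause_durations, speech_durations):
    """Coefficient of Variation Ratio: pause CV over speech CV."""
    return coefficient_of_variation(pause_durations) / coefficient_of_variation(
        speech_durations
    )

# Made-up durations in seconds.
pauses = [0.2, 0.4, 0.6]
varied_speech = [1.0, 2.0, 3.0]
uniform_speech = [1.8, 2.0, 2.2]

cvr_varied = cvr(pauses, varied_speech)
cvr_uniform = cvr(pauses, uniform_speech)
```

With the same pauses, the more uniform speech-event durations yield the larger CVR, which is the direction the marker relies on.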
29
Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
  In Shriberg et al. (2003b), speech/pause events were detected by:
    (1) displaying the speech amplitude envelope using Matlab software
    (2) human identification of the pause event with the largest amplitude
    (3) speech/pause classification using the threshold from Step (2)
    (4) removing speech/pause regions with duration < 100 msec
  Preliminary results show good agreement between this Matlab-based algorithm and manual measurements from spectrograms (Green et al., 2004).
30
Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR
  It can be difficult to identify speech/pause from only the energy or amplitude envelope, so speech/pause detection using ASR was investigated.
  The ASR system was trained using 300 utterances from 3 children with speech delay of unknown origin.
  All training data were phonetically labeled by hand and time-aligned at the phoneme level.
  The ASR system was trained to classify 8 broad-phonetic classes related to speech (e.g. “nasal”), instead of specific phonemes.
  The state sequence used by the ASR system constrained sequences of phonemic classes to be consistent with English syllable structure.
31
Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR
  ASR system recognized the following categories of speech:
    .noise  non-speech noise (e.g. door slam, breath)
    .pau    silence or pause
    clo     stop closure
    nas     nasal
    plo     stop burst
    sfrc    strong fricative
    vow     vowel, liquid, or glide
    wfrc    weak fricative
  State sequence (grammar) allowed sequences such as
    .pau clo plo vow nas .pau (e.g. for the isolated-word utterance “can”)
  but not
    .pau nas wfrc vow .pau (violates sonority principle)
32
Applying ASR to the Coefficient of Variation Ratio: Computing the CVR
  ASR results (broad phonetic classes with English syllable structure) were mapped to “speech” and “pause” events.
  CVR was computed as in Shriberg et al. (2003b), except that regions shorter than 50 msec were merged with neighboring regions.
  [Figure: panels showing phonetic class, speech/pause labels, waveform, and spectrogram]
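The mapping and merging steps might look like the sketch below. Several details are assumptions, because the slides do not specify them: that `.pau` and `.noise` count as pause and every other class as speech, that a short region is absorbed into its neighbors by flipping its label, and the segment format `(label, start_ms, end_ms)`.

```python
# Assumed mapping: these broad classes count as "pause"; all others as "speech".
PAUSE_LABELS = {".pau", ".noise"}

def collapse(events):
    """Join adjacent events of the same kind into one event."""
    merged = []
    for kind, start, end in events:
        if merged and merged[-1][0] == kind:
            merged[-1] = (kind, merged[-1][1], end)
        else:
            merged.append((kind, start, end))
    return merged

def to_events(segments):
    """Map (label, start_ms, end_ms) class segments to speech/pause events."""
    return collapse(
        [("pause" if label in PAUSE_LABELS else "speech", start, end)
         for label, start, end in segments]
    )

def merge_short(events, min_ms=50):
    """Absorb events shorter than min_ms into their neighbors."""
    events = list(events)
    changed = True
    while changed and len(events) > 1:
        changed = False
        for i, (kind, start, end) in enumerate(events):
            if end - start < min_ms:
                # Events alternate in kind, so taking a neighbor's kind
                # lets collapse() fuse the short event with its surroundings.
                neighbor_kind = events[i - 1][0] if i > 0 else events[i + 1][0]
                events[i] = (neighbor_kind, start, end)
                events = collapse(events)
                changed = True
                break
    return events

# Hypothetical recognizer output: "can", a 30-msec gap, then a vowel.
segments = [
    (".pau", 0, 200), ("clo", 200, 260), ("plo", 260, 290),
    ("vow", 290, 450), ("nas", 450, 540), (".pau", 540, 570),
    ("vow", 570, 700), (".pau", 700, 900),
]
events = merge_short(to_events(segments))
```

The durations of the resulting speech and pause events are what a CVR computation would consume.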
33
Applying ASR to the Coefficient of Variation Ratio: Speech Data
  Data from Shriberg et al.’s 2003b study (CVR corpus):
    30 children with normal speech (NS) (control data)
    30 children with speech delay (SD) (control data)
    15 children with sAOS
  Recordings of conversational speech
34
Applying ASR to the Coefficient of Variation Ratio: Results
  Shriberg et al.’s 2003b study:
    mean CVR of 1.05 for NS, 1.04 for SD, and 1.36 for sAOS
    effect size (ES) of 0.72 for NS/sAOS, 0.71 for SD/sAOS
  ASR-based method:
    mean CVR of 1.24 for NS, 1.13 for SD, and 1.42 for sAOS
    ES of 0.68 for NS/sAOS, 1.07 for SD/sAOS
  The CV-Speech values had ES values of 0.95 and 1.04 for NS/sAOS and SD/sAOS, respectively, although there is the possibility of a confounding age effect.
  Conclusion: ASR techniques appear to be applicable to the computation of the CVR; the results support the percept of isochrony in the sAOS subjects.
36
Summary
  More data are necessary in order to apply statistical models in the computation of the LSR. Data collection is currently under way in separate projects.
  Agreement between published results and current results indicates the potential of an ASR-based CVR.
  Improvements necessary for automation:
    Train the ASR system on a larger amount of speech data
    Improve F0 estimation for children’s speech
37
Current and Future Work
  Current work is focusing on:
    (a) understanding differences between published CVR values and ASR-based CVR values,
    (b) extension of the CVR to a syllable-based measure instead of a speech-event-based measure, and
    (c) extension of the LSR to conversational speech.
38
Current and Future Work
  Future work will focus on:
    (a) applying ASR to the measurement of other prosodic factors, such as inter-stress intervals, linguistic rhythm, speaking-rate variation, and glottal-source variation,
    (b) combining multiple measures of sAOS for improved sensitivity and specificity, and
    (c) evaluating specific factors that influence diagnosis.
39
References
  Green, J., Beukelman, D., Ball, L., Ullman, C., and Maassen, K. (2004). “Development and Evaluation of a Computer-based System to Measure and Analyze Pause and Speech Events,” Conference on Motor Speech: Motor Speech Disorders, Speech Motor Control, Albuquerque, NM.
  Guyette, T. W. and Diedrich, W. M. (1981). “A Critical Review of Developmental Apraxia of Speech,” in Speech and Language: Advances in Basic Research and Practice, 5, pp. 1-45.
  Hawley, M. (2003). “Speech Training And Recognition for Dysarthric Users of Assistive Technology (STARDUST),” Wales International Conference on Electronic Assistive Technology, Cardiff, Wales, July 2003.
  Hosom, J. P. (2000). Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Ph.D. thesis, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon.
  Kasi, K. and Zahorian, S. A. (2002). “Yet Another Algorithm for Pitch Tracking,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2002, Orlando, FL, 1, pp. 361-364.
40
References
  Marquardt, T. P., Sussman, H. M., Snow, T., and Jacks, A. (2002). “The Integrity of the Syllable in Developmental Apraxia of Speech,” in Journal of Communication Disorders, 35, pp. 31-49.
  Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., and Wilson, D. L. (1997). “The Speech Disorders Classification System (SDCS): Extensions and Lifespan Reference Data,” in Journal of Speech, Language, and Hearing Research, 40, pp. 723-740.
  Shriberg, L. D., Campbell, T. F., Karlsson, H. B., Brown, R. L., McSweeny, J. L., and Nadler, C. J. (2003a). “A Diagnostic Marker for Childhood Apraxia of Speech: The Lexical Stress Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 549-574.
  Shriberg, L. D., Green, J. R., Campbell, T. F., McSweeny, J. L., and Scheer, A. (2003b). “A Diagnostic Marker for Childhood Apraxia of Speech: The Coefficient of Variation Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 575-595.