Published by Jason Owens. Modified over 9 years ago.
1 Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems
Lawrence D. Shriberg (1), John-Paul Hosom (2), Jordan R. Green (3)
  (1) Waisman Center, University of Wisconsin - Madison
  (2) Center for Spoken Language Understanding, Oregon Health & Science University
  (3) Department of Special Education & Communication Disorders, University of Nebraska - Lincoln
This research is supported by NIDCD grants DC000496 and DC006722
http://www.waisman.wisc.edu/phonology/Index.htm
2 Acknowledgments
Phonology Project, Waisman Center, University of Wisconsin - Madison:
  Roger Brown, Katherina Hauner, Jane McSweeny, Catherine Coffey, Heather Karlsson, Connie Nadler, Peter Flipsen Jr. (a), Ray Kent, Alison Scheer, Jordan Green (b), Yunjung Kim, Christie Tilkens, Sheryl Hall, Joan Kwiatkowski, David Wilson
  (a) University of Tennessee, Knoxville
  (b) University of Nebraska
Collaborative Projects:
  Thomas Campbell & colleagues: University of Pittsburgh
  John-Paul Hosom & colleagues: Oregon Health & Science University
  Barbara Lewis & colleagues: Case Western Reserve University
  Christopher Moore & colleagues: University of Washington
  Rhea Paul & colleagues: Yale Child Studies Center
  Bruce Pennington & colleagues: University of Colorado
  Joanne Roberts & colleagues: University of North Carolina
  Bruce Tomblin & colleagues: University of Iowa
3 Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin
[Figure: five-level model diagram. Slides 4-7 repeat the same diagram, showing either the trait-marker entries or the diagnostic-marker entries.]
Levels: I. Etiological Processes; II. Explanatory Processes; III. Nosological Entity; IV. Trait Markers (phenotypes, endophenotypes); V. Diagnostic Markers
Node labels, in order of appearance: Risk and Protective Factors (Environmental, Genetic); Cognitive-Linguistic; Auditory-Perceptual; Speech Motor Control; Psychosocial; Phonological Attunement; Speech Delay – Genetic (SD-GEN); Speech Delay – Otitis Media with Effusion (SD-OME); Speech Delay – Speech Motor Involvement (SD-SMI); Speech Delay – Developmental Psychosocial Involvement (SD-DPI); Speech Errors (SE)
Subtype labels: SD-GEN, SD-OME, SD-AOS, SD-DYS, SD-DPI, SE-/s/, SE-/r/
Trait-marker entries (slides 4-6): SDCS*
Diagnostic-marker entries (slides 3 and 7): > Omissions, < Distortions, < Backing; > M1 values; < F3–F2; speech markers: > I-S Gap, > Backing; > Severity; 8 speech markers, Lex. Stress Ratio, > Coeff. Var. Ratio
*Shriberg, Austin, et al. (1997)
8 Diagnostic Markers and Automatic Speech Recognition
Childhood Apraxia of Speech is a controversial disorder because there is no consensus on the features that define it or on its etiologic conditions (Guyette & Diedrich, 1981; Shriberg et al., 1997)
“Suspected Apraxia of Speech” (sAOS) proposed as an interim term (Shriberg et al., 1997)
Two proposed markers for sAOS:
  Lexical Stress Ratio (LSR) (Shriberg et al., 2003a)
  Coefficient of Variation Ratio (CVR) (Shriberg et al., 2003b)
This work: pilot study toward complete automation of these markers, to address inherent human variability
  Aim was to replicate the results of prior work
  Uses techniques from automatic speech recognition (ASR)
9 Outline of Talk
Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin
Diagnostic Markers for sAOS
Applying ASR to the Lexical Stress Ratio (LSR)
  The Lexical Stress Ratio
  Identifying Vowel Boundaries Using ASR
  Computing the LSR
  Results
Applying ASR to the Coefficient of Variation Ratio (CVR)
Summary & Conclusion
10 Applying ASR to the Lexical Stress Ratio: The Lexical Stress Ratio
LSR (Shriberg et al., 2003a) measures the “inappropriate lexical stress” observed in children with sAOS
Inappropriate lexical stress: excessive stress on a syllable, or lack of stress on a syllable that is normally stressed
Three factors are used to measure lexical stress: frequency area, amplitude area, and duration of the first and second vowels in trochaic words
The values are combined into a single dimension, defined as stress
Both high and low LSR values are associated with sAOS
This work: pilot study on 2 subjects from the original LSR study
11 Applying ASR to the Lexical Stress Ratio: Identifying Vowel Boundaries Using ASR
Primary issue in automating LSR: determine the boundaries of both vowels in known, isolated, two-syllable words (e.g. “ladder”)
Vowel boundaries determined by “forced alignment”
[Figure: forced alignment of “ladder”, with the label sequence .pau l @ dc d 3r .pau]
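Once a forced aligner has produced time-stamped labels, picking out the two vowels is a simple filter. The sketch below is only illustrative: the segment times are made up, and it assumes that `@` and `3r` are the two vowel labels in the “ladder” example (the aligner itself is not shown).

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start_sec, end_sec)

# Assumption: the vowel labels of the "ladder" example; a real system
# would use the aligner's full vowel inventory.
VOWELS = {"@", "3r"}

def vowel_boundaries(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return (start, end) for every vowel segment, in time order."""
    return [(start, end) for label, start, end in segments if label in VOWELS]

# Hypothetical alignment output for "ladder"; the times are invented.
alignment = [
    (".pau", 0.00, 0.12),
    ("l", 0.12, 0.20),
    ("@", 0.20, 0.35),
    ("dc", 0.35, 0.40),
    ("d", 0.40, 0.43),
    ("3r", 0.43, 0.60),
    (".pau", 0.60, 0.75),
]
first_vowel, second_vowel = vowel_boundaries(alignment)
```
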
12 Applying ASR to the Lexical Stress Ratio: Computing the LSR
Vowel duration (D) = (end time of vowel) – (begin time of vowel)
Amplitude area (AA) = (average amplitude of vowel (dB)) × D
Frequency area (FA) = (average F0 of vowel) × D
[Figure: time-aligned panels showing waveform, spectrogram, phoneme labels, F0, and amplitude]
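The three per-vowel measures follow directly from the definitions above. A minimal sketch, assuming the amplitude (dB) and F0 (Hz) tracks have already been restricted to the frames inside the vowel; the function and argument names are hypothetical:

```python
import numpy as np

def vowel_features(start_sec, end_sec, amp_db, f0_hz):
    """Return (D, AA, FA) for one vowel.

    amp_db and f0_hz are the amplitude and F0 samples that fall inside
    the vowel (assumption: extracted beforehand from the full tracks).
    """
    duration = end_sec - start_sec                      # D
    amplitude_area = float(np.mean(amp_db)) * duration  # AA = mean dB x D
    frequency_area = float(np.mean(f0_hz)) * duration   # FA = mean F0 x D
    return duration, amplitude_area, frequency_area
```
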
13 Applying ASR to the Lexical Stress Ratio: Computing the LSR
LSR computed as described in Shriberg et al. (2003a): ratios of the frequency area, amplitude area, and duration between the first and second vowel are computed and combined into a single score
Notation: v1 = first vowel, v2 = second vowel, N = number of utterances, i = subject
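The combination step can be sketched as a weighted sum of first-to-second-vowel ratios. This is not the published formula: the equal weights below are placeholders, and averaging the per-utterance ratios over the N utterances of a subject is an assumption. The actual weighting is the one given in Shriberg et al. (2003a).

```python
import numpy as np

def lexical_stress_ratio(v1, v2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Illustrative LSR-style score for one subject.

    v1, v2: arrays of shape (N, 3) holding (FA, AA, D) for the first and
    second vowel of each of N utterances.
    weights: PLACEHOLDER weights, not the published ones.
    """
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    mean_ratios = (v1 / v2).mean(axis=0)  # one mean ratio per feature
    return float(np.dot(weights, mean_ratios))
```

With these placeholder weights, a score above 1 means the first vowel is, on average, longer, louder, and higher-pitched than the second.
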
14 Applying ASR to the Lexical Stress Ratio: Results

Participant   Reported LSR   Automatic LSR   Absolute Difference
Subject 1     1.65           1.63            0.02
Subject 2     0.89           0.83            0.06

Standard error of the mean estimated from data published by Shriberg et al. (2003a): 0.023
The difference for Subject 1 is within the estimated standard error of the mean; the difference for Subject 2 is not
Manually correcting a single gross error from the forced alignment procedure changed the automatic LSR for Subject 2 to 0.88 (within the standard error of the mean)
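The comparison against the standard error of the mean is simple arithmetic, restated here as a short check using only the values on this slide (the helper name is hypothetical):

```python
SEM = 0.023  # estimated standard error of the mean, from the slide

# (reported LSR, automatic LSR)
results = {
    "Subject 1": (1.65, 1.63),
    "Subject 2": (0.89, 0.83),
    "Subject 2, alignment corrected": (0.89, 0.88),
}

def within_sem(reported, automatic, sem=SEM):
    """True if the absolute difference does not exceed the SEM."""
    return round(abs(reported - automatic), 3) <= sem

for name, (reported, automatic) in results.items():
    print(name, within_sem(reported, automatic))
```
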
15 Outline of Talk
Complex Disease Model…
Diagnostic Markers for sAOS
Applying ASR to the Lexical Stress Ratio (LSR)
Applying ASR to the Coefficient of Variation Ratio (CVR)
  The Coefficient of Variation Ratio
  Identifying Speech/Pause Regions Using ASR
  Computing the CVR
  Results
Summary & Conclusion
16 Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
CVR (Shriberg et al., 2003b) measures the reduction in normal temporal variation of speech observed in children with sAOS
Measurement of CVR depends on the durations of speech events and of pause events
Because speech-event durations are less variable in children with sAOS, these children have higher CVR values than the control group

$$\mathrm{CVR} = \frac{\sigma_p / \mu_p}{\sigma_s / \mu_s}$$

  $\sigma_p$ = standard deviation of pause-event durations
  $\mu_p$ = mean duration of pause events
  $\sigma_s$ = standard deviation of speech-event durations
  $\mu_s$ = mean duration of speech events
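As a sketch, the CVR is the coefficient of variation (standard deviation divided by mean) of the pause durations divided by that of the speech durations. The use of the population standard deviation (NumPy's default) is an assumption; the source does not say which estimator was used.

```python
import numpy as np

def coefficient_of_variation(durations):
    """Standard deviation divided by mean (population SD, an assumption)."""
    d = np.asarray(durations, dtype=float)
    return float(d.std() / d.mean())

def cvr(pause_durations, speech_durations):
    """CV of pause-event durations over CV of speech-event durations."""
    return coefficient_of_variation(pause_durations) / coefficient_of_variation(
        speech_durations
    )
```

Less variable speech-event durations shrink the denominator, which is why reduced temporal variation shows up as a higher CVR.
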
17 Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio
In Shriberg et al. (2003b), speech/pause events were detected by:
  (1) displaying the speech amplitude envelope
  (2) human identification of the pause event with the largest amplitude
  (3) speech/pause classification using the threshold from Step (2)
  (4) removing speech/pause regions with duration < 100 msec
Preliminary results show good agreement between this Matlab-based algorithm and manual measurements from spectrograms (Green et al., this conference)
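Steps (3) and (4) can be sketched as frame-wise thresholding followed by absorbing short regions. This is not the Matlab algorithm itself: the 10 ms frame step is an assumption, the threshold is taken as given (it comes from the human judgment in Step 2), and merging a short region into the preceding event is one possible reading of "removing".

```python
from itertools import groupby

def speech_pause_events(envelope, threshold, frame_ms=10, min_ms=100):
    """Return a list of (label, duration_ms) events.

    envelope: amplitude-envelope value per frame.
    threshold: amplitude above which a frame counts as speech.
    Regions shorter than min_ms are absorbed into the preceding event
    (assumption); a short region at the very start is kept as is.
    """
    labels = ["speech" if a > threshold else "pause" for a in envelope]
    events = []
    for label, group in groupby(labels):
        dur = sum(1 for _ in group) * frame_ms
        if events and (dur < min_ms or events[-1][0] == label):
            # Too short, or same label as the event it follows: extend it.
            events[-1] = (events[-1][0], events[-1][1] + dur)
        else:
            events.append((label, dur))
    return events
```
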
18 Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR
Speech and pause can be difficult to identify from the energy or amplitude envelope alone, so speech/pause detection using ASR was investigated
ASR system trained using 300 utterances from 3 children with speech delay of unknown origin
All training data phonetically labeled by hand and time-aligned at the phoneme level
ASR system trained to classify 8 broad phonemic classes related to speech (e.g. “nasal”) instead of phonemes
Grammar used by the ASR system constrained sequences of phonemic classes to be consistent with English syllable structure
19 Applying ASR to the Coefficient of Variation Ratio: Computing the CVR
ASR results (broad phonetic classes with English syllable structure) mapped to “speech” and “pause” events
CVR computed as in Shriberg et al. (2003b)
[Figure: time-aligned panels showing waveform, spectrogram, phone class, and speech/pause labels]
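The mapping from recognized broad classes to events can be sketched as relabeling followed by merging adjacent segments of the same kind. The class names here are hypothetical apart from “nasal”; the slides do not list the 8 classes, and the pause label is an assumption.

```python
def to_events(segments, pause_labels=frozenset({"pause"})):
    """Collapse (class, start_sec, end_sec) segments into speech/pause events.

    Any class not in pause_labels counts as speech; consecutive segments
    of the same kind are merged into one event.
    """
    events = []
    for label, start, end in segments:
        kind = "pause" if label in pause_labels else "speech"
        if events and events[-1][0] == kind:
            events[-1] = (kind, events[-1][1], end)  # extend previous event
        else:
            events.append((kind, start, end))
    return events

# Hypothetical recognizer output; class names and times are invented.
recognized = [
    ("pause", 0.0, 0.3),
    ("nasal", 0.3, 0.4),
    ("vowel", 0.4, 0.6),
    ("pause", 0.6, 1.0),
]
events = to_events(recognized)
```

The event durations (end minus start) are then the inputs to the coefficient-of-variation computation.
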
20 Applying ASR to the Coefficient of Variation Ratio: Results
ASR-method CVR values are within 3% of the reported values

Participant   Method     Average CV of pause events   Average CV of speech events   CVR
Subject 1     reported   0.581                        0.407                         1.43
Subject 1     ASR        0.565                        0.398                         1.42
Subject 2     reported   0.545                        0.503                         1.08
Subject 2     ASR        0.509                        0.460                         1.11
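The table can be checked with a few lines of arithmetic: each CVR is the pause CV divided by the speech CV, and the ASR values differ from the reported ones by less than 3%. The helper names are hypothetical; the numbers are those in the table.

```python
# (participant, method, CV of pause events, CV of speech events, CVR)
rows = [
    ("Subject 1", "reported", 0.581, 0.407, 1.43),
    ("Subject 1", "ASR", 0.565, 0.398, 1.42),
    ("Subject 2", "reported", 0.545, 0.503, 1.08),
    ("Subject 2", "ASR", 0.509, 0.460, 1.11),
]

def cvr_from_cvs(cv_pause, cv_speech):
    return cv_pause / cv_speech

def percent_difference(reported, automatic):
    return 100.0 * abs(automatic - reported) / reported

for participant, method, cv_p, cv_s, cvr_value in rows:
    # Each tabulated CVR matches the ratio of the two CVs to two decimals.
    assert abs(cvr_from_cvs(cv_p, cv_s) - cvr_value) < 0.005

print(percent_difference(1.43, 1.42))  # Subject 1
print(percent_difference(1.08, 1.11))  # Subject 2
```
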
21 Summary & Conclusion
Agreement between published results and the current pilot-study results indicates potential for an ASR-based LSR
Improvement needed for automation of LSR: train the speech recognizer on children’s speech data
ASR-based CVR results for the two subjects are considered comparable to the reported CVR results
Algorithms still require refinement; improvements are possible
Need to evaluate how well the methods generalize to other subjects
22 References
Green, J., Beukelman, D., Ball, L., Ullman, C., and Maassen, K. (2004). “Development and Evaluation of a Computer-based System to Measure and Analyze Pause and Speech Events,” Conference on Motor Speech: Motor Speech Disorders, Speech Motor Control, Albuquerque, NM.
Guyette, T. W. and Diedrich, W. M. (1981). “A Critical Review of Developmental Apraxia of Speech,” in Speech and Language: Advances in Basic Research and Practice, 5, pp. 1-45.
Hawley, M. (2003). “Speech Training And Recognition for Dysarthric Users of Assistive Technology (STARDUST),” Wales International Conference on Electronic Assistive Technology, Cardiff, Wales, July 2003.
Hosom, J. P. (2000). Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Ph.D. thesis, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon.
Kasi, K. and Zahorian, S. A. (2002). “Yet Another Algorithm for Pitch Tracking,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2002, Orlando, FL, 1, pp. 361-364.
23 References
Marquardt, T. P., Sussman, H. M., Snow, T., and Jacks, A. (2002). “The Intelligibility of the syllable in developmental apraxia of speech,” in Journal of Communication Disorders, 35, pp. 31-49.
Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., and Wilson, D. L. (1997). “The Speech Disorders Classification System (SDCS): Extensions and Lifespan Reference Data,” in Journal of Speech, Language, and Hearing Research, 40, pp. 723-740.
Shriberg, L. D., Campbell, T. F., Karlsson, H. B., Brown, R. L., McSweeny, J. L., and Nadler, C. J. (2003a). “A Diagnostic Marker for Childhood Apraxia of Speech: The Lexical Stress Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 549-574.
Shriberg, L. D., Green, J. R., Campbell, T. F., McSweeny, J. L., and Scheer, A. (2003b). “A Diagnostic Marker for Childhood Apraxia of Speech: The Coefficient of Variation Ratio,” in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 575-595.
26 Applying ASR to the Lexical Stress Ratio: Speech Data
Data from Shriberg et al.’s 2003a study (LSR corpus):
  24 children with speech delay (control data)
  11 children with sAOS
  Recordings of elicited samples of 8 trochaic words
  Average age: 6 yrs, 4 mo. for children with speech delay; 7 yrs, 1 mo. for children with sAOS
This study: pilot study with 2 children from the LSR corpus
27 Applying ASR to the Lexical Stress Ratio: Measuring F0 and Amplitude
Fundamental frequency (F0) measured by computing the auto-correlation of relative changes in energy between 200 and 700 Hz (Hosom, 2000)
Post-processing of the F0 contour: dynamic programming method to correct pitch-doubling and pitch-halving errors (e.g. Kasi and Zahorian, 2002)
3 Hz average absolute error on a single speaker under quiet conditions (cf. electro-glottography measurements)
Amplitude computed as the log energy (in decibels) of the signal using a 20-msec Hamming window
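The amplitude measure can be sketched as frame-wise log energy under a 20-msec Hamming window. The 10 ms hop and the small floor `eps` (to avoid taking the log of zero) are assumptions not stated on the slide, and the dB values are relative to an arbitrary reference.

```python
import numpy as np

def log_energy_db(signal, sample_rate, window_sec=0.020, hop_sec=0.010, eps=1e-10):
    """Log energy (dB, arbitrary reference) per Hamming-windowed frame."""
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(window_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))  # hop size is an assumption
    window = np.hamming(win_len)
    starts = range(0, len(signal) - win_len + 1, hop)
    return np.array(
        [10.0 * np.log10(np.sum((signal[s:s + win_len] * window) ** 2) + eps)
         for s in starts]
    )
```

Scaling the input by a factor of 100 raises every frame by 40 dB, which gives a quick sanity check of an implementation.
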
28 Applying ASR to the Coefficient of Variation Ratio: Speech Data
Data from Shriberg et al.’s 2003b study (CVR corpus):
  30 children with normal speech acquisition (control data)
  30 children with speech delay (control data)
  15 children with sAOS
  Recordings of conversational speech
  Inclusionary criteria for children with sAOS based on a set of provisional speech and prosody-voice markers considered consistent with sAOS
This study: pilot study with 2 children from the CVR corpus
29 Applying ASR to the Coefficient of Variation Ratio: Results
Replicated Shriberg et al.’s CV computation using a threshold on the amplitude envelope
Differences between replicated and reported values for the average CVs of pause and speech events are < 0.01; the smallest reported standard error of the mean is 0.015
Comparison of results on individual samples from the ASR and Matlab methods:
  (A) The Matlab CVR method yields some speech events that are “interrupted” by low-amplitude speech; the ASR-based CVR may be less sensitive to these interruptions
  (B) Other differences in the way the ASR and Matlab techniques detect speech and pause events
30 Why a state-of-the-art adult-speech system for LSR, but broad classes trained on children’s speech for CVR?
LSR forced alignment could use either
  (a) a state-of-the-art system trained on adult speech, or
  (b) a non-state-of-the-art system trained on children’s speech
Phoneme identities are given, so the mismatch between adult and child speech is not as severe as in ASR
Chose (a) because of ease of implementation
For the CVR task of speech/pause detection:
  (a) the existing forced alignment system would require software modifications for ASR
  (b) identifying phonemes was not required
Implemented broad-phonetic-category ASR because of presumed robustness and ease of implementation
31 Long-Term Goals
ASR may also allow automatic measurement of other prosodic factors, such as syllable duration, inter-stress intervals, and linguistic rhythm
Multiple measures of sAOS may be combined for improved sensitivity and specificity
Evaluate specific factors that influence diagnosis