SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE Louis ten Bosch.


1 SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE Louis ten Bosch

2 Contents
Introduction
Objectives
Articulatory Features
Speech Material
Experimental details
  – Set-up
  – Results
Questions, future plans

3 Introduction
Speech is usually represented as sequences drawn from a limited set of phone-like symbols (ASR, synthesis, annotation)
‘Beads-on-a-string’ paradigm (Ostendorf, 1999; etc.)
  – Powerful as a meta-description
  – Weak at describing articulatory variation and pronunciation variation
Research on new descriptions and models of speech
  – Many proposals for new signal representations (continuity preserving, auditorily inspired) and new models (neural models, long-span models, parallel models)
  – Here: articulatory features (AF)

4 Objectives
To obtain alternative representations that intrinsically better model variation in speech
Focus on articulatory/pronunciation variation
To investigate the relation between better representations and decoding

5 Articulatory Features (AFs)
AF advantages:
  – Allow feature asynchrony
  – Deal with ‘incompleteness’: incomplete nasalization, voicing
  – Intrinsically better modelling of continuous processes
  – Assumed to better model fine phonetic details (FPD)
FPD mediate human speech processing (lexical access) [together with indexical information]

6 Distance Metric in AF Space
Each utterance is a path in AF space
A distance metric in AF space defines the ‘speed’ along the path
  – Compare with delta-features in ASR
Detecting peaks in this speed imposes an intrinsic temporal structure
Which distances to use?
  – Three types (L1, L2, cosine)
How does this ‘intrinsic’ temporal structure relate to external temporal structure, e.g. phone boundaries?
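
The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the helper names (`l1`, `l2`, `cosine`, `speed`, `peaks`), the toy 3-dimensional path, the handling of zero vectors, and the simple local-maximum peak picker are all hypothetical.

```python
import math

def l1(a, b):
    """City-block (L1) distance between two AF vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """Euclidean (L2) distance between two AF vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine distance, 1 - cos(angle). Returning 0.0 for a zero vector is an assumption."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (na * nb)

def speed(path, dist):
    """Distance between each pair of consecutive frames: the 'speed' along the path."""
    return [dist(path[t - 1], path[t]) for t in range(1, len(path))]

def peaks(values, threshold=0.0):
    """Indices of local maxima above threshold (a deliberately simple peak picker)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > threshold
            and values[i] > values[i - 1]
            and values[i] >= values[i + 1]]

# Hypothetical toy path: six frames moving from one AF value to another.
path = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]

for name, dist in (("L1", l1), ("L2", l2), ("cosine", cosine)):
    print(name, peaks(speed(path, dist)))
```

In this toy case all three metrics place the single event at the same frame transition; on real classifier outputs the peak positions need not coincide.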

7 Articulatory Features and Their Values

Feature (card)    Values
manner (6)        approximant, fricative, nasal, stop, vowel, silence
place (8)         (labio)dental, alveolar, velar, high, mid, low, silence
voicing (3)       voiced, voiceless, silence
rounding (4)      rounded, unrounded, nil, silence
front-back (5)    front, central, back, nil, silence
static (3)        static, dynamic, silence

8 Speech Material
IFAcorpus (Dutch, read + prepared, 8 speakers: 6 used for training and development, 2 for test)
Many different rich annotation levels

               Training    Development    Test
Nbr of utt.    1978        100            572
Duration       140 min     9 min 40 s     44 min 10 s

9 AF Classification Results by ANNs

Feature/ANN       IFAcorpus    TIMIT
manner (6)        84.7         86.5
place (8)         76.7         78.6
voicing (3)       93.5         92.0
rounding (4)      87.4         86.1
front-back (5)    83.6         83.0
static (3)        89.7         81.0

10 AF-Based Events and Segment Boundaries

11 Alignment Results

Time window       Autom. segmentation, Hit (%)
-20–20 ms         84

Nbr of hits (detected -> observed) versus time window size:

Time window       AF event vs. segment boundary, Hit (%)
Exact (-5–5 ms)   40
-25–25 ms         89
-35–35 ms         94

Wesenick & Kipp ‘96
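
A hit rate of this kind can be computed as sketched below. The function name `hit_rate`, the event times, and the choice to count a detected event as a hit when any observed boundary lies within the window are assumptions made for illustration; the slide does not spell out the exact matching rule.

```python
def hit_rate(detected_ms, observed_ms, window_ms):
    """Percentage of detected events with an observed boundary within +/- window_ms.

    Direction is detected -> observed: each detected event is checked against
    all observed boundaries (an assumed matching rule).
    """
    if not detected_ms:
        return 0.0
    hits = sum(
        1 for d in detected_ms
        if any(abs(d - o) <= window_ms for o in observed_ms)
    )
    return 100.0 * hits / len(detected_ms)

# Hypothetical event and boundary times in milliseconds.
detected = [100, 250, 410, 600]
observed = [103, 270, 400, 640]

for window in (5, 25, 35):
    print(window, hit_rate(detected, observed, window))
```

Widening the window can only keep or raise the hit rate, which matches the monotone trend in the table.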

12 Asynchrony and Phonetic Classes
Average (in number of frames) and standard deviation of the difference (diff.) between cosine-peak location and manual boundary. Only the transitions with extreme negative and positive distances are shown.

Manner transition       avg. (st.dev.)
Fricative-fricative     -0.57 (1.6)
Vowel-vowel             -0.31 (1.8)
...
Silence-approximant      0.49 (1.8)
Approx.-stop             0.63 (1.6)
Vowel-silence            0.64 (2.1)
Nasal-approx             0.66 (1.0)
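
Per-transition statistics like these can be gathered as in the following sketch. The sign convention (peak frame minus manual boundary frame), the use of the population standard deviation, the function name `asynchrony_by_transition`, and the records themselves are assumptions; the slide states neither the convention nor the estimator.

```python
from collections import defaultdict
from statistics import mean, pstdev

def asynchrony_by_transition(records):
    """Group signed frame differences by manner transition and summarise them.

    Each record is (transition_label, peak_frame, boundary_frame).
    Returns {label: (average difference, standard deviation)}.
    """
    diffs = defaultdict(list)
    for label, peak_frame, boundary_frame in records:
        # Assumed convention: positive means the peak falls after the boundary.
        diffs[label].append(peak_frame - boundary_frame)
    return {label: (mean(d), pstdev(d)) for label, d in diffs.items()}

# Hypothetical records, not data from the study.
records = [
    ("vowel-silence", 11, 10),
    ("vowel-silence", 20, 20),
    ("vowel-silence", 32, 30),
    ("nasal-approx", 5, 5),
    ("nasal-approx", 9, 8),
]

stats = asynchrony_by_transition(records)
for label, (avg, sd) in stats.items():
    print(f"{label}: {avg:.2f} ({sd:.2f})")
```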

13 Open questions 1
To what extent does the type of distance (L1, L2, cosine) affect the fine detail of the alignment with the manual segmentation?
  – For distances close to 0, all metrics will provide about the same result
  – The metrics deviate for larger distances, thereby putting more weight on different types of distinctions
This means that event parsing along the AF trajectory may result in essentially different segmentations for different metrics.
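
As a small numerical illustration of how metrics can weight distinctions differently, the sketch below compares two hypothetical frame-to-frame changes under L1 and L2: one spread evenly over four feature dimensions and one concentrated in a single dimension. The vectors and the names `l1_norm` and `l2_norm` are invented for this example; it covers only L1 versus L2 and is not an analysis from the study.

```python
import math

def l1_norm(delta):
    """L1 size of a frame-to-frame change."""
    return sum(abs(x) for x in delta)

def l2_norm(delta):
    """L2 size of a frame-to-frame change."""
    return math.sqrt(sum(x * x for x in delta))

# Hypothetical changes between consecutive AF vectors.
spread = [0.3, 0.3, 0.3, 0.3]        # moderate change in every dimension
concentrated = [0.8, 0.0, 0.0, 0.0]  # large change in one dimension only

# L1 ranks the spread change as larger, L2 ranks the concentrated change as larger.
print("L1:", l1_norm(spread), l1_norm(concentrated))
print("L2:", l2_norm(spread), l2_norm(concentrated))
```

If two such changes were neighbouring candidates for an event, the two metrics would pick different ones as the local speed peak.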

14 Open questions 2
What about cue trading (by using weights)?
  – Difficult; depends on the phone
What about the precise quantification of asynchrony?
  – The variation of observed AF vectors around a canonical AF vector = feature asynchrony + the variation in the classifier output

15 Near-future plans
Exploit the phenomena described here as design principles for alternative procedures for data-driven annotation and unit selection
Design a word recognition framework based on the AF representation of speech
Study usability for memory-prediction models

16 Thank you for your attention

