Characterisation of individuals’ formant dynamics using polynomial equations Kirsty McDougall Department of Linguistics University of Cambridge IAFPA 2006
Speaker characteristics and static features of speech: Most previous research has focussed on static features (instantaneous, average). Straightforward to measure. Natural progression from other research areas: delineation of different languages and language varieties.
Speaker characteristics and static features of speech (continued): Static features reflect certain anatomical dimensions of a speaker, e.g. formant frequencies ~ length and configuration of the vocal tract (VT). Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population, so look to dynamic (time-varying) features.
Dynamic features of speech: More information than static features. Reflect movement of a person’s speech organs as well as their dimensions. People move in individual ways for skilled motor activities: walking, running, … and speech.
Can view speech as achievement of a series of linguistic ‘targets’. Speakers likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways, so examine formant frequency dynamics.
Formant dynamics [figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English; axes: Frequency (Hz) against Time (s), with a 10% interval marked]
Research Questions: How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/? How can this dynamic information be captured to characterise individual speakers?
Target words (/aɪk/): bike /baɪk/, hike /haɪk/, like /laɪk/, mike /maɪk/, spike /spaɪk/
Data set: e.g. “I don’t want the scooter, I want the bike now.” / “Later won’t do, I want the bike now.” 5 repetitions x 5 words (bike, hike, like, mike, spike) x 2 stress levels (nuclear, non-nuclear) x 2 speaking rates (normal, fast) = 100 tokens per subject
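The token count can be checked by enumerating the design. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the study.

```python
from itertools import product

# Factors of the recording design as listed on the slide.
words = ["bike", "hike", "like", "mike", "spike"]
stress_levels = ["nuclear", "non-nuclear"]
rates = ["normal", "fast"]
repetitions = range(1, 6)

# One tuple per token recorded from a single subject.
tokens = list(product(words, stress_levels, rates, repetitions))

# Tokens in one rate/stress condition, e.g. fast-nuclear.
fast_nuclear = [tok for tok in tokens if tok[1] == "nuclear" and tok[2] == "fast"]

print(len(tokens), "tokens per subject")
print(len(fast_nuclear), "tokens per subject in one rate/stress condition")
```

Each rate/stress condition therefore holds 5 words x 5 repetitions per subject, which matches the 25 tokens per speaker mentioned for the fast-nuclear scatterplot.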
5 adult male native speakers of Australian English (A, B, C, D, E) aged Brisbane/Gold Coast, Queensland Subjects
Speaker A “bike” (normal-nuclear) [figure: F1, F2 and F3 contours]
Formant contours, normal-nuclear [figures: F1, F2 and F3; axes: Frequency (Hz) against +10% step of /aɪ/]
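Measuring a contour at +10% steps amounts to time-normalising it. A minimal sketch, assuming a formant track is available as (time, frequency) samples and that both endpoints are included (the slides do not say which steps were used); `sample_at_percent_steps` and the synthetic track are hypothetical.

```python
import numpy as np

def sample_at_percent_steps(times, freqs, n_intervals=10):
    """Linearly interpolate a formant track at equal proportions of its duration."""
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # 0%, 10%, ..., 100% of the interval (endpoint inclusion is an assumption).
    proportions = np.linspace(0.0, 1.0, n_intervals + 1)
    sample_times = times[0] + proportions * (times[-1] - times[0])
    return proportions, np.interp(sample_times, times, freqs)

# Synthetic F1 track (not data from the study): rises then falls over 190 ms.
track_times = np.linspace(0.120, 0.310, 39)
track_f1 = 650.0 + 120.0 * np.sin(np.pi * (track_times - 0.120) / 0.190)

proportions, f1_samples = sample_at_percent_steps(track_times, track_f1)
for p, f in zip(proportions, f1_samples):
    print(f"{100 * p:3.0f}%  {f:6.1f} Hz")
```

Sampling by proportion rather than absolute time lets tokens of different durations (e.g. normal vs fast rate) be compared point by point.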
Discriminant Analysis Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (ref. Tabachnick and Fidell 1996)
Discriminant Analysis, fast-nuclear [figure: scatterplot of tokens on two discriminant functions, Speakers A to E]. Each datapoint represents 1 token; each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/. DA constructs discriminant functions which maximise differences between speakers (each function is a linear combination of the formant frequency predictors). Assess how well the predictors distinguish speakers by extent of clustering of tokens + classification percentage: 95%
Discriminant Analysis: classification percentages 95%, 88%, 95%, 89%
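How discriminant functions and a classification percentage arise can be sketched with NumPy alone. Everything here is an assumption for illustration: the data are synthetic, `discriminant_functions` and `classify` are hypothetical helpers, tokens are assigned to the nearest speaker centroid in discriminant space, and the rate is computed on the same tokens used for fitting. The slides do not state which software or classification procedure was used.

```python
import numpy as np

def discriminant_functions(X, labels):
    """Columns of the result are linear combinations of the predictors
    that maximise between-speaker relative to within-speaker scatter."""
    classes = np.unique(labels)
    grand_mean = X.mean(axis=0)
    n_features = X.shape[1]
    within = np.zeros((n_features, n_features))
    between = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[labels == c]
        centred = Xc - Xc.mean(axis=0)
        within += centred.T @ centred
        offset = (Xc.mean(axis=0) - grand_mean)[:, None]
        between += len(Xc) * (offset @ offset.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigvals.real)[::-1]
    n_functions = min(len(classes) - 1, n_features)
    return eigvecs.real[:, order[:n_functions]]

def classify(X, labels, W):
    """Assign each token to the speaker with the nearest centroid in discriminant space."""
    classes = np.unique(labels)
    scores = X @ W
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

# Synthetic predictors (Hz) for 5 speakers x 25 tokens; not data from the study.
rng = np.random.default_rng(0)
speaker_means = np.array([[650, 1300, 2400], [700, 1450, 2550], [600, 1500, 2300],
                          [750, 1250, 2650], [680, 1600, 2450]], dtype=float)
labels = np.repeat(np.array(list("ABCDE")), 25)
X = np.repeat(speaker_means, 25, axis=0) + rng.normal(scale=20.0, size=(125, 3))

W = discriminant_functions(X, labels)
predicted = classify(X, labels, W)
accuracy = (predicted == labels).mean()
print(f"{100 * accuracy:.0f}% of tokens assigned to the correct speaker")
```

Plotting the first two columns of `X @ W` against each other, coloured by speaker, gives the kind of scatterplot described on the slides.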
Discussion: DA scatterplots and classification rates promising. However, not very efficient: method essentially based on a series of instantaneous measurements, probably containing dependent information. Recall: individuals’ F1 contours of /aɪk/ …
[figure: F1 contours, normal-nuclear; axes: Frequency (Hz) against +10% step of /aɪ/]
A new approach … Contours differ in location in frequency range and in curvature (location of turning points, convex/concave, steep/shallow). Need to capture most defining aspects of the contours efficiently: use linear regression to parameterise curves with polynomial equations.
Linear regression: Technique for determining equation of a line or curve which approximates the relationship between a set of $(x, y)$ points. Straight line: $y = a_0 + a_1 x$, where $a_0$ is the y-intercept and $a_1$ is the gradient.
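A straight-line fit can be sketched with `numpy.polyfit`; the five points below are made up for illustration.

```python
import numpy as np

# Made-up (x, y) points lying roughly on a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# np.polyfit returns coefficients from the highest power down,
# so for degree 1 the order is (gradient, y-intercept).
a1, a0 = np.polyfit(x, y, deg=1)
print(f"y = {a0:.2f} + {a1:.2f} x")
```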
Linear regression: Can also be used for curvilinear relationships, e.g. quadratic: $y = a_0 + a_1 x + a_2 x^2$, where $a_0$ is the y-intercept and $a_1$, $a_2$ determine shape and direction of curve.
Polynomial Equations: Cubic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$; Quartic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$; Quintic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$
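Fitting a curve of this kind is still linear regression because the equation is linear in the coefficients $a_0, a_1, \dots$; only the columns of the design matrix change. A minimal sketch under that view, with a hypothetical helper `fit_polynomial` and made-up data:

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares coefficients a_0 ... a_degree for y = a_0 + a_1 x + ... ."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    design = np.vander(x, N=degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Points generated from a known cubic, so the fit can be checked by eye.
x = np.linspace(0.0, 1.0, 11)
y = 2.0 - 3.0 * x + 0.5 * x**2 + 4.0 * x**3

cubic = fit_polynomial(x, y, degree=3)
quintic = fit_polynomial(x, y, degree=5)
print("cubic coefficients:  ", np.round(cubic, 3))
print("quintic coefficients:", np.round(quintic, 3))
```

With noise-free cubic data the quintic fit returns near-zero values for $a_4$ and $a_5$, showing that higher-order terms only contribute when the contour needs them.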
/aɪk/ data: fit F1, F2, F3 contours with polynomial equations; test the reliability of the polynomial coefficients in distinguishing speakers. Quadratic: $y = a_0 + a_1 t + a_2 t^2$; Cubic: $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3$
F1 contour, “bike”, Speaker A (normal-nuclear token 1) [figure: actual data points with quadratic and cubic fits and an R value for each; axes: Frequency (Hz) against Normalised time]
F2 contour, “bike”, Speaker A (normal-nuclear token 1) [figure: actual data points with quadratic and cubic fits and an R value for each; axes: Frequency (Hz) against Normalised time]
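Quadratic and cubic fits to one contour can be sketched as follows. The F1 values are invented for illustration (they are not Speaker A's measurements), and R is taken here to be the correlation between measured and fitted values, which is an assumption about how the slides' R was computed.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Normalised time (0 = start, 1 = end) and an invented F1 contour in Hz.
t = np.linspace(0.0, 1.0, 11)
f1 = np.array([620.0, 680.0, 730.0, 760.0, 770.0, 760.0,
               730.0, 680.0, 610.0, 530.0, 450.0])

def fit_contour(t, y, degree):
    """Return coefficients (a_0 first) and R between measured and fitted values."""
    coeffs = P.polyfit(t, y, degree)
    fitted = P.polyval(t, coeffs)
    r = np.corrcoef(y, fitted)[0, 1]
    return coeffs, r

quad_coeffs, r_quad = fit_contour(t, f1, degree=2)
cubic_coeffs, r_cubic = fit_contour(t, f1, degree=3)
print("quadratic:", np.round(quad_coeffs, 1), f"R = {r_quad:.4f}")
print("cubic:    ", np.round(cubic_coeffs, 1), f"R = {r_cubic:.4f}")
```

Because the quadratic is a special case of the cubic, the cubic's R can never be lower on the same data; the question is whether the gain justifies the extra coefficient.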
DA on polynomial coefficients: Quadratic: 3 formants x 3 coefficients = 9 predictors; Cubic: 3 formants x 4 coefficients = 12 predictors; Cubic + duration of /aɪ/ = 13 predictors
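The predictor counts follow from stacking the coefficients of the three formant fits for each token. A sketch with a hypothetical helper `token_predictors` and invented contours; the ordering of coefficients within the vector is an arbitrary choice here.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def token_predictors(formant_contours, degree, duration=None):
    """Concatenate polynomial coefficients of the F1, F2, F3 contours of one token.

    formant_contours: array of shape (3, n_points), sampled over normalised time.
    duration: optional duration in seconds, appended as one extra predictor.
    """
    contours = np.asarray(formant_contours, dtype=float)
    t = np.linspace(0.0, 1.0, contours.shape[1])
    coeffs = [P.polyfit(t, contour, degree) for contour in contours]
    predictors = np.concatenate(coeffs)
    if duration is not None:
        predictors = np.append(predictors, duration)
    return predictors

# Invented contours for one token (Hz); rows are F1, F2, F3.
t = np.linspace(0.0, 1.0, 11)
contours = np.vstack([
    650.0 + 400.0 * t - 600.0 * t**2,
    1200.0 + 900.0 * t**2,
    2500.0 + 150.0 * t,
])

quadratic = token_predictors(contours, degree=2)
cubic = token_predictors(contours, degree=3)
cubic_plus_duration = token_predictors(contours, degree=3, duration=0.19)
print(len(quadratic), len(cubic), len(cubic_plus_duration))
```

One such vector per token, labelled with its speaker, is what a discriminant analysis would take as input.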
Comparison of Classification Rates [figure: % Correct Classification for tests with 9, 12, 13 and 20 predictors; labelled values: 96%, 92%, 89%, 90%]
Summary of findings: Comparing polynomial-based tests & direct measurement-based tests: reduction in classification accuracy small in return for much smaller no. of predictors required. Future: aim to develop this approach to enable inclusion of additional information; parametrise other dynamic aspects of speech to capture a dense amount of speaker-specific info with a small no. of predictors.
Conclusion: Differences in formant dynamics reflect differences in articulatory strategies (& VT dimensions) among speakers, e.g. speaker-specificity of /aɪk/ formant dynamics: differences in shape and frequency for F1, F2 and F3, preserved across changes in speaking rate and stress.
Conclusion (continued): Trialled new technique for characterising individuals’ formant contours using polynomial equations on /aɪk/ data. Able to capture almost same amount of speaker-specific information with far fewer predictors. Polynomial approach using formant dynamics should make an important contribution to speaker characterisation techniques in future.