1
Characterisation of individuals’ formant dynamics using polynomial equations
Kirsty McDougall
Department of Linguistics, University of Cambridge
kem37@cam.ac.uk
IAFPA 2006
2
Speaker characteristics and static features of speech
- Most previous research has focussed on static features (instantaneous, average)
- Straightforward to measure
- Natural progression from other research areas: delineation of different languages and language varieties
3
Speaker characteristics and static features of speech
- Static features reflect certain anatomical dimensions of a speaker, e.g. formant frequencies ~ length and configuration of the vocal tract (VT)
- Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population
- Therefore look to dynamic (time-varying) features
4
Dynamic features of speech
- Carry more information than static features
- Reflect the movement of a person’s speech organs as well as their dimensions
- People move in individual ways for skilled motor activities: walking, running, … and speech
5
- Speech can be viewed as the achievement of a series of linguistic ‘targets’
- Speakers are likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways
- Therefore examine formant frequency dynamics
6
Formant dynamics
[Figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English. Axes: Time (s) against Frequency (Hz); a 10% interval is marked.]
9
Research Questions
- How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/?
- How can this dynamic information be captured to characterise individual speakers?
10
Target words: /aɪk/
bike /baɪk/, hike /haɪk/, like /laɪk/, mike /maɪk/, spike /spaɪk/
11
Data set
- e.g. I don’t want the scooter, I want the bike now. / Later won’t do, I want the bike now.
- 5 repetitions x 5 words (bike, hike, like, mike, spike) x 2 stress levels (nuclear, non-nuclear) x 2 speaking rates (normal, fast) = 100 tokens per subject
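The token count can be checked by enumerating the design. This is a minimal sketch; the variable names and the dictionary layout are illustrative and not taken from the study.

```python
from itertools import product

WORDS = ["bike", "hike", "like", "mike", "spike"]
STRESS = ["nuclear", "non-nuclear"]
RATES = ["normal", "fast"]
REPETITIONS = range(1, 6)  # 5 repetitions

# One record per token produced by a single subject.
tokens = [
    {"word": w, "stress": s, "rate": r, "repetition": rep}
    for w, s, r, rep in product(WORDS, STRESS, RATES, REPETITIONS)
]

# Tokens belonging to one rate/stress condition, e.g. fast-nuclear.
fast_nuclear = [t for t in tokens if t["rate"] == "fast" and t["stress"] == "nuclear"]

print(len(tokens), len(fast_nuclear))
```

Each rate/stress condition therefore holds 5 words x 5 repetitions = 25 tokens per subject.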
12
Subjects
- 5 adult male native speakers of Australian English (A, B, C, D, E)
- Aged 22-28
- Brisbane/Gold Coast, Queensland
13
Speaker A “bike” (normal-nuclear)
[Figure: two landmarks (1, 2) marked on the token, with measurement points at 10, 20, 30, 40, 50, 60, 70, 80 and 90% between them; F1, F2 and F3 labelled.]
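A minimal sketch of how the 10%-step measurement points could be located, assuming that landmarks 1 and 2 are the onset and offset of /aɪ/ (the slide does not say so explicitly). The function name and the example times are hypothetical.

```python
def sample_times(onset, offset, percents=range(10, 100, 10)):
    """Return the times lying at the given percentages of the onset-offset interval."""
    duration = offset - onset
    return [onset + duration * pct / 100 for pct in percents]

# Hypothetical landmark times in seconds, for illustration only.
times = sample_times(0.20, 0.40)
print([round(t, 3) for t in times])
```

Sampling at fixed proportions rather than fixed times gives every token nine measurements per formant regardless of its duration.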
17
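F1, normal-nuclear
[Figure: F1 contours. Axes: +10% step of /aɪ/ against Frequency (Hz).]
18
F2, normal-nuclear
[Figure: F2 contours. Axes: +10% step of /aɪ/ against Frequency (Hz).]
19
F3, normal-nuclear
[Figure: F3 contours. Axes: +10% step of /aɪ/ against Frequency (Hz).]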
20
Discriminant Analysis
- Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (ref. Tabachnick and Fidell 1996)
21
Discriminant Analysis: fast-nuclear
[Figure: scatterplot of Function 1 against Function 2, with speakers A-E distinguished by colour.]
- Each datapoint represents 1 token; each speaker’s tokens are represented with a different colour (e.g. Speaker E’s 25 tokens of /aɪk/)
- DA constructs discriminant functions which maximise differences between speakers (each function is a linear combination of the formant frequency predictors)
- Assess how well the predictors distinguish speakers by the extent of clustering of tokens and by the classification percentage: 95%
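A minimal sketch of how a classification percentage can be obtained from a linear discriminant classifier. It is not the analysis behind the slides: it uses only two predictors, three hypothetical speakers, synthetic values, equal priors, and it classifies the same tokens it was trained on. Each token is assigned to the speaker whose mean is nearest in Mahalanobis distance under a pooled within-speaker covariance, which is the classification rule of linear discriminant analysis under those assumptions.

```python
# Synthetic two-predictor tokens for three hypothetical speakers (not real data).
tokens = {
    "A": [(500.0, 1500.0), (510.0, 1490.0), (495.0, 1510.0), (505.0, 1505.0)],
    "B": [(600.0, 1300.0), (610.0, 1310.0), (595.0, 1295.0), (605.0, 1290.0)],
    "C": [(450.0, 1700.0), (460.0, 1690.0), (445.0, 1705.0), (455.0, 1710.0)],
}

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def pooled_covariance(groups):
    """Pooled within-speaker 2x2 covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    dof = 0
    for points in groups.values():
        mx, my = mean(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(points) - 1
    return sxx / dof, sxy / dof, syy / dof

def classify(token, means, cov):
    """Assign the token to the speaker with the smallest Mahalanobis distance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy

    def distance(m):
        dx, dy = token[0] - m[0], token[1] - m[1]
        return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

    return min(means, key=lambda spk: distance(means[spk]))

means = {spk: mean(pts) for spk, pts in tokens.items()}
cov = pooled_covariance(tokens)
correct = sum(classify(t, means, cov) == spk for spk, pts in tokens.items() for t in pts)
total = sum(len(pts) for pts in tokens.values())
rate = 100.0 * correct / total
print(f"{rate:.0f}% correct classification")
```

With real data the rate would normally be estimated on tokens held out from training, since classifying the training tokens themselves tends to overstate accuracy.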
26
Discriminant Analysis
Classification rates: 95%, 88%, 95%, 89%
27
Discussion
- DA scatterplots and classification rates are promising
- However, the method is not very efficient: it is essentially based on a series of instantaneous measurements, which probably contain dependent information
- Recall: individuals’ F1 contours of /aɪk/ …
28
F1, normal-nuclear
[Figure: F1 contours. Axes: +10% step of /aɪ/ against Frequency (Hz).]
29
A new approach …
- Differences in location in the frequency range
- Differences in curvature: location of turning points, convex/concave, steep/shallow
- Need to capture the most defining aspects of the contours efficiently
- Therefore use linear regression to parameterise the curves with polynomial equations
30
Linear regression
- Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points
- Straight line: $y = a_0 + a_1 x$, where $a_0$ is the y-intercept and $a_1$ is the gradient
[Figure: scatter of (x, y) points with a fitted line.]
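A minimal sketch of the straight-line case, using the standard closed-form least-squares estimates of the y-intercept and gradient. The function name and the example points are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a0, a1) minimising the squared error of y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx            # gradient
    a0 = mean_y - a1 * mean_x  # y-intercept
    return a0, a1

# Points lying exactly on y = 2 + 3x, so the fit should recover those values.
a0, a1 = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a0, a1)
```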
36
Linear regression
- Can also be used for curvilinear relationships
- Quadratic: $y = a_0 + a_1 x + a_2 x^2$, where $a_0$ is the y-intercept and the remaining coefficients determine the shape and direction of the curve
[Figure: scatter of (x, y) points with a fitted curve.]
40
Polynomial Equations
- Cubic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$
- Quartic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$
- Quintic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$
[Figure: example cubic, quartic and quintic curves, each plotted as y against x.]
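A curve can still be fitted by linear regression because each polynomial is linear in its coefficients, even though it is non-linear in x. A short worked form in standard least-squares notation follows; the matrix $X$, the vectors $\mathbf{a}$ and $\mathbf{y}$, the degree $k$ and the sample count $n$ are introduced here for illustration and do not appear on the slides.

```latex
\begin{align}
y_i &= a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_k x_i^k + \varepsilon_i,
      \qquad i = 1, \dots, n \\
\mathbf{y} &= X\mathbf{a} + \boldsymbol{\varepsilon},
      \qquad X_{ij} = x_i^{\,j}, \quad j = 0, \dots, k \\
\hat{\mathbf{a}} &= (X^{\top} X)^{-1} X^{\top} \mathbf{y}
\end{align}
```

The last line holds when $X^{\top} X$ is invertible, which is the case when there are more than $k$ distinct $x$ values.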
42
/aɪk/ data
- Fit F1, F2, F3 contours with polynomial equations
- Test the reliability of the polynomial coefficients in distinguishing speakers
- Quadratic: $y = a_0 + a_1 t + a_2 t^2$
- Cubic: $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3$
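A minimal sketch of fitting a quadratic and a cubic to one formant contour by least squares, written in plain Python by solving the normal equations. The contour values are hypothetical, the time axis t = 1..9 is an assumption, and treating R as the square root of the coefficient of determination is also an assumption; the slides do not specify any of these.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients [a0, a1, ..., a_degree]."""
    size = degree + 1
    # Normal equations (X^T X) a = X^T y, where X[i][j] = ts[i] ** j.
    lhs = [[sum(t ** (r + c) for t in ts) for c in range(size)] for r in range(size)]
    rhs = [sum(y * t ** r for t, y in zip(ts, ys)) for r in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = lhs[row][col] / lhs[col][col]
            for k in range(col, size):
                lhs[row][k] -= factor * lhs[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(lhs[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (rhs[row] - tail) / lhs[row][row]
    return coeffs

def multiple_r(ts, ys, coeffs):
    """Square root of the coefficient of determination of the fitted curve."""
    fitted = [sum(a * t ** j for j, a in enumerate(coeffs)) for t in ts]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return max(0.0, 1.0 - ss_res / ss_tot) ** 0.5

ts = list(range(1, 10))  # assumed time axis: one value per 10% step
contour = [480.0, 530.0, 600.0, 660.0, 700.0, 710.0, 680.0, 600.0, 480.0]  # hypothetical F1 (Hz)

quadratic = polyfit(ts, contour, 2)
cubic = polyfit(ts, contour, 3)
r_quadratic = multiple_r(ts, contour, quadratic)
r_cubic = multiple_r(ts, contour, cubic)
print(quadratic, r_quadratic)
print(cubic, r_cubic)
```

Solving the normal equations directly is adequate for low degrees on a short, small-valued time axis; for higher degrees a QR-based solver from a numerical library would be the more stable choice.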
43
F1 contour: “bike”, Speaker A (normal-nuclear token 1)
- Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, $R = 0.879$
- Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, $R = 0.978$
[Figure: actual data points with both fitted curves. Axes: Normalised time (t) against Frequency in Hz (y).]
45
F2 contour: “bike”, Speaker A (normal-nuclear token 1)
- Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, $R = 0.985$
- Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, $R = 0.991$
[Figure: actual data points with both fitted curves. Axes: Normalised time (t) against Frequency in Hz (y).]
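A minimal sketch of how a handful of coefficients reproduces a whole contour, using the fitted coefficients reported on the slides for Speaker A. The range of normalised time (t = 1..9 here) is an assumption, since the slides do not state the units of t, and the function and constant names are illustrative.

```python
def evaluate(coeffs, t):
    """Evaluate a0 + a1*t + a2*t**2 + ... using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result

# Coefficients [a0, a1, a2, ...] as reported on the slides.
F1_QUADRATIC = [420.68, 79.26, -5.92]
F1_CUBIC = [478.85, -46.07, 35.62, -3.46]
F2_QUADRATIC = [876.01, -53.24, 22.46]
F2_CUBIC = [825.49, 55.64, -13.63, 3.01]

for t in range(1, 10):  # assumed normalised time steps
    print(t, round(evaluate(F1_QUADRATIC, t), 1), round(evaluate(F1_CUBIC, t), 1))
```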
46
DA on polynomial coefficients
- Quadratic: 3 formants x 3 coefficients = 9 predictors
- Cubic: 3 formants x 4 coefficients = 12 predictors
- Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
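A minimal sketch of how one token's predictors could be assembled for the discriminant analysis. The function name is hypothetical. The F1 and F2 coefficients are those reported on the slides for Speaker A; the F3 coefficients and the duration are placeholders, because the slides give no values for them.

```python
def predictor_vector(f1, f2, f3, duration=None):
    """Concatenate the coefficients of three formant fits, optionally adding duration."""
    vector = list(f1) + list(f2) + list(f3)
    if duration is not None:
        vector.append(duration)
    return vector

f1_cubic = [478.85, -46.07, 35.62, -3.46]  # from the slides
f2_cubic = [825.49, 55.64, -13.63, 3.01]   # from the slides
f3_cubic = [0.0, 0.0, 0.0, 0.0]            # placeholder: no F3 fit is reported
duration = 0.2                              # placeholder duration in seconds

quadratic_only = predictor_vector(f1_cubic[:3], f2_cubic[:3], f3_cubic[:3])  # shape only, not refitted
cubic_only = predictor_vector(f1_cubic, f2_cubic, f3_cubic)
cubic_plus_duration = predictor_vector(f1_cubic, f2_cubic, f3_cubic, duration)
print(len(quadratic_only), len(cubic_only), len(cubic_plus_duration))
```

Truncating the cubic coefficients is done here only to show the vector lengths; a real quadratic predictor set would come from a separate quadratic fit.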
47
Comparison of Classification Rates
[Figure: bar chart of % Correct Classification by no. of predictors: (9), (12), (13), (20). Values shown: 96%, 92%, 89%, 90%.]
54
Summary of findings
- Comparing polynomial-based tests and direct measurement-based tests: the reduction in classification accuracy is small, in return for a much smaller no. of predictors
- Future: aim to develop this approach to enable inclusion of additional information, i.e. parametrise other dynamic aspects of speech to capture dense speaker-specific information with a small no. of predictors
55
Conclusion
- Differences in formant dynamics reflect differences in articulatory strategies (and VT dimensions) among speakers
- e.g. speaker-specificity of /aɪk/ formant dynamics:
  - differences in shape and frequency for F1, F2 and F3
  - preserved across changes in speaking rate and stress
56
Conclusion
- Trialled a new technique for characterising individuals’ formant contours using polynomial equations, on /aɪk/ data
- Able to capture almost the same amount of speaker-specific information with far fewer predictors
- The polynomial approach using formant dynamics should make an important contribution to speaker characterisation techniques in future