
1 An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops
Yao, UC Berkeley
yaoyao@berkeley.edu
http://linguistics.berkeley.edu/~yaoyao
July 25, 2008

2 Overview
Background
Data
Methodology: algorithm, tuning the model, testing
Results
General Discussion

3 Background
Purpose of the study: to find the point of burst in a word-initial voiceless stop (i.e. [p], [t], [k]).
Existing approach: detecting the point of maximal energy change (cf. Niyogi and Ramesh, 1998; Liu, 1996).
[Figure: waveform schematic labeled closure, release, vowel onset]

4 Background
Our approach:
Compare the spectrum of the target token at each time point against those of fricatives and silence.
Assess how "fricative-like" and "silence-like" the spectrum is at each time point.
Find the point where "fricative-ness" suddenly rises and "silence-ness" suddenly drops → the point of burst.

5 Background
Our approach (cont'd). What do we need?
Spectral features of a given time frame.
Spectral templates of fricatives and silence, specific to the speaker and the recording environment.
A way to measure and compare fricative-ness and silence-ness.
An algorithm to find the most likely point of release.
Advantages: easy to implement, and robust to changes in the recording environment and to individual speaker differences.

6 Data
Buckeye corpus (Pitt, M. et al., 2005): 40 speakers, all residents of Columbus, Ohio, balanced in gender and age; one-hour interviews, transcribed at the word and phone levels. 19 speakers are used in the current study.
Target tokens: transcribed word-initial voiceless stops (i.e. [p], [t], [k]).

7 Methodology: spectral measures
Spectral vector: 20 ms Hamming window, mel scale, 1 × 60 array.
Spectral template: speaker-specific and phone-specific.
Ignore tokens shorter than the speaker's average duration for that phone.
For the remaining tokens, calculate a spectral vector for the middle 20 ms window, then average over the spectral vectors.
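The spectral-vector recipe above can be sketched in NumPy. The slides do not specify the mel filterbank design (FFT size, filter shape, energy vs. log-energy), so a standard 60-band triangular filterbank is assumed here for illustration:

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectral_vector(frame, sr, n_mel=60):
    """Spectral vector for one 20 ms frame: Hamming window,
    magnitude spectrum, then 60 triangular mel-band energies."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # 60 triangular filters spaced evenly on the mel scale
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mel + 2)
    hz_edges = mel_to_hz(mel_edges)
    out = np.zeros(n_mel)
    for i in range(n_mel):
        lo, ctr, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = np.clip((freqs - lo) / (ctr - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - ctr), 0.0, 1.0)
        out[i] = np.sum(spectrum * np.minimum(rising, falling))
    return out  # the 1 x 60 spectral vector

sr = 16000
frame = np.random.default_rng(0).standard_normal(int(0.020 * sr))  # 20 ms
vec = mel_spectral_vector(frame, sr)
print(vec.shape)  # (60,)
```

A speaker- and phone-specific template is then the mean of such vectors over the middle windows of that phone's tokens.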

8 Methodology: spectral template
[Figure: spectral templates for [a], [f], and silence of speaker F01]

9 Methodology: similarity scores
Similarity between spectral vectors x and u: given a distance D_x,u between x and u, S_x,u = e^(−0.005 · D_x,u).
The acoustic data at each time frame are compared against the spectral templates of that speaker, with a step size of 5 ms.
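A minimal sketch of the similarity score. The transcript does not preserve the distance formula (it was an image), so Euclidean distance is assumed here for illustration; only the exponential mapping S = e^(−0.005·D) is taken from the slide:

```python
import numpy as np

def similarity(x, u):
    """Similarity between two spectral vectors, following
    S_{x,u} = exp(-0.005 * D_{x,u}); Euclidean distance
    is an assumption, not stated in the slides."""
    d = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(u, dtype=float))
    return np.exp(-0.005 * d)

# Identical vectors score 1.0; more distant vectors score lower.
u = np.zeros(60)
print(similarity(u, u))               # 1.0
print(similarity(u, u + 10.0) < 1.0)  # True
```

Sliding this comparison along the signal every 5 ms yields one score track per template (e.g. a [sh] track and a silence track).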

10 Similarity scores
Formulae: S_x,t = e^(−0.005 · D_x,t), step size = 5 ms.
[Figure: similarity score tracks ([s] score and silence score) for an example token]

11 Methodology: finding the release point
Basic idea: near the release point, the fricative similarity score rises and the silence similarity score drops.

                 Closure   Burst
Fricative-ness   Low       High
Silence-ness     High      Low

[Figure: waveform schematic labeled closure, release, vowel onset]
Q1: Which fricative to use?
Q2: Which period of rise or drop to pick?

12 Methodology: finding the release point
Which fricative? The [sh] score is more consistent than the scores of other fricatives (e.g. [h], [s]).
Slope is a better predictor than the absolute score value: the end point of a period with maximal slope → the release point.

13 Methodology: finding the release point
[Figures: [h], [s], and [sh] similarity score tracks for the initial [t] in "doing" and the initial [k] in "countries"]

14 Methodology: finding the release point
Original algorithm:
Find the end point of the period of fastest increase in the [sh] similarity score.
Find the end point of the period of fastest decrease in the silence similarity score.
Return the middle point of the two end points as the point of release.
If either or both end points cannot be found within the duration of the stop, return NULL.
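The original algorithm can be sketched as follows, assuming the score tracks are sampled every 5 ms and that "fastest increase/decrease" means the largest frame-to-frame slope (the slides do not specify the slope window):

```python
import numpy as np

STEP = 0.005  # 5 ms between score frames

def find_release(fric_scores, sil_scores):
    """Original algorithm sketch: midpoint of (a) the end point of
    the fastest rise in the fricative similarity score and (b) the
    end point of the fastest drop in the silence similarity score,
    in seconds from the start of the stop. Returns None (NULL) when
    no rise or no drop exists within the span."""
    df = np.diff(fric_scores)  # slope of fricative score
    ds = np.diff(sil_scores)   # slope of silence score
    if df.max() <= 0 or ds.min() >= 0:
        return None
    p1 = (int(np.argmax(df)) + 1) * STEP  # end of fastest rise
    p2 = (int(np.argmin(ds)) + 1) * STEP  # end of fastest drop
    return (p1 + p2) / 2.0

fric = [0.2, 0.2, 0.25, 0.7, 0.75]  # sharp rise into frame 3
sil  = [0.9, 0.85, 0.8, 0.3, 0.25]  # sharp drop into frame 3
print(round(find_release(fric, sil), 3))  # 0.015
```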

15 Methodology: finding the release point
Select two speakers' data to tune the model.
Hand-tag the release point for all tokens in the test set.
If the stop does not appear to have a release point on the spectrogram, mark it as a problematic case, and take the end point of the stop as the release point for calculating error.

Speaker   Age     Gender   Speaking rate          # of tokens   # of test tokens
F07       Old     Female   Slow (4.022 syll/s)                  231
M08       Young   Male     Fast (6.434 syll/s)    618           261

16 Methodology: problematic cases
[Figures: [sh] similarity score tracks for three problematic cases: no burst, no closure, and a weak/double release]

17 Methodology: finding the release point
Calculate the difference between the hand-tagged release point and the estimated one (i.e. the error) for each case.
The RMS (root mean square) of the error is used to measure the performance of the algorithm.
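The RMS error measure is standard; a minimal sketch (the variable names are illustrative, not from the slides):

```python
import math

def rms_error(hand, est):
    """RMS of (hand-tagged release - estimated release)."""
    errs = [h - e for h, e in zip(hand, est)]
    return math.sqrt(sum(err ** 2 for err in errs) / len(errs))

hand = [50.0, 62.0, 71.0]  # hand-tagged release times (ms)
est  = [45.0, 60.0, 75.0]  # estimated release times (ms)
print(round(rms_error(hand, est), 2))  # 3.87
```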

18 Methodology: error analysis
Error = real release − estimate. Adding 5 ms to the estimate:
F07 (n = 231 tokens): RMS = 7.22 ms → 4.85 ms
M08 (n = 261 tokens): RMS = 13.11 ms → 14 ms

19 Methodology: tuning the algorithm
1st rejection rule: a target token is rejected if the changes in the scores are not drastic enough (e.g. an insignificant rise in the [sh] score → reject).
[Figures: [sh] similarity score tracks for rejected examples]

20 Methodology: tuning the algorithm
Applying the 1st rejection rule:
Rejects 4 cases in F07 → RMS(+5ms) = 4.19 ms.
Rejects 28 cases in M08, covering most of the problematic cases → RMS(+5ms) drops from 14 ms to 9.27 ms.

21 Methodology: tuning the algorithm
Still a problem: multiple releases. Each release might correspond to a rise/drop in the scores.
[Figure: [sh] similarity score track for the initial [k] in "cause" of M08]

22 Methodology: tuning the algorithm
2nd rejection rule: a target token is dropped if the points found in the [sh] and silence scores are too far apart (> 20 ms).
This partly solves the multiple-release problem. The ideal way would be to identify all candidate release points and return the first one.

23 Methodology: tuning the algorithm
Applying the 2nd rejection rule:
Rejects 3 cases in F07 → RMS(+5ms) = 3.22 ms.
Rejects 20 cases in M08; only 2 problematic cases remain → RMS(+5ms) drops from 9.26 ms to 3.44 ms.
Compare: the optimal error is 2.5 ms given the 5 ms step size.

24 Methodology: tuning the algorithm

F07 (rejection rate: 3.03%)
                      # of cases   RMS (ms)   RMS+5ms (ms)
Original              231          7.22       4.85
After 1st rejection   227          6.81       4.19
After 2nd rejection   224          6.02       3.22

M08 (rejection rate: 15.05%)
                      # of cases   RMS (ms)   RMS+5ms (ms)
Original              261          13.11      14
After 1st rejection   233          9.27       9.26
After 2nd rejection   213          5.64       3.44

25 Methodology: testing the algorithm
Select a random sample of 50 tokens from all speakers and hand-tag the release point.
Use the current algorithm, together with the two rejection rules, to find the estimated release, and compare the hand-tagged point with the estimated one.
4 tokens rejected by the 1st rule (3 were legitimate rejections); 3 rejected by the 2nd rule (2 were legitimate); 43 accepted cases, with RMS(error) < 5 ms.

26 Methodology: summary
Calculate the [sh] score and the silence score, and the slope of each.
In a labeled voiceless stop span:
(i) find the time point of the largest positive slope in the [sh] score → p1;
(ii) find the time point of the smallest (most negative) slope in the silence score → p2.
Reject the case if p1 = null or p2 = null, if |p1 − p2| ≥ 0.02 s, or if slope(p1) < 0.02 and slope(p2) > 0.04.
Otherwise, return (p1 + p2)/2 + 0.005.
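The summary procedure above can be sketched as one function. The thresholds (0.02 s separation, 0.02 and 0.04 slope limits) come from the flowchart, but the sign convention of the silence-slope threshold is not recoverable from the transcript, so slope magnitude is compared here as an assumption:

```python
import numpy as np

STEP = 0.005  # 5 ms frame step

def detect_release(sh_scores, sil_scores):
    """Slope-based detection plus the two rejection rules (sketch).
    Interpreting the flowchart: reject when no rise/drop exists,
    when p1 and p2 are >= 20 ms apart, or when both changes are
    too weak (rise < 0.02 and drop magnitude < 0.04)."""
    sh_slope = np.diff(sh_scores)
    sil_slope = np.diff(sil_scores)
    if sh_slope.max() <= 0 or sil_slope.min() >= 0:
        return None                        # p1 or p2 not found
    i1 = int(np.argmax(sh_slope))          # fastest rise in [sh] score
    i2 = int(np.argmin(sil_slope))         # fastest drop in silence score
    p1, p2 = (i1 + 1) * STEP, (i2 + 1) * STEP
    if abs(p1 - p2) >= 0.02:
        return None                        # 2nd rejection rule
    if sh_slope[i1] < 0.02 and abs(sil_slope[i2]) < 0.04:
        return None                        # 1st rejection rule: too weak
    return (p1 + p2) / 2.0 + 0.005         # midpoint, shifted by 5 ms

sh  = [0.1, 0.1, 0.15, 0.6, 0.65]
sil = [0.9, 0.85, 0.8, 0.35, 0.3]
print(round(detect_release(sh, sil), 3))  # 0.02
```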

27 Results: grand means
Rejection rates (the two rules combined) vary from 3.03% to 30.5% across speakers (mean = 13.3%, sd = 8.6%).

VOT and closure duration:
               [p]     [t]     [k]
Closure (ms)   69.5    48.9    54.9
VOT (ms)       48      51.2    57.9

28 Results: VOT by speaker
[Figure: VOT by speaker]

29 General Discussion
Echoing previous findings:
Byrd (1993): closure duration and VOT in read speech (the present study's values in parentheses):
               [p]         [t]         [k]
Closure (ms)   69 (69.5)   53 (48.9)   60 (54.9)
VOT (ms)       44 (48)     49 (51.2)   52 (57.9)
Shattuck-Hufnagel & Veilleux (2007): 13% of landmarks are missing in spontaneous speech.

30 General Discussion
Future work:
Fine-tune the 2nd rejection rule.
Generalize the exemplar-based method to other automatic phonetic processing problems?

31 Acknowledgements
Anonymous speakers; Buckeye corpus developers; Prof. Keith Johnson; members of the Phonology Lab at UC Berkeley.
Thank you! Any comments are welcome.

32 References
Byrd, D. (1993). 54,000 American stops. UCLA Working Papers in Phonetics, 83, 97-116.
Johnson, K. (2006). Acoustic attribute scoring: A preliminary report.
Liu, S. (1996). Landmark detection for distinctive feature-based speech recognition. Journal of the Acoustical Society of America, 100, 3417-3430.
Niyogi, P., & Ramesh, P. (1998). Incorporating voice onset time to improve letter recognition accuracies. Proceedings of ICASSP '98, Vol. 1, 13-16.
Pitt, M., et al. (2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45, 90-95.
Shattuck-Hufnagel, S., & Veilleux, N. M. (2007). Robustness of acoustic landmarks in spontaneously-spoken American English. Proceedings of the International Congress of Phonetic Sciences 2007, Saarbrücken.
Zue, V. W. (1976). Acoustic characteristics of stop consonants: A controlled study. Sc.D. thesis, MIT, Cambridge, MA.

