Published by Sydney Harper. Modified over 9 years ago.
1
Speech Recognition
Raymond Sastraputera
2
Introduction
Frame/Buffer
Algorithm
Silent Detector
Estimate Pitch
◦ Correlation and Candidate
◦ Optimal Candidate
◦ Buffer Delay
Added Bias
Test and Result
Conclusion
3
Estimates the pitch of a speech signal
Written in C++
4
Frame segments are shifted with no overlap
[Figure: frame segments within the buffer]
5
Initial detection of silence
|max(x)| + |max(y)| + |max(z)| + |min(x)| + |min(y)| + |min(z)| is compared with the threshold value (50 dB)
[Figure: consecutive segments X, Y, Z]
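The silence check can be sketched as follows. This is a minimal sketch, not the presenter's code: the helper names `extreme_sum` and `is_silent` are hypothetical, and the direction of the comparison (silent when the sum is below the threshold) is an assumption, since the slide text does not preserve the operator.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sum of |max| and |min| of one segment; an empty segment contributes 0.
float extreme_sum(const std::vector<float>& seg) {
    if (seg.empty()) return 0.0F;
    const auto mm = std::minmax_element(seg.begin(), seg.end());
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: the frame counts as silent when the combined sum over the
// three segments X, Y, Z is below the threshold (default taken from the
// silent_threshold parameter on the parameter slide).
bool is_silent(const std::vector<float>& x,
               const std::vector<float>& y,
               const std::vector<float>& z,
               float silent_threshold = 50.0F) {
    return extreme_sum(x) + extreme_sum(y) + extreme_sum(z) < silent_threshold;
}
```

Calling `is_silent(x, y, z)` on three low-amplitude segments returns `true` under this assumption, so the correlation stages can be skipped for that frame.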
6
Correlation of two vectors
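The formula on this slide did not survive extraction. As a hedged illustration, the sketch below assumes the usual normalized cross-correlation, sum(x·y) / sqrt(sum(x²)·sum(y²)); the function name `normalized_correlation` is hypothetical and the slide may have used a different normalization.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of two vectors (assumed form, see lead-in).
// Uses the shorter length if the sizes differ.
// Returns a value in [-1, 1], or 0 if either vector has zero energy.
double normalized_correlation(const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        xy += x[i] * y[i];  // cross term
        xx += x[i] * x[i];  // energy of x
        yy += y[i] * y[i];  // energy of y
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}
```

Two vectors that differ only by a positive scale factor give a value of 1, which is why the measure suits comparing adjacent pitch periods whose amplitude may drift.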
7
Correlation P(x,y)
Calculate for different window sizes (n_m)
◦ The window size is the pitch value (in samples)
◦ A correlation value above the threshold becomes a candidate with score 1
[Figure: segments X, Y, Z; vectors x and y, each of length n_m]
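The candidate search over window sizes can be sketched as below. Several details are assumptions, not taken from the slides: vector x is taken as the n samples ending at a hypothetical `center` index and vector y as the n samples starting there, the correlation is the normalized cross-correlation, and the names `Candidate`, `segment_correlation`, and `find_score1_candidates` are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t window;   // window size n in samples (the pitch period)
    double correlation;   // P(x,y) for that window
    int score;            // 1 once P(x,y) exceeds the threshold
};

// Normalized correlation between s[a, a+n) and s[b, b+n).
double segment_correlation(const std::vector<double>& s,
                           std::size_t a, std::size_t b, std::size_t n) {
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        xy += s[a + i] * s[b + i];
        xx += s[a + i] * s[a + i];
        yy += s[b + i] * s[b + i];
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}

// Tries every window size in [n_min, n_max]; x is the n samples before
// `center`, y the n samples from `center` on (assumed layout).
std::vector<Candidate> find_score1_candidates(const std::vector<double>& s,
                                              std::size_t center,
                                              std::size_t n_min,
                                              std::size_t n_max,
                                              double threshold) {
    std::vector<Candidate> out;
    for (std::size_t n = n_min; n <= n_max; ++n) {
        if (n > center || center + n > s.size()) break;  // window no longer fits
        const double p = segment_correlation(s, center - n, center, n);
        if (p > threshold) out.push_back({n, p, 1});
    }
    return out;
}
```

For a periodic signal, windows equal to the period and its multiples pass the threshold, which is why several candidates can appear and later stages are needed to choose among them.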
8
Correlation P(y,z)
Calculate for different n_m
◦ Only for the window sizes of candidates with score 1
◦ A correlation value above the threshold becomes a candidate with score 2
[Figure: segments X, Y, Z; vectors y and z, each of length n_m]
9
Correlation Q(n,m)
Calculate for different n_m
◦ n_MAX is the maximum n_m among the candidates
Optimal Candidate
◦ The current candidate becomes the optimal one if its Q(n,m) × 0.77 is higher than the preceding candidate's Q(n,m)
[Figure: segments X, Y, Z; vectors x and z; lengths n_MAX and n_m]
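The optimal-candidate rule can be sketched as a single pass over the candidates. This is a sketch under stated assumptions: candidates are visited in order of increasing window size, "preceding candidate" is read as the best candidate selected so far, and the names `ScoredCandidate` and `pick_optimal` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct ScoredCandidate {
    std::size_t window;  // window size n_m in samples
    double q;            // Q(n,m) value for this candidate
};

// Returns the index of the optimal candidate, or -1 if the list is empty.
// A later candidate replaces the current optimum only when its Q value,
// scaled by the multiplier, still exceeds the optimum's Q value.
int pick_optimal(const std::vector<ScoredCandidate>& candidates,
                 double multiplier = 0.77) {
    if (candidates.empty()) return -1;
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (candidates[i].q * multiplier > candidates[best].q) {
            best = i;
        }
    }
    return static_cast<int>(best);
}
```

Because the multiplier is below 1, a longer window has to correlate clearly better before it displaces a shorter one, which biases the choice away from multiples of the true pitch period.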
10
Candidate score 1: Correlation P(x,y)
◦ No candidate → silence
◦ Single candidate → compute P(y,z)
   Score stays at 1 → hold
   Score 2 → estimated pitch
◦ Multi candidate → compute P(y,z)
Candidate score 2: Correlation P(y,z)
◦ No candidate → compute Q(n,m) for the score 1 candidates
◦ Single candidate → estimated pitch
◦ Multi candidate → compute Q(n,m)
Optimal Pitch: Correlation Q(n,m)
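The decision flow on this slide can be condensed into one function of the two candidate counts. This is a hedged reading of the flattened flow chart; the `Action` enum and `decide` function are hypothetical names, and the sketch assumes the score 2 count never exceeds the score 1 count.

```cpp
#include <cstddef>

enum class Action {
    Silence,            // no candidate passed P(x,y)
    Hold,               // single candidate that stayed at score 1
    EstimatedPitch,     // exactly one candidate reached score 2
    OptimalFromScore1,  // run Q(n,m) over the score 1 candidates
    OptimalFromScore2   // run Q(n,m) over the score 2 candidates
};

// score1_count: candidates that passed the P(x,y) threshold.
// score2_count: those that also passed the P(y,z) threshold.
Action decide(std::size_t score1_count, std::size_t score2_count) {
    if (score1_count == 0) return Action::Silence;
    if (score1_count == 1) {
        return score2_count == 1 ? Action::EstimatedPitch : Action::Hold;
    }
    // Several score 1 candidates: the P(y,z) result decides the next step.
    if (score2_count == 0) return Action::OptimalFromScore1;
    if (score2_count == 1) return Action::EstimatedPitch;
    return Action::OptimalFromScore2;
}
```

For example, three score 1 candidates of which none reach score 2 yield `Action::OptimalFromScore1`, matching the "No candidate → compute Q(n,m)" branch.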
11
Single candidate with score 2
From Q(n,m) of
◦ Candidate score 2
◦ Candidate score 1
On hold, and the next frame's estimated pitch is neither silence nor on hold
12
Delay the returned value of the estimated pitch
◦ Needed to limit the duration of the on-hold state
13
Conditions:
◦ The two previous frames are not silent
◦ The previous frame is not on hold
◦ The previous frame's pitch is between 5/8 and 7/4 of the preceding frame's pitch
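The three conditions can be checked together as in the sketch below. The types `FrameState` and `FrameResult` and the function `bias_applies` are hypothetical, and treating the 5/8 and 7/4 bounds as inclusive is an assumption. The ratios are written as floating-point values because the integer expressions `5/8` and `7/4` would truncate to 0 and 1 in C++.

```cpp
enum class FrameState { Silent, OnHold, Voiced };

struct FrameResult {
    FrameState state;
    double pitch_hz;  // only meaningful when state is Voiced
};

// prev  : result for the previous frame
// prev2 : result for the frame before that (the "preceding" frame)
bool bias_applies(const FrameResult& prev, const FrameResult& prev2,
                  double min_ratio = 5.0 / 8.0,
                  double max_ratio = 7.0 / 4.0) {
    // Condition 1: neither of the two previous frames is silent.
    if (prev.state == FrameState::Silent || prev2.state == FrameState::Silent) {
        return false;
    }
    // Condition 2: the previous frame is not on hold.
    if (prev.state == FrameState::OnHold) return false;
    // Guard against a missing or zero pitch before dividing.
    if (prev2.pitch_hz <= 0.0) return false;
    // Condition 3: pitch ratio stays inside [min_ratio, max_ratio].
    const double ratio = prev.pitch_hz / prev2.pitch_hz;
    return ratio >= min_ratio && ratio <= max_ratio;
}
```

With pitches of 120 Hz and 100 Hz the ratio is 1.2, so the bias applies; a jump from 100 Hz to 200 Hz gives a ratio of 2.0, which falls outside the 7/4 bound.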
14
P(x,y) is doubled
15
correlation_threshold_silent(0.88)
Qnm_optimal_multiplier(0.77)
sample_rate(20000.0F)
max_pitch(400)
min_pitch(50)
pitch_buffer_size(20)
bias_max_frequency(7/4)
bias_min_frequency(5/8)
silent_threshold(50.0F)
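These parameters can be gathered into one structure, which also makes the search range in samples explicit. This is a sketch only: the struct `PitchConfig` and the helpers `min_window` and `max_window` are hypothetical, the member types are guesses, and the pitch limits are assumed to be in Hz. The bias ratios are written as floating-point divisions, since the integer expressions `7/4` and `5/8` would evaluate to 1 and 0 in C++.

```cpp
#include <cstddef>

struct PitchConfig {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;            // samples per second
    float max_pitch = 400.0F;                // assumed Hz
    float min_pitch = 50.0F;                 // assumed Hz
    std::size_t pitch_buffer_size = 20;
    float bias_max_frequency = 7.0F / 4.0F;  // 1.75, not integer 7/4
    float bias_min_frequency = 5.0F / 8.0F;  // 0.625, not integer 5/8
    float silent_threshold = 50.0F;
};

// Shortest window (in samples) corresponds to the highest pitch.
std::size_t min_window(const PitchConfig& c) {
    return static_cast<std::size_t>(c.sample_rate / c.max_pitch);
}

// Longest window (in samples) corresponds to the lowest pitch.
std::size_t max_window(const PitchConfig& c) {
    return static_cast<std::size_t>(c.sample_rate / c.min_pitch);
}
```

With the default values, the window size ranges from 20000 / 400 = 50 samples to 20000 / 50 = 400 samples.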
19
Some improvements could increase the accuracy of the estimated pitch:
◦ Reducing the search space
◦ Adding the first-order derivative of the pitch
◦ Filtering outliers / noise
The current algorithm might not be fast enough to run in real time
20
Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).