Speech Recognition
Raymond Sastraputera

• Introduction
• Frame/Buffer
• Algorithm
• Silent Detector
• Estimate Pitch
  ◦ Correlation and Candidate
  ◦ Optimal Candidate
  ◦ Buffer Delay
• Added Bias
• Test and Result
• Conclusion

• Estimates the pitch of a speech signal
• Written in C++

• Frame segments are shifted with no overlap
[Figure: frame segments within the buffer]
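A minimal sketch of non-overlapping framing follows. The function name split_frames, the float sample type, and the choice to drop a trailing partial frame are assumptions for illustration; the slides do not show the actual buffer code.

```cpp
#include <cstddef>
#include <vector>

// Split a sample buffer into consecutive frames of frame_size samples.
// Frames do not overlap: each frame starts where the previous one ended.
// Assumption: trailing samples that do not fill a whole frame are dropped.
std::vector<std::vector<float>> split_frames(const std::vector<float>& buffer,
                                             std::size_t frame_size) {
    std::vector<std::vector<float>> frames;
    if (frame_size == 0) {
        return frames;  // nothing sensible to do with empty frames
    }
    for (std::size_t start = 0; start + frame_size <= buffer.size();
         start += frame_size) {
        const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(frame_size);
        frames.emplace_back(first, last);
    }
    return frames;
}
```

With a 7-sample buffer and a frame size of 3, this yields two frames and leaves the last sample unused.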

• Correlation of two vectors
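The formula itself did not survive in the transcript. As a hedged illustration, the sketch below uses the normalized cross-correlation, a common choice for pitch estimation; whether the project uses exactly this definition is an assumption, and the name normalized_correlation is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of two vectors (assumed definition):
//   sum(x[i] * y[i]) / sqrt(sum(x[i]^2) * sum(y[i]^2))
// The result lies in [-1, 1]. If either vector has zero energy, 0 is returned.
// If the lengths differ, only the common prefix is used.
double normalized_correlation(const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        xy += x[i] * y[i];
        xx += x[i] * x[i];
        yy += y[i] * y[i];
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}
```

Two vectors that differ only by a positive scale factor give a value of 1, and a vector against its negation gives -1.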

• Correlation P(x,y)
• Calculated for different window sizes (n_m)
  ◦ The window size is the pitch value (in samples)
  ◦ A correlation value above the threshold becomes a candidate with score 1
[Figure: signal split into segments X, Y, Z; vectors x and y each span n_m samples]

• Correlation P(y,z)
• Calculated for different n_m
  ◦ Only for window sizes that are candidates with score 1
  ◦ A correlation value above the threshold becomes a candidate with score 2
[Figure: segments X, Y, Z; vectors y and z each span n_m samples]
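The two scoring passes can be sketched as follows. This is an illustration under stated assumptions, not the project's code: x, y and z are taken to be three adjacent windows of n_m samples starting at the beginning of the frame, P is taken to be the normalized cross-correlation, and the names Candidate, window_correlation and find_candidates are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t window;  // window size n_m in samples (candidate pitch period)
    int score;           // 1 after passing P(x,y), 2 after also passing P(y,z)
};

// Normalized correlation between two windows of n samples that start at
// offsets a and b in s (assumed definition of P). Returns 0 for zero energy.
double window_correlation(const std::vector<double>& s, std::size_t a,
                          std::size_t b, std::size_t n) {
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        ab += s[a + i] * s[b + i];
        aa += s[a + i] * s[a + i];
        bb += s[b + i] * s[b + i];
    }
    const double denom = std::sqrt(aa * bb);
    return denom > 0.0 ? ab / denom : 0.0;
}

// Try every window size in [min_window, max_window]. A size whose P(x,y)
// exceeds the threshold becomes a candidate with score 1; if its P(y,z)
// also exceeds the threshold, the score is raised to 2.
std::vector<Candidate> find_candidates(const std::vector<double>& s,
                                       std::size_t min_window,
                                       std::size_t max_window,
                                       double threshold) {
    std::vector<Candidate> out;
    for (std::size_t n = min_window; n <= max_window; ++n) {
        if (3 * n > s.size()) break;  // x, y and z must all fit in the frame
        if (window_correlation(s, 0, n, n) <= threshold) continue;  // P(x,y)
        Candidate c{n, 1};
        if (window_correlation(s, n, 2 * n, n) > threshold) {       // P(y,z)
            c.score = 2;
        }
        out.push_back(c);
    }
    return out;
}
```

For a pure sine wave with a period of 8 samples and window sizes 4 to 12, only the window size 8 passes both checks, so it ends up as the single candidate with score 2.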

• Correlation Q(n,m)
• Calculated for different n_m
  ◦ n_MAX is the maximum n_m among the candidates
• Optimal Candidate
  ◦ The current candidate becomes the optimal one if its Qnm × 0.77 is higher than the preceding candidate's Qnm
[Figure: segments X, Y, Z; vectors x and z, with n_MAX and n_m marked]
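A sketch of the optimal-candidate rule, assuming the Q value of each candidate has already been computed. The traversal order (the order in which candidates are stored) and the reading of "preceding candidate" as the best candidate so far are assumptions; ScoredCandidate and pick_optimal are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct ScoredCandidate {
    std::size_t window;  // window size n_m in samples
    double q;            // precomputed Q value for this candidate
};

// Walk the candidates in stored order. The first candidate starts as the
// optimal one; a later candidate replaces it only if its Q value scaled by
// the multiplier (0.77 on the slides) is still higher than the current
// optimal Q. Returns the index of the optimal candidate, or -1 if the list
// is empty.
int pick_optimal(const std::vector<ScoredCandidate>& candidates,
                 double multiplier = 0.77) {
    if (candidates.empty()) {
        return -1;
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (candidates[i].q * multiplier > candidates[best].q) {
            best = i;
        }
    }
    return static_cast<int>(best);
}
```

The multiplier below 1 favors the earlier candidate: a later candidate has to beat it by a clear margin rather than by a small difference in Q.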

• Candidate score 1: Correlation P(x,y)
  ◦ No candidate → silence
  ◦ Single candidate → compute P(y,z)
      ◦ Score stays at 1 → hold
      ◦ Score 2 → estimated pitch
  ◦ Multi candidate → compute P(y,z)
• Candidate score 2: Correlation P(y,z)
  ◦ No candidate → compute Q(n,m) for candidates with score 1
  ◦ Single candidate → estimated pitch
  ◦ Multi candidate → compute Q(n,m)
• Optimal Pitch: Correlation Q(n,m)
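The decision flow above can be sketched as a single function over the scored candidates. The types Candidate, Decision and Outcome and the function decide are hypothetical names; the branching follows the slide, with the Q(n,m) step left to the caller.

```cpp
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t window;  // window size n_m in samples
    int score;           // 1 = passed P(x,y), 2 = also passed P(y,z)
};

enum class Decision { Silence, Hold, Pitch, ComputeQ };

struct Outcome {
    Decision decision;
    std::vector<Candidate> selected;  // candidates the decision refers to
};

// Map the candidate list (after both P passes) to the next step.
Outcome decide(const std::vector<Candidate>& candidates) {
    if (candidates.empty()) {
        return {Decision::Silence, {}};  // no candidate: silence
    }
    std::vector<Candidate> score2;
    for (const Candidate& c : candidates) {
        if (c.score == 2) score2.push_back(c);
    }
    if (candidates.size() == 1) {
        // Single candidate: hold if it stayed at score 1, else it is the pitch.
        if (score2.empty()) return {Decision::Hold, candidates};
        return {Decision::Pitch, score2};
    }
    // Multiple candidates with score 1 or higher.
    if (score2.size() == 1) return {Decision::Pitch, score2};
    if (score2.empty()) return {Decision::ComputeQ, candidates};  // Q on score 1
    return {Decision::ComputeQ, score2};                          // Q on score 2
}
```

When the result is ComputeQ, the selected list holds the candidates whose Q(n,m) still has to be compared to find the optimal pitch.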

• Single candidate with score 2
• From Q(n,m) of
  ◦ Candidate score 2
  ◦ Candidate score 1
• On hold, and the next frame's estimated pitch is neither silence nor on hold

• Delays returning the estimated pitch value
  ◦ Needed to limit how long a frame can stay on hold

• Conditions:
  ◦ The two previous frames are not silent
  ◦ The previous frame is not on hold
  ◦ The previous frame's pitch is between 5/8 and 7/4 of the pitch of the frame before it

• P(x,y) is doubled
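A sketch of the bias check, assuming each past frame is summarized by a silent flag, an on-hold flag and a pitch value. FrameResult, bias_applies and biased_pxy are hypothetical names, the inclusive bounds are an assumption, and the slides do not say which window sizes the doubling is applied to.

```cpp
struct FrameResult {
    bool silent;    // frame was classified as silence
    bool on_hold;   // frame's pitch decision was put on hold
    double pitch;   // estimated pitch of the frame (unit as used by the caller)
};

// True if all three conditions from the slide hold:
//  - neither of the two previous frames is silent,
//  - the previous frame is not on hold,
//  - the previous frame's pitch is within [5/8, 7/4] of the pitch of the
//    frame before it.
bool bias_applies(const FrameResult& prev, const FrameResult& before_prev,
                  double min_ratio = 5.0 / 8.0, double max_ratio = 7.0 / 4.0) {
    if (prev.silent || before_prev.silent) return false;
    if (prev.on_hold) return false;
    if (before_prev.pitch <= 0.0) return false;  // avoid division by zero
    const double ratio = prev.pitch / before_prev.pitch;
    return ratio >= min_ratio && ratio <= max_ratio;
}

// Doubles P(x,y) when the bias applies, otherwise returns it unchanged.
double biased_pxy(double pxy, bool apply_bias) {
    return apply_bias ? 2.0 * pxy : pxy;
}
```

A jump from 200 to 400 between the two previous frames gives a ratio of 2, which is outside the 7/4 limit, so no bias is added in that case.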

• correlation_threshold_silent(0.88)
• Qnm_optimal_multiplier(0.77)
• sample_rate(20000.0F)
• max_pitch(400)
• min_pitch(50)
• pitch_buffer_size(20)
• bias_max_frequency(7/4)
• bias_min_frequency(5/8)
• silent_threshold(50.0F)
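The values above read like C++ member initializers. A hedged sketch of how they could be grouped is shown below; the struct name PitchConfig, the member types, and the interpretation of sample_rate, max_pitch and min_pitch as Hz are assumptions. One point is certain in C++: written with integer literals, 7/4 evaluates to 1 and 5/8 to 0, so the ratios need floating-point literals.

```cpp
struct PitchConfig {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;  // assumed to be in Hz
    float max_pitch = 400.0F;      // assumed to be in Hz
    float min_pitch = 50.0F;       // assumed to be in Hz
    int pitch_buffer_size = 20;
    // Floating-point literals on purpose: the integer expressions 7/4 and
    // 5/8 would be evaluated as 1 and 0.
    float bias_max_frequency = 7.0F / 4.0F;
    float bias_min_frequency = 5.0F / 8.0F;
    float silent_threshold = 50.0F;
};

// Shortest pitch period in samples, reached at the highest pitch.
int min_period_samples(const PitchConfig& cfg) {
    return static_cast<int>(cfg.sample_rate / cfg.max_pitch);
}

// Longest pitch period in samples, reached at the lowest pitch.
int max_period_samples(const PitchConfig& cfg) {
    return static_cast<int>(cfg.sample_rate / cfg.min_pitch);
}
```

Under these assumptions the window size n_m ranges from 50 to 400 samples at a 20000 Hz sample rate.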

• Some improvements could increase the performance of the pitch estimation.
  ◦ Reducing the search space
  ◦ Adding the 1st-order derivative of the pitch
  ◦ Filtering outliers / noise
• The current algorithm might not be fast enough to run in real time

• Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).
