Speech Recognition. Raymond Sastraputera.

Presentation transcript:

Speech Recognition Raymond Sastraputera

• Introduction
• Frame/Buffer
• Algorithm
• Silent Detector
• Estimate Pitch
  ◦ Correlation and Candidate
  ◦ Optimal Candidate
  ◦ Buffer Delay
• Added Bias
• Test and Result
• Conclusion

• Estimates the pitch of a speech signal
• Written in C++

• Frame segments are shifted with no overlap
[Figure: frame segments within the buffer]

• Initial detection of silence
• |max(x)| + |max(y)| + |max(z)| + |min(x)| + |min(y)| + |min(z)|
• Threshold value (50 dB)
[Figure: buffer segments X, Y, Z]
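The silence check above can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names are hypothetical, and it assumes that a frame counts as silent when the summed peak extents fall below the threshold, which the slide does not state explicitly. The threshold is treated as a plain amplitude value of 50.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// |max(seg)| + |min(seg)| for one segment (hypothetical helper).
float peak_extent(const std::vector<float>& seg) {
    assert(!seg.empty());
    const auto mm = std::minmax_element(seg.begin(), seg.end());
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: the frame is silent when the sum over the three segments
// x, y and z is below the threshold.
bool is_silent(const std::vector<float>& x,
               const std::vector<float>& y,
               const std::vector<float>& z,
               float silent_threshold = 50.0f) {
    return peak_extent(x) + peak_extent(y) + peak_extent(z) < silent_threshold;
}
```

Running this test first lets the more expensive correlation steps be skipped for frames that carry no signal.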

• Correlation of two vectors
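The formula on this slide did not survive in the transcript. A common choice for comparing two signal vectors is the normalized cross-correlation, and the sketch below assumes that form; the function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of two length-n windows of signal s,
// starting at offsets a and b. The result lies in [-1, 1].
// Assumption: this standard form stands in for the slide's lost formula.
double normalized_correlation(const std::vector<double>& s,
                              std::size_t a, std::size_t b, std::size_t n) {
    assert(a + n <= s.size() && b + n <= s.size());
    double dot = 0.0, energy_a = 0.0, energy_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dot += s[a + i] * s[b + i];
        energy_a += s[a + i] * s[a + i];
        energy_b += s[b + i] * s[b + i];
    }
    const double denom = std::sqrt(energy_a * energy_b);
    return denom > 0.0 ? dot / denom : 0.0;  // all-zero window: report 0
}
```

Two windows that are exactly one period apart in a periodic signal give a value of 1, which is why a high correlation at window size n suggests a pitch period of n samples.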

• Correlation P(x,y)
• Calculate for different window sizes (n_m)
  ◦ The window size is the pitch value (in samples)
  ◦ Correlation values above the threshold become candidates with score 1
[Figure: segments X, Y, Z; vectors x and y, each of length n_m]
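A rough sketch of this first pass is shown below. It assumes that the window sizes searched run from sample_rate / max_pitch to sample_rate / min_pitch, and that the 0.88 correlation threshold listed among the parameters is the one applied here; neither is confirmed by the slides. `Candidate`, `find_candidates` and the `pxy` callback are hypothetical names, and the sample rate must be supplied by the caller.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Candidate {
    int window;  // window size n_m in samples (the pitch period)
    int score;   // 1 after the P(x,y) pass
};

// pxy(n) is assumed to return the correlation P(x,y) for window size n.
std::vector<Candidate> find_candidates(const std::function<double(int)>& pxy,
                                       int sample_rate,
                                       int min_pitch = 50,
                                       int max_pitch = 400,
                                       double threshold = 0.88) {
    assert(sample_rate > 0 && min_pitch > 0 && min_pitch <= max_pitch);
    const int n_min = sample_rate / max_pitch;  // shortest period searched
    const int n_max = sample_rate / min_pitch;  // longest period searched
    std::vector<Candidate> candidates;
    for (int n = n_min; n <= n_max; ++n) {
        if (pxy(n) > threshold) {
            candidates.push_back({n, 1});
        }
    }
    return candidates;
}
```

Because the loop runs in increasing n, the returned candidates are ordered from the shortest to the longest period.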

• Correlation P(y,z)
• Calculate for different n_m
  ◦ Only for the window sizes of candidates with score 1
  ◦ Correlation values above the threshold become candidates with score 2
[Figure: segments X, Y, Z; vectors y and z, each of length n_m]
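The second pass can be sketched in the same style. The code below only re-tests window sizes that already hold score 1 and promotes those whose P(y,z) exceeds the threshold. The names are hypothetical, and reusing the 0.88 threshold for this pass is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Candidate {
    int window;  // window size n_m in samples
    int score;   // 1 after the P(x,y) pass, 2 after the P(y,z) pass
};

// pyz(n) is assumed to return the correlation P(y,z) for window size n.
// Returns the number of candidates promoted to score 2.
int promote_candidates(std::vector<Candidate>& candidates,
                       const std::function<double(int)>& pyz,
                       double threshold = 0.88) {
    assert(threshold > 0.0);
    int promoted = 0;
    for (Candidate& c : candidates) {
        if (c.score == 1 && pyz(c.window) > threshold) {
            c.score = 2;
            ++promoted;
        }
    }
    return promoted;
}
```

Restricting the second correlation to existing candidates keeps its cost proportional to the number of score-1 candidates rather than to the full search range.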

• Correlation Q(n,m)
• Calculate for different n_m
  ◦ n_MAX is the maximum n_m among the candidates
• Optimal candidate
  ◦ The current candidate becomes the optimal one if its Qnm × 0.77 is higher than the preceding candidate's Qnm
[Figure: segments X, Y, Z; vectors x and z, with window lengths n_MAX and n_m]
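One way to read the optimal-candidate rule is sketched below. It assumes the candidates are visited in order of increasing window size and that "preceding candidate" means the best candidate found so far; the slide leaves both points open. `ScoredCandidate` and `select_optimal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScoredCandidate {
    int window;  // window size n_m in samples
    double q;    // correlation Q(n,m) for this candidate
};

// Returns the window size of the optimal candidate.
// A later candidate replaces the current optimum only if its Q value,
// scaled by the multiplier, still exceeds the optimum's Q value.
int select_optimal(const std::vector<ScoredCandidate>& candidates,
                   double multiplier = 0.77) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (candidates[i].q * multiplier > candidates[best].q) {
            best = i;
        }
    }
    return candidates[best].window;
}
```

With a multiplier below 1, the rule favors shorter periods: a longer window has to correlate clearly better before it displaces an earlier candidate, which helps against picking a multiple of the true pitch period.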

• Candidate score 1: correlation P(x,y)
  ◦ No candidate → silence
  ◦ Single candidate → compute P(y,z)
    ▪ Score stays at 1 → hold
    ▪ Score 2 → estimated pitch
  ◦ Multiple candidates → compute P(y,z)
• Candidate score 2: correlation P(y,z)
  ◦ No candidate → compute Q(n,m) for the candidates with score 1
  ◦ Single candidate → estimated pitch
  ◦ Multiple candidates → compute Q(n,m)
• Optimal pitch: correlation Q(n,m)
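The decision flow above can be condensed into a small function of two counts: how many candidates passed P(x,y) and how many of those were promoted by P(y,z). This is a sketch of one reading of the slide; the `Action` enum and `decide` function are hypothetical names.

```cpp
#include <cassert>

enum class Action {
    Silence,           // no candidate passed P(x,y)
    Hold,              // single candidate that stayed at score 1
    EstimatedPitch,    // exactly one candidate reached score 2
    ComputeQOnScore1,  // several candidates, none reached score 2
    ComputeQOnScore2   // several candidates reached score 2
};

// n_candidates: candidates that passed P(x,y) (score 1 or higher).
// n_score2:     candidates that also passed P(y,z).
Action decide(int n_candidates, int n_score2) {
    assert(n_candidates >= 0 && n_score2 >= 0 && n_score2 <= n_candidates);
    if (n_candidates == 0) return Action::Silence;
    if (n_score2 == 1) return Action::EstimatedPitch;
    if (n_candidates == 1) return Action::Hold;
    return n_score2 == 0 ? Action::ComputeQOnScore1 : Action::ComputeQOnScore2;
}
```

For example, `decide(3, 0)` yields `Action::ComputeQOnScore1`, matching the "no candidate with score 2" branch, where Q(n,m) is evaluated over the score-1 candidates.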

• Single candidate with score 2
• From Q(n,m) of
  ◦ Candidates with score 2
  ◦ Candidates with score 1
• On hold, and the next frame's estimated pitch is neither silence nor on hold

• Delay the return of the estimated pitch
  ◦ Needed to limit how long a frame can stay on hold
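The slide does not describe how the delay is implemented. A minimal sketch, assuming a fixed-length first-in, first-out buffer whose length matches the pitch_buffer_size parameter (20), might look like this; the class name and interface are hypothetical, and the resolution of on-hold frames inside the buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Returns each pitch estimate `size` frames after it was pushed.
class PitchDelay {
public:
    explicit PitchDelay(std::size_t size = 20) : size_(size) {
        assert(size > 0);
    }

    // Stores the newest estimate. Once the buffer is full, writes the
    // oldest estimate to `out` and returns true; otherwise returns false.
    bool push(float pitch, float& out) {
        buffer_.push_back(pitch);
        if (buffer_.size() <= size_) {
            return false;
        }
        out = buffer_.front();
        buffer_.pop_front();
        return true;
    }

private:
    std::size_t size_;
    std::deque<float> buffer_;
};
```

While an estimate sits in the buffer, later frames are already known, so a frame that is on hold can be settled from its successor before its value is returned.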

• Conditions:
  ◦ The two previous frames are not silent
  ◦ The previous frame is not on hold
  ◦ The previous frame's pitch is between 5/8 and 7/4 of the preceding frame's pitch

• P(x,y) is doubled
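The added bias can be sketched as a check on the two previous frames followed by a doubling of P(x,y). The slides do not say which candidates receive the doubled value, so the function below simply takes one P(x,y) value as input. `FrameResult`, `bias_applies` and `biased_pxy` are hypothetical names.

```cpp
#include <cassert>

struct FrameResult {
    bool silent;
    bool on_hold;
    double pitch;  // estimated pitch; meaningful only if not silent or on hold
};

// True when all three bias conditions from the slides hold.
// The ratios are written as floating-point divisions; in C++ the
// integer expressions 7/4 and 5/8 would evaluate to 1 and 0.
bool bias_applies(const FrameResult& prev, const FrameResult& before_prev,
                  double min_ratio = 5.0 / 8.0, double max_ratio = 7.0 / 4.0) {
    assert(min_ratio < max_ratio);
    if (prev.silent || before_prev.silent) return false;
    if (prev.on_hold) return false;
    return prev.pitch >= min_ratio * before_prev.pitch &&
           prev.pitch <= max_ratio * before_prev.pitch;
}

// Doubles P(x,y) when the bias applies, otherwise returns it unchanged.
double biased_pxy(double pxy, const FrameResult& prev,
                  const FrameResult& before_prev) {
    return bias_applies(prev, before_prev) ? 2.0 * pxy : pxy;
}
```

Doubling the correlation makes it easier for a frame to clear the candidate threshold when the two preceding frames were voiced with a stable pitch.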

• correlation_threshold_silent(0.88)
• Qnm_optimal_multiplier(0.77)
• sample_rate( F)
• max_pitch(400)
• min_pitch(50)
• pitch_buffer_size(20)
• bias_max_frequency(7/4)
• bias_min_frequency(5/8)
• silent_threshold(50.0F)
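The parameters above could be grouped in a single configuration struct, as in the sketch below. The struct layout and the `validate` helper are hypothetical. The sample rate is left without a default because its value did not survive in the transcript, and reading max_pitch and min_pitch as frequencies in Hz is an assumption.

```cpp
#include <cassert>

struct PitchParams {
    float correlation_threshold_silent = 0.88f;
    float qnm_optimal_multiplier = 0.77f;
    int sample_rate = 0;  // value not recoverable from the slide; caller must set it
    int max_pitch = 400;  // assumed to be in Hz
    int min_pitch = 50;   // assumed to be in Hz
    int pitch_buffer_size = 20;
    float bias_max_frequency = 7.0f / 4.0f;  // floating-point division, 1.75
    float bias_min_frequency = 5.0f / 8.0f;  // floating-point division, 0.625
    float silent_threshold = 50.0f;
};

// Basic sanity checks on the parameter ranges (hypothetical helper).
void validate(const PitchParams& p) {
    assert(p.min_pitch > 0 && p.min_pitch < p.max_pitch);
    assert(p.bias_min_frequency < p.bias_max_frequency);
    assert(p.pitch_buffer_size > 0);
    assert(p.qnm_optimal_multiplier > 0.0f && p.qnm_optimal_multiplier <= 1.0f);
}
```

Keeping the values in one place makes it easier to experiment with thresholds without touching the detection code.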

• Several improvements could increase the performance of the pitch estimation:
  ◦ Reduce the search space
  ◦ Add the first-order derivative of the pitch
  ◦ Filter outliers / noise
• The current algorithm might not be fast enough to run in real time

• Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).