An Algorithm for Determining the Endpoints of Isolated Utterances

An Algorithm for Determining the Endpoints of Isolated Utterances. L.R. Rabiner and M.R. Sambur. The Bell System Technical Journal, Vol. 54, No. 2, Feb. 1975, pp. 297-315.

Outline: Introduction to the problem, solution, algorithm, summary.

Motivation: Word recognition needs to detect word boundaries in speech. Recognizing silence can reduce processing load (network load is not identified as a source of savings). Detection is easy in a soundproof room with digitized tape.

Visual Recognition, “Eight”: Easy. Note how quiet the beginning is (tape).

Slightly Tougher Visual Recognition, “Six”: The “sss” starts crossing the zero line, so the start can still be detected.

Tough Visual Recognition, “Four”: The eye picks ‘B’, but ‘A’ is the real start, because /f/ is a weak fricative.

Tough Visual Recognition, “Five”: The eye picks ‘A’, but ‘B’ is the real endpoint, because the final /v/ becomes devoiced.

Tough Visual Recognition, “Nine”: It is difficult to say where the final trailing off ends.

The Problem: Noisy computer room with background noise. Hard cases: weak fricatives (/f, th, h/), weak plosive bursts (/p, t, k/), final nasals, voiced fricatives becoming devoiced, and trailing off of sounds (e.g., binary, three). Requirements: simple, efficient processing that avoids hardware costs.

The Solution: Two measurements, energy and zero-crossing rate. Both are simple, fast, and accurate.

Energy: Sum of magnitudes over 10 ms of sound, centered on sample n: $E(n) = \sum_{i=-50}^{50} |s(n+i)|$
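
The energy measure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `short_time_magnitude` is hypothetical, and treating samples beyond the edges of the signal as zero is an assumption, since the slide does not say how edges are handled. A window of 101 samples covering 10 ms implies a sampling rate of roughly 10 kHz.

```python
def short_time_magnitude(samples, half_window=50):
    """Return E(n) = sum of |s(n+i)| for i in [-half_window, half_window].

    Assumption: samples outside the signal count as zero, so the
    window is simply truncated at both edges.
    """
    n_samples = len(samples)
    energy = []
    for n in range(n_samples):
        lo = max(0, n - half_window)              # clip window at the left edge
        hi = min(n_samples, n + half_window + 1)  # clip window at the right edge
        energy.append(sum(abs(x) for x in samples[lo:hi]))
    return energy


if __name__ == "__main__":
    # Tiny window so the sums are easy to check by hand.
    print(short_time_magnitude([1, -2, 3, -4], half_window=1))
```

The direct double loop keeps the correspondence with the formula obvious; a running sum would be faster for long recordings.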

Zero (Level) Crossing Rate: Number of zero crossings per 10 ms. Silence gives a normal (baseline) number of cross-overs; speech increases the number of cross-overs.
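
A rough sketch of counting zero crossings per frame follows. The names `zero_crossings` and `zero_crossing_rate` are hypothetical, the frame length of 100 samples assumes about 10 kHz sampling (so one frame is about 10 ms), and counting a sample equal to zero as non-negative is a choice made here, not something the slides specify.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples in one frame.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def zero_crossing_rate(samples, frame_len=100):
    """Return the zero-crossing count of each consecutive, non-overlapping frame.

    A trailing partial frame is dropped.
    """
    return [
        zero_crossings(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


if __name__ == "__main__":
    # First frame alternates in sign, second frame never crosses zero.
    print(zero_crossing_rate([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4))
```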

The Algorithm: Startup. At initialization, record sound for 100 ms and assume it is silence. Measure the background noise: compute the average (IZC') and standard deviation (σ) of the zero-crossing rate. Choose the zero-crossing threshold (IZCT), which is the threshold for unvoiced speech: IZCT = min(25 crossings per 10 ms, IZC' + 2σ).
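
The startup step might look like the sketch below. It assumes IZCT is the smaller of a fixed 25 crossings per 10 ms and the silence mean plus two standard deviations, which is the form given in the Rabiner and Sambur paper. The function name `zero_crossing_threshold` is hypothetical, and using the population standard deviation (`pstdev`) rather than the sample standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def zero_crossing_threshold(silence_rates, fixed_limit=25):
    """Return IZCT from per-10-ms zero-crossing counts measured during silence.

    silence_rates: counts from the first 100 ms (about ten 10 ms frames).
    fixed_limit:   the fixed ceiling of 25 crossings per 10 ms.
    """
    izc_mean = mean(silence_rates)    # IZC'
    izc_std = pstdev(silence_rates)   # sigma (population form, an assumption)
    return min(fixed_limit, izc_mean + 2 * izc_std)


if __name__ == "__main__":
    quiet = [10, 12, 9, 11, 10, 10, 12, 9, 11, 10]  # made-up silence counts
    print(zero_crossing_threshold(quiet))
```

The fixed ceiling matters in noisy rooms: if the background already crosses zero often, the statistical term alone would push the threshold too high to catch weak fricatives.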

The Algorithm: Thresholds. Compute the energy, E(n), for the interval. Get the maximum, IMX, and the silence energy, IMN. I1 = 0.03 * (IMX - IMN) + IMN (3% of peak energy). I2 = 4 * IMN (4x silent energy). Get the energy thresholds (ITU and ITL): ITL = MIN(I1, I2) and ITU = 5 * ITL.
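
The threshold arithmetic above translates directly into code. The function name `energy_thresholds` is hypothetical; the formulas are the ones on the slide.

```python
def energy_thresholds(imx, imn):
    """Return (ITL, ITU) from peak energy IMX and silence energy IMN."""
    i1 = 0.03 * (imx - imn) + imn   # 3% of the way from silence to peak
    i2 = 4 * imn                    # four times the silence energy
    itl = min(i1, i2)               # lower threshold
    itu = 5 * itl                   # upper threshold
    return itl, itu


if __name__ == "__main__":
    # Made-up values: a loud utterance (peak 10000) over quiet background (10).
    print(energy_thresholds(10000, 10))
```

Which of I1 and I2 wins depends on the signal-to-noise ratio: with a large peak relative to the background, I2 is smaller and ITL is tied to the silence level; with a small peak, I1 is smaller and ITL is tied to the peak.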

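The Algorithm: Energy Computation. Search the sample for energy greater than ITL and save that point as the tentative start of speech, s. Continue searching for energy greater than ITU: if it is reached, s becomes the start of speech, but if the energy first falls below ITL, discard s and restart. Then search for energy less than ITL and save that point as the end of speech. This results in conservative estimates: the true endpoints may lie outside the marked interval.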
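
One way to sketch the energy search is shown below. The names `find_start` and `find_endpoints` are hypothetical. The start search follows the slide. For the end point, this sketch runs the same search backward from the end of the recording, which is one reading of "search for energy less than ITL" and should be taken as an assumption rather than the presenter's exact procedure.

```python
def find_start(energy, itl, itu):
    """Return the first index that rises above ITL and then exceeds ITU
    without dropping below ITL in between, or None if there is none."""
    candidate = None
    for n, e in enumerate(energy):
        if e < itl:
            candidate = None        # fell below ITL: discard and restart
            continue
        if candidate is None and e > itl:
            candidate = n           # tentative start of speech
        if candidate is not None and e > itu:
            return candidate        # confirmed: energy reached ITU
    return None


def find_endpoints(energy, itl, itu):
    """Return (start, end) indices, or None if no speech is found.

    Assumption: the end is located by mirroring the start search on the
    reversed energy contour.
    """
    start = find_start(energy, itl, itu)
    if start is None:
        return None
    end = len(energy) - 1 - find_start(energy[::-1], itl, itu)
    return start, end


if __name__ == "__main__":
    contour = [1, 3, 1, 3, 6, 12, 8, 3, 1, 1]  # made-up energy values
    print(find_endpoints(contour, itl=2, itu=10))
```

In the made-up contour, the brief rise at index 1 is discarded because the energy drops below ITL again before reaching ITU.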

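The Algorithm: Zero Crossing Computation. Search back 250 ms from s. Count the number of 10 ms intervals where the zero-crossing rate exceeds IZCT. If there are 3 or more, move the starting point, s, to the first time the threshold was exceeded; otherwise s remains the same. Do a similar search after the end point.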
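
The zero-crossing refinement can be sketched as follows. The names `refine_start` and `refine_end` are hypothetical. The sketch assumes the zero-crossing rates are given per 10 ms frame, so 250 ms corresponds to 25 frames, and that the "similar search" after the end moves the end point to the last frame in the window that exceeds IZCT.

```python
def refine_start(zc_rates, start, izct, lookback=25, min_hits=3):
    """Move the start earlier if the 250 ms before it look like unvoiced speech.

    zc_rates: zero-crossing count per 10 ms frame (assumed frame size).
    """
    first = max(0, start - lookback)
    hits = [n for n in range(first, start) if zc_rates[n] > izct]
    if len(hits) >= min_hits:
        return hits[0]      # earliest frame in the window above IZCT
    return start


def refine_end(zc_rates, end, izct, lookahead=25, min_hits=3):
    """Mirror of refine_start for the 250 ms after the end point."""
    last = min(len(zc_rates), end + 1 + lookahead)
    hits = [n for n in range(end + 1, last) if zc_rates[n] > izct]
    if len(hits) >= min_hits:
        return hits[-1]     # latest frame in the window above IZCT
    return end


if __name__ == "__main__":
    rates = [5, 30, 30, 5, 30, 5, 5, 5, 5, 5]  # made-up per-frame counts
    print(refine_start(rates, start=5, izct=25))
```

Requiring three frames above the threshold keeps a single noisy frame from dragging the endpoint outward.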

The Algorithm: Example (Word begins with strong fricative)

Algorithm: Examples, “Half”. The algorithm caught the trailing /f/.

Algorithm: Examples, “Four”. Notice how different each “four” is.

Evaluation: Part 1. A 54-word vocabulary, read by 2 male and 2 female speakers. No gross errors (off by more than 50 ms). Some small errors, such as losing weak fricatives; none affected recognition.

Evaluation: Part 2. 10 speakers counted from 0 to 9. No errors at all.

Evaluation 3: Your Project 1

Future Work: Distinguish three classes of speech: silence, unvoiced speech, and voiced speech. There may be more computationally intensive solutions that are more effective.