An Algorithm for Determining the Endpoints for Isolated Utterances L.R. Rabiner and M.R. Sambur The Bell System Technical Journal, Vol. 54, No. 2, Feb.

Slides:

Advertisements

Similar presentations

The Fully Networked Car Geneva, 4-5 March Jean-Pierre Jallet Car Active Noise Cancellation for improved car efficiency, From/In/To car voice communication.

Advertisements

Presented by Erin Palmer. Speech processing is widely used today Can you think of some examples? Phone dialog systems (bank, Amtrak) Computers dictation.

Acoustic Echo Cancellation for Low Cost Applications

College of Information Technology & Design

Describing Quantitative Variables

Digital Signal Processing

STQ Workshop, Sophia-Antipolis, February 11 th, 2003 Packet loss concealment using audio morphing Franck Bouteille¹ Pascal Scalart² Balazs Kövesi² ¹ PRESCOM.

Evaluation of Speech Detection Algorithm Project 1b Due October 11.

Project 1b Evaluation of Speech Detection Due: February 17 th, at the beginning of class.

Speech Compression. Introduction Use of multimedia in personal computers Requirement of more disk space Also telephone system requires compression Topics.

Improvement of Audio Capture in Handheld Devices through Digital Filtering Problem Microphones in handheld devices are of low quality to reduce cost. This.

Look Who’s Talking Now SEM Exchange, Fall 2008 October 9, Montgomery College Speaker Identification Using Pitch Engineering Expo Banquet /08/09.

SIMS-201 Characteristics of Audio Signals Sampling of Audio Signals Introduction to Audio Information.

IT-101 Section 001 Lecture #8 Introduction to Information Technology.

Automatic Lip- Synchronization Using Linear Prediction of Speech Christopher Kohnert SK Semwal University of Colorado, Colorado Springs.

Algebra Problems… Solutions Algebra Problems… Solutions © 2007 Herbert I. Gross Set 4 By Herb I. Gross and Richard A. Medeiros next.

Chih-Hsing Lin, Jia-Shiuan Tsai, and Ching-Te Chiu

INTEGRALS Areas and Distances INTEGRALS In this section, we will learn that: We get the same special type of limit in trying to find the area under.

Detecting Speech Project 1. Outline Motivation Problem Statement Details Hints.

4/25/2001ECE566 Philip Felber1 Speech Recognition A report of an Isolated Word experiment. By Philip Felber Illinois Institute of Technology April 25,

An Algorithm for Determining the Endpoints for Isolated Utterances L.R. Rabiner and M.R. Sambur The Bell System Technical Journal, Vol. 54, No. 2, Feb.

Digital Voice Communication Link EE 413 – TEAM 2 April 21 st, 2005.

Project 1 Speech Detection Due: Sunday, February 1 st, 11:59pm.

Auditory User Interfaces

Evaluation of Speech Detection Algorithm Project 1b Due February 14th.

1 New Technique for Improving Speech Intelligibility for the Hearing Impaired Miriam Furst-Yust School of Electrical Engineering Tel Aviv University.

Speech Detection Project 1. Outline Motivation Problem Statement Details Hints.

Warped Linear Prediction Concept: Warp the spectrum to emulate human perception; then perform linear prediction on the result Approaches to warp the spectrum:

REMEMBER: A number can be negative or positive as shown in the number line below: Negative Numbers (-) Positive Numbers (+)

Today: Central Tendency & Dispersion

HOW TO ADD INTEGERS REMEMBER:

LE 460 L Acoustics and Experimental Phonetics L-13

Knowledge Base approach for spoken digit recognition Vijetha Periyavaram.

Time-Domain Methods for Speech Processing 虞台文. Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time.

By Sarita Jondhale1 Pattern Comparison Techniques.

Computer Some basic concepts. Binary number Why binary? Look at a decimal number: 3511 Look at a binary number: 1011 counting decimal binary

IT253: Computer Organization

Learning of Word Boundaries in Continuous Speech using Time Delay Neural Networks Colin Tan School of Computing, National University of Singapore.

Podcasts in the Classroom A short intro in using GarageBand Get this presentation here:

Voice Recognition for Wheelchair Control Theo Theodoridis, Xin Liu, and Huosheng Hu.

Recognition of spoken and spelled proper names Reporter : CHEN, TZAN HWEI Author :Michael Meyer, Hermann Hild.

Dither and Noise Shaping

Fall 2002CMSC Discrete Structures1 Enough Mathematical Appetizers! Let us look at something more interesting: Algorithms.

Quick and Easy Binary to dB Conversion George Weistroffer, Jeremy Cooper, and Jerry Tucker Electrical and Computer Engineering Virginia Commonwealth University.

1 Introduction to Information Technology LECTURE 6 AUDIO AS INFORMATION IT 101 – Section 3 Spring, 2005.

1 Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition Qi Li, Senior Member, IEEE, Jinsong Zheng, Augustine.

Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments 張智星

ECE 5525 Osama Saraireh Fall 2005 Dr. Veton Kepuska

Significant Figures Always record data as accurately as you can (as many sig. figs. as method allows) The last digit of the value that you record should.

Understanding English Variation Connected Speech Processes What are connected speech processes? Connected speech processes are changes in the pronunciation.

Binary A double zero educational presentation. Binary Basics Binary is the language computers use Only 1’s and 0’s can be found in Binary Very large numbers.

Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P.

Turning a Mobile Device into a Mouse in the Air

IIT Bombay 17 th National Conference on Communications, Jan. 2011, Bangalore, India Sp Pr. 1, P3 1/21 Detection of Burst Onset Landmarks in Speech.

1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition Isolated word recognition: Task : Want to build an isolated ‘word’ recogniser.

Digital Image Processing Lecture 17: Segmentation: Canny Edge Detector & Hough Transform Prof. Charlene Tsai.

DATA REPRESENTATION: SOUNDS GCSE Computing. Learning Objective ■ To understand how sounds are represented in Binary ■ To be able to convert a sound wave.

CS 548 Spring 2016 Model and Regression Trees Showcase by Yanran Ma, Thanaporn Patikorn, Boya Zhou Showcasing work by Gabriele Fanelli, Juergen Gall, and.

Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces Speaker: Po-Kai Shen Advisor: Tsai-Rong Chang Date: 2010/6/14.

Copyright © Cengage Learning. All rights reserved. 4 Integrals.

Stand alone system. Team members PC dependent Speech recognition.

Control Charts Definition:

핵심어 검출을 위한 단일 끝점 DTW 알고리즘 Yong-Sun Choi and Soo-Young Lee

Packet loss concealment using audio morphing

Pitch Estimation By Chih-Ti Shih 12/11/2006 Chih-Ti Shih.

Honors Statistics Review Chapters 4 - 5

Endpoint Detection ( 端點偵測)

Quantitative Data Who? Cans of cola. What? Weight (g) of contents.

An Algorithm for Determining the Endpoints for Isolated Utterances

Presentation transcript:

An Algorithm for Determining the Endpoints for Isolated Utterances L.R. Rabiner and M.R. Sambur The Bell System Technical Journal, Vol. 54, No. 2, Feb. 1975, pp

Outline Intro to problem Solution Algorithm Summary

Motivation Word recognition needs to detect word boundaries in speech Recognizing silence can reduce: – Processing load – (Network not identified as savings source) – (Hands-free operation not identified as convenience) Relatively easy in sound proof room, with digitized tape

Visual Recognition Easy Note how quiet beginning is (tape) “Eight”

Slightly Tougher Visual Recognition “sss” starts crossing the ‘zero’ line, so can still detect “Six”

Tough Visual Recognition Eye picks ‘B’, but ‘A’ is real start – /f/ is a weak fricative “Four”

Tough Visual Recognition Eye picks ‘A’, but ‘B’ is real endpoint – V becomes devoiced “Five”

Tough Visual Recognition Difficult to say where final trailing off ends “Nine”

The Problem Noisy computer room with background noise – Weak fricatives: /f, th, h/ – Weak plosive bursts: /p, t, k/ – Final nasals (ex: “nine”) – Voiced fricatives becoming devoiced (ex: “five”) – Trailing off of sounds (ex: “binary”, “three”) Need to do with simple, efficient processing – Avoid hardware costs

The Solution Two measurements: – Energy – Zero crossing rate Show: simple, fast, accurate

Energy Sum of magnitudes of 10 ms of sound, centered on interval: – E(n) =  i=-50 to 50 |s(n + i)|

Zero (Level) Crossing Rate Remember, digital audio values are changes in air pressure (higher or lower than base) Base/midpoint is “zero” – But is always positive if unsigned (e.g., 127 if unsigned byte) Zero crossing rate is number of zero crossings per 10 ms – Normal number of cross-overs during silence – Increase in cross-overs during speech

The Algorithm: Startup At initialization, record sound for 100ms – A measure background noise – Assume ‘silence’ Compute average (IZC’) and std dev (  ) of zero crossing rate Choose zero-crossing threshold (IZCT) – Threshold for unvoiced speech – IZCT = min(25 / 10ms, IZC’ + 2  )

The Algorithm: Thresholds Compute energy, E(n), for interval – Get max, IMX – Have ‘silence’ energy, IMN – Compute to values: I1 = 0.03 * ( IMX – IMN ) + IMN (3% of peak energy) I2 = 4 * IMN (4x silent energy) Get energy thresholds (ITU and ITL) – ITL = MIN(I1, I2) – ITU = 5 * ITL

The Algorithm: Energy Computation Search sample for energy greater than ITL – Save as start of speech, say s Search for energy greater than ITU – s becomes start of speech – If energy falls below ITL, restart Search for energy less than ITL – Save as end of speech Results in conservative estimates – Endpoints may be outside

The Algorithm: Zero Crossing Computation Search back 250 ms – Count number of intervals where rate exceeds IZCT If 3+, set starting point, s, to first time Else s remains the same Do similar search after end

The Algorithm: Example (Word begins with strong fricative)

Algorithm: Examples Caught trailing /f/ “Half”

Algorithm: Examples “Four” (Notice how different each “four” is)

Evaluation: Part 1 54-word vocabulary Read by 2 males, 2 females No gross errors (off by more than 50 ms) Some small errors – Losing weak fricatives – None affected recognition

Evaluation: Part 2 10 speakers Count 0 to 9 No errors at all

Evaluation: Part 3 Your Project 1b…