A Single End-Point DTW Algorithm for Keyword Spotting, Yong-Sun Choi and Soo-Young Lee

A Single End-Point DTW Algorithm for Keyword Spotting
Yong-Sun Choi and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology

Contents
Keyword Spotting: meaning and necessity; problems
Dynamic Time Warping (DTW): advantages of DTW; some conventional types and the proposed DTW type
Experimental Results: verification of the proposed DTW's performance; standard threshold setting; results under various conditions
Conclusions

Keyword Spotting
Meaning: detection of pre-defined keywords in continuous speech. Example: with the keywords 'open' and 'window', the input "um…okay, uh… please open the…uh…window" should yield 'open' and 'window'.
Necessity: people may say OOV (out-of-vocabulary) words and sometimes stammer, but the machine needs only certain specific words for recognition.

Problems and Goal
Difficulties in processing: end-point detection (EPD) of the speech segment, and rejection of OOVs.
Difficulties in implementation: a heavy computational load, a complex algorithm, and the difficulty of building a real hardware system.
Goal: a simple and fast algorithm.

DTW for Keyword Spotting
Hidden Markov Model (HMM): a statistical model that needs a large amount of training data; a complex algorithm that is hard to implement as a hardware system; many parameters, which can cause memory problems.
Dynamic Time Warping (DTW): its advantages are a small amount of training data, a simple algorithm (addition and multiplication only), and little stored data. Its weak points are the need for an EPD process and a large number of calculations.

General DTW Process
Both end points are known; searches are repeated to find the corresponding frames.
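For reference, a minimal Python sketch of the general DTW process, assuming both end points are known, Euclidean frame distances, and the common symmetric step weights (1 for a horizontal or vertical step, 2 for a diagonal step). The function name `dtw_distance` and these choices are assumptions, not taken from the slides.

```python
import numpy as np


def dtw_distance(test: np.ndarray, ref: np.ndarray) -> float:
    """Classic DTW between two feature sequences whose end points are both known.

    test: (N, D) frames, ref: (M, D) frames. Step weights 1/1/2 are an assumption.
    """
    n, m = len(test), len(ref)
    # Euclidean local distance between every test frame and every reference frame
    local = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=-1)
    acc = np.full((n, m), np.inf)
    acc[0, 0] = 2 * local[0, 0]  # start point carries the diagonal weight
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:  # horizontal step, weight 1
                best = min(best, acc[i - 1, j] + local[i, j])
            if j > 0:  # vertical step, weight 1
                best = min(best, acc[i, j - 1] + local[i, j])
            if i > 0 and j > 0:  # diagonal step, weight 2
                best = min(best, acc[i - 1, j - 1] + 2 * local[i, j])
            acc[i, j] = best
    # With these weights every path has total weight N + M, so normalise by it
    return float(acc[n - 1, m - 1] / (n + m))
```

Every cell of the N × M grid is filled for each candidate segment, which is the heavy computation the slides attribute to general DTW.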

Advanced DTW (Myers, Rabiner, and Rosenberg)
No EPD process. It performs a series of small-area searches: a global search within one area, after which the next area is set around the best match point of that local area. This reduces the amount of calculation, but the load is still large. It was tested on isolated word recognition.

Proposal: Shape and Weights
No EPD process.
Only one path: select the best match point, then search again from that point, which needs far fewer computations.
Modified weights compensate for differences in weight sums, both in the search and in the distance accumulation.

Proposal: End Point
A small search area with successive local searches: the search starts at one point and repeats local searches until the end condition is met.
End condition: the current point lies on the last frame of the reference pattern. The end point is therefore set automatically.
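A minimal sketch of the single-path idea from the two proposal slides: the path starts at one input frame, moves greedily to the best of three neighbouring points, and stops when it reaches the last reference frame, which fixes the end point in the input automatically. Dividing by the accumulated weight is one way to compensate for weight-sum differences between paths. The weights `SEARCH_W` and `ACC_W`, the step set, and the function name are hypothetical; the slides do not give the actual values, and no slope constraint is modelled.

```python
import numpy as np

# Hypothetical weights, order: (diagonal, input-only step, reference-only step)
SEARCH_W = (1.0, 1.0, 1.0)  # used to choose the next point (all equal here)
ACC_W = (2.0, 1.0, 1.0)     # used to accumulate distance along the path
STEPS = ((1, 1), (1, 0), (0, 1))  # diagonal listed first so it wins ties


def single_path_dtw(test: np.ndarray, ref: np.ndarray, start: int):
    """Follow one greedy warping path from input frame `start`.

    Returns (distance normalised by accumulated weight, end frame in `test`).
    """
    def d(a: int, b: int) -> float:
        return float(np.linalg.norm(test[a] - ref[b]))

    i, j = start, 0
    total, weight = ACC_W[0] * d(i, j), ACC_W[0]
    while j < len(ref) - 1:  # end condition: last reference frame reached
        best_k, best_cost = None, np.inf
        for k, (di, dj) in enumerate(STEPS):
            ni, nj = i + di, j + dj
            if ni >= len(test) or nj >= len(ref):
                continue  # candidate lies outside the input or reference
            cost = SEARCH_W[k] * d(ni, nj)
            if cost < best_cost:
                best_k, best_cost = k, cost
        # the reference-only step is always valid here, so best_k is set
        i, j = i + STEPS[best_k][0], j + STEPS[best_k][1]
        total += ACC_W[best_k] * d(i, j)
        weight += ACC_W[best_k]
    return total / weight, i
```

Every step advances at least one index, so the loop always terminates. Calling the function at successive start frames and comparing the returned distance with a per-reference threshold would give a simple spotter under these assumptions.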

Proposal: Distance
The distance is modified using the difference between pattern lengths, because patterns of the same word have similar lengths.
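One possible way to use the length difference, sketched under the assumption of a multiplicative penalty with a hypothetical factor `alpha`; the slides state only that the distance is modified using the difference of pattern lengths, not the exact formula. The function name is also hypothetical.

```python
def length_penalised_distance(dtw_dist: float, matched_len: int, ref_len: int,
                              alpha: float = 0.5) -> float:
    """Scale a DTW distance by the relative length mismatch (assumed form)."""
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    # relative difference between the matched input segment and the reference
    mismatch = abs(matched_len - ref_len) / ref_len
    return dtw_dist * (1.0 + alpha * mismatch)


print(length_penalised_distance(2.0, 60, 40))  # 2.5
```

A segment whose length equals the reference keeps its distance unchanged, while a much longer or shorter match is pushed toward rejection.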

DTW: Computation Loads of the Three Types
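The slide's comparison of the three types is not in the transcript. As a rough illustration under stated assumptions, this sketch counts local-distance evaluations for a full N × M DTW grid against an upper bound for one greedy path that tests at most three candidates per step; the Myers-style local-area search is left out because its area sizes are not given. Both function names are hypothetical.

```python
def full_grid_evaluations(n: int, m: int) -> int:
    """Local distances computed by general DTW over an n x m grid."""
    return n * m


def single_path_upper_bound(n: int, m: int) -> int:
    """Upper bound for one greedy path starting at (0, 0).

    Each step advances at least one index, so there are at most
    (n - 1) + (m - 1) steps, each testing at most 3 candidates,
    plus the distance at the start point.
    """
    return 1 + 3 * ((n - 1) + (m - 1))


print(full_grid_evaluations(50, 50))    # 2500
print(single_path_upper_bound(50, 50))  # 295
```

The grid grows with the product of the lengths while the single path grows with their sum, which is the kind of gap the proposal relies on.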

Database and Experimental Sets
Database: RoadRally, used for keyword spotting and based on the telephone channel. The portion used has 11 keywords (434 occurrences in total) in read speech from 40 male speakers (47 minutes in total), in Stonehenge.
Set construction: 4 sub-sets (about 108 keywords per set); 3 sets for training and 1 set for test; 2 reference patterns per keyword per set.
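The set construction can be written as a rotation over the 4 sub-sets. As a quick check, 434 occurrences over 4 sub-sets average 108.5 per set, matching "about 108". The reference count below assumes the 2 patterns per keyword are taken from each of the 3 training sets; that reading of the slide, and the function names, are assumptions.

```python
from typing import Iterator, List, Tuple


def rotations(n_sets: int = 4) -> Iterator[Tuple[List[int], int]]:
    """Yield (training sub-sets, test sub-set); each sub-set is tested once."""
    for test in range(n_sets):
        yield [s for s in range(n_sets) if s != test], test


def reference_count(n_keywords: int = 11, n_train_sets: int = 3,
                    refs_per_set: int = 2) -> int:
    """Stored reference patterns per rotation (assumed: 2 per keyword per training set)."""
    return n_keywords * n_train_sets * refs_per_set


for train, test in rotations():
    print(train, test)
print(reference_count())  # 66
```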

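Verification Result: Isolated Word Recognition
Test set: 3 sets for training, 1 set for test.
Recognition rate (%):
Test set   General DTW   Proposed DTW
1          96.3          98.2
2          100.0         99.1
For test sets 3 and 4 and the average, the transcript preserves only one value per row (95.4, 97.2, and 97.5, respectively).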

Experimental Setup
Assumption: any frame can be the last frame of a keyword.
Threshold: used to reject OOVs, with one threshold per reference pattern. The standard threshold is set so that there are no false alarms on the training set.
Result presentation: ROC (Receiver Operating Characteristic) curves, with false alarms per hour per keyword on the x-axis and recognition rate on the y-axis.
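The standard threshold can be sketched as follows, assuming a detection is accepted when its DTW distance is strictly below the threshold of its reference pattern. Under that assumption, the largest threshold that yields no false alarm on the training set is the smallest distance obtained by any non-keyword match for that reference. The function names and reference identifiers are hypothetical.

```python
from typing import Dict, List


def standard_thresholds(false_match_distances: Dict[str, List[float]]) -> Dict[str, float]:
    """One threshold per reference: smallest distance of any false match on training data."""
    return {
        ref: (min(dists) if dists else float("inf"))
        for ref, dists in false_match_distances.items()
    }


def is_detection(distance: float, threshold: float) -> bool:
    """Accept only distances strictly below the threshold, so no training false alarm passes."""
    return distance < threshold


th = standard_thresholds({"waterloo_ref1": [3.2, 2.7, 4.1], "minus_ref1": []})
print(th)  # {'waterloo_ref1': 2.7, 'minus_ref1': inf}
print(is_detection(2.5, th["waterloo_ref1"]))  # True
```

A reference with no false matches on training data gets an unbounded threshold here; in practice one would likely cap it.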

Threshold Setting and Recognition Rate on the Training Set
Training set = test set (no false alarms).
Keyword      Right  Total  %
Mountain     21     40     52.5
Secondary    38     40     95.0
Middleton    27     37     73.0
Boonsboro    32     39     82.1
Conway       33     40     82.5
Thicket      30     39     77.0
Primary      34     40     85.0
Minus        25     39     64.1
Interstate   37     40     92.5
Waterloo     35     40     87.5
Retrace      36     40     90.0
Total        368    434    84.8

Result: ROC Curves of DTW and HMM
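To make the ROC axes concrete, a sketch that turns scored detections into ROC points by sweeping a global threshold: the x-value is false alarms per hour per keyword and the y-value is the recognition rate in percent. It assumes each hit corresponds to a distinct true keyword occurrence and that detections are accepted below the threshold; the function and argument names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(
    hit_distances: Sequence[float],
    false_alarm_distances: Sequence[float],
    n_true: int,
    hours: float,
    n_keywords: int,
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (false alarms / hour / keyword, recognition rate %) for each threshold."""
    points = []
    for th in thresholds:
        hits = sum(1 for d in hit_distances if d < th)
        fas = sum(1 for d in false_alarm_distances if d < th)
        points.append((fas / hours / n_keywords, 100.0 * hits / n_true))
    return points


print(roc_points([1.0, 2.0, 3.0], [2.5, 4.0], n_true=4, hours=0.5,
                 n_keywords=2, thresholds=[2.6]))  # [(1.0, 50.0)]
```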

Changing Conditions: number of keywords, number of references

Conclusion
Proposed DTW: good performance; a simple structure using only addition and multiplication (good for hardware); no EPD processing; a very small computation load; little stored data (small memory).
Keyword spotting: uses only keyword information; good performance; better than HMM when training data are scarce.