Spoken Digit Recognition


Spoken Digit Recognition Yi-Pei Chen 5/2/2016

Motivation

Steps
Speech Acquisition -> Signal Preprocessing -> Feature Extraction -> Classifier -> Output
- Speech Acquisition: capture of the continuous speech waveform.
- Signal Preprocessing: elimination of background noise, framing, and windowing. Background noise is removed from the data; the continuous speech is then split into frames (framing); a window function then selects the portion of the speech signal analyzed in each frame (windowing).
- Feature Extraction: identifies the components of the audio signal that carry the linguistic content (here, MFCCs).
- Classifier: MLP, KNN, or SVM.
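The framing and windowing steps above can be sketched in plain Python. The frame length, hop length, and function names here are illustrative choices, not taken from the project itself; the Hamming window matches the one named on the dataset slide:

```python
import math

def frame_signal(signal, frame_len, hop_len):
    """Framing: split a signal into overlapping fixed-length frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frames.append(signal[start:start + frame_len])
    return frames

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def window_frames(frames):
    """Windowing: taper each frame with a Hamming window."""
    w = hamming(len(frames[0]))
    return [[s * wi for s, wi in zip(frame, w)] for frame in frames]
```

With a 25-sample frame and 10-sample hop, a 100-sample signal yields 8 frames, each ready for feature extraction.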

Mel Frequency Cepstral Coefficients
Speech: the sounds a human generates are filtered by the shape of the vocal tract, including the tongue, teeth, etc. This shape determines what sound comes out, and it manifests itself in the envelope of the short-time power spectrum; the job of MFCCs is to represent this envelope accurately. If we can determine the shape accurately, we get an accurate representation of the phoneme being produced.
Process:
1. Take the Fourier transform of a windowed excerpt of the signal.
2. Map the powers of the spectrum onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
The MFCCs are the amplitudes of the resulting spectrum.
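The four-step process above can be sketched end to end in plain Python. A naive DFT and DCT-II are used for readability, and the filter count, coefficient count, and mel-bin mapping are illustrative defaults, not the exact implementation behind this project:

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    # Step 1: Fourier transform of the windowed frame, kept as a power spectrum.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n))
        spec.append((re * re + im * im) / n)
    return spec

def filterbank_energies(spec, sr, n_filters):
    # Step 2: triangular overlapping windows spaced evenly on the mel scale.
    n_bins = len(spec) - 1
    top = hz_to_mel(sr / 2.0)
    points = [int(mel_to_hz(top * i / (n_filters + 1)) / (sr / 2.0) * n_bins)
              for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, c, hi = points[j - 1], points[j], points[j + 1]
        e = 0.0
        for k in range(lo, hi):
            w = (k - lo) / max(c - lo, 1) if k < c else (hi - k) / max(hi - c, 1)
            e += w * spec[k]
        energies.append(e)
    return energies

def dct2(x, n_out):
    # Step 4: discrete cosine transform of the mel log powers.
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
            for k in range(n_out)]

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    spec = power_spectrum(frame)
    # Step 3: log of the power in each mel band (small floor avoids log(0)).
    logs = [math.log(e + 1e-10) for e in filterbank_energies(spec, sr, n_filters)]
    return dct2(logs, n_coeffs)
```

Keeping the first 13 DCT coefficients matches the 13 MFCCs per frame used in the dataset below.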

Dataset: Spoken Arabic Digit
- Speakers: 44 male and 44 female native Arabic speakers, collected by the Laboratory of Automatic and Signals, University of Badji Mokhtar, Annaba, Algeria.
- Size: 8800 time series of 13 MFCCs (10 digits x 10 repetitions x 88 speakers).
- Sampling rate: 11025 Hz, 16 bits.
- Window applied: Hamming window.
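A loader for such a corpus could look like the sketch below. The blank-line-delimited layout assumed here (one frame of 13 space-separated MFCCs per line, utterances separated by blank lines) is how the UCI repository distributes this dataset, but verify it against the actual files before relying on it:

```python
def load_utterances(path):
    """Read MFCC time series from a text file: one frame per line,
    utterances separated by blank lines (assumed UCI-style layout).
    Returns a list of utterances, each a list of 13-float frames."""
    utterances, current = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                # Blank line closes the current utterance, if any.
                if current:
                    utterances.append(current)
                    current = []
            else:
                current.append([float(v) for v in line.split()])
    if current:
        utterances.append(current)
    return utterances
```

Each utterance comes back as a variable-length sequence of 13-dimensional frames, which is the shape the classifiers below must cope with.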

Current State: choosing the best classifier among the candidates (MLP, KNN, SVM).
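As a baseline among the candidate classifiers, a k-nearest-neighbour classifier over time-averaged MFCC vectors can be sketched as follows. Averaging over frames is one simple way to turn variable-length utterances into fixed-size vectors; it is an illustrative choice, not necessarily the approach used in this project:

```python
import math

def mean_vector(utterance):
    """Collapse a variable-length sequence of MFCC frames to one mean vector."""
    n = len(utterance)
    dims = len(utterance[0])
    return [sum(frame[d] for frame in utterance) / n for d in range(dims)]

def knn_predict(train_x, train_y, x, k=3):
    """Predict by majority vote among the k training vectors
    closest to x in Euclidean distance."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: math.dist(pair[0], x))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

The same fixed-length vectors could be fed to an MLP or SVM, which makes side-by-side accuracy comparison of the three candidates straightforward.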

References
- M. Kalamani, S. Valarmathy, S. Anitha. "Automatic Speech Recognition using ELM and KNN Classifiers." IJIRCCE, Vol. 3, Issue 4, April 2015.
- Richard P. Lippmann. "Neural Network Classifiers for Speech Recognition." The Lincoln Laboratory Journal, Vol. 1, No. 1 (1988), pp. 1-18, MIT.
- Jean Hennebert, Martin Hasler, and Hervé Dedieu. "Neural Networks in Speech Recognition." Department of Electrical Engineering, Swiss Federal Institute of Technology.
- Abdul Ahad, Ahsan Fayyaz, Tariq Mehmood. "Speech Recognition using Multilayer Perceptron." Proceedings, IEEE Students Conference (ISCON '02), Vol. 1, pp. 103-109, 2002.
- Issam Bazzi. "Using Support Vector Machines for Spoken Digit Recognition." MIT Laboratory for Computer Science, Spoken Language Systems Group, pp. 48-49.
- MFCC tutorial: http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/