Speaker Independent Arabic Speech Recognition Using Support Vector Machine
By Eng. Shady Yehia El-Mashad
Supervised by Assoc. Prof. Dr. Hala Helmy Zayed and Dr. Mohamed Ibrahim Sharawy

Agenda
- Introduction
- Characteristics of Speech Signal
- History of Speech & Previous Research
- The Proposed System
- Results and Conclusions

Introduction

Recognition
Recognition is one of the basic memory tasks: identifying objects or events that have been encountered before. It is the easiest of the memory tasks, because it is easier to recognize something than to recall it unaided.

Speech Recognition System
Also known as automatic speech recognition or computer speech recognition, Automatic Speech Recognition (ASR) is the process of converting captured speech signals into the corresponding sequence of words in text. ASR systems accomplish three basic tasks:
1. Pre-processing
2. Recognition
3. Communication

How do humans do it? Articulation produces sound waves which the ear conveys to the brain for processing

How might computers do it?
[Diagram: the acoustic waveform is captured as an acoustic signal and passed to the speech recognition system.]

Characteristics of Speech Signal

Types of Speech Recognition
There are two main types of speaker models:
(1) Speaker independent: speaker independent models recognize the speech patterns of a large group of people.
(2) Speaker dependent: speaker dependent models recognize the speech patterns of only one person.

Speech recognition usually concerns three types of speech:
(1) Isolated word recognition: the simplest type, because it requires the user to pause between each word.
(2) Connected word recognition: capable of analyzing a string of words spoken together, but not at a normal speech rate.
(3) Connected speech recognition (continuous speech recognition): allows normal conversational speech.

Factors that affect the speech signal:
- Speaker gender
- Speaker identity
- Speaker language
- Psychological conditions
- Speaking style
- Environmental conditions

Some of the difficulties related to speech recognition:
- Digitization: converting the analogue signal into a digital representation
- Signal processing: separating speech from background noise
- Phonetics: variability in human speech
- Continuity: natural speech is continuous

The Three-State Representation
The three-state representation is one way to classify events in speech. The events of interest are (a toy classifier sketch follows the list):
- Silence (S): no speech is produced.
- Unvoiced (U): the vocal cords are not vibrating, resulting in an aperiodic or random speech waveform.
- Voiced (V): the vocal cords are vibrating periodically, resulting in a quasi-periodic speech waveform.
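As a concrete illustration (not part of the original slides), a classical way to approximate this three-state labelling uses short-time energy and zero-crossing rate per frame. A toy Python sketch, with made-up thresholds:

```python
import numpy as np

def classify_frame(frame, energy_sil=1e-4, zcr_voiced=0.25):
    """Toy three-state (S/U/V) label for one speech frame.

    Thresholds are illustrative only; real systems tune them per recording.
    """
    energy = np.mean(frame ** 2)                        # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # zero crossings per sample
    if energy < energy_sil:
        return "S"   # silence: negligible energy
    if zcr > zcr_voiced:
        return "U"   # unvoiced: noisy waveform, many zero crossings
    return "V"       # voiced: quasi-periodic, low zero-crossing rate
```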

[Fig.: three-state speech representation]

Applications of Speech Recognition
(1) Security (2) Education (3) Control (4) Diagnosis (5) Dictation

History of Speech & Previous Research

History of Speech

Previous Research (Arabic Speech)
1. Combined Classifier Based Arabic Speech Recognition. Source: INFOS2008, March 27-29, 2008, Cairo, Egypt (Faculty of Computers & Information, Cairo University). Technique: ANN (combined classifier). Scope: 6 isolated words from the Holy Quran. Performance: 93%.
2. Comparative Analysis of Arabic Vowels using Formants and an Automatic Speech Recognition System. Source: International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 3, No. 2, June 2010. Technique: HMM. Scope: Arabic vowels (10 words). Performance: 91.6%.
3. HMM Automatic Speech Recognition System of Arabic Alphadigits. Source: Arabian Journal for Science and Engineering, Vol. 35, No. 2C, December 2010. Technique: HMM. Scope: alphadigits ("Saudi accented"). Performance: 76%.
4. Phonetic Recognition of Arabic Alphabet Letters using Neural Networks. Source: International Journal of Electric & Computer Sciences IJECS-IJENS, Vol. 11, No. 01, February 2011. Technique: ANN (PCA). Scope: Arabic alphabet letters. Performance: 96%.

Previous Research (Arabic Digits)
1. Recognition of Spoken Arabic Digits Using Neural Predictive Hidden Markov Models. Source: The International Arab Journal of Information Technology, Vol. 1, July 2004. Technique: neural network (MLP) combined with HMM. Performance: 88%.
2. Efficient System for Speech Recognition using General Regression Neural Network. Source: International Journal of Intelligent Systems and Technologies 1;2 (www.waset.org), Spring 2006. Technique: ANN (GRNN). Performance: 85-91%.
3. Speech Recognition System of Arabic Digits based on A Telephony Arabic Corpus. Source: Intensive Program on Computer Vision (IPCV'08), Joensuu, Finland, August 2008. Technique: HMM. Performance: 93.72%.
4. Radial Basis Functions With Wavelet Packets For Recognizing Arabic Speech. Source: CSECS '10, Proceedings of the 9th WSEAS International Conference on Circuits, Systems, Electronics, Control and Signal Processing, 2010. Technique: ANN (RBF). Performance: 87-93%.
Scope of speech: Arabic digits ("Saudi accented").

The Proposed System


The Proposed System: 1. Recording System

The Proposed System: 2. Data Set
Creating a speech database is an important part of the development work. For English, researchers do not need to create a database, since several already exist to support research, such as Sphinx 1, 2, 3 & 4 and Australian English corpora. For Arabic, we had to create a database of our own.

The Proposed System: 2. Data Set

The Proposed System: 3. The Segmentation System
The segmentation process is implemented with two techniques: semi-automatic and fully automatic.
Semi-automatic technique: we set the segmentation parameters (window size, minimum amplitude, minimum frequency, maximum frequency, minimum silence, minimum speech, and minimum word) manually, by trial and error. This technique achieves only 70 percent accuracy, which is too low to build on, since two stages, feature extraction and recognition, still follow.

The Proposed System: 3. The Segmentation System
[Figure: segmenting the recorded digit string 0 1 6 5 3 2 0 4 0 3; one segmentation attempt is marked incorrect (X).]

The Proposed System: 3. The Segmentation System
Fully automatic technique: the segmentation parameters are set automatically, using K-means clustering, to get better performance. With this technique we achieve nearly 100 percent accuracy in segmenting the digits. The K-means algorithm proceeds as follows (a sketch follows this list):
- Partition the dataset into K clusters, assigning the data points to the clusters at random.
- For each data point: calculate its distance to each cluster; if the point is closest to its own cluster, leave it; otherwise, move it into the closest cluster.
- Repeat the step above until a complete pass through all the data points results in no data point moving from one cluster to another.
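A minimal sketch of this K-means loop in plain NumPy (the data and K below are placeholders, not the actual segmentation parameters from the thesis):

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """K-means as described above: recompute cluster centres, move each
    point to its nearest centre, stop when a full pass moves nothing.
    (A production version would also guard against empty clusters.)"""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(points))  # random initial partition
    for _ in range(max_iters):
        # Each cluster centre is the mean of its current members
        centres = np.array([points[labels == j].mean(axis=0)
                            for j in range(k)])
        # Distance from every point to every centre; pick the closest
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # no point moved: converged
            return labels, centres
        labels = new_labels
    return labels, centres

# Hypothetical use: cluster per-frame features to separate speech from silence
frames = np.random.randn(200, 2)
labels, centres = k_means(frames, k=2)
```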

The Proposed System: 4. Feature Extraction
When the input data to an algorithm is too large to process and is suspected to be highly redundant (much data, but not much information), it is transformed into a reduced representation: a set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. The feature vector must contain information that is useful to:
- identify and differentiate speech sounds
- identify and differentiate between speakers
Common methods include FFT, LPC, the real cepstrum, and MFCC.

The Proposed System
Mel Frequency Cepstrum Coefficients (MFCC), computed as follows (see the sketch below):
1. Take the Fourier transform of the signal.
2. Map the powers of the resulting spectrum onto the mel scale, using triangular overlapping windows.
3. Take the log of the power at each mel frequency.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
The MFCCs are the amplitudes of the resulting spectrum.
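The four steps translate almost line for line into code. A minimal NumPy/SciPy sketch for a single windowed frame (the filterbank size and coefficient count are common defaults, not values taken from the thesis):

```python
import numpy as np
from scipy.fft import dct  # type-II DCT, the standard choice for MFCCs

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs for one windowed speech frame, following the four steps above."""
    # 1. Fourier transform -> power spectrum
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    n_bins = len(spectrum)

    # 2. Triangular, overlapping mel filters between 0 Hz and Nyquist
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    bin_idx = np.floor((len(frame) + 1) * mel_to_hz(mel_points)
                       / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        l, c, r = bin_idx[i], bin_idx[i + 1], bin_idx[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    mel_energies = fbank @ spectrum

    # 3. Log of the power in each mel band
    log_energies = np.log(mel_energies + 1e-10)

    # 4. DCT of the log powers; the first n_coeffs amplitudes are the MFCCs
    return dct(log_energies, type=2, norm='ortho')[:n_coeffs]

# Hypothetical usage: a 25 ms Hamming-windowed frame at 16 kHz
rate = 16000
frame = np.hamming(400) * np.random.randn(400)  # stand-in for real speech
print(mfcc(frame, rate))
```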

The Proposed System: 5. Neural Network Classifier
There are many neural models; each has advantages and disadvantages depending on the application. For our application we chose the Support Vector Machine (SVM).

The Proposed System
Support Vector Machine (SVM): the SVM is implemented using the kernel Adatron algorithm, which constructs a hyperplane, or set of hyperplanes, in a high-dimensional space; these can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane with the largest distance to the nearest training data points of any class (the functional margin), since in general the larger the margin, the lower the generalization error of the classifier.

The Proposed System
Support Vector Machine (SVM): H3 (green) does not separate the two classes; H1 (blue) separates them, but only with a small margin; H2 (red) separates them with the maximum margin.
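The thesis trains its SVM with a kernel Adatron implementation inside NeuroSolutions; as a rough stand-in for the same maximum-margin idea, here is a sketch using scikit-learn's SVC on toy two-class data (all names and values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for MFCC feature vectors of two spoken digits
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 13)),   # "digit A" frames
               rng.normal(3.0, 1.0, (50, 13))])  # "digit B" frames
y = np.array([0] * 50 + [1] * 50)

# RBF-kernel SVM: finds the maximum-margin separating surface
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)  # points defining the margin
```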

Results and Conclusions

Results and Conclusions
Training and Testing the Support Vector Machine (SVM)
We use the SVM network with its parameters adapted as follows: no hidden layers, an output layer of 10 neurons, and training for a maximum of 1000 epochs. We have 10000 samples, divided into: training 70%, cross-validation 15%, testing 15%.
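For context, a hedged sketch of the same 70/15/15 split and evaluation, again using scikit-learn as a stand-in for the NeuroSolutions workflow (features and labels are random placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Placeholder feature matrix (e.g. MFCC vectors) and digit labels 0-9
X = np.random.randn(10000, 13)
y = np.random.randint(0, 10, size=10000)

# 70% train, then split the remainder equally: 15% validation, 15% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30,
                                                    random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest,
                                                test_size=0.50,
                                                random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
print(confusion_matrix(y_test, clf.predict(X_test)))  # 10x10 digit matrix
```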

Results and Conclusions
[Table: cross-validation confusion matrix of the SVM over the ten digits. Only fragments survive; the recoverable entries show per-digit accuracies between 89.00 and 100.00 percent, with off-diagonal confusions of roughly 0-10 percent.]

Results and Conclusions
[Table: testing confusion matrix of the SVM (output vs. desired digit). The diagonal (correct) counts are 255, 313, 122, 105, 135, 90, 83, 119, 95 and 95, out of 1500 test samples.]

Results and Conclusions
Performance = (255 + 313 + 122 + 105 + 135 + 90 + 83 + 119 + 95 + 95) / 1500 = 1412 / 1500 = 94.13%
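The same figure follows directly from the diagonal of the testing confusion matrix; a quick check in Python:

```python
import numpy as np

# Diagonal (correct) counts from the testing confusion matrix above
diag = np.array([255, 313, 122, 105, 135, 90, 83, 119, 95, 95])
accuracy = diag.sum() / 1500   # 1412 / 1500
print(f"{accuracy:.2%}")       # 94.13%
```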

Results and Conclusions
A spoken Arabic digits recognizer was designed to investigate the process of automatic digit recognition. The segmentation process is implemented with two techniques, semi-automatic and fully automatic. Features are extracted using the MFCC technique. The system is based on a neural network, uses colloquial Egyptian dialect speech recorded in a noisy environment, and was carried out with NeuroSolutions tools. The performance of the system is nearly 94% when using the SVM.

THANK YOU!