Speaker Classification through Deep Learning
Jacob Morris, Alex Douglass, Luke Woodbury
Overview
- Goals
  - Learn more about deep learning!
  - Create a neural network that classifies voice recordings by gender, age, native language, etc.
- Potential Applications
  - Research
  - Security
Software Dependencies
- Python 2.7
- Keras 1.2.2
- Theano
- Matplotlib
Hardware
- GeForce Titan X (Pascal), 12 GB memory
- Image: https://6lli539m39y3hpkelqsm3c2fg-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Natoli-CPUvGPU-peak-DP-600x.png
Speech Accent Archive
- WAV files
  - 2300+ different speakers
  - All recorded speaking the same paragraph
- Categorizations
  - Age
  - Gender
  - English residence
  - Native language
  - Country
  - Learning style
  - Etc.
The Essence of Deep Learning
- Image: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/img/spiral.1-2.2-2-2-2-2-2.gif
Artificial Neural Networks (ANN)
- Image: http://cs231n.github.io/assets/nn1/neural_net2.jpeg
Recurrent Networks
- Layer "remembers" data
- Image: http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png
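As a rough illustration (not the project's code), a recurrent layer "remembers" by feeding its previous hidden state back into the computation at each time step. A minimal NumPy sketch, with made-up dimensions:

```python
import numpy as np

# Hypothetical dimensions: 1 input feature per step, 8 hidden units.
n_in, n_hidden = 1, 8
W_x = np.random.randn(n_hidden, n_in) * 0.1      # input-to-hidden weights
W_h = np.random.randn(n_hidden, n_hidden) * 0.1  # hidden-to-hidden ("memory") weights
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One unrolled step: the new state mixes the current input with
    the previous state, which is how the layer 'remembers' data."""
    return np.tanh(W_x.dot(x_t) + W_h.dot(h_prev) + b)

h = np.zeros(n_hidden)
for x_t in np.random.randn(400, n_in):  # e.g. 4 s of audio at 100 samples/s
    h = rnn_step(x_t, h)
```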
LSTM
- Long Short-Term Memory
- Image source: http://deephash.com/2016/10/16/lstm-journey-tensorflow/
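A minimal sketch of the kind of LSTM classifier this implies, written against the Keras 1.x API listed under the dependencies; the layer sizes, input shape, and class count are assumptions for illustration, not our actual topology:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Assumed shapes: 400 time steps (4 s at 100 samples/s), 1 amplitude
# value per step, 2 output classes (e.g. gender).
model = Sequential()
model.add(LSTM(64, input_shape=(400, 1)))   # LSTM layer carries long- and short-term state
model.add(Dense(2, activation='softmax'))   # one output neuron per class
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
```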
Problem Type
- Sequence Classification
  - Assign classification label(s) to input sequences
- Supervised Learning
  - Each training sample includes the correct output for that sample
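Concretely, each supervised training pair couples an input sequence with its known correct output; a hypothetical example with a one-hot label:

```python
import numpy as np

# One supervised training pair: input sequence + correct output.
x = np.random.randn(400, 1)   # stand-in for 4 s of audio amplitudes
y = np.array([1.0, 0.0])      # one-hot label, e.g. ['male', 'female']
```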
Variations of Model Topologies
- Inputs
  - Sequence of amplitudes
  - Discrete Fourier transform of the segment
- Hidden Layers
  - Variable
- Outputs
  - Any subset of data categories
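A sketch of the two input variants, assuming raw amplitudes read from a WAV file; `scipy.io.wavfile` (not in our dependency list) is one common way to read them, and NumPy's FFT gives the Fourier-transformed alternative:

```python
import numpy as np
from scipy.io import wavfile

rate, amplitudes = wavfile.read('speaker.wav')  # hypothetical file name

segment = amplitudes[:rate // 25]               # one short segment of the recording
# Input variant 1: the raw amplitude sequence.
raw_input = segment.astype('float32')
# Input variant 2: magnitudes of the discrete Fourier transform of the segment.
dft_input = np.abs(np.fft.rfft(raw_input))
```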
Training Challenges
- A process of exploration
- Many parameters to tune
- Results are vague and must be interpreted
- Days required to train a new model
Terminology
- Sample: the base unit of training data; 1/100 of a second of audio
- Batch: a group of samples; 4 seconds of consecutive samples
- Epoch: the number of batches required to train on the entire training data set; in our case, 2310 batches
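Under those definitions, and assuming 44.1 kHz WAV data, the slicing might look like the following sketch; the frame counts are derived from the slide, not taken from our code:

```python
import numpy as np

rate = 44100                     # assumed WAV sampling rate
frames_per_sample = rate // 100  # a sample is 1/100 s of audio
samples_per_batch = 400          # a batch is 4 s of consecutive samples

def to_batches(amplitudes):
    """Cut a recording into batches of consecutive 1/100 s samples."""
    frames_per_batch = frames_per_sample * samples_per_batch
    n = len(amplitudes) // frames_per_batch
    return amplitudes[:n * frames_per_batch].reshape(
        n, samples_per_batch, frames_per_sample)

batches = to_batches(np.zeros(rate * 20))  # e.g. a 20 s recording -> 5 batches
# One epoch = enough batches to cover the whole training set
# (2310 batches in our case).
```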
Loss
- A measure of how close an output signal is to its expected value
- Categorical cross-entropy
  - Emphasizes the correct answer
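For reference, categorical cross-entropy only scores the probability assigned to the true class, which is why it "emphasizes the correct answer"; a NumPy sketch:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """y_true is one-hot, y_pred is a softmax output; only the predicted
    probability of the true class contributes to the loss."""
    return -np.sum(y_true * np.log(y_pred + 1e-8))

# A confident, correct prediction costs little...
print(categorical_cross_entropy(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
# ...while a confident, wrong one costs a lot.
print(categorical_cross_entropy(np.array([1, 0]), np.array([0.1, 0.9])))  # ~2.303
```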
Learning Rate
- Determines how large an adjustment to make for a given loss value
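In gradient-descent terms, the learning rate scales the weight update computed from the loss gradient; a small illustration with made-up values (not our optimizer):

```python
import numpy as np

weights = np.array([0.5, -0.3])
loss_gradient = np.array([0.2, -0.1])  # hypothetical gradient of the loss
learning_rate = 0.01                   # smaller rate -> smaller adjustment

# Plain gradient-descent update: the rate scales the step taken.
weights -= learning_rate * loss_gradient
```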
Accuracy
- An output is considered correct if the expected output neuron's activation value is the greatest among all neurons for that category
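That criterion is just an argmax comparison per category; a sketch:

```python
import numpy as np

def is_correct(expected_one_hot, activations):
    """Correct when the expected neuron has the highest activation
    among all output neurons for that category."""
    return np.argmax(activations) == np.argmax(expected_one_hot)

print(is_correct(np.array([0, 1]), np.array([0.4, 0.6])))  # True
```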
Initial Attempts
- Features
  - Short sample lengths
  - WAV inputs only
  - Trained on a training set of only 2 speakers
Results
False Hope
- Features
  - Short sample lengths
  - Trained on a training set of only 2 speakers
- Changes
  - Both input types
Results
Hope
- Features
  - Short sample lengths
  - Both input types
  - Trained on a training set of only 2 speakers
- Changes
  - Train on a single batch per speaker per pass through the training set (see the sketch below)
  - Reduced learning rate
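A hedged sketch of what "one batch per speaker per pass" could look like using the Keras 1.x `train_on_batch` call; `model`, `batches_by_speaker`, and the random choice of batch are assumptions for illustration:

```python
import random

def training_pass(model, batches_by_speaker):
    """One pass through the training set: train on exactly one
    (randomly chosen) batch per speaker before moving on."""
    for batches, label in batches_by_speaker.values():
        x = random.choice(batches)      # one 4 s batch, shape (1, 400, 1)
        model.train_on_batch(x, label)  # label shape (1, n_classes)
```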
Results
Confirmation
- Features
  - Short sample lengths
  - Both input types
  - Train on a single batch per speaker per pass through the training set
- Changes
  - Trained on the full training set of 2300+ speakers
Results
Refinement
- Features
  - Short sample lengths
  - Both input types
  - Train on a single batch per speaker per pass through the training set
- Changes
  - True validation
  - Decaying learning rate (see the sketch below)
  - Increased epoch duration
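Keras 1.x supports a decaying learning rate either through an optimizer's `decay` argument or a `LearningRateScheduler` callback; a sketch of the latter, with a made-up decay curve:

```python
from keras.callbacks import LearningRateScheduler

def decayed_lr(epoch):
    """Hypothetical schedule: halve the learning rate every 10 epochs."""
    return 0.001 * (0.5 ** (epoch // 10))

scheduler = LearningRateScheduler(decayed_lr)
# Passed to model.fit(..., callbacks=[scheduler]) during training.
```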
Results
UI
Conclusion
Works Cited
Weinberger, Steven. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu