Speaker Classification through Deep Learning

Jacob Morris, Alex Douglass, Luke Woodbury

Overview
Goals: Learn more about deep learning! Create a neural network that will classify voice recordings based on gender, age, natural language, etc.
Potential Applications: Research, Security

Software Dependencies
Python 2.7, Keras 1.2.2, Theano, Matplotlib

Hardware
GeForce Titan X (Pascal), 12 GB memory

Speech Accent Archive
WAV files: 2300+ different speakers, all recorded speaking the same paragraph
Categorizations: Age, Gender, English Residence, Natural Language, Country, Learning Style, etc.
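The slides do not show how the WAV files were loaded. A minimal sketch (Python 3, standard library only, rather than the project's Python 2.7 code) that reads one recording into a list of amplitudes, assuming 16-bit mono PCM audio; the function name `read_wav_amplitudes` is hypothetical:

```python
import struct
import wave


def read_wav_amplitudes(source):
    """Return (sample_rate, amplitudes) for a 16-bit mono PCM WAV file.

    `source` may be a filename or a binary file-like object.
    Assumption: the recordings are 16-bit mono; anything else is rejected.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    # WAV stores little-endian ('<') signed 16-bit integers ('h').
    amplitudes = list(struct.unpack("<%dh" % n_frames, raw))
    return rate, amplitudes
```

A call such as `rate, amps = read_wav_amplitudes("speaker.wav")` (hypothetical filename) yields the raw amplitude sequence that the later slides split into samples and batches.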

The Essence of Deep Learning

Artificial Neural Networks (ANN)
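As an illustration of the basic building block of an ANN, here is a hypothetical sketch (Python 3, standard library only, not the project's Keras code) of a fully connected layer computing activation(Wx + b), with ReLU for hidden layers and softmax for a classification output whose values sum to 1:

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b).

    `weights` holds one row per output neuron; `biases` one value per neuron.
    """
    pre_activations = [sum(w * x for w, x in zip(row, inputs)) + b
                       for row, b in zip(weights, biases)]
    return activation(pre_activations)


def relu(values):
    """Rectified linear unit, a common hidden-layer activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network with made-up weights: 3 inputs -> 2 hidden units -> 2 classes.
hidden = dense([0.5, -1.0, 2.0], [[0.2, 0.4, 0.1], [-0.3, 0.1, 0.5]], [0.0, 0.1], relu)
probabilities = dense(hidden, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], softmax)
```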

Recurrent Networks
Layer "remembers" data from previous time steps
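To make "remembers" concrete: a recurrent layer feeds its own previous state back in alongside the next input. A minimal one-unit sketch in Python 3 with hypothetical weights, not the network used in the project:

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit (Elman-style) recurrent layer over a sequence.

    At every step the new state depends on the current input and on the
    previous state h, which is how the layer "remembers" earlier inputs.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# Same final input (1.0), different history -> different final state.
a = rnn_forward([0.0, 1.0], w_in=1.0, w_rec=0.5, bias=0.0)
b = rnn_forward([5.0, 1.0], w_in=1.0, w_rec=0.5, bias=0.0)
```

Setting `w_rec=0.0` removes the feedback, and both sequences then end in the same state.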

LSTM
Long Short-Term Memory
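An LSTM cell adds a separate cell state guarded by gates that decide what to write, what to keep, and what to expose. A single-unit sketch (Python 3) of one time step using the standard LSTM gate equations; the `params` layout is a hypothetical choice for readability:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    `params` maps each gate name ("i", "f", "o") and the candidate ("g")
    to a (w_x, w_h, b) triple; this layout is a hypothetical choice.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("i"))    # input gate: how much new information to write
    f = sigmoid(pre_activation("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre_activation("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre_activation("g"))  # candidate value to write
    c = f * c_prev + i * g              # new cell state (long-term memory)
    h = o * math.tanh(c)                # new hidden state (the layer's output)
    return h, c


# Forget gate held open, input gate held shut: the cell state is preserved.
hold = {"i": (0.0, 0.0, -20.0), "f": (0.0, 0.0, 20.0),
        "o": (0.0, 0.0, 0.0), "g": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=3.0, h_prev=0.0, c_prev=0.8, params=hold)
```

With these gate settings the cell state stays at about 0.8 despite the new input, which is how an LSTM can carry information across many time steps.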

Problem Type
Sequence Classification: assign classification label(s) to input sequences
Supervised Learning: each training sample includes the correct output for that sample
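In supervised classification the "correct output" for a sample is commonly encoded as a one-hot vector with one entry per class; Keras's `categorical_crossentropy` loss expects targets in this form. A sketch in Python 3, assuming a hypothetical label list (the slides do not show the encoding the project used):

```python
def one_hot(label, classes):
    """Encode `label` as a vector with 1.0 at its class index and 0.0 elsewhere."""
    vector = [0.0] * len(classes)
    vector[classes.index(label)] = 1.0  # raises ValueError for an unknown label
    return vector


# Hypothetical label set for one output category.
GENDERS = ["female", "male"]
# A training pair: (input features, correct output). The input values are made up.
training_pair = ([0.01, -0.2, 0.15], one_hot("male", GENDERS))
```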

Variations of Model Topologies
Inputs: sequence of amplitudes; discrete Fourier transform of the segment
Hidden Layers: variable
Outputs: any subset of the data categories
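The second input type replaces raw amplitudes with the segment's frequency content. A naive O(N^2) sketch of the discrete Fourier transform in Python 3 (the slides do not say which implementation was used; `numpy.fft.rfft` returns the same N/2 + 1 bins far faster). Only bins up to N/2 are kept because, for real-valued audio, the upper half mirrors the lower half:

```python
import cmath
import math


def dft_magnitudes(segment):
    """Naive discrete Fourier transform of one audio segment.

    Returns |X[k]| for k = 0 .. N/2, where
    X[k] = sum_t x[t] * exp(-2*pi*i*k*t / N).
    """
    n = len(segment)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        magnitudes.append(abs(total))
    return magnitudes


# A cosine completing 2 cycles over 8 points peaks in frequency bin 2.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = dft_magnitudes(tone)
```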

Training Challenges
A process of exploration: many parameters to tune; results are vague and must be interpreted; days are required to train a new model

Terminology
Sample: base unit of training data; 1/100 of a second of audio
Batch: group of samples; 4 seconds of consecutive samples
Epoch: number of batches required to train on the entire training data set; in our case, 2310 batches
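The arithmetic behind these terms: a batch of 4 seconds holds 4 / 0.01 = 400 samples, and the number of amplitude values per sample depends on the recording's sample rate, which the slides do not state. A sketch in Python 3 assuming 44.1 kHz audio (441 values per sample) and assuming trailing partial chunks are dropped; both helper names are hypothetical:

```python
def split_into_samples(amplitudes, sample_rate, sample_seconds=0.01):
    """Chop a recording into fixed-length samples (1/100 s by default)."""
    size = int(round(sample_rate * sample_seconds))
    # Assumption: a trailing partial sample is dropped so all samples match in length.
    return [amplitudes[i:i + size]
            for i in range(0, len(amplitudes) - size + 1, size)]


def split_into_batches(samples, batch_seconds=4.0, sample_seconds=0.01):
    """Group consecutive samples into batches (4 s = 400 samples by default)."""
    per_batch = int(round(batch_seconds / sample_seconds))
    # Assumption: a trailing partial batch is dropped as well.
    return [samples[i:i + per_batch]
            for i in range(0, len(samples) - per_batch + 1, per_batch)]


# Hypothetical: 10 s of silence at an assumed 44.1 kHz sample rate.
rate = 44100
samples = split_into_samples([0] * (10 * rate), rate)
batches = split_into_batches(samples)
```

Ten seconds of audio gives 1000 samples of 441 values each and two complete 400-sample batches.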

Loss
Measure of how close an output signal is to its expected value
Categorical Cross Entropy: emphasizes the correct answer
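Categorical cross entropy for one example is L = -sum_k t_k log(p_k). With a one-hot target t, every term except the correct class vanishes, so the loss depends only on the probability assigned to the correct answer. A sketch in Python 3; the clipping constant `eps` is an assumption that guards against log(0):

```python
import math


def categorical_cross_entropy(target, predicted, eps=1e-7):
    """Loss for one example: -sum_k t_k * log(p_k).

    Probabilities are clipped to [eps, 1 - eps] so log() never sees 0.
    """
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(target, predicted))


target = [0.0, 1.0, 0.0]                                         # correct class is index 1
confident_right = categorical_cross_entropy(target, [0.1, 0.8, 0.1])
confident_wrong = categorical_cross_entropy(target, [0.8, 0.1, 0.1])
```

Here `confident_right` equals -log(0.8), about 0.22, while `confident_wrong` equals -log(0.1), about 2.30.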

Learning Rate
Determines how large an adjustment to make for a given loss value
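A toy illustration in Python 3 of how the learning rate scales each update, using gradient descent on the one-parameter loss (w - 3)^2 rather than the project's network; `minimize_quadratic` is a hypothetical helper:

```python
def minimize_quadratic(learning_rate, steps, w=0.0, target=3.0):
    """Gradient descent on loss(w) = (w - target)**2."""
    for _ in range(steps):
        grad = 2.0 * (w - target)    # derivative of (w - target)**2
        w -= learning_rate * grad    # step size scales with the learning rate
    return w


good = minimize_quadratic(0.1, 100)    # converges close to 3
slow = minimize_quadratic(0.001, 100)  # still far from 3
bad = minimize_quadratic(1.1, 50)      # overshoots more every step
```

Each step multiplies the distance to the minimum by (1 - 2 * learning_rate): 0.8 for a rate of 0.1, 0.998 for 0.001, and -1.2 for 1.1, which diverges.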

Accuracy
An output is considered correct if the expected output neuron's activation value is the greatest among all neurons for that category
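When one network predicts several categories at once, this rule is applied to each category's group of output neurons separately. A sketch in Python 3, assuming a hypothetical output layout of 2 gender neurons followed by 3 age-group neurons:

```python
def is_correct(activations, expected_index):
    """True if the expected neuron has the largest activation (ties go to the lowest index)."""
    predicted = max(range(len(activations)), key=lambda k: activations[k])
    return predicted == expected_index


def per_category_correct(outputs, category_slices, expected):
    """Score each category separately, using only that category's output neurons."""
    return {name: is_correct(outputs[s], expected[name])
            for name, s in category_slices.items()}


def accuracy(results):
    """Fraction of True values in a list of per-example results."""
    return sum(results) / len(results)


# Hypothetical output layout: neurons 0-1 are gender, neurons 2-4 are age groups.
SLICES = {"gender": slice(0, 2), "age": slice(2, 5)}
scores = per_category_correct([0.9, 0.1, 0.2, 0.3, 0.5], SLICES, {"gender": 0, "age": 1})
```

Here gender is scored correct (neuron 0 is highest) and age incorrect (neuron 4 beats the expected neuron 3).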

Initial Attempts
Features: short sample lengths; WAV inputs only; trained on a training set of only 2 speakers

Results

False Hope
Features: short sample lengths; trained on a training set of only 2 speakers
Changes: both input types

Results

Hope
Features: short sample lengths; both input types; trained on a training set of only 2 speakers
Changes: train on a single batch per speaker per pass through the training set; reduced learning rate
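A sketch in Python 3 of one way to draw a single batch per speaker on each pass through the training set. The slides do not say how that batch was chosen, so the random choice, the shuffled speaker order, and the name `one_batch_per_speaker` are all assumptions:

```python
import random


def one_batch_per_speaker(batches_by_speaker, rng=random):
    """Yield (speaker, batch) pairs for one pass through the training set.

    Every speaker appears exactly once per pass, in shuffled order, with one
    of that speaker's batches picked at random (assumed selection rule).
    """
    speakers = list(batches_by_speaker)
    rng.shuffle(speakers)
    for speaker in speakers:
        yield speaker, rng.choice(batches_by_speaker[speaker])


# Hypothetical data: speaker id -> list of batches (stand-in integers here).
data = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
one_pass = list(one_batch_per_speaker(data, random.Random(0)))
```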

Results

Confirmation
Features: short sample lengths; both input types; train on a single batch per speaker per pass through the training set
Changes: trained on the full training set of speakers

Results

Refinement
Features: short sample lengths; both input types; train on a single batch per speaker per pass through the training set
Changes: true validation; decaying learning rate; increased epoch duration
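The slides do not specify the decay schedule. Two common schedules, sketched in Python 3 as hypothetical helpers; either lets early training take large steps and later training make finer adjustments:

```python
def time_based_decay(initial_rate, step, decay):
    """lr_t = lr_0 / (1 + decay * t): the rate shrinks smoothly as training goes on."""
    return initial_rate / (1.0 + decay * step)


def step_decay(initial_rate, epoch, factor=0.5, every=10):
    """Multiply the rate by `factor` once every `every` epochs."""
    return initial_rate * factor ** (epoch // every)
```

For example, with an initial rate of 0.01, `time_based_decay(0.01, 1000, 0.001)` halves the rate to 0.005, and `step_decay(0.01, 25)` gives 0.0025 after two halvings.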

Results

Conclusion

Works Cited
Weinberger, Steven. (2015). Speech Accent Archive. George Mason University. Retrieved from
