Article and Work by: Justin Salamon and Juan Pablo Bello


Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
Article and work by: Justin Salamon and Juan Pablo Bello
Presented by: Dhara Rana

Overall Goal of Paper
Create a way to classify an environmental sound from an audio clip.
Other methods of sound classification: (1) dictionary learning and (2) wavelet filter banks.
Authors' solution: a deep convolutional neural network with data augmentation.
Pipeline: Input: sound clip -> data augmentation and segmentation -> log-mel spectrogram -> trained convolutional neural network -> Output: class label (e.g. dog bark)

Data: UrbanSound8K
Size: 8732 labeled sound clips
Duration: up to about 4 seconds per clip
10 classes: 0 = air_conditioner, 1 = car_horn, 2 = children_playing, 3 = dog_bark, 4 = drilling, 5 = engine_idling, 6 = gun_shot, 7 = jackhammer, 8 = siren, 9 = street_music
All excerpts are taken from field recordings uploaded to www.freesound.org. The files are pre-sorted into ten folds (folders named fold1-fold10) to help reproduce, and compare against, the automatic classification results reported for this dataset.
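The pre-sorted folds lend themselves to fold-wise cross-validation. The sketch below lists the class labels from the slide and enumerates train/test splits. It assumes a leave-one-fold-out protocol (train on nine folds, test on the remaining one), which the slide does not spell out; the function name `fold_splits` is hypothetical.

```python
# Class IDs and names as listed on the slide.
CLASS_NAMES = {
    0: "air_conditioner", 1: "car_horn", 2: "children_playing",
    3: "dog_bark", 4: "drilling", 5: "engine_idling",
    6: "gun_shot", 7: "jackhammer", 8: "siren", 9: "street_music",
}


def fold_splits(n_folds=10):
    """Yield (train_folds, test_fold) pairs, holding out one fold at a time.

    Assumption: leave-one-fold-out evaluation over the predefined folders.
    """
    folds = [f"fold{i}" for i in range(1, n_folds + 1)]
    for test_fold in folds:
        train_folds = [f for f in folds if f != test_fold]
        yield train_folds, test_fold


for train_folds, test_fold in fold_splits():
    print(test_fold, "held out; training on", len(train_folds), "folds")
```

Keeping the predefined folds, rather than reshuffling clips, is what makes results comparable across studies.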

Data Augmentation
Applying one or more deformations to a collection of annotated training samples, which results in new, additional training data.
Types of audio data augmentation:
(1) Time stretching: slow down or speed up the audio signal while keeping the pitch unchanged.
(2) Pitch shifting: raise or lower the pitch of the audio sample.
(3) Dynamic range compression: compress the dynamic range of the sample using 4 parameterizations.
(4) Background noise: mix the sample with another recording containing background sounds from different types of acoustic scenes.
Cat image from: https://towardsdatascience.com/image-augmentation-for-deep-learning-histogram-equalization-a71387f609b2
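Of the four deformations, background-noise mixing is the simplest to illustrate without an audio library. The sketch below assumes a weighted sum of the form (1 - w) * sample + w * background; the mixing formula, the function name `mix_with_background` and the choice to loop a short background recording are assumptions, not details given on the slide.

```python
def mix_with_background(sample, background, weight):
    """Mix a clip with a background recording using a weighted sum.

    Assumption: out[i] = (1 - weight) * sample[i] + weight * background[i],
    with the background looped if it is shorter than the sample.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if not background:
        raise ValueError("background must not be empty")
    looped = [background[i % len(background)] for i in range(len(sample))]
    return [(1.0 - weight) * s + weight * b for s, b in zip(sample, looped)]


# Toy waveforms (not real audio): a constant signal mixed with silence.
mixed = mix_with_background([1.0, 1.0, 1.0], [0.0], weight=0.25)
print(mixed)
```

The label of the augmented clip stays the same as that of the original sample, which is what makes the result usable as additional training data.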

Data processing: Spectrogram
Short-time Fourier transform (STFT)
Sampling frequency: 44100 samples/sec
Window size: 1024 samples (~23 ms)
Hop size: 1024 samples
Frames: 128
Frequency components: 128
Images from: https://cycling74.com/tutorials/the-phase-vocoder-%E2%80%93-part-I https://utkarsh15.files.wordpress.com/2015/03/stft.png
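The framing numbers on this slide can be checked with a little arithmetic. The sketch below assumes a hop equal to the window length (1024 samples), which is an interpretation of the slide rather than something it states clearly; the helper names are hypothetical.

```python
SAMPLE_RATE = 44100   # samples per second (from the slide)
WINDOW = 1024         # samples per analysis window (from the slide)
HOP = 1024            # assumption: hop equal to the window length
N_FRAMES = 128        # frames per spectrogram patch (from the slide)


def window_ms(window, sample_rate):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * window / sample_rate


def frame_count(n_samples, window, hop):
    """Number of full windows that fit into a signal of n_samples."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


def patch_seconds(frames, window, hop, sample_rate):
    """Duration of audio covered by a patch of `frames` consecutive frames."""
    return ((frames - 1) * hop + window) / sample_rate


print(round(window_ms(WINDOW, SAMPLE_RATE), 2))                   # window length in ms
print(frame_count(4 * SAMPLE_RATE, WINDOW, HOP))                  # frames in a 4 s clip
print(round(patch_seconds(N_FRAMES, WINDOW, HOP, SAMPLE_RATE), 2))  # patch length in s
```

Under these assumptions, one 128-frame patch covers roughly 3 seconds, so a clip of about 4 seconds yields more frames than a single patch needs.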

Spectrogram: Another Representation

Proposed Deep CNN (aka SB-CNN)
Layer 1: Convolutional layer: 24 filters with receptive field (5,5); max pooling (4,2); rectified linear unit (ReLU) activation: h(x) = max(x, 0)
Layer 2: Convolutional layer: 48 filters with receptive field (5,5)
Layer 3: Convolutional layer: 48 filters with receptive field (5,5); rectified linear unit (ReLU) activation: h(x) = max(x, 0)
Layer 4: Fully connected layer: 64 hidden units
Layer 5: Fully connected layer: 10 output units; softmax activation (outputs between 0 and 1)
The softmax function squashes the output of each unit to be between 0 and 1, just like a sigmoid function. It also normalizes the outputs so that they sum to 1.
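The two activation functions named on this slide are easy to write out directly. This is a minimal plain-Python sketch, not the presenters' implementation; subtracting the maximum score before exponentiating is a standard numerical-stability choice added here.

```python
import math


def relu(x):
    """Rectified linear unit: h(x) = max(x, 0)."""
    return max(x, 0.0)


def softmax(scores):
    """Map raw scores to values in (0, 1) that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three classes (illustrative values only).
probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
print(relu(-2.0), relu(1.5))
```

With 10 output units, the index of the largest softmax value is the predicted class.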

Tuning the Deep CNN
The CNN is implemented in Python using Lasagne.
Constant learning rate of 0.01
Dropout is applied to the input of the last 2 layers with probability 0.5
L2 regularization is applied to the weights of the last 2 layers with a penalty factor of 0.001
The model is trained for 50 epochs
Image from: https://simplelivingover50.com/2015/04/18/fine-tuning-and-making-adjustments-to-my-diet-and-workout-routine/
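To make the two regularizers concrete, here is a minimal plain-Python sketch of dropout and an L2 weight penalty using the values from the slide (probability 0.5, penalty factor 0.001). It does not use Lasagne; the function names are hypothetical, and the rescaling of kept inputs ("inverted" dropout) is an assumption about the variant used.

```python
import random


def dropout(inputs, p=0.5, rng=None):
    """Zero each input with probability p during training.

    Assumption: kept values are rescaled by 1 / (1 - p) so the expected
    activation is unchanged ("inverted" dropout). Requires 0 <= p < 1.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]


def l2_penalty(weights, factor=0.001):
    """L2 regularization term added to the training loss."""
    return factor * sum(w * w for w in weights)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng))
print(l2_penalty([1.0, 2.0]))
```

Dropout is only active during training; at prediction time the inputs are passed through unchanged.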

Why Deep Convolutional Neural Networks?
(1) The small receptive fields of the convolutional kernels (filters) allow better learning and identification of different sound classes.
(2) CNNs are capable of capturing energy modulation patterns across time and frequency when applied to spectrogram-like inputs.

Results: CNN With and Without Data Augmentation
SB-CNN performs comparably to SKM and PiczakCNN when trained on the original dataset.
Mean accuracy: SKM 0.74, PiczakCNN 0.73, SB-CNN 0.73
With data augmentation, SB-CNN significantly outperforms SKM (p = 0.0003): SB-CNN mean accuracy 0.79
Note: without augmentation, the CNN model cannot outperform the SKM approach because the original dataset is not large or varied enough.
The p-value is measured using a two-sided t-test.
Increasing the capacity of the SKM model (by increasing the dictionary size from k = 2000 to k = 4000) did not yield any further improvement in classification accuracy.

Results: Confusion Matrix
The matrix shows the change in classification confusion with augmentation.
Off the diagonal: negative values (red) = confusion reduced with augmentation; positive values (blue) = confusion increased with augmentation.
Along the diagonal: positive values (blue) = classification improved with augmentation; overall classification improved for all classes.
Augmentation can have a detrimental effect on the confusion between specific pairs of classes, e.g. idling engine and air conditioner.
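The reading rules above can be sketched as a difference of two confusion matrices. The 3-class counts below are toy values invented for illustration, not results from the paper, and the function names are hypothetical.

```python
def confusion_delta(before, after):
    """Element-wise difference: after augmentation minus before."""
    return [[a - b for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def summarize(delta):
    """Return classes that improved and class pairs whose confusion grew."""
    n = len(delta)
    improved = [i for i in range(n) if delta[i][i] > 0]
    worse_pairs = [(i, j) for i in range(n) for j in range(n)
                   if i != j and delta[i][j] > 0]
    return improved, worse_pairs


# Toy counts (rows = true class, columns = predicted class).
before = [[8, 2, 0], [3, 6, 1], [1, 1, 8]]
after = [[9, 1, 0], [1, 8, 1], [0, 2, 8]]

delta = confusion_delta(before, after)
print(delta)
print(summarize(delta))
```

In this toy case, classes 0 and 1 improve on the diagonal while true class 2 is confused with class 1 more often, which mirrors the kind of pairwise degradation the slide describes.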

Results: Audio Data Augmentation Accuracy
Most classes are affected positively by most augmentation types, but there are exceptions.
The air conditioner class is negatively affected by dynamic range compression and background noise.
Pitch augmentation: greatest positive impact on performance; the only augmentation that did not have a negative impact on any of the classes.
Half of the classes benefit more from applying all augmentations than from applying a subset of them.

Future Work and Applications
Future work: use a validation set to identify which augmentations improve the model's classification accuracy for each class, then selectively augment the training data accordingly.
Applications: heart sound classification (different heart conditions, such as valve defects, result in murmurs) and snoring sound classification.
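The proposed class-conditional augmentation could be sketched as a simple selection rule: for each class, keep only the augmentations that raise validation accuracy over the un-augmented baseline. The function name `select_augmentations` and all accuracy values below are hypothetical and serve only to show the mechanics.

```python
def select_augmentations(baseline, with_aug):
    """Pick, per class, the augmentations that beat the baseline accuracy.

    baseline: {class_name: validation accuracy without augmentation}
    with_aug: {augmentation_name: {class_name: validation accuracy}}
    """
    selected = {}
    for cls, base_acc in baseline.items():
        selected[cls] = sorted(
            aug for aug, accs in with_aug.items()
            if accs.get(cls, base_acc) > base_acc
        )
    return selected


# Hypothetical validation accuracies, for illustration only.
baseline = {"air_conditioner": 0.50, "dog_bark": 0.80}
with_aug = {
    "pitch_shift": {"air_conditioner": 0.55, "dog_bark": 0.85},
    "background_noise": {"air_conditioner": 0.45, "dog_bark": 0.82},
}

print(select_augmentations(baseline, with_aug))
```

The selection must be made on a validation set rather than the test folds, otherwise the reported accuracy would be optimistically biased.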

References
Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3), 279-283.
Schnupp, J., Nelken, I., & King, A. (2011). Auditory neuroscience: Making sense of sound. MIT Press.
Data augmentation: https://www.kaggle.com/CVxTz/audio-data-augmentation
Data augmentation: https://github.com/drscotthawley/audio-classifier-keras-cnn/blob/master/augment_data.py
Dokur, Z., & Ölmez, T. (2008). Heart sound classification using wavelet transform and incremental self-organizing map. Digital Signal Processing, 18(6), 951-959.
Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Freitag, M., Pugachevskiy, S., ... & Schuller, B. (2017, August). Snore sound classification using image-based deep spectrum features. In Proc. of INTERSPEECH (Vol. 17, pp. 2017-434).

Image References
https://towardsdatascience.com/image-augmentation-for-deep-learning-histogram-equalization-a71387f609b2
https://cycling74.com/tutorials/the-phase-vocoder-%E2%80%93-part-I
https://utkarsh15.files.wordpress.com/2015/03/stft.png
https://simplelivingover50.com/2015/04/18/fine-tuning-and-making-adjustments-to-my-diet-and-workout-routine/
https://en.wikipedia.org/wiki/Heart_sounds