Article and Work by: Justin Salamon and Juan Pablo Bello


Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
Article and work by: Justin Salamon and Juan Pablo Bello
Presented by: Dhara Rana

Overall Goal of Paper
Create a way to classify an environmental sound given an audio clip.
Other methods of sound classification: (1) dictionary learning and (2) wavelet filter banks.
Authors' solution: a deep convolutional neural network with data augmentation.
Pipeline: Input: sound clip -> data augmentation and segmentation -> log-mel spectrogram -> trained convolutional neural network -> Output: class label (e.g. dog bark)

Data: UrbanSound8K
Size: 8732 labeled sound clips
Duration: up to about 4 seconds per clip
10 classes:
0 = air_conditioner
1 = car_horn
2 = children_playing
3 = dog_bark
4 = drilling
5 = engine_idling
6 = gun_shot
7 = jackhammer
8 = siren
9 = street_music
All excerpts are taken from field recordings uploaded to www.freesound.org.
The files are pre-sorted into ten folds (folders named fold1-fold10) to help in reproducing, and comparing with, the automatic classification results reported in the article above.
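The ten predefined folds lend themselves to leave-one-fold-out evaluation. The sketch below only enumerates the splits; the helper name fold_splits is hypothetical, and the slide does not say how the folds were actually consumed, so treat this as one plausible reading rather than the authors' procedure.

```python
# Class ids and names as listed on the slide.
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark",
    "drilling", "engine_idling", "gun_shot", "jackhammer",
    "siren", "street_music",
]


def fold_splits(n_folds=10):
    """Yield (train_folds, test_fold) pairs, holding out one fold at a time.

    Folder names follow the fold1..fold10 convention mentioned on the slide.
    """
    folds = [f"fold{i}" for i in range(1, n_folds + 1)]
    for test_fold in folds:
        train_folds = [f for f in folds if f != test_fold]
        yield train_folds, test_fold


splits = list(fold_splits())
for train_folds, test_fold in splits[:2]:
    print(test_fold, "held out; training on", len(train_folds), "folds")
```

Keeping the predefined folds, rather than reshuffling the clips, is what makes results comparable across papers that use the same dataset.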

Data Augmentation
The application of one or more deformations to a collection of annotated training samples, resulting in new, additional training data.
Types of audio data augmentation:
(1) Time stretching: slow down or speed up the audio signal while keeping the pitch unchanged.
(2) Pitch shifting: raise or lower the pitch of the audio sample.
(3) Dynamic range compression: compress the dynamic range of the sample using 4 parameterizations.
(4) Background noise: mix the sample with another recording containing background sounds from different types of acoustic scenes.
Cat image from: https://towardsdatascience.com/image-augmentation-for-deep-learning-histogram-equalization-a71387f609b2
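Of the four deformations, background-noise mixing is the simplest to sketch without an audio library. The example below assumes a weighted sum of the clip and the background recording; the slide does not give a formula, so both the weighted-sum form and the range from which the weight w is drawn are assumptions, and mix_background is a hypothetical helper.

```python
import random


def mix_background(signal, background, w):
    """Return (1 - w) * signal + w * background, sample by sample.

    Both inputs are plain lists of float samples; the longer one is
    truncated to the length of the shorter one.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    n = min(len(signal), len(background))
    return [(1.0 - w) * signal[i] + w * background[i] for i in range(n)]


rng = random.Random(0)
w = rng.uniform(0.1, 0.5)          # assumed range for the mixing weight
clip = [0.0, 0.5, -0.5, 1.0]       # toy foreground samples
noise = [0.2, 0.2, 0.2, 0.2]       # toy background samples
mixed = mix_background(clip, noise, w)
print(len(mixed), "mixed samples, w =", round(w, 3))
```

The label of the mixed clip stays that of the original sample, which is what makes the deformation usable as augmentation.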

Data processing: Spectrogram
Short-time Fourier transform
Sampling frequency: 44100 samples/sec
Window size: 1024 samples (~23 ms)
Hop size: 1024 samples
Frames: 128
Frequency components: 128
Images from: https://cycling74.com/tutorials/the-phase-vocoder-%E2%80%93-part-I and https://utkarsh15.files.wordpress.com/2015/03/stft.png
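The numbers on this slide can be checked with a little arithmetic. The sketch below assumes a hop equal to the window length (1024 samples) and no padding at the clip edges; both are assumptions made for illustration, and num_frames is a hypothetical helper.

```python
SAMPLE_RATE = 44100   # samples per second
WINDOW = 1024         # samples per analysis window
HOP = 1024            # samples between window starts (assumed equal to WINDOW)
N_FRAMES = 128        # frames per input patch

# Duration of one analysis window in milliseconds.
window_ms = 1000.0 * WINDOW / SAMPLE_RATE


def num_frames(n_samples, window=WINDOW, hop=HOP):
    """Number of full windows that fit in n_samples without padding."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


# Samples and seconds covered by a 128-frame patch.
patch_samples = WINDOW + (N_FRAMES - 1) * HOP
patch_seconds = patch_samples / SAMPLE_RATE

print(f"window: {window_ms:.1f} ms")
print(f"128 frames cover {patch_seconds:.2f} s")
print("frames in a 4 s clip:", num_frames(4 * SAMPLE_RATE))
```

Under these assumptions a 128-frame patch spans just under 3 seconds, so a full-length clip yields more frames than one patch needs.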

Spectrogram: Another Representation

Proposed Deep CNN (aka SB-CNN)
Layer 1: Convolutional layer with 24 filters, receptive field (5,5); max pooling (4,2); rectified linear unit (ReLU) activation, h(x) = max(x, 0)
Layer 2: Convolutional layer with 48 filters, receptive field (5,5); max pooling (4,2); ReLU activation
Layer 3: Convolutional layer with 48 filters, receptive field (5,5); ReLU activation, h(x) = max(x, 0)
Layer 4: Fully connected layer with 64 hidden units
Layer 5: Fully connected layer with 10 output units; softmax activation (outputs in 0-1)
The softmax function squashes the output of each unit to be between 0 and 1, just like a sigmoid function. But it also divides each output such that the total sum of the outputs is equal to 1 (check it on the figure above).
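The two activation functions named on this slide are short enough to write out. This is a plain-Python sketch for illustration, not the Lasagne code used for the model; subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math


def relu(x):
    """Rectified linear unit: h(x) = max(x, 0)."""
    return max(x, 0.0)


def softmax(logits):
    """Map a list of real scores to positive values that sum to 1."""
    m = max(logits)                            # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))
print(relu(-3.0), relu(2.5))
```

With 10 output units, the softmax output can be read as one probability per sound class, and the predicted class is the index of the largest value.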

Tuning the Deep CNN
The CNN is implemented in Python using Lasagne.
Constant learning rate of 0.01.
Dropout is applied to the input of the last 2 layers with probability 0.5.
L2 regularization is applied to the weights of the last 2 layers with a penalty factor of 0.001.
The model is trained for 50 epochs.
Image from: https://simplelivingover50.com/2015/04/18/fine-tuning-and-making-adjustments-to-my-diet-and-workout-routine/
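To make the three hyperparameters concrete, here is a minimal plain-Python sketch of a dropout mask and of one gradient step with an L2 penalty. It is not Lasagne code. Two conventions are assumed and may differ from the actual implementation: dropout rescales the kept values by 1 / (1 - p), and the penalty is l2 * sum(w^2), whose gradient is 2 * l2 * w. The helper names are hypothetical.

```python
import random


def dropout(values, p, rng):
    """Zero each value with probability p; rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


def sgd_step(weights, grads, lr=0.01, l2=0.001):
    """One gradient step on loss + l2 * sum(w ** 2) with a constant rate."""
    return [w - lr * (g + 2.0 * l2 * w) for w, g in zip(weights, grads)]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, p=0.5, rng=rng)   # training-time only
updated = sgd_step([1.0, -2.0], grads=[0.0, 0.5])
print(dropped)
print(updated)
```

At test time dropout is switched off, which is why the rescaling during training is convenient: the expected magnitude of each input stays the same in both modes.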

Why Deep Convolutional Neural Networks?
(1) Small receptive fields of the convolutional kernels (filters) allow better learning and identification of different sound classes.
(2) CNNs are capable of capturing energy modulation patterns across time and frequency when applied to spectrogram-like inputs.

Results: CNN With and Without Data Augmentation
SB-CNN performs comparably to SKM and PiczakCNN when trained on the original dataset.
Mean accuracy without augmentation: SKM 0.74, PiczakCNN 0.73, SB-CNN 0.73
With data augmentation, SB-CNN (mean accuracy 0.79) significantly outperforms SKM (p = 0.0003). The p-value is measured using a two-sided t-test.
NOTE: Without augmentation, the CNN model cannot outperform the SKM approach because the original dataset is not large or varied enough.
Increasing the capacity of the SKM model (by increasing the dictionary size from k=2000 to k=4000) did NOT yield any further improvement in classification accuracy.
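For readers unfamiliar with the significance test, the statistic behind a paired t-test over per-fold accuracies can be sketched as follows. The slide only says "two-sided t-test", so the paired form is an assumption here, and the accuracy values below are made-up toy numbers, not results from the paper. Turning the statistic into a p-value additionally needs the t distribution with n - 1 degrees of freedom, which is left out.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """t statistic for the mean of the per-fold differences acc_a - acc_b."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical per-fold accuracies, for illustration only.
with_aug = [0.80, 0.78, 0.82]
without_aug = [0.75, 0.74, 0.76]
t_stat = paired_t_statistic(with_aug, without_aug)
print(f"t = {t_stat:.2f} with {len(with_aug) - 1} degrees of freedom")
```

Pairing by fold compares the two systems on the same test data each time, so fold-to-fold difficulty does not inflate the variance.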

Results: Confusion Matrix
The values show the change in the confusion matrix when augmentation is applied.
Off the diagonal: negative values (red) = confusion reduced with augmentation; positive values (blue) = confusion increased with augmentation.
Along the diagonal: positive values (blue) = overall classification improved for all classes with augmentation.
Augmentation can have a detrimental effect on the confusion between specific pairs of classes, e.g. engine idling and air conditioner.
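A difference matrix of this kind can be computed by subtracting the confusion matrix obtained without augmentation from the one obtained with it. The sketch below uses three toy classes and made-up predictions purely to show the sign convention; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def matrix_delta(cm_aug, cm_base):
    """Element-wise difference: with augmentation minus without."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(cm_aug, cm_base)]


# Toy labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
pred_base = [0, 1, 1, 0, 2, 2]
pred_aug = [0, 0, 1, 0, 2, 1]

diff = matrix_delta(confusion_matrix(y_true, pred_aug, 3),
                    confusion_matrix(y_true, pred_base, 3))
for row in diff:
    print(row)
```

In the toy output, class 0 gains a correct prediction (positive diagonal entry) while class 2 is newly confused with class 1 (positive off-diagonal entry), mirroring the two effects described on the slide.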

Results: Audio Data Augmentation Accuracy
Most classes are affected positively by most augmentation types, but there are exceptions.
The air conditioner class is negatively affected by dynamic range compression and background noise.
Pitch augmentation has the greatest positive impact on performance and is the only augmentation that did not have a negative impact on any of the classes.
Half of the classes benefit more from applying all augmentations than from a subset of augmentations.

Future Work and Applications
Use a validation set to identify which augmentations improve the model's classification accuracy for each class, then selectively augment the training data accordingly.
Applications:
Heart sound classification: different heart conditions, such as valve defects, result in murmurs.
Snoring sound classification
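The class-conditional augmentation idea can be sketched as a simple selection rule: keep, for each class, only the augmentations whose validation accuracy beats the unaugmented baseline. The function name, the data layout, and the accuracy values below are all hypothetical; the slide describes the idea but not an implementation.

```python
def select_augmentations(baseline_acc, augmented_acc):
    """Pick, per class, the augmentations that beat the baseline.

    baseline_acc:  {class_name: validation accuracy without augmentation}
    augmented_acc: {augmentation_name: {class_name: validation accuracy}}
    """
    selected = {}
    for cls, base in baseline_acc.items():
        selected[cls] = sorted(
            aug for aug, accs in augmented_acc.items()
            if accs.get(cls, base) > base
        )
    return selected


# Made-up validation accuracies, for illustration only.
baseline = {"air_conditioner": 0.50, "dog_bark": 0.80}
augmented = {
    "pitch_shift": {"air_conditioner": 0.55, "dog_bark": 0.85},
    "background_noise": {"air_conditioner": 0.45, "dog_bark": 0.82},
}
plan = select_augmentations(baseline, augmented)
print(plan)
```

The training set would then be augmented class by class according to the resulting plan, instead of applying every deformation to every clip.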

References
Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3), 279-283.
Schnupp, J., Nelken, I., & King, A. (2011). Auditory neuroscience: Making sense of sound. MIT Press.
Data augmentation: https://www.kaggle.com/CVxTz/audio-data-augmentation
Data augmentation: https://github.com/drscotthawley/audio-classifier-keras-cnn/blob/master/augment_data.py
Dokur, Z., & Ölmez, T. (2008). Heart sound classification using wavelet transform and incremental self-organizing map. Digital Signal Processing, 18(6), 951-959.
Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Freitag, M., Pugachevskiy, S., ... & Schuller, B. (2017, August). Snore sound classification using image-based deep spectrum features. In Proc. of INTERSPEECH (Vol. 17, pp. 2017-434).

Image References
https://towardsdatascience.com/image-augmentation-for-deep-learning-histogram-equalization-a71387f609b2
https://cycling74.com/tutorials/the-phase-vocoder-%E2%80%93-part-I
https://utkarsh15.files.wordpress.com/2015/03/stft.png
https://simplelivingover50.com/2015/04/18/fine-tuning-and-making-adjustments-to-my-diet-and-workout-routine/
https://en.wikipedia.org/wiki/Heart_sounds