Tenacious Deep Learning

Matthew Farkas, Sai Haran, Michael Kaesler, Aditya Venkataraman | MSiA 490-30 Deep Learning | Spring 2017 | Northwestern University

Problem Statement
With the rise in popularity of music streaming services, making users' lives easier via machine learning has grown in importance. Recommendation has been at the forefront of most companies' efforts, but automated metadata assignment is just as important for users and engineers alike. Genre identification via deep learning has been done successfully before, but rather than improve on another classifier's accuracy, we set out to understand how a machine identifies a genre by opening up the model.

Technical Approach
- Converted each 30-second audio clip into five Mel-spectrogram images in order to recast the task as an image classification problem (a preprocessing sketch appears after the Conclusion).
- Ran a convolutional neural network (CNN) based on the VGG framework, with several adjustments: 3 VGG-style convolutional blocks and only 1 fully connected layer with 128 nodes (a model sketch also appears after the Conclusion).
- Regularized with 0.4 dropout, 0.1 L1 regularization, and batch normalization.
- Used a learning rate of 0.00001.
- Rather than the standard 224x224x3 VGG input, we chopped each spectrogram into five pieces to push the model to learn features over short spans of time; the resulting spectrogram clips were 200x90x3.
- Used a 3x3 kernel convolving in two dimensions.

Results
- Our model achieved up to 72% test accuracy after only 1500 epochs of training.
- We believe the model made accurate decisions because it focused on certain defining frequencies of each class, such as heavy bass for hip-hop and electronic.
- Baseline human accuracy was around 65% on a small sample, and the model was able to surpass that. This shows that a machine can pick up the features which define a genre, even from short 30-second samples.
- After training, we ran spectrogram clips through the model to produce classifications, then stitched the clips and their heatmaps back together to get a holistic view of how the model classifies a whole spectrogram.
- These heatmaps visualize which parts of the image the model is looking at during classification (one way to produce comparable heatmaps is sketched after the Conclusion). When looking at low frequencies and ignoring middle frequencies, the model is 50.6% sure the clip is electronic; when looking at all frequency ranges, the model is 23.7% sure the clip is hip-hop.
[Figure: classification heatmaps for a hip-hop clip and an electronic clip]

Dataset
- 60,000 30-second clips of music from hip-hop, country, electronic, and metal artists, obtained from Spotify's developer API.
- Clips were subsequently converted into Mel-spectrograms, and each image was divided into 5 pieces.
- Unrepresentative samples of a genre, such as remixes and commentary, were removed with a cleaning script.
- Genre classification can sometimes be fuzzy even for a human, so picking representative artists and samples was of the utmost importance.
- The dataset was initially too large, so we cut it down to ensure homogeneous classes during training.
- Spectrograms were shifted horizontally by up to 50% to improve the generalization capability of the model.

Conclusion
- Genre classification can be performed by focusing mainly on frequency ranges.
- The model can change its genre prediction depending on which part of the song it is looking at; this is why separating a song into short clips can help create a more robust classifier.
- In the future, we can augment the model with a voting system for each song: the class with the most "votes" across a song's clips becomes the final classification (a minimal sketch follows below).
- We would also like to experiment with using raw audio rather than spectrograms. This approach has been shown to work in Google DeepMind's WaveNet paper, which uses raw audio as the input to a CNN.
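The poster does not give exact preprocessing parameters, so the following is only a minimal sketch of the clip-to-spectrogram pipeline described in the Technical Approach and Dataset sections, written in Python with librosa. The sample rate, hop length, number of mel bands, dB scaling, and the way a single-channel spectrogram is replicated into a 3-channel 200x90 image are all assumptions, not details taken from the poster.

# Hypothetical sketch: one 30-second clip -> five 200x90x3 spectrogram pieces.
import numpy as np
import librosa

def clip_to_spectrogram_pieces(path, sr=22050, n_mels=90, hop_length=661,
                               n_pieces=5, piece_frames=200):
    y, sr = librosa.load(path, sr=sr, duration=30.0)            # load exactly 30 seconds of audio
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)                # log scale for a more image-like input
    mel_db = mel_db[:, :n_pieces * piece_frames]                 # keep exactly 5 x 200 time frames
    pieces = []
    for i in range(n_pieces):
        piece = mel_db[:, i * piece_frames:(i + 1) * piece_frames]     # shape (90, 200)
        piece = np.repeat(piece.T[:, :, np.newaxis], 3, axis=2)        # shape (200, 90, 3), 3 channels for VGG-style input
        pieces.append(piece)
    return np.stack(pieces)                                       # shape (5, 200, 90, 3)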
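Below is a minimal Keras sketch of the modified VGG-style classifier described in the Technical Approach. The 3x3 kernels, three convolutional blocks, single 128-node dense layer, 0.4 dropout, 0.1 L1 regularization, batch normalization, input size, and 1e-5 learning rate come from the poster; the filter counts, the placement of batch normalization, the optimizer, and the loss are assumptions.

# Sketch of the reduced VGG-style genre classifier; hyperparameters not stated on the poster are assumed.
from tensorflow.keras import layers, models, optimizers, regularizers

def build_genre_cnn(input_shape=(200, 90, 3), n_classes=4):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128):                                # three VGG-style conv blocks (filter counts assumed)
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu",
                           kernel_regularizer=regularizers.l1(0.1)))   # single 128-node FC layer with L1 regularization
    model.add(layers.Dropout(0.4))
    model.add(layers.Dense(n_classes, activation="softmax"))            # hip-hop, country, electronic, metal
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model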
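The poster does not say how its heatmaps were generated. One simple way to get a comparable visualization is occlusion: mask small patches of a spectrogram clip and record how much the predicted probability for a class drops. The function below is purely illustrative; the patch size and masking value are arbitrary choices, and it assumes a trained Keras model like the one sketched above.

# Occlusion-style heatmap (assumed technique, not necessarily the one used on the poster).
import numpy as np

def occlusion_heatmap(model, clip, class_idx, patch=(10, 10)):
    base = model.predict(clip[np.newaxis], verbose=0)[0, class_idx]     # probability on the unmasked clip
    heat = np.zeros((clip.shape[0] // patch[0], clip.shape[1] // patch[1]))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            masked = clip.copy()
            masked[i * patch[0]:(i + 1) * patch[0],
                   j * patch[1]:(j + 1) * patch[1], :] = 0.0            # zero out one time-frequency patch
            prob = model.predict(masked[np.newaxis], verbose=0)[0, class_idx]
            heat[i, j] = base - prob                                     # large drop = region the model relied on
    return heat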
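The Conclusion mentions a per-song voting system as future work. Here is a minimal sketch of that idea, assuming a trained model and the five spectrogram pieces of one song: classify each piece separately and let the majority class win.

# Hypothetical per-song majority vote over the five clip-level predictions.
import numpy as np

def classify_song(model, song_pieces, class_names=("hip-hop", "country", "electronic", "metal")):
    probs = model.predict(song_pieces, verbose=0)                        # (5, 4) class probabilities
    votes = probs.argmax(axis=1)                                         # one vote per clip
    winner = np.bincount(votes, minlength=len(class_names)).argmax()     # class with the most votes
    return class_names[winner]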
References and Related Work
http://benanne.github.io/2014/08/05/spotify-cnns.html
https://chatbotslife.com/finding-the-genre-of-a-song-with-deep-learning-da8f59a61194
https://deepmind.com/blog/wavenet-generative-model-raw-audio/