Tenacious Deep Learning
Matthew Farkas, Sai Haran, Michael Kaesler, Aditya Venkataraman | MSiA 490-30 Deep Learning | Spring 2017 | Northwestern University

Problem Statement
With the rise in popularity of music streaming services, making users' lives easier via machine learning has grown in importance. Recommendation has been at the forefront of most companies' efforts, but automated metadata assignment is just as important for users and engineers alike. Genre identification via deep learning has been done successfully before, so rather than try to improve another classifier's accuracy, we set out to understand how a machine identifies a genre by opening up the model.

Technical Approach
- Converted each 30-second audio clip into a Mel-spectrogram and split it into five pieces, recasting the problem as image classification (see the spectrogram sketch below)
- Ran a convolutional neural network (CNN) based on the VGG framework, with several adjustments: three VGG layers and only one fully connected layer with 128 nodes (see the model sketch below)
- Regularized with 0.4 dropout, 0.1 L1 regularization, and batch normalization
- Used a learning rate of 0.00001
- Rather than the standard 224x224x3 VGG input, we chopped each spectrogram into five pieces to push the model to learn features over short spans of time; the resulting spectrogram clips were 200x90x3 in size
- Used a 3x3 kernel convolving in two dimensions

Results
- Our model achieved up to 72% test accuracy after only 1500 epochs of training
- We believe the model made accurate decisions because it focused on defining frequencies of each class, such as the heavy bass in hip-hop and electronic
- Baseline human accuracy was around 65% on a small sample, and the model was able to surpass it. This shows that a machine can pick up the features that define a genre, even from short 30-second samples
- After training, we ran spectrogram clips through the model to produce classifications, then stitched the clips and their heatmaps back together to get a holistic view of how the model classifies the spectrograms
- These heatmaps visualize which parts of the image the model is looking at during classification (see the heatmap sketch below)
- When looking at low frequencies and ignoring middle frequencies, the model is 50.6% sure the clip is electronic
- When looking at all frequency ranges, the model is 23.7% sure the clip is hip-hop

(Figure: stitched spectrogram clips with classification heatmaps for hip-hop and electronic examples)

Dataset
- 60,000 30-second clips of music from hip-hop, country, electronic, and metal artists, obtained from Spotify's developer API
- Each clip was converted into a Mel-spectrogram and divided into five pieces
- Unrepresentative samples, such as remixes and commentary tracks, were removed with a cleaning script
- Genre classification can be fuzzy even for a human, so picking representative artists and samples was of the utmost importance
- The dataset was initially too large, so we cut it down to keep the classes homogeneous in training
- Shifted spectrograms horizontally by up to 50% to improve the model's ability to generalize (see the augmentation sketch below)

Conclusion
- Genre classification can be performed by focusing mainly on frequency ranges
- The model can change its classification based on which part of the song it is looking at; this is why separating each song into short clips can help create a more robust classifier
- In the future, we could augment the model with a voting system for each song: the class with the most "votes" across a song's clips becomes the final classification (see the voting sketch below)
- We would also like to experiment with using raw audio rather than spectrograms; this approach has been shown to work in Google DeepMind's WaveNet paper, which uses raw audio as the input to a CNN
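The spectrogram step described under Technical Approach could look roughly like the sketch below, assuming librosa is used for audio processing. The sample rate, Mel-band count, and hop length are illustrative choices not stated on the poster, picked so that a 30-second clip becomes five pieces of roughly 90 Mel bands by 200 time frames.

```python
# Minimal sketch of the audio-to-spectrogram step (librosa and all parameters
# below are assumptions; the poster does not name the library or settings).
import numpy as np
import librosa

def clip_to_pieces(path, n_pieces=5, sr=22050, n_mels=90, hop_length=660):
    """Load a 30-second preview, build a Mel-spectrogram, and cut it into
    n_pieces equal-width clips along the time axis."""
    y, _ = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)   # log scale for contrast
    width = mel_db.shape[1] // n_pieces             # time frames per piece
    return [mel_db[:, i * width:(i + 1) * width] for i in range(n_pieces)]
```

The 200x90x3 input shape on the poster suggests the spectrograms were rendered as three-channel images before training; the single-channel arrays above would need to be rendered or tiled into three channels to match that shape.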
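A minimal Keras sketch of the network described under Technical Approach, assuming TensorFlow/Keras (the framework is not named on the poster): three VGG-style 3x3 convolution blocks with batch normalization, one 128-node fully connected layer with 0.1 L1 regularization, 0.4 dropout, and a learning rate of 0.00001. The filter counts, pooling sizes, optimizer, and input orientation (90 Mel bands x 200 frames x 3 channels) are assumptions.

```python
from tensorflow.keras import layers, models, regularizers, optimizers

def build_model(input_shape=(90, 200, 3), n_classes=4):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128):                   # three VGG-style blocks (assumed filter counts)
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu",
                           kernel_regularizer=regularizers.l1(0.1)))  # 0.1 L1 on the dense layer
    model.add(layers.Dropout(0.4))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),      # optimizer choice is an assumption
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With four genre classes and one-hot labels, training is a standard `model.fit(X_train, y_train, epochs=1500, ...)` call, matching the 1500 epochs reported in the Results.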
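The horizontal-shift augmentation mentioned in the Dataset section could be implemented as below. Wrapping the shifted-off frames around with np.roll is an assumption; the poster does not say how the vacated region was filled.

```python
import numpy as np

def random_time_shift(spec, max_fraction=0.5, rng=None):
    """Shift a (mel_bands, time, channels) spectrogram clip along the time
    axis by a random amount of up to max_fraction of its width."""
    rng = rng or np.random.default_rng()
    max_shift = int(spec.shape[1] * max_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(spec, shift, axis=1)             # wrap-around shift along time
```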
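The poster does not say how the heatmaps were generated; Grad-CAM is one common technique for this kind of class-attention visualization, so the sketch below uses it purely as an illustration. `model` is assumed to be the network from the model sketch above, and `last_conv_name` the name of its final convolutional layer.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index=None):
    """Return a small 2-D heatmap of class evidence for one spectrogram clip."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(last_conv_name).output, model.output])
    x = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x)
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the predicted class
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)          # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))    # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)  # keep positive evidence, normalize
    return cam.numpy()                              # upsample and overlay on the spectrogram to plot
```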
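The per-song voting system proposed in the Conclusion, sketched under the assumption that `clips` holds the five preprocessed pieces of one song, each shaped to match the model input.

```python
from collections import Counter
import numpy as np

GENRES = ("hip-hop", "country", "electronic", "metal")

def classify_song(model, clips):
    """Each clip votes for its predicted genre; the majority wins the song."""
    preds = model.predict(np.stack(clips), verbose=0)   # one softmax row per clip
    votes = [GENRES[i] for i in preds.argmax(axis=1)]
    return Counter(votes).most_common(1)[0][0]
```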
References and Related Work
- http://benanne.github.io/2014/08/05/spotify-cnns.html
- https://chatbotslife.com/finding-the-genre-of-a-song-with-deep-learning-da8f59a61194
- https://deepmind.com/blog/wavenet-generative-model-raw-audio/