IWPR18: LSTM music classification, WR58

Slides:



Advertisements
Similar presentations
KARAOKE FORMATION Pratik Bhanawat (10bec113) Gunjan Gupta Gunjan Gupta (10bec112)
Advertisements

Analysis and Digital Implementation of the Talk Box Effect Yuan Chen Advisor: Professor Paul Cuff.
Franz de Leon, Kirk Martinez Web and Internet Science Group  School of Electronics and Computer Science  University of Southampton {fadl1d09,
Department of Computer Science University of California, San Diego
Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification.
Speech Sound Production: Recognition Using Recurrent Neural Networks Abstract: In this paper I present a study of speech sound production and methods for.
F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
1 Music Classification Using SVM Ming-jen Wang Chia-Jiu Wang.
A PRESENTATION BY SHAMALEE DESHPANDE
Postgraduate Department of Electrical Engineering PPGEE UFPR - Federal University of Paraná Luis Gustavo Weigert Machado
Representing Acoustic Information
Understanding 20th and 21st Century
Audio Retrieval David Kauchak cs458 Fall Administrative Assignment 4 Two parts Midterm Average:52.8 Median:52 High:57 In-class “quiz”: 11/13.
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Health Diagnosis through Voice Analysis Sahil Loomba & Shamiek Mangipudi, Department of Electronics and Electrical Engineering, IIT Guwahati Deepest appreciation.
MACHINE LEARNING TECHNIQUES FOR MUSIC PREDICTION S. Grant Lowe Advisor: Prof. Nick Webb.
Music Genres by Shaune Royer
Preprocessing Ch2, v.5a1 Chapter 2 : Preprocessing of audio signals in time and frequency domain  Time framing  Frequency model  Fourier transform 
 …reflects and determines the people’s way of life.  … does a lot to change the outlook of the people, to make peace, to bring some positive social.
Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )
Authors: Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
Musical Genres and Styles. Exercise One You are in charge of a CD department in a music store. You must decide whether the following selections go in--
Hidden Markov Classifiers for Music Genres. Igor Karpov Rice University Comp 540 Term Project Fall 2002.
Handwritten Recognition with Neural Network Chatklaw Jareanpon, Olarik Surinta Mahasarakham University.
Korean Phoneme Discrimination Ben Lickly Motivation Certain Korean phonemes are very difficult for English speakers to distinguish, such as ㅅ and ㅆ.
Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The.
Music Classification Using Neural Networks Craig Dennis ECE 539.
Performance Comparison of Speaker and Emotion Recognition
MSc Project Musical Instrument Identification System MIIS Xiang LI ee05m216 Supervisor: Mark Plumbley.
V-Cert Music Technology Producing Dance Music UNIT 6 – Stage 1 NAME: Gemma Mitchell.
By: Olivia Miller. The product is promoted to create awareness. (traditional rhythm and blues music) Competitors are attracted into the market with very.
Noise Reduction in Speech Recognition Professor:Jian-Jiun Ding Student: Yung Chang 2011/05/06.
ADAPTIVE BABY MONITORING SYSTEM Team 56 Michael Qiu, Luis Ramirez, Yueyang Lin ECE 445 Senior Design May 3, 2016.
Korean Phoneme Discrimination
Olivier Siohan David Rybach
Late 20th/21st Centaury Music
Ch. 3: Feature extraction from audio signals
CS 445/656 Computer & New Media
Tenacious Deep Learning
Ch. 2 : Preprocessing of audio signals in time and frequency domain
The Relationship between Deep Learning and Brain Function
Music Technology Part 1 : Download a Specification
Ch. 5: Speech Recognition
Speaker Classification through Deep Learning
ARTIFICIAL NEURAL NETWORKS
Recurrent Neural Networks for Natural Language Processing
Spoken Digit Recognition
Presentation on Artificial Neural Network Based Pathological Voice Classification Using MFCC Features Presenter: Subash Chandra Pakhrin 072MSI616 MSC in.
Cepstrum and MFCC Cepstrum MFCC Speech processing.
MUSIC.
Brian Whitman Paris Smaragdis MIT Media Lab
Urban Sound Classification with a Convolution Neural Network
Master’s Thesis defense Ming Du Advisor: Dr. Yi Shang
PROJECT PROPOSAL Shamalee Deshpande.
SOUND.
Musical Style Classification
A First Look at Music Composition using LSTM Recurrent Neural Networks
Dog/Cat Classifier Christina Stiff.
Audio and Speech Computers & New Media.
Digital Systems: Hardware Organization and Design
AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION
John H.L. Hansen & Taufiq Al Babba Hasan
Advances in Deep Audio and Audio-Visual Processing
Automatic Handwriting Generation
Measuring the Similarity of Rhythmic Patterns
Semantic Segmentation
Recurrent Neural Networks
LHC beam mode classification
Presentation transcript:

IWPR18: LSTM music classification, WR58 Music genre classification using a hierarchical long short term memory (LSTM) model For the paper: Chun Pui Tang, Ka Long Chui, Ying Kin Yu, Zhiliang Zeng, Kin Hong Wong, "Music Genre classification using a hierarchical Long Short Term Memory (LSTM) model", International Workshop on Pattern Recognition IWPR 2018 ,University of Jinan, Jinan, China, May 26-28, 2018. IWPR18: LSTM music classification, WR58

IWPR18: LSTM music classification, WR58 Introduction IWPR18: LSTM music classification, WR58

What we have done Built a Neural Network Model to classify 10 different music genres The project can divided into 3 parts Data collection Preprocessing Building and training the neural network

Motivation Why we need music genre classification A huge amount of music on the Internet People are more likely to browse or search music by genre Why choosing this topic Machine learning is the hotest topic recently Less studied than others types of media

Deep learning Projects Related to Music Increase number of projects related to music in recent years IWPR18: LSTM music classification, WR58 Figure 4 - Number of articles per year (DL4M) (Yann Bayle, 2017)

IWPR18: LSTM music classification, WR58 Data set IWPR18: LSTM music classification, WR58

Dataset GTZAN Genre Collection (by George Tzanetakis) Typically representative of the styles of that genre Duration Number of Genre Number of soundtrack Format GTZAN data set 30 Seconds 10 100/genre 22050Hz Mono 16bit audio file in wave format Genre: Classic, Hip-hop, Jazz, Metal, Pop, Reggae, blues, Rock, Country Folk, Disco

Music sample Hiphop hiphop_001.au Jazz .au Metal Blues Pop Country Folk Reggae Disco Rock Classic Add the music sample later click and then will play the corresponding music

Grouping Present Training 75/genre Testing 25/genre Because of small size of dataset(1000) We only define the dataset into training set and testing set (without validation set) Larger training set -> Provide better accuracy Larger testing set -> Result have higher reliability

IWPR18: LSTM music classification, WR58 Preprocessing IWPR18: LSTM music classification, WR58

Why we need to do preprocessing Example to explain to importance of preprocessing Result: also apply to music

Mel-Frequency Cepstral Coefficient(MFCC) Represent sound by approximate human auditory system MFCC (Mel-frequency cepstrum) popular for Speech recognition Also used in Music genre classification and Similarity measurement

Mel filter bands (found by psychological and instrumentation experiments) Freq. lower than 1 KHz has narrower bands (and in linear scale) Higher frequencies have larger bands (and in log scale) More filter below 1KHz Less filters above 1KHz Filter output Feature extraction, v.7a http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png

Cepstrum C(n)=IDFT[log10 |S(w)|]= IDFT[ log10{|E(w)|} + log10{|H(w)|} ] In C(n), you can see E(n) and H(n) at two different positions Application: useful for (i) glottal excitation removal (ii) vocal tract filter analysis windowing DFT Log|x(w)| IDFT X(n) X(w) N=time index w=frequency I-DFT=Inverse-discrete Fourier transform S(n) C(n) Feature extraction, v.7a

To compute Mel-Scale Cepstral Coef. MFCC Block Diagram of MFCC alogorithm (Michael Lutter,2014)

IWPR18: LSTM music classification, WR58 Extracted Feature Features Typical Number of MFCCs 13 IWPR18: LSTM music classification, WR58

IWPR18: LSTM music classification, WR58 Neural network model A brief introduction IWPR18: LSTM music classification, WR58

Common types of Neural Network CNN (convolution neural net) Recurrent neural network A form of RNN : LSTM (Long short-term memory) CNN, RNN and the model we are using IWPR18: LSTM music classification, WR58

Long Short-Term Memory(LSTM) Network http://colah.github.io/posts/2015-08-Understanding-LSTMs/

IWPR18: LSTM music classification, WR58 Forget Gate Update Gate IWPR18: LSTM music classification, WR58

IWPR18: LSTM music classification, WR58 Do the forget and update Action Filter the Cell state, Deceide what to output IWPR18: LSTM music classification, WR58

Our LSTM model 2 LSTM layer with density of 128 and 32 Output : 10 genres(one- hot) I will draw a better picture for the model later

IWPR18: LSTM music classification, WR58 Long short term memory IWPR18: LSTM music classification, WR58

IWPR18: LSTM music classification, WR58 10 different genres IWPR18: LSTM music classification, WR58

Classification of music genres in GTZAN dataset IWPR18: LSTM music classification, WR58

7 LSTM(Divide and Conquer) Input Output To LSTM1 750(All in training set) strong or mild 2a,2b LSTM2a 375(Strong) strong 1 or 2 3a,3b LSTM2b 375(Mild) mild 1 or 2 3c,3d

Result in Early stage The result in LSTM3a indicates there is some problem we have missed

Arrange the dataset Before After Testing 0-29 4,8,12,16…(Multiple of 4) Training 30-99 The rest The reason that LSTM3a performs way worse than others The consecutive soundtracks are often perform by the same singer Original method to divide the dataset might have biased

Discussion: Result after rearrangement Increase accuracy in most of the genres Great improve can be seen in LSTM3a and LSTM3d

Result of Approach 1 Accuracy: 50%

IWPR18: LSTM music classification, WR58 Discussion IWPR18: LSTM music classification, WR58

Reflection on 7 LSTM The reason that affect 7 LSTM accuracy: The division of the group Strong and Mild is rough for instance disco is assigned to the Mild group, but it included song with strong and mild Too many layer more layer imply accuracy will further decrease, if the accuracy of each layer is not high enough Classify among similar songs

Bigger data set Million song dataset(25 genres): Big Band, Blues Contemporary, Country Traditional, Dance, Electronica, Experimental, Folk International, Gospel, Grunge Emo, Hip Hop Rap, Jazz Classic, Metal Alternative, Metal Death, Metal Heavy, Pop Contemporary, Pop Indie, Pop Latin, Punk, Reggae, RnB Soul, Rock Alternative, Rock College, Rock Contemporary, Rock Hard, Rock Neo Psychedelia 50000 labelled examples (2000 per genre) are provided for training, with a further 10000 unlabelled examples (400 per genre) used for testing The highest accuracy achieved was 35.34% (2017 Kaggle)

Network Structure Possibility of using other network structure CNN-LSTM Auto-encoder 2. Adding other new features Chroma feature, Spectral Central

Chroma features Represent a music audio into 12 distinct semitones of the musical octave

Spectral Contrast Provide better performance than MFCC in music genre classification the spectral contrast is calculated as the difference between the maximum and minimum magnitudes in the spectrum

Conclusion Developed a 10 music Genres classification system. Accuracy is 50% which is comparable to the state-of-that resulsts.

IWPR18: LSTM music classification, WR58 The End Q & A IWPR18: LSTM music classification, WR58