A First Look at Music Composition using LSTM Recurrent Neural Networks Written by: Douglas Eck, Jürgen Schmidhuber Presented by: Danit Itzkovich, Mor Yemini
Recurrent Neural Network (RNN) Designed to learn sequential or time-varying patterns: a neural network with feedback (closed-loop) connections.
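Below is a minimal sketch of that feedback loop in Python, added for this writeup (not the paper's network; all names and sizes are assumptions): the hidden state from the previous time step is fed back in alongside the current input.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the previous hidden state
    h_prev is the feedback (closed-loop) connection."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions (assumptions, not from the paper).
rng = np.random.default_rng(0)
n_in, n_hidden = 25, 16
W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(10, n_in)):  # a 10-step input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```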
Long Short Term Memory (LSTM)
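The slides introduce LSTM without its update equations, so here is a hedged sketch of one LSTM step in the same style (the standard gate formulation; parameter shapes and sizes are illustrative assumptions, not the paper's exact cell-block architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W/U/b hold stacked parameters for the
    input gate (i), forget gate (f), output gate (o) and the
    candidate cell input (g). The gated cell state c is what
    lets the network carry information over long time spans."""
    z = x_t @ W + h_prev @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g      # constant-error carousel
    h = o * np.tanh(c)
    return h, c

# Toy usage (sizes are assumptions).
rng = np.random.default_rng(0)
n_in, n_hidden = 25, 8
W = rng.normal(scale=0.1, size=(n_in, 4 * n_hidden))
U = rng.normal(scale=0.1, size=(n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h = c = np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```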
Music Composition using LSTM: Example
Compose music with computers Music generation can be formulated as a time-series prediction problem: the note played at each time step can be regarded as a prediction conditioned on the notes played before it. Music has both short-term and long-term structure, much like language, which makes a recurrent neural network (RNN) a natural fit for this task.
Compose music with RNN Note-by-note approach: a single-step predictor that predicts the note at time t+1 from the notes at time t. At test time only one seed note is needed; the network then feeds its own predictions back as input and generates a novel composition.
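A minimal sketch of that test-time loop (the `predict` argument stands for any trained single-step model and is a hypothetical placeholder, not the paper's code):

```python
import numpy as np

def compose(predict, seed_note, n_steps=96):
    """Generate a song note by note: each prediction is fed
    back in as the next input (single-step predictor)."""
    notes = [seed_note]
    for _ in range(n_steps - 1):
        notes.append(predict(notes[-1]))  # note at t+1 from note at t
    return np.stack(notes)
```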
The problem Vanishing gradient: the network is incapable of learning long-term dependencies, which are critical in music composition. In music, long-term dependencies are at the heart of what defines a particular style.
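A toy computation (not from the paper) showing why the gradient vanishes: backpropagation through time multiplies one factor per step, and any factor below 1 drives the product toward zero exponentially fast.

```python
# Each backprop-through-time step multiplies the gradient by a
# factor such as the derivative of a saturating unit times a
# small recurrent weight; a factor below 1 decays exponentially.
factor = 0.8
for steps in (1, 10, 50, 100):
    print(steps, factor ** steps)
# prints roughly: 0.8, 0.107, 1.4e-05, 2.0e-10
```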
The solution: compose music with LSTM
A First Look at Music Composition using LSTM Recurrent Neural Networks By: Douglas Eck, Jürgen Schmidhuber
Training data A range of 12 notes was possible for chords, and a range of 13 notes for melodies. Each song was 96 notes long. The chords used did not vary from song to song. Only quarter notes were used; no rests were used.
Data representation One input/target unit per note, with 1.0 representing on and 0.0 representing off. The network input represented the whole song.
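A sketch of this encoding under the slides' numbers (12 chord units plus 13 melody units, 96 quarter-note steps per song; the unit ordering is an assumption):

```python
import numpy as np

N_CHORD, N_MELODY = 12, 13  # note ranges from the slides
SONG_LEN = 96               # quarter notes per song

def encode_step(chord_notes, melody_note):
    """One time step as a binary vector: 1.0 = note on, 0.0 = off.
    chord_notes: iterable of chord-note indices in [0, 12).
    melody_note: a melody-note index in [0, 13), or None."""
    v = np.zeros(N_CHORD + N_MELODY)
    for n in chord_notes:
        v[n] = 1.0
    if melody_note is not None:
        v[N_CHORD + melody_note] = 1.0
    return v

# A whole song is then a (96, 25) matrix of such vectors.
```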
Experiment 1 – Learning Chords The goal: Determine whether LSTM can learn to reproduce a musical chord structure in the absence of melody, predicting the probability that a given note is on or off.
Experiment 1 – Learning Chords Network topology and experimental parameters: Four cell blocks containing two cells each were fully connected to each other and to the input layer. The output layer was fully connected to all cells and to the input layer. Forget gate, input gate and output gate biases for the four blocks were set at -0.5, -1.0, -1.5 and -2.0.
Experiment 1 – Learning Chords Network topology and experimental parameters: The learning rate was set at 0.00001 and the momentum rate at 0.9. Resetting: on error, burn existing weights, reset the input pattern, and clear partial derivatives, activations and cell states. The output activation function was the sigmoid.
Experiment 1 – Learning Chords Training: This task is similar to a multi-label classification problem, so the network was trained with the cross-entropy error function $E = -\sum_i \left[ t_i \ln y_i + (1 - t_i) \ln(1 - y_i) \right]$, where $t_i$ is the target value and $y_i$ is the output activation (the predicted value). Training was stopped after the network successfully predicted the entire chord sequence.
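A minimal numpy sketch of that error function (the clipping constant is an added numerical-stability assumption, not from the paper):

```python
import numpy as np

def cross_entropy(targets, outputs, eps=1e-12):
    """Multi-label cross-entropy: targets t_i in {0, 1},
    outputs y_i = sigmoid activations in (0, 1)."""
    y = np.clip(outputs, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(targets * np.log(y)
                   + (1.0 - targets) * np.log(1.0 - y))
```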
Experiment 1 – Learning Chords Testing: The testing phase is also the composition phase, since the predictions can be recorded to form new songs. The network was tested by seeding it with the inputs from the first time step and then feeding its own predictions back in for the ensuing time steps. Chord notes were predicted using a decision threshold of 0.5.
Experiment 1 – Learning Chords Results: LSTM easily handled the task under a wide range of learning rates and momentum rates.
Experiment 2 – Learning Melody and Chords The goal: Determine whether LSTM could learn chord structure and melody structure and then use that structure to its advantage when composing new songs: learn melody and chords simultaneously, predicting the probability that a given note is on or off.
Experiment 2 – Learning Melody and Chords Network topology and experimental parameters: Some cell blocks processed chord information while other cell blocks processed melody information. Eight cell blocks containing two cells each were used: four of the cell blocks were fully connected to the input units for chords, and four were fully connected to the input units for melody. The chord cell blocks had recurrent connections to themselves and to the melody cell blocks; melody cell blocks were recurrently connected only to melody cell blocks.
Experiment 2 – Learning Melody and Chords Network topology and experimental parameters: Output units for chords were fully connected to the cell blocks for chords and to the input units for chords; output units for melody were fully connected to the cell blocks for melody and to the input units for melody (this one-way connectivity is sketched below). Forget gate, input gate and output gate biases for the four blocks in each group were set at -0.5, -1.0, -1.5 and -2.0. The learning rate was set at 0.00001 and the momentum rate at 0.9.
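A hedged sketch of this one-way connectivity, re-expressed with modern PyTorch LSTM cells rather than the paper's original cell blocks (cell counts are reduced for brevity and all names are assumptions): chord state flows into the melody cells, but melody state never reaches the chord cells.

```python
import torch
import torch.nn as nn

class ChordMelodyNet(nn.Module):
    """Chord blocks feed melody blocks, never the reverse."""
    def __init__(self, n_chord=12, n_melody=13, n_hidden=8):
        super().__init__()
        self.chord_rnn = nn.LSTMCell(n_chord, n_hidden)
        # Melody cells see melody input plus the chord hidden state.
        self.melody_rnn = nn.LSTMCell(n_melody + n_hidden, n_hidden)
        # Output units connect to their own cells and input units.
        self.chord_out = nn.Linear(n_hidden + n_chord, n_chord)
        self.melody_out = nn.Linear(n_hidden + n_melody, n_melody)

    def init_state(self, batch=1):
        z = lambda: torch.zeros(batch, self.chord_rnn.hidden_size)
        return ((z(), z()), (z(), z()))

    def step(self, chord_x, melody_x, state):
        (hc, cc), (hm, cm) = state
        hc, cc = self.chord_rnn(chord_x, (hc, cc))
        # One-way link: chord hidden state flows into melody cells.
        hm, cm = self.melody_rnn(torch.cat([melody_x, hc], -1), (hm, cm))
        chord_logits = self.chord_out(torch.cat([hc, chord_x], -1))
        melody_logits = self.melody_out(torch.cat([hm, melody_x], -1))
        return chord_logits, melody_logits, ((hc, cc), (hm, cm))
```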
Experiment 2 – Learning Melody and Chords Training: For chords, the same training algorithm as in Experiment 1 was used. For melodies, the melody output activations were constrained to sum to 1.0, so that a single note is chosen at any given time step. The network was trained until it had learned the chord structure and the objective error had reached a plateau.
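A sketch of that constraint (a standard softmax plus sampling; whether the paper used exactly this normalization is an assumption):

```python
import numpy as np

def softmax(logits):
    """Normalize melody activations to sum to 1.0."""
    z = np.exp(logits - logits.max())  # subtract max for stability
    return z / z.sum()

def pick_melody_note(logits, rng=np.random.default_rng()):
    """Choose a single melody note per time step by sampling
    from the normalized activations."""
    p = softmax(logits)
    return rng.choice(len(p), p=p)
```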
Experiment 2 – Learning Melody and Chords Testing: The network was allowed to compose music freely. Music was composed by providing a single note or a series of notes (up to 24) from the training set as input.
Experiment 2 – Learning Melody and Chords Results: In all cases the network succeeded in reproducing the chord structure while improvising new melodies in parallel. The network's compositions sounded better than a random walk across the pentatonic scale.
Conclusions LSTM can capture both the local structure of melody and the long-term structure of a musical style. The first experiment verified that LSTM was not relying on regularities in the melody to learn the chord structure. The second experiment explored the ability of LSTM to generate new instances of a musical form.
Discussion There was no variety in the underlying chord structure. This setup suits jazz music, in which chord changes are almost always provided separately from melodies. LSTM provided the best results compared with other RNN techniques.
Questions?