DeepMIDI: Music Generation
Arindam Bhattacharya, Jonathan Burge, Bryce Codell | MSiA Deep Learning | Spring 2017 | Northwestern University

Problem Statement

There is constant demand for new musical content for a multitude of uses, ranging from artistic expression to jingles for new TV shows to elevator music. Our objective was to generate original musical content using deep learning. Audio files were converted from MIDI to text and fed into a text-generation model, whose output was then converted back to MIDI. The primary metric of success was whether or not the generated music demonstrated the typical characteristics of jazz.

[Image: legendary jazz bassist Charles Mingus]

Dataset

Our dataset was obtained from freemidi.org. The training set consisted of bass lines from 8 jazz classics, all transcribed to MIDI by Mel Webb. We converted these MIDI files to CSV format prior to model fitting. [Figure: samples of the data before and after pre-processing]

Data cleaning involved several steps:
- removing text that did not directly pertain to the notes being played,
- standardizing the note format, and
- subsetting specific channels from individual MIDI files.

We faced several data-processing challenges: identifying a method for batch conversion of MIDI files into CSV format, understanding the details of how MIDI files are converted into audio, generalizing our processing to account for the significant variance in syntax and structure among MIDI files, and enforcing proper output format after converting model output back to MIDI.

Technical Approach

1. Pre-process: convert MIDI files to CSV; compute derived features; subset features.
2. Fit model and generate: vectorize the text using a "musical" vocabulary; train models (LSTM, GRU); provide new "seeds"; save the generated data to CSV.
3. Post-process: reformat the data into MIDI-CSV structure; provide proper MIDI metadata; convert from CSV back to MIDI format.

Our most significant challenge was deriving the features and vocabulary needed to generate meaningful output. We derived two features: note duration, and note delay (the start time of the current note minus the start time of the previous note). This alleviated several otherwise difficult-to-address constraints, e.g. generated start times needing to be strictly ascending. For the vocabulary, instead of generating all four note components with the standard text-generation vocabulary (e.g. '1', '2', etc.), we built four distinct vocabularies, each corresponding to the unique numeric values of one of the four note components.

Model Training

We fit four models in parallel, one for each component of the notes being generated. Our model consisted of four LSTM neural networks, each handling a single one of the features that together compose a note. The "note" and "velocity" models had fewer hidden nodes because their vocabularies were significantly smaller than those of "duration" and "delay."

Music Generation

To generate new music, we seeded the trained models with passages from existing pieces and sampled their predictions. We experimented with a variety of diversity values, dropout rates, and memory/window sizes to avoid common traps (e.g. the generator getting stuck in a loop). Sketches of the pre-processing, modeling, and post-processing steps follow below.
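To make the pre-processing step concrete, here is a minimal Python sketch of the derived-feature computation and per-component vocabulary construction. This is our reconstruction under stated assumptions, not the project's actual code: it assumes note events have already been extracted from midicsv output into (start_time, end_time, pitch, velocity) tuples sorted by start time, and all function and variable names are illustrative.

```python
# Sketch of the derived-feature computation (illustrative names, not the
# authors' actual code). Input: note events extracted from midicsv output
# as (start_time, end_time, pitch, velocity) tuples, sorted by start_time.

def derive_note_features(events):
    """Convert absolute-time note events into (pitch, velocity, duration, delay).

    duration = end_time - start_time
    delay    = start_time of current note - start_time of previous note,
    which frees the model from having to generate strictly ascending
    absolute start times.
    """
    notes = []
    prev_start = None
    for start, end, pitch, velocity in events:
        duration = end - start
        delay = 0 if prev_start is None else start - prev_start
        notes.append((pitch, velocity, duration, delay))
        prev_start = start
    return notes

def build_vocabularies(notes):
    """One vocabulary per note component: map each unique numeric value to
    an integer index, so each component can be one-hot encoded and modeled
    by its own network."""
    components = list(zip(*notes))  # pitches, velocities, durations, delays
    return [
        {value: idx for idx, value in enumerate(sorted(set(comp)))}
        for comp in components
    ]

# Example: three notes from a bass line (times in ticks, MIDI pitch, velocity)
events = [(0, 240, 43, 80), (240, 480, 45, 75), (960, 1200, 43, 80)]
notes = derive_note_features(events)
vocabs = build_vocabularies(notes)
# notes  -> [(43, 80, 240, 0), (45, 75, 240, 240), (43, 80, 240, 720)]
# vocabs -> one {value: index} dict per component
```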
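The model-training and generation step might look like the following Keras sketch. This is an assumption-laden illustration, not the authors' code: the hidden-layer sizes, dropout rate, and window length are invented placeholders, and `vocabs` refers to the per-component vocabularies (one {value: index} dict per component) from the previous sketch. The `sample` function follows the temperature ("diversity") sampling used in the Keras lstm_text_generation example the project is based on.

```python
# Sketch of the four parallel per-component LSTMs (hidden sizes, dropout,
# and window length are illustrative assumptions, not the authors' values).
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

WINDOW = 20  # memory/window size: how many past notes each model sees

def build_component_model(vocab_size, hidden_nodes):
    # One network per note component; "note" and "velocity" get fewer
    # hidden nodes because their vocabularies are smaller.
    model = Sequential([
        LSTM(hidden_nodes, input_shape=(WINDOW, vocab_size)),
        Dropout(0.2),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model

# vocabs: the four {value: index} dicts from the pre-processing sketch
models = [
    build_component_model(len(vocab), hidden)
    for vocab, hidden in zip(vocabs, [64, 64, 128, 128])
]

def sample(preds, diversity=1.0):
    # Temperature ("diversity") sampling, as in the Keras
    # lstm_text_generation example: higher diversity -> more randomness.
    preds = np.log(np.asarray(preds, dtype="float64") + 1e-9) / diversity
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return int(np.argmax(np.random.multinomial(1, probs)))

def generate(models, seed_windows, n_notes, diversity=0.5):
    # seed_windows: one one-hot array of shape (WINDOW, vocab_size) per
    # component, taken from an existing piece (e.g. "Autumn Leaves").
    notes, windows = [], [w.copy() for w in seed_windows]
    for _ in range(n_notes):
        note = []
        for i, model in enumerate(models):
            preds = model.predict(windows[i][None, ...], verbose=0)[0]
            idx = sample(preds, diversity)
            note.append(idx)
            step = np.zeros_like(windows[i][0])
            step[idx] = 1.0  # slide the window forward by one one-hot row
            windows[i] = np.vstack([windows[i][1:], step[None, :]])
        notes.append(tuple(note))  # indices into the four vocabularies
    return notes
```

Because each component is sampled from its own model, the four predictions for a given note are made independently; this is the parallelization limitation noted in the conclusion.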
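Finally, a sketch of the post-processing step: recovering strictly ascending absolute start times from the generated delays with a running sum, then writing midicsv-style records (Header, Start_track, Note_on_c/Note_off_c, End_track, End_of_file) that the csvmidi tool can turn back into a playable MIDI file. The tempo, resolution, and channel values below are illustrative defaults, not the project's settings.

```python
# Sketch of the post-processing step (illustrative defaults, not the
# project's settings). Assumes generated vocab indices have already been
# mapped back to raw values via the inverse of each component vocabulary.
import csv

def notes_to_midicsv(notes, path, ppq=480, tempo=500000, channel=0):
    # notes: (pitch, velocity, duration, delay) tuples, times in MIDI ticks.
    # Writes a single-track file in the midicsv format, which the csvmidi
    # tool converts back into a standard MIDI file.
    rows = [
        [0, 0, "Header", 1, 1, ppq],    # format 1, one track, ticks/quarter
        [1, 0, "Start_track"],
        [1, 0, "Tempo", tempo],         # microseconds per quarter note
    ]
    time = 0
    for pitch, velocity, duration, delay in notes:
        time += delay                   # running sum -> ascending start times
        rows.append([1, time, "Note_on_c", channel, pitch, velocity])
        rows.append([1, time + duration, "Note_off_c", channel, pitch, 0])
    # midicsv requires non-decreasing event times within a track, and a
    # long note's Note_off can land after the next note's Note_on
    rows[3:] = sorted(rows[3:], key=lambda r: r[1])
    rows.append([1, rows[-1][1], "End_track"])
    rows.append([0, 0, "End_of_file"])
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```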
Results

Our most successful attempt at generating a jazzy bass line resulted from seeding with "Autumn Leaves," a classic 1940s piece that strongly exhibits the typical characteristics of jazz harmony. [Figure: visualization of how strongly the generated music exhibits these characteristics]

Conclusion

Overall, DeepMIDI showed promising potential for music generation. It demonstrated the ability to identify and integrate key features of jazz bass into its generated music, particularly those that are rhythmic in nature. However, it struggled more with tonality, perhaps due to the small training set and/or the inclusion of multiple key signatures. Moderate diversity produced the best-sounding results. One-hot encoding numeric values (effectively transforming them into categorical variables) has several major advantages, but it also limits the range of values the models can predict. Additional limitations are inherent to the parallelized computation undertaken here: the four models ignore relationships between the note components (e.g. shorter delays tend to be associated with shorter durations).

References and Related Work

- Freemidi.org: free MIDI music downloads (source of our dataset).
- Keras/Theano jazz generation.
- Keras LSTM text generation example: .../master/examples/lstm_text_generation.py
- MidiCSV: convert MIDI files to and from CSV.