Harvestworks Part 3: Audio analysis & machine learning. Rebecca Fiebrink, Princeton University
Real-time audio analysis Goal: Analyze audio within the same sample-synchronous framework as synthesis & interaction.
The Unit Analyzer [Diagram labels: impulse generator (send impulse), BiQuad filter (center freq, radius), DAC, FFT, spectral feature extractors, IFFT, time-domain feature extractors. Old: Unit Generator (UGen). New: Unit Analyzer (UAna).]
The Unit Analyzer Like a unit generator: – Black box for computation – Plug into a directed graph/network/patch Unlike a unit generator: – Input is samples, data, and/or metadata – Output is samples, data, and/or metadata – Not tied to sample rate; computed on demand
Operators: => (chuck) and =^ (upchuck). See upchuck_operator.ck, upchuck_function.ck, continuous_feature_extraction.ck
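The "computed on demand" behaviour of an upchuck can be illustrated with a toy model. This is a minimal sketch in Python, not ChucK code and not ChucK's implementation: the names Blob, ToyUAna, connect_to, and the lambda analyzers are all hypothetical, and the sketch assumes that an upchuck on one analyzer first computes every analyzer connected upstream of it.

```python
import math

class Blob:
    """Toy stand-in for a UAnaBlob: a list of values plus a timestamp."""
    def __init__(self, values, when):
        self.values = values
        self.when = when

class ToyUAna:
    """Hypothetical analyzer node; a model of the idea, not ChucK's UAna."""
    def __init__(self, compute):
        self.compute = compute   # function: list of upstream value lists -> list of values
        self.upstream = []       # analyzers connected through the "=^" analogue
        self.blob = Blob([], 0)  # one blob associated with each analyzer

    def connect_to(self, downstream):
        """Analogue of `self =^ downstream`; returns downstream so calls can chain."""
        downstream.upstream.append(self)
        return downstream

    def upchuck(self, now):
        """Compute upstream analyzers first, then this one, only when asked."""
        inputs = [u.upchuck(now).values for u in self.upstream]
        self.blob = Blob(self.compute(inputs), now)
        return self.blob

# A fixed frame of samples stands in for buffered audio input.
frame = [0.5, -0.5, 0.5, -0.5]
source = ToyUAna(lambda inputs: frame)
rms = ToyUAna(lambda inputs: [math.sqrt(sum(x * x for x in inputs[0]) / len(inputs[0]))])
source.connect_to(rms)

blob = rms.upchuck(now=1024)  # nothing is computed until this call
print(blob.values, blob.when)
```

Nothing runs at connection time; the analysis happens only when upchuck is called, which is the sense in which analysis is not tied to the sample rate.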
The UAnaBlob Upchucked by UAna Generic representation for metadata. – Real and complex arrays – Spectra, feature values, or user-defined – Timestamped One associated with each UAna
FFT/IFFT Takes care of: – Buffering input / overlap-adding output – Maintaining window and FFT sizes – Mediating audio rate and analysis “rate” FFT outputs complex spectrum as well as magnitude spectrum – Low-level: access/modify contents manually – High-level: connect FFT to spectral processing UAnae See ifft.ck, ifft_transformation.ck
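The buffering and windowing that the FFT analyzer handles can be sketched by hand. The following is a minimal illustration in Python, not ChucK's implementation: the function names, the Hann window choice, the FFT size of 32, and the hop of 16 are assumptions made for the example, and a naive DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    """Naive O(n^2) DFT returning the complex spectrum of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitude_spectra(samples, fft_size, hop):
    """Yield one magnitude spectrum (bins 0..fft_size/2) per hop of input."""
    window = hann(fft_size)
    for start in range(0, len(samples) - fft_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + fft_size], window)]
        yield [abs(c) for c in dft(frame)[:fft_size // 2 + 1]]

# A sinusoid with exactly 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
frames = list(magnitude_spectra(signal, fft_size=32, hop=16))

# Index of the largest magnitude in each frame.
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in frames]
print(len(frames), peak_bins)
```

The hop size sets the analysis "rate": with a hop of 16 samples, one spectrum is produced for every 16 samples of audio, and consecutive frames overlap by half.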
Example: Cross-synthesis Apply the spectral envelope of one sound to another sound – Ex: xsynth_robot123.ck, xsynth_guitar123.ck – Voice spectrum taken from:
Machine learning for live performance Problem: How do we use audio and gestural features? – There is a semantic gap between the raw data that computers use and the musical, cultural, and aesthetic meanings that humans perceive and assign.
One solution: A lot of code What algorithm would you design to tell a computer whether a picture contains a human face?
The problem – If your algorithm doesn’t work, how can you fix it? – You can’t easily reuse it to do a similar task (e.g., recognizing monkey faces instead of human faces) – There’s no “theory” for how to write a good algorithm – It’s a lot of work!
Another solution: Machine learning (Classification) Classification is a data-driven approach for applying labels to data. Once a classifier has been trained on a training set that includes the true labels, it will predict labels for new data it hasn’t seen before.
Train the classifier on a labeled dataset. Data set: a feature vector and class for every data point. [Diagram: the data set feeds the classifier.]
Run the trained classifier on new data. [Diagram: the classifier outputs a label for a new data point, here "NO!"]
Candidates for classification Which gesture did the performer just make with the iCube? Which instruments are playing right now? Who is singing? What language are they singing? Is this chord major or minor? Is this dancer moving quickly or slowly? Is this music happy or sad? Is anyone standing near the camera?
An example algorithm: kNN The features of an example are treated as its coordinates in n-dimensional space. To classify a new example, the algorithm looks for its k (maybe 10) nearest neighbors in that space and chooses the most popular class among them.
kNN space: Basketball or Sumo? [Figure, built up over four slides: labeled players plotted by Feature 1: Weight and Feature 2: Height; an unlabeled point "?" is added; with K=3, its three nearest neighbors are found; the final panel shows "S" labels.]
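The kNN procedure described above can be sketched in a few lines. This is a minimal illustration in Python rather than the workshop's ChucK code; the function name knn_classify and the weight/height values are hypothetical, invented only to mirror the basketball-or-sumo picture.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (weight in kg, height in cm) examples, invented for illustration.
train = [
    ((150, 180), "sumo"),
    ((160, 185), "sumo"),
    ((140, 178), "sumo"),
    ((95, 200), "basketball"),
    ((100, 205), "basketball"),
    ((90, 198), "basketball"),
]

# Classify a new, unlabeled player from the 3 nearest labeled players.
label = knn_classify(train, (145, 183), k=3)
print(label)
```

Training here amounts to storing the labeled examples, which is part of what makes kNN convenient to retrain on the fly. In practice, features on very different scales are usually normalized first so that no single feature dominates the distance.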
SMIRK (small music information retrieval toolkit) For real-time application of machine learning – Learning in ChucK – E.g., kNN gesture classification, musical audio genre/artist classification
Interaction & on-the-fly learning Can we make the process of training a classifier interactive? Performative?
Another technique: Neural networks – Very early method – Inspired by the brain – Results in highly nonlinear functions from input to output
Combining Techniques with Wekinator ChucK: Pass features to Java, receive results back, and use them to make sound. Java: Train a neural network to map features to sounds. ChucK and Java communicate over OSC. Example: Wekinator See performance video at
Review Machine learning can be used to: – Apply meaningful labels (classification) – Learn (& re-learn) functions from inputs to outputs (e.g., neural networks) Appropriate for camera, audio, sensors, and many other types of data Live, interactive performance is a very interesting application area
Wrap-up Thanks for coming, thanks to Harvestworks! See resources on handout; workshop webpage with slides & code Please fill out evaluation forms!