Presentation transcript: Gesture Recognition & Machine Learning for Real-Time Musical Interaction

Slide 1: Gesture Recognition & Machine Learning for Real-Time Musical Interaction
Rebecca Fiebrink, Assistant Professor of Computer Science (also Music), Princeton University
Nicholas Gillian, Postdoc in Responsive Environments, MIT Media Lab

Slide 2: Introductions

Slide 3: Outline
~40 min: Machine learning fundamentals
~1 hour: Wekinator: intro & hands-on
~1 hour: EyesWeb: intro & hands-on
Wrap-up

Slide 4: Models in gesture recognition & mapping
[Diagram: a human + sensors produce a sensed action; the computer's model interprets it and produces a response (music, visuals, etc.)]
Questions the model answers: What is the current state (e.g., pose)? Was a control motion performed? If so, which one, and how? What sound should result from this state, motion, motion quality, etc.?

Slide 5: Supervised learning: training
[Diagram: training data (inputs and outputs) are fed to a learning algorithm, which produces a model]

Slide 6: Supervised learning: training and running
[Diagram: training examples labeled "Gesture 1", "Gesture 2", "Gesture 3" are fed to a learning algorithm, which produces a model; at run time, a new input is passed to the model, which outputs a label such as "Gesture 1"]

Slide 7: Why use supervised learning?
Models capture complex relationships from the data (feasible).
Models can generalize to new inputs (accurate).
Supervised learning circumvents the need to explicitly define mapping functions or models (efficient).

Slide 8: Data, features, algorithms, and models: the basics

Slide 9: Features
Each data point is represented as a feature vector.
Example #   Red(pixel1)   Green(pixel1)   Blue(pixel1)   ...   Label
1           84            120             34              ...   Gesture 1
2           43            25              85              ...   Gesture 1
3           12            12              84              ...   Gesture 2

Slide 10: Features
Good features can make a problem easier to learn!
Example #   X(r_hand)   Y(r_hand)   Depth(r_hand)   Label
1           0.1         0.5         0.6             Gesture 1
2           0.2         0.4         0.1             Gesture 1
3           0.9         0.9         0.1             Gesture 2
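A minimal sketch (not from the slides) of how a table like this becomes the input to a learning algorithm, assuming NumPy: a feature matrix X with one feature vector per row and a parallel label vector y.

```python
import numpy as np

# One row per example; columns are the features X(r_hand), Y(r_hand), Depth(r_hand).
X = np.array([
    [0.1, 0.5, 0.6],
    [0.2, 0.4, 0.1],
    [0.9, 0.9, 0.1],
])

# One label per row, aligned with X.
y = np.array(["Gesture 1", "Gesture 1", "Gesture 2"])

print(X.shape, y.shape)  # (3, 3) (3,)
```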

Slide 11: Classification
This model: a separating line or hyperplane (decision boundary).
[Figure: 2-D scatter plot over feature1 and feature2, with a decision boundary separating two classes of points]

Slide 12: Regression
This model: a real-valued function of the input features.
[Figure: 1-D plot of a real-valued output as a function of a single feature]
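A minimal regression sketch, assuming scikit-learn's MLPRegressor (a multilayer perceptron, the same family of model the Wekinator uses for regression, per slide 33): feature vectors in, one continuous output value out. The toy data and target are invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.random((50, 3))            # 50 toy feature vectors
y_train = 0.7 * X_train[:, 0] + 0.2      # toy real-valued target

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Unlike a classifier, the output is a continuous value, e.g. a synthesis parameter.
print(model.predict([[0.1, 0.5, 0.6]]))
```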

Slide 13: Unsupervised learning
Training set includes examples, but no labels.
Example: infer clusters from data.
[Figure: 2-D scatter plot over feature1 and feature2 showing unlabeled points grouped into clusters]
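A minimal clustering sketch, assuming scikit-learn's KMeans; the two blobs of random points stand in for the unlabeled scatter on the slide.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.random((20, 2)) * 0.3,        # one blob near the origin
    rng.random((20, 2)) * 0.3 + 0.7,  # a second blob far away
])

# No labels are provided; k-means infers two clusters on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster index assigned to each point
```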

Slide 14: Temporal modeling
Examples and inputs are sequential data points in time.
Model used for following, identification, recognition.
Image: Bevilacqua et al., NIME 2007

Slide 15: Temporal modeling
Image: Bevilacqua et al., NIME 2007
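One simple temporal model, sketched here to make the idea concrete: dynamic time warping (DTW) compares whole sequences rather than single frames, tolerating differences in speed. This is a generic illustration, not the HMM-based gesture follower of Bevilacqua et al. shown on the slides.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or match against the previous cells.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = np.sin(np.linspace(0, np.pi, 30))    # stored example gesture
performed = np.sin(np.linspace(0, np.pi, 45))   # same shape, performed more slowly
print(dtw_distance(template, performed))        # stays small despite the time warp
```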

Slide 16: How supervised learning algorithms work (the basics)

Slide 17: The learning problem
Goal: build the best model given the training data.
The definition of "best" depends on context, assumptions, ...

Slide 18: Which classifier is best?
Competing goals: accurately model the training data vs. accurately classify unseen data points.
[Figure: decision boundaries ranging from "underfit" to "overfit". Image from Andrew Ng]

Slide 19: A simple classifier: nearest neighbor
[Figure: 2-D scatter plot over feature1 and feature2; an unlabeled "?" point is classified according to its nearest labeled neighbors]
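A minimal k-nearest-neighbor sketch, assuming scikit-learn; the training points and the query are invented stand-ins for the scatter plot above.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.9], [0.8, 0.7]]
y_train = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

# The "?" point is labeled by a vote among its k nearest training examples.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[0.85, 0.8]]))  # -> ['Gesture 2']
```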

Slide 20: Another simple classifier: decision tree
Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300
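A minimal decision-tree sketch using scikit-learn's DecisionTreeClassifier as a stand-in for the J48 tree listed on slide 33; here max_depth plays a role similar to the pruning mentioned on slide 23, limiting how finely the tree can split the training data.

```python
from sklearn.tree import DecisionTreeClassifier

X_train = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.9], [0.8, 0.7]]
y_train = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

# A shallow tree learns a small set of if/else splits on the features.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(tree.predict([[0.15, 0.45]]))  # -> ['Gesture 1']
```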

Slide 21: AdaBoost: iteratively train a "weak" learner
Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm
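A minimal boosting sketch, assuming scikit-learn's AdaBoostClassifier as a stand-in for the AdaBoost.M1 implementation listed on slide 33; its default weak learner is a depth-1 decision tree (a stump), retrained over successive rounds on reweighted training data.

```python
from sklearn.ensemble import AdaBoostClassifier

X_train = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.9], [0.8, 0.7]]
y_train = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

# n_estimators is the number of boosting rounds (weak learners combined).
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print(boost.predict([[0.85, 0.8]]))  # -> ['Gesture 2']
```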

Slide 22: Support vector machine
Re-map the input space into a higher-dimensional space and find a separating hyperplane.
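A minimal SVM sketch, assuming scikit-learn's SVC: the RBF kernel performs the remapping into a higher-dimensional space implicitly, and a separating hyperplane is found there. C and gamma are the kinds of parameters the next slide warns can take effort to tune.

```python
from sklearn.svm import SVC

X_train = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.9], [0.8, 0.7]]
y_train = ["Gesture 1", "Gesture 1", "Gesture 2", "Gesture 2"]

# The RBF kernel does the higher-dimensional remapping implicitly.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print(svm.predict([[0.85, 0.8]]))  # -> ['Gesture 2']
```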

Slide 23: Choosing a classifier: practical considerations
k-nearest neighbor:
+ Can tune k to adjust the smoothness of decision boundaries
- Sensitive to noisy, redundant, or irrelevant features; prone to overfitting; behaves oddly in high dimensions
Decision tree:
+ Can prune to reduce overfitting; produces a human-understandable model
- Can still overfit
AdaBoost:
+ Theoretical benefits; less prone to overfitting
+ Can tune by changing the base learner and the number of training rounds
Support vector machine:
+ Theoretical benefits similar to AdaBoost
- Many parameters to tune; training can take a long time

Slide 24: How to evaluate which classifier is better?
Compute a quality metric:
- Metrics on the training set (e.g., accuracy, RMS error)
- Metrics on a held-out test set
- Cross-validation
Or just use it in practice.
Image from http://blog.weisu.org/2011/05/cross-validation.html
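A minimal cross-validation sketch, assuming scikit-learn: the data are split into folds, the classifier is trained on all but one fold and scored on the held-out fold, and the per-fold scores are averaged. The toy data are invented.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((40, 3))                 # 40 toy feature vectors
y = (X[:, 0] > 0.5).astype(int)         # toy labels derived from one feature

# 5-fold cross-validation: train on 4 folds, test on the 5th, repeat.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores, scores.mean())            # per-fold accuracy and its average
```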

Slide 25: Neural networks
TODO: Use Nick's slides

Slide 26: Which learning method should you use?
Classification (e.g., kNN, AdaBoost, SVM, decision tree):
- Apply 1 of N labels to a static pose or state
- Label a dynamic gesture when segmentation & normalization are trivial (e.g., the feature vector is a fixed-length window in time)
Regression (e.g., with neural networks):
- Produce a real-valued output (or a vector of real-valued outputs) for each feature vector
Dynamic time warping, HMMs, and other temporal models:
- Identify when a gesture has occurred, identify the probable location within a gesture, possibly also apply a label
- Necessary when segmentation is non-trivial or online following is needed

Slide 27: Suggested ML reading
Bishop, 2006: Pattern Recognition and Machine Learning. Springer Science+Business Media.
Duda, 2001: Pattern Classification. Wiley-Interscience.
Witten, 2005: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Slide 28: Suggested NIME-y reading
Lee, Freed, & Wessel, 1992. Neural networks for simultaneous classification and parameter estimation in musical instrument control. Adaptive and Learning Systems, 1706:244–55. (early example of ML in music)
Hunt, A., and Wanderley, M. M. 2002. Mapping performer parameters to synthesis engines. Organised Sound 7(2):97–108. (learning as a tool for generative mapping creation)
Chapter 2 of Rebecca's dissertation: http://www.cs.princeton.edu/~fiebrink/thesis/ (historical/topic overview)
Recent publications by F. Bevilacqua & team @ IRCAM (HMMs, gesture follower)
TODO: Nick, anything else?

Slide 29: Hands-on with Wekinator

Slide 30: The Wekinator: running in real time
[Diagram: feature extractor(s) send feature vectors (e.g., .01, .59, .03, ...) over OSC to the model(s); the models' outputs (e.g., 5, .01, 22.7, ...) drive a parameterizable process over time]
Inputs: from built-in feature extractors or OSC.
Outputs: control a ChucK patch or go elsewhere using OSC.

Slide 31: Brief intro to OSC
Messages are sent to a host (e.g., localhost) and port (e.g., 6448); the listener must listen on the same port.
A message contains a message string (e.g., "/myOscMessage") and optionally some data.
Data can be int, float, or string types.
Listener code may listen for specific message strings & data formats.
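A minimal OSC sketch, assuming the python-osc package (an assumption; the workshop itself uses ChucK and the Wekinator's built-in OSC support): one side sends a message string with float data to a host and port, and a listener bound to that same port handles it. The message string and port come from the slide's examples; everything else is a placeholder.

```python
from pythonosc.udp_client import SimpleUDPClient
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

# Listener: must be bound to the same port the sender targets.
def handler(address, *args):
    print("received", address, args)

dispatcher = Dispatcher()
dispatcher.map("/myOscMessage", handler)          # listen for a specific message string
server = BlockingOSCUDPServer(("localhost", 6448), dispatcher)

# Sender: message string plus some float data.
client = SimpleUDPClient("localhost", 6448)
client.send_message("/myOscMessage", [0.1, 0.5, 0.6])

server.handle_request()                            # process one incoming message
```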

Slide 32: Wekinator: under the hood
[Diagram: features Feature1 ... FeatureN (e.g., joystick_x, joystick_y, pitch, volume, webcam_1) feed models Model1 ... ModelM, each of which produces one parameter (Parameter1 ... ParameterM); example outputs shown include a continuous value (3.3098) and a class (Class 24)]

Slide 33: Under the hood
[Diagram: the same feature-to-model-to-parameter architecture as the previous slide]
Learning algorithms:
Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
Regression: multilayer perceptron neural networks

Slide 34: Interactive ML with Wekinator
[Diagram: the supervised-learning loop from slide 6: training data feed an algorithm that produces a model; at run time the model maps new inputs to outputs such as "Gesture 1"]

Slide 35: Interactive ML with Wekinator
[Same diagram as the previous slide]
Creating training data

Slide 36: Interactive ML with Wekinator
[Same diagram as the previous slide]
Creating training data... evaluating the trained model

Slide 37: Interactive ML with Wekinator
[Same diagram as the previous slide]
Creating training data, evaluating the trained model... modifying the training data (and repeating): interactive machine learning

Slide 38: Time to play
Discrete classifier
Continuous neural net mapping
Free-for-all

