Educational Software using Audio to Score Alignment. Antoine Gomas, supervised by Dr. Tim Collins & Prof. Corinne Mailhes. 7th of September, 2007
1 Agenda: Introduction; Objectives; Review & Innovation; Work (Dynamic Time Warping, Hidden Markov Models, Interface); Conclusion
2 Audio to score alignment? Associate notes in a score with timing points in a recording. Example
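A minimal sketch of what an alignment result can look like, assuming the aligner labels every recording frame with a score note index (or None for silence) and that frames are a fixed hop apart. The function name `note_onsets` and the 10 ms hop are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional


def note_onsets(frame_labels: List[Optional[int]], hop_s: float) -> Dict[int, float]:
    """Return the onset time (in seconds) of each score note.

    frame_labels[k] is the score note index aligned with recording frame k,
    or None when frame k is not assigned to any note (e.g. silence).
    The onset of a note is the start time of its first aligned frame.
    """
    onsets: Dict[int, float] = {}
    for k, note in enumerate(frame_labels):
        if note is not None and note not in onsets:
            onsets[note] = k * hop_s
    return onsets


# Hypothetical alignment: two silent frames, note 0 on frames 2-4,
# note 1 on frames 5-7, with a 10 ms hop between frames.
print(note_onsets([None, None, 0, 0, 0, 1, 1, 1], hop_s=0.01))
```

Each score note thus becomes a timing point in the recording, which is the association the slide describes.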
3 Project objectives: implement a monophonic audio to score alignment algorithm; evaluate characteristics of the performance; design a learning interface to help music students improve their performance
4 Review (1) Previous work: algorithms already exist; the problem is similar to Spoken Language Processing (SLP); application so far: musicology, on professional recordings
5 Review (2) Previous work (continued): Dynamic Time Warping (DTW): few parameters, computationally heavy, low flexibility. Hidden Markov Models (HMM): very flexible, large number of parameters (requires training)
6 Review (3) Innovation: apply to educational software; this requires modifications & new functionality: cope with errors, detect errors
7 Work: Dynamic Time Warping; Hidden Markov Models; Intelligent Tutoring Systems (ITS) & Interface design
8 DTW (1) Overview: get a first version to work; Attack / Sustain / Silence model; uses Dynamic Time Warping
9 DTW (2) Structure: feature extraction → distance matrix → find optimal path
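The three stages can be sketched as a textbook DTW. This assumes feature extraction has already produced one scalar feature per frame for both the recording and the score-derived template, uses absolute difference as the local distance, and allows the usual diagonal, vertical and horizontal steps; the project's actual features and step constraints are not given on the slide.

```python
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align feature sequences x (recording) and y (template); return cost and path."""
    n, m = len(x), len(y)
    # Distance matrix: local distance between every pair of frames
    d = [[abs(a - b) for b in y] for a in x]
    INF = float("inf")
    # Accumulated cost, padded with an extra row and column of infinities
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = d[i - 1][j - 1] + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Optimal path: backtrack from the end, always taking the cheapest predecessor
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: D[ij[0]][ij[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


# The performer holds the second template frame for two frames.
cost, path = dtw([0, 1, 1, 2, 1], [0, 1, 2, 1])
print(cost, path)
# → 0.0 [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3)]
```

Filling the full n × m matrix is what makes DTW computationally heavy on long recordings.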
10 DTW (3) Instrument model: Silence (energy); Attack (energy); Sustain [figure: guitar and vibes examples]
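The slide lists energy as the feature behind the Silence and Attack states. A minimal sketch of an energy-based instrument model follows. It assumes mean-square frame energy in dB, a hypothetical silence threshold of -40 dB and a hypothetical attack rise of 6 dB. The cost shapes, including "Sustain = audible energy that is not rising" as a rough stand-in for the decaying guitar and vibes notes, are illustrative assumptions rather than the project's model.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-12  # avoids log10(0) on digitally silent frames


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean-square energy of each full analysis frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def state_costs(energies: Sequence[float], silence_db: float = -40.0,
                attack_rise_db: float = 6.0) -> List[Dict[str, float]]:
    """Hypothetical per-frame costs for each note state (lower = better fit).

    Thresholds and cost shapes are illustrative assumptions only.
    """
    levels = [10 * math.log10(e + EPS) for e in energies]
    costs = []
    for k, level in enumerate(levels):
        rise = level - levels[k - 1] if k > 0 else 0.0
        below = max(0.0, silence_db - level)  # how far the frame is under the threshold
        costs.append({
            # Silence: energy should stay under the silence threshold.
            "silence": max(0.0, level - silence_db),
            # Attack: energy should jump by at least attack_rise_db and be audible.
            "attack": max(0.0, attack_rise_db - rise) + below,
            # Sustain: audible energy that is not rising (e.g. a decaying plucked/struck note).
            "sustain": below + max(0.0, rise),
        })
    return costs


signal = [0, 0, 0, 0, 1, -1, 0.5, -0.5, 0.25, -0.25]
costs = state_costs(frame_energies(signal, frame_len=2, hop=2))
print([min(c, key=c.get) for c in costs])
# → ['silence', 'silence', 'attack', 'sustain', 'sustain']
```

Costs like these could fill the distance matrix in place of a plain feature difference.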
11 DTW (4) Results: ~95% of notes aligned on “good” performances. Rhythm errors: very high tolerance, provided pitches are correct. Pitch errors: tuning errors cause no problem; note errors are handled acceptably. Good results, but with limitations
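A percentage such as this depends on what counts as an aligned note. One common way to compute it is sketched below. It assumes one detected onset per reference note and a hypothetical 50 ms tolerance; the slide does not state the criterion actually used.

```python
from typing import Sequence


def percent_aligned(detected: Sequence[float], reference: Sequence[float],
                    tolerance_s: float = 0.05) -> float:
    """Percentage of notes whose detected onset lies within tolerance_s of the reference."""
    if not reference or len(detected) != len(reference):
        raise ValueError("expected one detected onset per reference note")
    hits = sum(abs(d - r) <= tolerance_s for d, r in zip(detected, reference))
    return 100.0 * hits / len(reference)


# Third note detected 200 ms late, so it is counted as misaligned.
print(percent_aligned([0.00, 0.51, 1.20, 1.49], [0.00, 0.50, 1.00, 1.50]))
# → 75.0
```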
12 DTW (5) Limitations: impossible to recover from severe student mistakes; self-correction not perfect
13 HMM (1) Why? Expected: lower computing requirements; flexibility to recover from the student’s errors. Also: use state-of-the-art techniques; find connections with SLP
14 HMM (2) Application to ASA (audio to score alignment), HMM concept → ASA counterpart: observed symbols → recording frames; state trellis → score representation; emission matrix → instrument model; decoded sequence → performance image
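The correspondence can be illustrated with a toy log-domain Viterbi decoder. This sketch rests on invented assumptions. Each recording frame is pre-quantised to a discrete symbol (0 = onset-like, 1 = steady). The state trellis for a two-note score has one attack and one sustain state per note. All probabilities are made up for illustration; the project's actual instrument model and parameters are not given on the slide.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden state sequence for the observed symbols (log domain)."""
    n = len(start)
    # delta[s]: best log-probability of any path ending in state s at the current frame
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    back: List[List[int]] = []  # back[t][s]: best predecessor of s at frame t + 1
    for o in obs[1:]:
        ptr, new_delta = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: delta[r] + _log(trans[r][s]))
            ptr.append(prev)
            new_delta.append(delta[prev] + _log(trans[prev][s]) + _log(emit[s][o]))
        back.append(ptr)
        delta = new_delta
    # Backtrack from the best final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy trellis for a two-note score: 0/1 = note 1 attack/sustain, 2/3 = note 2 attack/sustain.
start = [1.0, 0.0, 0.0, 0.0]
trans = [
    [0.0, 1.0, 0.0, 0.0],  # attack lasts one frame
    [0.0, 0.7, 0.3, 0.0],  # sustain continues or moves to the next note
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],  # last sustain state absorbs the end of the recording
]
# Emission matrix ("instrument model"): symbol 0 = onset-like frame, 1 = steady frame.
emit = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 1, 1, 0, 1, 1], start, trans, emit))
# → [0, 1, 1, 2, 3, 3]
```

The decoded state sequence plays the role of the performance image: it shows which score state each recording frame was attributed to.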
15 HMM (3) Flexibility [Figure: two note topologies. Top: a chain of notes, each with duration D_i and pitch P_i, joined by transition probabilities. Bottom: the same chain with extra states for anticipated errors, Note 6 (D_2, P'_2), Note 7 (D'_3, P_3) and Note 8 (D'_4, P_4), reached through transitions labelled p_12, 1-p_12, p_23, 1-p_23, p_63 and 1-p_63.]
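A minimal sketch of how an anticipated-error path could be added to a note-level topology. It mirrors what the figure appears to show: Note 1 continues either to Note 2 (p_12) or to the altered Note 6 (1-p_12), which then rejoins the score. The sketch ignores durations D, and the helper names `add_alternative` and `paths` and the value p_12 = 0.9 are hypothetical. The figure's further branches (Notes 7 and 8, p_63) are not modelled.

```python
from typing import Dict, List, Tuple

# Note-level topology: note -> list of (next note, transition probability)
Graph = Dict[int, List[Tuple[int, float]]]


def add_alternative(graph: Graph, prev: int, expected: int, alt: int,
                    p_expected: float) -> None:
    """Let note `prev` continue to `expected` with p_expected, or to the
    alternative state `alt` otherwise; `alt` rejoins the score wherever
    `expected` would have gone next."""
    graph[prev] = [(expected, p_expected), (alt, 1.0 - p_expected)]
    graph[alt] = list(graph[expected])


def paths(graph: Graph, start: int, end: int) -> List[Tuple[List[int], float]]:
    """Enumerate every start -> end note sequence with its probability.

    Assumes the graph is acyclic (true for left-to-right score topologies).
    """
    if start == end:
        return [([end], 1.0)]
    result = []
    for nxt, p in graph.get(start, []):
        for tail, q in paths(graph, nxt, end):
            result.append(([start] + tail, p * q))
    return result


# Linear score of notes 1..5, numbered as in the figure.
score: Graph = {i: [(i + 1, 1.0)] for i in range(1, 5)}
# Hypothetical error state 6: note 2 played with an altered pitch (cf. Note 6, P'_2).
add_alternative(score, prev=1, expected=2, alt=6, p_expected=0.9)
for seq, prob in paths(score, 1, 5):
    print(seq, round(prob, 2))
# → [1, 2, 3, 4, 5] 0.9
#   [1, 6, 3, 4, 5] 0.1
```

Because the error state is part of the model, a decoder can follow the student through the anticipated mistake instead of losing its place.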
16 HMM (4) Results: 100% on rhythmic recordings; good on melodic recordings. Rhythm errors: good tolerance, though inferior to DTW. Pitch errors: no data. Severe mistakes: fine when anticipated. Self-correction: more robust than DTW. Tempo estimation not critical
17 HMM (5) Extensions: pitch; other note topologies; improve speed (local algorithm); language; waiting state
18 ITS & Interface (1) Intelligent Tutoring Systems. Knowledge models: domain model (DM), learner model (LM), Open Learner Model; teaching strategies. [Figure: Overlay and Perturbation learner models, each relating DM, LM and teaching strategies]
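The overlay and perturbation learner models named on the slide are standard ITS representations. An overlay model records the learner's knowledge as a subset of the domain model. A perturbation model also stores misconceptions that are not part of the domain model. The sketch below illustrates both under its own assumptions: the skill names, the class names and the `next_focus` teaching strategy (remediate misconceptions first, then practise gaps) are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DomainModel:
    skills: Set[str]  # what a competent player is expected to master


@dataclass
class OverlayLearnerModel:
    """Learner knowledge stored as a subset of the domain model."""
    domain: DomainModel
    mastered: Set[str] = field(default_factory=set)

    def observe_success(self, skill: str) -> None:
        if skill in self.domain.skills:  # overlay: only domain skills can be recorded
            self.mastered.add(skill)

    def gaps(self) -> Set[str]:
        return self.domain.skills - self.mastered


@dataclass
class PerturbationLearnerModel(OverlayLearnerModel):
    """Overlay model plus misconceptions ("bugs") lying outside the domain model."""
    misconceptions: Set[str] = field(default_factory=set)

    def observe_error(self, bug: str) -> None:
        self.misconceptions.add(bug)


def next_focus(lm: OverlayLearnerModel) -> str:
    """Hypothetical teaching strategy: fix misconceptions first, then fill gaps."""
    if isinstance(lm, PerturbationLearnerModel) and lm.misconceptions:
        return "remediate: " + sorted(lm.misconceptions)[0]
    gaps = sorted(lm.gaps())
    return "practise: " + gaps[0] if gaps else "review"


dm = DomainModel({"steady tempo", "note durations", "pitch accuracy"})
lm = PerturbationLearnerModel(dm)
lm.observe_success("pitch accuracy")
lm.observe_error("rushes dotted rhythms")
print(sorted(lm.gaps()), next_focus(lm))
```

Errors detected by the alignment could feed `observe_error`, which is where the HMM's anticipated-error paths and the learner model would meet.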
19 ITS & Interface (2)
20 Conclusion: DTW not suitable for education. Promising HMM results: works without pitch; additional paths for anticipated errors. Still room for improvement: pitch, computational efficiency. Coherent ground together with the interface (IF) design
21 Thank you for listening. Any questions?