Modeling Expressive Performances of the Singing Voice Maria-Cristina Marinescu (Universidad Carlos III de Madrid) Rafael Ramirez (Universitat Pompeu Fabra)

The human singing voice
[Word cloud: voice · style · timbre · interpretation · musicality · resonance · technique · color · fluidity]

The human singing voice
[Word cloud: voice · style · timbre · interpretation · musicality · resonance · technique · color · fluidity]
Acoustic features: pitch, timing, timbre, articulation, spectral energy distribution
Verbal features: intonational phrasing

Our long-term goal
Develop models of operatic singers… and generate expressive performances similar in voice quality and interpretation.
Why opera?
– The constrained environment (score, libretto) makes comparison between singers and classification of expressive content easier
– Better voice and technique make singers more effective in performing expressively
Applications:
1. Entertainment tool – generate interpretations of songs never recorded
2. Learning tool – used by professionals to learn different aspects of expressive singing
3. Re-mastering old recordings
4. Understanding the evolution of classical singing in terms of the impact and inspiration of singers

Our long-term goal
Develop models of operatic singers… and generate expressive performances similar in voice quality and interpretation.

In this work
Develop models of operatic singers… and generate expressive performances similar in voice quality and interpretation.
Acoustic features: pitch, timing, timbre, articulation, spectral energy distribution

Timing models of expressive performance
Pipeline: commercial audio recordings → extract → high-level descriptors → train → singer-specific models → evaluate → expressive performance
Properties of musical events: a musical event is a note; inter-note features capture the context of musical events.
Learn predictions for note duration based on high-level descriptor patterns.
Evaluation: predict the duration of each note in a test melody. How close are the performed and predicted values?

High-level descriptors
Characterize the melody based on:
– Score
– Performance

High-level descriptors
Characterize the melody based on:
– Score:
  Note properties: pitch, duration, meter strength, note density
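As a hypothetical illustration of what a meter-strength descriptor might look like (the function and its strength values are invented for this sketch, not the authors' actual computation), one can map a note's onset position within a 4/4 bar to a numeric strength:

```python
# Hypothetical sketch: meter strength for notes in 4/4, derived from the
# onset position within the bar (values and thresholds are illustrative).
def meter_strength(onset_in_bar: float) -> float:
    """onset_in_bar is measured in beats, 0.0 <= onset_in_bar < 4.0."""
    if onset_in_bar % 4 == 0:
        return 1.0   # downbeat: strongest
    if onset_in_bar % 2 == 0:
        return 0.75  # beat 3: medium-strong
    if onset_in_bar % 1 == 0:
        return 0.5   # beats 2 and 4: medium
    return 0.25      # off-beat: weak

print([meter_strength(b) for b in (0.0, 1.0, 2.0, 3.5)])  # [1.0, 0.5, 0.75, 0.25]
```

Any such mapping just needs to be consistent across arias so the learner can exploit it.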

High-level descriptors
Characterize the melody based on:
– Score:
  Note properties: pitch, duration, meter strength, note density
  Context: neighbours' relative pitch and interval length, Narmour structures

High-level descriptors
Characterize the melody based on:
– Score:
  Note properties: pitch, duration, meter strength, note density
  Context: neighbours' relative pitch and interval length, Narmour structures
– Performance: note onset and duration

High-level descriptors
Characterize the melody based on:
– Score:
  Note properties: pitch, duration, meter strength, note density
  Context: neighbours' relative pitch and interval length, Narmour structures
– Performance: note onset and duration
– Score + performance: actual tempo
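The inter-note context descriptors can be sketched as follows. This is a minimal, hypothetical version that only computes each note's pitch interval to its previous and next neighbour (the full descriptor set also includes interval lengths and Narmour I/R structures, which are omitted here):

```python
# Hypothetical sketch: context descriptors for a melody given as MIDI pitches.
# Each note gets its interval (in semitones) to the previous and next note.
def context_descriptors(pitches):
    feats = []
    for i, p in enumerate(pitches):
        prev_int = p - pitches[i - 1] if i > 0 else 0
        next_int = pitches[i + 1] - p if i < len(pitches) - 1 else 0
        feats.append({"pitch": p,
                      "prev_interval": prev_int,
                      "next_interval": next_int})
    return feats

melody = [64, 66, 67, 64]  # invented fragment: E4 F#4 G4 E4
print(context_descriptors(melody)[1])
# {'pitch': 66, 'prev_interval': 2, 'next_interval': 1}
```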

Timing models of expressive performance – extracting descriptors
From commercial audio recordings (.wav files):
– Score pitch and duration: annotated manually
– Note onset and duration: sound analysis techniques based on spectral models
– Meter strength: automatically computed based on note length
– Note density: manually computed for the whole melody
– Narmour structures: I/R analysis automatically generated from the score
– Actual tempo: manually computed based on the score and melody duration
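The "actual tempo" descriptor combines score and performance: the number of score beats divided by the performed duration of the melody. A one-line sketch (the function name and example values are invented):

```python
# Hypothetical sketch: actual tempo in beats per minute, computed from the
# number of beats in the score and the performed melody duration in seconds.
def actual_tempo(score_beats: float, performed_seconds: float) -> float:
    return score_beats * 60.0 / performed_seconds

print(actual_tempo(128, 96.0))  # 80.0 BPM for a 128-beat melody sung in 96 s
```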

Timing models of expressive performance – learning and validation
WEKA – model trees:
– Generate a decision list for the regression problem using separate-and-conquer
– Build a model tree and turn the best leaf into a rule
Test-set evaluation:
– Predict note durations for the test cases
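The slides use WEKA's model trees (M5Rules-style: build a model tree, make the best leaf into a rule, repeat). As a rough stand-in, scikit-learn's DecisionTreeRegressor illustrates the same regression step, though it is not a true model tree (its leaves hold constants rather than linear models). All training values below are invented:

```python
# Stand-in sketch for the WEKA model-tree step using scikit-learn.
# Rows: [pitch, score_duration, meter_strength, note_density] (invented data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[64, 0.5, 1.0, 4.7], [66, 0.25, 0.5, 4.7],
              [67, 0.5, 0.75, 3.6], [64, 1.0, 1.0, 3.6],
              [69, 0.5, 0.25, 5.7], [62, 2.0, 1.0, 5.7]])
y = np.array([0.55, 0.22, 0.48, 1.10, 0.40, 2.05])  # performed durations (s)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict(np.array([[65, 0.5, 0.75, 4.0]]))  # unseen note
```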

Our data set
6 tenor arias by Verdi – 415 notes
1. Operatic
2. A cappella
3. Consistent composition style
4. Consistent interpretation style
5. Live

Our data set
6 tenor arias by Verdi – 415 notes
1. Operatic
2. A cappella
3. Consistent composition style
– Verdi's middle years
– … maybe with the exception of Rigoletto
4. Consistent interpretation style
5. Live

Our data set
6 tenor arias by Verdi – 415 notes
1. Operatic
2. A cappella
3. Consistent composition style
– Verdi's middle years
– … maybe with the exception of Rigoletto
4. Consistent interpretation style
– Josep Carreras, mid-70s to the beginning of the 80s
5. Live

Experimental results – training and testing per aria
Predict note duration as a function of high-level descriptors; compare predicted with actual (performed) durations.
Training/testing task: 80% percentage-split evaluation.

Aria – correlation coefficient:
Ella mi fu rapita! – 0.74
Un dì, felice, eterea – 0.72
De' miei bollenti spiriti – 0.46
Forse la soglia attinse – 0.57
Oh fede negar potessi! – 0.29 (no correlation)
La pia materna mano – 0.20 (no correlation)
All arias (4/4) – 0.56

Widely diverse actual tempos and note densities! (tempos: 110 / 89 / 54 / 76 / …; note densities: … / 5.66 / 3.6 / 3.47 / 4.71)
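The evaluation metric here is the correlation coefficient between performed and predicted note durations. A self-contained sketch of Pearson's r (the duration values below are invented, not from the experiments):

```python
# Pearson correlation between performed and predicted note durations.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

actual    = [0.55, 0.22, 0.48, 1.10, 0.40]  # invented durations (s)
predicted = [0.50, 0.30, 0.45, 0.95, 0.52]
print(round(pearson_r(actual, predicted), 2))
```

A value near 1 means the model tracks the singer's timing transformations closely; values like 0.2–0.3 indicate no usable correlation.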

[Plot: actual vs. predicted relative duration per note number – Ella mi fu rapita!]
To what extent does the model choose the same time transformation for a note as the singer does?

Experimental results – validation of the model
Validation task: test-set evaluation of each 4/4 aria against the learned 4/4 model.

Aria – correlation coefficient:
Ella mi fu rapita! – 0.62 (down from 0.74)
De' miei bollenti spiriti – 0.57
Forse la soglia attinse – 0.94
Oh fede negar potessi! – 0.61 (up from a statistically irrelevant 0.29)
La pia materna mano – 0.80 (up from a statistically irrelevant 0.20)

A larger data set strengthens the model!

[Plot: actual vs. predicted relative duration per note number – Forse la soglia attinse]

Work in progress and future work
Add more arias.
Add more acoustic features to the singer-specific model:
– Currently: energy model
– Future: timbre, exaggeration, pitch range, accent shape, articulation
Add more input parameters:
– Currently: syllable information (open/closed, stressed/unstressed)
– Future: intonational phrasing

Questions?