Music Information Retrieval

1 Music Information Retrieval
D Chapt. 9.2

2 Information Retrieval
Aim: effectively retrieve documents whose content is relevant to the user's information needs.
- Techniques to index, search, and retrieve documents from collections
- Best results when the collection of documents and the user's queries are written in textual form and in the English language, using statistical and probabilistic techniques
- Content-based multimedia access: each medium requires specific techniques, and these techniques should be integrated whenever different media are present

3 Digital Music and Digital Libraries
Increasing interest towards music stored in digital format:
- music is an art form that can be shared by people with different cultures
- technology allows for access that is almost comparable to listening to a live performance
- music is an art form that can be both cultivated and popular
Music Digital Libraries:
- access by users from all over the world
- preservation of cultural heritage
- an important multimedia DL component

4 Music Information Retrieval
Need for specific and effective techniques. Current approaches to music IR are based either on:
- string matching algorithms, but: poor results
- textual bibliographic catalogues, but: content-based retrieval impossible

5 Architecture for Music Retrieval
Architecture diagram with the following components: Users, Music Query Interface, Music Player, Music Query Processor, Music Feature Extractor, Music Storage Manager, Music Database, and Music Index. Music queries flow from the interface to the query processor, which matches music features against the index; the storage manager handles the music objects in the database and returns the result music objects.

6 9.2.2 Content-based MIR
Music can be considered as another medium together with text, image, video, and speech.
Specific issues: form, instantiation, dimension, content, perception, user profile, and formats.

7 9.2.2.1 Peculiarities of the Music Language
A music work can be represented in two main forms:
- the notated form → score
- the acoustic form → performance
Each music work may have different instantiations:
- several performances correspond to an individual score
- the same music work may be transcribed into different scores
Different dimensions characterize the information conveyed by music:
- e.g. melody, harmony, rhythm, and structure
- e.g. timbre, articulation, and timing
The choice of a representation format has a direct impact.

8 Peculiarities of the Music Language (cont.)
It is still unclear what type of content, if any, music works convey.
E.g. the concept of tempest:
- Shakespeare
- Giorgione
- a tornado (video)
- Beethoven's Sixth Symphony, 4th movement
- Rossini's overture to "William Tell"
- Vivaldi's concerto "La Tempesta di Mare"
In principle, music language does not convey information.
Music and emotions. Program music (musica a programma).

9 Peculiarities of the Music Language (cont.)
How music is perceived and processed by listeners can highlight which kind of content is carried by this medium.
Listeners perceive music as structured and consisting of different basic elements.
It is likely that all the dimensions of the music language can be segmented into their lexical units, which can be used to extract content from a music document.

10 9.2.2.3 Forms of Music Documents
Music documents can be instantiated as:
Symbolic/structured documents
- represent the information on the music score → only musically trained users can read them
- indicate how the piece is to be played by the performer
- contain information on melody, harmony, structure, key and time signatures, …
Audio/unstructured documents
- are related to the perception of music by listeners → all users can listen to them
- are audio recordings of real performances
- contain information on timbre, expressive gestures, …

11 Formats of Music Documents
Symbolic documents – scores
- image of the original score (bmp, pdf, eps, jpeg)
- text-based open-source representations (abc, Guido)
- proprietary/commercial representations (Finale, Sibelius, Cubase)
- Musical Instrument Digital Interface (MIDI)
Audio documents – performances
- time domain with pulse code modulation (wav, aiff)
- time domain with compression (MP3, WMA)
- frequency domain, with possible compression (sdiff, MPEG-7)

12 Symbolic Scores
Information that can be directly extracted:
- all the melodic lines in a polyphonic piece
- key and time signatures
- exact timing and exact pitch for all the notes
Information that can be extracted with some effort:
- structure, musical form (e.g., sonata, minuetto)
- chords and harmonic progressions
- lexical units in the melodic profiles
Information that can never be extracted:
- acoustic features and timbre
- expressive timing and interpretation

13 Audio Performances
Information that can be directly extracted:
- estimates of timbre and room acoustics
Information that can be extracted with some effort:
- beat and rhythm
- main (i.e., loudest) voice in a polyphony
- melodic lines in simple polyphony (max 3-4 voices)
- chords
- estimates of music genre
Information that cannot be extracted yet:
- all the melodies in a complex polyphonic piece
- exact timing and exact pitch for the notes, even in monophonic scores
- harmony, structure

14 From Score to Performance
Diagram: score → performance, mediated by several variables (musicians, instruments, musical style, room acoustics, recording equipment).
Only rough estimates of a performance can be computed from a score:
- most of the variables are not known or vary in time
- there are some constant features that allow a listener to recognize a score independently of these variables

15 From Performance to Score
Diagram: performance → score, through several techniques (beat following, pitch tracking, rhythm quantization, harmonic analysis, structure recognition).
At the state of the art it is almost impossible to reconstruct a score from a performance:
- algorithms are still error prone
- difficulties in simulating human perception
- the task is quite difficult even for trained musicians

16 Music Representations

17 Music Representation - Theme
A theme is a short tune that is repeated or developed in a piece of music; a small part of a musical work.
- Efficient retrieval
- A highly semantic representation
- Effective retrieval
- Automatic theme extraction: exact repeating patterns, approximate repeating patterns

18 Statistical tools for MIR

19 Markov Models An approach to MIR is to describe a complete theme as a Markov process Capture global information for a music piece Repeating patterns Sequential patterns A simplistic and lossy representation The Markov property does not necessarily hold for music It is impossible to recreate the document from the model Good for music classification Markov models, and their extension to Hidden Markov Models are widely used for many recognition tasks Speech recognition, biological sequence analysis, handwritten and gesture recognition

20 Weather: A Markov Model
Diagram: a three-state weather Markov model with states Sunny, Rainy, and Snowy; the arcs carry the transition probabilities (80%, 15%, 5%, 60%, 2%, 38%, 20%, 75%, 5%).

21 Ingredients of a Markov Model
- States: Sunny, Rainy, Snowy
- State transition probabilities: the percentages on the arcs of the weather diagram
- Initial state distribution

22 Ingredients of Our Markov Model
- States: Sunny, Rainy, Snowy
- State transition probabilities: as in the weather diagram (80%, 15%, 5%, 60%, 2%, 38%, 20%, 75%)
- Initial state distribution

23 Probability of a Time Series
Given the Markov model and an observed series of states: what is the probability of this series?
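For reference, a standard formulation (not taken from the slide's own figures): for a first-order Markov model with initial distribution \pi and transition probabilities a_{ij}, the probability of a state series is

P(q_1, q_2, \ldots, q_T) = \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}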

24 Markov model definition
A process that goes through a sequence of discrete states; a probabilistic automaton:
- S, a set of N states
- A, a set of transition probabilities
- Π, the probability of being the initial state
- E, a subset of S containing the legal ending states
Markov property: the probability of transitioning from a given state to another state depends only on the current state.
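A minimal Python sketch of such an automaton, using the weather states of the earlier slides; the initial and transition probabilities below are illustrative assumptions, not values taken from the course material.

# Minimal first-order Markov chain (all numeric values are assumed for illustration).
states = ["Sunny", "Rainy", "Snowy"]

# pi[s]: probability that the chain starts in state s (assumed values)
pi = {"Sunny": 0.6, "Rainy": 0.3, "Snowy": 0.1}

# A[s][t]: probability of moving from state s to state t (assumed values)
A = {
    "Sunny": {"Sunny": 0.80, "Rainy": 0.15, "Snowy": 0.05},
    "Rainy": {"Sunny": 0.38, "Rainy": 0.60, "Snowy": 0.02},
    "Snowy": {"Sunny": 0.20, "Rainy": 0.75, "Snowy": 0.05},
}

def sequence_probability(seq):
    """P(q1..qT) = pi[q1] * prod_t A[q_{t-1}][q_t], using the Markov property."""
    p = pi[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        p *= A[prev][curr]
    return p

print(sequence_probability(["Sunny", "Sunny", "Rainy", "Snowy"]))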

25 Markov Model representation example
Homophonic reduction: for each chord, compute its distance to the 24 lexical chords.
Capture statistical properties by Markov models: the representation of each song is reduced to a matrix over the lexical chords (chord-to-chord transitions).
[Pickens and Crawford, CIKM '02]

26 Hidden Markov Models (HMMs)
Hidden Markov Models are probabilistic finite state automata.
HMMs are an extension of Markov models, where states are not directly observable.
HMMs can be represented as a directed graph with a number N of states q1…qN:
- at each time step t the HMM performs a transition from state q(i) at time t to state q(j) at time t+1
- state q(j) at time t+1 emits symbol o(t+1)
(Diagram: states i, h, k, j across times t and t+1, with the emission o(t+1).)

27 Hidden Markov Models (cont.)
HMM properties:
- each state has a given probability of being the initial state: P(q(1))
- transitions are ruled by probability distributions that are independent of the previous path: P(q(t+1) | q(t)…q(1)) = P(q(t+1) | q(t))
- emissions are ruled by probability distributions that depend only on the last state: P(o(t+1) | q(t+1), q(t)…q(1)) = P(o(t+1) | q(t+1))
Since states are not observable, many state sequences may correspond to the same sequence of emissions.
HMMs are useful to model the evolution of an unknown process that generates a sequence of observations.

28 Hidden Markov Models (cont.)
A Hidden Markov Model λ is completely defined by:
- number of states: N
- alphabet of symbols emitted by the model: k ∈ K
- probability of being the initial state: πi
- transition probability from state qi to state qj: aij
- emission probability of symbol k by state qj: bj(k)
Application to music:
- states qi, i = 1…N, are the notes in a score
- symbols k are the features, e.g. MIDI pitch, spectral peaks
- initial state probability = 1 for the first note in the score
- transition probabilities depend on counting how many times a pitch is followed by another in the score
- emission probabilities model errors in the query
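A minimal sketch of such a score-based HMM in Python; the score, the MIDI pitches, and all probability values below are invented for illustration and are not taken from the chapter.

# Sketch of an HMM built from a (hypothetical) three-note score.
# States: the notes of the score; observations: MIDI pitches sung by the user.
states = ["C4", "D4", "E4"]

# Initial state probability: 1 for the first note in the score.
pi = {"C4": 1.0, "D4": 0.0, "E4": 0.0}

# Transition probabilities: here they simply follow the order of the notes.
A = {
    "C4": {"C4": 0.0, "D4": 1.0, "E4": 0.0},
    "D4": {"C4": 0.0, "D4": 0.0, "E4": 1.0},
    "E4": {"C4": 0.0, "D4": 0.0, "E4": 1.0},
}

# Emission probabilities model errors in the query: the correct MIDI pitch is
# the most likely observation, neighbouring pitches get smaller probabilities.
B = {
    "C4": {60: 0.7, 59: 0.1, 61: 0.1, 58: 0.05, 62: 0.05},
    "D4": {62: 0.7, 61: 0.1, 63: 0.1, 60: 0.05, 64: 0.05},
    "E4": {64: 0.7, 63: 0.1, 65: 0.1, 62: 0.05, 66: 0.05},
}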

29 Hidden Markov Models
Diagram: the weather Markov model, in which the states (Sunny, Rainy, Snowy) and their transition probabilities are NOT OBSERVABLE; each hidden state emits observable symbols according to its own emission probability distribution.

30 Ingredients of an HMM
- States
- State transition probabilities
- Initial state distribution
- Observations
- Observation probabilities

31 Ingredients of Our HMM
- States: Sunny, Rainy, Snowy
- Observations
- State transition probabilities
- Initial state distribution
- Observation probabilities

32 Probability of a Time Series
Given an HMM and a series of observations: what is the probability of this series?
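A minimal sketch of the standard forward computation that answers this question, reusing the hypothetical score HMM (states, pi, A, B) defined in the earlier sketch.

def forward_probability(observations, states, pi, A, B):
    """P(O | lambda), summed over all state paths (forward algorithm)."""
    # alpha[s]: probability of the observations seen so far, ending in state s
    alpha = {s: pi[s] * B[s].get(observations[0], 0.0) for s in states}
    for o in observations[1:]:
        alpha = {
            s: sum(alpha[r] * A[r][s] for r in states) * B[s].get(o, 0.0)
            for s in states
        }
    return sum(alpha.values())

# Example: probability that the score HMM generated the sung pitches 60, 62, 64.
print(forward_probability([60, 62, 64], states, pi, A, B))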

33 Examples: Melody modeling through HMMs
Given a melody excerpt (C D A G F, with repeated fragments), it can be modeled by an HMM whose states are the notes of the melody; the transition probabilities (e.g. 0.5, 1, 0.66, 0.33) reflect the note order and repetitions, and the emission probabilities are distributions over MIDI pitches centered on the correct pitch (e.g. pitch 60 emitted with probability 0.7, neighbouring pitches with 0.10 and 0.05).

34 The three main problems of HMMs
Given a sequence of observations O = o1 … oT and a model λ:
- Recognition: compute the probability P(O|λ). If there is a set of competing models λ1 … λN, recognition is carried out by computing λ* = argmaxi P(o1 … oT | λi)
- Decoding: compute the most probable path across the states. Viterbi decoding computes the most probable global path q1…qT = argmaxq P(q(1)…q(T) | o1 … oT, λ)
- Training: compute the model parameters (πi, aij, bj(k)) that maximize the probability P(O|λ). Usually the expectation-maximization algorithm is used
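A minimal Viterbi decoding sketch for the second problem, again over the hypothetical score HMM introduced above (an illustrative implementation, not the chapter's own).

def viterbi(observations, states, pi, A, B):
    """Most probable state path q1..qT given the observations."""
    # delta[s]: probability of the best path ending in state s; paths stores that path.
    delta = {s: pi[s] * B[s].get(observations[0], 0.0) for s in states}
    paths = {s: [s] for s in states}
    for o in observations[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] * A[r][s])
            new_delta[s] = delta[best_prev] * A[best_prev][s] * B[s].get(o, 0.0)
            new_paths[s] = paths[best_prev] + [s]
        delta, paths = new_delta, new_paths
    best_last = max(states, key=lambda s: delta[s])
    return paths[best_last], delta[best_last]

# Example: which notes of the score best explain the sung pitches 60, 62, 64?
print(viterbi([60, 62, 64], states, pi, A, B))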

35 The user

36 9.2.2.2 User Information Needs – 1
"Find me the song that sounds like this"
- Example is a melody sung or hummed by the user
- Retrieval based on melodic features
- Can be performed on symbolic documents (very hard to carry out on audio documents)
- User wants audio performances
"I like this one, give me more songs that I may like"
- Example can be a sung melody, an audio excerpt, or a reference to a song
- Retrieval based on similarity measures using melodic features, rhythm, timbre, musical genre, …
- Can be performed on both symbolic and audio documents
- User wants audio performances

37 User Information Needs – 2
"I am searching for all the pieces with these characteristics"
- Example is an excerpt of a score
- Retrieval based on high-level features
- Can be performed on symbolic documents only
- User has good knowledge of the domain and wants symbolic scores
"Tell me who is playing this song, and when"
- Example is an audio excerpt (e.g., recorded from the radio)
- Retrieval based on approximate match between audio features
- Can be performed on audio documents only
- User wants song metadata, and possibly where to buy/download it

38 User Information Needs – 3
"Help me reorganize my digital music collection"
- Collection of audio documents, with metadata
- Rather a classification task, aimed at easy browsing
Professional: "Find me another song with this beat"
- Example is an audio excerpt or a sung rhythm
- Retrieval based on exact match
- User wants audio performances
Professional: "Search for a suitable soundtrack"
- Examples are high-level features, a textual description, possibly an example of something similar
- Retrieval based on descriptors of different dimensions
- User may want either symbolic scores or audio performances, depending on the context

39 9.2.2.4 Dissemination of Music Documents
Evaluation of document relevance: a central role is played by time in listening to music.
- Listen to the full piece
- Listen to the incipit, i.e. the first notes. But …

40 Data-based MIR
Data-based music IR systems allow users to search databases by specifying exact values for predefined fields, such as composer name, title, date of publication, type of work, etc.
- content-based retrieval almost impossible
- bibliographic values are not always able to describe exhaustively and precisely the content of music works
- searching by composer name can be very effective

41 9.2.3.2 Content-based MIR
Content-based MIR approach:
- take into account the content of the music document, e.g. notation or performance
- automatically extract some features to be used as content descriptors, e.g. incipits or other melody fragments, timing or rhythm, instrumentation
Typical content-based approaches are based on the extraction of note strings from the full score.
Some approaches are oriented to disclosing the semantic content of music documents using some music information.

42 Content-based MIR (cont.)
On-line searching techniques
- compute a match between a representation of the query and a representation of the documents each time a new query is submitted to the system
- allow for a direct modeling of query errors
- high computational costs
Indexing techniques
- extract off-line from the music documents all the relevant information that is needed at retrieval time, and perform the match between the query and the document indexes
- are more scalable to the document collection
- extraction of document content is more difficult

43 On-line methods (string matching algorithms)
Exact string matching
- brute-force method
- KMP algorithm
- Boyer-Moore algorithm
- Shift-Or algorithm
Partial string matching
Approximate string matching
- dynamic programming (see the sketch after this list)
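A minimal sketch of approximate string matching by dynamic programming (edit distance), applied here to sequences of pitch intervals; the interval encoding and the example values are assumptions for illustration.

# Approximate matching of a melodic query against an indexed melody, both
# encoded as pitch intervals (differences between consecutive MIDI pitches).
def edit_distance(query, melody):
    """Dynamic-programming edit distance between two interval sequences."""
    m, n = len(query), len(melody)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == melody[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

query_intervals = [2, -3, 5]       # hypothetical sung query
melody_intervals = [2, -3, 4, 0]   # hypothetical indexed melody
print(edit_distance(query_intervals, melody_intervals))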

44 Content-based MIR (cont.)
Most of the approaches are based on melody, while other music dimensions, such as harmony, timbre, or structure, are not taken into account.
Query by humming:
- melody
- rhythm
- lyrics
Query processing: represent the query either as a single note string, or as a sequence of smaller note fragments
- arbitrary note strings, such as n-grams
- fragments extracted using melody information

45 Techniques for Music Information Retrieval
Feature: one of the characteristics that describe subsequent notes in a score.
Examples: the pitch, the pitch interval with the previous note (PIT), a quantized PIT, the duration, the interonset interval with the subsequent note (IOI), the ratio of the IOI with the previous note.
All the features can be normalized or quantized (see the sketch below).
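A minimal feature-extraction sketch following these definitions; the note representation (MIDI pitch, onset time) is an assumed format.

# Extract PIT (pitch interval), IOI and IOI-ratio features from a note list.
# Each note is a (midi_pitch, onset_time_in_seconds) pair -- an assumed format.
def extract_features(notes):
    pitches = [p for p, _ in notes]
    onsets = [t for _, t in notes]
    pit = [b - a for a, b in zip(pitches, pitches[1:])]         # pitch intervals
    ioi = [b - a for a, b in zip(onsets, onsets[1:])]           # interonset intervals
    ioi_ratio = [b / a for a, b in zip(ioi, ioi[1:]) if a > 0]  # ratio with previous IOI
    return pit, ioi, ioi_ratio

notes = [(60, 0.0), (62, 0.5), (64, 1.0), (62, 2.0)]  # hypothetical melody
print(extract_features(notes))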

46 Terminology (cont.)
String: a sequence of features. Any sequence of notes in a melody can be considered a string; the effectiveness by which each string represents a document may differ.
String length: long strings are effective for search, but difficult to remember.
Pattern: a string that is repeated at least twice in the score, with length n and number of times r it is repeated inside the score. Algorithms for pattern discovery.
n-grams: all the strings of length n of a document; they describe the document (see the sketch below).

47 9.2.5 Document indexing
Indexing is a mandatory step for textual information retrieval. Through indexing, the relevant information about a collection of documents is computed and stored in a format that allows easy and fast access at retrieval time. It is faster to search for a match inside the indexes than inside the complete documents.
Patterns may be effective descriptors for document indexing.
Exhaustive pattern discovery, but:
- patterns that have little or no musical meaning
- patterns that appear in almost all documents (e.g. scales)

48 Measuring pattern relevance: tf * idf
Term frequency (tf): number of occurrences of a given pattern inside a document.
Inverse document frequency (idf): takes into account the number of different documents in which a pattern appears.
Relevant patterns of a document have a high tf and/or a high idf.
The value of each pattern is given by the tf * idf product (see the sketch below).
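A minimal tf * idf weighting sketch for patterns; the log-based idf form is a common choice assumed here, not necessarily the exact weighting used in the chapter.

import math

# documents: one list of patterns per music document.
def tf_idf(documents):
    n_docs = len(documents)
    # document frequency: in how many documents each pattern appears
    df = {}
    for doc in documents:
        for pattern in set(doc):
            df[pattern] = df.get(pattern, 0) + 1
    weights = []
    for doc in documents:
        w = {}
        for pattern in set(doc):
            tf = doc.count(pattern)               # occurrences in this document
            idf = math.log(n_docs / df[pattern])  # patterns appearing everywhere get 0
            w[pattern] = tf * idf
        weights.append(w)
    return weights

docs = [[(2, 2, -4), (2, 2, -4), (5, -1, -2)],
        [(5, -1, -2), (0, 2, -2)]]                # hypothetical pattern lists
print(tf_idf(docs))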

49 Indexing
Document representation: array of patterns.
The index is built as an inverted file; each entry is a (hashed) pattern.
A maximum allowable pattern length is set to improve indexing.
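A minimal inverted-file sketch; the structure (pattern → set of document identifiers) is a common choice assumed here.

from collections import defaultdict

# Each entry of the inverted file maps a pattern to the documents containing it.
def build_inverted_file(documents):
    index = defaultdict(set)
    for doc_id, patterns in enumerate(documents):
        for pattern in patterns:
            index[pattern].add(doc_id)
    return index

docs = [[(2, 2, -4), (5, -1, -2)],
        [(5, -1, -2), (0, 2, -2)]]   # hypothetical pattern lists
index = build_inverted_file(docs)
print(index[(5, -1, -2)])            # documents containing this pattern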

50 Query Processing
Query-by-example paradigm: by singing (humming or whistling), by playing, or by editing a short excerpt of the melody.
A query is likely to contain relevant patterns.
Extracting relevant strings, or potential patterns, from a query consists in computing all its possible substrings.
Most of the arbitrary strings of a query will never form a relevant pattern inside the collection.

51 Ranking Relevant Documents
Retrieval Status Value (RSV): distance between the vector of strings representing the query and the vector of patterns representing each document.
Ordering depends on the selected features, e.g. IOI, PIT, or both (BTH).
Data fusion approach to ranking: combine the different rankings by a weighted sum (see the sketch below).
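A minimal data-fusion sketch combining the RSVs obtained with different features by a weighted sum; the feature names and weights are illustrative assumptions.

# Combine per-feature retrieval status values (RSVs) by weighted sum.
def fuse_rankings(rsv_by_feature, weights):
    fused = {}
    for feature, rsvs in rsv_by_feature.items():
        for doc_id, score in rsvs.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weights[feature] * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

rsv = {"PIT": {"doc1": 0.8, "doc2": 0.4},
       "IOI": {"doc1": 0.3, "doc2": 0.6}}        # hypothetical per-feature RSVs
print(fuse_rankings(rsv, {"PIT": 0.7, "IOI": 0.3}))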

52 The phases of a methodology for MIR
Indexing, retrieval, and data fusion

53 9.2.5.3 Measures for Performances of MIR Systems
MIR produces a ranked list of documents. Only the final user can judge the relevance of retrieved documents.
Causes of dissatisfaction:
- silence effect: the system does not retrieve documents that are relevant for the user's information needs
- noise effect: the system retrieves documents that are not relevant for the user's information needs
All real MIR systems try to balance these two negative effects.
User studies are very expensive and time consuming. Need for automatic evaluation of the proposed systems: recall and precision.

54 Measuring Search Effectiveness
Precision and recall are the basic measures used in evaluating search strategies.

55 Measuring Search Effectiveness (cont.)
RECALL is the ratio of the number of relevant records retrieved to the total number of relevant records in the database. PRECISION is the ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved.

56 Measuring Search Effectiveness (cont.)
There is a general inverse relationship between recall and precision.
To achieve high recall, the searcher must include synonyms, related terms, broad or general terms, etc. for each concept → precision will suffer.

57 Measuring Search Effectiveness (cont.)
In ranked lists:
- compute these measures also for the first N documents
- compute the precision at given levels of recall
- average precision: the mean of the different precisions computed each time a new relevant document is observed in the ranked list (see the sketch below)
Evaluation on a test collection:
- a set of documents
- a set of queries
- a set of relevance judgments that match documents to queries
Relevance judgments should normally be given by a pool of experts.
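A minimal sketch of precision, recall, and average precision over a ranked list, following the definitions on these slides; the example ranking and relevance judgments are invented.

# ranked: document ids in ranked order; relevant: set of relevant document ids.
def average_precision(ranked, relevant):
    hits, precisions = 0, []
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / i)   # precision each time a relevant doc is found
    # mean of those precisions, as defined on the slide
    return sum(precisions) / len(precisions) if precisions else 0.0

ranked = ["d3", "d1", "d7", "d2", "d5"]   # hypothetical ranked list
relevant = {"d1", "d2", "d9"}
precision = len(relevant & set(ranked)) / len(ranked)
recall = len(relevant & set(ranked)) / len(relevant)
print(precision, recall, average_precision(ranked, relevant))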

58 Combining content and context
The audio signal does not capture all the relevant features; there is usually no contextual information:
- usage in a movie
- historical period
- association with a particular situation
Contextual information has limitations as well:
- creating metadata is a long and error-prone process
- new songs lack any kind of information

59 Approach
Model at the same time:
- similarity based on audio content, represented by timbre descriptors
- the contextual description of songs, represented by textual descriptors (aka tags)
HMMs can be used, assuming that:
- states represent songs
- transitions are content-dependent
- observations are context-dependent

60 Modeling content & context

61 Another use of the model
HMMs are also a generative model.
Given a new song, add it to the graph; its position depends on audio similarity, based on the different dimensions.
Which are the tags observed "around" the new song? Different strategies to discover new tags:
- random walk
- optimal path (Viterbi)
- shortest path (Dijkstra)
- nearest neighbor
- clustering
Results have been evaluated on an experimental collection.

62 Autotagging – Results

63 Coda: ClipMark
Diagram: the soundtrack goes through feature extraction; audio identification (using the song index) provides the initial match, on-line alignment follows the incoming audio against the song index, and an alignment supervisor monitors the process before producing the system output.

64 Main Components
Approach based on audio processing techniques. Audio is immersive, low-dimensional, and easy to capture from any portable device.
Three interacting modules:
- Audio identification: responsible for the initial synchronization
- On-line alignment: continuously follows the incoming audio
- Alignment supervisor: monitors the alignment and restarts the synchronization if needed
Demo…

65 Questions?

