Music Indexing and Retrieval for Multimedia Digital Libraries King Fahd University of Petroleum and Minerals Information and Computer Science Department.

Music Indexing and Retrieval for Multimedia Digital Libraries. King Fahd University of Petroleum and Minerals, Information and Computer Science Department. ICS 542: Multimedia Computing - Dr. Wasfi Al-Khatib, Spring (072). Nicola Orio, Department of Information Engineering, University of Padua, Italy. Presented by: Mohammed H. Yahia, ID#:

2 Outline
- The problem of retrieving music documents from multimedia digital libraries.
- Similarities and differences between indexing and retrieval of textual and music documents.
- The main approaches to music retrieval.
- A novel methodology.
- Results and conclusions.

3 1. Introduction
- Access to music digital libraries should be content-based. Why? Because the users of musical documents have different levels of knowledge and may range from casual listeners to performers and composers.
- McLane stated that some standard principles of textual information retrieval can be extended to the indexing and retrieval of music documents [30].

4 2. Background
2.1 Music Metadata
2.2 The Dimensions of Music
2.3 Application of Information Retrieval Concepts to Music

5 2.1 Music Metadata
Music metadata describes a number of characteristics of music documents, which can be divided into three main categories:
- Bibliographic values: the title of a complete work, the titles of its parts, the names of the performers or composers, and so on.
- Information on musical form: genre, time and key signatures, tempo, orchestration, and so on.
- Additional available information: for example, lyrics for vocal works.

6 Limitations of the use of metadata for music indexing
- A study [25] showed that users are interested in retrieving songs by their specific content.
- A user needs to have a good knowledge of the music domain.
- General information is often too generic to be a good discriminator between different music works; for example, a user needs to be aware of the difference between a "dabka" and a "chobi" in order to use these metadata effectively to retrieve Arabic music.
- Finally, it is not yet clear whether music can be considered a language that describes something other than itself.

7 2.2 The Dimensions of Music
Four dimensions capture the features of a music document; they are easily recognized by listeners and can be used as tools to describe, analyze, and study music works:
- Rhythm
- Melody
- Harmony
- Timbre
As stated in [26], for any chosen dimension, the indexing scheme has to be based on a suitable definition of the particular lexical units of that dimension and their representation.

8 2.3 Application of Information Retrieval Concepts to Music
The main steps of textual document indexing that are relevant to this discussion are:
1. Lexical analysis
2. Stopword removal
3. Stemming
4. Term weighting

9 Lexical analysis in the music domain
- Lexical analysis describes some high-level characteristics of a music document, related to one or more of its dimensions. (For instance, if rhythm is used to index music documents, the attack times of the different notes have to be automatically detected and filtered.)
- Melodic segmentation can be carried out with many techniques, such as perception-based rules, statistical modeling, and discontinuity detection [38] (a sketch of the latter follows the figure below).
- Difficulty of lexical analysis: the music language lacks separators between candidate index terms in all of its dimensions. (For instance, melodic phrases are not delimited by special signs or sounds that mark the boundary between two lexical units, the way blanks or commas do in textual documents.)

10 Example of the results of different segmentation techniques based on: a) Gestalt principles, b) statistical modeling, c) discontinuity detection
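As an illustration of the third technique, here is a minimal sketch of discontinuity-based segmentation in Python, assuming notes are given as (onset, MIDI pitch, duration) triples; the function name and the threshold values are hypothetical and not taken from the paper.

# Hypothetical sketch: place a phrase boundary wherever the pitch leap or
# the silent gap between two notes exceeds a fixed threshold.
def segment_melody(notes, max_leap=7, max_gap=1.0):
    """notes: list of (onset, midi_pitch, duration), sorted by onset."""
    phrases, current = [], [notes[0]]
    for prev, curr in zip(notes, notes[1:]):
        leap = abs(curr[1] - prev[1])            # pitch discontinuity, in semitones
        gap = curr[0] - (prev[0] + prev[2])      # temporal discontinuity, in beats
        if leap > max_leap or gap > max_gap:
            phrases.append(current)
            current = []
        current.append(curr)
    phrases.append(current)
    return phrases

melody = [(0, 60, 1), (1, 62, 1), (2, 64, 1), (4, 72, 1), (5, 71, 1)]
print(segment_melody(melody))   # splits before the large upward leap

The statistical approaches mentioned above differ precisely in that such boundary decisions are learned from data rather than fixed by hand-chosen thresholds.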

11 Stopword removal
- Stop-lists are seldom used in music document indexing and retrieval.
- It is difficult to state whether or not a musical lexical unit carries meaning.
- Depending on the particular set of features used as content descriptors, the designer of a music retrieval system can make a number of choices about possible stop-lists of lexical units, driven by both musicological and statistical analysis.

12 Stemming
- The idea behind stemming is that two index terms may be different but convey similar meanings. In music, two musical lexical units may be slightly different, yet listeners perceive them as almost identical.
- Quantization is a way to take variants of lexical units into account.

Number of different symbols when quantization is applied to intervals within an octave:

  Symbols | Quantization level
  2401    | cents
  25      | semitones: 0, 1, ..., 12
  15      | intervals: unison, second, ..., octave
  9       | perceptual intervals: unison, small, medium, large
  3       | direction: same, up, down
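To make the table concrete, here is a small sketch of two of the quantization levels (direction and semitones within an octave); the function names are mine and the clamping of out-of-range intervals is an assumption for illustration.

def quantize_direction(delta):
    """3 symbols: same, up, down."""
    return "same" if delta == 0 else ("up" if delta > 0 else "down")

def quantize_semitone(delta):
    """Signed semitone interval, clamped to within one octave (25 symbols)."""
    return max(-12, min(12, delta))

intervals = [0, 2, -1, 9, -14]    # pitch differences in semitones
print([quantize_direction(d) for d in intervals])   # ['same', 'up', 'down', 'up', 'down']
print([quantize_semitone(d) for d in intervals])    # [0, 2, -1, 9, -12]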

13 Term Weighting
- A popular weighting scheme for music indexing is term frequency - inverse document frequency (tf·idf).
- Term frequency seems a reasonable choice: if a musical lexical unit, for any chosen dimension, appears frequently inside a given document, it is very likely that listeners will remember it.
- Inverse document frequency also seems reasonable: a lexical unit that is very common across a collection of documents may simply reflect the style of a thematic collection, and is therefore a poor discriminator.

14 3. Approaches to Music Retrieval
Retrieval can be carried out on any dimension. For example, based on the melody, the user could provide an example of the sound he is looking for (query by example). The main approaches are:
- Computation of index terms, similar to words in textual documents.
- Sequence matching techniques, which consider both the query and the documents as sequences of symbols and model the possible differences between them.
- Geometric methods, which can cope with polyphonic scores and may exploit the properties of continuous distance measures.

15 Melodic Retrieval Based on Index Terms
- N-grams: melodies were indexed through the use of n-grams [11]. This approach did not seem to be robust to decreases in query length.
- Musical phrases: indexing was carried out by highlighting musically relevant sequences of notes [32]. The length of the index terms was not fixed but depended on the musical context.
- Segmentation: patterns were computed using either only rhythm, only pitch, or the combined information, and the final retrieval was carried out using a data fusion approach [41], [37].
- Results: n-grams are still the best choice for index terms, with an average precision of 0.98.
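As a concrete illustration of the n-gram approach (not the exact procedure of [11]), the sketch below indexes a melody by n-grams of pitch intervals, which makes the index terms transposition-invariant; the names and the choice n = 3 are mine.

from collections import Counter

def interval_ngrams(pitches, n=3):
    """Return all n-grams of successive pitch intervals as tuples."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

melody = [60, 62, 64, 65, 64, 62, 60]             # MIDI pitches
index_terms = Counter(interval_ngrams(melody))
print(index_terms)   # e.g. Counter({(2, 2, 1): 1, (2, 1, -1): 1, ...})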

16 Melodic Retrieval Based on Sequence Matching
A representation of the query (an approximate excerpt provided by the user) is compared with the representations of the documents in the collection. Techniques that have been applied to melodic retrieval by sequence matching:
- Approximate string matching: melodies are represented by three symbols (ascending interval, descending interval, and same note) in order to cope with possible mismatches in pitch between the query and the documents.
- Statistical models: a set of themes extracted from the musical documents is modeled.
The computational cost of a single comparison is O(m+n), where m is the length of the query and n is the size of the document, so a single comparison is very efficient; however, the comparison must be repeated for every document in the collection.
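A minimal sketch of approximate string matching on melodic contours, using the three-symbol representation mentioned above; the plain Levenshtein distance used here is an illustrative choice, not necessarily the cost model of the cited systems.

def contour(pitches):
    """Reduce a melody to the three symbols U (up), D (down), S (same note)."""
    return "".join("S" if b == a else "U" if b > a else "D"
                   for a, b in zip(pitches, pitches[1:]))

def edit_distance(q, d):
    """Dynamic-programming edit distance between two symbol strings."""
    prev = list(range(len(d) + 1))
    for i, qs in enumerate(q, 1):
        curr = [i]
        for j, ds in enumerate(d, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (qs != ds)))   # substitution
        prev = curr
    return prev[-1]

doc = contour([60, 62, 64, 65, 64, 62, 60])
query = contour([60, 62, 64, 64, 62, 60])     # sung query with one note missing
print(edit_distance(query, doc))              # small distance despite the error

Such a comparison must be repeated against every document in the collection, which is where the scalability concern discussed in the next sections comes from.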

17 Melodic Retrieval Based on Geometric Methods
- The matching of the query with the documents can be computed in a geometric framework.
- The complete score is represented as a set of points, or lines, on a plane: the vertical axis usually corresponds to pitch, while the horizontal axis corresponds to time. The same representation applies to queries.
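A toy sketch of the geometric view, assuming both score and query are given as sets of (onset, pitch) points: the query is matched by counting how many of its points coincide with score points under the best time translation. This is a deliberate simplification of the published geometric algorithms, which typically also handle transposition and partial overlaps.

from collections import Counter

def best_translation_match(score_points, query_points):
    """Count query points matching score points under the best time shift."""
    score = set(score_points)
    votes = Counter(s_t - q_t
                    for (q_t, q_p) in query_points
                    for (s_t, s_p) in score
                    if s_p == q_p)               # same pitch -> candidate time shift
    if not votes:
        return 0, None
    shift, count = votes.most_common(1)[0]
    return count, shift

score = [(0, 48), (0, 60), (1, 64), (2, 67), (3, 72)]   # polyphonic: bass note + melody
query = [(0, 64), (1, 67), (2, 72)]                     # excerpt starting one beat in
print(best_translation_match(score, query))             # (3, 1): all points match at shift 1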

18 4. A Probabilistic Approach to Music Indexing and Retrieval
The methodology presented in this section combines the idea of sequence matching with an indexing scheme:
- Indexing techniques are efficient and scalable, but do not take into account the presence of errors in the query.
- Sequence matching can directly model the differences between queries and documents, but is characterized by high computational costs.
Figure: common pitch errors in users' queries: a) original melody, b) pitch errors, c) tonality errors, d) insertion errors, e) deletion errors.

19 A Novel Methodology for Describing Music Documents
With the aim of overcoming the drawbacks of both techniques, a novel methodology is proposed for describing music documents:
- Contour models are computed from melodic N-grams, i.e. from sequences of N notes in the melody, and are used as index terms.
- Retrieval is carried out by performing an approximate matching between the lexical units extracted from the query and the contour models computed from the collection.
- The approximate matching is based on an application of Weighted Transducers (WTs) as models for contours.
- Once the most probable contour models corresponding to a query are computed, retrieval is carried out using standard techniques.

20 Summary of the Methodology
At indexing time:
- All the J sequences of N notes in the collection are extracted and used to build WT models M_j, with j ∈ J.
- The models M_j are indexed using an inverted file, which links each M_j with the music documents it belongs to (see the sketch after this slide).
At retrieval time:
- The user's query is transcribed into a sequence of notes, from which all the subsequences Q_i of N notes are computed.
- The probability p(i, j) that Q_i corresponds to model M_j is computed for all i and j.
- The distance between the query and the documents is computed using the Vector Space Model [1], with a variant of the tf·idf weighting scheme.
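The inverted file built at indexing time can be sketched as follows; here each "model" is reduced to a plain tuple of pitch intervals as a stand-in key, whereas in the actual methodology each entry would correspond to a weighted transducer M_j.

from collections import defaultdict

def build_inverted_file(collection, n=5):
    """collection: dict doc_id -> list of MIDI pitches (the extracted melody).
    Returns a dict mapping each contour key to the set of documents containing it."""
    inverted = defaultdict(set)
    for doc_id, pitches in collection.items():
        intervals = [b - a for a, b in zip(pitches, pitches[1:])]
        for i in range(len(intervals) - n + 1):
            inverted[tuple(intervals[i:i + n])].add(doc_id)
    return inverted

collection = {
    "song_a": [60, 62, 64, 65, 67, 65, 64, 62, 60],
    "song_b": [67, 65, 64, 62, 60, 62, 64],
}
print(build_inverted_file(collection))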

21 Description of the Model
A Weighted Transducer is an 8-tuple T = (Q, i, F, Σ, φ, σ, λ, ρ), where:
- Q is the set of N states, each one described by a vector,
- i is the initial state,
- F ⊆ Q is the set of final states,
- Σ is the input alphabet,
- φ is the transition density function, mapping Q × Σ to Q*,
- σ is the output function, mapping Q × Σ to R+,
- λ ∈ R+ is the initial weight,
- ρ is the final weight function, mapping F to R+.
A string w is accepted by the transducer T if there exists f ∈ F such that φ(i, w) = f. The output associated with w is λ + σ(i, w) + ρ(f).

22 WTs and HMMs
WTs are similar to Hidden Markov Models (HMMs), and the three main problems of HMMs can be solved for WTs too.
- Recognition of an unknown sequence: S denotes the sequence of states s_1, ..., s_N, s_k is its generic element, and X is the input sequence.
- The recognition of an unknown input sequence is carried out by computing the probability of the most probable state path corresponding to the sequence of observations, using iterative steps analogous to the Viterbi recursion (a sketch follows below).
- The iteration of these two steps over the input subsequence X_1^k gives the best state path and the weight of the model conditioned by the input.
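Since the recognition step is the WT analogue of HMM decoding, a minimal log-space Viterbi sketch is given below; the dictionary-based model layout and the toy probabilities are assumptions for illustration, not the transducer structure used in the paper.

import math

def viterbi_weight(model, observations):
    """Log-weight of the most probable state path for an observation sequence."""
    states = model["states"]
    # initialisation with the first observation
    delta = {s: model["initial"][s] + model["emit"][s].get(observations[0], -math.inf)
             for s in states}
    # induction: extend the best partial path one observation at a time
    for obs in observations[1:]:
        delta = {s: max(delta[r] + model["trans"][r][s] for r in states)
                    + model["emit"][s].get(obs, -math.inf)
                 for s in states}
    return max(delta.values())    # weight of the best complete path

log = math.log
toy_model = {
    "states": ["q0", "q1"],
    "initial": {"q0": log(0.9), "q1": log(0.1)},
    "trans": {"q0": {"q0": log(0.2), "q1": log(0.8)},
              "q1": {"q0": log(0.5), "q1": log(0.5)}},
    "emit": {"q0": {"up": log(0.7), "down": log(0.3)},
             "q1": {"up": log(0.4), "down": log(0.6)}},
}
print(viterbi_weight(toy_model, ["up", "up", "down"]))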

23 Application of Weighted Transducers to Melody Representation
The information used to describe a melody is based on the pitch and duration of events (notes and rests).
- A preprocessing step transforms the initial information into the array [duration, Δpitch], where duration is expressed in beats and Δpitch(k) = pitch(k) − pitch(k − 1), with pitch(k) the MIDI pitch of event k; Δpitch is thus the difference in semitones between two subsequent notes.
- The input alphabet Σ of the WT is composed of such vectors.
- A generic state in Q is described by a vector computed from duration and Δpitch.
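A small sketch of this preprocessing step, assuming the melody arrives as (onset, duration, MIDI pitch) events; rests are ignored here and the Δpitch of the first note is set to 0, both simplifying assumptions.

def preprocess(events):
    """events: list of (onset_in_beats, duration_in_beats, midi_pitch).
    Returns the [duration, delta_pitch] description used to feed the WT."""
    out, prev_pitch = [], None
    for onset, duration, pitch in events:
        dpitch = 0 if prev_pitch is None else pitch - prev_pitch
        out.append((duration, dpitch))
        prev_pitch = pitch
    return out

events = [(0.0, 1.0, 60), (1.0, 0.5, 62), (1.5, 0.5, 64), (2.0, 2.0, 59)]
print(preprocess(events))   # [(1.0, 0), (0.5, 2), (0.5, 2), (2.0, -5)]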

24 Computation of the Similarity between a Query and the Documents
- Each document in the collection is described by its set of WT models, which is the basis for computing the similarity between a user's query and the documents in the collection.
- For each sequence s extracted from the query, the weight v(s, m) corresponding to each model m is computed using the Viterbi algorithm.
- The output is used to compute a measure of similarity between the query and the documents, using a variant of the tf·idf weighting scheme.
- The standard tf·idf weight is w(m, d) = freq(m, d) · log(N / n_m), where freq(m, d) is the frequency of model m in document d, N is the number of documents in the collection, and n_m is the number of documents that contain contour m.
- The information v(s, m) about the similarity between the query sequences and the document models is added to this scheme by weighting it together with occ_q(s), the frequency of sequence s in query q (see the sketch below).
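The sketch below shows one plausible way to combine these quantities (a simple product of occ_q(s), v(s, m), freq(m, d) and the idf factor, summed over query sequences and models); the exact combination used in the paper may differ, so this is an assumption for illustration only.

import math

def similarity(query_seqs, v, doc_models, doc_freq, n_docs):
    """query_seqs: {s: occ_q(s)};  v: {(s, m): Viterbi weight};
    doc_models: {m: freq(m, d)} for one document d;
    doc_freq: {m: n_m} over the whole collection."""
    score = 0.0
    for s, occ in query_seqs.items():
        for m, freq in doc_models.items():
            idf = math.log(n_docs / doc_freq[m])
            score += occ * v.get((s, m), 0.0) * freq * idf
    return score

query_seqs = {"s1": 2}                          # one subsequence, occurring twice in the query
v = {("s1", "m1"): 0.9, ("s1", "m2"): 0.1}      # Viterbi weights from the WT step
doc_models = {"m1": 3, "m2": 1}                 # freq(m, d) in this document
doc_freq = {"m1": 5, "m2": 40}                  # n_m over the collection
print(similarity(query_seqs, v, doc_models, doc_freq, n_docs=100))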

25 5. Results
- The approach was tested using 2004 songs in MIDI format.
- The melody was extracted from each song and used to build the set of WT models.
- A set of 31 queries, provided by the MAMI team, was used.
- The retrieval procedure described in Sect. 4.3 was applied with different values of N_max.
- Shorter sequences (one or two notes) were considered too generic to be good content descriptors, while longer sequences (more than seven notes) increased the computational complexity.

26 Results
- Models with longer contours improve the average precision, because they exploit more information about the melody.
- The recall level does not show a clear trend as the model size increases, because the number of songs found in the first 50 positions is nearly constant.

27 6. Conclusions
- This paper presents an overview of the problems and characteristics of music retrieval, and introduces a novel methodology based on approximate indexing of music documents.
- The methodology was tested on a collection of music documents, using a set of audio queries, with encouraging results.