Music Information Retrieval
-or- how to search for (and maybe find) music and do away with incipits
Michael Fingerhut
Multimedia Library and Engineering Bureau, IRCAM – Centre Pompidou
IAML - IASA 2004 Congress, Oslo
IRCAM: Institut de Recherche et Coordination Acoustique/Musique
IAML: International Association of Music Libraries
IASA: International Association of Sound Archives
Presented by: Shailesh Deshpande, 06/28/2009
Agenda
- Introduction
- Why MIR?
- Take 1: multi-disciplinary domain
- Take 2: schematic
- Take 3: typology
- Challenges
- IRCAM cataloging tool
Introduction
- Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music
- The paper presents three views of this domain, followed by its challenges
- What is an incipit?
  - The first few words or opening line of a book
  - In music: the first few notes of a composition
Why MIR?
- Storage => increased availability of musical content in digital form (locally): CDs, DVDs, iPods
- Computing power => faster processing of large volumes of digitized content
- Networks => increased availability of musical content in digital form (remotely): Pandora, Yahoo Music, iTunes
- Technological advances + demand from consumers = attention of research and industry
Take 1: multi-disciplinary domain
- General: Computer Science, Data Processing, AI, Pattern Recognition, Library & Information Sciences
- Philosophy and Psychology: Sensory Perception, Emotions & Feelings, Mental Processes & Intelligence
- Social Sciences: Sociology & Anthropology, Culture & Institutions, Law, Commerce
- Natural Science & Mathematics
- General Technology: Electric, Electronic, Magnetic, Communications & Computer Engineering
- The Arts: Music, Aesthetics, Composition
Take 2: schematic representation of MIR
Take 3: a typology of MIR
- Preprocessing
  - OCR, digitization, compression
  - Encoding, notation
- Feature extraction
  - Segmentation
  - Instrument recognition
  - Voice recognition
- Indexing
  - Identification
  - Clustering
  - Classification
- Extraction
  - Melody, Key, Harmony, Rhythm
- Structural analysis
  - Polyphony
  - Repetition
  - Similarity
  - Summarization
- Organization
  - Databases, systems, networks
  - Compression
  - Synchronization
  - Metadata
- Search
  - Objective criteria
    - Metadata indices (name, title, period, genre, instrumentation)
    - Full-text (with or without semantic tags)
    - Query by example (audio excerpt, melody, contour, rhythm, tonality, harmony)
    - Similarity
    - Acoustical characteristics
  - Subjective criteria
    - Mood
    - Taste
- Retrieve, deliver, use
  - Browsing
  - Playlists
  - Using and reusing (annotate, combine, transform)
  - Rights management (recognition, watermarking)
- Usability
  - Evaluation
  - User studies
Music terms used in MIR
- Pitch: the perceived fundamental frequency of a sound. It may differ from the actual frequency content because of harmonics.
- Timbre: the quality of a musical note that distinguishes different types of sound production, such as voices or musical instruments (e.g. saxophone vs. trumpet at the same pitch and loudness)
- Rhythm (aka beat): the variation of the length and accentuation of a series of sounds
- Tempo: the speed or pace of a musical piece. It usually affects the mood of a song.
- Melody: a linear succession of musical tones perceived as a single entity (the 'horizontal' aspect of music)
- Harmony: the simultaneous use of different pitches (the 'vertical' aspect of music)
- Monophony: a musical texture consisting of melody without accompanying harmony
- Polyphony: a texture consisting of two or more independent melodic voices
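To illustrate why perceived pitch can differ from the frequencies actually present, here is a minimal sketch (not from the original talk) that estimates pitch by autocorrelation. The test tone contains only partials at 400, 600 and 800 Hz, yet the estimate lands on their common fundamental of 200 Hz (the "missing fundamental" effect). The function names, the sample rate and the search range are assumptions chosen for the example.

```python
import math

def synthesize(freqs_hz, sample_rate, n_samples):
    """Sum of equal-amplitude sine partials (hypothetical test signal)."""
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs_hz)
        for n in range(n_samples)
    ]

def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=1000.0):
    """Return the frequency whose period (lag) maximizes the autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000  # assumed sample rate for the sketch
# Harmonics of 200 Hz, with no energy at 200 Hz itself.
tone = synthesize([400.0, 600.0, 800.0], SAMPLE_RATE, 1024)
pitch = estimate_pitch(tone, SAMPLE_RATE)
print(f"estimated pitch: {pitch:.1f} Hz")  # about 200 Hz
```

Autocorrelation is only one of several pitch estimators; it is used here because it fits in a few lines of plain Python.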
Common Methods
- Modeling: start from a theory, look for patterns
  - Look for melodies, harmonic progressions
  - Attempt to find elements in the data that correspond to such entities
- Statistical methods: look for patterns, build a theory
  - Perform statistical analysis on the data, find common patterns and group them in clusters
  - Attempt to interpret their occurrence in musical pieces
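The statistical route (find patterns first, interpret them afterwards) could be sketched with a tiny k-means clustering. K-means is just one possible grouping technique and is not named in the slides; the two features per track (normalized tempo and spectral brightness) and all the values below are made up for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (tempo, brightness) features, both scaled to 0..1.
tracks = [(0.20, 0.10), (0.90, 0.80), (0.25, 0.15),
          (0.85, 0.90), (0.10, 0.20), (0.95, 0.85)]
labels, centroids = kmeans(tracks, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The clusters carry no meaning by themselves; interpreting them (for instance as "slow and dark" versus "fast and bright" pieces) is the theory-building step the slide refers to.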
MIR Challenges
- The integration of audiovisual, symbolic and textual data
- Fingerprinting: a small, unique set of features extracted from a sound file that distinguishes it from any other sound file
- Music summarization: how to select a representative excerpt that gives a good idea of the work (similar to thumbnails for image files)
- Computing similarity: there is no unique way in which two pieces may be similar
  - Melodic, rhythmic, timbre, genre and style similarities
- Indexing a musical piece by melody, to allow a query-by-humming (QBH) interface
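One well-known way to index a piece by melody is to store only its contour, for example as Parsons code: "*" for the first note, then U (up), D (down) or R (repeat) for each following note. The sketch below is an illustration, not the method of any system mentioned in the talk; the tiny catalog of opening phrases (as MIDI note numbers) and the function names are assumptions.

```python
def parsons_code(pitches):
    """Encode a note sequence as Parsons code (melodic contour only)."""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        code.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(code)

def build_index(catalog):
    """Map each title to its contour string."""
    return {title: parsons_code(notes) for title, notes in catalog.items()}

def search(index, query_pitches):
    """Return the titles whose contour contains the query contour."""
    needle = parsons_code(query_pitches)[1:]  # drop the leading "*"
    return sorted(title for title, contour in index.items() if needle in contour[1:])

# Hypothetical catalog: opening phrases as MIDI note numbers.
catalog = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Twinkle, Twinkle, Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Frère Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}
index = build_index(catalog)
print(index["Ode to Joy"])                    # *RUURDDD
print(search(index, [70, 70, 72, 75, 75]))    # ['Ode to Joy']
```

Because only the direction of each step is kept, the query matches even though it is sung in a different key and with inexact intervals. The price is ambiguity: short contours match many pieces.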
MIR Challenges (contd.)
- Encoding of music at the acoustic, structural and semantic levels
- Query by example: search for music by singing, humming, whistling or playing an audio excerpt
- Watermarking: adding identification information to digital audio for digital rights management (DRM)
- Benchmarking: only a limited number of standardized test collections are available for evaluating MIR systems
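A sung or hummed query is rarely exact, so query by example usually needs approximate matching. As a minimal sketch under stated assumptions (the query has already been transcribed to MIDI note numbers, and the catalog is a made-up set of opening phrases), the code below compares sequences of pitch intervals, which makes the match independent of key, and ranks candidates by edit distance, which tolerates a few wrong notes.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (transposition invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def rank(query_pitches, catalog):
    """Return (distance, title) pairs, best match first."""
    q = intervals(query_pitches)
    return sorted((edit_distance(q, intervals(notes)), title)
                  for title, notes in catalog.items())

# Hypothetical catalog: opening phrases as MIDI note numbers.
catalog = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Twinkle, Twinkle, Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Frère Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}
# "Ode to Joy" hummed two semitones higher, with one wrong note (68 instead of 67).
hummed = [66, 66, 67, 69, 69, 68, 66, 64]
ranked = rank(hummed, catalog)
print(ranked[0])  # (2, 'Ode to Joy')
```

A single wrong note changes two intervals, hence the distance of 2. Real systems also have to cope with rhythm, pitch tracking errors and queries that start in the middle of a piece, none of which this sketch addresses.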
IRCAM cataloging tool: a tool to catalog and extract audio CD contents for online distribution
- Automatic identification of CDs
  - Compute the CDDB identifier of the CD
  - CDDB identifier: a binary number reflecting the offsets (start times) and lengths of the tracks of the CD
- Metadata retrieval and correction
  - Query Internet CDDB for metadata
  - Allow correction
- Extraction and compression
- Transfer to a Web server
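The identifier described above can be sketched with the commonly documented CDDB/freedb disc ID scheme: a 32-bit number built from a checksum of the track start times, the total playing time and the track count. Whether the IRCAM tool computed exactly this value is an assumption; the track offsets below are invented, and reading them from a real drive is left out.

```python
FRAMES_PER_SECOND = 75  # CD audio addresses are counted in 1/75 s frames

def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

def cddb_disc_id(track_offsets, leadout_offset):
    """Classic CDDB disc ID from track start offsets and lead-out offset (in frames)."""
    starts = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    checksum = sum(digit_sum(s) for s in starts)
    total_seconds = leadout_offset // FRAMES_PER_SECOND - starts[0]
    return ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(starts)

# Hypothetical three-track disc; offsets include the usual 150-frame lead-in.
offsets = [150, 15000, 30000]
leadout = 45000
disc_id = cddb_disc_id(offsets, leadout)
print(f"{disc_id:08x}")  # 08025603
```

Since the ID depends only on the track layout, two different discs can collide, which is one reason the librarian still has to review the metadata that comes back.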
IRCAM tool interface
When a CD is inserted in the computer, the tool:
- Computes its CDDB identifier
- Retrieves the metadata if available (freedb.org, cddb.com, allmusic.com)
- Allows the librarian to correct errors, structure the tracks into works and select names from authority lists
- When done, adds the metadata to the catalog, extracts the tracks, compresses them and sends them to the audio server
Information sources
- The International Society for Music Information Retrieval
- University of Illinois’ Graduate School of Library and Information Science
- IRCAM
- The Listen Game: UCSD Computer Audition Lab MIR music ranking game (Herd It on Facebook)
  - Multi-player game in which you listen to music with lots of other people (aka the Herd)
  - You are asked to describe the music (genre, mood, singer, etc.) and get points when the Herd agrees with you
  - An innovative way to harness the power of social networking and collect metadata for MIR