Content-Based Retrieval of Audio
Francois Thibault
MUMT 614B, McGill University

Overview
- Need effective ways to browse audio databases of growing size by content
- Approaches use descriptive sound parameters or query-by-example systems
- Determine similarity to the query in order to rank search results by relevance (an "audio Google")
- Feature selection is the decisive factor

Cheng Yang Approach (1)
- Audio files are preprocessed to identify local peaks in signal power (n = /min)
- Spectrogram computed using an STFT with a 2048-point FFT, a 1024-sample Hamming window, and an overlap factor of 2
- A spectral vector extracted around each peak makes up an (n, 180, k << 2048) feature space ( Hz range only)
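The preprocessing on this slide can be sketched as follows. The STFT parameters (2048-point FFT, 1024-sample Hamming window, overlap factor of 2, i.e. a 512-sample hop) come from the slide; the simple local-maximum peak picking is an assumption for illustration, not Yang's exact detector.

```python
import numpy as np

def spectrogram_peak_features(signal, win_len=1024, fft_len=2048, hop=512):
    """Hamming-windowed STFT; keep the spectral slice at each power peak.

    Window/FFT/hop sizes follow the slide; the peak-picking rule
    (strict local maxima of frame power) is illustrative only.
    """
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # magnitude spectrogram, zero-padded to the 2048-point FFT length
    spec = np.abs(np.fft.rfft(frames, n=fft_len, axis=1))
    power = (frames ** 2).sum(axis=1)
    # frames whose power exceeds both neighbours serve as anchor points
    peaks = [t for t in range(1, n_frames - 1)
             if power[t] > power[t - 1] and power[t] > power[t + 1]]
    return spec[peaks]  # one spectral vector per detected peak
```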

Yang Approach (2)
- Given an example query, compute its feature vector and search the database for similar audio
- Compute the minimum distance between query and database feature sets, saving time with dynamic programming (reusing results from previous pairs)
- Linearity filtering favours time-scaled versions of the query over matches whose alignment deviates from a straight line
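The dynamic-programming step above can be sketched as a generic minimum-cost alignment between two feature sequences. This is a standard DTW-style recurrence, not Yang's exact matcher, which additionally applies the linearity filtering described on the slide.

```python
import numpy as np

def dp_match_cost(query, target):
    """Minimum alignment cost between two feature-vector sequences.

    Dynamic programming reuses the costs of previous subproblems
    (the "previous pairs" of the slide). Yang's actual matcher adds
    linearity filtering to favour time-scaled matches.
    """
    q, t = len(query), len(target)
    cost = np.full((q + 1, t + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, q + 1):
        for j in range(1, t + 1):
            d = np.linalg.norm(query[i - 1] - target[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],   # match
                                 cost[i - 1, j],       # skip in query
                                 cost[i, j - 1])       # skip in target
    return cost[q, t]
```

Identical sequences align along the diagonal at zero cost, so the measure behaves as a distance for ranking.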

Yang's Results
- Database of 120 song excerpts (~1 min each)
- Good performance under varying tempos, audio quality, and performance variations
- Poor performance on transposed versions
- Slow response, improved with indexing schemes

Jonathan Foote Approach
- Calculate feature vectors for audio examples of the desired classes (12 MFCCs plus energy)
- Supervised training of a quantization tree (partitions the feature space into maximally different class populations)
- Parameterized data is quantized with the tree for subsequent retrieval, producing a template
- To retrieve similar audio content, a template is constructed for the query audio and compared with corpus templates using a cosine distance measure
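The template-and-compare step can be sketched as below. Foote quantizes frames with a supervised tree; a flat nearest-centroid codebook stands in here as an assumption, since the point is that quantized-bin histograms ("templates") are compared with cosine similarity.

```python
import numpy as np

def template(features, codebook):
    """Quantize feature frames against a codebook and return a
    normalized bin histogram (the "template").

    A nearest-centroid codebook substitutes for Foote's supervised
    quantization tree in this sketch.
    """
    # distance of every frame to every codebook entry
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

def cosine_similarity(a, b):
    """Cosine similarity between two templates (1.0 = identical shape)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```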

Foote's Results
- A good way of measuring subjective qualities of sound without hand-targeted features
- Less accurate than techniques that use psychoacoustic knowledge at finding similar timbres (e.g. instruments)
- Sensitive to pitch (will often return different timbres at the same pitch)

Erling Wold et al. Approach (1)
- Implemented several approaches in the Muscle Fish software
- In particular, specifies explicit perceptual features (loudness, pitch, brightness, bandwidth, harmonicity)
- Statistics of the corresponding acoustic correlates (mean, variance, autocorrelation), calculated over the entire sample, form a feature vector
- For a training set, the mean vector and the covariance matrix built from the examples become the system's model

Wold Approach (2)
- Uses a weighted Euclidean distance for classification and similarity measurement
- The distance is compared to a threshold to decide whether two objects belong to the same class (optional)
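The distance on this slide can be sketched directly. Weighting each dimension by the inverse of its training-set variance is a common choice consistent with the covariance model of the previous slide, though the exact weighting used by Wold et al. is not stated here.

```python
import numpy as np

def weighted_euclidean(x, mu, weights):
    """Weighted Euclidean distance between a feature vector x and a
    class mean mu.

    weights would typically be inverse per-feature variances from the
    training covariance (diagonal case); that choice is an assumption.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(weights) * diff ** 2)))
```

Classification then reduces to comparing this distance against each class model and, optionally, against the acceptance threshold the slide mentions.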

Wold Approach (3)
- Segmentation is required beforehand; it is achieved with the same features by detecting strong discontinuities

Wold and Foote Comparison
What I retain: Wold has shown that statistical methods can support flexible classification