Robust Audio Identification for Commercial Applications
Matthias Gruhne
Fraunhofer Institut Integrierte Schaltungen (Fraunhofer IIS), AEMT, D Ilmenau, Germany
Overview
– What is AudioID?
– Requirements
– System Architecture
– MPEG-7
– Recognition Performance
– Applications
– Conclusions
– Demonstration
What is AudioID?
Purpose:
– Identify audio material (artist, song, etc.) by analysis of the signal itself (content-based identification)
Conditions:
– No associated information (headers, ID3 tags) is required
– No embedded signals (e.g. watermarks) are required
– Some knowledge about the music to be identified must be available (reference database)
Requirements
Recognition rate:
– High recognition rates (> 95%), even with distorted signals
Robustness:
– Robust against various distortions:
  – volume change, equalization, noise addition, audio coding (e.g. MP3), ...
  – analog artifacts (e.g. D/A, A/D conversion)
Compactness:
– Small signature size
Scalability:
– Extensibility of the database (> 10^6 items) while keeping processing time low (a few ms/item)
System Architecture - Overview
System Architecture
Feature Extractor:
– Signal preprocessing
– Extract the essence of the audio signal
Feature Processor:
– Increase discriminative power and efficiency
– Temporal grouping of features (super vector)
– Statistics calculation (mean, variance, etc.)
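The feature processor steps (temporal grouping followed by statistics calculation) can be illustrated with a minimal sketch. This is not the AudioID implementation: the function name `group_statistics`, the group size, and the choice to concatenate means and variances into one super vector are assumptions made only for illustration.

```python
def group_statistics(frames, group_size):
    """Group consecutive per-frame feature vectors and summarize each group.

    frames: list of feature vectors (lists of floats), one per analysis frame.
    Returns one super vector per full group: per-dimension means followed by
    per-dimension (population) variances. Trailing frames that do not fill a
    whole group are dropped (an assumption of this sketch).
    """
    super_vectors = []
    for start in range(0, len(frames) - group_size + 1, group_size):
        block = frames[start:start + group_size]
        dims = len(block[0])
        means = [sum(f[d] for f in block) / group_size for d in range(dims)]
        variances = [
            sum((f[d] - means[d]) ** 2 for f in block) / group_size
            for d in range(dims)
        ]
        super_vectors.append(means + variances)
    return super_vectors


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0], [5.0, 6.0], [9.0, 9.0]]
    for vector in group_statistics(frames, group_size=2):
        print(vector)
```

Summarizing several frames into one vector reduces the amount of data per item, which is in line with the compactness requirement stated earlier.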
System Architecture
Class generator:
– Clustering of processed feature vectors:
  – further reduce the amount of data
  – enhance robustness (avoid overfitting)
– Add class with associated metadata to database
Classification:
– Compare feature vectors against classes in database by means of a distance metric
– Find class yielding the best approximation
– Retrieve associated metadata
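A minimal sketch of the class generator and classification stages follows. The slides do not name the clustering algorithm or the metric, so this example assumes a small k-means (Lloyd) codebook per item and squared Euclidean distance; the names `make_class` and `classify` and the dictionary-based database are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_class(vectors, k=2, iterations=10):
    """Cluster one item's feature vectors into at most k centroids.

    Assumption: plain k-means with evenly spaced initial picks, chosen only
    to keep the sketch deterministic.
    """
    step = max(1, len(vectors) // k)
    centroids = [list(v) for v in vectors[::step][:k]]
    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(v, centroids[i]))
            buckets[nearest].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def classify(query_vectors, database):
    """Return the metadata key whose class best approximates the query.

    database: dict mapping metadata (here a plain string) to a codebook.
    The score is the accumulated distance of each query vector to its
    nearest centroid; the smallest score wins.
    """
    def score(codebook):
        return sum(min(sq_dist(q, c) for c in codebook) for q in query_vectors)
    return min(database, key=lambda name: score(database[name]))


if __name__ == "__main__":
    song_a = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
    song_b = [[5.0, 5.0], [5.1, 5.0], [6.0, 6.0], [6.1, 6.0]]
    database = {"Song A": make_class(song_a), "Song B": make_class(song_b)}
    print(classify([[0.05, 0.02], [1.0, 0.9]], database))
```

Storing only a few centroids per item, instead of every feature vector, is what reduces the data volume; the linear scan over the database in `classify` is the simplest possible search and is not meant to reflect how a large system would be organized.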
MPEG-7 - Elements for Robust Audio Matching
Low level data: AudioSpectrumFlatness LLD
– Derived from: Spectral Flatness Measure (SFM)
– Describes how flat the spectrum is in each frequency band (tonal vs. noise-like)
Fingerprint: AudioSignature Description Scheme
– Statistical data summarization of AudioSpectrumFlatness LLD
– Textual description in XML syntax
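The spectral flatness measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum coefficients in a band: values near 1 indicate a flat, noise-like band, and values near 0 a peaky, tonal band. The sketch below computes it per band. The band edges, the small `eps` guard against log(0), and the function names are assumptions of this example and are not taken from the MPEG-7 specification.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of one band's power values."""
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    geometric = math.exp(log_mean)
    arithmetic = sum(power) / n + eps
    return geometric / arithmetic


def band_flatness(power_spectrum, band_edges):
    """Flatness for each band [band_edges[i], band_edges[i + 1])."""
    return [
        spectral_flatness(power_spectrum[lo:hi])
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


if __name__ == "__main__":
    # First band is flat (noise-like), second band has one dominant peak (tonal).
    spectrum = [1.0, 1.0, 1.0, 1.0, 8.0, 0.0, 0.0, 0.0]
    print(band_flatness(spectrum, band_edges=[0, 4, 8]))
```

Because the measure is a ratio of two means of the same coefficients, scaling the whole band by a constant gain leaves it essentially unchanged, which fits the requirement of robustness against volume change.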
MPEG-7 - Benefits
– Standardized feature format guarantees worldwide interoperability
– Published, open format: descriptive data can be produced easily
– Large MPEG-7 compliant databases expected to be available in the near future (incl. fingerprints)
– Long-term format stability / lifetime
Recognition Performance - Conditions
Conditions:
– Training and test sets (mostly rock / pop):
  – 15,000 items
  – 90,000 items
Considered feature:
– Spectral Flatness Measure (SFM)
Classification performance:
– Number of correctly identified items (both single best and within top 10)
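The two performance figures (single best match and match within the top 10) can be computed from ranked classifier output as in the following sketch. The function name `top_n_rates` and the input layout (one ranked candidate list per test item) are assumptions made for illustration.

```python
def top_n_rates(ranked_results, truths, n=10):
    """Return (top-1 rate, top-n rate) in percent.

    ranked_results: for each test item, candidate IDs ordered best first.
    truths: the correct ID for each test item, in the same order.
    """
    total = len(truths)
    top1 = sum(
        1 for ranks, truth in zip(ranked_results, truths)
        if ranks and ranks[0] == truth
    )
    topn = sum(
        1 for ranks, truth in zip(ranked_results, truths)
        if truth in ranks[:n]
    )
    return 100.0 * top1 / total, 100.0 * topn / total


if __name__ == "__main__":
    ranked = [["a", "b"], ["c", "b"], ["x", "y"], ["d"]]
    truths = ["a", "b", "z", "d"]
    print(top_n_rates(ranked, truths))
```

The top-n rate can never be lower than the top-1 rate, since a correct best match is also contained in the first n candidates.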
Recognition Performance - 15k items
Feature: SFM in frequency bands; advanced matching with temporal tracking

Distortion      Top 1     Top 10
Cropping        100.0%    100.0%
96 kbps          99.6%     99.8%
Loudsp./Mic.     98.0%     99.0%
Recognition Performance - 90k items
16 bands; advanced matching with temporal tracking
Applications
– Retrieve associated metadata by identifying audio content
– Automated search of audio content on the Internet
– Broadcast monitoring by logging the transmission of audio material
– Feature-based indexing of audio databases (similarity search)
– ...
Conclusions
– High recognition rates (> 99%, tested with 90,000 items)
– Robust to real-world signal distortions
– Fast and reliable extraction and classification
– Underlying feature specified in the MPEG-7 standard ensures worldwide interoperability, with licensing available to everyone
Real Time Demonstration
– Demo running on a laptop (Pentium 500 MHz)
– Local database with 15,000 items (rock / pop genre)
– Acoustic transmission: MP3 -> D/A -> Speakers -> Noisy Environment -> Microphone -> A/D -> AudioID
Thanks for your attention!