A review of audio fingerprinting
Mainly based on Cano et al. (2005)
Presented by Denis Lebel

Presentation Outline
Introduction
  Desired Properties
  Usage Modes
  Applications
Fingerprinting Framework
  Front-end
  Fingerprint Models
  Similarity Measures and Searching Methods
  Hypothesis Testing
Conclusion
References
MUMT-611: Music Information Acquisition, Preservation, and Retrieval

Introduction
Idea
  An attempt to mimic human music recognition abilities
Audio Fingerprint
  Unique identifier of an audio signal
  Content-based signature that summarizes an audio recording
  Uses relevant (perceptual) acoustic characteristics of the signal
Fingerprinting System
  Database of known fingerprints
  Query system
Figure 1: General idea of a fingerprinting system. (Diagram: an unidentified song is searched against a collection of all known songs; on a match, information is returned; on no match, the search keeps looking.)

Desired Properties
Accuracy
  Function of correct, missed, and wrong identifications
Reliability
  Method for assessing whether an identification is correct
Robustness
  Ability to accurately identify an item, no matter how compressed or distorted it is
Granularity
  Ability to identify a signal from a short excerpt
Security
  Vulnerability to cracking

Desired Properties
Versatility
  Ability to identify a signal regardless of audio format
Scalability
  Performance with very large databases
Complexity
  Computational costs of fingerprint extraction, size of fingerprint, search complexity, comparison complexity, etc.
Fragility
  Integrity verification (detection of changes in content)

Desired Properties
Properties are interrelated and depend on the purpose of the system
Generally speaking, a fingerprint should be:
  A perceptual digest of the recording
  Invariant to distortions
  Compact
  Easily computable

Usage Modes
Identification
  Content identification of an audio signal
  Figure 2: Content-based audio identification framework. (Cano et al. 2005)
Integrity Verification
  Detection of data alteration
  Figure 3: Integrity verification framework. (Cano et al. 2005)

Usage Modes
Watermarking Support
  Audio fingerprints can be used to derive secret keys from the audio content
Content-based Audio Retrieval and Processing
  Extraction of audio features (i.e., low-level and high-level descriptors)
  Fingerprints can be used to retrieve similar content (i.e., query-by-example scheme)

Applications
Audio Content Monitoring and Tracking
  At the distributor end
  At the transmission channel
  At the consumer end
Added-Value Services
  Content information describing the audio excerpt (e.g., tempo)
  Metadata describing the musical work (e.g., composer, year, …)
  Other information (e.g., album cover)
Integrity Verification Systems
  Audio fingerprints can be used to ensure that a user's audio files have the best quality available

Fingerprinting Framework
Figure 4: Content-based audio identification framework. (Cano et al. 2005)

Fingerprinting Framework
Fingerprint Extraction: Front-End
Figure 5: Fingerprint extraction framework. (Cano et al. 2005)

Fingerprinting Framework
Fingerprint Extraction: Fingerprint Modeling
Idea:
  Reduce redundancies
  Reduce the size of the fingerprint
The similarity measure and the search method depend on the model chosen
Several techniques can be used (for a summary, see Cano et al. 2005)

Fingerprinting Framework
Fingerprint Extraction: Similarity Measures
Related to the type of model chosen
A correlation metric is common
Example: Euclidean distance
Figure 6: (a) Fingerprint block of the original clip. (b) Fingerprint block of a compressed version. (c) Difference (error). (Haitsma et al. 2002)

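The two measures touched on in this slide can be sketched in a few lines of Python. This is only an illustration, not the method of any particular system: the function names are hypothetical, fingerprints are assumed to be plain equal-length lists, and the bit error rate is an assumed way of quantifying the bitwise difference shown in Figure 6(c).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bit_error_rate(a, b):
    """Fraction of positions at which two equal-length binary fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy values, chosen only for illustration.
    print(euclidean_distance([0, 0], [3, 4]))
    print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

A smaller value means a closer match in both cases; which measure is appropriate depends on the fingerprint model.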
Fingerprinting Framework
Fingerprint Extraction: Searching Methods
Brute-force search is inappropriate for large databases
Idea: Optimize the search
Some possible optimizations:
  Pre-computing distances offline
  Filtering unlikely candidates with a cheap similarity measure
  Candidate pruning
  Others…

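One way to picture "filtering unlikely candidates with a cheap similarity measure" is the sketch below. It assumes, purely for illustration, that fingerprints are equal-length numeric lists stored in a dictionary, and that the cheap measure is the Euclidean distance over the first k coefficients. That partial distance can never exceed the full distance, so a candidate whose partial distance is already no better than the best match so far can be skipped safely. All names and the choice of k are hypothetical.

```python
import math


def partial_distance(a, b, k):
    """Cheap measure: Euclidean distance over the first k coefficients only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))


def full_distance(a, b):
    """Full Euclidean distance over all coefficients."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_search(query, database, k=2):
    """Return (name, distance) of the nearest fingerprint, pruning with the cheap measure."""
    best_name, best_dist = None, math.inf
    for name, fingerprint in database.items():
        # The partial distance is a lower bound on the full distance,
        # so this candidate cannot beat the current best match.
        if partial_distance(query, fingerprint, k) >= best_dist:
            continue
        dist = full_distance(query, fingerprint)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


if __name__ == "__main__":
    # Toy database with made-up fingerprints.
    database = {"a": [0, 0, 0, 0], "b": [1, 1, 1, 1], "c": [5, 5, 5, 5]}
    print(two_stage_search([0.9, 1.1, 1.0, 1.0], database))
```

The result is the same as a brute-force scan; only the number of full comparisons changes. Real systems typically combine this kind of pruning with an index rather than a linear pass.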
Fingerprinting Framework
Fingerprint Extraction: Hypothesis Testing
Idea: Decide whether the query is present in the repository
A threshold must be used, and it depends on:
  Fingerprint model
  Similarity of the fingerprints in the database
  Database size
  Discriminative information of the query
The larger the database, the higher the probability of a wrong match
  False Acceptance Rate (FAR)
  False Rejection Rate (FRR)

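The role of the threshold, and how FAR and FRR follow from it, can be illustrated with a small sketch. It assumes that a match is declared when the distance between the query and its best candidate is at or below the threshold, and that two lists of distances are available for evaluation: one from queries that truly are in the repository ("genuine") and one from queries that are not ("impostor"). The function names and the sample values are hypothetical.

```python
def is_match(distance, threshold):
    """Accept the hypothesis 'query is in the repository' if the distance is small enough."""
    return distance <= threshold


def far_frr(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR) for a given threshold.

    FAR: fraction of impostor queries wrongly accepted.
    FRR: fraction of genuine queries wrongly rejected.
    """
    false_accepts = sum(is_match(d, threshold) for d in impostor_distances)
    false_rejects = sum(not is_match(d, threshold) for d in genuine_distances)
    return (false_accepts / len(impostor_distances),
            false_rejects / len(genuine_distances))


if __name__ == "__main__":
    # Made-up distances, for illustration only.
    genuine = [0.1, 0.2, 0.4, 0.3]
    impostor = [0.5, 0.45, 0.3, 0.6]
    for threshold in (0.25, 0.35, 0.5):
        print(threshold, far_frr(genuine, impostor, threshold))
```

Raising the threshold lowers the FRR at the cost of a higher FAR, and lowering it does the opposite, which is why the threshold has to be tuned to the factors listed above.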
Conclusion
Most existing systems fall more or less into this generic framework
Large databases still represent a challenge (scalability, complexity, accuracy…)
P2P systems might be the future (e.g., Music2Share)

References
Cano, P., E. Batlle, T. Kalker, and J. Haitsma. 2005. A review of audio fingerprinting. The Journal of VLSI Signal Processing 41: 271–84.
Haitsma, J., and T. Kalker. 2002. A highly robust audio fingerprinting system. Proceedings of the International Symposium on Music Information Retrieval. 107–15.
Kalker, T., D. Epema, P. Hartel, R. Langendijk, and M. Van Steen. 2004. Music2Share: Copyright-compliant music sharing in P2P systems. Proceedings of the IEEE 92 (6): 961–70.

Links
http://www.shazam.com/
http://www.relatable.com/
http://www.audiblemagic.com/
http://www.gracenote.com/