Audio Fingerprinting Overview: RARE Algorithms, Resources
Chris Burges, John Platt, Jon Goldstein, Erin Renshaw
Let’s agree on names… A ‘fingerprint’ is a vector that represents a given audio clip. It lives in a database with a lot of other fingerprints. A ‘confirmation fingerprint’ is a second fingerprint used to confirm a match. A ‘trace’ is generated from audio every 186 ms. It’s computed exactly the same way as a fingerprint.
Analyze a Stream
[Diagram: the audio stream yields 64 floats per frame; each trace is checked "In Database?" and then "Confirmed?" before a match is declared. Test inputs: 6 sec of distorted Song A, 6 sec of Song B, 6 sec of Song A.]
Design of the Funnel
Find 64 good projections of 6 seconds of audio. Good projections maximize δ2/δ1.
[Figure: projected points with the spreads δ1 and δ2 marked for a good projection.]
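As a rough illustration of the funnel, here is a minimal C++ sketch of the two-stage check (database lookup, then confirmation fingerprint), assuming a toy in-memory database and Euclidean distances. All names and thresholds are illustrative, not RARE's actual interfaces.

    // Minimal sketch of the two-stage funnel, with a toy in-memory database.
    #include <array>
    #include <optional>
    #include <string>
    #include <vector>

    using Fingerprint = std::array<float, 64>;  // 64 floats per trace/fingerprint

    struct Entry {
        std::string songId;
        Fingerprint fingerprint;    // primary fingerprint
        Fingerprint confirmation;   // confirmation fingerprint for stage 2
    };

    static float sqDist(const Fingerprint& a, const Fingerprint& b) {
        float d = 0.f;
        for (size_t i = 0; i < a.size(); ++i) { float t = a[i] - b[i]; d += t * t; }
        return d;
    }

    // Funnel: a trace must pass the database lookup AND the confirmation check.
    std::optional<std::string> identify(const std::vector<Entry>& db,
                                        const Fingerprint& trace,
                                        const Fingerprint& confirmTrace,
                                        float threshold1, float threshold2) {
        for (const Entry& e : db) {
            if (sqDist(trace, e.fingerprint) < threshold1 &&          // "In Database?"
                sqDist(confirmTrace, e.confirmation) < threshold2) {  // "Confirmed?"
                return e.songId;                                      // declare match
            }
        }
        return std::nullopt;
    }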
Feature Extraction (186 ms)
De-Equalization
De-equalize by flattening the log spectrum. [Figure: log spectrum before vs. after de-equalization.]
De-Equalization Details Goal: Remove slow variation in frequency space
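A minimal sketch of one way to flatten the log spectrum, assuming the slow variation is modeled by its first few DCT coefficients (the later "Client: Resources" slide mentions the first 6) and then subtracted. The exact de-equalization rule used in RARE may differ.

    // Remove slow variation in frequency: model the smooth trend of the log
    // spectrum with its first K DCT coefficients and subtract it.
    #include <cmath>
    #include <vector>

    std::vector<float> deEqualize(const std::vector<float>& logSpec, int K = 6) {
        const double pi = 3.14159265358979323846;
        const size_t N = logSpec.size();
        std::vector<float> smooth(N, 0.f);
        for (int k = 0; k < K; ++k) {
            // Forward DCT-II, but only the K lowest-order coefficients (O(K*N) work).
            double c = 0.0;
            for (size_t n = 0; n < N; ++n)
                c += logSpec[n] * std::cos(pi * k * (n + 0.5) / N);
            c *= 2.0 / N;  // normalization convention is a choice, not from the slides
            // Accumulate the inverse transform of this coefficient: the smooth trend.
            for (size_t n = 0; n < N; ++n)
                smooth[n] += float((k == 0 ? 0.5 : 1.0) * c *
                                   std::cos(pi * k * (n + 0.5) / N));
        }
        // Flatten: remove the slow frequency-axis variation, keep the fine structure.
        std::vector<float> out(N);
        for (size_t n = 0; n < N; ++n) out[n] = logSpec[n] - smooth[n];
        return out;
    }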
Perceptual Thresholding
Remove coefficients that are below a perceptual threshold to lower unwanted variance. [Figure: spectrum with regions marked inaudible vs. audible to a human listener.]
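A minimal sketch of the thresholding step itself; the per-bin audibility threshold comes from a psychoacoustic model that is not shown on the slide, so it is treated here as a given input.

    // Zero every spectral coefficient that falls below its audibility threshold.
    #include <vector>

    void perceptualThreshold(std::vector<float>& spectrum,
                             const std::vector<float>& audibilityThreshold) {
        for (size_t i = 0; i < spectrum.size(); ++i) {
            if (spectrum[i] < audibilityThreshold[i])
                spectrum[i] = 0.f;   // inaudible: discard to lower unwanted variance
        }
    }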
Project to 64 Floats
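A sketch of the projection step, assuming the 64 learned projections are supplied as a matrix W with 64 rows. How the projections are trained (maximizing δ2/δ1, per the funnel slide) is not shown here.

    // Apply 64 learned linear projections to the feature vector to get the
    // 64-float fingerprint. W is assumed given: 64 rows, each of length
    // features.size().
    #include <array>
    #include <vector>

    std::array<float, 64> projectTo64(const std::vector<float>& features,
                                      const std::vector<std::vector<float>>& W) {
        std::array<float, 64> fp{};
        for (int k = 0; k < 64; ++k)
            for (size_t i = 0; i < features.size(); ++i)
                fp[k] += W[k][i] * features[i];
        return fp;
    }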
Bitvector yields 50x Speedup
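The slide does not say how the bitvector is built, so the following sketch makes two assumptions: each of the 64 floats contributes one sign bit, and candidates are prefiltered by Hamming distance (XOR plus popcount) before any exact comparison.

    // Quantize the 64 floats to one 64-bit word and reject most candidates
    // with a cheap bit comparison, consistent with the "50x speedup" claim.
    #include <array>
    #include <bit>       // std::popcount (C++20)
    #include <cstdint>

    using Fingerprint = std::array<float, 64>;

    uint64_t toBitvector(const Fingerprint& fp) {
        uint64_t bits = 0;
        for (int i = 0; i < 64; ++i)
            if (fp[i] > 0.f) bits |= (uint64_t{1} << i);   // one sign bit per float
        return bits;
    }

    // Only candidates within maxHamming bits proceed to the exact
    // (floating-point) distance computation.
    bool bitPrefilterPass(uint64_t query, uint64_t candidate, int maxHamming) {
        return std::popcount(query ^ candidate) <= maxHamming;
    }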
Example Architecture
[Diagram: the client runs feature extraction (and optional pruning) on the audio stream and sends traces over the Internet to the server; the server performs the lookup and returns the audio stream's identity.]
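A sketch of the client half of this architecture. Every interface name below is hypothetical; only the shape of the pipeline (extract a trace per 186 ms frame, optionally prune, send to the server's lookup, receive an identity) comes from the diagram.

    #include <array>
    #include <optional>
    #include <string>
    #include <vector>

    using Trace = std::array<float, 64>;
    using Frame = std::vector<float>;   // ~186 ms of samples; size depends on sample rate

    // Hypothetical interfaces, declared only to show the data flow.
    bool  nextFrame(Frame& samples);                          // pull the next frame of audio
    Trace extractTrace(const Frame& samples);                 // feature extraction (earlier slides)
    bool  pruneLocally(const Trace& t);                       // the "Optional Pruning" box
    std::optional<std::string> serverLookup(const Trace& t);  // lookup over the Internet

    void clientLoop() {
        Frame samples;
        while (nextFrame(samples)) {
            Trace t = extractTrace(samples);
            if (pruneLocally(t)) continue;        // drop traces the client can rule out
            if (auto identity = serverLookup(t)) {
                // server returned the stream's identity; e.g. write an ID3 tag
            }
        }
    }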
Client: Resources
Computing traces takes approximately 10% of the CPU on a 750 MHz Pentium III. However, we can get a speedup over the current DCT, since we only modify the first 6 coefficients: O(N log N) → O(6N). The total data loaded by the client is 2.1 MB.
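To make the O(N log N) → O(6N) point concrete, computing just the first K DCT-II coefficients by direct summation costs K·N multiply-adds per frame, so for K = 6 the work is linear in the frame size. The normalization convention below is a choice, not taken from the slides.

    #include <cmath>
    #include <vector>

    std::vector<double> firstKDctCoeffs(const std::vector<float>& x, int K = 6) {
        const double pi = 3.14159265358979323846;
        const size_t N = x.size();
        std::vector<double> c(K, 0.0);
        for (int k = 0; k < K; ++k)                     // K passes ...
            for (size_t n = 0; n < N; ++n)              // ... of N multiply-adds each
                c[k] += x[n] * std::cos(pi * k * (n + 0.5) / N);
        return c;                                       // O(K*N) total work
    }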
Client Side Options
What can be done on the client side to offload the server lookup? Three ideas (in addition to only querying untagged music, and adding ID3 tags when a match is found):
1. Leverage Zipf's law (if it holds!)
2. Reduce the rate at which traces are sent
3. Prune traces on the client
Client Side Pruning – Local Lookup
Having a database of fingerprints for e.g. the top 10,000 songs would significantly reduce server load, but we don't know by how much. Also requires updates (e.g. weekly?). [Plot: log(# times played) vs. log(rank), illustrating Zipf's Law.]
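A sketch of the local-lookup idea. Keying the client-side table by an exact 64-bit bitvector is an assumption made for brevity; a real lookup would have to tolerate distortion the way the server-side funnel does.

    // Keep a small client-side table of fingerprints for the most popular songs
    // (Zipf's law says these cover a disproportionate share of plays) and only
    // query the server on a local miss.
    #include <cstdint>
    #include <optional>
    #include <string>
    #include <unordered_map>

    struct LocalCatalog {
        std::unordered_map<uint64_t, std::string> topSongs;  // bitvector -> song id

        // Returns the song id on a local hit; std::nullopt means "ask the server".
        std::optional<std::string> lookup(uint64_t traceBits) const {
            auto it = topSongs.find(traceBits);
            if (it != topSongs.end()) return it->second;
            return std::nullopt;
        }
    };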
Client Options, cont.
We can reduce the trace rate by a factor of 2 (from one trace every 186 ms to one every 372 ms) at some (likely small) loss in accuracy. This would halve both client CPU usage and server load.
Client Side Pruning – Margin Trees
Using a tree built from the first 24 components:
- No overpopulating, but flip the 5 most error-prone bits in each trace (see the sketch below)
- Gives a factor-of-2 reduction in throughput at a 0.5% increase in false negatives for very noisy data
- The number of nodes in the tree (for 254,885 fingerprints) was found to be 1,531,508
- Requires updates (e.g. weekly?)
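One reading of the bit-flipping trick, as a hedged sketch: probe the tree not only with the 24-bit key of the trace but with every combination of its 5 most error-prone bits flipped (2^5 = 32 probes), so a noisy trace can still reach the right leaf. How the tree is traversed and how the error-prone bit positions are identified are not covered on the slide.

    #include <cstdint>
    #include <vector>

    // Generate the candidate 24-bit keys obtained by flipping any subset of the
    // given error-prone bit positions (indices < 24).
    std::vector<uint32_t> probeKeys(uint32_t key24,
                                    const std::vector<int>& errorProneBits) {
        std::vector<uint32_t> keys;
        const uint32_t combos = 1u << errorProneBits.size();   // 2^5 = 32 for 5 bits
        for (uint32_t m = 0; m < combos; ++m) {                // every subset of flips
            uint32_t k = key24;
            for (size_t i = 0; i < errorProneBits.size(); ++i)
                if (m & (1u << i)) k ^= (1u << errorProneBits[i]);
            keys.push_back(k);
        }
        return keys;                                           // candidate keys to probe
    }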
A note on the code Upper bound: 22,000 lines of C++. File- and stream-based versions use the same libraries.