Audio Content Description with Wavelets and Neural Nets. Diploma Thesis, Stephan Rein. Prof. Dr.-Ing. Thomas Sikora, Prof. Dr. Martin Reisslein, Dr. Nicolas Moreau
Overview: Next Generation Internet Search Machine; MPEG-7: Multimedia Content Description; Why Wavelets?; Statistical Analysis of Wavelet Coefficients; Neural Nets for Audio Content Classification; Results; Summary
Next Generation Internet Search Machine: identify classical movements. [Figure: feature extraction followed by a similarity measure; labels: So.1 iii Men 57, So.1 iv Men 57, So.1 iv]
Moving Picture Experts Group (MPEG). MPEG-1, 2, 4: compression of multimedia data. MPEG-7: description of multimedia data. Idea of multimedia content description: key to completely novel and futuristic applications. Content description tools. Platform for descriptive data. Encourage research on content description.
Test Database: J. S. Bach, 6 Sonatas & Partitas for Solo Violin, BWV 1001-1006. Recordings: 1934-36, 1957, 1952, 1973. Current today, current in 100 years from now. Recordings differ in time, frequency, quality and sound environments. Polyphonic and non-separable phenomena.
Problem: Short-Term Fourier Analysis. Trade-off between time and frequency (Heisenberg Uncertainty Principle). Short analysis window: high frequencies can be well located, but low-frequency components cannot be measured. Long analysis window: low frequencies can be measured, but high frequencies cannot be resolved in time (coarser time resolution). [Figure annotation: when?]
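The trade-off on this slide can be made concrete with a small sketch. For a window of N samples at sampling rate fs, the window spans N/fs seconds and the DFT bin spacing is fs/N Hz, so improving one worsens the other. The sampling rate and the two window lengths below are illustrative assumptions, not values from the thesis.

```python
def window_resolution(n_samples, fs):
    """Return (window span in seconds, DFT bin spacing in Hz)."""
    time_span = n_samples / fs       # how long the window smears events in time
    freq_spacing = fs / n_samples    # how finely frequencies can be separated
    return time_span, freq_spacing

FS = 44100  # assumed sampling rate, for illustration only

for n in (256, 4096):  # a short and a long analysis window (assumed sizes)
    dt, df = window_resolution(n, FS)
    print(f"N={n}: window {dt * 1000:.1f} ms, bin spacing {df:.1f} Hz")
```

The product of the two numbers is 1 for every window length, which is why no single fixed window suits both high and low frequencies.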
Solution: Wavelet Time-Scale Approach. Lower scale: high frequency. Higher scale: low frequency. [Figure: convolution of the signal with the wavelet; axes: time (position), scale]
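A minimal sketch of the time-scale idea, assuming a Mexican hat wavelet (chosen here only because it is easy to write down; it is not necessarily a wavelet used in the thesis). One row of wavelet coefficients is obtained by convolving the signal with the wavelet dilated to one scale. The test signals and the scales 1 and 20 are illustrative assumptions.

```python
import math

def mexican_hat(t):
    """Mexican hat wavelet: negative second derivative of a Gaussian, unnormalised."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_row(signal, scale):
    """Wavelet coefficients at one scale, for every time position."""
    half = int(5 * scale)            # the wavelet is negligible beyond 5 scale units
    norm = 1.0 / math.sqrt(scale)
    row = []
    for b in range(len(signal)):     # b is the time position of the wavelet
        acc = 0.0
        for k in range(-half, half + 1):
            n = b + k
            if 0 <= n < len(signal):  # samples outside the signal count as zero
                acc += signal[n] * mexican_hat(k / scale)
        row.append(norm * acc)
    return row

def energy(row):
    """Sum of squared coefficients."""
    return sum(c * c for c in row)

N = 256
fast = [math.sin(2 * math.pi * 0.25 * i) for i in range(N)]  # period: 4 samples
slow = [math.sin(2 * math.pi * 0.01 * i) for i in range(N)]  # period: 100 samples

for name, sig in (("fast", fast), ("slow", slow)):
    print(name, "scale 1:", round(energy(cwt_row(sig, 1)), 2),
          "scale 20:", round(energy(cwt_row(sig, 20)), 2))
```

The fast sinusoid responds mainly at the low scale and the slow one mainly at the high scale, matching the slide's pairing of low scale with high frequency.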
Wavelet Mother Functions must satisfy admissibility conditions (Farge 1992): decrease quickly towards 0; zero mean; localized in the time and frequency domains; the family of shifts and dilations of the mother function must allow for signal reconstruction.
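Two of the conditions above, zero mean and fast decay, can be checked numerically. The sketch below again assumes a Mexican hat wavelet as an example mother function and uses a plain Gaussian as a counter-example; the integration limits and step size are arbitrary choices. It does not test the reconstruction condition.

```python
import math

def mexican_hat(t):
    """Mexican hat wavelet: negative second derivative of a Gaussian, unnormalised."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def gaussian(t):
    """Plain Gaussian bump, a counter-example: its mean is not zero."""
    return math.exp(-0.5 * t * t)

def integrate(f, lo=-10.0, hi=10.0, step=0.001):
    """Riemann-sum approximation of the integral of f over [lo, hi]."""
    n = int(round((hi - lo) / step))
    return sum(f(lo + i * step) for i in range(n)) * step

def dilate_shift(psi, scale, shift):
    """Family member: psi shifted by `shift` and dilated by `scale`."""
    return lambda t: psi((t - shift) / scale) / math.sqrt(scale)

print("mean of wavelet :", integrate(mexican_hat))
print("mean of Gaussian:", integrate(gaussian))
print("wavelet value at t = 6:", mexican_hat(6.0))
```

Shifting and dilating the mother function with `dilate_shift` keeps the zero-mean property, which can be confirmed by integrating a family member over a wide enough interval.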
Analysis of Wavelet Coefficients: Gaussian Wavelet Envelope Descriptor; Statistical Data Summarization Tools (arithmetic mean, geometric mean, harmonic mean, standard deviation, variation, mean absolute deviation, median, interquartile range, range, skewness); Scale Frequency Measure; Percentile Correlations
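Most of the summarization tools listed on this slide are available in Python's standard `statistics` module. The sketch below assumes the input is a list of positive coefficient magnitudes (the geometric and harmonic means require positive values); the function name `summarize` and the sample numbers are hypothetical, and population formulas are used for the standard deviation and skewness.

```python
import statistics as st

def summarize(values):
    """Summary statistics for a list of positive magnitudes."""
    mean = st.fmean(values)
    std = st.pstdev(values)                   # population standard deviation
    q1, _, q3 = st.quantiles(values, n=4)     # quartiles
    return {
        "arithmetic_mean": mean,
        "geometric_mean": st.geometric_mean(values),
        "harmonic_mean": st.harmonic_mean(values),
        "standard_deviation": std,
        "mean_absolute_deviation": st.fmean([abs(v - mean) for v in values]),
        "median": st.median(values),
        "interquartile_range": q3 - q1,
        "range": max(values) - min(values),
        "skewness": st.fmean([((v - mean) / std) ** 3 for v in values]),
    }

magnitudes = [1.0, 2.0, 4.0, 8.0]  # made-up values, not thesis data
for name, value in summarize(magnitudes).items():
    print(f"{name}: {value:.4f}")
```

Each statistic reduces a whole row of coefficients to one number, so a handful of them forms a compact feature vector per scale.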
A Novel Wavelet Dispersion Measure. [Figure: panels a) to e); axis labels: time, scale, rank]
Performance of the Wavelet Dispersion Measure. Task: identify the pieces of a novel recording (Menuhin 1934). [Figure labels: recordings employed by the search system; user query: recording of Menuhin 1934]
Neural Nets for Audio Classification: Perceptron Neural Net; Backpropagation Net; Probabilistic Radial Basis Net (next slides). [Figure: training: wavelet dispersion vectors of the example recordings (Men34, Men57, Hei57) and their target vectors (class 1, class 2, ..., class 32) are presented to the neural net; answering a user query (Mil 75): the neural net outputs class x]
Single Layer Perceptron Network. [Figure labels: net input, b, transfer function, net output, training algorithm, net error]
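The parts named on this slide can be sketched as a single neuron trained with the classic perceptron rule. This is a generic textbook version, not the thesis implementation: it assumes that b denotes the bias, that the transfer function is a hard limiter, and it uses the logical AND as made-up training data.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if the net input is non-negative, else 0."""
    return 1 if n >= 0 else 0

def predict(w, b, p):
    """Net output for input vector p with weights w and bias b."""
    net_input = sum(wi * pi for wi, pi in zip(w, p)) + b
    return hardlim(net_input)

def train_perceptron(samples, epochs=20):
    """Perceptron rule: move weights and bias by the net error on each mistake."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for p, target in samples:
            error = target - predict(w, b, p)   # net error: -1, 0 or +1
            if error != 0:
                mistakes += 1
                w = [wi + error * pi for wi, pi in zip(w, p)]
                b += error
        if mistakes == 0:                        # every example is learned
            break
    return w, b

# Toy, linearly separable data (logical AND); not thesis data.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, p) for p, _ in AND])
```

A single layer of this kind can only separate classes with a linear boundary, which limits what it can learn beyond the training examples.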
Backpropagation Network: hidden layers and an output layer; different transfer functions; gradient descent algorithm. [Figure labels: learning rate, minimum]
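The role of the learning rate in gradient descent can be illustrated on a one-parameter error function. This is plain gradient descent on an assumed quadratic error, not the backpropagation network itself; the error function, starting point and learning rates are illustrative choices.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

def error_gradient(w):
    """Gradient of the assumed error E(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

small = gradient_descent(error_gradient, w0=0.0, learning_rate=0.1, steps=50)
large = gradient_descent(error_gradient, w0=0.0, learning_rate=1.1, steps=50)
print("learning rate 0.1 ends at", small)
print("learning rate 1.1 ends at", large)
```

With the small learning rate the weight settles at the minimum; with the large one every step overshoots further, so the choice of learning rate decides whether training converges.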
Performance of the Perceptron Net: perfectly learned the example recordings, but was not able to generalize
Performance of the Radial Neural Net: best performance with the biorthogonal wavelet; good performance with the Morlet wavelet
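For readers unfamiliar with probabilistic radial basis nets, the following is a minimal generic sketch and not the network configuration used in the thesis. Each training vector acts as the centre of a Gaussian kernel, and a query is assigned to the class whose kernels respond most strongly on average. The 2-D vectors, the class labels and the `spread` value are all hypothetical.

```python
import math

def pnn_classify(query, training, spread=0.3):
    """Return the class with the largest mean Gaussian kernel response."""
    sums, counts = {}, {}
    for vector, label in training:
        dist2 = sum((q - v) ** 2 for q, v in zip(query, vector))
        response = math.exp(-dist2 / (2.0 * spread ** 2))  # radial basis unit
        sums[label] = sums.get(label, 0.0) + response
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda label: sums[label] / counts[label])

# Made-up feature vectors standing in for per-recording descriptors.
training = [
    ((0.0, 0.1), "class 1"),
    ((0.2, 0.0), "class 1"),
    ((1.0, 1.1), "class 2"),
    ((0.9, 1.0), "class 2"),
]

print(pnn_classify((0.1, 0.1), training))
print(pnn_classify((1.0, 0.9), training))
```

No iterative training is needed in this form: storing the example vectors is the training step, and `spread` controls how far each example's influence reaches.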
Summary: analysis of wavelets and neural nets for the identification of classical movements; novel Wavelet Dispersion Measure; novel methodology with a 78 % success rate: a) biorthogonal wavelet, b) Dispersion Measure, c) Radial Basis Neural Net; readily applicable for a next generation Internet Search Machine
Publication & Contact. Pending U.S. patent application: ask Prof. Martin Reisslein (reisslein@asu.edu) for details. Diploma Thesis available at www.fulton.asu.edu/~mre. S. Rein, M. Reisslein, T. Sikora (sikora@nue.tu-berlin.de), "Audio Content Description with Wavelets and Neural Nets", 4-page version submitted to ICASSP '04, available on request: rein@cs.tu-berlin.de
Thank you for your help: Prof. Dr.-Ing. Thomas Sikora, Prof. Dr. Martin Reisslein, Dr. Nicolas Moreau, Birgit Boldin, Dr.-Ing. Frank Fitzek