Audio Processing Mitch Parry

Similar to Image Processing?
For images, a pixel is the smallest unit.
The color is a distribution over the spectrum of visible light.
Video samples at ~30 frames per second.
[Figure: amplitude across the visible spectrum (red, yellow, green, blue) for one color, with its R, G, B components. Image sources: www.jiscdigitalmedia.ac.uk, http://www.chemistryland.com]

Similar to Image Processing?
Each pixel contains R, G, and B values, corresponding to the three types of cones that perceive color.
A frame is a picture from “one instant of time”.
[Figure: amplitude across the visible spectrum (red, yellow, green, blue) for one color, with its R, G, B components. Image sources: www.jiscdigitalmedia.ac.uk, http://www.chemistryland.com]

Resource!

Chapter 2: Sound Waves
– Sound Waves and Harmonic Motion
– Properties of Sine Waves
– Resonance as Harmonic Frequencies
– Nonsinusoidal Waves

Chapter 5: Digitization
– Sampling and Aliasing
– Quantization
– Dynamic Range
– Nyquist and Aliasing
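The sampling and aliasing topics listed above can be illustrated with a small sketch. It assumes an ideal sampler with no anti-aliasing filter; the function names `alias_frequency` and `sample_cosine` are made up for this example. A tone above half the sampling rate (the Nyquist frequency) produces exactly the same samples as a lower "alias" tone.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after sampling at fs_hz."""
    # Fold the frequency back into the range [0, fs/2].
    f = f_hz % fs_hz
    return min(f, fs_hz - f)

def sample_cosine(f_hz, fs_hz, n):
    """First n samples of a cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]

# A 7 kHz tone sampled at 8 kHz looks like a 1 kHz tone.
high = sample_cosine(7000, 8000, 16)
low = sample_cosine(alias_frequency(7000, 8000), 8000, 16)
```

Here the Nyquist frequency is 4 kHz, so the 7 kHz tone cannot be represented and folds down to 1 kHz.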

Spectral Domain
One color in one pixel in one frame of video
– Hundredths of a second
One audio frame
– Hundredths of a second
[Figure: power vs. wavelength from 400 nm to 700 nm (blue, green, yellow, red), next to power vs. frequency from 20 Hz to 20 kHz]

Audacity: Plot Spectrum

Audio Mixing
Free Multitrack Downloads: http://www.cambridge-mt.com/ms-mtk.htm

“Stop Messing with Me” by Sven Bornemark
Tracks: Steinberg Grand Piano, Acoustic Guitar, Bass, Drums Overhead, Electric Guitar, Electric Guitar Ambience, Kick Drum, Vocal

Audacity: Mixing Tutorial

Simple Unmixing
Left: Drums + 0.5 * Vocal
Right: Guitar + 0.5 * Vocal
Remove vocals:
– Karaoke track = Left - Right = Drums - Guitar
– The vocal is mixed equally into both channels, so it cancels in the difference.
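The subtraction above can be checked with a few made-up sample values. This is only a sketch: the three-sample "tracks" are invented, and `remove_center` is a hypothetical helper name.

```python
def remove_center(left, right):
    """Subtract right from left sample by sample.

    Anything mixed equally into both channels cancels.
    """
    return [l - r for l, r in zip(left, right)]

# Invented sample values for three short tracks.
drums = [0.2, -0.1, 0.4]
guitar = [0.05, 0.3, -0.2]
vocal = [0.6, 0.6, -0.4]

# Mix as on the slide: the vocal sits at half level in both channels.
left = [d + 0.5 * v for d, v in zip(drums, vocal)]
right = [g + 0.5 * v for g, v in zip(guitar, vocal)]

karaoke = remove_center(left, right)  # equals drums - guitar
```

Note that the result is mono and the guitar comes out with inverted polarity, which is why the karaoke track is "Drums - Guitar" rather than "Drums + Guitar".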

Audacity: Let’s try it.
Real example: Norah Jones

Removing Hiss
Hiss_*.wav

Removing Clicks

Short-Time Fourier Transform
Spectrogram
Each frame contributes one column of the spectrogram (the FFT of that frame).
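A minimal sketch of how frames become spectrogram columns follows. It makes several simplifying assumptions: a naive O(N^2) DFT stands in for the FFT, no window function is applied, and the names `dft_power` and `spectrogram` and the frame and hop sizes are chosen only for illustration.

```python
import cmath
import math

def dft_power(frame):
    """Power in each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        powers.append(abs(acc) ** 2)
    return powers

def spectrogram(signal, frame_len, hop):
    """Slide a frame along the signal; each frame yields one column."""
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        columns.append(dft_power(signal[start:start + frame_len]))
    return columns

# A tone with exactly 4 cycles per 32-sample frame peaks in bin 4.
tone = [math.cos(2 * math.pi * 4 * t / 32) for t in range(64)]
columns = spectrogram(tone, frame_len=32, hop=16)
```

With a hop of half the frame length, consecutive frames overlap by 50%, so the 64-sample tone yields three columns.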

Audacity: Let’s try it.

Changing Speed
Downsample (and play back at the original sample rate)
– Shortens the clip
– Increases its pitch

Changing Tempo
Change the length of the clip without changing its pitch
Split into frames, then repeat or remove frames

Changing Pitch
Change pitch without changing length
– Increase pitch: repeat frames, then downsample
– Decrease pitch: remove frames, then upsample
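The speed, tempo, and pitch operations from the last three slides can be sketched on a toy list of samples. This is a deliberately crude illustration, not how production tools do it: there is no anti-aliasing filter and no crossfade between repeated frames, and the helper names `downsample`, `stretch`, and `raise_pitch` are hypothetical.

```python
def downsample(samples, factor):
    """Keep every factor-th sample: shorter clip, higher pitch on playback."""
    return samples[::factor]

def stretch(samples, frame_len, repeats):
    """Repeat each frame `repeats` times: longer clip, same pitch per frame."""
    out = []
    for start in range(0, len(samples), frame_len):
        out.extend(samples[start:start + frame_len] * repeats)
    return out

def raise_pitch(samples, frame_len, factor):
    """Repeat frames, then downsample by the same factor.

    The two length changes cancel, so only the pitch changes.
    """
    return downsample(stretch(samples, frame_len, factor), factor)

clip = list(range(8))           # stand-in for audio samples
faster = downsample(clip, 2)    # half the length
longer = stretch(clip, 4, 2)    # twice the length
shifted = raise_pitch(clip, 4, 2)  # same length as clip
```

Lowering pitch would mirror this: drop frames to shorten the clip, then upsample to restore the length.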

Beats
Amplitude Envelope
– Filterbank
– Full-wave rectify
– Low-pass filter
– Differentiate / half-wave rectify
Scheirer. JASA 1998. Tzanetakis et al. AMTA 2001. IPEM Toolbox.

Beats
Beat Envelope
– Filterbank (Discrete Wavelet Transform)
– Full-wave rectify
– Low-pass filter
– Differentiate / half-wave rectify
– Low-pass filter
– Sum
Peak detection
Scheirer. JASA 1998. Tzanetakis et al. AMTA 2001. IPEM Toolbox.
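The middle steps of the chain above (rectify, low-pass, differentiate, half-wave rectify) can be sketched for a single band. The sketch skips the filterbank and the final sum across bands, and it uses a simple one-pole smoother as the low-pass filter, which is an assumption of this example rather than the filter used in the cited papers. The name `onset_envelope` and the `alpha` parameter are hypothetical.

```python
def onset_envelope(band, alpha=0.2):
    """Turn one band of audio samples into a non-negative onset curve."""
    # Full-wave rectify: keep the magnitude, drop the sign.
    rectified = [abs(x) for x in band]

    # Low-pass filter (one-pole smoother) to get the amplitude envelope.
    smoothed = []
    y = 0.0
    for x in rectified:
        y = y + alpha * (x - y)
        smoothed.append(y)

    # Differentiate: change in envelope from one sample to the next.
    diff = [0.0] + [b - a for a, b in zip(smoothed, smoothed[1:])]

    # Half-wave rectify: keep increases (onsets), discard decreases.
    return [max(d, 0.0) for d in diff]

# Silence, a burst of sound, then silence again.
band = [0.0] * 10 + [1.0, -1.0] * 5 + [0.0] * 10
envelope = onset_envelope(band)
```

The envelope peaks where the burst starts (index 10) and stays at zero while the sound decays, which is what a peak detector needs to find beat candidates.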

Audacity: Beat Detection
Drum track

Audacity
Audacity Manual
More Effects and Analyzers

Musical Features
– Visualizing Structure
– Rhythm/Tempo
– Melody
– Timbre

Visualizing Structure
Compute any features
Choose a similarity metric
Visualize self-similarity
Foote & Cooper. ICMC 2001.
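A self-similarity matrix can be sketched in a few lines once per-frame features exist. The slide leaves both the features and the metric open, so this example assumes cosine similarity and uses invented two-dimensional feature vectors; `cosine_similarity` and `self_similarity` are illustrative names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def self_similarity(features):
    """Entry [i][j] compares frame i with frame j."""
    return [[cosine_similarity(a, b) for b in features] for a in features]

# Invented features for four frames that alternate between two sounds.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
matrix = self_similarity(features)
```

Plotted as an image, this matrix shows a checkerboard: repeated material appears as bright stripes parallel to the main diagonal.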

Visualizing Structure
High-level segmentation based on novelty score

Tempo
Beat Spectrum
– Diagonal sums
– Autocorrelation
Foote & Cooper. ICMC 2001.
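The diagonal-sum variant of the beat spectrum can be sketched as follows. It assumes a square similarity matrix is already available (the one below is written out by hand for four alternating frames), and `beat_spectrum` is a name chosen for this example.

```python
def beat_spectrum(similarity):
    """Sum the similarity matrix along each diagonal.

    Entry `lag` adds up similarity[i][i + lag] over all valid i, so it is
    large when frames that are `lag` apart tend to sound alike.
    """
    n = len(similarity)
    return [sum(similarity[i][i + lag] for i in range(n - lag))
            for lag in range(n)]

# Hand-written similarity matrix for frames that alternate A, B, A, B.
similarity = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
spectrum = beat_spectrum(similarity)
```

The peak at lag 2 reflects the two-frame repetition period; in real audio, peaks like this indicate the tempo.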

Identifying Identical Audio
Segmentation
– 0.37 second frames
– Overlapping by 31/32
FFT
Band Division
– Energy computed for 33 non-overlapping, logarithmically spaced frequency bands (300-2000 Hz)
– E(n,m) = energy of band m of frame n
Haitsma & Kalker. ISMIR 2002.

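Identifying Identical Audio 2
A 32-bit sub-fingerprint represents the increase/decrease in energy between neighboring frequency bands and neighboring frames.
F(n,m) = 1 if [E(n-1,m+1) + E(n,m)] - [E(n-1,m) + E(n,m+1)] > 0, and 0 otherwise
[Figure: energy grid of 257 frames (time) by 33 frequency bands; each bit combines the four cells at frames n-1, n and bands m, m+1 with alternating + and - signs]
Haitsma & Kalker. ISMIR 2002.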
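The bit rule F(n,m) can be written directly as code. This sketch assumes the band energies are already computed and stored as `energy[n][m]`; the function name `sub_fingerprints` and the tiny example grid are invented for illustration.

```python
def sub_fingerprints(energy):
    """Derive one list of bits per frame n >= 1 from band energies.

    energy[n][m] is the energy of band m in frame n. With B bands per
    frame, each sub-fingerprint has B - 1 bits.
    """
    prints = []
    for n in range(1, len(energy)):
        bits = []
        for m in range(len(energy[n]) - 1):
            diff = ((energy[n - 1][m + 1] + energy[n][m])
                    - (energy[n - 1][m] + energy[n][m + 1]))
            bits.append(1 if diff > 0 else 0)
        prints.append(bits)
    return prints

# Two frames, three bands: energy rises with band in frame 0, falls in frame 1.
tiny = sub_fingerprints([[1, 2, 3], [3, 2, 1]])

# The slide's dimensions: 257 frames x 33 bands -> 256 sub-fingerprints x 32 bits.
full = sub_fingerprints([[0] * 33 for _ in range(257)])
```

Because each bit depends only on the sign of an energy difference, it tends to survive changes in overall loudness.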

Identifying Identical Audio 3
Similarity is the bit error rate (BER) between two fingerprints
Each fingerprint covers approximately 3 seconds of audio
256 x 32 bits = 1 KB per fingerprint
Haitsma & Kalker. ISMIR 2002.
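A bit error rate between two fingerprints can be sketched as below. The representation is an assumption of this example: each fingerprint is a list of 32-bit integers, one per sub-fingerprint, and `bit_error_rate` is a hypothetical name.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits between two equal-length fingerprints.

    Each fingerprint is a list of 32-bit integers.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # XOR leaves a 1 wherever the two sub-fingerprints disagree.
    errors = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return errors / (32 * len(fp_a))

identical = bit_error_rate([0] * 256, [0] * 256)
opposite = bit_error_rate([0xFFFFFFFF] * 2, [0] * 2)
one_bit = bit_error_rate([0b1], [0b0])

# Storage check from the slide: 256 sub-fingerprints of 32 bits each.
bytes_per_fingerprint = 256 * 32 // 8
```

A BER near 0 means the two clips are almost certainly the same recording, while unrelated audio gives a BER near 0.5.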

Timbre Similarity
Timbre = “color” of sound
Timbre = type of instrument, voice
Similarity decreases in order:
– Same recording
– Same artist
– Same genre
Useful for finding different live performances of the same song by an artist
Aucouturier & Pachet. ISMIR 2002.

Timbre Similarity 2
Timbre Features
– Low-order MFCCs account for timbre.
– High-order MFCCs account for pitch.
– Only use the first 8 MFCCs (out of 13).
Feature Extraction:
– Segment the signal into 0.05 sec. non-overlapping frames.
– Compute the first 8 MFCCs for each frame.
– Yields ~3600 feature vectors (3600 x 8 = 28,800 scalars) per song, which corresponds to about 3 minutes of audio at 20 frames per second.
Aucouturier & Pachet. ISMIR 2002.

Timbre Similarity 3
Gaussian Mixture Model (GMM)
– Approximates the distribution of features as a weighted sum of M Gaussian distributions
– M = 3
Learn a timbre model for each song
Timbre similarity between song A and song B is the likelihood that the model for song A generated the features in song B.
Aucouturier & Pachet. ISMIR 2002.
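Scoring one song's features against another song's model can be sketched as follows. The example covers only the scoring step, not the training of the mixture, and it assumes diagonal covariances and already-known parameters; those choices, the names `log_gaussian` and `avg_log_likelihood`, and the one-dimensional toy model are assumptions of this sketch rather than details from the cited paper.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def avg_log_likelihood(features, weights, means, variances):
    """Mean per-frame log-likelihood of features under a Gaussian mixture."""
    total = 0.0
    for x in features:
        logs = [math.log(w) + log_gaussian(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        # Log-sum-exp keeps the sum stable when all densities are tiny.
        peak = max(logs)
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total / len(features)

# Toy model for "song A": M = 3 components in one dimension.
weights = [0.5, 0.3, 0.2]
means = [[0.0], [5.0], [10.0]]
variances = [[1.0], [1.0], [1.0]]

similar = avg_log_likelihood([[0.0], [5.0]], weights, means, variances)
different = avg_log_likelihood([[20.0], [25.0]], weights, means, variances)
```

Features that fall near the model's components score a higher log-likelihood, so `similar` is larger than `different`. Averaging per frame keeps scores comparable between songs of different lengths.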

Timbre Similarity Examples
http://www.csl.sony.fr/~jj/Timbre/timbre.html

Audio Textures
Generate new audio given examples
Analysis
– Segment into frames
– Extract MFCCs
– Similarity: window-weighted cosine distance
– Transition probabilities proportional to exponential similarity
– Segment into sub-clips according to novelty score
Lu et al. ICASSP 2002.
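The step "transition probabilities proportional to exponential similarity" can be sketched as a row-wise normalization. The scaling parameter `sigma`, the function name `transition_probabilities`, and the 2 x 2 similarity matrix are assumptions made for this example; the exact form used by Lu et al. may differ.

```python
import math

def transition_probabilities(similarity, sigma=1.0):
    """Turn each row of similarities into a probability distribution.

    The weight of jumping from frame i to frame j is exp(similarity / sigma),
    normalized so that each row sums to 1.
    """
    rows = []
    for row in similarity:
        weights = [math.exp(s / sigma) for s in row]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

# Invented similarities between two frames.
probs = transition_probabilities([[1.0, 0.0], [0.0, 1.0]])
```

A smaller `sigma` concentrates probability on the most similar frames, while a larger one makes jumps more random.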

References
Aucouturier, J.-J., and Pachet, F. (2002). "Music Similarity Measures: What's the Use?" Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 157-163).
Foote, J., and Cooper, M. (2001). "Visualizing Musical Structure and Rhythm via Self-Similarity." Proc. of Int'l Computer Music Conference, 27, (pp. 419-422).
Haitsma, J., and Kalker, T. (2002). "A Highly Robust Audio Fingerprinting System." Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 107-115).
Lu, L., Li, S., Liu, W., and Zhang, H. (2002). "Audio Textures." Proc. of IEEE Int'l Conference on Acoustics, Speech and Signal Processing.
Paulus, J., and Klapuri, A. (2002). "Measuring the Similarity of Rhythmic Patterns." Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 150-156). Paris: IRCAM Centre Pompidou.
Scheirer, E. (1998). "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America, 103(1), 588-601.
Tzanetakis, G., Essl, G., and Cook, P. (2001). "Audio Analysis using the Discrete Wavelet Transform." Proc. of WSES International Conference on Acoustics and Music: Theory and Applications.
Tzanetakis, G., Ermolinskiy, A., and Cook, P. (2002). "Pitch Histograms in Audio and Symbolic Music Information Retrieval." Proc. of Int'l Conference on Music Information Retrieval, 3, (pp. 31-38).
