CS335 Principles of Multimedia Systems Audio Hao Jiang Computer Science Department Boston College Oct. 11, 2007
Digital Audio
Audio comes from different sources:
–Speech.
–Sounds of instruments, music.
–Sounds of all other kinds (the sound of wind, trains and the ocean).
Audio needs new methods for coding and processing.
Audio processing is a key task in multimedia systems:
–Audio coding (MPEG audio, MP3, AAC and others)
–Authoring and representation (composition)
–Analysis and searching (retrieval and databases)
–3D sound, etc.
We will focus on basic audio processing, MPEG audio and related topics.
Audio Processing
Audio authoring.
Audio file formats: waveform files and MIDI.
MIDI: Musical Instrument Digital Interface. Instead of storing waveform samples, a MIDI file stores a sequence of commands that tell an audio device to generate a specified note with given properties.
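To make the "sequence of commands" idea concrete, here is a minimal Python sketch of the two most common MIDI channel messages, Note On and Note Off. The helper names `note_on` and `note_off` are made up for this illustration. Only the 3-byte message layout (status byte, note number, velocity) comes from the MIDI specification, and a real MIDI file also stores timing information that is left out here.

```python
def _check(channel, note, velocity):
    # MIDI channels are 0-15; note numbers and velocities are 7-bit values.
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be in 0..127")


def note_on(channel, note, velocity):
    """Build a 3-byte Note On message (status 0x9n, n = channel)."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Build a 3-byte Note Off message (status 0x8n, n = channel)."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


# Middle C (note 60) on channel 0: six bytes instead of thousands of samples.
commands = note_on(0, 60, 100) + note_off(0, 60)
```

The two messages together take 6 bytes no matter how long the note sounds, which is why MIDI files are so much smaller than waveform files.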
Audio Processing Using Matlab
To load a wave file in Windows: [audat, samplingrate] = wavread('filename.wav'); (recent Matlab releases replace wavread with audioread).
Or, directly open the file and load a stream of "words" (2 bytes) or bytes, depending on the WAV format.
To play a sound, use sound(audat, samplingrate).
To display the spectrogram, use specgram.
Audio analysis is done in frames 20 ms to 40 ms long.
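The same loading and framing steps can be sketched outside Matlab. The Python example below uses only the standard library and assumes a mono, 16-bit PCM WAV file. The function names and the in-memory test tone are illustrative and are not part of the course code.

```python
import io
import math
import struct
import wave


def read_wav_16bit(fileobj):
    """Return (samples, sampling_rate) for a mono 16-bit PCM WAV stream."""
    with wave.open(fileobj, "rb") as wav:
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Each sample is one little-endian signed 2-byte "word".
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return list(samples), rate


def split_into_frames(samples, rate, frame_ms=20):
    """Cut the signal into non-overlapping analysis frames of frame_ms."""
    size = rate * frame_ms // 1000
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def make_test_wav(rate=8000, seconds=0.1, freq=440.0):
    """Write a short sine tone to an in-memory WAV file (for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        n = int(rate * seconds)
        wav.writeframes(b"".join(
            struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * t / rate)))
            for t in range(n)))
    buf.seek(0)
    return buf


samples, rate = read_wav_16bit(make_test_wav())
frames = split_into_frames(samples, rate, frame_ms=20)
```

At 8 kHz a 20 ms frame holds 160 samples, so the 100 ms test tone yields five frames.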
Frequency Domain Analysis
The Fourier transform decomposes a signal into a sum of sinusoidal waves.
In Matlab, we can use fft (Fast Fourier Transform) for frequency domain analysis.
[Figure: the time-domain waveform, with period T, and its frequency-domain components.]
Base frequency = 1/T
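As a small illustration of the base-frequency relation, the Python sketch below computes a direct discrete Fourier transform (the slow O(n²) form, not an FFT) of a periodic test signal and locates the strongest component. The signal, its length and the function names are assumptions chosen for this example.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a list of real samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dominant_bin(x):
    """Index of the largest-magnitude component below the Nyquist bin."""
    mags = [abs(c) for c in dft(x)[:len(x) // 2]]
    return max(range(len(mags)), key=mags.__getitem__)


N = 64   # number of samples analysed
T = 16   # period of the waveform, in samples
signal = [math.sin(2 * math.pi * t / T) + 0.3 * math.sin(2 * math.pi * 3 * t / T)
          for t in range(N)]

k = dominant_bin(signal)
base_frequency = k / N   # in cycles per sample; equals 1/T
```

With a period of 16 samples in a 64-sample window, the peak falls in bin N/T = 4, which corresponds to a base frequency of 1/T cycles per sample. The weaker third harmonic shows up in bin 12.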
MP3 and Others
MPEG (Moving Picture Experts Group) and ISO (International Organization for Standardization) have published several standards for digital audio coding.
–MPEG-1 Layer 1, 2 and 3 (MP3)
–MPEG-2 AAC
–MPEG-4 AAC and TwinVQ
Other standards
–Dolby AC-3
They have been widely used in consumer electronics, digital audio broadcasting, DVD, movies, etc.
Perceptual Coding in MPEG
[Figure: block diagram of the perceptual coder. Input: audio. Blocks: FFT, masking threshold, dynamic bit allocation, encoder, MUX. Output: bit stream.]
Simultaneous Masking
A strong audio component can mask its nearby frequency components.
[Figure: sound pressure level (dB) versus frequency (Hz), showing the masker, the masking threshold and the threshold in quiet.]
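The slide does not give a formula for the threshold in quiet. One widely used closed-form approximation, attributed to Terhardt, is sketched below in Python as an illustration. It is not necessarily the curve used in the course demo or in any particular MPEG psychoacoustic model, and the function name is made up for this example.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's fit).

    This is a commonly quoted approximation, assumed here for illustration.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


# Sample the curve at a few frequencies.
curve = {f: threshold_in_quiet_db(f) for f in (100, 1000, 3300, 10000)}
```

The curve is high at low frequencies, dips to its minimum around 3 to 4 kHz where hearing is most sensitive, and rises again toward high frequencies. Components below it are inaudible even with no masker present.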
Masking and Quantization
A critical band defines the "resolution" of hearing at some frequency location.
[Figure: sound pressure level (dB) versus frequency (Hz), showing a masker in critical band A, the neighboring critical band, the minimum masking threshold for band A, the signal-to-mask ratio, and the SNR of an m-bit and an (m+1)-bit quantizer.]
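The link between the signal-to-mask ratio (SMR) and the number of quantizer bits can be sketched with the usual rule of thumb that each extra bit of a uniform quantizer adds about 6.02 dB of SNR. The Python example below is a simplification under that assumption. It ignores the constant offset in the exact SNR formula, and it is not the bit allocation loop of the MPEG standard. The function names are illustrative.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per quantizer bit


def quantizer_snr_db(bits):
    """Rule-of-thumb SNR of a uniform quantizer with the given bit count."""
    return DB_PER_BIT * bits


def bits_for_band(smr_db):
    """Smallest bit count whose SNR reaches the band's signal-to-mask ratio."""
    if smr_db <= 0:
        return 0  # the signal is already below the masking threshold
    return math.ceil(smr_db / DB_PER_BIT)


def noise_to_mask_db(smr_db, bits):
    """Noise-to-mask ratio; a value <= 0 means the noise is masked."""
    return smr_db - quantizer_snr_db(bits)


m = bits_for_band(20.0)  # a band whose signal sits 20 dB above the mask
```

For an SMR of 20 dB, three bits leave the quantization noise about 2 dB above the masking threshold, while four bits push it about 4 dB below, so four bits are enough for that band.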
Temporal Masking
[Figure: amplitude versus time, showing the pre-masking curve and the post-masking curve.]
MPEG Perceptual Model
A Matlab demo.
MPEG Audio Layer 1
MPEG (1 and 2) audio allows sampling rates of 48, 44.1, 32, 22.05, 24 and 16 kHz.
MPEG filters the input audio into 32 bands.
[Figure: Layer 1 coding pipeline. Labels: audio, 384 samples, filtering and downsampling, 12 samples, normalize by scale factor, perceptual coder.]
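The numbers on this slide fit together as 32 bands times 12 samples per band, which gives 384 samples per frame. The Python sketch below shows that arithmetic and a simplified scale factor normalization. It assumes the scale factor is simply the largest magnitude in the band, whereas the standard picks the value from a fixed table, and it does not implement the filter bank itself. All names are illustrative.

```python
BANDS = 32                 # subbands produced by the filter bank
SAMPLES_PER_BAND = 12      # samples per band in one Layer 1 frame
FRAME_SAMPLES = BANDS * SAMPLES_PER_BAND


def frame_duration_ms(sampling_rate_hz):
    """Duration of one Layer 1 frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / sampling_rate_hz


def normalize_band(samples):
    """Return (scale_factor, normalized_samples) for one band.

    Simplification: the scale factor is the peak magnitude, not a table entry.
    """
    scale = max(abs(s) for s in samples)
    if scale == 0:
        return 0.0, list(samples)
    return scale, [s / scale for s in samples]


band = [0.5, -2.0, 1.0] + [0.0] * 9   # 12 made-up subband samples
scale, normalized = normalize_band(band)
```

After normalization every sample lies in the range -1 to 1, so the quantizer only has to cover a fixed range and the scale factor is sent once per band.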
MPEG Audio Layer 2
Layer 2 is very similar to Layer 1, but groups samples together in coding. It improves the scale factor quantization and groups 3 audio samples together in bit assignment.
[Figure: Layer 2 coding pipeline. Labels: audio, 3x384 samples, filtering and downsampling, 36 samples, normalize by scale factor, perceptual coder.]
Overlapped Transform and MDCT
[Figure: a signal covered by overlapping windows (Window 1 to Window 4), each 2N samples long, and the reconstructed result.]
In the overlapped transform, 2N samples are transformed to N elements.
In the reverse transform: [equation not preserved in the text]
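The overlapped transform can be sketched in a few lines of Python. The example below uses the common textbook form of the MDCT with a sine window and a 2/N scaling on the inverse. These are assumptions made for this illustration and may differ from the normalization on the original slide or in the course code. Each block of 2N samples gives N coefficients, and adding the overlapping halves of consecutive inverse transforms cancels the aliasing.

```python
import math


def mdct(block):
    """Map 2N (already windowed) samples to N MDCT coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling)."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def sine_window(length):
    """Sine window; satisfies w[t]^2 + w[t + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def analyze_and_reconstruct(signal, n):
    """Window, MDCT, IMDCT, window again and overlap-add with hop size N.

    len(signal) must be a multiple of n; n should be even.
    """
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + t] * w[t] for t in range(2 * n)]
        rec = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += rec[t] * w[t]
    return out[n:n + len(signal)]


x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
y = analyze_and_reconstruct(x, n=8)
max_error = max(abs(a - b) for a, b in zip(x, y))
```

A single inverse block does not reproduce its input, because halving the number of coefficients folds the signal onto itself. Only the sum of two neighboring blocks recovers the samples, and this is what lets the transform overlap by 50% without increasing the amount of data.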
Some Matlab Code
The program compares DCT and MDCT in audio processing. The code is available on the course website as a tarball, mdct_and_dct.tar.
MP3
MP3 (MPEG audio Layer 3) is another layer built on top of MPEG audio Layer 2.
MP3 further applies an MDCT to each band and encodes the MDCT coefficients.
MP3 then uses Huffman coding to compress the bit stream further, losslessly.
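To show why Huffman coding helps after quantization, the Python sketch below builds a Huffman code for a made-up list of quantized coefficients in which small values dominate. It constructs the code from the data, while MP3 itself selects among predefined Huffman tables, so this illustrates the principle rather than the MP3 bit stream format.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, partial code table).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in t1.items()}
        merged.update({s: "1" + b for s, b in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bits, code):
    """Decode a bit string using the prefix-free code table."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical quantized coefficients: mostly zeros and small values.
coeffs = [0] * 8 + [1] * 4 + [-1] * 2 + [2, 3]
code = huffman_code(coeffs)
encoded = "".join(code[c] for c in coeffs)
```

The 16 coefficients take 30 bits with this code, compared with 48 bits at a fixed 3 bits per value, and decoding returns them exactly.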
File Format
Frame 1: Header | CRC | Bit allocation | Scale factors | Subband data
Frame 2: Header | CRC | Bit allocation | Scale factors | Subband data
MPEG audio puts a header in each frame, so that the frames can be decoded separately.
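Because every frame starts with its own header, a decoder can pick up the stream at any frame boundary. The Python sketch below reads a few fields from the 4-byte frame header: the sync bits, version, layer, CRC flag and sampling rate. It covers only those fields and leaves out the bitrate, padding and channel mode. The function name and the returned dictionary layout are choices made for this example.

```python
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG-2.5"}  # 2.5 is an unofficial extension
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}
SAMPLE_RATES = {
    "MPEG-1": (44100, 48000, 32000),
    "MPEG-2": (22050, 24000, 16000),
    "MPEG-2.5": (11025, 12000, 8000),
}


def parse_frame_header(data):
    """Decode selected fields from the first 4 bytes of an MPEG audio frame."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(data[:4], "big")
    if word >> 21 != 0x7FF:              # 11 sync bits, all ones
        raise ValueError("no frame sync")
    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    rate_index = (word >> 10) & 0b11
    if version_bits not in VERSIONS or layer_bits not in LAYERS or rate_index == 3:
        raise ValueError("reserved field value")
    version = VERSIONS[version_bits]
    return {
        "version": version,
        "layer": LAYERS[layer_bits],
        "has_crc": ((word >> 16) & 1) == 0,   # protection bit 0 means a CRC follows
        "sample_rate": SAMPLE_RATES[version][rate_index],
    }


header = parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
```

The bytes FF FB 90 00 are a typical MP3 header: MPEG-1, Layer 3, no CRC, 44.1 kHz.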
Other Audio Coding Standards
MPEG-2 and MPEG-4 AAC (Advanced Audio Coding)
–Not backward compatible
–Use MDCT without bandpass filtering
Dolby AC-3
–MDCT-based codec
–Similar to MPEG AAC but uses a different quantization and coding scheme
–A de facto standard for DVD and digital audio in movies.
Realtime Audio Systems
[Figure: an audio I/O process and an audio processing unit connected by two circular queues, an audio input queue and an audio output queue, each with a write pointer and a read pointer.]
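A circular queue with separate read and write pointers can be sketched as follows in Python. This is a single-threaded illustration with made-up names. A real-time system would add locking or lock-free synchronization between the I/O process and the processing unit, and would usually work on fixed-size buffers of raw samples.

```python
class CircularQueue:
    """Fixed-capacity ring buffer with a read pointer and a write pointer."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.read = 0      # index of the next sample to read
        self.write = 0     # index of the next free slot
        self.count = 0     # number of samples currently stored

    def push(self, samples):
        """Write as many samples as fit; return how many were written."""
        written = 0
        for s in samples:
            if self.count == self.capacity:
                break      # queue full: the producer must wait or drop data
            self.buf[self.write] = s
            self.write = (self.write + 1) % self.capacity  # wrap around
            self.count += 1
            written += 1
        return written

    def pop(self, n):
        """Read up to n samples in arrival order."""
        out = []
        while self.count > 0 and len(out) < n:
            out.append(self.buf[self.read])
            self.read = (self.read + 1) % self.capacity    # wrap around
            self.count -= 1
        return out


queue = CircularQueue(4)
first_write = queue.push([1, 2, 3])
first_read = queue.pop(2)
second_write = queue.push([4, 5, 6])   # the write pointer wraps to the start
second_read = queue.pop(4)
```

The audio I/O process pushes captured samples into the input queue and pops processed samples from the output queue, while the processing unit does the opposite. Both pointers only move forward and wrap, so no data has to be shifted in memory.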