CGMB324: Multimedia System Design


Chapter 8: Audio Compression

Objectives
Upon completing this chapter, you should be able to:
- understand some of the common audio compression methods
- understand the basic concept of MPEG audio compression
- apply audio compression in a multimedia system

Simple Audio Compression Methods
Motivation: Traditional lossless compression methods (Huffman, LZW, etc.) usually do not work well on audio data, because sampled waveforms rarely contain the exact repeated patterns these methods exploit.

Simple Audio Compression Methods
Existing lossy methods:
- Silence compression: detect the "silence" and code it as a run, similar to run-length coding.
- Apple's proprietary ACE / MACE scheme: a lossy scheme that tries to predict where the wave will go in the next sample. About 2:1 compression.
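Silence compression can be sketched in a few lines. This is only an illustration of the idea, not any standardized scheme: the `threshold` and `min_run` values and the token format are assumptions made for the example.

```python
def compress_silence(samples, threshold=2, min_run=4):
    """Replace long runs of near-silent samples with ("silence", length) tokens.

    threshold and min_run are illustrative values, not taken from a standard.
    """
    out = []
    i = 0
    n = len(samples)
    while i < n:
        # Find how far the quiet stretch starting at i extends.
        j = i
        while j < n and abs(samples[j]) <= threshold:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(("silence", run))      # one token for the whole run
            i = j
        elif run > 0:
            # Quiet stretch too short to be worth a token: keep the samples.
            out.extend(("sample", s) for s in samples[i:j])
            i = j
        else:
            out.append(("sample", samples[i]))
            i += 1
    return out


def expand_silence(tokens):
    """Rebuild a sample list; silent runs come back as exact zeros (lossy)."""
    out = []
    for kind, value in tokens:
        if kind == "silence":
            out.extend([0] * value)
        else:
            out.append(value)
    return out


samples = [100, -90, 1, 0, -1, 2, 0, 50, 1, 60]
tokens = compress_silence(samples)
restored = expand_silence(tokens)
```

The five quiet samples in the middle collapse into a single token. The scheme is lossy because their small nonzero values are restored as zeros.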

Simple Audio Compression Methods
- Linear Predictive Coding (LPC): fits the signal to a speech model and then transmits the parameters of the model. Sounds like a computer talking; 2.4 kbits/sec.
- Code Excited Linear Predictor (CELP): does LPC, but also transmits the error term. Audio conferencing quality at 4.8 kbits/sec.

Simple Audio Compression Methods
Adaptive Differential Pulse Code Modulation (ADPCM), e.g., 16 or 32 kbits/sec:
- Encodes the difference between two or more consecutive samples.
- The difference is then quantized (approximated), hence the loss.
- Adapts the quantization so that fewer bits are used when the difference is smaller.
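A much simplified ADPCM-style coder is sketched below. It is not IMA ADPCM or any other standardized variant. As an assumption for the example, it keeps a fixed 4-bit code and adapts the quantizer step size instead of the number of bits. The function names and the adaptation rule are hypothetical.

```python
def _adapt(step, code, levels):
    """Illustrative rule: grow the step after large codes, shrink after small ones."""
    if abs(code) >= levels // 2:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step


def adpcm_encode(samples, bits=4, step=16):
    levels = 1 << (bits - 1)          # 4 bits -> codes in the range -8..7
    predicted = 0                     # prediction = previous reconstructed sample
    codes = []
    for s in samples:
        diff = s - predicted
        # Quantize the difference to a small signed code (this is the lossy step).
        code = max(-levels, min(levels - 1, round(diff / step)))
        codes.append(code)
        predicted += code * step      # track what the decoder will reconstruct
        step = _adapt(step, code, levels)
    return codes


def adpcm_decode(codes, bits=4, step=16):
    levels = 1 << (bits - 1)
    predicted = 0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code, levels)   # same rule keeps both sides in sync
    return out


codes = adpcm_encode([0, 10, 40, 100])      # [0, 1, 7, 7]
decoded = adpcm_decode(codes)               # [0, 8, 36, 92]
```

The decoded values only approximate the input, and the coder lags behind the fast rise to 100 until the step size has grown. That lag is the price of sending 4 bits per sample instead of 16.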

Psychoacoustics
Human hearing and voice:
- Frequency range is about 20 Hz to 20 kHz; hearing is most sensitive at 2 to 4 kHz.
- Dynamic range (quietest to loudest) is about 96 dB.
- Normal voice range is about 500 Hz to 2 kHz.
- Low frequencies carry vowels (which are louder than consonants) and bass.
- High frequencies carry consonants.

MPEG Audio Compression
Some facts:
- VCD: maximum 1.5 Mbps. Usually 1.374 Mbps for audio and video combined: about 1.15 Mbps for video and 0.224 Mbps (224 kbps) for audio. The audio is MPEG-1 Layer 2 audio (*.mp2), as specified by the VCD standard.
- Uncompressed CD audio is 44,100 samples/sec × 16 bits/sample × 2 channels, which is just over 1.4 Mbits/sec.
- MPEG audio compression factors range from 2.7 to 24.
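The figures above can be checked with a little arithmetic. The helper name `pcm_bitrate` is made up for this sketch, and 1 kbit is taken as 1000 bits.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bitrate(44_100, 16, 2)        # 1_411_200 bits/sec
vcd_total_kbps = 1150 + 224                # video + audio = 1374 kbps

# Compression factor of the 224 kbps VCD audio stream relative to CD audio.
vcd_audio_factor = cd_bps / 224_000        # about 6.3

# Bit rates implied by the quoted 2.7 to 24 range of compression factors.
highest_rate_bps = cd_bps / 2.7            # roughly 523 kbits/sec
lowest_rate_bps = cd_bps / 24              # 58_800 bits/sec
```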

MPEG Audio Compression
- With a compression ratio of 6:1 (16-bit stereo sampled at 48 kHz is reduced to 256 kbits/sec) and optimal listening conditions, expert listeners could not distinguish between coded and original audio clips.
- MPEG-1 audio supports sampling frequencies of 32, 44.1 and 48 kHz.
- Newer technology, such as mp3, supports sampling frequencies as low as 11 kHz, but is usually restricted to 16-bit rather than 8-bit samples.

MPEG Audio Compression
Supports one or two audio channels in one of four modes:
- Monophonic -- single audio channel
- Dual-monophonic -- two independent channels, e.g., English and French
- Stereo -- two stereo channels that share bits, but without joint-stereo coding
- Joint-stereo -- the encoder encodes a mid channel, which is one full channel averaged from the two original channels, and a side channel, which contains all of the stereo separation. Instead of encoding two full channels as in normal stereo, the encoder only has to encode one channel and part of another. Sometimes there is too much stereo separation for joint stereo (JS) to reproduce; in that case the encoder switches back to normal stereo. This does not happen very often, though: on most songs, JS is used on about 95% of the frames.

MPEG Audio Compression
Steps in algorithm:
- Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency subbands (subband filtering).
- Determine the amount of masking for each band caused by nearby bands, using a psychoacoustic model.
The psychoacoustic model is based on many studies of human perception. These studies have shown that the average human does not hear all frequencies equally well. Effects due to different sounds in the environment and limitations of the human sensory system can be exploited to cut unnecessary data out of an audio signal.
Masking is the phenomenon where a strong signal "covers" the sound of another signal, so that the softer one cannot be heard by the human ear.

MPEG Audio Compression
For example, jet engine noise can easily drown out music.
- If the power in a band is below the masking threshold, do not encode it.
- Otherwise, determine the number of bits needed to represent the coefficient (a constant number).
- Format the bitstream.
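The thresholding and bit-allocation steps above can be illustrated with a toy function. This is not the allocation loop of the MPEG standard. It assumes the per-band power and masking threshold are already known in dB, and it uses the common rule of thumb that each quantization bit buys about 6 dB of signal-to-noise ratio. The name `allocate_bits` and the `max_bits` cap are hypothetical.

```python
import math


def allocate_bits(band_power_db, mask_threshold_db, db_per_bit=6.02, max_bits=15):
    """Return a bit count per subband; masked bands get zero bits."""
    bits = []
    for power, mask in zip(band_power_db, mask_threshold_db):
        smr = power - mask                 # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)                 # below the threshold: do not encode
        else:
            # Enough bits to push quantization noise below the mask.
            bits.append(min(max_bits, math.ceil(smr / db_per_bit)))
    return bits


# Four example subbands: power and masking threshold in dB (made-up numbers).
allocation = allocate_bits([60, 30, 45, 10], [40, 35, 44, 20])   # [4, 0, 1, 0]
```

Bands two and four fall below their masking thresholds and cost nothing, while the first band, which stands 20 dB above its threshold, receives the most bits.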


MPEG Audio Compression
MPEG Layers
- MPEG defines 3 layers for audio. The basic model is the same, but codec complexity increases with each layer.
- The data is divided into frames, each containing 384 samples: 12 samples from each of the 32 filtered subbands, as shown in Figure 1.

Figure 1
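The frame size follows directly from the subband layout, and the frame duration depends on the sampling rate. The helper names below are made up for this sketch.

```python
SUBBANDS = 32
SAMPLES_PER_SUBBAND = 12


def frame_samples(groups=1):
    """Samples covered by `groups` consecutive groups of 12 samples per subband."""
    return groups * SUBBANDS * SAMPLES_PER_SUBBAND


def frame_duration_ms(sample_rate_hz, groups=1):
    """Time span of those samples in milliseconds."""
    return 1000.0 * frame_samples(groups) / sample_rate_hz


one_frame = frame_samples()                # 384 samples
one_frame_ms = frame_duration_ms(48_000)   # 8.0 ms at 48 kHz
three_frames = frame_samples(3)            # 1152 samples
```

Three such frames give the 1152 samples quoted for Layer 2.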

MPEG Audio Compression
Layer 1:
- DCT-type filter with one frame and equal frequency spread per band.
- The psychoacoustic model only uses frequency masking.
- The simplest layer, best suited for bit rates above 128 kbits/sec per channel. For example, Philips' Digital Compact Cassette (DCC)[5] uses Layer I compression at 192 kbits/sec per channel.

MPEG Audio Compression
Layer 2:
- Uses three frames in the filter (previous, current, next; a total of 1152 samples). This models a little of the temporal masking.
- Temporal masking is the characteristic of the auditory system where sounds are hidden by maskers that occur shortly before or even after them. Masking after a strong sound is called post-masking and can remain in effect for up to 200 ms. Pre-masking, where a sound is masked by something that appears after it, is relatively short and may last up to 20 ms.
- Temporal masking is also a defence mechanism of the ear, activated to protect its delicate structures from loud sounds. When exposed to a loud sound, the human ear contracts slightly, temporarily reducing the perceived volume of the sounds that follow.

MPEG Audio Compression
This reflex manoeuvre, sometimes called "blinking" (by analogy to the eye), is meant to protect the delicate structures of the ear from potentially damaging sonic power. However, it also means that relatively loud sounds in an audio signal, such as a loud trumpet note, tend to overpower other sounds that occur just before and just after them, a phenomenon known as temporal masking.
Layer 2 has intermediate complexity and is targeted at bit rates around 128 kbits/sec per channel. Possible applications for this layer include the coding of audio for Digital Audio Broadcasting (DAB®)[6], the storage of synchronized video-and-audio sequences on CD-ROM, and the full-motion extension of CD-interactive, Video CD.
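The pre- and post-masking windows described for temporal masking can be written as a simple timing check. This sketch only tests whether a sound falls inside the quoted time windows. A real psychoacoustic model would also weigh the levels and frequencies of both sounds, which are ignored here. The function name is hypothetical.

```python
PRE_MASK_MS = 20     # a masker can hide sounds up to about 20 ms before it
POST_MASK_MS = 200   # and sounds up to about 200 ms after it


def in_masking_window(masker_time_ms, sound_time_ms):
    """True if the sound lies within the pre- or post-masking window of the masker."""
    delta = sound_time_ms - masker_time_ms
    if delta >= 0:
        return delta <= POST_MASK_MS      # sound follows the masker
    return -delta <= PRE_MASK_MS          # sound precedes the masker


after_close = in_masking_window(1000, 1150)    # True: 150 ms after
after_far = in_masking_window(1000, 1250)      # False: 250 ms after
before_close = in_masking_window(1000, 985)    # True: 15 ms before
before_far = in_masking_window(1000, 970)      # False: 30 ms before
```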

MPEG Audio Compression
Layer 3 (MP3):
- A better critical-band filter is used (non-equal frequency widths), the psychoacoustic model includes temporal masking (hiding) effects, stereo redundancy is taken into account, and a Huffman coder is used.
- The most complex layer, but it offers the best audio quality, particularly at bit rates around 64 kbits/sec per channel. This layer is well suited for audio transmission over ISDN (Integrated Services Digital Network).

MPEG Audio Compression
Stereo Redundancy Coding:
- Intensity stereo coding -- at upper-frequency subbands, encode the summed signal instead of independent signals from the left and right channels.
  - Achieves a saving in bitrate by replacing the left and right signals with a single representative signal plus directional information.
  - Is psychoacoustically justified in the higher frequency range, since the human auditory system is insensitive to signal phase at frequencies above approximately 2 kHz.
  - Is a lossy coding method, primarily useful at low bitrates.
- Middle/Side (MS) stereo coding -- encode middle (sum of left and right) and side (difference of left and right) channels.
  - Whenever a signal is concentrated in the middle of the stereo image, MS stereo can achieve a significant saving in bitrate.
  - Useful at high bitrates.
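Middle/side coding can be shown as a small round trip. As an assumption for this sketch, the sum and difference are halved, so the mid channel is the average of left and right. The transform itself loses nothing. The saving comes later, because a near-zero side channel needs very few bits.

```python
def ms_encode(left, right):
    """Convert left/right sample lists into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left and right from the mid and side channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [100, 200, -50]
right = [98, 202, -50]                  # almost identical to the left channel
mid, side = ms_encode(left, right)      # side == [1.0, -1.0, 0.0]
left2, right2 = ms_decode(mid, side)
```

With the signal concentrated in the middle of the stereo image, the side channel stays close to zero, which is the case in which the text says MS stereo saves the most.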

How Much Should We Compress Audio?
When applying audio compression, we must decide how much compression we need. This affects the size of the audio file itself (such as an mp3 music file) or the overall size of a video file (such as DivX 5.0.5 video combined with mp3 audio). Several things need to be considered when making this decision. Generally, the idea is to preserve as much quality as possible, whatever the purpose of the audio.

Things To Consider
Let us first talk about compressing just an audio file, such as an mp3. If we 'rip' a WAV file from a CD and then compress it to mp3, the default setting has traditionally been 128 kb/s, 44.1 kHz, stereo. However, this has changed with improved bandwidth, storage space and portable players, and with the demand for higher fidelity. It is not uncommon for regular mp3 listeners to compress their CD tracks at 160 kb/s or 192 kb/s.

Things To Consider
Some even go as high as 224 kb/s, or prefer a variable bitrate that changes between 96 kb/s and 224 kb/s, using a lower data rate when there is less information to encode in the waveform and a higher data rate when there is more.
One way to go about it is to see how much storage space and memory you have to spare, or can afford. Then think about what the audio file is going to be used for. Does the music even sound good enough to warrant a high bitrate? Sometimes a simple instrumental with little variance can be compressed effectively to just 64 kb/s, mono.
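To see what these bitrates mean in storage terms, the file size can be estimated from the bitrate and the duration. The sketch assumes 1 kb/s = 1000 bits/sec and 1 MB = 1,000,000 bytes, and it ignores container overhead. The four-minute track is a made-up example.

```python
def encoded_size_mb(bitrate_kbps, duration_s):
    """Approximate size in MB of a constant-bitrate stream."""
    total_bits = bitrate_kbps * 1000 * duration_s
    return total_bits / 8 / 1_000_000


track_seconds = 4 * 60                              # hypothetical 4-minute track
size_128 = encoded_size_mb(128, track_seconds)      # about 3.84 MB
size_192 = encoded_size_mb(192, track_seconds)      # about 5.76 MB
size_64 = encoded_size_mb(64, track_seconds)        # about 1.92 MB
```

Moving from 128 kb/s to 192 kb/s adds about 2 MB per track in this example, which is the kind of cost to weigh against the gain in fidelity.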

Things To Consider
Also take note of the actual space savings. If compressing harder saves you only some 500 KB on a 700 MB CD, it may be better to use the higher-quality audio file, because the benefit it gives the user (listening pleasure) is worth more than a meager 500 KB saving.
When audio compression is applied as part of a video stream, the priority goes to the video. Most of the file size (maybe 85%-95%) should be allocated to the video stream so that the picture quality is higher; this is what matters most to those who watch video.

Things To Consider
But audio cannot be neglected altogether. It should be compressed as small as possible without jeopardizing its ability to convey what is in the video. For example, an MPEG-1 VCD by default uses 44.1 kHz stereo audio. If you are converting this to an MPEG-4 (or DivX) video file, you can reduce the audio to 22.05 kHz stereo mp3, often with no perceptible difference, especially if what you are encoding is a movie with dialogue and many silent parts. Sometimes the movie is so old that mono is enough. This is not the case with music videos, though. There, keeping the audio at 44.1 kHz might be essential, and you may have to allocate about 15%-20% of the file size to the audio stream.
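The share of a video file given to audio translates into an audio bitrate once the target file size and running time are fixed. The sketch below assumes 1 MB = 1,000,000 bytes and ignores container overhead. The 700 MB size and 100-minute running time are made-up example values.

```python
def audio_bitrate_kbps(total_size_mb, audio_share, duration_s):
    """Audio bitrate in kb/s that fits in the given share of the file."""
    audio_bits = total_size_mb * 1_000_000 * 8 * audio_share
    return audio_bits / duration_s / 1000


movie_seconds = 100 * 60                                     # hypothetical 100-minute video
rate_15 = audio_bitrate_kbps(700, 0.15, movie_seconds)       # about 140 kb/s
rate_05 = audio_bitrate_kbps(700, 0.05, movie_seconds)       # about 47 kb/s
```

Under these assumptions, a 15% share leaves room for roughly 140 kb/s of audio, while a 5% share allows only about 47 kb/s, which suits dialogue better than music.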