Basic Concepts of Audio, Video and Compression

Presentation on theme: "Basic Concepts of Audio, Video and Compression"— Presentation transcript:

1 Basic Concepts of Audio, Video and Compression
Anupriya Sharma

2 Content
Introduction to multimedia
Audio encoding
Video encoding
Compression

3 Video on Demand
ADSL (Asymmetric Digital Subscriber Loop): a phone line, dedicated to one subscriber, but with lower bandwidth (on the order of a few Mb/s)
Cable: a shared line with higher aggregate bandwidth
[Figure: Video on Demand delivery over (a) ADSL vs. (b) cable]

4 Multimedia Files
A movie may consist of several files.

5 Multimedia Issues
Analog-to-digital conversion: the result must be acceptable to ears and eyes (jitter matters)
High data rates and large storage requirements: compression
Real-time playback requirements: scheduling
Quality of service: resource reservation

6 Audio
Sound is a continuous wave that travels through the air. The wave is made up of pressure differences.

7 How Do We Hear Sound?
Sound is detected by measuring the pressure level at a point. When an acoustic signal reaches the outer ear (pinna), the generated wave is transformed into energy and filtered through the middle ear. The inner ear (cochlea) transforms the energy into nerve activity. In a similar way, when an acoustic wave strikes a microphone, the microphone generates an electrical signal representing the sound amplitude as a function of time.

8 Basic Sound Concepts
Frequency: the number of periods per second (measured in hertz, cycles/second)
Human hearing frequency range: 20 Hz to 20 kHz (audio); voice is about 500 Hz to 2 kHz
Amplitude: the measure of displacement of the air pressure wave from its mean

9 Computer Representation of Audio
Speech is analog in nature and is converted to digital form by an analog-to-digital converter (ADC). A transducer converts pressure to voltage levels, and the analog signal is converted into a digital stream by discrete sampling: discretization both in time and in amplitude (quantization).

10 Audio Encoding (1)
Audio waves are converted to a digital representation: an electrical voltage is the input, voltage levels are sampled at intervals, and a binary number is the output, yielding a vector of values such as (0, 0.2, 0.5, 1.1, 1.5, 2.3, 2.5, 3.1, 3.0, 2.4, ...). A computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples). The ADC process is governed by four factors: sampling rate, quantization, linearity, and conversion speed.
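As a rough illustration, here is a minimal Python sketch (NumPy assumed; the tone frequency, sampling rate, and bit depth are illustrative choices, not values from the slides) of sampling a wave at regular intervals and quantizing each sample to an integer code:

```python
# Sketch: simulate the ADC's sampling and quantization of a pure tone.
import numpy as np

fs = 8000                             # sampling rate in Hz (illustrative)
f = 440.0                             # tone frequency in Hz (illustrative)
bits = 8                              # sample precision
t = np.arange(0, 0.01, 1.0 / fs)      # 10 ms worth of sampling instants
analog = np.sin(2 * np.pi * f * t)    # the "continuous" wave, evaluated at those instants

levels = 2 ** bits
# Linear PCM quantization: map the amplitude range [-1, 1] onto codes 0..levels-1.
codes = np.round((analog + 1.0) / 2.0 * (levels - 1)).astype(int)
print(codes[:10])                     # the first few binary-number outputs
```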

11 Audio Encoding (2)
Sampling rate: the rate at which a continuous wave is sampled (measured in Hertz). Examples: the CD standard is 44,100 Hz; telephone quality is 8,000 Hz. The audio industry uses 8 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz as standard sampling frequencies, and these are supported by most sound cards. Question: how often do you need to sample a signal to avoid losing information?

12 Audio Encoding (3)
Answer: it depends on how fast the signal is changing. Real answer: twice per cycle (this follows from the Nyquist sampling theorem). Nyquist Sampling Theorem: if a signal f(t) is sampled at regular intervals of time and at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal. Example: a CD carries signal frequencies up to 22,050 Hz; by the Nyquist theorem we must sample at twice that rate, therefore the sampling frequency is 44,100 Hz.
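To see why twice the highest frequency matters, a small sketch (Python with NumPy; the frequencies are illustrative): a 5 kHz tone sampled at 8 kHz, below its Nyquist rate, produces exactly the same samples as a 3 kHz tone, so the two signals cannot be told apart afterwards:

```python
# Sketch: aliasing when sampling below the Nyquist rate.
# 5 kHz exceeds fs/2 = 4 kHz, so it folds down to 8 - 5 = 3 kHz.
import numpy as np

fs = 8000
n = np.arange(16)
tone_5k = np.cos(2 * np.pi * 5000 * n / fs)
tone_3k = np.cos(2 * np.pi * 3000 * n / fs)
print(np.allclose(tone_5k, tone_3k))   # True: the sample sequences are identical
```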

13 Audio Encoding (4)
The best-known technique for voice digitization is Pulse-Code Modulation (PCM), which is based on the sampling theorem. If voice data are limited to 4,000 Hz, then PCM takes 8,000 samples per second, which is sufficient for the input voice signal. PCM produces analog samples that must be converted to a digital representation: each analog sample is assigned a binary code, i.e., each sample is approximated by being quantized as explained above.

14 Audio Encoding (5)
Quantization (sample precision): the resolution of a sample value. Samples are typically stored as raw numbers (linear PCM format) or as logarithms (u-law or A-law). Quantization depends on the number of bits used to measure the height of the waveform. Example: 16-bit CD-quality quantization yields 65,536 distinct values.
Audio formats are described by the sample rate and the quantization (the rates below are reproduced in the sketch after this list):
Voice quality: 8-bit quantization, 8,000 Hz u-law, mono (8 kBytes/s)
22 kHz, 8-bit linear: mono (22 kBytes/s) or stereo (44 kBytes/s)
CD quality: 16-bit quantization, 44,100 Hz linear, stereo (176.4 kBytes/s = 44,100 samples/s x 16 bits/sample x 2 channels / 8 bits/byte)
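These rates follow directly from sample rate x precision x channels; a minimal sketch reproducing them:

```python
# Sketch: data rates for the audio formats listed above.
def rate_bytes_per_sec(sample_rate_hz, bits_per_sample, channels):
    return sample_rate_hz * bits_per_sample * channels // 8   # 8 bits per byte

print(rate_bytes_per_sec(8000, 8, 1))     # 8000 B/s: u-law voice quality, mono
print(rate_bytes_per_sec(22000, 8, 2))    # 44000 B/s: 22 kHz 8-bit linear, stereo
print(rate_bytes_per_sec(44100, 16, 2))   # 176400 B/s: CD quality (176.4 kBytes/s)
```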

15 Audio Formats
Audio formats are characterized by four parameters:
Sample rate: sampling frequency per second
Encoding: the audio data representation
u-law: CCITT G.711 standard for voice data in telephony in the USA, Canada, and Japan (see the sketch below)
A-law: CCITT G.711 standard for voice data in telephony elsewhere (Europe, ...)
A-law and u-law are sampled at 8,000 samples/second with a precision of 12 bits and compressed to 8-bit samples
Linear PCM: uncompressed audio, where samples are proportional to the audio signal voltage
Precision: the number of bits used to store each audio sample
Channels: multiple channels of audio may be interleaved at sample boundaries
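For illustration, a sketch of u-law companding in its continuous textbook form (mu = 255). This shows only the logarithmic mapping that precedes 8-bit quantization; it is not a bit-exact G.711 codec:

```python
# Sketch: continuous u-law compression/expansion for samples in [-1, 1].
import math

MU = 255.0

def ulaw_compress(x):
    return math.copysign(math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU), x)

def ulaw_expand(y):
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

x = 0.01                                         # a quiet sample
print(round(ulaw_compress(x), 3))                # ~0.228: small amplitudes get more resolution
print(round(ulaw_expand(ulaw_compress(x)), 3))   # back to ~0.01
```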

16 Basic Concepts of Video
Visual representation (video encoding): the objective is to offer the viewer a sense of presence in the scene and of participation in the events portrayed
Transmission: video signals are transmitted to a receiver through a single television channel
Digitalization: analog-to-digital conversion, sampling of gray/color levels, quantization

17 Visual Representation (1)
Video signals are generated at the output of a camera by scanning a two-dimensional moving scene and converting it into a one-dimensional electric signal. A moving scene is a collection of individual images, where each scanned picture generates a frame. Scanning starts at the top-left corner of the picture and ends at the bottom-right.
Aspect ratio: ratio of picture width to height
Pixel: discrete picture element, a digitized light point in a frame
Vertical frame resolution: number of pixels in the picture height
Horizontal frame resolution: number of pixels in the picture width
Spatial resolution: vertical x horizontal resolution
Temporal resolution: rapid succession of different frames

18 Visual Representation (2)
Continuity of motion: at least 15 frames per second are needed
NTSC: 60 Hz field repetition rate, 30 frames/sec (interlaced)
PAL: 25 Hz, 25 frames/sec
HDTV: 60 Hz, 60 frames/sec
Flicker effect: a periodic fluctuation of brightness perception. To avoid it, at least 50 refresh cycles per second are needed; display devices achieve this using a display refresh buffer.
Picture scanning:
Progressive scanning: a single scanning of the picture
Interlaced scanning: the frame is formed by scanning two pictures (fields) at different times, with the lines interleaved, so that two consecutive lines of a frame belong to alternate fields (odd and even lines are scanned separately)
NTSC TV uses interlaced scanning to trade off vertical against temporal resolution; HDTV and computer displays need high spatio-temporal resolution and use progressive scanning

19 Video Color Encoding (3)
During scanning, a camera creates three signals: RGB (red, green, and blue). For compatibility with black-and-white video, and because the three color signals are highly correlated, a new set of signals in a different color space is generated. The new color systems correspond to standards such as NTSC, PAL, and SECAM. For transmission of the visual signal we use three signals: one luminance (brightness, the basic signal) and two chrominance (color) signals.
YUV signal: Y = 0.30R + 0.59G + 0.11B, U = (B - Y) x 0.493, V = (R - Y) x 0.877
Coding ratio between the Y, U, V components: 4:2:2
In the NTSC signal the luminance and chrominance signals are interleaved. The goals at the receiver are to (1) separate the luminance from the chrominance components, and (2) avoid interference between them (cross-color, cross-luminance).
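A minimal sketch of this RGB-to-YUV mapping, using exactly the coefficients above (Python; inputs assumed normalized to [0, 1]):

```python
# Sketch: RGB -> YUV with the slide's coefficients.
def rgb_to_yuv(r, g, b):
    y = 0.30 * r + 0.59 * g + 0.11 * b    # luminance
    u = (b - y) * 0.493                   # chrominance
    v = (r - y) * 0.877                   # chrominance
    return y, u, v

print(rgb_to_yuv(1.0, 1.0, 1.0))   # white: Y = 1.0, U = V = 0 (no color difference)
print(rgb_to_yuv(1.0, 0.0, 0.0))   # pure red: Y = 0.30, U < 0, V > 0
```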

20 Basic Concepts of Image Formats
Important parameters for captured image formats: spatial resolution (pixels x pixels) and color encoding (the quantization level of a pixel, e.g., 8-bit, 24-bit). Example: the 'SunVideo' video digitizer board captures pictures of 320 x 240 pixels with 8-bit gray-scale or color resolution. For a hands-on demonstration of these basic image concepts, try the program xv, which displays images and allows one to show, edit, and manipulate image characteristics.
Important parameters for stored image formats: images are stored as a 2D array of values, where each value represents the data associated with one pixel in the image (a bitmap or a color image). Stored images can use flexible formats such as RIFF (Resource Interchange File Format), which includes formats for bitmaps, vector representations, animations, audio, and video. Currently, the most used image storage formats are GIF (Graphics Interchange Format), XBM (X11 Bitmap), PostScript, JPEG (see the compression slides), TIFF (Tagged Image File Format), PBM (Portable Bitmap), and BMP (Bitmap).

21 Digital Video
The process of digitizing analog video involves filtering, sampling, and quantization:
Filtering is employed to avoid aliasing artifacts in the follow-up sampling process.
The filtered luminance and chrominance signals are sampled to generate a discrete-time signal. Digitization means sampling the gray/color levels in the frame at an M x N array of points. The minimum rate at which each component (Y, U, V) can be sampled is the Nyquist rate, which corresponds to twice the signal bandwidth.
Once the points are sampled, they are quantized into pixels: each sampled value is mapped to an integer. The quantization level depends on how many bits we allocate to represent the resulting integer (e.g., 8 bits per pixel or 24 bits per pixel).

22 Digital Transmission Bandwidth
Bandwidth requirements for images:
Raw image transmission bandwidth := size of the image := spatial resolution x pixel resolution
Compressed image transmission bandwidth := depends on the compression scheme (e.g., JPEG) and the content of the image
Symbolic image transmission bandwidth := size of the instructions and variables carrying graphics primitives and attributes
Bandwidth requirements for video:
Uncompressed video bandwidth := image size x frame rate
Compressed video bandwidth := depends on the compression scheme (e.g., Motion JPEG, MPEG) and the content of the video (scene changes)
Example: assume 720,000 pixels per image (frame), 8 bits per pixel quantization, and a frame rate of 60 frames per second. The video bandwidth := 720,000 pixels/frame x 8 bits/pixel x 60 fps, which yields an HDTV data rate of 43,200,000 bytes per second = 345.6 Mbps. With MPEG compression, the bandwidth drops to about 34 Mbps with some loss in image/video quality.
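The arithmetic of the example, as a short sketch:

```python
# Sketch: raw HDTV bandwidth from the example above.
pixels_per_frame = 720_000
bits_per_pixel = 8
frames_per_sec = 60

bits_per_sec = pixels_per_frame * bits_per_pixel * frames_per_sec
print(bits_per_sec // 8)    # 43200000 bytes per second
print(bits_per_sec / 1e6)   # 345.6 Mbps uncompressed
```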

23 Compression Classification
Compression is important due to limited bandwidth. All compression systems require two algorithms: encoding at the source and decoding at the destination.
Entropy coding: lossless encoding, used regardless of the media's specific characteristics; the data are taken as a simple digital sequence and the decompression process regenerates them completely. Examples: run-length coding, Huffman coding, arithmetic coding.
Source coding: lossy encoding that takes the semantics of the data into account; the degree of compression depends on the data content. Examples: DPCM, delta modulation.
Hybrid coding: combines entropy coding with source coding. Examples: JPEG, MPEG, H.263, ...

24 Compression (1)
[Figure: the compression pipeline: uncompressed picture → picture preparation → picture processing → quantization → entropy encoding, with an adaptive feedback path between the stages]

25 Compression (2)
Picture preparation: analog-to-digital conversion, generation of an appropriate digital representation, division of the image into 8x8 blocks, fixing the number of bits per pixel
Picture processing (the compression algorithm): transformation from the time/spatial domain to the frequency domain (e.g., the Discrete Cosine Transform, DCT); motion vector computation for motion video
Quantization: mapping real numbers to integers (reduction in precision)
Entropy coding: compressing a sequential digital stream without loss

26 Compression (3) (Entropy Encoding)
A simple lossless compression algorithm is run-length coding, where runs of repeated bytes are grouped together as Number-of-Occurrences, Special-Character, Byte. For example, 'AAAAAABBBBBDDDDDAAAAAAAA' can be encoded as '6!A5!B5!D8!A', where '!' is the special character. The compression ratio is 50% (12/24 x 100%).
Fixed-length coding: each symbol is allocated the same number of bits independent of frequency (L >= log2(N), N = number of symbols)
Statistical encoding: each symbol has a probability of occurrence (e.g., P(A) = 0.16, P(B) = 0.51, P(C) = 0.33). The theoretical minimum average number of bits per codeword is known as the entropy (H); according to Shannon, H = -Σi Pi log2 Pi bits per codeword.
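A sketch of both ideas in Python: run-length encoding in the slide's count/special-character/byte format, and the Shannon entropy of the example probabilities:

```python
# Sketch: run-length coding and Shannon entropy.
import math
from itertools import groupby

def rle_encode(s, marker="!"):
    # Each run of identical bytes becomes <count><marker><byte>.
    return "".join(f"{len(list(g))}{marker}{ch}" for ch, g in groupby(s))

print(rle_encode("AAAAAABBBBBDDDDDAAAAAAAA"))   # 6!A5!B5!D8!A

probs = {"A": 0.16, "B": 0.51, "C": 0.33}
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 3))   # ~1.446 bits/codeword: the theoretical minimum average
```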

27 Huffman Coding
[Figure: Huffman tree for three symbols with P(A) = 0.16, P(C) = 0.33, P(B) = 0.51. The two least probable symbols merge first, A and C into P(AC) = 0.49, which then merges with B into P(ACB) = 1. Reading the 0/1 labels off the branches gives the symbol codes: B gets a 1-bit code, while A and C get 2-bit codes.]
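The same greedy construction as a Python sketch (heapq-based; the 0/1 assignment per branch is a convention, so only the code lengths are canonical):

```python
# Sketch: Huffman code construction by repeatedly merging the two least
# probable subtrees, prefixing a bit onto every symbol in each subtree.
import heapq
from itertools import count

def huffman(probs):
    tie = count()   # tie-breaker so heap entries always compare
    heap = [(p, next(tie), (sym,)) for sym, p in probs.items()]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for sym in group1:
            codes[sym] = "0" + codes[sym]
        for sym in group2:
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return codes

print(huffman({"A": 0.16, "C": 0.33, "B": 0.51}))
# {'A': '00', 'C': '01', 'B': '1'}: B, the most probable symbol, gets 1 bit
```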

28 JPEG: Joint Photographic Experts Group
Six major steps compress an image: (1) block preparation, (2) DCT (Discrete Cosine Transform), (3) quantization, (4) further compression via differential coding, (5) zig-zag scanning and run-length coding, (6) Huffman coding. The quantization step is the lossy step, where we lose data in a non-invertible fashion. Differential coding means that we consider similar blocks in the image, encode only the first block fully, and for the rest of the similar blocks encode only the differences between the previous block and the current block. The hope is that the difference is a much smaller value, so fewer bits are needed to represent it; the differences also often end up close to 0 and are compressed very well by the next step, run-length coding. Huffman coding is a lossless statistical encoding algorithm that takes the frequency of occurrence into account (not every byte has the same weight).

29 JPEG: Block Preparation
The Joint Photographic Experts Group standard is for compressing continuous-tone still pictures; Motion JPEG is simply the JPEG encoding of each frame separately. The RGB input data undergo block preparation. The eye responds to luminance (Y) more than to chrominance (I and Q).

30 Image Processing
After image preparation we have uncompressed image samples grouped into data units of 8x8 pixels, with a precision of 8 bits/pixel; values are in the range [0, 255].
Steps in image processing:
Pixel values are shifted into the range [-128, 127], centered at 0
The DCT maps values from the time/spatial domain into the frequency domain (transcribed in the sketch after this list):
S(u,v) = 1/4 C(u)C(v) Σx Σy s(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise
S(0,0), the lowest frequency in both directions, is the DC coefficient; it determines the fundamental color of the block
S(0,1), ..., S(7,7) are the AC coefficients
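A direct, unoptimized transcription of the formula (Python with NumPy; a real codec would use a fast factored DCT):

```python
# Sketch: 2-D forward DCT of one level-shifted 8x8 block, per the formula above.
import numpy as np

def dct_8x8(block):
    C = lambda k: 1.0 / np.sqrt(2.0) if k == 0 else 1.0
    S = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (block[x, y]
                            * np.cos((2 * x + 1) * u * np.pi / 16)
                            * np.cos((2 * y + 1) * v * np.pi / 16))
            S[u, v] = 0.25 * C(u) * C(v) * acc
    return S

block = np.full((8, 8), 100.0) - 128   # a uniform block, shifted to [-128, 127]
S = dct_8x8(block)
print(round(S[0, 0], 1))               # -224.0: the DC coefficient; all AC terms ~0
```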

31 Quantization
The goal of quantization is to throw out bits. Consider an example: 101101₂ = 45 (6 bits); we can truncate this string to 4 bits, 1011₂ = 11, or to 3 bits, 101₂ = 5 (representing the original value 40) or 110₂ = 6 (original value 48). Uniform quantization is achieved by dividing each DCT coefficient value S(u,v) by N and rounding the result. JPEG uses quantization tables.
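A sketch of that divide-and-round step (Python with NumPy; the step size N = 16 is an illustrative constant, where JPEG instead takes a per-coefficient value from its quantization table):

```python
# Sketch: uniform quantization of DCT coefficients: this is where JPEG
# irreversibly discards precision.
import numpy as np

def quantize(S, N=16):
    return np.round(S / N).astype(int)

def dequantize(Q, N=16):
    return Q * N    # the rounding error is gone for good

S = np.array([-224.0, 45.0, 5.0, -3.0])
print(quantize(S))               # [-14   3   0   0]
print(dequantize(quantize(S)))   # [-224   48    0    0]: small coefficients vanish
```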

32 Entropy Encoding
After image processing we have quantized DC and AC coefficients. The initial step of entropy encoding is to map the 8x8 plane into a 64-element vector using the zig-zag scan (see the sketch below).
DC coefficient processing: difference coding
AC coefficient processing: run-length coding
Finally, Huffman coding is applied to the DC and AC coefficients.
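A sketch of the zig-zag mapping and the DC difference coding (Python; the DC values are made up for illustration):

```python
# Sketch: zig-zag scan order for an 8x8 plane, plus DC difference coding.
def zigzag_order(n=8):
    order = []
    for s in range(2 * n - 1):                 # walk the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

dc = [-224, -220, -221, -200]   # hypothetical DC coefficients of successive blocks
diffs = [dc[0]] + [b - a for a, b in zip(dc, dc[1:])]
print(diffs)                    # [-224, 4, -1, 21]: mostly small, easy to compress
```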

33 MPEG: Motion Picture Experts Group
MPEG-1 was designed for video-recorder-quality output (352x240 for NTSC) using a bit rate of 1.2 Mbps. MPEG-2 is for broadcast-quality video at 4-6 Mbps (it fits into an NTSC or PAL broadcast channel). MPEG takes advantage of temporal and spatial redundancy; temporal redundancy means that two neighboring frames are similar, almost identical. MPEG-2 output consists of three different kinds of frames that have to be processed:
I (Intracoded) frames: self-contained JPEG-encoded still pictures
P (Predictive) frames: block-by-block differences with the last frame
B (Bidirectional) frames: differences with the last and next frames

34 The MPEG Standard
I frames are self-contained, hence they are used for fast-forward and rewind operations in VOD applications.
P frames code interframe differences: the algorithm searches for similar macroblocks in the current and previous frames, and if they are only slightly different, it encodes only the difference plus a motion vector so the decoder can find the position of the macroblock.
B frames are encoded when three frames are available at once: the past one, the current one, and the future one. As with P frames, the algorithm takes a macroblock in the current frame and looks for similar macroblocks in the past and future frames.
MPEG is well suited to stored video because it is an asymmetric lossy compression: encoding takes a long time, but decoding is very fast. Frames are delivered to the receiver in dependency order rather than display order, so buffering is needed to reorder the frames.

35 MPEG/Video I-Frames
I-frames (intra-coded images): MPEG uses the JPEG compression algorithm for I-frame encoding. I-frames use 8x8 blocks defined within a macroblock, and the DCT is performed on these blocks. Quantization uses a constant value for all DCT coefficients; that is, no quantization tables exist as is the case in JPEG.

36 MPEG/Video P-Frames
P-frames (predictive-coded frames) require the previous I-frame and/or previous P-frame for encoding and decoding. The encoder uses motion estimation: it defines a match window within a given search window, where the match window corresponds to a macroblock and the search window is of arbitrary size, depending on how far away we are willing to look. Matching methods (see the sketch after this list):
SSD correlation uses the sum of squared differences, SSD = Σi (xi - yi)²
SAD correlation uses the sum of absolute differences, SAD = Σi |xi - yi|
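A sketch of exhaustive block matching with the SAD criterion (Python with NumPy; the frame contents, block size, and search range are illustrative, and real encoders prune the search):

```python
# Sketch: find the motion vector minimizing SAD within a +/- search range.
import numpy as np

def best_match(ref, block, cx, cy, search=4):
    # (cx, cy) is the macroblock's top-left corner in the current frame.
    n = block.shape[0]
    best_mv, best_sad = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x and 0 <= y and x + n <= ref.shape[1] and y + n <= ref.shape[0]:
                cand = ref[y:y + n, x:x + n]
                sad = np.abs(cand.astype(int) - block.astype(int)).sum()
                if sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad

ref = np.zeros((32, 32), dtype=np.uint8)
ref[10:18, 12:20] = 200                 # a bright 8x8 patch in the reference frame
cur_block = ref[10:18, 12:20].copy()    # the same patch as seen in the current frame
print(best_match(ref, cur_block, cx=10, cy=8))   # ((2, 2), 0): the patch moved by (2, 2)
```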

37 MPEG/Video B-Frames
B-frames (bi-directionally predictive-coded frames) require information from the previous and following I- and/or P-frames. [Figure: the current macroblock is predicted by the average of its past and future reference macroblocks, 1/2 x (past MB + future MB); the difference is encoded with DCT, quantization, and run-length encoding, two motion vectors are transmitted, and the output is Huffman-coded.]

38 MPEG/Audio Encoding
Precision is 16 bits; the sampling frequency is 32 kHz, 44.1 kHz, or 48 kHz. Three compression methods exist:
Layer 1: 32-448 kbps, target 192 kbps
Layer 2: 32-384 kbps, target 128 kbps; its decoder also accepts Layer 1
Layer 3 (MP3): 32-320 kbps, target 64 kbps; its decoder also accepts Layers 1 and 2

39 MPEG/System Data Stream
Video is interleaved with audio. Audio consists of three layers; video consists of six layers:
(1) sequence layer (video parameters, bitstream parameters, ...)
(2) group-of-pictures layer (time code, GOP parameters, ...)
(3) picture layer (type, buffer parameters, encode parameters, ...)
(4) slice layer (Qscale, ...)
(5) macroblock layer (type, motion vectors, Qscale, ...)
(6) block layer

