Gabriele Monfardini - Corso di Basi di Dati Multimediali (Multimedia Databases course), a.a. 2004-2005

1 Introduction
Much information is in the form of images.
Machines handle an image as a matrix of digital picture elements, or pixels.
The appearance of an image depends on its type and its resolution.

2 Types of images & Resolution
Image types: bilevel (black & white, e.g. faxes), grayscale, color.
Resolution is measured in dots per inch (dpi):
600 x 600: current medium-quality laser printer
1200 x 1200: low-cost phototypesetter
4800 x 4800: high-resolution phototypesetter

3 Bilevel images: CCITT fax standard
Fax is short for facsimile.
CCITT stands for Comité Consultatif International Téléphonique et Télégraphique; it is part of the ITU (International Telecommunication Union), one of the specialized agencies of the United Nations.
In the late 1970s the CCITT started working on a standard for fax transmission.
1980: CCITT Group 3 standard.
Groups 1 and 2 were earlier attempts that used simpler encoding and modulation techniques, resulting in very slow transmission.

4 CCITT Group 3 - I
It is the most common standard for fax transmission.
It is accepted worldwide: almost every fax machine supports it.
It uses compression algorithms for bilevel images.

5 CCITT Group 3 - II
Paper size: international A4 (not US letter).
Standard resolution: 204 x 98 dpi (about 200 x 100).
High resolution: 204 x 196 dpi (about 200 x 200).
1728 pixels/line and 1188 lines/page at standard resolution.
Bilevel image → 1 bit/pixel, so the image size is 1728 x 1188 bits at standard resolution → about 2 Mbit.
Transmission rate: 4.8 Kbit/s; today it is usually higher, 14.4 to 33.6 Kbit/s.
At 4.8 Kbit/s an uncompressed standard-resolution page would take about 430 s, but it takes only about 1 minute on average with the Group 3 algorithms.
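
The page-size and transmission-time figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted on the slide; the function and constant names are illustrative.

```python
# Figures quoted on the slide; the names are illustrative, not from any standard.
PIXELS_PER_LINE = 1728
LINES_PER_PAGE = 1188      # standard resolution
BIT_RATE = 4800            # bits per second

def uncompressed_page_bits(pixels_per_line=PIXELS_PER_LINE, lines=LINES_PER_PAGE):
    """A bilevel image needs 1 bit per pixel."""
    return pixels_per_line * lines

def transmission_seconds(bits, bit_rate=BIT_RATE):
    """Time needed to send the given number of bits at a constant bit rate."""
    return bits / bit_rate

page_bits = uncompressed_page_bits()
print(page_bits)                               # 2052864 bits, about 2 Mbit
print(round(transmission_seconds(page_bits)))  # 428 seconds, i.e. about 430 s
```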

6 Run-length coding
Each scan line is composed of runs of pixels of the same color.
Count the number of elements in each run.
Example: 3w 4b 9w 2b 2w 6b 5w 2b 5w...
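
A minimal sketch of run-length extraction, assuming a scan line is represented as a string of 'w' and 'b' characters (a representation chosen here only for illustration):

```python
from itertools import groupby

def run_lengths(line):
    """Return [(color, length), ...] for a scan line given as a string of 'w'/'b'."""
    return [(color, len(list(run))) for color, run in groupby(line)]

# Build the scan line used in the example on the slide.
line = ("w" * 3 + "b" * 4 + "w" * 9 + "b" * 2 + "w" * 2
        + "b" * 6 + "w" * 5 + "b" * 2 + "w" * 5)
print(" ".join(f"{n}{c}" for c, n in run_lengths(line)))
# 3w 4b 9w 2b 2w 6b 5w 2b 5w
```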

7 G3 1D
Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH), as it encodes run lengths using a predefined Huffman code.
To maintain black/white synchronization, each line begins with a white run, possibly of zero length.

8 G3 1D
Example (codewords for the runs 3w 4b 9w 2b 2w 6b...): 1000 011 10100 11 0111 0010...
The predefined Huffman codewords were derived from the run probabilities in typical handwritten documents.
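
The following sketch reproduces the codeword sequence for the run-length example. The two tables are deliberately partial: they contain only the terminating codewords needed for this example, transcribed from the Modified Huffman tables as I understand them, so they should be checked against the Recommendation before any real use.

```python
# Partial terminating-code tables (run length -> codeword); only the entries
# needed for the example are listed, and they are an assumption to be verified.
WHITE = {2: "0111", 3: "1000", 5: "1100", 9: "10100"}
BLACK = {2: "11", 4: "011", 6: "0010"}

def encode_runs(runs):
    """Encode run lengths that alternate white, black, white, ... (white first)."""
    words = []
    for i, length in enumerate(runs):
        table = WHITE if i % 2 == 0 else BLACK
        words.append(table[length])
    return words

print(" ".join(encode_runs([3, 4, 9, 2, 2, 6, 5, 2, 5])))
# 1000 011 10100 11 0111 0010 1100 11 1100
```

Note that white and black runs use different tables, which is why the decoder must always know which color comes next.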

9 G3 1D
As one line has 1728 pixels, codewords are needed for every black and white run length up to 1728.
As shorter runs occur more frequently than longer ones, each run length is coded in additive form, with makeup and terminating codewords.
Lengths from 0 to 63 are coded with a single terminating codeword.
Longer runs are coded with one or more makeup codewords followed by a terminating codeword.
Each line is terminated with an EOL symbol composed of eleven 0s and one 1.
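
A minimal sketch of the additive form, assuming that makeup codewords exist for every multiple of 64 up to the 1728-pixel line width, so that a single makeup codeword is enough here. The function name is illustrative, and the actual codeword bit patterns are not reproduced.

```python
EOL = "0" * 11 + "1"   # eleven 0s followed by one 1

def split_run(length):
    """Split a run length into (makeup, terminating) parts.

    makeup is a multiple of 64 (0 when no makeup codeword is needed);
    terminating is always in the range 0..63.
    """
    if not 0 <= length <= 1728:
        raise ValueError("run length must be between 0 and 1728")
    return (length // 64) * 64, length % 64

print(split_run(40))     # (0, 40): terminating codeword only
print(split_run(200))    # (192, 8): makeup 192 + terminating 8
print(split_run(1728))   # (1728, 0): a full white line
```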

10 G3 2D
Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR), as it is a variant of a previously defined code called READ (Relative Element Address Designate).
Many images have a high degree of vertical coherence between consecutive lines.
Changing elements are coded with respect to a “nearby” change position of the same color in the previous (reference) line.

11 G3 2D
“Nearby” means within 3 pixels on either side.
If the current line has changing elements with no counterpart in the reference line → switch to horizontal mode (1D).
Conversely, if the reference line has a run with no counterpart in the current line → special pass code.
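
The mode decision can be sketched as follows. The names a1, b1 and b2 are not in the slides; they follow the usual notation for this scheme as I understand it: a1 is the next changing element on the current line, b1 is the matching changing element of the same color on the reference line, and b2 is the changing element that follows b1. This is a simplified illustration of the decision only, not a complete coder.

```python
def choose_mode(a1, b1, b2):
    """Pick the coding mode for the next changing element.

    a1: position of the next changing element on the current line
    b1: position of the matching changing element on the reference line
    b2: position of the changing element after b1 on the reference line
    """
    if b2 < a1:
        # The reference-line run ends before a1: it has no counterpart.
        return ("pass", None)
    offset = a1 - b1          # positive when a1 lies to the right of b1
    if abs(offset) <= 3:
        return ("vertical", offset)
    # No nearby counterpart: fall back to one-dimensional run-length coding.
    return ("horizontal", None)

print(choose_mode(a1=10, b1=9, b2=14))    # ('vertical', 1)
print(choose_mode(a1=20, b1=5, b2=8))     # ('pass', None)
print(choose_mode(a1=10, b1=20, b2=25))   # ('horizontal', None)
```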

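12 G3 2D
[Figure: a current line and its reference line with the generated code, annotated with vertical mode (offsets 0, +2, -2...), horizontal mode and pass code; 0001 appears among the generated codes. Vertical-mode offsets are coded from a Huffman table with codewords for -3, -2, -1, 0, +1, +2, +3.]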

13 G3 2D
Two-dimensional coding is more prone to transmission errors.
In G3 1D an error may corrupt an entire line, but synchronization is restored by the EOL codeword.
Here an error in the reference line is likely to propagate to all the lines that follow.
For this reason there is one reference line for every k lines (i.e. k-1 lines are coded with respect to each reference line):
standard resolution → k=2
high resolution → k=4

14 CCITT fax standard compression performance
Standard resolution (~200 x 100 dpi):
G3 1D → 0.13 bits/pixel, 57 s for A4 at 4.8 Kbps
G3 2D (k=2) → 0.11 bits/pixel, 47 s for A4 at 4.8 Kbps
High resolution (~200 x 200 dpi):
G3 2D (k=4) → 0.09 bits/pixel, 74 s for A4 at 4.8 Kbps
Compression is very good for office images, where run lengths are long.
It would be very poor for bilevel natural images.
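
As a rough consistency check, the transmission times can be estimated from the bits/pixel figures. The sketch assumes 1728 pixels per line, 1188 lines at standard resolution and twice as many at high resolution; the small differences from the quoted times are presumably due to rounding of the bits/pixel values and protocol details.

```python
def page_seconds(bits_per_pixel, lines, pixels_per_line=1728, bit_rate=4800):
    """Estimated time to send one compressed page at a constant bit rate."""
    return bits_per_pixel * pixels_per_line * lines / bit_rate

print(round(page_seconds(0.13, 1188)))      # 56 (slide: 57 s)
print(round(page_seconds(0.11, 1188)))      # 47 (slide: 47 s)
print(round(page_seconds(0.09, 2 * 1188)))  # 77 (slide: 74 s)
```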

15 Continuous-tone images: why lossless compression?
Lossy compression is often preferred because it yields much smaller images at good quality.
However, there are situations in which an approximation may not be adequate:
medical images
historical documents
images with legal relevance

16 Continuous-tone images: lossless compression
GIF standard
PNG standard
JPEG-LS
It is a fairly new standard. The original JPEG standard included a lossless mode, but its performance was not close to the state of the art.
It estimates each pixel value from a fairly simple context: an effective and low-cost solution.
www.hpl.hp.com/loco

17 GIF image format - I
Adopted by CompuServe to minimize the time required to download images over a modem link.
The most widely used lossless image format until 1995.
8-bit pixel description: images have at most 256 colors, but a color map lets them be chosen from a larger set.

18 GIF image format - II
The color map can be specified for each image or can be omitted.
If specified, it is included as a header in the image file, in uncompressed form.
The color map is composed of up to 256 24-bit entries, each specifying an RGB color.
The compression scheme used is LZW.
The alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code.
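
A minimal sketch of LZW over color-map indices, with the alphabet described above (the colors plus a "clear" code and an "end-of-information" code). It is plain textbook LZW and leaves out the details of a real GIF encoder, such as variable-width code packing, resetting the table when it fills up, and grouping the output into byte-counted blocks.

```python
def lzw_encode(indices, n_colors=256):
    """Encode a sequence of color-map indices into a list of LZW codes."""
    clear, end_of_information = n_colors, n_colors + 1
    # The table initially maps every single color index to itself.
    table = {(i,): i for i in range(n_colors)}
    next_code = n_colors + 2
    out = [clear]
    current = ()
    for index in indices:
        candidate = current + (index,)
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            out.append(table[current])       # emit the longest known string
            table[candidate] = next_code     # learn the new string
            next_code += 1
            current = (index,)
    if current:
        out.append(table[current])
    out.append(end_of_information)
    return out

print(lzw_encode([1, 1, 1, 1]))   # [256, 1, 258, 1, 257]
```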

19 GIF image format - III
Although this feature is not widely used, GIF files may contain more than one image, and the images can share the color map.
LZW-coded information is grouped into blocks, each preceded by a byte count, so that an image can be skipped without decompressing it.
In 1995 Unisys announced that royalties would be due on GIF implementations because of an old patent it held on LZW.
This catalyzed the development of a new lossless image format, designed to be in the public domain and to include the latest improvements.

20 PNG image format - I
Portable Network Graphics (pronounced “ping”).
It uses the gzip compression scheme, with some improvements.
The compression obtained is about 10-30% better than GIF.
By default it encodes the pixels in raster-scan order, but other methods are available. It is possible to code:
the horizontal difference, i.e. the difference between the current pixel value and the previous one
the vertical difference, i.e. the difference with respect to the pixel above
the average difference, i.e. the difference with respect to the average of the pixel above and the previous pixel
...
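
The three difference methods can be sketched for an 8-bit grayscale row as follows. The function names are illustrative; differences are taken modulo 256 so that they still fit in a byte, and a missing left neighbor is treated as 0, which is my understanding of how PNG handles the first pixel of a row.

```python
def filter_sub(row):
    """Horizontal difference: current pixel minus the previous (left) pixel."""
    return [(x - (row[i - 1] if i > 0 else 0)) % 256 for i, x in enumerate(row)]

def filter_up(row, above):
    """Vertical difference: current pixel minus the pixel above."""
    return [(x - a) % 256 for x, a in zip(row, above)]

def filter_average(row, above):
    """Difference from the (floored) average of the left and above pixels."""
    out = []
    for i, x in enumerate(row):
        left = row[i - 1] if i > 0 else 0
        out.append((x - (left + above[i]) // 2) % 256)
    return out

above = [10, 12, 14, 16]
row = [11, 13, 15, 17]
print(filter_sub(row))             # [11, 2, 2, 2]
print(filter_up(row, above))       # [1, 1, 1, 1]
print(filter_average(row, above))  # [6, 2, 2, 2]
```

On smooth image regions the filtered rows contain many small, repeated values, which the general-purpose compressor that follows can exploit.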

21 PNG image format - II
It is possible to use more than 256 colors: up to 16-bit grayscale and 48-bit color.
GIF uses one special pixel value to indicate transparency; PNG allows 256 different transparency levels per pixel, so a picture can fade progressively into the background.
It seems inevitable that the PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF.

22 Continuous-tone images: why lossy compression?
Digital images are already an approximation of the real analog phenomenon.
Lossy techniques achieve very good compression with a modest loss of detail.
This is useful for storing and transmitting images.

23 Continuous-tone images: lossy compression
JPEG
JPEG2000
A new image coding system that uses state-of-the-art compression techniques based on wavelet technology.
File extension: .jp2
At high compression ratios and equal file size, the perceived quality of JPEG2000 images is better than that of JPEG images.

24 JPEG format - I
JPEG is a standard defined by the Joint Photographic Experts Group in 1992.
It was conceived to transmit images at 64 Kbps.
It has a lossy mode and a lossless mode (little used, and today replaced by the JPEG-LS standard).
In lossy mode it gives very good quality at about 1 bit/pixel.
Implementation complexity is reasonable.

25 JPEG format - II
It can be used with grayscale and color images.
Each channel of the color space (RGB, YUV...) is treated separately.
It allows progressive transmission, which is much better suited to the WWW than raster transmission.

26 Raster vs. progressive transmission

27 JPEG Coder - I
[Diagram: Discrete Cosine Transform → Quantization → Binary Encoder]

28 JPEG Coder - II
The image is divided into 8x8-pixel squares.
Preprocessing.
Apply the Discrete Cosine Transform to each square.
Coefficient quantization.
Bit stream encoding.

29 Preprocessing: color space transformation & downsampling
Convert from RGB into YUV.
The Y component represents the brightness of a pixel; the U and V components together represent the hue and saturation.
The human eye sees more detail in the Y component than in U and V, which can therefore be compressed more aggressively:
4:4:4 no downsampling
4:2:2 horizontal downsampling by a factor of 2
4:2:0 both horizontal and vertical downsampling by a factor of 2
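
A minimal sketch of the preprocessing step. Two assumptions are made here that are not in the slides: the conversion uses the YCbCr variant of YUV with the coefficients commonly used in JPEG files, and 4:2:0 downsampling is done by averaging each 2x2 block (other filters are possible).

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to Y (brightness) and Cb, Cr (color difference)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def downsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    plane is a list of rows with even width and height.
    """
    return [
        [(plane[r][c] + plane[r][c + 1]
          + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
         for c in range(0, len(plane[0]), 2)]
        for r in range(0, len(plane), 2)
    ]

# A neutral gray or white pixel has no color information: Cb = Cr = 128.
print(rgb_to_ycbcr(255, 255, 255))
print(downsample_420([[1, 3], [5, 7]]))   # [[4.0]]
```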

30 Discrete Cosine Transform - I
The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers.
It is used in JPEG because it is fast and quite easy to implement efficiently.

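31 Discrete Cosine Transform - II
The formula and most symbol names on this slide were images and are not preserved in the transcript. The standard two-dimensional DCT is given here instead, with $N_1$, $N_2$, $B$, $k_1$, $k_2$ as assumed names and an orthonormal scaling that may differ from the slide's:
$$B(k_1,k_2) = \alpha_{N_1}(k_1)\,\alpha_{N_2}(k_2) \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1} A(i,j)\, \cos\!\left[\frac{\pi (2i+1) k_1}{2 N_1}\right] \cos\!\left[\frac{\pi (2j+1) k_2}{2 N_2}\right], \qquad \alpha_N(0)=\sqrt{1/N},\quad \alpha_N(k)=\sqrt{2/N}\ \ (k>0)$$
where
the block is $N_1 \times N_2$ pixels (in JPEG, 8x8)
$A(i,j)$ is the value of the pixel at position $(i,j)$
$B(k_1,k_2)$ is the DCT coefficient at position $(k_1,k_2)$
low values of $k_1$ correspond to low vertical frequencies, low values of $k_2$ to low horizontal frequencies
Generally, the higher-frequency coefficients have very low values.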
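
A direct, unoptimized sketch of the two-dimensional DCT of a block. The orthonormal scaling used here is an assumption (other conventions differ only by constant factors), and real codecs use fast factorizations rather than this quadruple loop.

```python
import math

def dct2(block):
    """Two-dimensional DCT-II of a block given as a list of rows."""
    n1, n2 = len(block), len(block[0])

    def alpha(k, n):
        # Orthonormal scaling factor (an assumed convention).
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            total = 0.0
            for i in range(n1):
                for j in range(n2):
                    total += (block[i][j]
                              * math.cos(math.pi * (2 * i + 1) * k1 / (2 * n1))
                              * math.cos(math.pi * (2 * j + 1) * k2 / (2 * n2)))
            out[k1][k2] = alpha(k1, n1) * alpha(k2, n2) * total
    return out

# A flat 8x8 block has only a (0,0) coefficient; all the others are (nearly) zero.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))   # 800.0
```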

32 Discrete Cosine Transform - III
[Figure: the DCT basis functions]
Each 8x8 square is reduced to 64 coefficients.

33 Discrete Cosine Transform - IV
Knowing the 64 DCT coefficients with infinite precision, it is possible to reconstruct the pixels of the square exactly.
In practice, however, the coefficients are always quantized with finite precision.
Some coefficients related to high frequencies are not transmitted. This allows higher compression without sacrificing too much quality, as the human eye is less sensitive to them.

34 Quantization - I
Each component of the DCT matrix is scaled differently, by dividing it by its own factor.
The factor for each component was chosen according to human sensitivity to changes at that frequency.
In practice the matrix of factors is usually [matrix not preserved in the transcript]

35 Quantization - II
Next, all values are rounded to the nearest integer.
This leads to a large number of 0s in the high-frequency zone, where the factors are larger.
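
A minimal sketch of quantization and dequantization. The matrix used is the example luminance table commonly reproduced from the JPEG standard; that this is the matrix the slides refer to is an assumption. Note that Python's round() sends exact ties to the nearest even integer, which may differ from a particular codec's rounding rule.

```python
# Example luminance quantization table (assumed to be the one meant by the slides).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, q=Q_LUMA):
    """Divide each DCT coefficient by its factor and round to an integer."""
    return [[round(coeffs[r][c] / q[r][c]) for c in range(8)] for r in range(8)]

def dequantize(levels, q=Q_LUMA):
    """Approximate reconstruction of the coefficients (the rounding error is lost)."""
    return [[levels[r][c] * q[r][c] for c in range(8)] for r in range(8)]

# A block with a strong (0,0) coefficient and weak values everywhere else.
block = [[4.0] * 8 for _ in range(8)]
block[0][0] = 800.0
levels = quantize(block)
print(levels[0][0])                       # 50
print(sum(map(sum, levels)) - levels[0][0])   # 0: every other coefficient became 0
```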

36 Zig-zag scan
Low-frequency coefficients are transmitted before higher-frequency coefficients.
This allows progressive visualization of each 8x8 block.
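
The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. This is a sketch with an illustrative function name; positions are (row, column) pairs.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + column == s, top-right first.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked from bottom-left to top-right, odd ones the
        # other way round.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

print(zigzag_order()[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Since quantization leaves most high-frequency coefficients at 0, this ordering tends to put the zeros in one long run at the end of the sequence.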

37 Raster vs. progressive transmission
Raster transmission: the DCT coefficients of the upper-left block, then those of all the other blocks in the upper part of the image, and so on.
Progressive transmission: first all the (0,0) coefficients, then all the (0,1) coefficients, and so on, following the zig-zag scan in each block.

38 Binary coding
DCT(0,0) usually varies very slowly from one block to the next, as it represents the mean value of the block.
For this reason it is convenient to encode its difference from the previous block's value.
Typically the bit stream is coded with Huffman codes.
It is possible to use arithmetic coding instead, gaining some compression at the cost of decoding speed.
The Huffman codes are either predefined, or optimal tables can be built and inserted in the stream.
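
A minimal sketch of the differential coding of the DCT(0,0) values. Starting the prediction from 0 for the first block is an assumption made here, and the entropy coding of the differences is left out.

```python
def dc_differences(dc_values):
    """Replace each DCT(0,0) value by its difference from the previous block's."""
    previous = 0                      # assumed initial prediction
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dc_restore(diffs):
    """Inverse operation: a running sum of the differences."""
    values, previous = [], 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values

dc = [50, 52, 51, 51, 55]
print(dc_differences(dc))               # [50, 2, -1, 0, 4]
print(dc_restore(dc_differences(dc)))   # [50, 52, 51, 51, 55]
```

The differences are mostly small numbers, which a Huffman code can represent with short codewords.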

39 JPEG Decoder
[Diagram: Binary Decoder → Dequantization → Inverse DCT]
Some values are lost!
Good quality, but the reconstruction is not exact.

40 JPEG performance - I

41 JPEG performance - II
[Figure panels: Original; Quality factor 75; Quality factor 20; Quality factor 3]

