Data Compression. How Is This Possible? Entire King James Bible : 4,834,757 bytes Zip Archive Containing It: 1,339,843 bytes.


1 Data Compression

2 How Is This Possible? Entire King James Bible : 4,834,757 bytes Zip Archive Containing It: 1,339,843 bytes

3 More Questions Why does this file: Compress differently than:

4 Behind The Scenes Compression used for: – ~50% of web traffic – Most audio/video files – Sometimes for every file on a drive

5 Trick 1: Describe the contents of this file in as few words as possible…

6 Trick 1: Run Length Encoding : – Describe repetition as: (How many times)What to repeat – 2000000A

7 RLE Examples ABABABABABAB → 6AB; AAABBBBBAAACC → 3A,5B,3A,2C; 111110010101010101 → (5)1,(1)0,(6)01
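The run-length idea above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names `rle_encode` and `rle_decode` are made up here, the sketch only handles runs of a single repeated character (so it would not find the "6AB" pattern), and it assumes the input contains no commas.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Write each run of one repeated character as count+character, comma separated."""
    return ",".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Undo rle_encode: the last character of each piece is the symbol, the rest is the count."""
    if not encoded:
        return ""
    return "".join(piece[-1] * int(piece[:-1]) for piece in encoded.split(","))


print(rle_encode("AAABBBBBAAACC"))  # 3A,5B,3A,2C
print(rle_decode("3A,5B,3A,2C"))    # AAABBBBBAAACC
```

Running `rle_encode("ABCDEF")` gives a result longer than the input, which is the failure case on the next slide.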

8 RLE Fail ABCDEF → 1A,1B,1C,1D,1E,1F

9 RLE Fail 2 This file doesn't just have A's: 80A,1newline,80A,1newline,80A,1newline…

10 Trick 2 Same As Earlier: – Describe patterns with instructions to go back x and copy y characters ABCDEFG-b7c7 "Write down ABCDEFG, then go back 7 characters and copy the next 7 characters to the end of what you have"

11 Same As Earlier ABCDEFG-b7c7

12 Same As Earlier ABCDEFG-b7c7 ABCDEFG

13 Same As Earlier ABCDEFG-b7c7 ABCDEFG

14 Same As Earlier ABCDEFG-b7c7 ABCDEFGA

15 Same As Earlier ABCDEFG-b7c7 ABCDEFGAB

16 Same As Earlier ABCDEFG-b7c7 ABCDEFGABC

17 Same As Earlier ABCDEFG-b7c7 ABCDEFGABCD

18 Same As Earlier ABCDEFG-b7c7 ABCDEFGABCDE

19 Same As Earlier ABCDEFG-b7c7 ABCDEFGABCDEF

20 Same As Earlier ABCDEFG-b7c7 ABCDEFGABCDEFG

21 Same As Earlier ABCDEFG-b7c7 ABCDEFGABCDEFG

22 Same As Earlier AB-b2c6

23 Same As Earlier AB-b2c6 AB

24 Same As Earlier AB-b2c6 AB

25 Same As Earlier AB-b2c6 ABA

26 Same As Earlier AB-b2c6 ABAB

27 Same As Earlier AB-b2c6 ABABA

28 Same As Earlier AB-b2c6 ABABAB

29 Same As Earlier AB-b2c6 ABABABA

30 Same As Earlier AB-b2c6 ABABABAB

31 Same As Earlier AB-b2c6 ABABABAB

32 Same As Earlier AB-b2c2-C-b2c5

33 Same As Earlier AB-b2c2-C-b2c5 AB

34 Same As Earlier AB-b2c2-C-b2c5 AB

35 Same As Earlier AB-b2c2-C-b2c5 ABAB

36 Same As Earlier AB-b2c2-C-b2c5 ABAB

37 Same As Earlier AB-b2c2-C-b2c5 ABABC

38 Same As Earlier AB-b2c2-C-b2c5 ABABC

39 Same As Earlier AB-b2c2-C-b2c5 ABABCB

40 Same As Earlier AB-b2c2-C-b2c5 ABABCBC

41 Same As Earlier AB-b2c2-C-b2c5 ABABCBCB

42 Same As Earlier AB-b2c2-C-b2c5 ABABCBCBC

43 Same As Earlier AB-b2c2-C-b2c5 ABABCBCBCB

44 Same As Earlier AB-b2c2-C-b2c5 ABABCBCBCB
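The step-by-step walkthroughs above can be reproduced with a small decoder. This is a sketch under assumptions: the function name `expand` is invented for illustration, pieces are assumed to be separated by "-", and any piece of the form bXcY is treated as "go back X, copy Y" (so a literal that happens to look like "b2c2" would be misread). Copying one character at a time is what lets a copy run past the text that existed when it started, as in AB-b2c6.

```python
import re

# A back-reference piece looks like b<back>c<count>, e.g. b7c7.
BACKREF = re.compile(r"b(\d+)c(\d+)")


def expand(script: str) -> str:
    """Rebuild the original text from literals and go-back/copy instructions."""
    out = []
    for piece in script.split("-"):
        match = BACKREF.fullmatch(piece)
        if match is None:
            out.extend(piece)  # plain text: write it down as-is
            continue
        back, count = int(match.group(1)), int(match.group(2))
        if back < 1 or back > len(out):
            raise ValueError(f"cannot go back {back} characters")
        start = len(out) - back
        # Copy one character at a time so the copy can overlap its own output.
        for i in range(count):
            out.append(out[start + i])
    return "".join(out)


print(expand("ABCDEFG-b7c7"))    # ABCDEFGABCDEFG
print(expand("AB-b2c6"))         # ABABABAB
print(expand("AB-b2c2-C-b2c5"))  # ABABCBCBCB
```

Back-references of this kind are the core idea of the LZ77 family of compressors, which is the family the DEFLATE method used in zip archives belongs to.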

45 Shorter Symbol Trick Normally text is 8-bit ASCII – 8 bits = 256 possibilities

46 Shorter Symbol Trick If the message is just A's and B's we are wasting space: A = 01000001, B = 01000010. Why not: A = 0, B = 1

47 Shorter Symbol Trick Shorter Symbol Trick: – Use minimum number of bits to represent different symbols in message – More common symbols get shorter representation

48 More Common This message: AAAABAAC Three symbols, need 2 bits – Could do: AAAABAAC → 00 00 00 00 01 00 00 11 (16 bits)

49 More Common But A is more common: AAAABAAC So maybe we can use a shorter code for it: AAAABAAC → 0 0 0 0 10 0 0 11 (10 bits)
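The bit counts on these two slides can be checked directly. A minimal sketch: the two code tables are copied from the slides, while the names `FIXED`, `VARIABLE`, and `encode` are chosen here for illustration.

```python
# Code tables taken from the slides: 2 bits per symbol vs. a shorter code for A.
FIXED = {"A": "00", "B": "01", "C": "11"}
VARIABLE = {"A": "0", "B": "10", "C": "11"}


def encode(message: str, code: dict) -> str:
    """Replace every symbol with its bit pattern."""
    return "".join(code[symbol] for symbol in message)


message = "AAAABAAC"
print(8 * len(message))                # 64 bits as 8-bit ASCII
print(len(encode(message, FIXED)))     # 16
print(len(encode(message, VARIABLE)))  # 10
```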

50 Why Does it Work No code is a prefix of another – 0 : it is an A – 1 : keep going 010110010 → ABCAAB
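Because no code is the start of another, a decoder can read bits left to right and emit a symbol the moment the bits collected so far match a code. A minimal sketch, assuming the A = 0, B = 10, C = 11 code from the previous slide; the function name `decode` is illustrative.

```python
CODE = {"A": "0", "B": "10", "C": "11"}


def decode(bits: str, code: dict) -> str:
    """Read bits left to right; emit a symbol as soon as the collected bits match a code."""
    lookup = {pattern: symbol for symbol, pattern in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bits ended in the middle of a code")
    return "".join(out)


print(decode("010110010", CODE))  # ABCAAB
```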

51 Why Does it Work A BAD code – 0 : is it an A? is it the start of a D? 010 010010 ABDA
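The ambiguity can be made concrete by listing every way a bit string could be read. The slide's exact bad code table did not survive in this transcript, so the `BAD` table below is a hypothetical stand-in where D starts with A's code; `is_prefix_free` and `all_decodings` are also names invented for this sketch.

```python
from itertools import permutations

GOOD = {"A": "0", "B": "10", "C": "11"}
BAD = {"A": "0", "B": "10", "D": "01"}  # hypothetical: D begins with A's code


def is_prefix_free(code: dict) -> bool:
    """True when no code is the beginning of another code."""
    return not any(b.startswith(a) for a, b in permutations(code.values(), 2))


def all_decodings(bits: str, code: dict) -> list:
    """Every message that could have produced these bits."""
    if not bits:
        return [""]
    results = []
    for symbol, pattern in code.items():
        if bits.startswith(pattern):
            rest = all_decodings(bits[len(pattern):], code)
            results += [symbol + tail for tail in rest]
    return results


print(is_prefix_free(GOOD), is_prefix_free(BAD))  # True False
print(all_decodings("010", BAD))                  # ['AB', 'DA']
```

With the bad table, the same three bits decode two different ways, so the receiver cannot know which message was meant.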

52 Building a Code CS160 Reader… – Huffman Code Building
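The slide defers Huffman code building to the course reader. For orientation, here is a minimal sketch of the usual approach: repeatedly merge the two least frequent groups of symbols, prefixing one group's codes with 0 and the other's with 1. The function name `huffman_code` is illustrative, and the exact bit patterns it produces depend on tie-breaking, so they may differ from the slides' table even though the code lengths match.

```python
import heapq
from collections import Counter


def huffman_code(message: str) -> dict:
    """Build a prefix-free code where more common symbols get shorter patterns."""
    counts = Counter(message)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (total count, unique tiebreaker, {symbol: code so far}).
    heap = [(count, i, {symbol: ""}) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, left = heapq.heappop(heap)
        count2, _, right = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in left.items()}
        merged.update({symbol: "1" + code for symbol, code in right.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


code = huffman_code("AAAABAAC")
total_bits = sum(len(code[symbol]) for symbol in "AAAABAAC")
print({symbol: len(pattern) for symbol, pattern in code.items()})
print(total_bits)  # 10
```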

53 Lossy Compression Lossless compression : – Can recreate original perfectly – Algorithms: Run length encoding, same as earlier, shorter symbol – Examples: zip files, www traffic
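"Can recreate original perfectly" is easy to check with Python's standard `zlib` module, which implements DEFLATE, the method commonly used inside zip archives. A minimal sketch: the sample text is just a repeated sentence chosen for illustration, so the sizes it prints are not comparable to the Bible figures earlier in the deck.

```python
import zlib

# Repetitive sample text (an assumption for this demo, not the deck's data).
original = ("In the beginning God created the heaven and the earth. " * 200).encode("ascii")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # True: lossless means byte-for-byte identical
```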

54 Lossy Compression Lossy compression – Original can NOT be recreated perfectly

55 My Kids - 1278 Kb

56 Every Other Line/Column Removed

57 Remaining pixels packed back down : 320 Kb

58 Blown back up vs original Original Compressed

59 Only keep every 4th line/column : 81 Kb
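The throw-away-pixels experiment on these slides can be sketched on a tiny grid of numbers standing in for an image. The names `keep_every` and `blow_up` are invented here, and the 8x8 grid is made-up data. Keeping every 2nd line and column leaves a quarter of the values (compare 1278 vs 320), and keeping every 4th leaves a sixteenth (compare 1278 vs 81).

```python
def keep_every(pixels: list, step: int) -> list:
    """Keep only every step-th row and every step-th column."""
    return [row[::step] for row in pixels[::step]]


def blow_up(small: list, step: int) -> list:
    """Stretch back to the original size by repeating each kept pixel."""
    out = []
    for row in small:
        wide = [value for value in row for _ in range(step)]
        out.extend(list(wide) for _ in range(step))
    return out


# Made-up 8x8 "image": every pixel value is different.
image = [[x + 10 * y for x in range(8)] for y in range(8)]

half = keep_every(image, 2)
quarter = keep_every(image, 4)
print(len(half) * len(half[0]))        # 16 of the original 64 values
print(len(quarter) * len(quarter[0]))  # 4 of the original 64 values
print(blow_up(half, 2) == image)       # False: the dropped pixels are gone for good
```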

60 Real JPEG Image broken into blocks of pixels

61 Real JPEG Each block processed separately

62 Real JPEG Block processed, to look for compressible patterns

63 Real JPEG Patterns can more or less recreate image
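The slides do not name the block processing step. In baseline JPEG it is a discrete cosine transform (DCT) applied to 8x8 blocks, which rewrites the block as a mix of patterns. The sketch below shows only that transform on one made-up block and leaves out the later JPEG stages (quantization and entropy coding) as well as the usual level shift; the function name `dct_2d` is illustrative.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block: list) -> list:
    """Orthonormal 2-D DCT-II of an N x N block."""
    def scale(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A perfectly flat block: 64 pixels, all the same brightness (made-up data).
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
significant = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
print(significant)  # 1: one pattern strength describes the whole block
```

Smooth blocks need only a few pattern strengths, which is the kind of compressible pattern the slide refers to. The lossy part comes from rounding or dropping the small ones.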

64 JPEG Samples @ 200% No compress Low compress Med compress High compress

