1
On Compression of Data Encrypted with Block Ciphers
Demijan Klinc* Carmit Hazay† Ashish Jagmohan** Hugo Krawczyk** Tal Rabin** * Georgia Institute of Technology ** IBM T.J. Watson Research Labs † Weizmann Institute and IDC
2
Traditional Model
Transmitting redundant data over an insecure and bandwidth-constrained channel.
Traditionally, data is first compressed and then encrypted:
source X → compress → C(X) → encrypt (key k) → E_k(C(X))
The compressor and encryptor together form the encoder.
3
Traditional Model
What if encryptor and compressor are two entities with different goals?
E.g., a storage provider wants to compress data to minimize storage space but does not have access to the key.
Can we reverse the order of these steps?
4
Compression and Encryption in Reverse Order
source X → encrypt (key k) → E_k(X) → compress (does not know k!) → C(E_k(X))
Can we encrypt first and only then compress, without knowing the key?
5
Compression and Encryption in Reverse Order
For a fixed key, the encryption scheme is a bijection, therefore the entropy is preserved.
It follows that it is theoretically possible to compress the source to the same level as before encryption.
In practice, encrypted data appears to be random.
Conventional compression techniques do not yield desirable results.
6
Compression and Encryption in Reverse Order
Fully homomorphic encryption shows that one can compress optimally without decrypting: simply run the compression algorithm homomorphically on the encrypted plaintext.
Fully homomorphic encryption supports addition and multiplication:
E(m1), E(m2) → E(m1+m2)
E(m1), E(m2) → E(m1∙m2)
Stated differently: C, E(m) → E(C(m))
7
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
8
Private Key Encryption
Triple of algorithms: (Gen, Enc, Dec)
Same key for encryption and decryption
Security – CPA security (informally): it should be infeasible to distinguish an encryption of m from an encryption of m'
9
Private Key Encryption
Two categories:
Stream ciphers: the plaintext is encrypted one symbol at a time, typically by summing it with a key (XOR operation for binary alphabets), e.g., the one-time pad (a toy sketch follows below).
Block ciphers: encryption is accomplished by means of nonlinear mappings on input blocks of fixed length, e.g., AES, DES.
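As a concrete, purely illustrative instance of the stream-cipher case, here is a minimal one-time-pad sketch in Python; the message and the key-generation call are hypothetical examples, not taken from the presentation.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # One-time pad: ciphertext is the bytewise XOR of plaintext and key.
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # fresh, uniformly random key
ciphertext = otp_encrypt(message, key)
# Decryption is the same XOR operation with the same key.
assert otp_encrypt(ciphertext, key) == message
```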
10
Binary Symmetric Channel
Communication model where each sent bit is flipped independently with probability p:
Pr( Y = 0 | X = 0 ) = 1−p
Pr( Y = 0 | X = 1 ) = p
Pr( Y = 1 | X = 0 ) = p
Pr( Y = 1 | X = 1 ) = 1−p
The (binary) entropy is H(p) = −(p log p + (1−p) log (1−p)).
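The entropy formula and the channel are easy to reproduce; the following short Python sketch (my own illustration, not from the slides) evaluates H(p) and simulates passing a bit sequence through a BSC.

```python
import math
import random

def binary_entropy(p: float) -> float:
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc(bits, p, rng=random):
    # Binary symmetric channel: each bit is flipped independently with probability p.
    return [b ^ (rng.random() < p) for b in bits]

print(binary_entropy(0.026))          # roughly 0.17 bits per symbol
print(bsc([0, 1, 1, 0, 1], p=0.1))    # a noisy copy of the input bits
```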
11
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
12
Source Coding with Side Information
source X → compress → C(X) → decompress (side information Y available only at the decoder) → X
X, Y: random variables over a finite alphabet with a joint probability distribution P_XY
Goal: losslessly compress X with Y known only to the decoder
13
Source Coding with Side Information
For sufficiently large block length, this can be done at rates arbitrarily close to H(X|Y) [SlepianWolf73].
The theorem is non-constructive.
Practical coding schemes use constructions based on good linear error-correcting codes, e.g., LDPC codes [RichardsonUrbanke08].
14
Linear Error Correcting Codes
Communication is over a noisy channel.
Add redundancy to the source to correct errors.
A linear code of length m and dimension r is an r-dimensional linear subspace of the vector space (F2)^m.
Encoding: using the generator matrix.
Decoding: using the parity check matrix.
15
Linear Error Correcting Codes
Minimum distance: the weight of the lowest-weight nonzero codeword.
In order to correct i errors the minimum distance should be at least 2i+1.
16
Linear Error Correcting Codes
Cosets: Suppose that C is an [m, r] linear code over F2 and that a is any vector in (F2)^m.
Then the set a+C = {a+x | x ∈ C} is called a coset of C.
Every vector of (F2)^m is in some coset of C.
Every coset contains exactly 2^r vectors.
Two cosets are either disjoint or equal.
(A small sketch of these facts on a concrete code follows below.)
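A small Python sketch (illustration only) that enumerates the cosets of the [3,1] repetition code C = {000, 111}, the code used in the worked example on the next slides, and checks the facts listed above.

```python
from itertools import product

# The [3, 1] binary repetition code C = {000, 111}.
C = {(0, 0, 0), (1, 1, 1)}

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

# Partition (F_2)^3 into cosets a + C.
cosets, seen = [], set()
for a in product((0, 1), repeat=3):
    if a in seen:
        continue
    coset = {xor(a, c) for c in C}
    cosets.append(coset)
    seen |= coset

for i, coset in enumerate(cosets, 1):
    print(f"Coset {i}: {sorted(coset)}")
# Four cosets of size |C| = 2 each, covering all 8 vectors of (F_2)^3.
assert len(cosets) == 4 and sum(len(c) for c in cosets) == 8
```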
17
Source Coding with Side Information
Example: Assume Y is known to both the encoder and the decoder, and Ham(X,Y) ≤ 1.
source X → compress → C(X) → decompress (with Y) → X
18
Source Coding with Side Information
Let X = 010; then Y ∈ {010, 011, 000, 110}.
Goal: encode X ⊕ Y using fewer than 3 bits. How?
Let e = X ⊕ Y; then e ∈ {000, 001, 010, 100}.
The encoder sends the index of the coset in which e occurs.
19
Source Coding with Side Information
Let C = {000, 111} be a linear code with minimum distance 3 that can correct one error.
The space is partitioned into 4 cosets:
Coset 1 = {000, 111}
Coset 2 = {001, 110}
Coset 3 = {010, 101}
Coset 4 = {100, 011}
Recall: e ∈ {000, 001, 010, 100}, so e is the unique weight-≤1 leader of its coset.
Each coset index requires 2 bits.
Decoding: output Y ⊕ e', where e' is the leader of the indexed coset.
20
Source Coding with Side Information
Without Y, the encoder cannot compute e = X ⊕ Y!
source X → compress → C(X) → decompress (with Y) → X
21
Source Coding with Side Information
Still possible: encode the coset in which X occurs.
Coset 1 = {000, 111}
Coset 2 = {001, 110}
Coset 3 = {010, 101}
Coset 4 = {100, 011}
Each index requires 2 bits.
Decoding: output the member of the indexed coset whose Hamming distance to Y is smallest (a sketch follows below).
Slepian-Wolf codes over finite block lengths have nonzero error, which implies that the decoder will sometimes fail.
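A minimal Python sketch of this decoder-only side-information scheme on the same toy code, assuming Ham(X, Y) ≤ 1 as above; in a real system the coset would be identified by a 2-bit syndrome rather than by an explicit set.

```python
from itertools import product

C = {(0, 0, 0), (1, 1, 1)}             # [3,1] repetition code, distance 3

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def encode(x):
    # The coset x + C serves as the 2-bit "compressed" description of x.
    return frozenset(xor(x, c) for c in C)

def decode(coset, y):
    # Pick the coset member closest to the side information Y in Hamming distance.
    return min(coset, key=lambda v: sum(a != b for a, b in zip(v, y)))

X = (0, 1, 0)
for Y in [(0, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0)]:   # Ham(X, Y) <= 1
    assert decode(encode(X), Y) == X
print("X recovered from its coset index and any Y within Hamming distance 1")
```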
22
Source Coding with Side Information
In practice, either:
Fix p and determine the compression rate of a Slepian-Wolf code that satisfies the target error, or
Pick a Slepian-Wolf code and determine the maximum p for which the target error is satisfied.
Either way, we need to know the source statistics!
23
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
24
Compressing Stream Ciphers
This problem can be formulated as a Slepian-Wolf coding problem [JohnsonWagnerRamchandran04].
The shared key k is cast as the decoder-only side information.
The ciphertext is cast as the source.
source X → encrypt (key k) → E_k(X) → compress → C(E_k(X))
25
Compressing Stream Ciphers
Compression is achievable due to the correlation between the key K and the ciphertext C = X ⊕ K.
The joint distribution of the source and side information can be determined from the statistics of the source.
source X → encrypt (key k) → E_k(X) → compress → C(E_k(X))
26
Compressing Stream Ciphers
C(E_k(X)) → joint decryption and decompression (key k) → X
The decoder knows k and the source statistics.
The compression rate H(E_k(X)|K) = H(X ⊕ K|K) = H(X) is asymptotically achievable.
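The identity H(E_k(X)|K) = H(X ⊕ K|K) = H(X) can be checked numerically on a one-bit one-time pad; the source distribution below is a made-up toy example, not from the presentation.

```python
import math
from itertools import product

# Toy check of H(E_k(X) | K) = H(X) for a one-bit one-time pad:
# X ~ Bernoulli(p), K ~ Bernoulli(0.5) independent, ciphertext C = X xor K.
p = 0.1
P_X = {0: 1 - p, 1: p}
P_K = {0: 0.5, 1: 0.5}

def H(dist):
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

# Joint distribution of (C, K).
P_CK = {}
for x, k in product((0, 1), repeat=2):
    c = x ^ k
    P_CK[(c, k)] = P_CK.get((c, k), 0.0) + P_X[x] * P_K[k]

# Chain rule: H(C | K) = H(C, K) - H(K).
H_C_given_K = H(P_CK) - H(P_K)
assert abs(H_C_given_K - H(P_X)) < 1e-12
print(f"H(C | K) = {H_C_given_K:.4f} = H(X)")
```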
27
Efficiency
Encoding: finding the coset of E_k(X) can be done by multiplying E_k(X) with the parity check matrix, i.e., E_k(X)∙H^T is the syndrome of E_k(X).
Decoding: exhaustive search through the coset of E_k(X).
This is improved using LDPC codes, where decoding is polynomial in the block length.
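A sketch of the syndrome computation E_k(X)∙H^T described above, using the parity-check matrix of the [7,4] Hamming code as a small stand-in for the long LDPC codes used in the actual evaluation.

```python
import numpy as np

# Parity-check matrix of the [7,4] Hamming code (a standard small example,
# not the LDPC codes used in the paper's experiments).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def syndrome(word: np.ndarray) -> np.ndarray:
    # "Compression" step: map a 7-bit word to its 3-bit syndrome word @ H^T (mod 2),
    # i.e. the index of the coset in which the word lies.
    return (word @ H.T) % 2

ciphertext_block = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
print(syndrome(ciphertext_block))   # the 3-bit coset index sent to the decoder
```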
28
Security
Compression that operates on top of a one-time pad does not compromise the security of the encryption scheme.
The compressor does not know K.
29
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
30
Compressing Block Ciphers
Widely used in practice.
The correlation between the key and the ciphertext is more complex.
The previous approach is not directly applicable.
Can data encrypted with block ciphers be compressed without access to the key?
31
Electronic Code Book (ECB) Mode
The simplest mode of operation: each block is encrypted separately with the same key, X_i → block cipher (key k) → E_k(X_i).
The compression schemes that we present rely on the specifics of chaining operations.
Compression in ECB mode is theoretically possible; is it also practical?
32
Cipher Block Chaining (CBC) Mode
In CBC mode, each plaintext block is XORed with the previous ciphertext block before encryption (the first block is XORed with the IV).
The correlation between E_k(X_i) and X_{i+1} is easier to characterize and can be exploited for compression.
33
Compressing Block Ciphers
Recall that in CBC mode the input to the (i+1)-th block-cipher call is the XOR of the i-th ciphertext block E_k(X_i) with the plaintext block X_{i+1}.
E_k(X_i) is cast as the source; the (i+1)-th block-cipher input (which the decoder obtains by decrypting the recovered (i+1)-th ciphertext block) is cast as the side information, and the two differ exactly by X_{i+1}, whose statistics are known.
IV, E_k(X_1), …, E_k(X_n) → compressor → C(IV), C(E_k(X_1)), …, C(E_k(X_{n−1})), E_k(X_n)
The last block is left uncompressed, while the IV is compressed.
34
Decoding
The last ciphertext block E_k(X_n) arrives uncompressed. Decrypting it yields the n-th block-cipher input, which serves as side information for the Slepian-Wolf decoder to recover E_k(X_{n−1}) from C(E_k(X_{n−1})). The recovered block is decrypted in turn, and the process continues backwards, block by block, down to the IV, yielding the plaintext blocks X_1, …, X_n.
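To make the chain of relations behind this decoder explicit, here is a toy Python sketch in which an 8-bit keyed random permutation stands in for the block cipher (block size, key, and plaintext are hypothetical). It verifies the CBC fact the scheme relies on: decrypting ciphertext block i+1 yields a value that differs from ciphertext block i exactly by the plaintext block X_{i+1}.

```python
import random

BLOCK_BITS = 8          # toy block size; real block ciphers use 128 bits
N = 1 << BLOCK_BITS

def make_cipher(key: int):
    # Toy "block cipher": a key-seeded random permutation of {0, ..., 255}.
    # Stand-in for AES; for illustration only.
    perm = list(range(N))
    random.Random(key).shuffle(perm)
    inv = [0] * N
    for i, p in enumerate(perm):
        inv[p] = i
    return (lambda x: perm[x]), (lambda y: inv[y])

enc, dec = make_cipher(key=1234)

def cbc_encrypt(blocks, iv):
    out, prev = [], iv
    for x in blocks:
        prev = enc(x ^ prev)     # C_i = E_k(X_i xor C_{i-1}),  C_0 = IV
        out.append(prev)
    return out

plaintext = [3, 7, 1, 0, 2]      # low-entropy toy source
iv = 99
C = cbc_encrypt(plaintext, iv)

# Key observation used for compression: decrypting C_{i+1} gives
# X_{i+1} xor C_i, so it differs from C_i exactly by the plaintext block
# X_{i+1}, whose statistics are known to the Slepian-Wolf decoder.
for i in range(len(C) - 1):
    assert dec(C[i + 1]) ^ C[i] == plaintext[i + 1]
print("dec(C_{i+1}) xor C_i == X_{i+1} for all i")
```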
35
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
36
Compression Factor
Let {C_{m,R}, D_{m,R}} denote an order-m Slepian-Wolf code with compression rate R.
Compressor C_{m,R}: {0,1}^m → {0,1}^{mR}
Decompressor D_{m,R}: {0,1}^{mR} × {0,1}^m → {0,1}^m
Compression factor:
37
Compression Results
Irregular LDPC codes were used in our performance evaluation.
Table: Attainable compression rates for m = 128 bits
Source Entropy | Compression Rate | Target Error | p
0.1739 | 0.50 | 10^-3 | 0.026
0.1301 | 0.50 | 10^-4 | 0.018
0.3584 | 0.75 | 10^-3 | 0.068
0.3032 | 0.75 | 10^-4 | 0.054
38
Compression Results
Irregular LDPC codes were used in our performance evaluation.
Table: Attainable compression rates for m = 1024 bits
Source Entropy | Compression Rate | Target Error | p
0.3195 | 0.50 | 10^-3 | 0.058
0.2778 | 0.50 | 10^-4 | 0.048
0.5710 | 0.75 | 10^-3 | 0.134
0.5464 | 0.75 | 10^-4 | 0.126
39
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
40
Recall -- ECB Mode
Each plaintext block m_i is encrypted independently with the same key K:
m_1, m_2, …, m_n → block cipher (key K) → E_K(m_1), E_K(m_2), …, E_K(m_n)
41
Notable Observations
Exhaustive strategies are infeasible in most cases, except for very low-entropy plaintext distributions or compression ratios.
Such a strategy works by truncating the ciphertext: for example, for a plaintext distribution consisting of uniformly distributed values over a small support, one can compress the output of a 128-bit block cipher by truncating the 128-bit ciphertext to 40 bits (a sketch follows below).
Can we construct a better strategy?
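A sketch of the truncation-based exhaustive strategy described above, under assumed toy parameters (a hash-based stand-in for the block cipher, a 1000-element plaintext support, a 40-bit truncation); it illustrates why the decompressor must hold the key and enumerate the entire plaintext support, which is infeasible unless the plaintext entropy is very low.

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for a 128-bit block cipher (not a real cipher; illustration only).
    return hashlib.sha256(key + block).digest()[:16]

def compress(ciphertext: bytes, t_bytes: int = 5) -> bytes:
    # Truncate the 128-bit ciphertext to t_bytes (here 40 bits).
    return ciphertext[:t_bytes]

def decompress(truncated: bytes, key: bytes, candidate_plaintexts) -> bytes:
    # Exhaustive strategy: the decompressor knows the key and the (small)
    # support of the plaintext distribution; it re-encrypts every candidate
    # and returns the one whose ciphertext matches the truncation.
    for m in candidate_plaintexts:
        if toy_block_cipher(key, m).startswith(truncated):
            return m
    raise ValueError("no candidate matches")

key = b"0123456789abcdef"
candidates = [i.to_bytes(16, "big") for i in range(1000)]   # low-entropy source
message = candidates[417]
short = compress(toy_block_cipher(key, message))
assert decompress(short, key, candidates) == message
```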
42
Impossibility Result
There does not exist a generic compression scheme (C, D) for block ciphers unless (C, D) is either exhaustive or computationally infeasible.
In particular, there does not exist an efficient (C, D) for ECB mode!
43
The Public-Key Setting
Hybrid encryption: a public-key scheme is used to encrypt a symmetric key, and the data is then encrypted with this key.
El Gamal encryption: a similar technique applies when XOR is used.
44
Concluding Remarks
Data encrypted with block ciphers is practically compressible when chaining modes are employed.
Notable compression factors were demonstrated with binary memoryless sources.
Short block sizes limit the performance, but that could change in the future.
Generic compression is impossible.
45
Future Work
An interesting question is whether compression is possible without any preliminary knowledge of the data: can compression be achieved using algorithms that do not rely on the source statistics, i.e., universal algorithms?
The error model: can we consider a less limited setting where the errors are not independent?
46
Thank You!