1
On Compression of Data Encrypted with Block Ciphers
Demijan Klinc* Carmit Hazay† Ashish Jagmohan** Hugo Krawczyk** Tal Rabin** * Georgia Institute of Technology ** IBM T.J. Watson Research Labs † Weizmann Institute and IDC
2
Traditional Model
Transmitting redundant data over an insecure and bandwidth-constrained channel.
Traditionally, data is first compressed and then encrypted:
source X → compress → C(X) → encrypt (key k) → E_k(C(X))
The compressor and encryptor together form the encoder.
3
Traditional Model
What if encryptor and compressor are two entities with different goals?
E.g., a storage provider wants to compress data to minimize storage space but does not have access to the key.
Can we reverse the order of these steps?
4
Compression and Encryption in Reverse Order
source X → encrypt (key k) → E_k(X) → compress (does not know k!) → C(E_k(X))
Can we encrypt first and only then compress, without knowing the key?
5
Compression and Encryption in Reverse Order
For a fixed key, the encryption scheme is a bijection, therefore the entropy is preserved.
It follows that it is theoretically possible to compress the source to the same level as before encryption.
In practice, encrypted data appears to be random.
Conventional compression techniques do not yield desirable results.
6
Compression and Encryption in Reverse Order
Fully homomorphic encryption shows that one can compress optimally without decrypting: simply run the compression algorithm homomorphically on the encrypted plaintext.
Fully homomorphic encryption supports addition and multiplication:
E(m1), E(m2) → E(m1+m2)
E(m1), E(m2) → E(m1∙m2)
Stated differently: C, E(m) → E(C(m))
7
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
8
Private Key Encryption
Triple of algorithms: (Gen, Enc, Dec)
Same key for encryption and decryption
Security – CPA security (informally): it should be infeasible to distinguish an encryption of m from an encryption of m'
9
Private Key Encryption
Two categories:
Stream ciphers: the plaintext is encrypted one symbol at a time, typically by summing it with a key (XOR operation for binary alphabets), e.g., the one-time pad (a toy sketch follows below).
Block ciphers: encryption is accomplished by means of nonlinear mappings on input blocks of fixed length, e.g., AES, DES.
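As a concrete, purely illustrative instance of the stream-cipher case, here is a minimal one-time-pad sketch in Python; the message and the key-generation call are hypothetical examples, not taken from the presentation.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # One-time pad: ciphertext is the bytewise XOR of plaintext and key.
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # fresh, uniformly random key
ciphertext = otp_encrypt(message, key)
# Decryption is the same XOR operation with the same key.
assert otp_encrypt(ciphertext, key) == message
```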
10
Binary Symmetric Channel
Communication model where each sent bit is flipped independently with probability p:
Pr( Y = 0 | X = 0 ) = 1−p
Pr( Y = 0 | X = 1 ) = p
Pr( Y = 1 | X = 0 ) = p
Pr( Y = 1 | X = 1 ) = 1−p
The (binary) entropy is H(p) = −(p log p + (1−p) log (1−p)).
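The entropy formula and the channel are easy to reproduce; the following short Python sketch (my own illustration, not from the slides) evaluates H(p) and simulates passing a bit sequence through a BSC.

```python
import math
import random

def binary_entropy(p: float) -> float:
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc(bits, p, rng=random):
    # Binary symmetric channel: each bit is flipped independently with probability p.
    return [b ^ (rng.random() < p) for b in bits]

print(binary_entropy(0.026))          # roughly 0.17 bits per symbol
print(bsc([0, 1, 1, 0, 1], p=0.1))    # a noisy copy of the input bits
```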
11
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
12
Source Coding with Side Information
source X → compress → C(X) → decompress (side information Y available only at the decoder) → X
X, Y: random variables over a finite alphabet with a joint probability distribution P_XY
Goal: losslessly compress X with Y known only to the decoder
13
Source Coding with Side Information
For sufficiently large block length, this can be done at rates arbitrarily close to H(X|Y) [SlepianWolf73].
The theorem is non-constructive.
Practical coding schemes use constructions based on good linear error-correcting codes, e.g., LDPC codes [RichardsonUrbanke08].
14
Linear Error Correcting Codes
Communication is over a noisy channel.
Add redundancy to the source to correct errors.
A linear code of length m and dimension r is an r-dimensional linear subspace of the vector space (F2)^m.
Encoding: using the generator matrix.
Decoding: using the parity check matrix.
15
Linear Error Correcting Codes
Minimum distance: the weight of the lowest-weight nonzero codeword.
In order to correct i errors the minimum distance should be at least 2i+1.
16
Linear Error Correcting Codes
Cosets: Suppose that C is an [m, r] linear code over F2 and that a is any vector in (F2)^m.
Then the set a+C = {a+x | x ∈ C} is called a coset of C.
Every vector of (F2)^m is in some coset of C.
Every coset contains exactly 2^r vectors.
Two cosets are either disjoint or equal.
(A small sketch of these facts on a concrete code follows below.)
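A small Python sketch (illustration only) that enumerates the cosets of the [3,1] repetition code C = {000, 111}, the code used in the worked example on the next slides, and checks the facts listed above.

```python
from itertools import product

# The [3, 1] binary repetition code C = {000, 111}.
C = {(0, 0, 0), (1, 1, 1)}

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

# Partition (F_2)^3 into cosets a + C.
cosets, seen = [], set()
for a in product((0, 1), repeat=3):
    if a in seen:
        continue
    coset = {xor(a, c) for c in C}
    cosets.append(coset)
    seen |= coset

for i, coset in enumerate(cosets, 1):
    print(f"Coset {i}: {sorted(coset)}")
# Four cosets of size |C| = 2 each, covering all 8 vectors of (F_2)^3.
assert len(cosets) == 4 and sum(len(c) for c in cosets) == 8
```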
17
Source Coding with Side Information
Example: Assume Y is known to both the encoder and the decoder, and Ham(X,Y) ≤ 1.
source X → compress → C(X) → decompress (with Y) → X
18
Source Coding with Side Information
Let X = 010; then Y ∈ {010, 011, 000, 110}.
Goal: encode X ⊕ Y using fewer than 3 bits. How?
Let e = X ⊕ Y; then e ∈ {000, 001, 010, 100}.
The encoder sends the index of the coset in which e occurs.
19
Source Coding with Side Information
Let C = {000, 111} be a linear code with minimum distance 3 that can correct one error.
The space is partitioned into 4 cosets:
Coset 1 = {000, 111}
Coset 2 = {001, 110}
Coset 3 = {010, 101}
Coset 4 = {100, 011}
Recall: e ∈ {000, 001, 010, 100}, so e is the unique weight-≤1 leader of its coset.
Each coset index requires 2 bits.
Decoding: output Y ⊕ e', where e' is the leader of the indexed coset.
20
Source Coding with Side Information
Without Y, the encoder cannot compute e = X ⊕ Y!
source X → compress → C(X) → decompress (with Y) → X
21
Source Coding with Side Information
Still possible: encode the coset in which X occurs.
Coset 1 = {000, 111}
Coset 2 = {001, 110}
Coset 3 = {010, 101}
Coset 4 = {100, 011}
Each index requires 2 bits.
Decoding: output the member of the indexed coset whose Hamming distance to Y is smallest (a sketch follows below).
Slepian-Wolf codes over finite block lengths have nonzero error, which implies that the decoder will sometimes fail.
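A minimal Python sketch of this decoder-only side-information scheme on the same toy code, assuming Ham(X, Y) ≤ 1 as above; in a real system the coset would be identified by a 2-bit syndrome rather than by an explicit set.

```python
from itertools import product

C = {(0, 0, 0), (1, 1, 1)}             # [3,1] repetition code, distance 3

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def encode(x):
    # The coset x + C serves as the 2-bit "compressed" description of x.
    return frozenset(xor(x, c) for c in C)

def decode(coset, y):
    # Pick the coset member closest to the side information Y in Hamming distance.
    return min(coset, key=lambda v: sum(a != b for a, b in zip(v, y)))

X = (0, 1, 0)
for Y in [(0, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0)]:   # Ham(X, Y) <= 1
    assert decode(encode(X), Y) == X
print("X recovered from its coset index and any Y within Hamming distance 1")
```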
22
Source Coding with Side Information
In practice, either:
Fix p and determine the compression rate of a Slepian-Wolf code that satisfies the target error, or
Pick a Slepian-Wolf code and determine the maximum p for which the target error is satisfied.
Either way, we need to know the source statistics!
23
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
24
Compressing Stream Ciphers
This problem can be formulated as a Slepian-Wolf coding problem [JohnsonWagnerRamchandran04].
The shared key k is cast as the decoder-only side information.
The ciphertext is cast as the source.
source X → encrypt (key k) → E_k(X) → compress → C(E_k(X))
25
Compressing Stream Ciphers
Compression is achievable due to the correlation between the key K and the ciphertext C = X ⊕ K.
The joint distribution of the source and side information can be determined from the statistics of the source.
source X → encrypt (key k) → E_k(X) → compress → C(E_k(X))
26
Compressing Stream Ciphers
C(E_k(X)) → joint decryption and decompression (key k) → X
The decoder knows k and the source statistics.
The compression rate H(E_k(X)|K) = H(X ⊕ K|K) = H(X) is asymptotically achievable.
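The identity H(E_k(X)|K) = H(X ⊕ K|K) = H(X) can be checked numerically on a one-bit one-time pad; the source distribution below is a made-up toy example, not from the presentation.

```python
import math
from itertools import product

# Toy check of H(E_k(X) | K) = H(X) for a one-bit one-time pad:
# X ~ Bernoulli(p), K ~ Bernoulli(0.5) independent, ciphertext C = X xor K.
p = 0.1
P_X = {0: 1 - p, 1: p}
P_K = {0: 0.5, 1: 0.5}

def H(dist):
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

# Joint distribution of (C, K).
P_CK = {}
for x, k in product((0, 1), repeat=2):
    c = x ^ k
    P_CK[(c, k)] = P_CK.get((c, k), 0.0) + P_X[x] * P_K[k]

# Chain rule: H(C | K) = H(C, K) - H(K).
H_C_given_K = H(P_CK) - H(P_K)
assert abs(H_C_given_K - H(P_X)) < 1e-12
print(f"H(C | K) = {H_C_given_K:.4f} = H(X)")
```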
27
Efficiency
Encoding: finding the coset of E_k(X) can be done by multiplying E_k(X) with the parity check matrix, i.e., E_k(X)∙H^T is the syndrome of E_k(X).
Decoding: exhaustive search through the coset of E_k(X).
This is improved using LDPC codes, where decoding is polynomial in the block length.
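A sketch of the syndrome computation E_k(X)∙H^T described above, using the parity-check matrix of the [7,4] Hamming code as a small stand-in for the long LDPC codes used in the actual evaluation.

```python
import numpy as np

# Parity-check matrix of the [7,4] Hamming code (a standard small example,
# not the LDPC codes used in the paper's experiments).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def syndrome(word: np.ndarray) -> np.ndarray:
    # "Compression" step: map a 7-bit word to its 3-bit syndrome word @ H^T (mod 2),
    # i.e. the index of the coset in which the word lies.
    return (word @ H.T) % 2

ciphertext_block = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
print(syndrome(ciphertext_block))   # the 3-bit coset index sent to the decoder
```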
28
Security
Compression that operates on top of a one-time pad does not compromise the security of the encryption scheme.
The compressor does not know K.
29
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
30
Compressing Block Ciphers
Widely used in practice.
The correlation between the key and the ciphertext is more complex.
The previous approach is not directly applicable.
Can data encrypted with block ciphers be compressed without access to the key?
31
Electronic Code Book (ECB) Mode
The simplest mode of operation: each block is encrypted separately with the same key, X_i → block cipher (key k) → E_k(X_i).
The compression schemes that we present rely on the specifics of chaining operations.
Compression in ECB mode is theoretically possible; is it also practical?
32
Cipher Block Chaining (CBC) Mode
In CBC mode, each plaintext block is XORed with the previous ciphertext block before encryption (the first block is XORed with the IV).
The correlation between E_k(X_i) and X_{i+1} is easier to characterize and can be exploited for compression.
33
Compressing Block Ciphers
Recall that in CBC mode the input to the (i+1)-th block-cipher call is the XOR of the i-th ciphertext block E_k(X_i) with the plaintext block X_{i+1}.
E_k(X_i) is cast as the source; the (i+1)-th block-cipher input (which the decoder obtains by decrypting the recovered (i+1)-th ciphertext block) is cast as the side information, and the two differ exactly by X_{i+1}, whose statistics are known.
IV, E_k(X_1), …, E_k(X_n) → compressor → C(IV), C(E_k(X_1)), …, C(E_k(X_{n−1})), E_k(X_n)
The last block is left uncompressed, while the IV is compressed.
34
Decoding
The last ciphertext block E_k(X_n) arrives uncompressed. Decrypting it yields the n-th block-cipher input, which serves as side information for the Slepian-Wolf decoder to recover E_k(X_{n−1}) from C(E_k(X_{n−1})). The recovered block is decrypted in turn, and the process continues backwards, block by block, down to the IV, yielding the plaintext blocks X_1, …, X_n.
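To make the chain of relations behind this decoder explicit, here is a toy Python sketch in which an 8-bit keyed random permutation stands in for the block cipher (block size, key, and plaintext are hypothetical). It verifies the CBC fact the scheme relies on: decrypting ciphertext block i+1 yields a value that differs from ciphertext block i exactly by the plaintext block X_{i+1}.

```python
import random

BLOCK_BITS = 8          # toy block size; real block ciphers use 128 bits
N = 1 << BLOCK_BITS

def make_cipher(key: int):
    # Toy "block cipher": a key-seeded random permutation of {0, ..., 255}.
    # Stand-in for AES; for illustration only.
    perm = list(range(N))
    random.Random(key).shuffle(perm)
    inv = [0] * N
    for i, p in enumerate(perm):
        inv[p] = i
    return (lambda x: perm[x]), (lambda y: inv[y])

enc, dec = make_cipher(key=1234)

def cbc_encrypt(blocks, iv):
    out, prev = [], iv
    for x in blocks:
        prev = enc(x ^ prev)     # C_i = E_k(X_i xor C_{i-1}),  C_0 = IV
        out.append(prev)
    return out

plaintext = [3, 7, 1, 0, 2]      # low-entropy toy source
iv = 99
C = cbc_encrypt(plaintext, iv)

# Key observation used for compression: decrypting C_{i+1} gives
# X_{i+1} xor C_i, so it differs from C_i exactly by the plaintext block
# X_{i+1}, whose statistics are known to the Slepian-Wolf decoder.
for i in range(len(C) - 1):
    assert dec(C[i + 1]) ^ C[i] == plaintext[i + 1]
print("dec(C_{i+1}) xor C_i == X_{i+1} for all i")
```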
35
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
36
Compression Factor
Let {C_{m,R}, D_{m,R}} denote an order-m Slepian-Wolf code with compression rate R.
Compressor C_{m,R}: {0,1}^m → {0,1}^{mR}
Decompressor D_{m,R}: {0,1}^{mR} × {0,1}^m → {0,1}^m
Compression factor:
37
Compression Results
Irregular LDPC codes were used in our performance evaluation.
Table: Attainable compression rates for m = 128 bits
Source Entropy | Compression Rate | Target Error | p
0.1739 | 0.50 | 10^-3 | 0.026
0.1301 | 0.50 | 10^-4 | 0.018
0.3584 | 0.75 | 10^-3 | 0.068
0.3032 | 0.75 | 10^-4 | 0.054
38
Compression Results
Irregular LDPC codes were used in our performance evaluation.
Table: Attainable compression rates for m = 1024 bits
Source Entropy | Compression Rate | Target Error | p
0.3195 | 0.50 | 10^-3 | 0.058
0.2778 | 0.50 | 10^-4 | 0.048
0.5710 | 0.75 | 10^-3 | 0.134
0.5464 | 0.75 | 10^-4 | 0.126
39
Outline
Preliminaries
Source Coding with Side Information
Compressing Stream Ciphers
Compressing Block Ciphers
Simulation Results
Impossibility Result
40
Recall -- ECB Mode
Each plaintext block m_i is encrypted independently with the same key K:
m_1, m_2, …, m_n → block cipher (key K) → E_K(m_1), E_K(m_2), …, E_K(m_n)
41
Notable Observations
Exhaustive strategies are infeasible in most cases, except for very low-entropy plaintext distributions or compression ratios.
Such a strategy works by truncating the ciphertext: for example, for a plaintext distribution consisting of uniformly distributed values over a small support, one can compress the output of a 128-bit block cipher by truncating the 128-bit ciphertext to 40 bits (a sketch follows below).
Can we construct a better strategy?
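A sketch of the truncation-based exhaustive strategy described above, under assumed toy parameters (a hash-based stand-in for the block cipher, a 1000-element plaintext support, a 40-bit truncation); it illustrates why the decompressor must hold the key and enumerate the entire plaintext support, which is infeasible unless the plaintext entropy is very low.

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for a 128-bit block cipher (not a real cipher; illustration only).
    return hashlib.sha256(key + block).digest()[:16]

def compress(ciphertext: bytes, t_bytes: int = 5) -> bytes:
    # Truncate the 128-bit ciphertext to t_bytes (here 40 bits).
    return ciphertext[:t_bytes]

def decompress(truncated: bytes, key: bytes, candidate_plaintexts) -> bytes:
    # Exhaustive strategy: the decompressor knows the key and the (small)
    # support of the plaintext distribution; it re-encrypts every candidate
    # and returns the one whose ciphertext matches the truncation.
    for m in candidate_plaintexts:
        if toy_block_cipher(key, m).startswith(truncated):
            return m
    raise ValueError("no candidate matches")

key = b"0123456789abcdef"
candidates = [i.to_bytes(16, "big") for i in range(1000)]   # low-entropy source
message = candidates[417]
short = compress(toy_block_cipher(key, message))
assert decompress(short, key, candidates) == message
```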
42
Impossibility Result
There does not exist a generic compression scheme (C, D) for block ciphers unless (C, D) is either exhaustive or computationally infeasible.
In particular, there does not exist an efficient (C, D) for ECB mode!
43
The Public-Key Setting
Hybrid encryption: a public-key scheme is used to encrypt a symmetric key, and the data is then encrypted with this key.
El Gamal encryption: a similar technique applies when XOR is used.
44
Concluding Remarks
Data encrypted with block ciphers is practically compressible when chaining modes are employed.
Notable compression factors were demonstrated with binary memoryless sources.
Short block sizes limit the performance, but that could change in the future.
Generic compression is impossible.
45
Future Work
An interesting question is whether compression is possible without any preliminary knowledge of the data: can compression be achieved using algorithms that do not rely on the source statistics, i.e., universal algorithms?
The error model: can we consider a less limited setting where the errors are not independent?
46
Thank You!