3: Stream Ciphers and Probability Professor Richard A. Stanley, P.E.


1 3: Stream Ciphers and Probability Professor Richard A. Stanley, P.E.
ECE578 Cryptography 3: Stream Ciphers and Probability Professor Richard A. Stanley, P.E. Spring 2010 © , Richard A. Stanley

2 Summary of last class… Both symmetric and asymmetric crypto have their uses in communications Symmetric keys can be purely random, but asymmetric keys are mathematically related Symmetric crypto is much faster than asymmetric, which leads to combining the types in practical applications Spring 2010 © , Richard A. Stanley

3 Kerckhoffs’ Principle
Secrecy must reside solely in the key It is assumed that the attacker knows the complete details of the cryptographic algorithm and implementation If this be true, then the demands on the key are increased dramatically, as it is carrying the bulk of the security burden Spring 2010 © , Richard A. Stanley

4 Stream Encryption Spring 2010 © , Richard A. Stanley

5 One-Time Pad or Vernam Cipher
The one-time pad, which is a provably secure cryptosystem, was developed by Gilbert Vernam in 1918. The message is represented as a binary string (a sequence of 0's and 1's, using a coding mechanism such as ASCII). The key is a truly random sequence of 0's and 1's of the same length as the message. The encryption is done by adding the key to the message modulo 2, bit by bit. This process is often called exclusive or, denoted XOR and written with the symbol ⊕:
a  b  c = a ⊕ b
0  0  0
0  1  1
1  0  1
1  1  0
Spring 2010 © , Richard A. Stanley

6 One-Time Pad or Vernam Cipher
Example: Let the message be IF; then its 7-bit ASCII code is ( ). Let the key be a truly random bit string of the same length. The ciphertext is found by XORing the message and key bits. Encryption: ciphertext = plaintext ⊕ key. Decryption: plaintext = ciphertext ⊕ key. Spring 2010 © , Richard A. Stanley
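A minimal Python sketch of the XOR encryption and decryption just described (illustrative only; the key here is simply generated at random):

```python
import secrets

def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# Plaintext "IF" as 7-bit ASCII
plaintext = "".join(format(ord(ch), "07b") for ch in "IF")   # '1001001' + '1000110'

# A random key of the same length (the secrets module gives cryptographic-quality randomness)
key = "".join(str(secrets.randbelow(2)) for _ in plaintext)

ciphertext = xor_bits(plaintext, key)    # encryption: c = m XOR k
recovered  = xor_bits(ciphertext, key)   # decryption: m = c XOR k
assert recovered == plaintext
```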

7 Why Is A One-Time Pad Provably Secure?
Or, how can we prove it is unbreakable? The security depends on the randomness of the key. It is hard to define randomness. In a cryptographic context, we seek two fundamental properties in a binary random key sequence: Unpredictability: regardless of the number of bits of the sequence already observed, the probability of guessing the next bit is no better than ½; therefore, the probability of a given bit being 1 or 0 is exactly equal to ½. Balanced (equal distribution): the number of 1's and 0's should be equal. Spring 2010 © , Richard A. Stanley

8 Mathematical Proof The probability of a key bit being 1 or 0 is exactly equal to ½. The plaintext bits are not balanced: let the probability of 0 be x; then the probability of 1 is 1-x. Let us calculate the probability of the ciphertext bits (ci = mi ⊕ ki):
mi   prob.   ki   prob.   ci   prob.
0    x       0    ½       0    ½ x
0    x       1    ½       1    ½ x
1    1-x     0    ½       1    ½ (1-x)
1    1-x     1    ½       0    ½ (1-x)
We find the probability of a ciphertext bit being 1 or 0 is equal to (½)x + (½)(1-x) = ½. Therefore, the ciphertext looks like a random sequence. Spring 2010 © , Richard A. Stanley
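A quick Monte Carlo check of this result (an illustrative sketch, not part of the original slides): however biased the plaintext bits are, the ciphertext bits come out balanced when the key bits are uniform.

```python
import random

def ciphertext_zero_fraction(p_zero: float, n: int = 100_000) -> float:
    """Fraction of ciphertext bits equal to 0 when P[plaintext bit = 0] = p_zero."""
    zeros = 0
    for _ in range(n):
        m = 0 if random.random() < p_zero else 1   # biased plaintext bit
        k = random.getrandbits(1)                  # uniform key bit
        zeros += ((m ^ k) == 0)                    # ciphertext bit c = m XOR k
    return zeros / n

for bias in (0.1, 0.5, 0.9):
    print(bias, ciphertext_zero_fraction(bias))    # each fraction is close to 0.5
```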

9 A Practical One-Time Pad
A satellite produces and broadcasts several random sequences of bits at a rate fast enough that no computer can store more than a very small fraction of them. Alice and Bob use asymmetric crypto to agree on a method of sampling bits from these random sequences, and use these bits to form a key for a one-time pad. Eve can, in theory, break the asymmetric crypto they used, even though doing so is difficult. But by the time she breaks it, the random bits Alice and Bob collected have disappeared, and Eve cannot decrypt the message since she hasn't the resources to store all the random bits that have been broadcast. Spring 2010 © , Richard A. Stanley

10 Stream Ciphers Symmetric-key ciphers
Stream ciphers encrypt individual characters (usually bits) one at a time. They are faster and less complex in hardware. They are desirable in applications in which buffering is limited, bits must be processed individually as they are received, or transmission errors are highly probable. There is a vast amount of theoretical knowledge and there are various design principles. They are widely used at present and will probably continue to be used in the future. Spring 2010 © , Richard A. Stanley

11 Stream Ciphers (con’t.)
The basic idea comes from the one-time-pad cipher. Encryption: ci = mi ⊕ ki, where mi are the plaintext bits, ki the key (key-stream) bits, and ci the ciphertext bits. Decryption: mi = ci ⊕ ki. The one-time pad is provably secure, but its drawbacks are that the key-stream must be as long as the plaintext, and key distribution and management are difficult. Solution: stream ciphers, in which the key-stream is generated in a pseudo-random fashion from a relatively short secret key. Spring 2010 © , Richard A. Stanley

12 Stream Cipher Model Spring 2010 © , Richard A. Stanley

13 Stream Ciphers Spring 2010 © , Richard A. Stanley

14 Cryptographically Secure PRNGs
Every CSPRNG should satisfy the "next-bit test": given the first k bits of a random sequence, there is no polynomial-time algorithm that can predict the (k+1)th bit with probability of success higher than 50%. Every CSPRNG should withstand "state compromise extensions": if part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation; and if there is an entropy input while running, it should be infeasible to use knowledge of the input's state to predict future conditions of the CSPRNG state. Spring 2010 © , Richard A. Stanley

15 Polynomial Time Definition: when the execution time of a computation, m(n), is no more than a polynomial function of the problem size n; more formally, m(n) = O(n^k) where k is a constant. [NIST] An algorithm is said to be solvable in polynomial time if the number of steps required to complete the algorithm for a given input is O(n^k) for some nonnegative integer k, where n is the complexity of the input. Polynomial-time algorithms are said to be "fast." Most familiar mathematical operations such as addition, subtraction, multiplication, and division, as well as computing square roots, powers, and logarithms, can be performed in polynomial time. Computing the digits of most interesting mathematical constants, including π and e, can also be done in polynomial time. [Wolfram MathWorld] Spring 2010 © , Richard A. Stanley

16 CSPRNGs and PRNGs Any CSPRNG will satisfy the requirements of a PRNG
The converse is not true. Why not? Spring 2010 © , Richard A. Stanley

17 Random Keys It is mathematically impossible to create purely random outputs from deterministic processes, e.g., computer programs. Obtaining random numbers is not easy; physical sources include radioactive decay, shot noise, etc. But the key is critical to security (Kerckhoffs). So…how do we proceed? Spring 2010 © , Richard A. Stanley

18 Practical Security Issues
Brute force keyspace search is usually hard, owing to large keyspace Known ciphertext attacks are hard Why? It gets easier if you can get known plaintext to be encrypted and then cryptanalyze both streams Spring 2010 © , Richard A. Stanley

19 Secrecy What we want is that our encrypted information be invulnerable to cryptanalysis This is rarely possible (after all, invulnerable is a pretty tough test) Need to assess how closely the cryptosystem can approach this goal There are different levels of secrecy Spring 2010 © , Richard A. Stanley

20 Security Levels Computationally secure Provably secure
Computationally secure: breakable only by N operations, where N is so large as to be impractical with the existing computing power available to Oscar. Provably secure: a proof of security is provided relative to some other problem. Unconditionally secure: cannot be broken, even with unlimited computing resources. Spring 2010 © , Richard A. Stanley

21 Claude Shannon Bell Labs researcher 1941-1972
Best known for his seminal 1948 paper on information theory Also did considerable cryptography research, including the 1949 paper which introduced these concepts Died 2001 at age 84 Spring 2010 © , Richard A. Stanley

22 Security Realities Cannot investigate unconditional security in relation to computational complexity Unlimited resources allowed Must study security on a stochastic (probabilistic) basis instead This means we need to revisit statistics Spring 2010 © , Richard A. Stanley

23 Probability Distributions
The mathematical definition of a discrete probability function, P[x], is a function that satisfies the following properties: the probability that x takes a specific value is P[x]; P[x] is non-negative for all real x; and the sum of P[x] over all possible values of x is 1, so that 0 ≤ P[x] ≤ 1. Spring 2010 © , Richard A. Stanley

24 Normal Distribution Spring 2010 © , Richard A. Stanley

25 Cumulative Normal Distribution Function
Spring 2010 © , Richard A. Stanley

26 Not All PDFs are Normal Spring 2010 © , Richard A. Stanley

27 What does this actually mean?
A discrete probability function can take on a discrete number of values (not necessarily finite; most often the non-negative integers or some subset of them). There is no mathematical restriction that discrete probability functions be defined only at integers, but in practice this is usually what makes sense. For example, if you toss a coin 6 times, you can get 2 heads or 3 heads but not 2½ heads. Each discrete value has a certain probability of occurrence that is between zero and one. The condition that the probabilities sum to one means that at least one of the values has to occur. Spring 2010 © , Richard A. Stanley

28 Events
Let X be a random variable defined on the set X. Let E be a subset of X. Then
P[x ∈ E] = Σ_{x ∈ E} P[x]
E is usually called an event. Spring 2010 © , Richard A. Stanley

29 Independent Variables
Two variables, X and Y, are said to be independent if the value of one does not depend in any way on the value of the other This is a critical probability concept Many probabilistic equations are predicated on the independence of the variables If the variables are, in fact, not independent, the results of assuming they are will be wrong and misleading Spring 2010 © , Richard A. Stanley

30 For Independent Events
P(A ∩ B) = P(A) · P(B), P(A|B) = P(A), and P(B|A) = P(B). Likewise for the complements. Spring 2010 © , Richard A. Stanley

31 Rolling Dice Each die has six faces and is honest
The number showing on one die is independent of the number showing on the other. Then let Z be the random variable denoting the values on the two dice: Z = {1,2,3,4,5,6} × {1,2,3,4,5,6}. Thus, P[i,j] = 1/36 for all (i,j) ∈ Z. Spring 2010 © , Richard A. Stanley

32 Making a Number Probability of rolling a 5? Snakeyes = Boxcars
S5 = the event of rolling a five: S5 = {(1,4), (2,3), (3,2), (4,1)}, so P[S5] = 4(1/36) = 1/9. Snake eyes and boxcars are equally likely: S2 = the event of rolling a two = {(1,1)}, so P[S2] = 1/36; S12 = the event of rolling a twelve = {(6,6)}, so P[S12] = 1/36. Spring 2010 © , Richard A. Stanley
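These dice probabilities can be checked by brute-force enumeration of the 36-point sample space; a small sketch (not from the original slides):

```python
from itertools import product
from fractions import Fraction

# Sample space for two honest dice: 36 equally likely ordered pairs
space = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(space))

def prob_of_sum(total: int) -> Fraction:
    """Probability that the two dice sum to `total`."""
    return sum(p for (i, j) in space if i + j == total)

print(prob_of_sum(5))    # 1/9  -> {(1,4), (2,3), (3,2), (4,1)}
print(prob_of_sum(2))    # 1/36 -> snake eyes
print(prob_of_sum(12))   # 1/36 -> boxcars
```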

33 Joint Probability ...is what we have just studied
The joint probability P[x,y] is the probability that X takes the value x and Y takes the value y This is precisely what we calculated for the rolling of dice Spring 2010 © , Richard A. Stanley

34 Conditional Probability
The conditional probability P[x|y] is the probability that X takes the value x given that Y takes the value y. X and Y are independent variables iff. P[x,y] = P[x]P[y] for all x ∈ X and y ∈ Y. This can lead to some nonintuitive solutions to common problems. Spring 2010 © , Richard A. Stanley

35 Bayes' Theorem
If P(B) > 0, then
P(A|B) = P(A) P(B|A) / P(B)
Many things can be effectively modeled by Bayesian statistics. Bayes' Theorem dates from 1763 and deals with outcomes from conditional probabilities. Spring 2010 © , Richard A. Stanley

36 Monty Hall Problem Three doors Monty knows what is where
The Mercedes is behind one door; goats are behind the other two. Monty knows what is where. You choose Door 1, hoping for the Mercedes. Monty opens Door 2 to reveal a goat. You now have another chance to choose where the Mercedes is. What are the odds it is behind Door 1? Spring 2010 © , Richard A. Stanley

37 Bayesian Analysis Let An be the event that the Mercedes is behind Door n, so P(A1) = P(A2) = P(A3) = 1/3. If the Mercedes is behind Door 1, Monty is equally likely to open Door 2 or Door 3. Let B be the event that Monty opened Door 2. Then P(B|A1) = 1/2. If the Mercedes is behind Door 2, Monty won't open Door 2, so P(B|A2) = 0. Spring 2010 © , Richard A. Stanley

38 Bayesian Analysis (con’t.)
If the Mercedes is behind Door 3, Monty must open Door 2, so P(B|A3) = 1. Bayes' Theorem then gives
P(A1|B) = P(B|A1) P(A1) / [ P(B|A1) P(A1) + P(B|A2) P(A2) + P(B|A3) P(A3) ]
Spring 2010 © , Richard A. Stanley

39 Bayesian Outcomes Substituting the values already calculated,
P(A1|B) = 1/3 and P(A3|B) = 2/3. Remember, P(A1|B) is the probability that the Mercedes is behind Door 1, given that Monty opened Door 2 to reveal a goat. You chose Door 1. Do you want to change your choice? Spring 2010 © , Richard A. Stanley
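A short simulation (an illustrative sketch, not from the original slides) agrees with the Bayesian analysis: sticking with Door 1 wins about one time in three, while switching wins about two times in three.

```python
import random

def monty_hall_trial(switch: bool) -> bool:
    """Play one game; return True if the player ends up with the Mercedes."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    choice = random.choice(doors)
    # Monty opens a door that is neither the player's choice nor the car
    opened = random.choice([d for d in doors if d != choice and d != car])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == car

n = 100_000
stay_rate   = sum(monty_hall_trial(switch=False) for _ in range(n)) / n
switch_rate = sum(monty_hall_trial(switch=True)  for _ in range(n)) / n
print(stay_rate, switch_rate)   # roughly 1/3 and 2/3
```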

40 Cryptosystems Revisited
Cryptosystem = (P,C,K,E,D), where P is the finite set of possible plaintexts, C is the finite set of possible ciphertexts, K is the finite set of possible keys, E is the encryption rule, and D is the decryption rule. So what? Spring 2010 © , Richard A. Stanley

41 Probability and Cryptosystems
There is a probability distribution on P; we know this because letters are not all equiprobable in any written language. There is a probability distribution on K; ideally, all keys are equiprobable, and unless the PDF on K is uniform this is not so. Keys are chosen before the plaintext is known, so it can be asserted that P and K are independent random variables. Spring 2010 © , Richard A. Stanley

42 What Now? If P and K have probability distributions, then C must have a probability distribution. Since much effort will be expended to make the key as random as possible, we can assert that C is a random variable. We can determine the probability of any c ∈ C if we know the probabilities of the events in the plaintext and key spaces. Spring 2010 © , Richard A. Stanley

43 How?
Conditional probabilities! C(K) is the set of ciphertexts for key K, and y is an event in C (an instance of ciphertext). Then we can calculate
P[y = y] = Σ_{K : y ∈ C(K)} P[K = K] P[x = dK(y)]
Spring 2010 © , Richard A. Stanley

44 Continuing... We can also calculate the conditional probability that y is the ciphertext given that x is the plaintext, which is
P[y = y | x = x] = Σ_{K : x = dK(y)} P[K = K]
Now we can apply Bayes' Theorem to find the probability that x is the plaintext given that y is the ciphertext. Spring 2010 © , Richard A. Stanley

45 Bayes' Theorem
Anyone, including Oscar, who knows the probability distributions can perform the following calculation:
P[x = x | y = y] = ( P[x = x] · Σ_{K : x = dK(y)} P[K = K] ) / ( Σ_{K : y ∈ C(K)} P[K = K] P[x = dK(y)] )
Spring 2010 © , Richard A. Stanley

46 Perfect Secrecy A cryptosystem has perfect secrecy if P[x|y] = P[x] for all x ∈ P, y ∈ C. What does this mean? It means that if the a posteriori probability that the plaintext is x, given that the ciphertext is y, is identical to the a priori probability that the plaintext is x, the system has perfect secrecy. In other words, the statistics of P cannot be discerned from observing C. Spring 2010 © , Richard A. Stanley

47 How to Achieve Perfect Security?
Assume a cryptosystem (P,C,K,E,D) where |K| = |C| = |P|. This system provides perfect security iff. every key is used with equal probability |K|^(-1), and for every x ∈ P and every y ∈ C there is a unique key K such that eK(x) = y. Spring 2010 © , Richard A. Stanley
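These two conditions can be checked numerically for a toy system. The sketch below (an illustration under stated assumptions, not from the original slides) uses a one-letter shift cipher over Z_26 with a uniformly random key, which satisfies |K| = |C| = |P| and both conditions, and verifies that P[x|y] = P[x] for an arbitrary plaintext distribution.

```python
from fractions import Fraction
from collections import defaultdict

P = list(range(26))                                   # plaintexts: letters as 0..25
K = list(range(26))                                   # keys: shifts 0..25
p_x = {x: Fraction(x + 1, 351) for x in P}            # an arbitrary, non-uniform plaintext PDF
p_k = {k: Fraction(1, 26) for k in K}                 # every key equally probable

enc = lambda k, x: (x + k) % 26                       # eK(x); a unique key maps each x to each y

p_y  = defaultdict(Fraction)                          # P[y]
p_xy = defaultdict(Fraction)                          # P[x, y]
for x in P:
    for k in K:
        y = enc(k, x)
        p_y[y]       += p_x[x] * p_k[k]
        p_xy[(x, y)] += p_x[x] * p_k[k]

# Perfect secrecy: the a posteriori probability equals the a priori probability
assert all(p_xy[(x, y)] / p_y[y] == p_x[x] for x in P for y in range(26))
print("P[x|y] = P[x] for all x, y: perfect secrecy holds for this toy system")
```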

48 One-Time Pads Invented in 1917 by Gilbert Vernam (WPI alum)
Thought to be secure, but a proof of security was not provided until Shannon's 1949 paper. Provides perfect security. Requires |K| ≥ |P|, which is a key-management problem of the first rank. Vulnerable to known-plaintext attack: given any two of plaintext, key, and ciphertext, XOR yields the third. Spring 2010 © , Richard A. Stanley
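The known-plaintext weakness is just the algebra of XOR; a tiny check (hypothetical bit patterns, for illustration only):

```python
m = 0b1001001          # plaintext bits (arbitrary example value)
k = 0b1010110          # key bits (arbitrary example value)
c = m ^ k              # ciphertext
assert c ^ m == k      # plaintext and ciphertext together reveal the key
assert c ^ k == m      # key and ciphertext recover the plaintext
```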

49 One-Time Pads Using each key only once is a special case, made hard to put into practice by the demands of key management and the size of the key (at least as large as the plaintext) What happens if more than a single plaintext is encrypted with the same key? How likely is a ciphertext-only attack? Spring 2010 © , Richard A. Stanley

50 Measuring Information
How much information is received when we observe a specific value for a discrete random variable x? The amount of information is the degree of surprise: a certain outcome conveys no information, and more information is conveyed when the event is unlikely. We therefore want a quantity h(x) that depends on the probability distribution p(x). If there are two unrelated (statistically independent) events x and y, we want h(x,y) = h(x) + h(y). Thus we choose h(x) = -log2 p(x); the negative sign ensures that the information measure is positive. The average amount of information transmitted is the expectation with respect to p(x), referred to as the entropy H(X): H(X) = -Σx p(x) log2 p(x). Spring 2010 © , Richard A. Stanley

51 From Thermodynamics to Information Theory
Boltzmann created statistical mechanics and explained why not all energy is available to do useful work. Shannon developed a mathematical measure of information and the information-theoretic concept of entropy. Spring 2010 © , Richard A. Stanley

52 Entropy Introduced by Shannon in 1948 paper on information theory
Entropy is a mathematical measure of information or uncertainty, and is a function of a probability distribution. Entropy is a concept not unique to information theory; it was first used to describe thermodynamic concepts. Spring 2010 © , Richard A. Stanley

53 Thermodynamic Entropy
Physical, chemical, and electrical energy can be completely changed into heat The reverse (e.g., changing heat into physical energy) cannot be fully accomplished without outside help or inevitable loss of energy in the form of irretrievable heat This energy is not destroyed, but it becomes unavailable for producing work. The irreversible increase of this nondisposable energy in the universe is measured by the abstract dimension that Clausius in 1865 called entropy Thus, “Entropy is always increasing” Spring 2010 © , Richard A. Stanley

54 Information Entropy What is the information gained by the outcome of an experiment in which a random variable X takes values from the set X according to some probability distribution? Or, if the experiment has yet to be conducted, what is the uncertainty about the outcome? This quantity is called entropy, denoted by H(X). Spring 2010 © , Richard A. Stanley

55 Coins Again P[heads] = P[tails] = 0.5
There are only two outcomes, heads or tails. We can encode these outcomes using one bit: heads = 1, tails = 0. Entropy = 1 bit. The entropy of n tosses is n bits. Spring 2010 © , Richard A. Stanley

56 Another Example X has three events, x1, x2, x3 P[x1]=1/2
P[x2] = P[x3] = 1/4. Efficient coding: x1 => 0, x2 => 10, x3 => 11. Average number of bits to represent the events: 0.5·1 + 0.25·2 + 0.25·2 = 1.5. We could thus say that an outcome occurring with probability p can be encoded as a bit string of length -log2 p = log2(1/p). Spring 2010 © , Richard A. Stanley

57 Formal Definition
Let X be a discrete random variable which takes on values from the finite set X. Then the entropy of the random variable X is defined as the quantity
H(X) = – Σ_{x ∈ X} P[x] log2 P[x]
Another logarithm base could be used; the adjustment is merely a constant factor. Spring 2010 © , Richard A. Stanley
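A direct Python transcription of this definition (a sketch for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum(p * log2 p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))         # 1.0 bit  (fair coin)
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits (the three-event example above)
print(entropy([1.0]))              # 0.0 bits (a certain outcome carries no information)
```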

58 Huffman Coding An entropy-encoding algorithm for lossless data compression; a minimum-redundancy code based on the probability of symbol occurrence. Spring 2010 © , Richard A. Stanley

59 Entropy & Huffman Coding
The entropy value gives a good estimate of the average code length of a Huffman encoding, and vice versa (see the sketch below). Thus, the concept of entropy is not limited to cryptography. Spring 2010 © , Richard A. Stanley
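A compact sketch of Huffman's algorithm (illustrative only; it tracks code lengths rather than the actual codewords) shows the average code length matching the entropy for the three-event example used earlier:

```python
import heapq, math

def huffman_code_lengths(probs):
    """Return {symbol: code length} produced by Huffman's algorithm.

    probs: dict mapping symbol -> probability.
    """
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)                       # tie-breaker so tuples never compare dicts
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}   # one more bit below the new node
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"x1": 0.5, "x2": 0.25, "x3": 0.25}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)            # 1.5 bits
H = -sum(p * math.log2(p) for p in probs.values())             # 1.5 bits
print(lengths, avg_len, H)
```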

60 Concave Functions A real-valued function f is a concave function on an interval I if f((x+y)/2) ≥ (f(x) + f(y))/2 for all x, y ∈ I. f is strictly concave on the interval I if f((x+y)/2) > (f(x) + f(y))/2 for all x, y ∈ I, x ≠ y. What does this mean? f has a single local maximum in I. Some authors work with convex rather than concave functions; the meaning is essentially the same, with the inequalities reversed. Spring 2010 © , Richard A. Stanley

61 Jensen's Inequality
If f is a continuous strictly concave function on the interval I, and
Σ_{i=1}^{n} ai = 1 where ai > 0, 1 ≤ i ≤ n
then
Σ_{i=1}^{n} ai f(xi) ≤ f( Σ_{i=1}^{n} ai xi ), where xi ∈ I
The equality holds iff. x1 = ... = xn. Spring 2010 © , Richard A. Stanley

62 So What? The function log2 x is strictly concave on the interval (0, ∞)
We can apply Jensen's inequality to learn more about entropy:
H(X) = – Σ_{i=1}^{n} pi log2 pi = Σ_{i=1}^{n} pi log2(1/pi) ≤ log2 Σ_{i=1}^{n} pi (1/pi) = log2 n
with equality iff. pi = 1/n. Spring 2010 © , Richard A. Stanley

63 Furthermore... H(X,Y) ≤ H(X) + H(Y)
The equality obtains iff. X and Y are independent random variables Spring 2010 © , Richard A. Stanley

64 Conditional Entropy - 1
A measure of the average amount of uncertainty about X that remains once Y is known. If X and Y are two random variables, then for any fixed value y of Y there is a conditional probability distribution on X, with entropy
H(X|y) = – Σx P[x|y] log2 P[x|y]
Spring 2010 © , Richard A. Stanley

65 Conditional Entropy - 2
Conditional entropy is now defined as the weighted average of the entropies H(X|y) over all possible values of y:
H(X|Y) = – Σy Σx P[y] P[x|y] log2 P[x|y]
Once again, this measures the average amount of uncertainty about X that remains after Y is observed, i.e., the information about X that is not revealed by Y. Spring 2010 © , Richard A. Stanley
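A small numerical version of this definition (illustrative sketch):

```python
import math

def conditional_entropy(p_xy):
    """H(X|Y) from a joint distribution given as {(x, y): probability}."""
    p_y = {}
    for (x, y), p in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

# X and Y independent fair bits: observing Y reveals nothing, so H(X|Y) = H(X) = 1 bit
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(joint))   # 1.0
```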

66 Entropy and Cryptosystems
Consider the conditional entropy H(K|C), known as the key equivocation: it measures how much uncertainty about the key remains once the ciphertext is known. Clearly, this could be a useful measure for evaluating cryptosystems. Let (P,C,K,E,D) be a cryptosystem. Then H(K|C) = H(K) + H(P) – H(C). Spring 2010 © , Richard A. Stanley
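The identity H(K|C) = H(K) + H(P) - H(C) can be verified numerically for a toy cryptosystem; the sketch below (illustrative assumptions: 2-bit plaintexts and keys, encryption by XOR) computes both sides and shows that they agree.

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_p = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}               # plaintext distribution (arbitrary)
p_k = {k: 0.25 for k in range(4)}                     # uniform key distribution

p_c  = defaultdict(float)                             # P[c]
p_kc = defaultdict(float)                             # P[k, c]
for x, px in p_p.items():
    for k, pk in p_k.items():
        c = x ^ k                                     # eK(x) = x XOR k
        p_c[c]       += px * pk
        p_kc[(k, c)] += px * pk

H_K_given_C = -sum(p * math.log2(p / p_c[c]) for (k, c), p in p_kc.items() if p > 0)
print(H_K_given_C, H(p_k) + H(p_p) - H(p_c))          # both sides agree
```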

67 Summary Probability plays an important role in cryptography, both in design and analysis Perfect secrecy can be achieved, but at great cost in key management As a result, it is rarely attempted Using Shannon’s concept of entropy, we can provide an objective measure of information “leakage” through encryption Spring 2010 © , Richard A. Stanley

68 Homework Read (found on the web page): Do the following problems:
Read (found on the web page): Shannon's paper on information theory, and the RSA stream cipher paper. Do the following problems: (1) Suppose a cryptosystem achieves perfect secrecy for a particular plaintext PDF. Prove that perfect secrecy is maintained for any plaintext PDF. (2) Prove that if a cryptosystem has perfect security and |P| = |C| = |K|, then every ciphertext is equally probable. Spring 2010 © , Richard A. Stanley

