1
Making and Breaking Security Protocols with Heuristic Optimisation John A Clark Dept. of Computer Science University of York, UK jac@cs.york.ac.uk IBM Hursley 13.02.2001
2
Overview: Introduction to heuristic optimisation techniques. Part I: making security protocols. Part II: breaking protocols based on NP-hardness.
3
Heuristic Optimisation
4
Local Optimisation - Hill Climbing [Slide figure: plot of z(x) showing points x0, x1, x2, x3 and the global optimum x_opt.] The neighbourhood of a point x might be N(x) = {x+1, x-1}. The hill-climb goes x0, x1, x2, since each move improves z(x), and gets stuck at x2 (a local optimum) because the neighbour x3 is no better. Really we want to obtain x_opt.
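As a concrete illustration of the idea (not code from the talk), here is a minimal hill-climbing sketch over the integers with the neighbourhood N(x) = {x+1, x-1}; the two-peaked objective z is invented purely to show the climber getting stuck on a local optimum.

```python
def hill_climb(z, x0, steps=1000):
    """Greedy local search: move to a better neighbour until none exists."""
    x = x0
    for _ in range(steps):
        neighbours = [x + 1, x - 1]          # N(x) = {x+1, x-1}
        best = max(neighbours, key=z)
        if z(best) <= z(x):                  # no improving neighbour: local optimum
            return x
        x = best
    return x

def z(x):
    # Invented two-peaked example: local optimum at x=3 (value 10),
    # global optimum at x=14 (value 25).
    return max(10 - abs(x - 3), 25 - 2 * abs(x - 14))

print(hill_climb(z, x0=0))   # returns 3: stuck on the local peak, not x_opt=14
```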
5
Simulated Annealing [Slide figure: the same plot of z(x), with a search trajectory x0 ... x13 that descends before climbing again.] Allows non-improving moves so that it is possible to go down in order to rise again and reach the global optimum. In practice the neighbourhood may be very large and a trial neighbour is chosen randomly. It is possible to accept a worsening move when improving ones exist.
6
Simulated Annealing Improving moves are always accepted. Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely: the worse the move, the less likely it is to be accepted; and a worsening move is less likely to be accepted the cooler the temperature. The temperature T starts high and is gradually cooled as the search progresses. Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
7
Simulated Annealing Current candidate x; minimisation formulation. At each temperature consider 400 moves. Always accept improving moves. Accept worsening moves probabilistically: it gets harder to do this the worse the move, and harder as the temperature decreases. [Slide diagram: the temperature cycle.]
8
Simulated Annealing Do 400 trial moves
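A minimal sketch of the loop described on the two preceding slides, in the minimisation formulation, with 400 trial moves per temperature and geometric cooling. The cooling schedule, the acceptance rule exp(-delta/T) and the toy objective are illustrative assumptions, not parameters from the talk.

```python
import math
import random

def anneal(cost, x0, neighbour, t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=400):
    """Simulated annealing: accept improving moves always, worsening ones with
    probability exp(-delta / T), so acceptance gets rarer as T cools."""
    x, t = x0, t_start
    while t > t_end:
        for _ in range(moves_per_temp):          # "do 400 trial moves" at each temperature
            y = neighbour(x)
            delta = cost(y) - cost(x)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = y
        t *= alpha                                # cool the temperature geometrically
    return x

# Toy use: minimise a bumpy 1-D function over the integers.
cost = lambda x: (x - 14) ** 2 + 10 * math.sin(x)
print(anneal(cost, x0=0, neighbour=lambda x: x + random.choice([-1, 1])))
```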
9
Genetic Algorithms Based on evolution: survival of the fittest. Encode a solution to the optimisation problem as a gene string. Carry out the following (simple GA approach): take a group of solutions; assess their fitness; choose a new population with fitter individuals having more chance of selection; 'mate' pairs to produce offspring; allow individuals to mutate; return to the first step with the offspring as the new group. Eventually the strings will converge to a solution.
10
Genetic Algorithms: Simple Example The problem is: maximise the function g(x)=x over the integers 0..15. We shall now show how genetic algorithms might find this solution. Let's choose the obvious binary encoding of the integer solution space: x=0 has encoding 0000, x=5 has encoding 0101, x=15 has encoding 1111. Choose the obvious fitness function, fitness(x)=g(x)=x.
11
Randomly generate an initial population, e.g. a=0100 (4), b=0011 (3), c=0011 (3), d=0010 (2); total fitness 12. Randomly select 4 of these solutions according to fitness, e.g. b, a, a, c: 0011, 0100, 0100, 0011 (3, 4, 4, 3); total 14. Randomly choose pairs to mate, e.g. (a,b) and (c,d), with random cross-over points, and swap the right parts of the genes: 0000, 0111, 0101, 0010 (0, 7, 5, 2); total 14. Also allow bits to 'flip' occasionally, e.g. the first bit of d (0010 becomes 1010); this allows a 1 to appear in the first column: 0000, 0111, 0101, 1010 (0, 7, 5, 10); total 22. We now have a radically fitter population, so continue to cycle.
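A sketch of the cycle just illustrated (selection, crossover, mutation) for the toy problem of maximising g(x)=x over 0..15. The population size, mutation rate and fitness-proportionate selection are illustrative assumptions; +1 is added to the selection weights only to avoid an all-zero roulette wheel.

```python
import random

BITS = 4
fitness = lambda bits: int(bits, 2)                      # g(x) = x

def select(pop):
    """Fitness-proportionate ('roulette wheel') selection."""
    return random.choices(pop, weights=[fitness(b) + 1 for b in pop], k=len(pop))

def crossover(a, b):
    """Single-point crossover: swap the right-hand parts of the two genes."""
    point = random.randrange(1, BITS)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate=0.05):
    """Occasionally flip individual bits."""
    return "".join(b if random.random() > rate else str(1 - int(b)) for b in bits)

pop = ["0100", "0011", "0011", "0010"]                   # initial population from the slide
for _ in range(20):                                      # iterate generations
    pop = select(pop)
    pop = [c for i in range(0, len(pop), 2) for c in crossover(pop[i], pop[i + 1])]
    pop = [mutate(b) for b in pop]
print(pop, [fitness(b) for b in pop])
```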
12
General Iteration We now have our new generation, which is subject to selection, mating and mutation again... until some convergence criterion is met. In practice it's a bit more sophisticated, but the preceding slide gives the gist. Genetic algorithms have been found to be very versatile. One of the most important heuristic techniques of the past 30 years.
13
Making Protocols with Heuristic Optimisation
14
Security Protocols Examples: secure session key exchange, "I am alive" protocols, various electronic transaction protocols. Problems: rather hard to get right ("We cannot even get three-line programs right"). Probably the highest-profile area of academic security research. Major impetus was given to the area by Burrows, Abadi and Needham's belief logic, "BAN logic".
15
Allows the assumptions and goals of a protocol to be stated abstractly in a belief logic. Messages contain beliefs actually held by the sender. Rules govern how receiver may legitimately update his belief state when he receives a message. Protocols are series of messages. At the end of the protocol the belief states of the principals should contain the goals. BAN Logic
16
BAN Logic Basic elements: P, Q stand for arbitrary protocol principals. K is a good key for communicating between P and Q. Np is a well-typed 'nonce', a number to be used only once in the current protocol run, e.g. a randomly generated number used as a challenge. Np is 'fresh' (written with the # operator), meaning that it really is a valid 'nonce'.
17
BAN Logic P believes X. The general idea is that principals should only issue statements they actually believe. Thus, P might have believed that the number Na was fresh yesterday and said so, but it would be wrong to conclude that he believes it now. If the message is recent (see later) then we might conclude he believes it. P once said X, i.e. has issued a message containing X at some point P has jurisdiction over X. This captures the notion that P is an authority about the statement X. If you believe P believes X and you trust him on the matter, then you should believe X too (see later)
18
BAN Logic - Assumptions and Goals A and S share common belief in the goodness of the key Kas and so they can use it to communicate. S also believes that the key Kab is a good session key for A and B. A has a number Na that he also believes is fresh and believes that S is the authority on statements about the goodness of key Kab. The goal of the protocol is to get A to believe the key Kab is good for communication with B
19
BAN Logic – Message Meaning Rule If P sees X encrypted using key K, and P believes that key K is shared securely only with principal Q, then P should believe that Q once uttered or 'once said' X.
20
BAN Logic – Nonce Verification Rule If P believes that Q once said X, and P believes that X is 'fresh', then P should believe that Q currently believes X. This rule promotes 'once saids' to actual beliefs.
21
BAN Logic – Jurisdiction Rule If P believes that Q has jurisdiction over X, and P believes Q believes X, then P should believe X too. Jurisdiction captures the notion of being an authority. A typical use would be to give a key server authority over statements of belief about keys. If I believe that a key is good and you reckon I am an authority on such matters, then you should believe the key is good too.
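The three rules can be read as operations on per-principal belief sets. A toy sketch follows; the tuple representation of beliefs is my own simplification, not the notation of the BAN paper.

```python
# Beliefs are tuples such as ("key", "Kab", "A", "B"), ("fresh", X),
# ("said", Q, X), ("believes", Q, X), ("jurisdiction", Q, X).

def message_meaning(beliefs, K, Q, X):
    """If P sees X under key K and believes K is shared only with Q,
    P may add 'Q once said X'."""
    if ("shares_key", K, Q) in beliefs:
        beliefs.add(("said", Q, X))

def nonce_verification(beliefs, Q, X):
    """Promote 'Q once said X' to 'Q believes X' when X is fresh."""
    if ("said", Q, X) in beliefs and ("fresh", X) in beliefs:
        beliefs.add(("believes", Q, X))

def jurisdiction(beliefs, Q, X):
    """If Q is an authority on X and P believes Q believes X, P believes X."""
    if ("jurisdiction", Q, X) in beliefs and ("believes", Q, X) in beliefs:
        beliefs.add(X)

# Tiny example: A receives "Kab is good" from server S under the shared key Kas.
goal = ("key", "Kab", "A", "B")
A = {("shares_key", "Kas", "S"), ("fresh", goal), ("jurisdiction", "S", goal)}
message_meaning(A, "Kas", "S", goal)
nonce_verification(A, "S", goal)
jurisdiction(A, "S", goal)
print(goal in A)   # True: A now believes Kab is a good key for A and B
```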
22
Messages as Integer Sequences [Slide figure: a message from P to Q shown as a tuple of integers with fields sender, receiver, Belief_1, Belief_2, ..., each field decoded by reducing the integer modulo the appropriate range.] Say 3 principals P, Q and S, encoded P=0, Q=1, S=2. Message components are beliefs in the sender's current belief state (and so if P has 5 beliefs, the integers are interpreted modulo 5).
23
Search Strategy We can now interpret sequences of integers as valid protocols: interpret each message in turn, updating belief states after each message. This is the execution of the abstract protocol. Every protocol achieves something! The issue is whether it is something we want! We also have a move strategy for the search, e.g. just randomly change an integer element. This can change the sender, receiver or specific belief of a message (and indeed of subsequent ones).
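A sketch of the decoding and the move strategy just described, with the message layout simplified to one belief per message; the belief dictionary, genome and all names here are illustrative, not the talk's actual encoding.

```python
import random

PRINCIPALS = ["P", "Q", "S"]          # encoded 0, 1, 2

def decode(genome, beliefs, msg_len=3):
    """Interpret a flat integer sequence as messages (sender, receiver, belief).
    Principal fields are reduced modulo the number of principals; the belief
    field modulo the size of the sender's current belief state."""
    msgs = []
    for i in range(0, len(genome), msg_len):
        s, r, b = genome[i:i + msg_len]
        sender = PRINCIPALS[s % len(PRINCIPALS)]
        receiver = PRINCIPALS[r % len(PRINCIPALS)]
        belief = beliefs[sender][b % len(beliefs[sender])]
        msgs.append((sender, receiver, belief))
    return msgs

def move(genome, max_int=1000):
    """Move strategy: randomly change one integer element (this may change the
    sender, receiver or belief of that message, and the meaning of later ones)."""
    g = list(genome)
    g[random.randrange(len(g))] = random.randrange(max_int)
    return g

beliefs = {"P": ["fresh(Na)"], "Q": ["key(Kab)"], "S": ["key(Kab)", "fresh(Ns)"]}
genome = [22, 8, 19, 0, 1, 3]
print(decode(genome, beliefs))
print(decode(move(genome), beliefs))
```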
24
Fitness Function We need a fitness function to capture the attainment of goals. We could simply count the number of goals attained at the end of the protocol. In practice this is awful: a protocol that achieves a goal after 6 messages would be 'as good as' one that achieved the goal after 1 message. Much better to reward the early attainment of goals in some way. A variety of strategies have been investigated.
25
Fitness Functions The fitness is given by a weighted sum of the goals attained after each message [formula on slide]. One strategy (uniform credit) would be to make all the weights the same. Note that credit is cumulative: a goal achieved after the first message is also achieved after the second and third and so on.
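A sketch of that cumulative, message-weighted fitness; the execution function, initial beliefs and weights are placeholders standing in for the BAN-style update rules, not the exact function from the talk.

```python
def fitness(protocol, goals, execute, initial_beliefs, weights):
    """Cumulative credit: a goal attained after message i also scores after
    messages i+1, i+2, ..., so early attainment earns more total credit."""
    score, state = 0, initial_beliefs
    for w, message in zip(weights, protocol):
        state = execute(state, message)        # BAN-style belief-state update
        score += w * sum(1 for g in goals if g in state)
    return score

# Toy check: a protocol whose second message attains the single goal.
# Uniform credit simply uses equal weights, e.g. [1] * len(protocol).
attained = fitness(
    protocol=["msg1", "msg2", "msg3"],
    goals={"A believes key(Kab)"},
    execute=lambda state, m: state | ({"A believes key(Kab)"} if m == "msg2" else set()),
    initial_beliefs=set(),
    weights=[1, 1, 1],
)
print(attained)   # 2: the goal holds after messages 2 and 3
```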
26
Examples One of the assumptions made was that B would take S's word on whether A |~ Na.
27
Examples
28
General Observations Able to generate protocols whose abstract execution is a proof of their own correctness. Have done so for protocols requiring up to 9 messages to achieve the required goals. Another method for protocol synthesis is search via model checking: exhaustive, but limited to short protocols. The notion of fitness function can be generalised to include aspects other than correctness (e.g. the amount of encryption).
29
Breaking Protocols with Heuristic Optimisation
30
Identification Problems The notion of zero-knowledge was introduced by Goldwasser and Micali (1985): indicate that you have a secret without revealing it. Early scheme by Shamir. Several schemes of late are based on NP-complete problems: Permuted Kernel Problem (Shamir), Syndrome Decoding (Stern), Constrained Linear Equations (Stern), Permuted Perceptron Problem (Pointcheval).
31
Pointcheval's Perceptron Schemes Interactive identification protocols based on an NP-complete problem, the Perceptron Problem (PP): given an m-by-n matrix A with entries in {-1,+1}, find a vector y with entries in {-1,+1} so that every component of the image Ay is non-negative.
32
Pointcheval's Perceptron Schemes Permuted Perceptron Problem (PPP): make the problem harder by imposing an extra constraint: the image Ay must also have a particular histogram H of positive values 1, 3, 5, ...
33
Example: Pointcheval's Scheme PP and PPP example. Every PPP solution is a PP solution; the PPP solution additionally has the particular histogram H of positive values 1, 3, 5, ...
34
Generating Instances Suggested method of generation: generate a random matrix A; generate a random secret S; calculate AS; if any (AS)_i < 0 then negate the ith row of A. There is significant structure in this problem: a high correlation between the majority values of the matrix columns and the corresponding secret bits.
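A sketch of that generation procedure using NumPy; the sizes and function name are illustrative.

```python
import numpy as np

def generate_pp_instance(m, n, seed=None):
    """Generate a Perceptron Problem instance as described: random +/-1 matrix
    and secret, then negate any row whose image component is negative."""
    rng = np.random.default_rng(seed)
    A = rng.choice([-1, 1], size=(m, n))
    s = rng.choice([-1, 1], size=n)
    image = A @ s
    A[image < 0] *= -1            # negate offending rows so that A @ s >= 0
    return A, s

A, s = generate_pp_instance(101, 117)
assert (A @ s >= 0).all()         # the planted secret solves the instance
```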
35
Instance Properties Each matrix row/secret dot product is the sum of n Bernoulli (+1/-1) variables. The initial image histogram therefore has a Binomial shape and is symmetric about 0; after the row negation it simply folds over to be positive (values ..., -7, -5, -3, -1, 1, 3, 5, 7, ... become 1, 3, 5, 7, ...). Image elements tend to be small.
36
PP Using Search: Pointcheval Pointcheval couched the Perceptron Problem as a search problem over a current solution y. The neighbourhood is defined by single bit flips on the current solution. The cost function punishes any negative image components, e.g. costNeg(y) = |-1| + |-3| = 4.
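A sketch of that negativity cost and the single-bit-flip neighbourhood, again using NumPy; variable and function names are mine.

```python
import numpy as np

def cost_neg(A, y):
    """Sum of the magnitudes of the negative image components, e.g. |-1| + |-3| = 4."""
    image = A @ y
    return int(-image[image < 0].sum())

def flip_neighbour(y, i):
    """Neighbourhood move: flip the i-th +/-1 entry of the current solution."""
    z = y.copy()
    z[i] = -z[i]
    return z
```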
37
Using Annealing: Pointcheval A PPP solution is also a PP solution. Pointcheval based estimates of the difficulty of cracking PPP on the ratio of PP solutions to PPP solutions, and calculated the matrix sizes for which this should be most difficult. This gave rise to (m,n) = (m, m+16), with recommended sizes (m,n) = (101,117), (131,147), (151,167). He gave estimates of the number of years needed to solve PPP using annealing-found PP solutions as the means of attack. Instances with matrices of size 200 'could usually be solved within a day', but no PPP problem instance greater than 71 was ever solved this way 'despite months of computation'.
38
Perceptron Problem (PP) Knudsen and Meier approach (loosely): carry out sets of runs; note where the results obtained all agree; fix those elements where there is complete agreement; carry out a new set of runs, and so on. If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly. They used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but no upper bound was given on the sizes achievable.
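A sketch of that agreement heuristic: run the randomised search several times and fix the positions on which every run agrees. The `solve_once` callable stands in for a single annealing run and is assumed, not defined here.

```python
import numpy as np

def fix_agreed_bits(solve_once, n_runs=8):
    """Run the randomised search several times; where every run returns the same
    value for a bit, treat that bit as (probably) correct and fix it."""
    runs = np.array([solve_once() for _ in range(n_runs)])   # each run: a +/-1 vector
    agree = (runs == runs[0]).all(axis=0)                    # positions of full agreement
    return agree, runs[0] * agree      # mask of fixed positions and their values (0 = free)
```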
39
Profiling Annealing The approach is not without its problems: not all bits that have complete agreement are correct. [Slide table: the actual secret compared against Runs 1-6, marking the positions where all runs agree, including one position where all agree wrongly.]
40
Knudsen and Meier Have used this method to attack PPP problem sizes (101,117). It needs a hefty enumeration stage (to search for wrong bits), allowing up to 2^64 search complexity. They used a new cost function with histogram punishment, cost(y) = w1 costNeg(y) + w2 costHist(y), with w1 = 30, w2 = 1.
41
Analogy Time I: Encryption Key, Plaintext P, Ciphertext C. The Black Box Assumption: essentially considering encryption only as a mathematical function. In the public arena this was only really challenged in the 90s, when attacks based on the physical implementation arrived: Paul Kocher's timing attacks, simple power analysis, differential power analysis, fault injection attacks (Bellcore, and others). The computational dynamics of the implementation can leak vast amounts of information.
42
Analogy Time II: Annealing Initialisation data, Problem P, Final Solution C. The Black Box Assumption: virtually every application of annealing simply throws the technique at the problem and awaits the final output. Is this really the most efficient use of information? Let's look inside the box...
43
Analogy Time III: Internal Computational Dynamics Initialisation data Problem P, e.g. minimise cost(y,A,Hist) Final Solution C The algorithm carries out 100 000s of cost function evaluations which guide the search. Why did it take the path it did? Bear in mind the whole search process is public and so we can monitor it.
44
Analogy Time IV: Fault Injection Initialisation data, Warped or Faulty Problem P', Final Solution C'. Invariably people assume you need to solve the problem at hand; this is reflected in 'well-motivated' or direct cost functions. What happens if we inject a 'fault' into the process, mutating the problem into a similar but different one? Can we make use of the solutions obtained to help solve the original problem?
45
PP Move Effects What limits the ability of annealing to find a PP solution? A move changes a single element of the current solution. We want the current negative image values to go positive, but changing a bit to cause negative values to go positive will often cause small positive values to go negative.
46
Problem Fault Injection We can significantly improve results by punishing at a positive value K: for example, punish any image value less than K=4 during the search. This drags the elements away from the boundary during the search. Also use the square of the differences, |w_i - K|^2, rather than the simple deviation.
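A sketch of this 'fault-injected' cost: values below the threshold K are punished by squared deviation, combined with the histogram term and the weights w1, w2 from the earlier slide. The exact shape is my guess at the function described, not the code used in the work.

```python
import numpy as np

def cost_k(A, y, hist_target, K=4, w1=30, w2=1):
    """Punish image values below the threshold K using squared deviations
    (with K=0 only negative image values are punished), plus a histogram
    mismatch term over the positive values 1, 3, 5, ..."""
    image = A @ y
    below = image[image < K]
    cost_neg_k = int(((below - K) ** 2).sum())        # drags values away from the boundary
    values = np.arange(1, 2 * len(hist_target), 2)    # possible positive values 1, 3, 5, ...
    hist = np.array([(image == v).sum() for v in values])
    cost_hist = int(np.abs(hist - np.asarray(hist_target)).sum())
    return w1 * cost_neg_k + w2 * cost_hist
```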
47
Problem Fault Injection Comparative results: this generally allows solution within a few runs of annealing for sizes (201,217). The number of bits correct is generally worst when K=0. The best value for K varies between sizes (but profiling can be used to test what it is). It has proved possible to solve for size (401,417) and higher. An enormous increase in power for what is essentially a change to one line of the program: using powers of 2 rather than just the modulus, and the use of the K factor. Morals: small changes may make a big difference. The real issue is how the cost function and the search technique interact. The cost function need not be the most 'natural' direct expression of the problem to be solved; cost functions are a means to an end. This is a form of fault injection on the problem.
48
Profiling Annealing But look again at the cost function templates: different weights w1 and w2 will give different results, yet the resulting cost functions all seem plausibly well-motivated. We can view different choices of weights as different viewpoints on the problem. Now carry out runs using the different cost functions. Very effective: using about 30 cost functions we have managed to get agreement on about 25% of the key with less than 0.5 bits on average in error. Additional cost functions remove incorrect agreement (but may also reduce correct agreement).
49
Radical Viewpoint Analysis [Slide figure: problem P mutated into problems P1, P2, ..., Pn.] Essentially, create mutant problems and attempt to solve them. If the solutions agree on particular elements then they generally do so for a reason, generally because they are correct. Mutation can be thought of as an attempt to blow the search away from the actual original solution.
50
Profiling Annealing: Timing Simulated annealing can make progress, typically getting solutions with around 80% of the vector entries correct (but we don't know which 80%). This throws away a lot of information; it is better to monitor the search process as it cools down. Based on the notion of thermostatistical annealing: watch the elements of the secret vector as the search proceeds, and record the temperature cycle at which the last change to an element's value occurs, i.e. +1 to -1 or vice versa. At the end of the search all elements are fixed. Analysis shows that some elements will take some values early in the search and then never subsequently change: they get 'stuck' early in the search. The ones that get stuck early often do so for good reason – they are the correct values.
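A sketch of that timing profile: anneal over a +/-1 vector and record, for each element, the temperature cycle at which its value last changed. The annealing parameters follow the earlier sketch and the bookkeeping is my own illustration.

```python
import math
import random

def anneal_with_timing(cost, x0, t_start=10.0, t_end=0.01, alpha=0.95, moves=400):
    """Anneal over +/-1 vectors, recording for every element the temperature cycle
    at which its value last changed; getting 'stuck' early often means correct."""
    x, t, cycle = list(x0), t_start, 0
    last_change = [0] * len(x0)
    while t > t_end:
        for _ in range(moves):
            i = random.randrange(len(x))
            y = x.copy()
            y[i] = -y[i]                                   # single-bit-flip move
            delta = cost(y) - cost(x)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = y
                last_change[i] = cycle                     # element i moved in this cycle
        t *= alpha
        cycle += 1
    return x, last_change                                  # low last_change = stuck early
```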
51
Profiling Annealing: Timing Tested 30 PPP instances of size (101,117) with 32 different strategies (different weights wi for the negativity and histogram component costs and different values of K), ten runs per strategy. Maximum number of initial bits fixed at correct values (counts over the 30 instances): <40: 2; 40-49: 10; 50-59: 14; 60-69: 2; 70-79: 2. Some strategies are far better than others – the value of K is very important: K=13 seems a very good candidate. The channel is highly volatile – hence the need for repeated runs. Note also that some runs had up to 108 of 117 bits set correctly in the final solution. For small K the minimum number of bits correct in the final solution is radically worse than for larger values of K.
52
Profiling Annealing: Timing Tested 30 PPP instances of size (151,167) with 16 different strategies (different weights wi for the negativity and histogram component costs and different values of K), ten runs per strategy. Maximum number of initial bits fixed at correct values (counts over the 30 instances): <40: 1; 40-49: 5; 50-59: 9; 60-69: 11; 70-79: 2; 80+: 2. Similar general results to before. Also tried for (201,217) – some runs had in excess of 100 initial stuck bits correct.
53
Some Questions Can you fix an element of the solution at +1 and at -1 and determine the likelihood of correctness based on the distribution of results obtained? What are the effects of different parameters (e.g. the power parameters)? How well can we profile the distribution of results in order to isolate those at the extremes of correctness? Can we apply similar profiling tricks to other NP-complete problems, e.g. the Permuted Kernel Problem and Syndrome Decoding?
54
Example – Permuted Kernel Problem [Problem statement on slide: loosely, given a matrix and a vector, find a permutation of the vector that lies in the kernel of the matrix.] Arithmetic is carried out mod p.
55
Example – Syndrome Decoding [Problem statement on slide.] Arithmetic is carried out mod 2; only a small number k of the bits in S are set to 1.
56
Some Questions Why does everyone try to find the secret/key directly? E.g. for block ciphers, can we use guided search techniques to generate better approximations? Use search to generate better (or more) cryptanalytic tools, e.g. multiple approximations? Very loose: what would happen if you tried to search for a key on a difficult traditional encryption algorithm, Encrypt(K: P) = C? Suppose you tried a guided search based on Hamming distance: Encrypt(K': P) = C', Cost(C',C) = hamming(C,C') (or the sum of such costs over plaintexts P_i). No chance of success at all. But what is the distribution of the failures? Is there a cost function that would induce an exploitable distribution of solutions?
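The question can be phrased as the (hopeless, as the slide says) guided search sketched below. The cipher here is a stand-in toy XOR 'cipher' purely to show the shape of the cost; a real block cipher's avalanche effect is exactly what makes this landscape useless.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for a real block cipher (just XOR of equal-length key and block)."""
    return bytes(p ^ k for p, k in zip(plaintext, key))

def key_search_cost(candidate_key, plaintexts, ciphertexts):
    """Cost(C', C) = hamming(C, C'), summed over known plaintext/ciphertext pairs."""
    return sum(hamming(toy_encrypt(candidate_key, p), c)
               for p, c in zip(plaintexts, ciphertexts))
```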
57
Some Questions This work combines fault injection and a 'timing' attack. What is the equivalent of differential power analysis for heuristic search?