Remote Timing Attacks are Practical


Remote Timing Attacks are Practical An Overview by - Rahul Deshpande

What are Timing Attacks? Extracting secrets by observing the time taken to respond to various queries. E.g., Kocher designed a timing attack to expose secret keys used for RSA.

Timing Attacks Usually used to attack weak computing devices such as smart cards. Also applicable to general software systems. Practical against network servers.

Common Assumptions Attack only applicable to hardware security devices. Attack cannot be used against general-purpose servers, since decryption times are masked by the many concurrent processes running on the system.

Challenging the Assumptions Remote timing attack against OpenSSL developed. OpenSSL: an SSL library commonly used in web servers and other SSL applications. Attack client measures the time an OpenSSL server takes to respond to the decryption queries. Client able to extract private key stored on the server.

Environments in which attack is applicable Network: between two machines in different buildings with multiple routers and switches between them. Interprocess: Between two processes running on the same machine. Virtual Machines: extracting RSA private key from secure Virtual Machine (VM), invalidating isolation provided by Virtual Machine Monitor (VMM)

OpenSSL Decryption RSA decryption is done using modular exponentiation $m = c^d \bmod N$, where $N = pq$ is the RSA modulus. OpenSSL uses the Chinese Remainder Theorem (CRT) to perform the exponentiation. CRT computes the exponentiation in two steps, computing $m_1$ and $m_2$ (one modulo each prime factor) and then combining the two to get $m$. Decryption with CRT gives up to a factor of four speedup. The timing attack can expose the factors of N used in CRT.

The Chinese Remainder Theorem It is possible to reconstruct integers in a certain range from their residues modulo a set of pairwise relatively prime moduli. E.g., the 10 integers in $Z_{10}$ (0, 1, …, 9) can be reconstructed from their two residues modulo 2 and 5 (the relatively prime factors of 10). Provides a way to manipulate large numbers mod M in terms of tuples of smaller numbers. CRT can be formulated as: $M = \prod_{i=1}^{k} m_i$, where the $m_i$ are pairwise relatively prime. Any integer $A$ in $Z_M$ can be represented by a k-tuple whose elements are in $Z_{m_i}$, using the correspondence $A \leftrightarrow (a_1, a_2, \ldots, a_k)$ with $a_i = A \bmod m_i$.

The Chinese Remainder Theorem Let $n = n_1 n_2 \cdots n_k$ with $\gcd(n_i, n_j) = 1$ when $i \ne j$. The system of congruences $x \equiv x_1 \pmod{n_1}, \ldots, x \equiv x_k \pmod{n_k}$ has a simultaneous solution x, and exactly one such solution lies between 0 and n-1.
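The reconstruction described in the theorem can be sketched in a few lines of Python. This is only an illustration: the function name `crt` is made up for this example, and it relies on Python 3.8+ for `math.prod` and for `pow` with a negative exponent (modular inverse).

```python
from math import prod

def crt(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x % m_i == a_i for every i.

    Assumes the moduli are pairwise relatively prime, as the theorem requires.
    """
    n = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        partial = n // m_i                # product of all the other moduli
        inverse = pow(partial, -1, m_i)   # exists because gcd(partial, m_i) == 1
        x += a_i * partial * inverse
    return x % n

# Every integer in Z_10 is recovered from its residues modulo 2 and 5.
for value in range(10):
    assert crt([value % 2, value % 5], [2, 5]) == value
```

Each term `a_i * partial * inverse` is congruent to `a_i` modulo `m_i` and to 0 modulo every other modulus, which is why the sum satisfies all congruences at once.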

Speedup RSA with CRT Any message $M < N$ is uniquely represented by the tuple $[M_P; M_Q]$, where $M_P = M \bmod P$ and $M_Q = M \bmod Q$. $C_P = C \bmod P$ and $C_Q = C \bmod Q$. $D_P = D \bmod (P-1)$ and $D_Q = D \bmod (Q-1)$. $R_P = Q^{P-1} \bmod N$ and $R_Q = P^{Q-1} \bmod N$. $M_P = C_P^{D_P} \bmod P$ and $M_Q = C_Q^{D_Q} \bmod Q$. $S_P = M_P R_P \bmod N$ and $S_Q = M_Q R_Q \bmod N$. $M = S_P + S_Q$; if $M \ge N$ then $M = M - N$. Reference: Johann Großschädl, “The Chinese Remainder Theorem and its Application in a High-Speed RSA Crypto Chip”
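A minimal sketch of these steps in Python, reading the recombination constants as R_P = Q^(P-1) mod N and R_Q = P^(Q-1) mod N. The function name and the toy key (P = 61, Q = 53, e = 17, D = 2753) are assumptions chosen for illustration; real keys are far larger, and this is not how OpenSSL structures its code.

```python
def rsa_crt_decrypt(c, d, p, q):
    """Decrypt c with private exponent d using the CRT steps listed above."""
    n = p * q
    c_p, c_q = c % p, c % q
    d_p, d_q = d % (p - 1), d % (q - 1)
    # By Fermat's little theorem r_p is 1 mod p and 0 mod q (and r_q the reverse).
    r_p = pow(q, p - 1, n)
    r_q = pow(p, q - 1, n)
    m_p = pow(c_p, d_p, p)      # half-size exponentiation modulo p
    m_q = pow(c_q, d_q, q)      # half-size exponentiation modulo q
    s_p = (m_p * r_p) % n
    s_q = (m_q * r_q) % n
    m = s_p + s_q
    if m >= n:                  # the sum is below 2n, so one subtraction suffices
        m -= n
    return m

# Toy key for illustration only.
P, Q, E, D = 61, 53, 17, 2753
message = 65
ciphertext = pow(message, E, P * Q)
assert rsa_crt_decrypt(ciphertext, D, P, Q) == message
```

The two exponentiations work with half-size moduli and half-size exponents, which is where the speedup over a single exponentiation modulo N comes from.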

Exponentiation Simplest algorithm to compute $g^d \bmod q$ is square and multiply. OpenSSL uses an optimization of square and multiply called sliding window exponentiation.

Sliding Window Exponentiation A block of bits (window) of d is processed at each iteration. Requires precomputing a multiplication table, taking time proportional to $2^{w-1} + 1$ for a window size of w. For a 1024-bit modulus, OpenSSL uses a window size of five. Attack: by querying on many inputs g, the attacker exposes information about bits of the factor q. The attack on sliding windows is harder than on square and multiply because there are fewer multiplications.
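For readers who have not seen the technique, here is a textbook left-to-right sliding window exponentiation in Python. It is a generic sketch, not OpenSSL's implementation; the function name and the default window size argument are chosen for this example.

```python
def sliding_window_pow(g, d, q, w=5):
    """Compute g**d mod q, consuming up to w bits of d per table multiplication."""
    if d == 0:
        return 1 % q
    g %= q
    g_squared = (g * g) % q
    # Precompute the odd powers g^1, g^3, ..., g^(2^w - 1): 2^(w-1) table entries.
    table = {1: g}
    for k in range(3, 1 << w, 2):
        table[k] = (table[k - 2] * g_squared) % q

    bits = bin(d)[2:]
    result = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            result = (result * result) % q      # a zero bit costs one squaring
            i += 1
        else:
            # Take the longest window of at most w bits that ends in a 1 bit.
            j = min(i + w, len(bits))
            while bits[j - 1] == "0":
                j -= 1
            window = int(bits[i:j], 2)          # always odd, so it is in the table
            for _ in range(j - i):
                result = (result * result) % q
            result = (result * table[window]) % q
            i = j
    return result

assert sliding_window_pow(7, 1234567, 1009) == pow(7, 1234567, 1009)
```

Because a whole window is absorbed by a single table multiplication, far fewer multiplications depend directly on g than in plain square and multiply, which is the reason the slide calls this variant harder to attack.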

Montgomery Reduction A reduction modulo q done via multiprecision division and then returning the remainder is expensive. Montgomery proposed a method for implementing reduction modulo q using a series of operations that are efficient in hardware and software. Montgomery reduction transforms a reduction modulo q into a reduction modulo some power of two, denoted by R. Reduction modulo a power of 2 is faster since it is easily implemented in hardware. All variables must first be put into Montgomery form.

Montgomery Reduction At the end of the reduction, the output cR is checked against q. If cR > q, q is subtracted from the output to keep cR in the range [0, q). This extra step is called an extra reduction. Extra reductions cause timing differences for different inputs. Detecting the timing differences caused by extra reductions tells how close g is to a multiple of one of the factors.
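The extra reduction is easiest to see in code. The following is a generic Montgomery reduction (REDC) sketch in Python, not OpenSSL's routine; the helper names, the 32-bit word size, and the returned flag are assumptions made for illustration. It tests `u >= q` so that the result always lands in [0, q).

```python
def montgomery_setup(q, word_bits=32):
    """Pick R as a power of two spanning whole machine words, with R > q (q odd)."""
    k = word_bits * -(-q.bit_length() // word_bits)   # bit length rounded up to words
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r                  # -q^(-1) mod R
    return r, k, q_neg_inv

def montgomery_reduce(t, q, r, k, q_neg_inv):
    """Return (t * R^(-1) mod q, extra_reduction) for 0 <= t < q * R."""
    m = ((t & (r - 1)) * q_neg_inv) & (r - 1)   # reductions mod R are just bit masks
    u = (t + m * q) >> k                        # exact division by R
    if u >= q:
        return u - q, True                      # the data-dependent extra reduction
    return u, False

q = 101
r, k, q_neg_inv = montgomery_setup(q, word_bits=8)
value, extra = montgomery_reduce(5000, q, r, k, q_neg_inv)
assert value == (5000 * pow(r, -1, q)) % q
```

The only data-dependent branch is the final comparison. Whether it is taken depends on the input, and that is the timing signal the attack picks up.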

Multiplication Routines RSA operations make use of a multi-precision integer multiplication routine. OpenSSL implements two multiplication routines: Karatsuba and normal. Karatsuba is used when multiplying two numbers with an equal number of words. Takes time $O(n^{1.58})$. Normal multiplication is used when multiplying two numbers with unequal word counts n and m. Takes time $O(nm)$.

Multiplication Routines Normal multiplication takes quadratic time for numbers of approximately the same size. Multiplying two numbers with unequal word counts takes longer than multiplying two numbers with equal word counts. This fact is used in the timing attack on OpenSSL. The underlying word multiplication algorithm dominates the total time for a decryption. In OpenSSL, it takes 30%-40% of the total running time.

Comparison of Timing Differences Two algorithmic data dependencies in OpenSSL cause time variance in RSA decryption: 1. Number of extra reductions in Montgomery reduction. 2. Choice of multiplication routine. The effects of these two dependencies counteract each other. With Karatsuba, decryption of g < q is faster than decryption of g > q; with Montgomery reduction it is the reverse.

A Timing Attack on OpenSSL Exposes the factorization of the RSA modulus. Builds approximations to q that get progressively closer as the attack proceeds. Can be viewed as a binary search for q. After recovering the most significant half of the bits of q, Coppersmith’s algorithm is used to retrieve the complete factorization. The value of the decryption is not needed, only the time required for the decryption.

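Timing Attack on OpenSSL g is an integer that has the same top i-1 bits as q, with the remaining bits of g set to 0. $g_{hi}$ is the same as g, with the ith bit set to 1. If bit i of q is 1 then $g < g_{hi} < q$, otherwise $g < q < g_{hi}$. Measure the time to decrypt both $u_g$ and $u_{g_{hi}}$ (the values $gR^{-1} \bmod N$ and $g_{hi}R^{-1} \bmod N$, which become g and $g_{hi}$ once the server converts them to Montgomery form), represented as $t_1$ and $t_2$. Calculate the timing difference $t_d = |t_1 - t_2|$. If bit i of q is 0, then $t_d$ is large. If bit i of q is 1, then $t_d$ is small.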
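The construction of the two guesses and the decision rule can be sketched as follows. All names here are hypothetical, the threshold separating "large" from "small" has to be found empirically, and the 8-bit q exists only to check the ordering claims.

```python
def make_guesses(known_top_bits, total_bits):
    """Build (g, g_hi) from the i-1 top bits of q recovered so far.

    known_top_bits is a string such as "1011"; bit i is the next unknown bit.
    """
    i = len(known_top_bits) + 1
    g = int(known_top_bits, 2) << (total_bits - (i - 1)) if known_top_bits else 0
    g_hi = g | (1 << (total_bits - i))      # same as g, with bit i set to 1
    return g, g_hi

def montgomery_preimage(g, r, n):
    """u_g = g * R^(-1) mod N, so the server's Montgomery conversion yields g."""
    return (g * pow(r, -1, n)) % n

def guess_bit(t1, t2, threshold):
    """A large |t1 - t2| suggests bit i of q is 0, a small one suggests 1."""
    return 0 if abs(t1 - t2) > threshold else 1

# Ordering check against a toy 8-bit q. Equality is possible at the edges,
# for example when all remaining bits of q are zero.
q_bits = "10110101"
q = int(q_bits, 2)
for i in range(1, len(q_bits) + 1):
    g, g_hi = make_guesses(q_bits[: i - 1], len(q_bits))
    if q_bits[i - 1] == "1":
        assert g < g_hi <= q
    else:
        assert g <= q < g_hi
```

Each recovered bit is appended to `known_top_bits`, so the guesses close in on q one bit at a time.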

Real World Scenarios Timing attack applies to SSL applications such as stunnel, Apache web server with mod_SSL, and trusted computing projects such as Microsoft’s NGSCB. RSA applications using a hardware crypto accelerator not vulnerable. Attacks apply to only software based RSA implementations.

Example of an Attack on SSL server In a standard full SSL handshake, SSL server performs RSA decryption using its private key. CLIENT-KEY-EXCHANGE message composed by encrypting PKCS 1 padded random bytes with server’s public key. In the attack, client substitutes properly formatted CLIENT-KEY-EXCHANGE with the guess g. Server generates ALERT message. Client computes time difference and repeats for various values.

Experiments Show that the factorization of the RSA modulus N is vulnerable. Test the effects of increasing the number of decryption requests. Compare effectiveness based upon different keys. Compare effectiveness based upon machine architecture and common compile-time optimizations. Compare effectiveness based upon source-based optimizations. Compare inter-process vs. local network attacks. Compare effectiveness against two common SSL applications: Apache web server with mod_SSL and stunnel.

Experiment Setup Attack performed against OpenSSL 0.9.7, which does not blind RSA operations by default. A simple TCP server was implemented that reads an ASCII string, converts the string to OpenSSL’s internal multi-precision representation, and performs the RSA decryption. Decryption time: measured from writing the ciphertext over the socket to receiving the reply.

Experiment 1- Number of Ciphertexts Two parameters determine the number of queries needed to expose a single bit of an RSA factor. Neighborhood size: for every bit of q, measure the decryption time for a neighborhood of values g, g+1, g+2, …, g+n; the size is denoted by n. Sample size: for each value g+i, sample the decryption time multiple times and compute the mean decryption time; the number of times g+i is queried is denoted by s. Total number of queries needed to compute $T_g$ = s*n.
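One way to read these two parameters is as a nested measurement loop, sketched below. The `measure` callable is a hypothetical stand-in for "send this value to the server and time the reply". The sketch samples n neighborhood values (g through g+n-1) so that the query count matches s*n.

```python
from statistics import mean

def neighborhood_time(measure, g, n, s):
    """Sum, over n neighboring values of g, of the mean of s timing samples.

    Issues exactly s * n calls to measure.
    """
    return sum(mean(measure(g + i) for _ in range(s)) for i in range(n))

# Stand-in timing function for illustration: it just reports the value queried.
calls = []
def fake_measure(value):
    calls.append(value)
    return float(value)

t_g = neighborhood_time(fake_measure, g=10, n=3, s=5)
assert len(calls) == 15      # s * n queries
```

Averaging over s samples suppresses measurement noise for a single value, while summing over the neighborhood amplifies the systematic difference between the two guesses.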

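Continued.. Zero-one gap: the difference between the timing measurement when a bit of q is 0 and when it is 1. The larger the gap, the stronger the indicator that the bit is 0 and the smaller the chance of error. Increasing the neighborhood size increases the zero-one gap when the bit is 0, but the gap stays steady when the bit is 1. Total number of queries to recover a factor: $2ns \cdot \log_2(N)/4$, where N is the RSA public modulus (q has about $\log_2(N)/2$ bits and only the top half of them must be recovered).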
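As a worked instance of the query count, reading the formula as 2ns times one quarter of the bit length of N: each recovered bit costs s*n queries for g and again for g_hi, and only the top half of the bits of q are needed. The parameter values n = 400 and s = 7 below are illustrative assumptions, not figures taken from these slides.

```python
def total_queries(n, s, modulus_bits):
    """Queries needed to recover the top half of q for a modulus of the given size."""
    bits_to_recover = modulus_bits // 4     # q has modulus_bits / 2 bits; half are needed
    return 2 * n * s * bits_to_recover      # s * n queries each for g and g_hi, per bit

# Illustrative parameters for a 1024-bit modulus.
print(total_queries(n=400, s=7, modulus_bits=1024))   # prints 1433600
```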

Experiment 2- Different Keys Several 1024-bit keys attacked, to determine the ease of breaking different moduli. Zero-one gap positive for first 32 bits due to Montgomery reductions. Normally, resulting zero-one gap shifts occur around the multiple of machine word size. Attacker must be aware that zero-one gap may flip signs when guessing bits that are around multiples of machine word size. If hard-to-guess bits encountered, neighborhood size can be increased to increase the zero-one gap.

Experiment 3- Architecture and Compile-Time effects Computer Architecture and compile-time optimizations affect the zero-one gap. Effect of Architecture: Programs with similar retirement counts may have different execution profiles. This is due to different run-time factors such as branch predictions, pipeline throughput, and the L1 and L2 cache behavior. Compile-time optimizations change the number of instructions and how efficiently instructions are executed on the hardware.

Continued… Effects of compile-time optimizations tested by compiling OpenSSL in three different ways: optimized, no Pentium flag, and unoptimized. Each compile-time configuration changes the zero-one gap.

Experiment 4 – Source-Based Optimizations Patches can change the code profile of RSA libraries, resulting in a timing vulnerability. After a CRT decryption, OpenSSL re-encrypts the result to verify that it is identical to the original ciphertext. OpenSSL calculates both Montgomery parameters on every decryption. A patch allows OpenSSL to cache both values between decryptions with the same key. This shifts the zero-one gap, since the resulting code has a different execution profile. Patches may thus increase the zero-one gap, making the code more vulnerable to timing attacks.

Experiment 5 – Interprocess vs. Local Network Attacks Noise from the network is eliminated by repeated sampling, giving a zero-one gap similar to the inter-process case. Networks with less than 1 ms of variance are vulnerable. Attacker can take advantage of higher CPU speeds to increase the accuracy of timing measurements.

Experiment 6 – Attacking SSL Applications on the Local Network Apache+mod_SSL is a commonly used secure web server. Stunnel allows TCP/IP connections to be tunneled through SSL. Servers connected by a single switch are vulnerable to the attack. Attacker has access to a machine near the OpenSSL-based server. Timing attacks also work in larger networks where client and webserver are separated by multiple routers and switches on the network backbone. Run-time differences result in different zero-one gaps. Experiment highlights difficulty in determining minimum number of queries for a successful attack.

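Defenses Three Possible Defenses: 1. RSA Blinding: calculates $x = r^e g \bmod N$ before decryption, where r is random, e is the RSA encryption exponent, and g is the ciphertext to be decrypted. x is then decrypted as normal, followed by division by r. Since r is random, x is random and timing the decryption does not reveal information about the key. Performance penalty of 2%-10%.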
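A minimal sketch of blinding in Python, assuming the standard construction in which the ciphertext g is multiplied by r^e mod N before decryption and the result is divided by r afterwards. The function name and toy key are illustrative; production libraries implement blinding internally and differ in detail.

```python
import secrets
from math import gcd

def blinded_decrypt(g, d, e, n):
    """Decrypt ciphertext g without ever exponentiating an attacker-chosen value."""
    # Draw a fresh random r that is invertible modulo n.
    while True:
        r = secrets.randbelow(n - 2) + 2        # uniform in [2, n - 1]
        if gcd(r, n) == 1:
            break
    x = (pow(r, e, n) * g) % n                  # blinded ciphertext x = r^e * g mod N
    y = pow(x, d, n)                            # equals r * g^d mod N
    return (y * pow(r, -1, n)) % n              # divide by r to unblind

# Toy key for illustration only.
N, E, D = 3233, 17, 2753
ciphertext = pow(65, E, N)
assert blinded_decrypt(ciphertext, D, E, N) == 65
```

Because the value actually exponentiated is x rather than g, the decryption time no longer correlates with how close the attacker's chosen g is to a multiple of q.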

Continued… 2. Make all RSA decryptions independent of the input ciphertext. Code whose decryption time does not depend on the ciphertext is harder to create and maintain. 3. Require all RSA computations to be quantized, i.e. to always take a multiple of some predefined time quantum. The preferred method is blinding. Its drawbacks are that it requires a good source of randomness to prevent attacks on the blinding factor, and that it causes a small performance degradation.

Conclusion Experiments show that timing attacks are effective when carried out between machines separated by multiple routers. Timing attacks also effective on two processes on the same computer. Several Crypto libraries, including OpenSSL, now implement blinding by default to prevent timing attacks.