Template attacks
Suresh Chari, Josyula R. Rao, Pankaj Rohatgi
IBM Research

Side channel attacks
- Many sources: power, EM, timing, ...
- The problem is one of signal classification: which possible value of a key bit/byte/nibble does the signal correspond to?
- Many types of attacks (SPA/DPA and variants):
  - Use coarse statistical methods
  - Collect many samples to eliminate noise
  - Differentiate based on the expected signal

Signal classification
- The technique relies on precise modeling of the noise (inherent statistical variations plus other ambient noise) AND of the expected signal
- Requires (offline) experimentation
- Based on signal detection and estimation theory:
  - Powerful statistical methods
  - Tools to classify a single sample

Template attacks: overview
- Need a single target sample from the device under test
- Need a programmable identical device
- Build a precise model of the noise AND the expected signal for all possible values of the first portion of the key
- Use this statistical characterization to restrict the first portion to a small subset of possible values
- Iterate to retain a small number of possible values for the entire key
- Strong statistical methods extract ALL information from each target sample

Plan
- Test case: RC4
- Noise modeling
- Classification technique
- Variants
- Empirical results
- Related work

Test case: RC4
- Implementation: RC4 on a smart card
- Representative example for template attacks:
  - Single sample of initialization with the key
  - State changes on each invocation
- A similar approach works for most crypto algorithms
- Other cases: hardware-based DES, EM on SSL accelerators

RC4: initialization with key
- Simple implementation with no key-dependent code (no SPA)
- No DPA possible due to the single sample
- Ideal for template attacks: each key byte independently affects one iteration

    i = j = 0;
    for (ctr = 0; ctr < 256; ctr++) {
        j = key[i] + state[ctr] + j;
        SwapByte(state[ctr], state[j]);
        i = i + 1;
    }
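For concreteness, here is a minimal Python sketch of the key-scheduling loop shown above; the modular reductions, which are implicit for 8-bit variables on a smart card, are made explicit:

    # Minimal sketch of the RC4 key-scheduling loop from the slide.
    # Each iteration touches exactly one key byte, which is what makes
    # the key bytes attackable one at a time.
    def rc4_key_schedule(key):
        state = list(range(256))
        i = j = 0
        for ctr in range(256):
            j = (key[i] + state[ctr] + j) % 256
            state[ctr], state[j] = state[j], state[ctr]
            i = (i + 1) % len(key)
        return state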

Sample

Methodology
- Collect a single sample of key initialization from the device under test
- With the experimental device, collect a large number (100s) of samples for each value of the first key byte
- Identify the points in the first iteration of the samples directly affected by the first key byte
- For each distribution, compute a precise statistical characterization
- Use it to classify the target sample
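The slides do not say how the affected points are identified. One common heuristic, assumed here rather than taken from the talk, is to keep the sample indices where the per-key average traces differ the most:

    import numpy as np

    # Hypothetical point-of-interest selection: keep the indices where the
    # class means spread the most. This sum-of-pairwise-differences rule is
    # only one common choice, not something the slides specify.
    def select_points(mean_traces, num_points):
        # mean_traces: array of shape (K, trace_len), one averaged trace per key value
        spread = np.zeros(mean_traces.shape[1])
        for a in range(len(mean_traces)):
            for b in range(a + 1, len(mean_traces)):
                spread += np.abs(mean_traces[a] - mean_traces[b])
        return np.argsort(spread)[-num_points:]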

Model
- Assumption: Gaussian model for the noise
- Noise characterization: given L-point samples for a particular value of the key byte, compute:
  - the L-point average A
  - the L x L noise covariance matrix M, with M[i, j] = covariance(T[i] - A[i], T[j] - A[j]) over samples T
- Compute this characterization for each of the K values of the key byte (see the sketch below)
- The probability of observing noise vector n for a sample from this distribution is inverse exponential in n^T M^-1 n
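A sketch of the template construction this slide describes, using numpy; the names A and M follow the slide, and the demo traces are synthetic stand-ins for real profiling data:

    import numpy as np

    # Build one template: the mean signal A and the L x L noise covariance M,
    # estimated from profiling traces that all used the same key-byte value.
    def build_template(traces):
        # traces: array of shape (n_samples, L)
        A = traces.mean(axis=0)             # L-point average
        noise = traces - A                  # per-trace noise vectors
        M = np.cov(noise, rowvar=False)     # M[i, j] = covariance of noise at points i, j
        return A, M

    rng = np.random.default_rng(0)
    demo_traces = rng.normal(size=(200, 42))  # synthetic stand-in for profiling traces
    A, M = build_template(demo_traces)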

Maximum likelihood
- Classification: among the K distributions, classify target sample S as belonging to the distribution predicting the highest probability for its noise vector
- "Best" classifier in the information-theoretic sense
- For the binary-hypothesis case with the same noise covariance M, the error is inverse exponential in sqrt((A_1 - A_2)^T M^-1 (A_1 - A_2))
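A sketch of the maximum-likelihood classifier: score the target trace against each template with the multivariate Gaussian log-likelihood (kept in log space to avoid underflow; 'templates' is assumed to map each key value to its (A, M) pair from profiling):

    import numpy as np

    # Keep the key value whose template assigns the target trace the highest
    # log-likelihood. The score is -0.5 * (n^T M^-1 n + log det M), i.e. the
    # Gaussian log-density up to a constant.
    def classify(target, templates):
        best_key, best_score = None, -np.inf
        for key, (A, M) in templates.items():
            n = target - A                          # observed noise vector
            sign, logdet = np.linalg.slogdet(M)
            score = -0.5 * (n @ np.linalg.solve(M, n) + logdet)
            if score > best_score:
                best_key, best_score = key, score
        return best_key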

Classification
- Univariate statistics: assume the sample at each point is independent
  - Good results when keys are very different
  - Not good when keys are close
- Multivariate statistics: assume the points are correlated
  - Very low classification errors
  - The error of not identifying the correct hypothesis is below 5-6%

Empirical result
- The correct-classification percentage improves dramatically
- Keys chosen to be very close: key bytes 0xFE, 0xEE, 0xDE, 0xBE, 0x7E, 0xFD, 0xFB, 0xF7, 0xED, 0xEB
- [Table of per-key classification percentages not preserved in the transcript]

Improvement
- Maximum likelihood: retain the hypothesis predicting the maximum probability P_max for the observed noise
- Approximation: retain ALL hypotheses predicting probability at least P_max / c, for a constant c (sketched below)
- Retains more than one hypothesis for each byte
- Tradeoff between the number of hypotheses retained and correctness
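In log space the P_max / c rule becomes a simple cutoff below the best score; a minimal sketch:

    import numpy as np

    # Relaxed rule: instead of keeping only the single maximum-likelihood key
    # value, keep every hypothesis whose probability is at least P_max / c.
    # With log-likelihoods, the cutoff is max_score - log(c).
    def retain_hypotheses(scores, c):
        # scores: dict mapping key value -> log-likelihood of the target trace
        cutoff = max(scores.values()) - np.log(c)
        return [key for key, s in scores.items() if s >= cutoff]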

Empirical Results
- Columns: retained-set size for c = 1, c = e^6, c = e^12, c = e^24
- Rows: success probability; average number of hypotheses retained
- [Numeric table entries not preserved in the transcript]

Iteration: extend and prune
- For each remaining possible value of the first byte, and for each value of the second byte:
  - build a template independently for ONLY the second iteration (less accurate), OR
  - build a template for the first 2 iterations together (twice as large)
- Classify using the new template to reduce the choices for the first 2 bytes (see the sketch below)
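A sketch of one extend-and-prune round; build_template_for and score are hypothetical stand-ins for the profiling and log-likelihood steps sketched earlier:

    import numpy as np

    # Grow each surviving key prefix by one byte, score the extended prefix
    # with a template covering all iterations so far, and prune with the
    # P_max / c rule from the previous slide.
    def extend_and_prune(prefixes, target, build_template_for, score, c):
        # prefixes: list of surviving key-byte tuples
        scored = {}
        for prefix in prefixes:
            for byte in range(256):
                candidate = prefix + (byte,)
                A, M = build_template_for(candidate)
                scored[candidate] = score(target, A, M)   # log-likelihood
        cutoff = max(scored.values()) - np.log(c)
        return [cand for cand, s in scored.items() if s >= cutoff]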

Iteration: empirical result
- Using templates independently in each stage reduces the entropy in the RC4 case to about 1.5^k for k bytes of key
- Substantially better when templates include the sample for all iterations so far
- The error rate of not retaining the correct hypothesis is almost the same as in the single-byte case
- The number of retained hypotheses is smaller
- Able to correct previous bytes: after 2 iterations of the attack, no hypothesis with a wrong first byte remains

Related work
- [Messerges, Dabbish, Sloan] and [Walter] use signal-based iterative methods to extract the exponent from a device implementing RSA
- [Fahn, Pearson] use profiling of an experimental device before the attack on the device under test, with signal-based classification methods

Countermeasures
- Use randomness as much as possible
- Blinding/masking of data:
  - Templates can be built for masked data values
  - Not feasible if there is a lot of entropy
- Caveat: vulnerable if the attacker has control of the random source in the experimental device

Summary
- Formalized a new type of attack
- Powerful methodology:
  - Works with a single sample or a few samples
  - Requires extensive work with an experimental device
- Experimental results
- Works where SPA/DPA are not feasible

BACKUP

Averaging 5 samples

Averaging 50 samples

Univariate approximation
- Assume that the sample at each point is independent of the other points
- This simplifies the probability of observing the noise: it is still inverse exponential in n^T M^-1 n, but with diagonal M this is just a weighted sum of squares
- Classification: use maximum likelihood with the simplified characterization
- The classification error is high in some cases, but very different keys can still be distinguished
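Under the independence assumption M is treated as diagonal, so each template reduces to per-point means and variances; a minimal sketch of the simplified score:

    import numpy as np

    # Univariate score: with diagonal covariance, n^T M^-1 n collapses to a
    # weighted sum of squares, so only per-point means (A) and variances (var)
    # are needed. Returns the Gaussian log-likelihood up to a constant.
    def univariate_score(target, A, var):
        n = target - A
        return -0.5 * np.sum(n * n / var + np.log(var))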

Empirical Results
- Cross-classification probability
- Low error if the key bytes have very different Hamming weights
- Possibility of high error in the other cases

Intuition
- Samples and expected signals can be viewed as points in an L-dimensional space
- Approximation: starting from the received signal point, keep all hypotheses falling in a ball around the received sample
- For the binary case, the classification error is proportional to sqrt(1/c)

Other cases
- Template attacks verified in other cases:
  - EM emanations from hardware SSL accelerators: a single-sample, noisy analogue of earlier work
  - Hardware-based DES: attacking the key-checksum verification steps
- Other cases under investigation