Secret Codes, CD players, and Missions to Mars: An Introduction to Cryptology and Coding Theory Sarah Spence Adams Associate Professor of Mathematics Franklin W. Olin College of Engineering
AKA Sit back, relax, and learn something new while your combinatorics knowledge marinates for the test
Cryptology Cryptography Inventing cipher systems Cryptanalysis Breaking cipher systems Bob Alice Eve
Hidden Messages Through the Ages 440 BC – The Histories of Herodotus messages concealed beneath wax on wooden tablets tattoos on a slave's head concealed by regrown hair WWII – microdots Modern day – hidden information within digitial pictures
The Scytale of Ancient Greece Used by the Spartan military 5 th Century B.C.
Caesars Substitution Cipher Example Plaintext: OLINCOLLEGE Encryption: Shift forward by 3 Ciphertext: ROLQFROOHJH Decryption: Shift backwards by 3 (or forwards by 23) = 3 1 st Century B.C.
Modular Arithmetic 15 mod 12 = 3 27 mod 26 = 1 90 mod 10 = 0 a mod m is the remainder obtained upon dividing a by m Can be obtained by subtracting off multiples of m from a until the result is between 0 and m 1
Caesar Using Modular Arithmetic Map A,B,C,…Z to 0,1,2,…,25 View cipher as function e(p i ) = p i + k (mod 26) Example with key = k = 15 Plaintext: OLINCOLLEGE=14,11,8,13,2,14,11,11,4,6,4 Encryption: e(p i ) = p i + 15 (mod 26) Ciphertext: 3,0,23,2,17,3,0,0,19,21,19 = DAXCRDAATVT Decryption: d(c i ) = c i -15 (mod 26) or d(c i ) = c i +11 (mod 26) p
Kerckhoffs Principle Must assume the enemy knows the system Also known as Shannons Maxim
Cryptanalysis of Substitution Ciphers Caesars 26 shifts to test Generally 26! permutations to test If parsed, guess at common words or cribs Frequency analysis
Frequency Analysis (Al-Kindi ~850A.D.)
The Letter E
Vigenère Cipher (1586) Polyalphabetic cipher Involves multiple Caesar shifts Example Plaintext: O L I N C O L L E G E Key: S U N S U N S U N S U Encryption: e(p i ) = p i + k i (mod 26) Decryption: d(c i ) = c i - k i (mod 26)
One-Time Pads: The Ultimate Substitution Cipher Plaintext: MATHISUSEFULANDFUN Key: NGUJKAMOCTLNYBCIAZ Encryption: e(p i ) = p i + k i (mod 26) Ciphertext: BGO….. Decryption: d(c i ) = c i - k i (mod 26)
One-Time Pads Unconditionally secure Problem: Exchanging the key
Public-Key Cryptography Diffie & Hellman (1976) Known at GCHQ years before Uses one-way (asymmetric) functions, public keys, and private keys
Public Key Algorithms Based on two hard problems (traditionally) Factoring large integers (RSA) The discrete logarithm problem (ElGamal)
WWII: The Weather- Beaten Enigma 3 x ciphering possibilities; polyalphabetic Destroyed frequency counts Cracked thanks to traitors, captured machines, mathematicians, and human error
Not only do you want secrecy… …but you also want reliability! Enter coding theory…..
Communication System Digital SourceDigital Sink Source Encoding Source Decoding EncryptionDecryption Error Control Encoding Error Control Decoding ModulationChannelDemodulation
What is Coding Theory? Coding theory is the study of error-control codes, which are used to detect and correct errors that occur when data are transferred or stored
What IS Coding Theory? A mix of mathematics, computer science, electrical engineering, telecommunications Linear algebra Abstract algebra (groups, rings, fields) Probability&Statistics Signals&Systems Implementation issues Optimization issues Performance issues
General Problem We want to send data from one place to another over some channel telephone lines, internet cables, fiber-optic lines, microwave radio channels, cell phone channels, etc. or we want to write and later retrieve data… channels: hard drives, disks, CD-ROMs, DVDs, solid state memory, etc. BUT the data may be corrupted hardware malfunction, atmospheric disturbances, attenuation, interference, jamming, etc.
General Solution Introduce controlled redundancy to the message to improve the chances of recovering the original message Trivial example: The telephone game
Introductory Example We want to communicate YES or NO Message 1 represents YES Message 0 represents NO If I send my message and there is an error, then you will decode incorrectly
Repetition Code of Length 2 Encode message 1 as codeword 11 Encode message 0 as codeword 00 If one error occurs, you can detect that something went wrong
Repetition Code of Length 5 Encode message 1 as codeword Encode message 0 as codeword If up to two errors occur, you can correct them using a majority vote
Evaluating and Comparing Codes Important Code Parameters Minimum distance Determines error-control capability Code rate Ratio of information bits to codeword bits Measure of efficiency Length of codewords Number of codewords
Complicated Problem Want Large minimum distance for reliability Large number of codewords High rate for efficiency Conflicting goals Require trade-offs Inspire more sophisticated mathematical solutions
Inherent Trade-offs What are the ideal trade-offs between rate, error-correcting capability, and number of codewords? What is the biggest distance you can get given a fixed rate or fixed number of codewords? What is the best rate you can get given a fixed distance or fixed number of codewords?
The ISBN-10 Code x 1 x 2 … x 10 x 10 is a check digit chosen so that S x 1 + 2x 2 + … + 9x x 10 0 mod 11 If check digit should be 10, use X instead Can detect all single and all transposition errors
ISBN-10 Example Cryptology by Thomas Barr: ? Want 1(0) + 2(1) + 3(3) + 4(0) + 5(8) + 6(8) + 7(9) + 8(7) + 9(6) + 10(?) = multiple of 11 Compute 1(0) + 2(1) + 3(3) + 4(0) + 5(8) + 6(8) + 7(9) + 8(7) + 9(6) = 272 Ponder (?) = multiple of 11 The check digit must be 8
New ISBN-13 x 1 x 2 … x 13 x 13 = 10- ( (1x 1 + 3x 2 + … + 1x x 12 ) mod 10 If check digit should be 10, use 0 instead Convert ISBN-10 to ISBN-13 using 978 prefix and new check digit
Universal Product Code (UPC)
First number: Type of product 0, 6, 7: Standard 2: Random-weight items (fruits, meats, etc) 4: In-store 5: Coupons Next chunk: Manufacturer identification Next chunk: Item identification Last number: Check digit
UPC Check Digit x 1 x 2 … x 12 x 12 is a check digit chosen so that S = 3x 1 + 1x 2 + … + 3x x 12 0 mod 10 Can detect all single and most transposition errors Which transposition errors go undetected?
Hadamard Code Used in NASA Mariner Missions Pictures divided into pixels Pixels are assigned a level a darkness on a scale of 0 to 63
Binary Representation of Messages Express 64 levels of darkness (our messages) using binary strings …
Hadamard Matrices Map length 6 messages to length 32 codewords obtained from rows of Hadamard matrices Hadamard matrices (Sylvester, 1867) have special properties that give these codewords a minimum distance of 16 HH T = n I
Compare with Length 5 Repetition Code Send a message of length 6 Probability of bit error p=.01 Length 5 Repetition Code Sending 6 info bits requires 30 coded bits: Rate = 6/30 P(decode incorrectly) = Hadamard Code Sending 6 info bits requires 32 coded bits: Rate = 6/32 P(decode incorrectly) =
Voyager Missions (1980s-90s) Reed-Solomon codes use abstract algebra to get even better results Ideals of polynomial rings (Dedekind,1876) Same codes used to protect CDs from scratches!
Summary Cryptology and coding theory are all around us From Caesar to RSA…. from Repetition to Reed-Solomon Codes…. More sophisticated mathematics better ciphers/codes New uses for old mathematics, motivation for new mathematics Cryptology has existed for thousands of years… what ciphers and codes will be next?
Jefferson Disk - Bazeries Cylinder Alice rotates wheels to spell message in one row sends any other row of text Used by US Army from early 1920s to early 1940s Alice and Bob agree on order of disks – the key Bob spells out the ciphertext on wheel looks around the rows until he sees the (coherent) message
The Code Talkers Refers primarily to Navajo speakers in WWII Only unbroken wartime cipher Fast and efficient Interesting connections between cracking secret ciphers and cracking ancient languages
Major Questions What are the ideal trade-offs between rate, error-correcting capability, and number of codewords? What is the biggest distance you can get given a fixed rate or fixed number of codewords? What is the best rate you can get given a fixed distance or fixed number of codewords?
A Parity Check Code Suppose we want to send 4 messages 00, 01, 10, 11 Form codewords by appending a parity check bit to the end of each message Compare with length 2 repetition code Both detect all single errors using minimum distance 2 Now rate 2/3 compared with rate 1/2 Now 4 codewords compared with 2 codewords