Hashes and Message Digests
A hash is also called a message digest
One-way function: given $d = h(m)$, there is no feasible inverse $h'$ with $h'(d) = m$
– Cannot find the message given a digest
Collision resistance: cannot find $m_1 \neq m_2$ such that $d_1 = d_2$
Maps an arbitrary-length message to a fixed-length digest
Randomness
– any given output bit is '1' half the time
– each output has about 50% '1' bits
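A quick way to see the fixed-length and randomness properties is to hash a few inputs. This sketch uses SHA-256 from Python's `hashlib` as a stand-in for a generic h (an assumption; the slide does not name a function) and counts how many output bits flip when a single input bit changes.

```python
import hashlib

def digest_bits(data: bytes) -> int:
    """SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of the 256 output bits that differ between h(a) and h(b)."""
    return bin(digest_bits(a) ^ digest_bits(b)).count("1")

# Arbitrary-length input, fixed-length output (32 bytes = 256 bits).
short_digest = hashlib.sha256(b"m").digest()
long_digest = hashlib.sha256(b"m" * 1_000_000).digest()
print(len(short_digest), len(long_digest))  # 32 32

# "hello" and "helln" differ in exactly one input bit ('o' = 0x6f, 'n' = 0x6e),
# yet roughly half of the 256 output bits flip.
print(bit_difference(b"hello", b"helln"))
```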
Birthday Problem
How many people do you need so that the probability of two of them sharing a birthday is > 50%?
Random sample of n birthdays (inputs) drawn from k possible days (outputs, k = 365)
– $k^n$ total possibilities
– $(k)_n = k(k-1)\cdots(k-n+1)$ possibilities with no duplicate birthday
Probability of no repetition:
– $p = (k)_n / k^n \approx 1 - \frac{n(n-1)}{2k}$ (first-order approximation, accurate only when $n^2 \ll k$)
For k = 365, the minimum n is 23
Intuition: there are $n(n-1)/2$ pairs, and each pair has probability $1/k$ of sharing the same output
– $\frac{n(n-1)}{2k} > 50\%$ gives n on the order of $k^{1/2}$ (more precisely, $n \approx \sqrt{2k \ln 2} \approx 1.18\sqrt{k}$)
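The exact probability $p = (k)_n / k^n$ is easy to evaluate directly, which confirms the n = 23 answer without relying on the approximation. The function names below are illustrative only.

```python
import math

def prob_no_repeat(n: int, k: int) -> float:
    """Exact p = (k)_n / k^n: probability that n draws from k values are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (k - i) / k
    return p

def min_people(k: int, target: float = 0.5) -> int:
    """Smallest n whose collision probability 1 - p exceeds target."""
    n = 1
    while 1 - prob_no_repeat(n, k) <= target:
        n += 1
    return n

n = min_people(365)
print(n)                                            # 23
print(round(1 - prob_no_repeat(n, 365), 4))         # 0.5073
print(round(math.sqrt(2 * 365 * math.log(2)), 2))   # 22.49
```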
How Many Bits for Hash?
m-bit hash: about $2^{m/2}$ messages to find two with the same hash (birthday bound)
64 bits: $2^{32}$ messages to search (doable)
Need at least 128 bits
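The $2^{m/2}$ estimate can be observed with a deliberately tiny hash. This sketch truncates SHA-256 to m = 24 bits (a toy hash, assumed only for the demonstration) and hashes distinct messages until two collide; the number of tries is typically a small multiple of $2^{12} = 4096$, not $2^{24}$.

```python
import hashlib
from typing import Dict, Tuple

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256: a toy m-bit hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> Tuple[bytes, bytes, int]:
    """Hash distinct messages until two share a digest; return them and the try count."""
    seen: Dict[int, bytes] = {}
    i = 0
    while True:
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1

m1, m2, tries = find_collision(24)
print(m1, m2, tries)   # tries is on the order of 2**12, far below 2**24
```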
Using Hash for Authentication
Alice to Bob: challenge $r_A$
Bob to Alice: $MD(K_{AB} | r_A)$
Bob to Alice: challenge $r_B$
Alice to Bob: $MD(K_{AB} | r_B)$
Only need to compare MD results
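The mutual challenge-response exchange can be sketched in a few lines. Here SHA-256 stands in for MD and `|` is byte concatenation (both assumptions); in practice HMAC is the usual keyed construction, but this follows the slide's $MD(K_{AB} | r)$ form. The verifier recomputes the digest and compares in constant time.

```python
import hashlib
import hmac
import secrets

def md(data: bytes) -> bytes:
    """Stand-in for the slide's MD (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def respond(k_ab: bytes, challenge: bytes) -> bytes:
    """Response to challenge r: MD(K_AB | r)."""
    return md(k_ab + challenge)

def check(k_ab: bytes, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected MD and compare the results."""
    return hmac.compare_digest(respond(k_ab, challenge), response)

k_ab = secrets.token_bytes(16)           # shared secret K_AB

r_a = secrets.token_bytes(16)            # Alice -> Bob: challenge r_A
bob_reply = respond(k_ab, r_a)           # Bob -> Alice: MD(K_AB | r_A)
print(check(k_ab, r_a, bob_reply))       # True: Alice accepts Bob

r_b = secrets.token_bytes(16)            # Bob -> Alice: challenge r_B
alice_reply = respond(k_ab, r_b)         # Alice -> Bob: MD(K_AB | r_B)
print(check(k_ab, r_b, alice_reply))     # True: Bob accepts Alice

print(check(k_ab, r_b, respond(b"wrong key", r_b)))  # False: no K_AB, no valid reply
```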
Using Hash to Encrypt
(Pseudo) one-time pad with $K_{AB}$
– Compute a bit stream using MD and $K_{AB}$: $b_1 = MD(K_{AB})$, $b_i = MD(K_{AB} | b_{i-1})$, …
– XOR the stream with the message blocks
– Add a random 64-bit number (aka IV) so that each message gets a different stream: $b_1 = MD(K_{AB} | IV)$, $b_i = MD(K_{AB} | b_{i-1})$, …
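A minimal sketch of the IV-based variant, assuming SHA-256 as MD and byte concatenation for `|`. The same XOR call both encrypts and decrypts. This provides no integrity protection, and an IV must never be reused with the same key, since two messages XORed with the same stream leak their XOR.

```python
import hashlib
import secrets

def keystream(k_ab: bytes, iv: bytes, length: int) -> bytes:
    """b_1 = MD(K_AB | IV), b_i = MD(K_AB | b_{i-1}), concatenated to `length` bytes."""
    out = bytearray()
    b = hashlib.sha256(k_ab + iv).digest()
    while len(out) < length:
        out += b
        b = hashlib.sha256(k_ab + b).digest()
    return bytes(out[:length])

def xor_crypt(k_ab: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; encryption and decryption are the same operation."""
    ks = keystream(k_ab, iv, len(data))
    return bytes(x ^ y for x, y in zip(data, ks))

k_ab = secrets.token_bytes(16)
iv = secrets.token_bytes(8)              # random 64-bit IV, sent in the clear
ciphertext = xor_crypt(k_ab, iv, b"attack at dawn")
print(xor_crypt(k_ab, iv, ciphertext))   # b'attack at dawn'
```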
General Structure of Secure Hash Code
Iterative compression function f applied to successive message blocks
– If f is collision-resistant, so is the resulting hash function
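The iterative structure can be sketched as a toy Merkle-Damgård hash: pad the message, split it into 512-bit blocks, and chain a compression function through them. The compression function built from SHA-256, the all-zero IV, and the names are hypothetical choices for illustration, not MD5's or SHA-1's internals.

```python
import hashlib
import struct

BLOCK_BYTES = 64   # 512-bit blocks, as in MD5 and SHA-1
IV = bytes(32)     # hypothetical all-zero initial chaining value

def compress(chain: bytes, block: bytes) -> bytes:
    """Toy compression function f(CV, block); SHA-256 is only a stand-in."""
    return hashlib.sha256(chain + block).digest()

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, then the 64-bit big-endian bit length."""
    bit_len = (8 * len(message)) % 2**64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_BYTES)
    return padded + struct.pack(">Q", bit_len)

def iterated_hash(message: bytes) -> bytes:
    """CV_0 = IV, CV_i = f(CV_{i-1}, Y_i); the final CV is the digest."""
    chain = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        chain = compress(chain, padded[i:i + BLOCK_BYTES])
    return chain

print(iterated_hash(b"abc").hex())
```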
MD5: Message Digest Version 5
Input: message; output: 128-bit digest
Until recently the most widely used hash algorithm
– in recent times has raised both brute-force and cryptanalytic concerns; practical collisions have been demonstrated since 2004
Specified in RFC 1321
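Python's standard `hashlib.md5` can be checked against two test vectors from RFC 1321's test suite. Every digest is 32 hex characters (128 bits) regardless of input length. On some FIPS-restricted builds `hashlib.md5` may be unavailable; MD5 is shown here for study, not for new security designs.

```python
import hashlib

# Two test vectors from the RFC 1321 test suite.
RFC1321_VECTORS = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
}

for message, expected in RFC1321_VECTORS.items():
    digest = hashlib.md5(message).hexdigest()
    # 32 hex characters = 128 bits, whatever the input length
    print(message, digest, digest == expected, len(digest) * 4)
```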
MD5 Overview
1. Pad the message so its length is 448 mod 512 bits
2. Append a 64-bit original length value to the message
3. Initialise a 4-word (128-bit) MD buffer (A, B, C, D)
4. Process the message in 16-word (512-bit) blocks:
– using 4 rounds of 16 operations each on the message block and buffer
– add the output to the buffer input to form the new buffer value
5. Output hash value is the final buffer value
Padding Twist
Given the original message M, add padding bits "10*" (a single '1' followed by '0's) so that the resulting length is 64 bits less than a multiple of 512 bits; at least one padding bit is always added
Append (original length in bits mod $2^{64}$), represented in 64 bits, to the padded message
The final message is split into 512-bit blocks
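The padding rule can be sketched for byte-aligned messages (an assumption; the rule itself is defined on bits). MD5 stores the 64-bit length little-endian. Note that a 56-byte message needs a second block, because the 64-bit length no longer fits after the padding byte.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Pad as MD5 does, assuming the message is a whole number of bytes."""
    bit_len = (8 * len(message)) % 2**64              # original length mod 2^64
    padded = message + b"\x80"                        # the '1' bit, then seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)     # more '0' bits: length = 448 mod 512
    return padded + struct.pack("<Q", bit_len)        # 64-bit length, little-endian for MD5

def split_blocks(padded: bytes) -> list:
    """Chop the padded message into 512-bit (64-byte) blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

for n in (0, 3, 55, 56, 64):
    p = md5_pad(b"x" * n)
    print(n, len(p) * 8, len(split_blocks(p)))
# 0 512 1 / 3 512 1 / 55 512 1 / 56 1024 2 / 64 1024 2
```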
MD5 Process
As many stages as the number of 512-bit blocks in the final padded message
Digest: 4 32-bit words: MD = A|B|C|D
Every message block contains 16 32-bit words: $m_0|m_1|m_2|…|m_{15}$
– Digest $MD_0$ initialized to: A = 01234567, B = 89abcdef, C = fedcba98, D = 76543210 (byte order in memory)
– Every stage consists of 4 passes/rounds over the message block, each modifying MD
Each block: 4 rounds, each round 16 steps
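The initial buffer values above are byte sequences; MD5 reads memory as little-endian 32-bit words, so the same bytes appear "reversed" as word values. This sketch also splits one padded block into $m_0 \ldots m_{15}$; `block_words` is an illustrative name.

```python
import struct

# MD_0 as the slide writes it: byte sequences in memory, for A, B, C, D.
INIT_BYTES = ["01234567", "89abcdef", "fedcba98", "76543210"]
# MD5 reads memory as little-endian 32-bit words.
MD0 = [struct.unpack("<I", bytes.fromhex(h))[0] for h in INIT_BYTES]
print([hex(w) for w in MD0])  # ['0x67452301', '0xefcdab89', '0x98badcfe', '0x10325476']

def block_words(block: bytes) -> list:
    """Split one 512-bit block into m_0 .. m_15 (16 little-endian 32-bit words)."""
    if len(block) != 64:
        raise ValueError("an MD5 block is exactly 64 bytes")
    return list(struct.unpack("<16I", block))

# "abc" padded to one block: 'a' 'b' 'c', the 0x80 padding byte, zeros, 64-bit length 24.
block = b"abc\x80" + bytes(52) + struct.pack("<Q", 24)
m = block_words(block)
print(hex(m[0]), m[14], m[15])  # 0x80636261 24 0
```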
Processing of Block $m_i$: 4 Passes
Starting from $ABCD = MD_i$:
$ABCD = f_F(ABCD, m_i, T[1..16])$
$ABCD = f_G(ABCD, m_i, T[17..32])$
$ABCD = f_H(ABCD, m_i, T[33..48])$
$ABCD = f_I(ABCD, m_i, T[49..64])$
$MD_{i+1} = MD_i + ABCD$ (A, B, C, D each added word-wise mod $2^{32}$)
Each Block Has 4 Rounds and 64 Steps
Each step t (0 ≤ t ≤ 63):
Input:
– $m_t$: a 32-bit word from the message, taken in a different order (and rotated by a different shift) every round
– $T[i] = \mathrm{int}(2^{32} \cdot |\sin(i)|)$, $1 \le i \le 64$ (i in radians; step t uses $T[t+1]$): a randomized set of 32-bit patterns, which eliminates any regularities in the input data
– ABCD: current MD
Output:
– ABCD: new MD
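Both per-step inputs can be computed directly: the T table from the sine formula, and the message-word order, which is a different permutation of $m_0 \ldots m_{15}$ in each round. The index formulas below follow the word order listed in RFC 1321; `message_index` is an illustrative name.

```python
import math

# T[i] = int(2^32 * |sin(i)|) for i = 1..64 (radians); index 0 is unused padding.
T = [0] + [int(2**32 * abs(math.sin(i))) for i in range(1, 65)]

def message_index(t: int) -> int:
    """Index k of the message word m_k used at step t (0..63); the order changes per round."""
    if t < 16:
        return t                  # round 1: 0, 1, 2, ..., 15
    if t < 32:
        return (1 + 5 * t) % 16   # round 2: 1, 6, 11, 0, ...
    if t < 48:
        return (5 + 3 * t) % 16   # round 3: 5, 8, 11, 14, ...
    return (7 * t) % 16           # round 4: 0, 7, 14, 5, ...

print(hex(T[1]), hex(T[64]))                      # 0xd76aa478 0xeb86d391
print([message_index(t) for t in range(16, 20)])  # [1, 6, 11, 0]
```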
MD5 Compression Function
Secure Hash Algorithm
Developed by NIST, specified in the Secure Hash Standard (SHS, FIPS PUB 180), 1993
SHA is specified as the hash algorithm in NIST's Digital Signature Standard (DSS)
General Logic
Input message must be < $2^{64}$ bits
– not really a problem
Message is processed in 512-bit blocks sequentially
Message digest is 160 bits
SHA design is similar to MD5, but stronger (longer digest, more steps)
Basic Steps
Step 1: Padding
Step 2: Appending the length as a 64-bit unsigned integer
Step 3: Initialize the MD buffer: 5 32-bit words, A|B|C|D|E
– stored in big-endian format (most significant byte at the lowest address)
– A = 67452301, B = efcdab89, C = 98badcfe, D = 10325476, E = c3d2e1f0
Basic Steps...
Step 4: the 80-step processing of 512-bit blocks: 4 rounds, 20 steps each
Each step t (0 ≤ t ≤ 79):
– Input: $W_t$, a 32-bit word derived from the message block; $K_t$, a constant; ABCDE, the current MD
– Output: ABCDE, the new MD
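Putting steps 1 through 4 together gives a compact, slow, pure-Python SHA-1 for study. The message schedule expands the block's 16 words into $W_0 \ldots W_{79}$, and the round constants $K_t$ are the four values listed in the backup slides. This is a teaching sketch checked against `hashlib`, not a replacement for it.

```python
import hashlib
import struct

H0 = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]  # A..E

def rotl(x: int, n: int) -> int:
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1(message: bytes) -> bytes:
    bit_len = (8 * len(message)) % 2**64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", bit_len)          # big-endian length (unlike MD5)
    h = list(H0)
    for off in range(0, len(padded), 64):
        # W_0..W_15 come from the block; W_16..W_79 are expanded from them.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):                        # 4 rounds of 20 steps
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + (f & 0xFFFFFFFF) + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        # Add this block's result into the running buffer (mod 2^32).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

print(sha1(b"abc").hex())  # a9993e364706816aba3e25717850c26c9cd0d89d
```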
SHA-1 versus MD5
Brute-force attack is harder (160 vs 128 bits for MD5)
When introduced, not vulnerable to the known cryptanalytic attacks on MD4/MD5; a practical SHA-1 collision was nonetheless published in 2017
A little slower than MD5 (80 vs 64 steps)
– Both work well on a 32-bit architecture
Both designed to be simple and compact to implement
Revised Secure Hash Standard
NIST has issued a revision of FIPS 180 that adds 3 hash algorithms: SHA-256, SHA-384, SHA-512
– designed for compatibility with the increased security provided by the AES cipher
– structure and detail are similar to SHA-1, hence analysis should be similar
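The new algorithms are available in Python's `hashlib`, which reports their digest and block sizes. One common reading of "compatibility with AES" follows from the birthday bound: a 2n-bit digest gives roughly n-bit collision resistance, so 256-, 384-, and 512-bit digests line up with AES's 128-, 192-, and 256-bit keys.

```python
import hashlib

# Digest and block sizes, in bits, as reported by hashlib.
for name in ("sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    print(f"{name}: {h.digest_size * 8}-bit digest, {h.block_size * 8}-bit block")
# sha1: 160-bit digest, 512-bit block
# sha256: 256-bit digest, 512-bit block
# sha384: 384-bit digest, 1024-bit block
# sha512: 512-bit digest, 1024-bit block
```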
Outline
User authentication
– Password authentication, salt
– Challenge-response
– Biometrics
– Token-based authentication
Authentication in distributed systems (multiple service providers/domains)
– Single sign-on, Microsoft Passport
– Trusted intermediaries
Password authentication
Basic idea
– User has a secret password
– System checks the password to authenticate the user
Issues
– How is the password stored?
– How does the system check the password?
– How easy is it to guess a password?
It is difficult to keep the password file secret, so it is best if passwords are hard to guess even for someone who has the password file
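Basic password scheme
[Figure: the user's password (e.g. "kiwifruit") passes through a hash function and is compared with the entries in the password file (e.g. exrygbzyf, kgnosfix, ggjoklbsz, …)]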
Basic password scheme
Hash function h: strings → strings
– Given h(password), it is hard to find the password
– No known algorithm better than trial and error
User password is stored as h(password)
When the user enters a password
– System computes h(password)
– Compares it with the entry in the password file
No plaintext passwords are stored on disk
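A minimal sketch of the basic scheme, assuming unsalted SHA-256 as h (an illustration only; the following slides explain why a salt and a deliberately slow function are needed). The in-memory `password_file` dictionary is hypothetical and holds only h(password).

```python
import hashlib
import hmac

def h(password: str) -> str:
    """One-way function stand-in; unsalted SHA-256 is an assumption for illustration."""
    return hashlib.sha256(password.encode()).hexdigest()

# Password file maps user -> h(password); no plaintext passwords are stored.
password_file = {"alice": h("kiwifruit")}

def login(user: str, attempt: str) -> bool:
    """Compute h(attempt) and compare it with the stored entry."""
    stored = password_file.get(user)
    if stored is None:
        return False
    return hmac.compare_digest(h(attempt), stored)

print(login("alice", "kiwifruit"))   # True
print(login("alice", "banana"))      # False
```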
Unix password system
Hash function is 25xDES
– 25 successive encryptions with a DES variant
Password file is publicly readable
– Other information in password file …
Any user can try a "dictionary attack"
– User looks at the password file
– Computes hash(word) for every word in a dictionary
"Salt" makes a dictionary attack harder
R.H. Morris and K. Thompson, Password security: a case history, Communications of the ACM, November 1979
Salt
Password line: walt:fURfuu4.4hY0U:129:129:Belgers:/home/walt:/bin/csh
[Figure: 25x DES, keyed by the password and modified by the salt, encrypts a constant plaintext (a 64-bit block of 0s); the resulting ciphertext is compared with the stored entry]
When the password is set, the salt is chosen randomly
A 12-bit salt slows a dictionary attack by a factor of $2^{12}$
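The salting idea can be sketched with a 12-bit salt as in the Unix scheme, but with SHA-256 replacing 25x DES (a swapped-in function; real systems today use a slow password hash such as `hashlib.pbkdf2_hmac`). The salt is chosen when the password is set and stored next to the hash, so the same password yields different entries for different salts.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

SALT_BITS = 12   # as in the Unix scheme; SHA-256 replaces 25x DES here

def salted_hash(salt: int, password: str) -> str:
    return hashlib.sha256(salt.to_bytes(2, "big") + password.encode()).hexdigest()

def set_password(password: str) -> Tuple[int, str]:
    """Pick a random salt when the password is set; store (salt, hash)."""
    salt = secrets.randbelow(2 ** SALT_BITS)
    return salt, salted_hash(salt, password)

def check_password(entry: Tuple[int, str], attempt: str) -> bool:
    salt, stored = entry
    return hmac.compare_digest(salted_hash(salt, attempt), stored)

entry = set_password("kiwifruit")
print(check_password(entry, "kiwifruit"))                          # True
print(salted_hash(1, "kiwifruit") == salted_hash(2, "kiwifruit"))  # False
```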
Dictionary Attack – some numbers
Typical password dictionary: 1,000,000 entries of common passwords
– people's names, common pet names, and ordinary words
– Suppose you generate and analyze 10 guesses per second; this may be reasonable for a web site, while offline is much faster
– Dictionary attack in at most 100,000 seconds = 28 hours, or 14 hours on average
If passwords were random
– Assume a six-character password drawn from upper- and lowercase letters, digits, and 32 punctuation characters (94 symbols)
– $94^6$ = 689,869,781,056 password combinations
– Exhaustive search at 10 guesses per second requires 1,093 years on average
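The slide's figures follow from simple arithmetic at 10 guesses per second, with "on average" meaning half the search space and a 365-day year.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * 3600
RATE = 10                                   # guesses per second (the slide's assumption)

dictionary = 1_000_000
worst_hours = dictionary / RATE / SECONDS_PER_HOUR
print(round(worst_hours, 1), round(worst_hours / 2, 1))   # 27.8 13.9

alphabet = 26 + 26 + 10 + 32                # upper, lower, digits, punctuation = 94
combinations = alphabet ** 6
avg_years = combinations / 2 / RATE / SECONDS_PER_YEAR
print(combinations, int(avg_years))         # 689869781056 1093
```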
Challenge-response Authentication
Goal: Bob wants Alice to “prove” her identity to him
Protocol ap1.0: Alice says “I am Alice”
Failure scenario: in a network, Bob cannot “see” Alice, so Trudy simply declares herself to be Alice
Authentication: another try
Protocol ap2.0: Alice says “I am Alice” in an IP packet containing her source IP address
Failure scenario: Trudy can create a packet “spoofing” Alice’s address
Authentication: another try
Protocol ap3.0: Alice says “I am Alice” and sends her secret password to “prove” it
[Figure: Alice → Bob: “I’m Alice”, Alice’s IP addr, Alice’s password; Bob → Alice: OK]
Failure scenario: playback attack. Trudy records Alice’s packet and later plays it back to Bob
Authentication: yet another try
Protocol ap3.1: Alice says “I am Alice” and sends her encrypted secret password to “prove” it
[Figure: Alice → Bob: “I’m Alice”, Alice’s IP addr, encrypted password; Bob → Alice: OK]
Failure scenario: record and playback still works! Trudy replays the same encrypted password
Authentication: yet another try
Goal: avoid playback attack
Nonce: number (R) used only once in a lifetime
ap4.0: to prove Alice is “live”, Bob sends Alice a nonce R. Alice must return R, encrypted with the shared secret key $K_{A-B}$
[Figure: Alice → Bob: “I am Alice”; Bob → Alice: R; Alice → Bob: $K_{A-B}(R)$]
Alice is live, and only Alice knows the key to encrypt the nonce, so it must be Alice!
Failures, drawbacks?
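A sketch of ap4.0 showing why a fresh nonce defeats playback. Python's standard library has no block cipher, so HMAC-SHA256 stands in for $K_{A-B}(R)$ (a swapped-in keyed function: Bob only needs to recompute and compare, not decrypt). The `Bob` class and `keyed` helper are hypothetical names.

```python
import hashlib
import hmac
import secrets

def keyed(k_ab: bytes, r: bytes) -> bytes:
    """Stand-in for K_{A-B}(R): HMAC-SHA256 instead of encryption (an assumption)."""
    return hmac.new(k_ab, r, hashlib.sha256).digest()

class Bob:
    def __init__(self, k_ab: bytes):
        self.k_ab = k_ab
        self.issued = set()      # every nonce ever issued, so none is reused
        self.pending = None      # nonce awaiting Alice's reply

    def challenge(self) -> bytes:
        r = secrets.token_bytes(16)
        while r in self.issued:
            r = secrets.token_bytes(16)
        self.issued.add(r)
        self.pending = r
        return r

    def verify(self, reply: bytes) -> bool:
        if self.pending is None:
            return False
        expected = keyed(self.k_ab, self.pending)
        self.pending = None      # each nonce can be answered only once
        return hmac.compare_digest(expected, reply)

k_ab = secrets.token_bytes(16)
bob = Bob(k_ab)

r = bob.challenge()              # Bob -> Alice: R
recorded = keyed(k_ab, r)        # Alice -> Bob: K_{A-B}(R); Trudy records it
print(bob.verify(recorded))      # True

bob.challenge()                  # a later session gets a fresh nonce
print(bob.verify(recorded))      # False: the played-back reply no longer matches
```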
Authentication: ap5.0
ap4.0 doesn’t protect against an attacker who reads the server’s key database
Can we authenticate using public key techniques?
ap5.0: use a nonce and public key cryptography
[Figure: Alice → Bob: “I am Alice”; Bob → Alice: R; Alice → Bob: $K_A^-(R)$]
Bob computes $K_A^+(K_A^-(R)) = R$ and knows only Alice could have the private key that encrypted R such that $K_A^+(K_A^-(R)) = R$
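ap5.0 can be sketched with textbook RSA. The key pair below is a hypothetical toy built from two Mersenne primes with public exponent 65537, so it is trivially breakable and only illustrates $K_A^+(K_A^-(R)) = R$. Python 3.8+ is assumed for `pow(e, -1, m)`.

```python
import secrets

# Toy RSA key pair for Alice (hypothetical; Mersenne-prime factors make it trivially breakable).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                            # public exponent: K_A^+ = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent: K_A^-, known only to Alice

def alice_respond(r: int) -> int:
    """Alice applies her private key to the nonce: K_A^-(R)."""
    return pow(r, D, N)

def bob_verify(r: int, response: int) -> bool:
    """Bob applies Alice's public key and checks K_A^+(K_A^-(R)) == R."""
    return pow(response, E, N) == r

r = secrets.randbelow(N)             # Bob's nonce R
print(bob_verify(r, alice_respond(r)))             # True
print(bob_verify(r, alice_respond((r + 1) % N)))   # False: a response to a different nonce
```

Bob's database now holds only Alice's public key, so reading it does not let an attacker answer a challenge.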
Biometrics
Use a person’s physical characteristics
– fingerprint, voice, face, keyboard timing, …
Advantages
– Cannot be disclosed, lost, or forgotten
Disadvantages
– Cost, installation, maintenance
– Reliability of comparison algorithms: false positives (allow access to an unauthorized person) and false negatives (disallow access to an authorized person)
– Privacy?
– If forged, how do you revoke?
Biometrics
Common uses
– Specialized situations, physical security
– Combine: multiple biometrics, biometric and PIN, biometric and token
Token-based authentication
Smart card with embedded CPU and memory
Various forms
– PIN-protected memory card: enter the PIN to get the password
– Cryptographic challenge/response cards: a cryptographic key is kept in memory; the computer creates a random challenge; enter the PIN to have the card encrypt/decrypt the challenge
Smart Card Example
[Figure: card and server each compute a function of the time and the shared initial data]
Some complications
– Initial data shared with the server: need to set this up securely; shared database for many sites
– Clock skew
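One standardized instance of "a function of time and initial data" is TOTP (RFC 6238, built on HOTP from RFC 4226). It is swapped in here as an example; the slide does not say which function this card uses. The server tolerates clock skew by also accepting codes from neighbouring time steps; the window of ±1 step is an assumption.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: the counter is the number of `step`-second intervals."""
    return hotp(secret, unix_time // step, digits)

def server_accepts(secret: bytes, code: str, unix_time: int, step: int = 30, window: int = 1) -> bool:
    """Tolerate clock skew by also trying neighbouring time steps."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + k * step, step, len(code)), code)
        for k in range(-window, window + 1)
    )

secret = b"12345678901234567890"                       # RFC test secret ("initial data")
print(hotp(secret, 0))                                  # 755224 (RFC 4226 test vector)
print(totp(secret, 59, digits=8))                       # 94287082 (RFC 6238 test vector)
print(server_accepts(secret, totp(secret, 59), 59 + 30))  # True: card clock one step behind
```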
Single sign-on systems
e.g. Securant, Netegrity, Oblix
[Figure: a user on the LAN supplies a user name, password, or other authentication data; components shown are an authentication server with rules, application servers, and a database server]
Advantages
– User signs on once
– No need for authentication at multiple sites, applications
– Can set central authorization policy for the enterprise
Microsoft Passport
Launched 1999
– Claimed > 200 million accounts in 2002
– Over 3.5 billion authentications each month
Log in to many websites using one account
– Used by MS services Hotmail, MSN Messenger, or MSN subscriptions; also Radio Shack, etc.
– Hotmail or MSN users automatically have Microsoft Passport accounts set up
Passport may continue to evolve; bugs have been uncovered
Four parts of a Passport account
Passport Unique Identifier (PUID)
– Assigned to the user when he or she sets up the account
User profile, required to set up the account
– Phone number or Hotmail or MSN.com address
– Also name, ZIP code, state, or country, …
Credential information
– E-mail address or phone number
– Minimum six-character password or PIN
– Four-digit security key, used for a second level of authentication on sites requiring stronger sign-in credentials
Wallet
– Passport-based application at the passport.com domain
– E-commerce sites with the Express Purchase function use wallet information rather than prompting the user to type in data
Passport log-in
Trusted Intermediaries
Symmetric key problem: how do two entities establish a shared secret key over the network?
Solution: a trusted key distribution center (KDC) acting as intermediary between entities
Public key problem: when Alice obtains Bob’s public key (from a web site, e-mail, diskette), how does she know it is Bob’s public key, not Trudy’s?
Solution: a trusted certification authority (CA)
Key Distribution Center (KDC)
Alice and Bob need a shared symmetric key
KDC: server shares a different secret key with each registered user (many users)
Alice and Bob know their own symmetric keys, $K_{A-KDC}$ and $K_{B-KDC}$, for communicating with the KDC
[Figure: the KDC holds one key per registered user: $K_{A-KDC}$, $K_{B-KDC}$, $K_{P-KDC}$, $K_{X-KDC}$, $K_{Y-KDC}$, $K_{Z-KDC}$]
Key Distribution Center (KDC)
Q: How does the KDC allow Bob and Alice to determine a shared symmetric secret key to communicate with each other?
– Alice sends the KDC $K_{A-KDC}(A, B)$
– The KDC generates R1 and replies $K_{A-KDC}(R1, K_{B-KDC}(A, R1))$; Alice knows R1
– Alice forwards $K_{B-KDC}(A, R1)$ to Bob; Bob knows to use R1 to communicate with Alice
– Alice and Bob communicate using R1 as the session key for shared symmetric encryption
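The message flow can be simulated end to end. Since Python's standard library has no block cipher, `encrypt`/`decrypt` below are a hypothetical toy (JSON plus an XOR with a SHA-256 keystream, no integrity check) standing in for $K(\ldots)$; only the protocol structure is the point. The `kdc_keys` dictionary plays the KDC's database, and each party uses only its own key.

```python
import hashlib
import json
import secrets

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256(key | nonce | counter) blocks."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, obj) -> bytes:
    """Toy K(...): JSON-encode, then XOR with the keystream. No integrity check."""
    data = json.dumps(obj).encode()
    nonce = secrets.token_bytes(16)
    return nonce + bytes(x ^ y for x, y in zip(data, _keystream(key, nonce, len(data))))

def decrypt(key: bytes, blob: bytes):
    nonce, body = blob[:16], blob[16:]
    return json.loads(bytes(x ^ y for x, y in zip(body, _keystream(key, nonce, len(body)))))

# The KDC's database: one long-term key per registered user.
kdc_keys = {"A": secrets.token_bytes(16), "B": secrets.token_bytes(16)}

def kdc(sender: str, request: bytes) -> bytes:
    a, b = decrypt(kdc_keys[sender], request)         # K_{A-KDC}(A, B)
    r1 = secrets.token_hex(16)                        # fresh session key R1
    ticket = encrypt(kdc_keys[b], [a, r1])            # K_{B-KDC}(A, R1)
    return encrypt(kdc_keys[a], [r1, ticket.hex()])   # K_{A-KDC}(R1, K_{B-KDC}(A, R1))

k_a, k_b = kdc_keys["A"], kdc_keys["B"]               # each party knows only its own key

reply = kdc("A", encrypt(k_a, ["A", "B"]))            # Alice -> KDC, KDC -> Alice
r1_alice, ticket_hex = decrypt(k_a, reply)            # Alice knows R1
peer, r1_bob = decrypt(k_b, bytes.fromhex(ticket_hex))  # Alice -> Bob: the ticket
print(peer, r1_alice == r1_bob)                       # A True
```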
Certification Authorities
Certification authority (CA): binds a public key to a particular entity, E
E (person, router) registers its public key with the CA
– E provides “proof of identity” to the CA
– The CA creates a certificate binding E to its public key
– The certificate containing E’s public key is digitally signed by the CA: the CA says “this is E’s public key”
[Figure: Bob’s public key $K_B^+$ and Bob’s identifying information are digitally signed (encrypted) with the CA’s private key $K_{CA}^-$, producing a certificate for Bob’s public key, signed by the CA]
Certification Authorities
When Alice wants Bob’s public key:
– she gets Bob’s certificate (from Bob or elsewhere)
– she applies the CA’s public key to Bob’s certificate and gets Bob’s public key
[Figure: the CA’s public key $K_{CA}^+$ is applied to the digital signature (decrypt) on the certificate, yielding Bob’s public key $K_B^+$]
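Issuing and checking a certificate can be sketched with a hash-then-sign toy. The CA key pair is hypothetical textbook RSA from Mersenne primes (insecure), the integer 123456789 merely stands in for $K_B^+$, and the certificate is a plain dictionary rather than a real X.509 structure. Python 3.8+ is assumed for `pow(e, -1, m)`.

```python
import hashlib

# Toy CA key pair (hypothetical Mersenne-prime RSA; insecure, for illustration only).
P, Q = 2**107 - 1, 2**127 - 1
N_CA = P * Q
E_CA = 65537                                 # K_CA^+ is (E_CA, N_CA)
D_CA = pow(E_CA, -1, (P - 1) * (Q - 1))      # K_CA^-, kept by the CA

def digest(name: str, public_key: int) -> int:
    data = f"{name}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N_CA

def issue_certificate(name: str, public_key: int) -> dict:
    """The CA binds name to public_key by signing a hash of both with K_CA^-."""
    return {"name": name, "key": public_key,
            "sig": pow(digest(name, public_key), D_CA, N_CA)}

def public_key_from(cert: dict):
    """Alice applies K_CA^+ to the signature; returns the key only if it checks out."""
    if pow(cert["sig"], E_CA, N_CA) == digest(cert["name"], cert["key"]):
        return cert["key"]
    return None

bob_cert = issue_certificate("Bob", 123456789)   # 123456789 stands in for K_B^+
print(public_key_from(bob_cert))                 # 123456789
forged = dict(bob_cert, key=987654321)           # Trudy swaps in her own key
print(public_key_from(forged))                   # None
```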
Single KDC/CA
Problems
– Single administration trusted by all principals
– Single point of failure
– Scalability
Solutions: break into multiple domains
– Each domain has a trusted administration
Multiple KDC/CA Domains
Secret keys: KDCs share pairwise keys
– topology of KDCs: tree with shortcuts
Public keys: cross-certification of CAs
– example: Alice with $CA_A$, Boris with $CA_B$
– Alice gets $CA_B$’s certificate (public key $p_1$), signed by $CA_A$
– Alice gets Boris’ certificate (his public key $p_2$), signed by $CA_B$ (checked with $p_1$)
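Cross-certification amounts to verifying a short chain: Alice trusts only $CA_A$'s key, uses it to check $CA_B$'s certificate (which yields $p_1$), then uses $p_1$ to check Boris' certificate (which yields $p_2$). The key pairs are hypothetical toy RSA from Mersenne primes, 424242 merely stands in for $p_2$, and certificates are plain dictionaries.

```python
import hashlib

E = 65537

def make_keypair(p: int, q: int):
    """Toy RSA key pair from two known primes (hypothetical, insecure)."""
    return (E, p * q), pow(E, -1, (p - 1) * (q - 1))

def h(name: str, key) -> int:
    return int.from_bytes(hashlib.sha256(f"{name}|{key}".encode()).digest(), "big")

def sign_cert(issuer_priv: int, issuer_pub, name: str, key) -> dict:
    """Issuer signs a hash of (name, key) with its private exponent."""
    n = issuer_pub[1]
    return {"name": name, "key": key, "sig": pow(h(name, key) % n, issuer_priv, n)}

def verify_cert(cert: dict, issuer_pub) -> bool:
    e, n = issuer_pub
    return pow(cert["sig"], e, n) == h(cert["name"], cert["key"]) % n

ca_a_pub, ca_a_priv = make_keypair(2**61 - 1, 2**89 - 1)     # Alice trusts CA_A
ca_b_pub, ca_b_priv = make_keypair(2**107 - 1, 2**127 - 1)   # Boris registered with CA_B
p2 = 424242                                                   # stands in for Boris' public key

cert_ca_b = sign_cert(ca_a_priv, ca_a_pub, "CA_B", ca_b_pub)  # CA_A cross-certifies CA_B (p1)
cert_boris = sign_cert(ca_b_priv, ca_b_pub, "Boris", p2)      # CA_B certifies Boris (p2)

# Alice walks the chain starting from the only key she already trusts.
ok1 = verify_cert(cert_ca_b, ca_a_pub)
p1 = tuple(cert_ca_b["key"])
ok2 = ok1 and verify_cert(cert_boris, p1)
print(ok1, ok2, cert_boris["key"] if ok2 else None)           # True True 424242
```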
Backup Slides
MD5 Compression Function
Each round has 16 steps of the form: a = b + ((a + g(b,c,d) + X[k] + T[i]) <<< s)
– a, b, c, d refer to the 4 words of the buffer, but used in varying permutations
– note this updates only 1 word of the buffer
– after 16 steps each word is updated 4 times
– g(b,c,d) is a different nonlinear function in each round (F, G, H, I)
Functions and Random Numbers
$F(x,y,z) = (x \wedge y) \vee (\neg x \wedge z)$ – selection function
$G(x,y,z) = (x \wedge z) \vee (y \wedge \neg z)$
$H(x,y,z) = x \oplus y \oplus z$
$I(x,y,z) = y \oplus (x \vee \neg z)$
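Combining the step form a = b + ((a + g(b,c,d) + X[k] + T[i]) <<< s), the functions F, G, H, I, the sine-derived T table, and the MD5 padding gives a complete, slow, pure-Python MD5 for study. The per-round shift amounts s and the message-word orders follow RFC 1321; the result is checked against `hashlib`, which should be used in practice.

```python
import hashlib
import math
import struct

# Per-step constants: T[i] = int(2^32 * |sin(i + 1)|), i = 0..63.
T = [int(2**32 * abs(math.sin(i + 1))) & 0xFFFFFFFF for i in range(64)]
# Left-rotation amounts s: four values per round, each repeated four times.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4

def rotl(x: int, s: int) -> int:
    x &= 0xFFFFFFFF                   # reduce to 32 bits first (handles ~ on Python ints)
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def md5(message: bytes) -> bytes:
    bit_len = (8 * len(message)) % 2**64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", bit_len)
    md = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]   # A, B, C, D
    for off in range(0, len(padded), 64):
        x = struct.unpack("<16I", padded[off:off + 64])     # X[0..15]
        a, b, c, d = md
        for i in range(64):
            if i < 16:                                      # round 1: F
                g, k = (b & c) | (~b & d), i
            elif i < 32:                                    # round 2: G
                g, k = (b & d) | (c & ~d), (1 + 5 * i) % 16
            elif i < 48:                                    # round 3: H
                g, k = b ^ c ^ d, (5 + 3 * i) % 16
            else:                                           # round 4: I
                g, k = c ^ (b | ~d), (7 * i) % 16
            # a = b + ((a + g(b,c,d) + X[k] + T[i]) <<< s), then rotate the word roles.
            new_b = (b + rotl(a + g + x[k] + T[i], S[i])) & 0xFFFFFFFF
            a, b, c, d = d, new_b, b, c
        md = [(u + v) & 0xFFFFFFFF for u, v in zip(md, (a, b, c, d))]
    return struct.pack("<4I", *md)

print(md5(b"abc").hex())   # 900150983cd24fb0d6963f7d28e17f72
```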
Basic Steps...
Only 4 distinct additive constants, one per round
– 0 ≤ t ≤ 19: $K_t$ = 5A827999
– 20 ≤ t ≤ 39: $K_t$ = 6ED9EBA1
– 40 ≤ t ≤ 59: $K_t$ = 8F1BBCDC
– 60 ≤ t ≤ 79: $K_t$ = CA62C1D6
Advantages of salt
Without salt
– Same hash function on all machines
– Compute the hash of all common strings once
– Compare the hash file with all known password files
With salt
– One password is hashed $2^{12}$ different ways
– Precompute a hash file? Need a much larger file to cover all common strings
– Dictionary attack on a known password file: for each salt found in the file, try all common strings