User Authentication Using Keystroke Dynamics Jeff Hieb & Kunal Pharas ECE 614 Spring 2005 University of Louisville
Three types of authentication
Something you know: a password
Something you have: an ID card or badge
Something you are: biometrics
Biometrics measure physical or behavioral characteristics of an individual.
–Physical (do not change over time): fingerprint, iris pattern, hand geometry
–Behavioral (may change over time): signature, speech pattern, keystroke pattern
Keystroke biometrics
Keystroke dynamics is based on the assumption that each person has a unique keystroke rhythm.
Keystroke features are:
–Latency between keystrokes.
–Duration of key presses.
4 possible authentication outcomes:
i) Genuine individual is accepted.
ii) Genuine individual is rejected.
iii) Imposter is accepted.
iv) Imposter is rejected.
Biometric classification accuracy measures:
i) FRR – false rejection rate: the fraction of genuine attempts that are rejected (outcome ii).
ii) FAR – false acceptance rate: the fraction of imposter attempts that are accepted (outcome iii).
iii) EER – equal error rate: the error rate at the operating point where FRR = FAR.
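As a minimal sketch of how the two error rates follow from the four outcomes, the Python snippet below counts rejected genuine trials and accepted imposter trials. The function name `error_rates` and the trial data are hypothetical and made up for illustration; they are not the project's results.

```python
def error_rates(trials):
    """Compute (FRR, FAR) from (is_genuine, accepted) boolean pairs."""
    genuine = [accepted for is_genuine, accepted in trials if is_genuine]
    imposter = [accepted for is_genuine, accepted in trials if not is_genuine]
    # FRR: share of genuine attempts that were rejected (outcome ii).
    frr = sum(1 for accepted in genuine if not accepted) / len(genuine)
    # FAR: share of imposter attempts that were accepted (outcome iii).
    far = sum(1 for accepted in imposter if accepted) / len(imposter)
    return frr, far


# Made-up outcomes: 4 genuine trials (1 rejected), 4 imposter trials (1 accepted).
trials = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]
frr, far = error_rates(trials)
print(f"FRR = {frr:.2f}, FAR = {far:.2f}")
```

Sweeping the acceptance threshold and recomputing both rates at each setting would locate the point where they cross, which is the EER.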
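Methods for classifying keystroke rhythms
Statistical / probabilistic approaches
Data mining techniques
Neural networks
a) EBP (error back propagation) networks
b) CPNN (counter-propagation neural network, based on SOM, the self-organizing map)
c) ART2 (adaptive resonance theory) networks (unsupervised learning)
d) LVQ (learning vector quantization) networks
e) RBFN (radial basis function network)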
Project Description
Authenticate users based on the keystroke times captured while they type their name.
Use error back propagation (EBP) to train a neural network whose output is a user identification that can be compared to a known user identification.
The result of the system is either authentication failed or authentication successful.
Methodology flowchart
Implementation
Capturing keystrokes: GUI in C#
–Requirements
Near-microsecond timing accuracy (HiPerfTimer)
Enrollment times and labels
Authentication using captured times
Remote call to Matlab to process times
Processing data: Matlab
–Subroutines needed
Error back propagation
Evaluation of a vector of authentication times using the trained network
Normalization of training times
Normalization of authentication times
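The two normalization subroutines have to agree: authentication times must be scaled with parameters derived from the training times. The slides do not say which normalization was used, so the Python sketch below assumes per-column min-max scaling; the names `fit_normalization` and `normalize` are hypothetical.

```python
def fit_normalization(rows):
    """Derive per-column min and max from the training rows (assumed min-max scheme)."""
    columns = list(zip(*rows))
    return {"min": [min(c) for c in columns], "max": [max(c) for c in columns]}


def normalize(row, params):
    """Scale one row of times with previously stored parameters."""
    scaled = []
    for value, low, high in zip(row, params["min"], params["max"]):
        span = high - low
        # Columns that never vary (e.g. zero padding) are mapped to 0.
        scaled.append((value - low) / span if span > 0 else 0.0)
    return scaled


# Made-up training times (seconds); the last column is unused padding.
training = [[0.10, 0.30, 0.0], [0.20, 0.50, 0.0]]
params = fit_normalization(training)

# An authentication attempt is scaled with the stored training parameters.
attempt = normalize([0.15, 0.40, 0.0], params)
print(attempt)
```

Values outside the training range fall outside [0, 1] under this scheme; whether to clip them is a design choice the slides do not address.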
Capturing Training Times
Time the interval between each key_up event and the following key_down event (keystroke latency).
A maximum of 50 time intervals can be captured and stored; unused elements are set to 0.
The user must type the name correctly or the trial is thrown out.
Training times are stored in a text file; additional training times are appended to this file.
An enrollment consists of 7 successful captures (name typed correctly).
After enrollment the neural network is retrained.
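The capture rules above can be sketched in Python as follows. This is an illustration rather than the project's C# code: the event format, the function name `latency_vector`, and the assumption that key presses do not overlap are all hypothetical.

```python
MAX_INTERVALS = 50  # fixed vector length stated in the slides


def latency_vector(events, typed, expected_name):
    """Return a zero-padded latency vector, or None if the name was mistyped.

    events: list of (kind, timestamp_seconds) with kind 'down' or 'up'.
    Assumes each key is released before the next one is pressed.
    """
    if typed != expected_name:
        return None  # mistyped trials are thrown out
    latencies = []
    last_up = None
    for kind, timestamp in events:
        if kind == "up":
            last_up = timestamp
        elif kind == "down" and last_up is not None:
            latencies.append(timestamp - last_up)  # key_up to next key_down
            last_up = None
    latencies = latencies[:MAX_INTERVALS]
    return latencies + [0.0] * (MAX_INTERVALS - len(latencies))


# Made-up timestamps for a three-key name.
events = [("down", 0.00), ("up", 0.08), ("down", 0.20),
          ("up", 0.27), ("down", 0.45), ("up", 0.50)]
vector = latency_vector(events, typed="abc", expected_name="abc")
rejected = latency_vector(events, typed="abx", expected_name="abc")
```

A fixed length with zero padding keeps every training row the same size, which is what lets all users share one network input layer.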
Labeling training times
Each user is represented by a binary string
–Ex. User Jeff Hieb: User Kunal Pharas: 0 1 0 User Suman: 0 0 1
Training labels are stored in a text file: each line is the user label for the corresponding line in the training times file.
Additional training labels are appended to this file.
When a new user enrolls, a 0 is appended to every existing user label in the file.
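A minimal Python sketch of the label update on enrollment, assuming labels are held in memory as lists of 0/1 values rather than in the text file; the function name `enroll_new_user` is hypothetical.

```python
def enroll_new_user(labels, captures=7):
    """Extend existing labels with a 0 and add rows for the new user.

    labels: list of label rows (lists of 0/1), one per training row.
    captures: number of successful captures in one enrollment.
    """
    n_users = len(labels[0]) if labels else 0
    # Every existing label gets one more position, set to 0.
    extended = [row + [0] for row in labels]
    # The new user's label has a 1 only in the new last position.
    new_label = [0] * n_users + [1]
    extended.extend(list(new_label) for _ in range(captures))
    return extended


# Two existing users with one training row each; a third user enrolls.
labels = [[1, 0], [0, 1]]
updated = enroll_new_user(labels, captures=2)
print(updated)
```

Because the label width equals the number of users, the number of output neurons grows by one with each enrollment, which is why the network is retrained afterwards.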
Training Data Files Sample of training times file: Sample of training labels file:
Training the Neural Network
The GUI calls the Matlab function EBP(filename), where filename denotes the training times and training labels.
EBP normalizes the data and stores the normalization parameters in a file.
The number of output neurons is determined by the training labels: 5 users give 5 output neurons.
The output layer uses a unipolar activation function.
Trained weights are stored in a file.
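For illustration, the training step might look like the pure-Python sketch below. It is not the project's Matlab EBP routine. It assumes one hidden layer, a unipolar sigmoid in both layers, per-sample weight updates, `c` as the learning constant, and `e_max` as a bound on the summed squared error of one training cycle. The function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(net):
    """Unipolar activation with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_hidden, w_out):
    """Return the biased input, biased hidden outputs, and network outputs."""
    xb = list(x) + [1.0]  # append bias input
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden] + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return xb, hb, out


def train_ebp(samples, labels, n_hidden=4, c=0.2, e_max=0.0005,
              max_cycles=500, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(labels[0])  # outputs follow the labels
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_out)]
    history = []
    for _ in range(max_cycles):
        cycle_error = 0.0
        for x, d in zip(samples, labels):
            xb, hb, out = forward(x, w_hidden, w_out)
            cycle_error += 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, out))
            # Error signals, computed before any weights change.
            delta_o = [(dk - ok) * ok * (1 - ok) for dk, ok in zip(d, out)]
            delta_h = [hb[j] * (1 - hb[j])
                       * sum(delta_o[k] * w_out[k][j] for k in range(n_out))
                       for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_out[k][j] += c * delta_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] += c * delta_h[j] * xb[i]
        history.append(cycle_error)
        if cycle_error < e_max:
            break
    return w_hidden, w_out, history


# Toy data: two users, two normalized features each (made up).
samples = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
w_hidden, w_out, history = train_ebp(samples, labels)
```

The returned weights correspond to what the slides describe as stored in a file for later use by the evaluation routine.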
Authentication
Capture keystrokes using the same procedure as before.
If the user mistypes the name, authentication fails, but the user is informed why and the trial is discarded.
The GUI calls the Matlab function evaluate(filename), where filename is a file containing the captured times.
Evaluate normalizes the data using the parameters stored during training.
Evaluate then uses the stored weights to produce the network outputs, which are returned to the GUI.
The GUI maps the network outputs to a string of 0s and 1s: if f(net) is greater than alpha (e.g. 0.95) the value is 1, otherwise the value is 0.
This string is then compared to the desired user string. If there is a match, authentication is successful; otherwise authentication fails.
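The thresholding and comparison step can be sketched in Python as below. The function name `authenticate` and the output values are hypothetical; only the rule (greater than alpha maps to 1, then exact match against the claimed user's string) comes from the slide.

```python
def authenticate(outputs, claimed_label, alpha=0.95):
    """Threshold network outputs at alpha and compare to the claimed user's label."""
    bits = [1 if value > alpha else 0 for value in outputs]
    return bits == list(claimed_label)


# Made-up network outputs for a user whose label is 0 1 0.
strong = authenticate([0.02, 0.97, 0.10], [0, 1, 0])            # passes at alpha = 0.95
weak = authenticate([0.02, 0.80, 0.10], [0, 1, 0])              # fails at alpha = 0.95
relaxed = authenticate([0.02, 0.80, 0.10], [0, 1, 0], alpha=0.75)  # passes at a lower alpha
print(strong, weak, relaxed)
```

Lowering alpha makes genuine users easier to accept but also makes imposters easier to accept, so alpha trades FRR against FAR.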
Keystroke capture and authentication GUI
Testing and Results
Enrolled 7 users (49 training pairs).
Each user had at least 3 authentication attempts (total of 45 authentication trials).
42 imposter trials.
The majority of imposter authentication attempts were made by us.
Many of the authentication trials are for one user.
Plot of Normalized Training Times
Effect of hidden layers on accuracy
Alpha = 0.95, C = 0.2, Emax = 0.0005
Effect of training error on accuracy
Alpha = 0.95, C = 0.2, Hidden neurons = 24
Overall Classifier Accuracy
Max error = 0.0005, C = 0.2, Hidden neurons = 24
Best performance: Alpha = 0.75, FRR = 7%, FAR = 30%
Conclusions
For users with short names (fewer than 8 characters) or with long latencies (not proficient typists), circumvention was high.
Creating an interface that is acceptable and easy to use for a wide variety of users is not trivial.
Not allowing for typographical errors is irritating to users and may affect acceptance.
The approach does not require imposter training samples.
Future Research Directions
Ways of handling typographical errors.
Ways to scale keystroke biometrics to large numbers of users.
Explore other classification methods, particularly unsupervised learning.
Explore extraction of more sophisticated keystroke features.
Questions ?