Keystroke Biometrics Study Software Engineering Project Team + DPS Student
2 Keystroke Biometric: As with other biometrics, keystroke biometrics are becoming important for security applications. Advantage: inexpensive and easy to implement; the only hardware needed is a keyboard. Disadvantage: a behavioral rather than physiological biometric, so easy to disguise. One of the least studied biometrics, and thus a good topic for dissertation studies.
3 Focus of Study: Previous studies mostly concerned short character-string input (password hardening, short name strings). We focus on large text input: 200 or more characters per sample.
4 Focus of Study (cont): Applications of interest. Identification: a 1-of-n classification problem, e.g., identifying the sender of an inappropriate message in a business environment with a limited number of employees. Verification: a binary (yes/no) classification problem, e.g., confirming the identity of a student taking an online exam.
5 Software Components: Raw keystroke data capture over the Internet (Java applet); feature extraction (SAS software); classification (SAS software), with training and testing phases.
6 Keystroke Data Capture (Java Applet): Raw data recorded for each entry: the key’s character; the key’s code text equivalent; the key’s location on the keyboard (1 = standard, 2 = left, 3 = right); time the key was pressed (msec); time the key was released (msec); number of left, right, and double mouse clicks.
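A minimal sketch of one raw keystroke record, written in Python for illustration only (the study captures the data with a Java applet). The class name `KeyEvent`, its field names, and the timing values are hypothetical; mouse click counts are left out here on the assumption that they are totals per text entry rather than per key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """One captured keystroke (field names are illustrative assumptions)."""
    char: str        # the key's character
    code_text: str   # text equivalent of the key code
    location: int    # 1 = standard, 2 = left, 3 = right
    press_ms: int    # time the key was pressed (msec)
    release_ms: int  # time the key was released (msec)

    @property
    def duration_ms(self) -> int:
        """Key press duration: release time minus press time."""
        return self.release_ms - self.press_ms


# Two made-up events for the input "Hi".
sample = [
    KeyEvent("H", "H", 1, 1000, 1085),
    KeyEvent("i", "I", 1, 1150, 1230),
]
print([event.duration_ms for event in sample])
```

Keeping the press and release times separate, rather than storing only the duration, leaves both key press durations and key-to-key transitions recoverable later.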
7 Keystroke Data Capture (Java Applet)
8 Aligned Raw Data File (Hello World!)
9 SAS Statistical Software: Feature Extraction & Classification. A powerful tool with its own programming language and development environment. Data management: built-in relational database and many data manipulation functions. Statistical analysis: a library of procedures for a wide variety of statistical analyses.
10 Feature Extraction: 10 means and 10 stds of key press durations (the 8 most frequent alphabet letters e, a, r, i, o, t, n, s, plus the space and shift keys). 10 means and 10 stds of key transitions (the 8 most common digrams in, th, ti, on, an, he, al, er, plus space-to-any-letter and any-letter-to-space). 15 other measurements: total number of keypresses for each of space, backspace, delete, insert, home, end, enter, ctrl, the 4 arrow keys combined, shift (left), and shift (right); total entry time; and the numbers of left, right, and double mouse clicks.
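The duration and transition features above could be computed along these lines. This is a Python sketch, not the SAS code used in the study, and it rests on several assumptions the slides do not state: a transition is taken as the next key's press time minus the previous key's release time, the std is the population std, and a key or digram that never occurs yields (0.0, 0.0). The function names and the event tuple layout are hypothetical. Only the letter and digram features are shown; the space and shift features would follow the same pattern.

```python
from statistics import mean, pstdev

LETTERS = ["e", "a", "r", "i", "o", "t", "n", "s"]
DIGRAMS = ["in", "th", "ti", "on", "an", "he", "al", "er"]


def mean_std(values):
    """Return (mean, std); (0.0, 0.0) if there are no observations (assumption)."""
    if not values:
        return 0.0, 0.0
    return mean(values), pstdev(values)


def duration_features(events):
    """events: list of (char, press_ms, release_ms) tuples in typing order."""
    features = {}
    for letter in LETTERS:
        durations = [rel - prs for ch, prs, rel in events if ch.lower() == letter]
        features["dur_" + letter] = mean_std(durations)
    return features


def digram_features(events):
    """Transition time = press of second key minus release of first (assumption)."""
    features = {}
    for digram in DIGRAMS:
        gaps = [
            second[1] - first[2]
            for first, second in zip(events, events[1:])
            if (first[0] + second[0]).lower() == digram
        ]
        features["trans_" + digram] = mean_std(gaps)
    return features


# Made-up timings for the keys t, h, e, e.
events = [("t", 0, 80), ("h", 120, 190), ("e", 230, 330), ("e", 400, 460)]
durations = duration_features(events)
transitions = digram_features(events)
```

With this definition a transition can be negative when the second key is pressed before the first is released, which is common in fast typing.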
11 Feature Measurement Sample
12 Feature Extraction Preprocessing: Outlier removal: remove samples more than 2 std from the mean; this prevents skewing of feature measurements caused by pauses in typing. Standardization: x’ = (x - x_min) / (x_max - x_min), which scales each feature to the range 0-1 to give roughly equal weight to each feature.
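The two preprocessing steps can be sketched as follows, in Python rather than SAS. The sketch assumes that outlier removal is applied to the individual timing values behind one feature, that the population std is used, and that x_min and x_max are taken over all values of that feature; the slides do not specify these details, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def remove_outliers(values, k=2.0):
    """Drop values more than k std from the mean (k = 2 on the slide)."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


def min_max_scale(values):
    """Scale values to the range 0-1: x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up key press durations (msec); 2000 represents a long pause.
raw = [100, 105, 95, 110, 90, 100, 102, 98, 2000]
cleaned = remove_outliers(raw)
scaled = min_max_scale([90, 100, 110])
```

Removing the pause before scaling matters: if the 2000 msec value were kept, it would become x_max and compress all normal durations into a narrow band near 0.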
13 Classification: Identification: nearest neighbor classifier using Euclidean distance; the input sample is compared to every training sample. Verification: dichotomizer (feature difference model), trained with a neural network.
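A Python sketch of both ideas, under stated assumptions. `identify` is a plain nearest neighbor search with Euclidean distance. `difference_pairs` shows one common way to set up a feature difference model: each pair of samples becomes a vector of absolute feature differences, labeled by whether both samples come from the same subject, which a binary classifier such as a neural network could then be trained on. The slides do not give the exact construction, so the pairing scheme, function names, and subject names are all hypothetical, and no network training is shown.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(sample, training):
    """training: list of (subject, feature_vector). Returns the nearest subject."""
    subject, _ = min(training, key=lambda item: euclidean(sample, item[1]))
    return subject


def difference_vector(a, b):
    """Absolute per-feature difference between two samples."""
    return [abs(x - y) for x, y in zip(a, b)]


def difference_pairs(training):
    """Turn every pair of samples into (difference_vector, same_subject)."""
    return [
        (difference_vector(va, vb), sa == sb)
        for (sa, va), (sb, vb) in combinations(training, 2)
    ]


# Made-up, already standardized two-feature samples.
training = [
    ("alice", [0.10, 0.20]),
    ("alice", [0.15, 0.25]),
    ("bob", [0.80, 0.90]),
]
who = identify([0.12, 0.22], training)
pairs = difference_pairs(training)
```

The difference formulation turns the n-subject problem into a two-class problem (same subject or different subject), which is why it suits verification.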
14 Experimental Design: Identification Experiment. 15 subjects who know the purpose of the experiment: training on 5 repetitions of text a (approx. 600 characters); testing on 5 repetitions of text a, 5 repetitions of text b (same length as text a), and 5 repetitions of text c (half the length of text a). 28 subjects who do not know the purpose of the input: a subset of the above training/testing data, plus arbitrary text input of reasonable length.
15 Experimental Design: Instructions for Subjects. All subjects will be told to make any necessary corrections to the input data (texts a, b, and c are Aesop fables). Knowing subjects will be told to input the data using their normal keystroke dynamics. The experiments are designed so that subjects leave at least a day between entering samples.
16 Experimental Design: Text a – about 600 characters This is an Aesop fable about the bat and the weasels. A bat who fell upon the ground and was caught by a weasel pleaded to be spared his life. The weasel refused, saying that he was by nature the enemy of all birds. The bat assured him that he was not a bird, but a mouse, and thus was set free. Shortly afterwards the bat again fell to the ground and was caught by another weasel, whom he likewise entreated not to eat him. The weasel said that he had a special hostility to mice. The bat assured him that he was not a mouse, but a bat, and thus a second time escaped. The moral of the story: it is wise to turn circumstances to good account.
17 Experimental Design (cont): Verification. Basically the same as for identification. The training and testing data consist of various text input samples collected over a period of approximately 10 weeks.
18 Expected Outcomes: Recognition Accuracy. Accuracy on text a > accuracy on text b (text a is the training text). Accuracy on text b > accuracy on text c (text b is longer than text c). Accuracy on texts a, b, and c > accuracy on arbitrary text (texts a, b, and c are similar, all Aesop fables). Accuracy on knowing subjects > accuracy on unknowing ones (knowing subjects are more likely to use their normal keystroke dynamics for all input).
19 Expected Outcomes: Analysis of Experimental Results. Feature analysis: which are better? Key press durations or transitions; more or less frequent letters/digrams; other feature measurements. Determine the spread (std) of feature measurements within versus across subjects.
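One possible way to compare the spread within versus across subjects for a single feature, sketched in Python. The definitions are assumptions, since the slides do not fix them: the within-subject spread is the average of each subject's own std, and the across-subject spread is the std of the per-subject means. The function name, subject labels, and values are hypothetical.

```python
from statistics import mean, pstdev


def within_and_across_spread(by_subject):
    """by_subject: dict mapping subject -> list of values of one feature."""
    per_subject_std = [pstdev(values) for values in by_subject.values()]
    per_subject_mean = [mean(values) for values in by_subject.values()]
    within = mean(per_subject_std)     # typical variation inside one subject
    across = pstdev(per_subject_mean)  # variation between subjects
    return within, across


# Made-up mean "e" key press durations (msec) over three samples each.
by_subject = {
    "s1": [100, 110, 90],
    "s2": [150, 160, 140],
}
within, across = within_and_across_spread(by_subject)
```

A feature whose across-subject spread is large relative to its within-subject spread separates subjects well and is a candidate for a "better" feature.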
20 Preliminary Results: Reduced identification experiment. Smaller text input: “The quick brown fox jumps over the lazy dog.” Fewer subjects: three project team members. Fewer feature measurements: mean and std of “e” and “o” key press durations. Accuracy of 80%, which is promising.
21 Questions/Comments? Focus or applications? Software implementation? Experimental design? Expected experimental outcomes?