Published by Jasper Park. Modified over 9 years ago.
1
Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input Ned Bakelman, DPS Candidate Charles C. Tappert, PhD, Advisor Seidenberg School of Computer Science and Information Systems Pace University White Plains, NY 10606, USA DPS Defense April 11, 2014
2
Research Questions
This study focuses on biometric authentication using long bursts of arbitrary input and short bursts of fixed input, with an improved classification system.
Long Input: 100 – 1500 characters (a paragraph, a couple of sentences, etc.)
Short Input: 10 – 15 characters (password, pass code, etc.)
Arbitrary Input: open, unrestricted text (of the user's choosing)
3
Research Questions (continued)
Purpose of the Study
Long Input - Unauthorized User Detection
1) Can we accurately detect intruder use of a computer system in an office environment?
2) How does the use of standard applications such as a word processor, spreadsheet, or browser impact intruder detection?
3) Is an intruder still detectable when using a web browser (a low-text environment)?
Classifier Comparison – Multi Match vs. Single Match
1) What is the accuracy difference between the two?
2) Which performs better on long input?
3) Which performs better on short input?
Short Keypad Input – Detection Accuracy
1) What is the detection accuracy of short fixed numeric keypad input?
2) Does the use of specific keypad features improve detection accuracy?
4
Background (Source of Features or Attributes)
Derived from raw timing data
Based on key press duration and transition times, also known as dwell and flight time
Statistical in nature, mainly means and standard deviations
Pre-processing to remove outliers and standardize between 0 – 1
Fallback procedure
T. Olzak, Keystroke Dynamics: Low Impact Biometric Verification, Sep. 2006
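The timing features described on this slide can be sketched as follows. This is a minimal illustration, not the study's implementation: the event format, the function names `dwell_and_flight` and `min_max_scale`, and the use of min-max scaling for the 0 – 1 standardization are all assumptions, and outlier removal is omitted.

```python
from statistics import mean, stdev

def dwell_and_flight(events):
    """events: list of (key, press_ms, release_ms) tuples in typing order (hypothetical format)."""
    # Dwell (duration): how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight (transition): release of one key to press of the next; negative if keys overlap.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell, flight

def min_max_scale(values):
    """Scale values into [0, 1]; assumed here as one way to standardize between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

events = [("p", 0, 90), ("a", 150, 230), ("c", 300, 370), ("e", 420, 520)]
dwell, flight = dwell_and_flight(events)

# Statistical features: means and standard deviations of the raw timings.
features = {
    "dwell_mean": mean(dwell), "dwell_std": stdev(dwell),
    "flight_mean": mean(flight), "flight_std": stdev(flight),
}
scaled_dwell = min_max_scale(dwell)
print(dwell, flight, features["dwell_mean"], features["flight_mean"])
```

In a full system, one such mean and standard deviation would be computed per key or key group rather than over all keys at once.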
5
Background (continued) (Target of Features or Attributes)
[Figure labels: QWERTY; Numeric Keypad. Source: Wikipedia.org, http://en.wikipedia.org/wiki/Computer_keyboard, last updated: March 6, 2012]
Separate features for QWERTY and keypad
Durations and transitions for individual keys, groups of keys, etc.
QWERTY: each letter, each number, vowels, consonants, all letters, etc.
Keypad: each digit, each operator (+ - * /), all digits, all operators, etc.
6
Background (continued) (Pace Classifier: Single Match)
Dichotomy Model
Uses vector differences
Transforms a multi-class problem into a two-class problem
k-Nearest Neighbor (k-NN) is used for classification
Example: Feature Vector Space (3 subjects, 4 samples each) maps to Feature Difference Space (18 within, 48 between)
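The within/between counts on this slide follow from pairing every sample with every other sample. The sketch below assumes absolute element-wise differences and made-up two-dimensional feature vectors; `difference_space` is a hypothetical helper name, not the Pace system's code.

```python
from itertools import combinations

def difference_space(samples):
    """samples: list of (subject_id, feature_vector).

    Returns (within, between): difference vectors for same-subject pairs
    and different-subject pairs, turning a multi-class problem into two classes.
    """
    within, between = [], []
    for (s1, v1), (s2, v2) in combinations(samples, 2):
        diff = [abs(a - b) for a, b in zip(v1, v2)]
        (within if s1 == s2 else between).append(diff)
    return within, between

# 3 subjects with 4 samples each; the feature values are invented for illustration.
samples = [(subj, [subj + 0.1 * k, 2 * subj - 0.05 * k])
           for subj in range(3) for k in range(4)]
within, between = difference_space(samples)
print(len(within), len(between))  # 18 48
```

Each subject contributes C(4,2) = 6 within-class pairs (18 in total), and the remaining 48 of the C(12,2) = 66 pairs are between-class.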
7
Background (continued) (Pace Classifier: Multi Match)
Authentication Process
User Focused Reduction Method (reduces the training space)
System performance obtained using the Leave-One-Out method
The "left out" test sample is used to create differences with different vectors
Each test difference is classified (k-NN)
Results are grouped together
Authentication decision is based on all of the results
Example: Feature Vector Space (3 subjects, 4 samples each); Feature Difference Space (18 within, 48 between); Feature Reduction Space (6 within, 32 between)
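One reading of the user-focused reduction that reproduces the slide's counts is to keep only the sample pairs that involve the claimed user. The sketch below counts pairs under that assumption; the slide does not spell out the reduction rule, and `user_focused_pairs` is a hypothetical name.

```python
from itertools import combinations

def user_focused_pairs(labels, claimed_user):
    """Count within/between pairs that involve at least one sample of the claimed user."""
    within = between = 0
    for a, b in combinations(labels, 2):
        if claimed_user not in (a, b):
            continue  # pairs between two other users are dropped from the training space
        if a == b:
            within += 1
        else:
            between += 1
    return within, between

# 3 subjects (A, B, C) with 4 samples each, as in the slide's example.
labels = [subject for subject in "ABC" for _ in range(4)]
print(user_focused_pairs(labels, "A"))  # (6, 32)
```

Under this assumption the training space shrinks from 66 difference vectors to 38 for each claimed user.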
8
Background (continued) (Performance: ROC Curves, Equal Error Rate)
Receiver Operating Characteristic (ROC) Curves
Historically used in signal detection, such as radar, to distinguish an actual signal from noise
Used in biometrics to plot the False Accept Rate (FAR) and False Reject Rate (FRR) at various operating points (thresholds)
Equal Error Rate (EER)
The operating point on the ROC curve where the FAR and FRR are equal, that is, where the two curves intersect
[Figures: ROC Curve; FAR / FRR Intersection]
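As a rough illustration of the operating-point idea, the sketch below sweeps a threshold over similarity scores and reports the point where FAR and FRR are closest. The scores are invented, the "accept when score >= threshold" convention is an assumption, and `far_frr` and `equal_error_rate` are hypothetical helpers.

```python
def far_frr(genuine, impostor, threshold):
    """Accept when score >= threshold (assumed convention: higher score = better match)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users wrongly rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the candidate threshold where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

genuine = [0.9, 0.8, 0.75, 0.6, 0.4]   # made-up scores for the true user
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]   # made-up scores for other users
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)  # 0.2 0.6
```

With finite data FAR and FRR rarely cross exactly, so averaging the two rates at the closest point is one common approximation.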
9
Data Collection (Applications)
Spreadsheet (Microsoft Excel): 20 subjects, 3 sessions, 10 samples per subject
Browser: 20 subjects, 3 sessions, 10 samples per subject
Text (Microsoft Word): 20 subjects, 3 sessions, 10 samples per subject
The same subjects generated each input type
Only subjects who produced at least 10 samples were used
Rest period of at least one day between sessions
10
Data Collection (continued) (Numeric Keypad)
30 subjects, 4 sessions, 20 samples per subject
Number entered: 914 193 7761
Only "perfect" samples were used (no mistakes)
Rest period of at least one day between sessions
Data entered into a spreadsheet using the right hand
11
Features (Feature Attribute Summary)
Group                  Attributes                             Mean (µ)   Standard Deviation (σ)   Total
QWERTY (Non-Numeric)   Durations: 53                          53         53                       106
                       Transitions: 35 per (Type I and II)    70         70                       140
QWERTY (Numeric)       Durations: 27                          27         27                       54
                       Transitions: 26 per (Type I and II)    52         52                       104
Keypad                 Durations: 29                          29         29                       58
                       Transitions: 128 per (Type I and II)   256        256                      512
Totals                 298                                    487        487                      974
12
Features (continued) (QWERTY Durations)
All Keys; All Letters; All Non Letters; Vowels (a e i o u); Frequent consonants (t n s r h); Next Frequent consonants (l d c p f); Least Frequent consonants (m w y b g, other); Right Letters; Left Letters
Punctuation (. , ' " ? ! ; : ( )); Shift; Left Shift; Right Shift; Ctrl; Left Ctrl; Right Ctrl; Enter; Escape; Tab; Space Bar; Caps Lock; Funct Keys
Digits (0 1 2 3 4 5 6 7 8 9); Arithmetic Operators (/ * - + =); Symbols (@ # $ % ^ & - < >)
13
Features (continued) (QWERTY Transitions)
Any Key->Any Key; Non Letter->Non Letter; Non Letter->Letter; Letter->Non Letter; Letter->Letter
space->shift; punct->shift; space->any letter; shift->any letter; any letter->space; any letter->punct; any letter->enter
left->right; left->left; right->left; right->right
vowel->vowel: e->a
cons->vowel: h->e, r->e, t->i
vowel->cons: a->t, e->n, o->r, o->n, e->s, e->r, i->n, a->n
cons->cons: t->h, s->t, n->d
Numeric->Numeric
Any Digit->Arithmetic Operators: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 (each)->Arithmetic Operators
Any Digit->0: 1->0, 2->0, 3->0, 4->0, 5->0, 6->0, 7->0, 8->0
Arithmetic Operator->any digit: div->digits, mult->digits, sub->digits, add->digits
14
Features (continued) (Keypad Durations)
All Keys
Keypad Digits with Decimal: 0 1 2 3 4 5 6 7 8 9 .
Arithmetic Operators with Num Lock and Enter: Num Lock, Enter, / * - +
Print Screen, Sys Rq, Scroll Lock, Pause, Break
Centerpad: Home, Page Up, Page Dn, End, Del, Ins, Four Arrows
15
Features (continued) (Keypad Transitions)
Any Key->Any Key; keypad->keypad; any digit->any digit
Each digit to each specific digit: 1->1,2,3…0; 2->1,2,3…0; 3->1,2,3…0; 4->1,2,3…0; 5->1,2,3…0; 6->1,2,3…0; 7->1,2,3…0; 8->1,2,3…0; 9->1,2,3…0; 0->1,2,3…0
Each digit to any digit: 1->digits; 2->digits; 3->digits; 4->digits; 5->digits; 6->digits; 7->digits; 8->digits; 9->digits; 0->digits
Any Digit->Arithmetic Operators: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 (each)->Arithmetic Operators
Arithmetic Operator->any digit: div->digits, mult->digits, sub->digits, add->digits
16
Results – Long Input Experiments (Equal Error Rate for each input type per Classifier)
[Figure panels: Spreadsheet, Browser, Text; each shown for Multi Match and Single Match]
17
Results – Long Input Experiments (continued) (ROC Curve for each input type per Classifier) Multi Match Classifier Single Match Classifier
18
Results – Long Input Experiments (continued) (Independent Variables for the long input experiments)
Independent Variable 1: Content Type (Spreadsheet v. Browser v. Text)
Independent Variable 2: Classifier
                               Spreadsheet   Browser   Text
Subjects                       20            20        20
Samples per Subject            10            10        10
Total Samples (All Subjects)   200           200       200
EER % (New Classifier)         8.14%         15.71%    5.82%
EER % (Old Classifier)         13.57%        27.46%    12.78%
EER Improvement %              40.01%        42.79%    54.46%
Conclusion 1: Text > Spreadsheet > Browser (fewer keystrokes)
Conclusion 2: New Classifier much better than Old Classifier
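The slide does not state how "EER Improvement %" is computed. Treating it as the relative reduction in EER from the old classifier to the new one reproduces the reported figures, as the following sketch shows; `eer_improvement` is a hypothetical helper name.

```python
def eer_improvement(old_eer, new_eer):
    """Relative EER reduction in percent (assumed definition; it matches the reported values)."""
    return (old_eer - new_eer) / old_eer * 100

# (old classifier EER %, new classifier EER %) taken from the long input results
long_input = {
    "Spreadsheet": (13.57, 8.14),
    "Browser": (27.46, 15.71),
    "Text": (12.78, 5.82),
}
improvements = {name: round(eer_improvement(old, new), 2)
                for name, (old, new) in long_input.items()}
print(improvements)  # {'Spreadsheet': 40.01, 'Browser': 42.79, 'Text': 54.46}
```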
19
Results – Short Input Experiments (Equal Error Rate for each keypad experiment per Classifier)
[Figure panels: 10 Subjects, 20 Subjects, 30 Subjects; each shown for Multi Match and Single Match]
20
Results – Short Input Experiments (continued) (ROC Curve for each keypad experiment per Classifier)
[Figure panels: Multi Match Classifier; Single Match Classifier]
10 - 20: 10 subjects, 20 samples each
20 - 20: 20 subjects, 20 samples each
30 - 20: 30 subjects, 20 samples each
21
Results – Short Input Experiments (continued) (Independent Variables for the short input experiments)
Independent Variable 1: Number of Subjects
Independent Variable 2: Classifier
Numeric Keypad
Subjects                       10       20       30
Samples per Subject            20       20       20
Total Samples (All Subjects)   200      400      600
EER % (Multi Match)            5.50%    5.65%    6.14%
EER % (Single Match)           15.56%   15.72%   14.95%
EER Improvement %              64.65%   64.06%   58.93%
Conclusion 1: EER increases (but not by much) as the number of subjects increases *
Conclusion 2: New Classifier much better than Old Classifier
* Except for the old classifier
22
CMU Experiment - Keypad (Feature Set Comparison – CMU vs. PaceU)
Input: 914 193 7761 + Enter Key = 11 Characters
Carnegie Mellon Features (from their numeric keypad study *)
10 key-down ---> key-down
10 key-up ---> key-down
11 dwell times
Total: 31 Features
Pace University Features (from our numeric keypad study)
(10 key-down ---> key-down) per µ, per σ = 20
(10 key-up ---> key-down) per µ, per σ = 20
(7 dwell) per µ, per σ = 14
Total: 54 Timing Features
* R. Maxion and K. Killourhy, "Keystroke Biometrics with Number-Pad Input," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Chicago, IL, 2010, pp. 201-210.
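The two feature counts can be checked with a few lines of arithmetic. The sketch below assumes that the 7 Pace dwell features correspond to the 7 distinct keys in the entered sequence (six distinct digits plus Enter); the slide gives the count but not that explanation.

```python
# The fixed keypad input from the slide: ten digits followed by Enter.
sequence = list("9141937761") + ["Enter"]

n_keys = len(sequence)        # 11 key presses
n_transitions = n_keys - 1    # 10 consecutive key pairs

# CMU feature set: one value per transition (two transition types) plus one dwell per key press.
cmu_features = n_transitions + n_transitions + n_keys

# Pace feature set: a mean and a standard deviation for each timing attribute.
distinct_keys = len(set(sequence))  # assumed source of the "7 dwell" count
pace_features = 2 * (n_transitions + n_transitions + distinct_keys)

print(cmu_features, pace_features)  # 31 54
```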
23
CMU Experiment – Keypad (continued) (Equal Error Rate and ROC Curves only using Multi Match)
[Figure panels: Equal Error Rate; ROC Curves. Captions: PU Data with CMU Features; PU Features vs. CMU Features]
24
CMU Experiment – Keypad (continued) (Independent Variable for the CMU Keypad experiment)
Independent Variable: Feature Set
Numeric Keypad (30 – 20)
Feature Set                    CMU      PU
Subjects                       30       30
Samples per Subject            20       20
Total Samples (All Subjects)   600      600
EER % (Multi Match)            10.47%   6.14%
EER Improvement %              41.36%
Conclusion: PU Feature Set outperformed CMU Feature Set
25
CMU Experiment - Password (Classifier Comparison – CMU vs. PU)
Input: .tie5Roanl + Enter Key = 11 Characters
Carnegie Mellon Features (from their numeric keypad study *)
10 key-down ---> key-down
10 key-up ---> key-down
11 dwell times
Total: 31 Features
* K. Killourhy and R. Maxion, "Comparing Anomaly-Detection Algorithms for Keystroke Dynamics," 2009 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Lisbon, Portugal, 2009, pp. 125-134.
26
CMU Experiment – Password (continued) (Equal Error Rate and ROC Curves only using Multi Match)
[Figure panels: Equal Error Rate; ROC Curve. Both use CMU Data + CMU Features + PU Classifier]
27
CMU Experiment – Password (continued) (Independent Variable for the CMU Keypad experiment)
Independent Variable: Classifier
Data and Features from CMU
Features              31
Subjects              51
Samples per Subject   400
Equal Error Rate per Classifier
Multi Match (PU)                       7.6%
Manhattan Scaled (CMU) *               9.6%
Nearest Neighbor Mahalanobis (CMU) *   10.0%
Outlier Count z-score (CMU) *          10.2%
(only the top 3 of about 12 CMU classifiers are shown above)
Conclusion: PaceU Classifier outperformed the classifiers from the CMU study
* K. Killourhy and R. Maxion, "Comparing Anomaly-Detection Algorithms for Keystroke Dynamics," 2009 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Lisbon, Portugal, 2009, pp. 125-134.
28
Keystrokes Lengths – Long Input Keystroke counts in long samples Spreadsheet Sample Keystroke Counts Browser Sample Keystroke Counts Text Sample Keystroke Counts
29
Keystrokes Lengths – Long Input (continued) Keystroke density rate in long samples Spreadsheet Sample Keystroke Density Rate Browser Sample Keystroke Density Rate Text Sample Keystroke Density Rate
30
Keystrokes Lengths – Long Input (continued)
Per Subject µ and σ for keystrokes in long samples
Spreadsheet, over all samples: Mean 395, Median 393
Browser, over all samples: Mean 104, Median 67
Text, over all samples: Mean 824, Median 560
31
Keystrokes Lengths – Long Input (continued) Frequency Distributions for keystrokes in long samples Long Input Keystroke Frequencies (Keystroke Counts) Long Input Keystroke Frequencies (Keystroke Density Rate)
32
Conclusions
Long Input Experiments – Intruder Detection Accuracy
Keystroke biometrics can be effective at detecting the unauthorized use of a computer system in a closed environment (government office, school, business office, etc.)
Performance varied with input type:
Spreadsheet: Good Performance (EER: 8.1%)
Text: Very Good Performance (EER: 5.8%)
Browser: Fair Performance (EER: 15.7%)
Short Input Experiments – Detection Accuracy
Numeric keypad yields very good performance (EER Range: 5.5% - 6.2%)
PaceU feature set is effective: CMU features performed much worse (10.5% vs. 6.2%)
Classifier Comparison – Multi Match vs. Single Match
1) Multi Match outperformed Single Match significantly (EER Improvement from 50% - 64%)
2) Multi Match outperformed the detectors from the CMU study using their data and features (EER: 7.6%)
33
Conclusions (continued)
Long Input Performance: weaker performance compared to previous studies at PU… Why?
Less optimal samples
No designated entry window for sample collection (less control over quality of entry)
Large fluctuations in the number of keystrokes
Input types most likely had substantial mouse activity that "interrupts" keystroke entry
Possible sparseness of keystrokes (less concentrated and more spread out, especially with browser entry)
Future Considerations: Do keystroke counts tell the whole story?
We propose that correlating performance simply with the number of keystrokes is not sufficient
Need to factor in the density of the keystrokes as well
Simply stated: it may take many more keystrokes to maintain an effective level of performance if the sparseness is high
34
Suggestions for Future Work
Further studies on numeric entry from QWERTY
  Compare performance to numeric entry from keypad
  Study free text entry from keypad
Feature Analysis
  Which features contributed to performance from the keypad?
  How do equivalent numeric features from QWERTY perform compared to keypad?
Perform mixed mode experiments
  Collect input that combines spreadsheet, browser, and text
  Collect spreadsheet input that includes all numeric entry from keypad
Incorporate Multi Biometric
  Keystroke + Mouse Movement + Stylometry
35
Backup Slides
36
Generate ROC Curves from kNN Data (vary m from 0 to k; m is the controlling or threshold parameter)
The m-kNN procedure with k = 9 and m = 5
For each Q (questioned) test sample:
  Examine the top k nearest neighbors and count the number of within-class matches
  If the number of within-class matches >= a threshold of matches (m), the user is authenticated; otherwise the user is rejected
Generate the ROC curve as follows:
  Vary m from 0 to k and calculate FAR / FRR in each of the following cases:
  m = 0: authenticate if 0 or more of the k choices are within-class
  m = 1: authenticate if 1 or more of the k choices are within-class
  and so on until m = 9 in this case
Linear Rank Weighting Method:
  1st choice weight = k, 2nd choice weight = k-1, …, kth choice weight = 1
  Authenticate a user if the sum of the weighted within-class choices >= the m threshold
  Threshold varies from 0 to k(k+1)/2 (maximum score)
R. Zack, C. Tappert, and S. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," IEEE 4th Int Conf Biometrics (BTAS 2010), Washington D.C., 2010.
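The m-kNN decision rule and the linear rank weighting described on this slide can be sketched as below. This is an illustration under assumptions, not the cited implementation: neighbor labels are given as the strings "within" and "between", the within-class counts for genuine and impostor trials are invented, and all function names are hypothetical.

```python
def mknn_authenticate(neighbor_labels, m):
    """Authenticate if at least m of the k nearest neighbors are within-class."""
    return sum(label == "within" for label in neighbor_labels) >= m

def rank_weighted_score(neighbor_labels):
    """Linear rank weighting: nearest neighbor weighs k, next k-1, ..., last weighs 1."""
    k = len(neighbor_labels)
    return sum(k - rank for rank, label in enumerate(neighbor_labels) if label == "within")

def roc_points(genuine_counts, impostor_counts, k):
    """Vary m from 0 to k and compute (m, FAR, FRR) from within-class match counts."""
    points = []
    for m in range(k + 1):
        far = sum(c >= m for c in impostor_counts) / len(impostor_counts)
        frr = sum(c < m for c in genuine_counts) / len(genuine_counts)
        points.append((m, far, frr))
    return points

# k = 9 neighbors, nearest first (made-up labels).
neighbors = ["within", "within", "between", "within", "between",
             "within", "within", "between", "between"]
accepted = mknn_authenticate(neighbors, m=5)   # 5 within-class matches, so True
score = rank_weighted_score(neighbors)         # 9 + 8 + 6 + 4 + 3 = 30
max_score = 9 * (9 + 1) // 2                   # k(k+1)/2 = 45

# Within-class match counts for made-up genuine and impostor test samples.
points = roc_points([9, 7, 5, 3], [0, 1, 2, 6], k=9)
print(accepted, score, max_score, points[5])   # True 30 45 (5, 0.25, 0.25)
```

At m = 0 every sample is accepted (FAR = 1, FRR = 0), and raising m trades false accepts for false rejects; the rank-weighted variant works the same way with the threshold ranging from 0 to k(k+1)/2.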
37
Equal Error Rates (From the Literature)
Long Input:
Ferreira and Santos: 1.4%
Monaco using data from Villani: 1.7%
38
Future Work (continued): Multi Biometrics for Intrusion Detection
Motor Control Level: keystroke + mouse movement
Linguistic Level: stylometry (char, word, syntax)
Semantic Level: target likely intruder commands
[Figure: intruder detection layered as Motor Control Level (Keystroke + Mouse), Linguistic Level (Stylometry), and Semantic Level]
39
Intruder Experiment Design (continued)
Authenticate user on various window sizes, beginning with 300-keystroke windows
Window Type 1: use overlapping windows to:
  Minimize the "wait" period for the next authentication
  Maximize fast intruder detection
Figure 1.5-1 Overlapping Window Burst Authentication
[Figure: six consecutive 300-keystroke (KS) windows with boundaries at keystrokes 1, 300, 600, 900, 1200, 1500, 1800, overlapped by five 300-keystroke windows with boundaries at keystrokes 150, 450, 750, 1050, 1350, 1650]
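The overlapping window layout in the figure can be generated with a fixed window size and a step of half that size. The sketch below uses 0-based, end-exclusive keystroke indices, which is a convention chosen here rather than one stated on the slide; `overlapping_windows` is a hypothetical helper.

```python
def overlapping_windows(n_keystrokes, size=300, step=150):
    """Return (start, end) index pairs for fixed-size windows that overlap by size - step."""
    return [(start, start + size)
            for start in range(0, n_keystrokes - size + 1, step)]

windows = overlapping_windows(1800)
print(len(windows))   # 11
print(windows[:3])    # [(0, 300), (150, 450), (300, 600)]
```

With a 150-keystroke step, an authentication decision becomes available every 150 keystrokes instead of every 300, which is the shorter "wait" period the slide refers to.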
40
Continuous vs Continual Authentication with Data Capture Windows
Continuous (ongoing) burst authentication
[Figure: Burst 1, Burst 2, Burst 3 of 1 min each on a timeline marked 0, 5 min, 10 min]
Continual burst authentication with pauses
[Figure: Burst 1, Burst 2, Burst 3 of 1 min each, separated by a Pause Threshold, on a timeline marked 0, 8 min, 30 min]
EISIC 2012
41
Background (continued)
DARPA (Defense Advanced Research Projects Agency), through its Cyber Genome Program, is funding research for the development of new software-based authentication biometric modalities
These include keystrokes and target a desktop environment running Microsoft Office applications as the standard computer system platform
DARPA. Active Authentication Program. https://www.fbo.gov/index?s=opportunity&mode=form&id=c7968647352f0276fc1b28817c581d86&tab=core&_cview=0, accessed 2014.
The 2008 United States Higher Education Opportunity Act requires institutions of higher learning to make greater online access control efforts by adopting ubiquitous identification technologies
HEOA. Higher Education Opportunity Act (HEOA) of 2008. http://www2.ed.gov/policy/highered/leg/hea08/index.html, accessed 2014.
42
Spreadsheet Template
Columns: 2011, 2010, 2009
Assets
  Cash
  Investments: Cash, Equity Securities, Corporate debt securities, US government securities, Private equity, Real estate
  Total Investments: 0 0 0
  Other Assets
  Total Assets: $0
Liabilities and Net Assets
  Liabilities: Penalties, Accounts Payable, Advance from Lender, Federal excise tax
  Total Liabilities: 0 0 0
  Net Assets: Tangible, Non Tangible
  Total Net Assets: 0 0 0
  Total Net Assets and Liabilities: $0
Special Journal Entries
  Enter Journal Entry name here
  Total Journal Entries: $0.00