Yan Chen Lab of Internet and Security Technology (LIST)


1 RiskCog: Privacy-preserving Unobtrusive Real-time Mobile User Authentication
Yan Chen Lab of Internet and Security Technology (LIST) Zhejiang University, China Northwestern University, USA

2 How many of you use your smartphones to make payments?
What about storing sensitive data or personal photos?

3 52% Use smartphone to make payments
41% Use it regularly to make payments
99% Store sensitive personal and professional information
40% Don’t protect their devices with a password
This means millions in bank accounts, personal details and photos are out in the open
People think their info is not valuable
Cumbersome to use
Source:

4 WHAT HAS BEEN DONE SO FAR

5 PASSWORDS
Hard to remember
Hard to type on handheld devices
Prone to dictionary attacks

6 PASSWORDS and IRIS SCAN
PASSWORDS: Hard to remember; hard to type on handheld devices; prone to dictionary attacks
IRIS SCAN: Cumbersome to use; not practical in all situations; still under active research

7 FINGERPRINTS
Needs a fingerprint sensor
Privacy concern
Vulnerable to Play-Doh attacks!

8 FINGERPRINTS and PATTERN
FINGERPRINTS: Needs a fingerprint sensor; privacy concern; vulnerable to Play-Doh attacks!
PATTERN: Strong patterns are hard to use; prone to shoulder surfing; not practical in all situations

9 FACE RECOGNITION
Needs a camera
Privacy concern
Easily hackable (... using your photo!)

10 FACE RECOGNITION and SPEECH RECOGNITION
FACE RECOGNITION: Needs a camera; privacy concern; easily hackable (... using your photo!)
SPEECH RECOGNITION: Ambient noise affects the recognition; privacy concern still exists; still hackable (recording your voice)

11 What if… there was a way to ‘unobtrusively’ identify the phone’s owner in an ‘implicit’ way (without compromising user privacy!)

12 Goal
A learning-based mechanism for user fraud detection
  Least user privacy required, high detection accuracy, training without labeled data
  Device-level approach: only one copy of data is uploaded
  Use of cheap and universal sensors
  Robust, hard to evade
  Works in constrained environments, even when disconnected; supports Internet-scale users

13 Goal
The data is preprocessed on the server side and organized into a training set to produce the classifier for each individual phone owner. Data is pushed to the server incrementally, and the classifier for a phone owner is considered fully trained when its validation accuracy reaches a predefined threshold. Follow-up data can then be used to identify whether the phone is being used by its owner. When mobile payment vendors receive transaction requests from end users, they can directly query locally regarding the user identity.

14 Problem Statement Fingerprinting Bob’s usage manner
Verify based on classification results

15 Challenges
Lack of features
  A feature set that effectively fingerprints the authorized user based on motion sensors
  Feature selection (6 values → 56 features)
Data availability & device placement
  A data collection mechanism that recognizes the phone’s active state to resolve the data availability issue
  A data preprocessing algorithm to remove the effect of dynamic device position
6 values → 64 features, average, variance
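The slide names only the average and variance among the derived features. The following is a minimal Python sketch of that idea, assuming a window of six-value readings (three accelerometer axes followed by three gyroscope axes). The function name `window_features`, the tuple layout, and the use of population variance are illustrative assumptions; the sketch yields 12 features and does not reproduce the full feature set described on the slide.

```python
from statistics import fmean, pvariance


def window_features(window):
    """Return per-axis average and variance for a window of sensor readings.

    `window` is a list of (ax, ay, az, gx, gy, gz) tuples. This layout is a
    hypothetical choice for illustration, not the format used by RiskCog.
    """
    if not window:
        raise ValueError("window must contain at least one reading")
    features = []
    for axis in zip(*window):             # iterate over the six columns
        features.append(fmean(axis))      # average of this axis
        features.append(pvariance(axis))  # population variance of this axis
    return features


# Two made-up readings: only ax and gx change between them.
readings = [
    (0.0, 0.0, 9.8, 0.1, 0.0, 0.0),
    (0.2, 0.0, 9.8, 0.3, 0.0, 0.0),
]
feats = window_features(readings)
```

Each axis contributes two consecutive entries (average, then variance), so `feats[0]` and `feats[1]` describe the first accelerometer axis.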

16 Challenges
Unlabeled data
  A semi-supervised online learning algorithm to handle the unlabeled data with a supervised learning algorithm
Imbalanced dataset
  Stratified sampling plus sample randomization to address the issue of the imbalanced data set
Constrained mobile environment
Lack of feature: location track user privacy, deployment force sensor, integrated into app context
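One possible reading of "stratified sampling plus sample randomization" can be sketched in Python as follows: group samples by label, draw the same number from each group, then shuffle the result. The function name `balanced_stratified_sample`, the `(features, label)` layout, and the equal-per-class quota are assumptions made for illustration; the slides do not specify the exact sampling scheme.

```python
import random


def balanced_stratified_sample(samples, per_class, seed=0):
    """Draw up to `per_class` samples from each label, then shuffle.

    `samples` is a list of (features, label) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    by_label = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)

    picked = []
    for items in by_label.values():
        k = min(per_class, len(items))       # a small class contributes all it has
        picked.extend(rng.sample(items, k))  # sample without replacement per stratum
    rng.shuffle(picked)                      # randomize order across classes
    return picked


# Imbalanced toy data: 5 "owner" samples versus 100 "other" samples.
data = [([i], "owner") for i in range(5)] + [([i], "other") for i in range(100)]
subset = balanced_stratified_sample(data, per_class=5)
```

With this quota, the minority class is kept whole and the majority class is downsampled to the same size.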

17 Data Preprocessing
Filter useless data on client side
  The device is put on a flat plane
Identify motion state on server
  Each motion state has one corresponding classifier trained
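A client-side filter for the "device is put on a flat plane" case might look like the Python sketch below. It assumes accelerometer values in m/s² and gyroscope values in rad/s, and treats a reading as flat when gravity falls almost entirely on the z axis and the device is not rotating. The function names and both tolerance values are hypothetical; the slides do not describe how RiskCog detects this state.

```python
import math

GRAVITY = 9.81  # standard gravity in m/s^2


def is_lying_flat(reading, accel_tol=0.3, gyro_tol=0.02):
    """Heuristic check that a phone is resting on a flat surface.

    `reading` is (ax, ay, az, gx, gy, gz). Tolerances are illustrative guesses.
    """
    ax, ay, az, gx, gy, gz = reading
    no_tilt = (
        abs(ax) < accel_tol
        and abs(ay) < accel_tol
        and abs(abs(az) - GRAVITY) < accel_tol  # face up or face down
    )
    no_rotation = math.sqrt(gx * gx + gy * gy + gz * gz) < gyro_tol
    return no_tilt and no_rotation


def drop_flat_readings(readings):
    """Keep only readings taken while the phone is not lying flat."""
    return [r for r in readings if not is_lying_flat(r)]


flat = (0.0, 0.0, 9.8, 0.0, 0.0, 0.0)
held = (1.5, 3.0, 9.0, 0.2, 0.1, 0.0)
kept = drop_flat_readings([flat, held])
```

Filtering on the client keeps readings that carry no information about the user from being uploaded at all.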

18 SVM Algorithm optimization
We choose the 80 users who generate the most data samples in daily usage from our data set of 1,513 users, which will be explained in detail in Section 5. With the parameter ε fixed at 0.01, we vary C from 1 to 90,000 and γ from 0 to 0.1. As shown in Figure 4, the model size decreases as C increases. Once C exceeds 100, the model size shrinks only slightly (cost C is plotted on a logarithmic scale). The model size reaches its minimum when γ equals 0.01.

19 Unlabeled Data: Semi-supervised Online Learning

20 Evaluation
Data
  Collected with “Phone manager” by Tencent: 1,513 users
  Labeled data collected in lab: 10 participants
  Provided by Alipay for benchmarking test: 34 users
Metrics
  Accuracy
    True positive (TP): owner is correctly identified
    False positive (FP): other is incorrectly identified as owner
    False negative (FN): owner is incorrectly identified as other
    True negative (TN): other is correctly identified
    P_owner = TP/(TP+FP), R_owner = TP/(TP+FN), P_other = TN/(TN+FN), R_other = TN/(TN+FP)
  ROC curve
  Overhead
  Robustness
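The four precision and recall formulas above can be computed directly from the confusion counts, as in this short Python sketch. The function name `owner_other_metrics` and the example counts are made up for illustration, and the overall accuracy line uses the standard definition (TP+TN)/(TP+FP+FN+TN), which the slide does not spell out.

```python
def owner_other_metrics(tp, fp, fn, tn):
    """Compute precision/recall for the owner and other classes from counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class never occurs.
        return numerator / denominator if denominator else 0.0

    return {
        "P_owner": ratio(tp, tp + fp),
        "R_owner": ratio(tp, tp + fn),
        "P_other": ratio(tn, tn + fn),
        "R_other": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),  # standard definition
    }


# Hypothetical counts, not results from the presentation.
metrics = owner_other_metrics(tp=90, fp=10, fn=5, tn=95)
```

Note that P_other and R_other are the precision and recall obtained by treating "other" as the positive class.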

21 Accuracy
1,513 users with full data
Collect 60 s per hour for 10 days (sample rate 50 Hz)
Size of training set : size of test set = 4:1

22 Accuracy
Data of 34 users from Alipay
Combined with 1,513 users from Tencent as others’ data
Size of training set : size of test set = 4:1
TP: 98.74%, TN: 92.02%

23 ROC Curve
True positive rate vs. false positive rate
TPR = TP/(TP+FN), FPR = FP/(FP+TN)
Vary the classification threshold from 0 to 1
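Sweeping the classification threshold and recording TPR and FPR at each step traces out the ROC curve. The Python sketch below assumes the classifier emits a score in [0, 1] and predicts "owner" when the score is at or above the threshold; the function name `roc_points`, that decision rule, and the toy scores are illustrative assumptions.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, FPR, TPR) triples for each threshold.

    `scores` are classifier outputs in [0, 1]; `labels` are True for the owner.
    A sample is predicted as owner when its score >= threshold (assumed rule).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        tpr = tp / positives if positives else 0.0  # TP / (TP + FN)
        fpr = fp / negatives if negatives else 0.0  # FP / (FP + TN)
        points.append((t, fpr, tpr))
    return points


# Toy example: two owner samples with high scores, two others with low scores.
curve = roc_points(
    scores=[0.9, 0.8, 0.4, 0.2],
    labels=[True, True, False, False],
    thresholds=[0.0, 0.5, 1.0],
)
```

At threshold 0 every sample is accepted as owner, and at threshold 1 none of these samples is, which gives the two end points of the curve.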

24 Overhead
Impact on client (3 hours)

Phone            Battery (mAh)   Data collection           Verification
                                 CPU (%)   Memory (MB)     CPU (%)   Memory (MB)
Samsung N9100    132.5/3000      10.34     14.4            8.80      21.4
Sony Xperia Z2   113.8/3200      1.82      18.0            9.00      26.0
MI 4             128.7/3080      1.30      14.0            12.00     24.3

Server latency (over 1,513 users)

Motion state   #Training samples   Training time (s)
Steady         20648               148.36
Moving         9280                21.21

25 Overhead
Offline verification
Offline model size: about 200kb

Procedure            Time (ms)
Data collection      3211.6
Data preprocessing   0.5
Feature extraction   12.3
Decision             13.3
Overall              3237.7

26 Product Demo On Android

27 Robustness
Brute-force attack
  The classifier model for each authorized owner is pre-trained
  A set of 500K randomly generated samples
  Percentage of samples detected as non-owner: 94.01%
Amount of data. The sampling frequency of our data collection mechanism is 50 Hz, and we only collect the effective data for one minute per hour. Thus, a total of 72,000 data samples are gathered within one day, far fewer than the number generated by our brute-force attack.
Data coverage. Manually handling the smartphone only involves a limited number of gestures. In contrast, across the six values collected from the acceleration sensor and the gyroscope sensor, our brute-force attack generates samples that are evenly distributed and fully cover the physically reasonable ranges.
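The daily sample count follows from 50 samples per second × 60 seconds per hour × 24 hours = 72,000. The Python sketch below reproduces that arithmetic and shows one way to generate evenly distributed six-value attack samples. The value ranges `ACCEL_RANGE` and `GYRO_RANGE` and the function names are hypothetical, since the slides do not state which ranges were used.

```python
import random

SAMPLE_RATE_HZ = 50          # sampling frequency from the slide
SECONDS_PER_HOUR = 60        # one minute of effective data per hour
HOURS_PER_DAY = 24


def samples_per_day():
    """Number of samples gathered per day under the collection schedule."""
    return SAMPLE_RATE_HZ * SECONDS_PER_HOUR * HOURS_PER_DAY


# Hypothetical bounds for illustration only (m/s^2 and rad/s).
ACCEL_RANGE = (-19.6, 19.6)
GYRO_RANGE = (-8.7, 8.7)


def random_attack_samples(count, seed=0):
    """Generate `count` six-value samples drawn uniformly from the ranges."""
    rng = random.Random(seed)
    ranges = [ACCEL_RANGE] * 3 + [GYRO_RANGE] * 3
    return [
        tuple(rng.uniform(low, high) for low, high in ranges)
        for _ in range(count)
    ]


daily = samples_per_day()             # 50 * 60 * 24
attack = random_attack_samples(1000)  # the slide's attack uses 500K samples
```

At 500K samples, the attack set is roughly seven times the size of one day of collected data.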

28 Robustness
Human attack
  A pre-trained classifier for the owner
  3 participants handle the phone with various gestures
  Each participant launches 10 attacks
  Each attack lasts for 10 seconds
  Percentage of samples detected as non-owner: 93.84%

29 Conclusion and Ongoing Work
RiskCog: the first device-level user identification system built on sensor data collected in the wild
Deploy on the phone to replace existing password/fingerprint authentication for apps
Enable offline detection
Port to smart watches, where no other user authentication system is available yet

