Emerging Mobile Threats and Our Defense

Presentation on theme: "Emerging Mobile Threats and Our Defense"— Presentation transcript:

1 Emerging Mobile Threats and Our Defense
Yan Chen (陈焰) Lab of Internet and Security Technology (LIST) Zhejiang University, China Northwestern University, USA

2 RiskCog: Privacy-preserving Unobtrusive Real-time Mobile User Authentication

3 How many of you use your smartphones to make payments?
What about storing sensitive data or personal photos?

4 52% 41% 99% 40% Use smartphone to make payments
Use it regularly to make payments 99% Store sensitive personal and professional information 40% Don’t protect their devices with a password This means millions on bank accounts, personal details and photos are out in the open People think info is invaluable Cumbersome to use Source:

5 WHAT HAS BEEN DONE SO FAR

6 PASSWORDS Hard to remember Hard to type on handheld devices
Prone to dictionary attacks

7 PASSWORDS IRIS SCAN Hard to remember Hard to type on handheld devices
Prone to dictionary attacks IRIS SCAN Cumbersome to use Not practical in all situations Still under active research

8 FINGERPRINTS Needs a fingerprint sensor Privacy concern
Vulnerable to Play-Doh attacks!

9 FINGERPRINTS PATTERN Needs a fingerprint sensor Privacy concern
Vulnerable to Play-Doh attacks! PATTERN Strong patterns are hard to use Prone to shoulder surfing Not practical in all situations vs

10 FACE RECOGNITION Needs a camera Privacy concern
Easily hackable (... using your photo!)

11 FACE RECOGNITION SPEECH RECOGNITION
Needs a camera Privacy concern Easily hackable (... using your photo!) SPEECH RECOGNITION Ambient noise affects the recognition Privacy concern still exists Still hackable (recording your voice)

12 what if… There was a way to identify the phone's owner 'unobtrusively' and 'implicitly' (without compromising user privacy!)

13 Goal A learning-based mechanism for user fraud detection
Minimal user privacy exposure, high detection accuracy, training without labeled data
Device-level approach: only one copy of data is uploaded
Uses cheap and universal sensors
Robust and hard to evade
Works in constrained environments, even when disconnected, and at Internet-scale user populations

14 Goal The data is preprocessed on the server side and organized into a training set to produce a classifier for each individual phone owner. The data is incrementally pushed to the server, and the classifier for a phone owner is considered fully trained once its validation accuracy reaches a pre-defined threshold. The follow-up data can then be used to identify whether the phone is being used by its owner. When mobile payment vendors receive transaction requests from end users, they can directly query the user's identity locally.
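A minimal sketch of the incremental server-side training loop described above, assuming a scikit-learn classifier that supports partial_fit; the 0.9 validation threshold, the ingest_batch helper, and the label encoding are illustrative assumptions rather than details from the RiskCog paper.

```python
# Hedged sketch of the server-side incremental training loop; values are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.9                 # assumed "pre-defined threshold"
clf = SGDClassifier(random_state=0)      # one such model per phone owner

def ingest_batch(X_batch, y_batch, X_val, y_val, fully_trained):
    """Handle one incrementally uploaded batch; label 1 = owner, 0 = other."""
    if fully_trained:
        # Training is frozen: follow-up data only verifies the current user.
        return clf.predict(X_batch), True
    clf.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))
    val_acc = accuracy_score(y_val, clf.predict(X_val))
    return None, val_acc >= ACCURACY_THRESHOLD
```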

15 Problem Statement Fingerprinting Bob’s usage manner
Verify based on classification results

16 Challenges Lack of features
A feature set that effectively fingerprints the authorized user based on motion sensors
Feature selection (6 values → 56 features)
Data availability & device placement
A data collection mechanism that recognizes the phone's active state to resolve the data availability issue
A data preprocessing algorithm to remove the effect of dynamic device position
6 values → 64 features, e.g., average and variance (see the sketch below)
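The feature-extraction sketch referenced above, assuming fixed-length windows over the six raw sensor values (3-axis accelerometer plus 3-axis gyroscope); only the average and variance statistics named on the slide are computed, so this is a subset of the full 56/64-feature set.

```python
# Windowed feature extraction over the six raw sensor values (illustrative subset).
import numpy as np

def extract_features(window):
    """window: ndarray of shape (n_samples, 6) -> flat feature vector."""
    means = window.mean(axis=0)        # 6 per-axis averages
    variances = window.var(axis=0)     # 6 per-axis variances
    return np.concatenate([means, variances])

# Example: a 50 Hz window covering one second of readings.
window = np.random.randn(50, 6)
features = extract_features(window)    # shape (12,)
```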

17 Challenges Unlabeled data
A semi-supervised online learning algorithm to handle the unlabeled data with a supervised learning algorithm
Imbalanced dataset
Stratified sampling plus sample randomization to address the imbalanced data set (see the sketch below)
Constrained mobile environment
Lack of features: location tracking raises user privacy concerns, force sensors require extra deployment, and the approach must integrate into the app context
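The sketch referenced in the imbalanced-dataset bullet above; it illustrates one plausible reading of "stratified sampling plus sample randomization", where the balance_dataset helper and the 1:1 class ratio are assumptions.

```python
# Hedged sketch of stratified sampling plus sample randomization for imbalanced data.
import numpy as np

def balance_dataset(X, y, seed=0):
    """y uses 1 = owner, 0 = other; returns a shuffled, class-balanced subset."""
    rng = np.random.default_rng(seed)
    idx_owner = np.flatnonzero(y == 1)
    idx_other = np.flatnonzero(y == 0)
    n = min(len(idx_owner), len(idx_other))            # per-class stratum size
    keep = np.concatenate([rng.choice(idx_owner, n, replace=False),
                           rng.choice(idx_other, n, replace=False)])
    rng.shuffle(keep)                                   # sample randomization
    return X[keep], y[keep]
```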

18 Data Preprocessing Filter useless data on the client side
e.g., data collected while the device is lying on a flat plane (see the sketch below)
Identify the motion state on the server
Each motion state has one corresponding classifier trained
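The client-side filter sketch referenced above; the flat/still test, thresholds, and helper name are illustrative guesses at how "the device is put on a flat plane" could be detected from raw sensor readings.

```python
# Hedged sketch: drop windows where the device lies still on a flat surface
# (gravity almost entirely on the z-axis, negligible rotation). Thresholds are guesses.
import numpy as np

GRAVITY = 9.81  # m/s^2

def is_flat_and_still(accel, gyro, accel_tol=0.5, gyro_tol=0.05):
    """accel, gyro: ndarrays of shape (n, 3) for one sensing window."""
    mean_accel = accel.mean(axis=0)
    flat = (abs(mean_accel[0]) < accel_tol and
            abs(mean_accel[1]) < accel_tol and
            abs(abs(mean_accel[2]) - GRAVITY) < accel_tol)
    still = gyro.std(axis=0).max() < gyro_tol
    return flat and still   # True -> window is discarded before upload
```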

19 Algorithm optimization
Given ε = 0.01, we vary C from 1 to 90,000 and γ from 0 to 0.1 and measure the model size. We choose the 80 users who generate the largest number of data samples in daily usage from our data set of 1,513 users, which is explained in detail in Section 5. As shown in Figure 4, the model size decreases with the value of C; once C exceeds 100, further increases yield only a tiny decrement in model size (cost C is plotted on a logarithmic scale). The model size reaches its minimum when γ equals 0.01.
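A hedged sketch of the parameter sweep just described, assuming an RBF-kernel SVM whose model size is proxied by the number of support vectors; mapping ε = 0.01 to scikit-learn's tol and using synthetic data in place of the 80 heavy users are assumptions.

```python
# Sweep C and gamma; use the support-vector count as a proxy for model size.
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=12, random_state=0)

for C in [1, 10, 100, 1_000, 10_000, 90_000]:
    for gamma in [0.001, 0.01, 0.1]:
        model = SVC(C=C, gamma=gamma, kernel="rbf", tol=0.01).fit(X, y)
        print(C, gamma, len(model.support_vectors_))   # fewer SVs -> smaller model
```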

20 Unlabeled Data: Semi-supervised Online Learning
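A minimal self-training sketch of one way such a semi-supervised online scheme could work: bootstrap from a small labeled seed set, then add confidently predicted unlabeled samples as pseudo-labels through online updates. RiskCog's actual confidence rule and update order are not specified here.

```python
# Hedged self-training sketch with online (partial_fit) updates.
import numpy as np
from sklearn.linear_model import SGDClassifier

def self_train(X_seed, y_seed, X_unlabeled, margin_threshold=1.0, rounds=5):
    clf = SGDClassifier(random_state=0).fit(X_seed, y_seed)
    pool = X_unlabeled
    for _ in range(rounds):
        if len(pool) == 0:
            break
        margin = clf.decision_function(pool)           # signed distance to boundary
        confident = np.abs(margin) >= margin_threshold
        if not confident.any():
            break
        pseudo_y = np.where(margin[confident] > 0, clf.classes_[1], clf.classes_[0])
        clf.partial_fit(pool[confident], pseudo_y)      # online update with pseudo-labels
        pool = pool[~confident]
    return clf
```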

21 Evaluation Data
Collected with "Phone manager" (手机管家) by Tencent
1st batch dataset: 210 users
2nd batch dataset: 1,513 users
Metrics
Accuracy
True positive: owner is correctly identified
False positive: other is incorrectly identified as owner
False negative: owner is incorrectly identified as other
True negative: other is correctly identified
Rowner = TP/(TP+FN), Rother = TN/(TN+FP) (see the sketch below)
ROC curve
Overhead
Robustness
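The metrics sketch referenced above, computing Rowner and Rother from a confusion matrix with label 1 for the owner and 0 for everyone else.

```python
# Rowner = TP/(TP+FN), Rother = TN/(TN+FP) from a confusion matrix.
from sklearn.metrics import confusion_matrix

def owner_other_rates(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)   # (Rowner, Rother)

# Toy example: three owner samples, two others.
print(owner_other_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # approx. (0.667, 0.5)
```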

22 Accuracy 1,513 users with full data
Collect 60 s per hour for 10 days (sampling rate 50 Hz)
Training set size : test set size = 4:1

23 Accuracy Data of 34 users from Alipay
Combined with the 1,513 users from Tencent as others' data
Training set size : test set size = 4:1
TP: 98.74%, TN: 92.02%

24 ROC Curve True positive rate vs. false positive rate
TPR = TP/(TP+FN), FPR = FP/(FP+TN)
Vary the classification threshold (0-1); see the sketch below
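A short sketch of the threshold sweep that produces the ROC curve, using illustrative scores; scikit-learn's roc_curve performs the same sweep.

```python
# Sweep the classification threshold over owner scores and record (FPR, TPR).
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 0, 0, 1, 0])               # 1 = owner, 0 = other
scores = np.array([0.9, 0.6, 0.4, 0.7, 0.8, 0.2])   # illustrative owner scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```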

25 Overhead Online verification (Client)
Client:
Phone | Battery (mAh/h) | Traffic (KB/h) | CPU (%) | Memory (MB)
MI 4 | 128.9/3080 | 58.9 | 1.3 | 14.3/2864.2
Samsung N9100 | 132.5/3000 | 56.1 | 10.3 | 14.4/2778.2
Sony Z2 | 113.8/3200 | 60.45 | 1.8 | 18.0/3072.0
Server latency (average over 1,513 users):
#Samples in training set | Training time (s) | #Samples in test set | Test time (s)
32,812 | | 8,330 | 2.459

26 Overhead Offline verification
Offline model size: about 200 KB
Procedure | Time (ms)
Data collection | 3211.6
Data preprocessing | 0.5
Feature extraction | 12.3
Decision | 13.3
Overall | 3237.7

27 Product demo: Android client application

28 Robustness Brute-force attack
The classifier model for each authorized owner is pre-trained
A set of 500K randomly generated samples (see the sketch below)
Percentage of samples detected as non-owner: 94.01%
Amount of data: the sampling frequency of our data collection mechanism is 50 Hz, and we only collect effective data for one minute per hour. Thus, a total of 72,000 data samples are gathered within one day, far fewer than the number generated by our brute-force attack.
Data coverage: manually handling the smartphone involves only a limited number of gestures. However, over the six values collected from the acceleration sensor and gyroscope sensor, our brute-force attack generates samples that are evenly distributed and fully cover the physically reasonable ranges.
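A hedged sketch of how the randomly generated attack samples could be produced: uniform draws over assumed physically reasonable ranges of the six sensor values. The ranges, and the clf and extract_features names in the trailing comment, are hypothetical.

```python
# Generate brute-force attack samples uniformly over assumed sensor ranges.
import numpy as np

rng = np.random.default_rng(0)
N = 500_000
accel = rng.uniform(-20.0, 20.0, size=(N, 3))   # m/s^2, assumed range
gyro = rng.uniform(-10.0, 10.0, size=(N, 3))    # rad/s, assumed range
attack_samples = np.hstack([accel, gyro])       # evenly covers the value space

# With a trained per-owner classifier clf and a feature extractor, the reported
# rejection rate would be computed roughly as:
#   rejected = (clf.predict(extract_features(attack_samples)) == 0).mean()
```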

29 Robustness Human attack A pre-trained classifier for the owner
3 participants handle the phone with various gestures
Each participant launches 10 attacks
Each attack lasts for 10 seconds
Percentage of samples detected as non-owner: 93.84%

30 Conclusion and Ongoing Work
RiskCog: the first device-level user identification system built on sensor data collected in the wild
Deploy on the phone to replace existing password/fingerprint authentication for apps
Enable offline detection
Port to smartwatches, where no other user authentication system is available yet

