Towards the privacy leakage and user fraud detection of Android applications
Zhengyang Qu
Northwestern University, IL, US


Outline
Introduction
Problem statement
Solutions
Conclusion

Android OS Dominance
Mobile OS Market Share, July 2014, by dazeinfo.com

Android Malware/Spyware

Source of Android Security Risks
Diverse mobile application marketplaces
Ease of deployment
Open nature of development
– Java is the primary language
– Alternatives: Java reflection, dynamic code loading (DCL): bytecode and native code

Outline
Introduction
Problem statement
Solutions
Conclusion

Architecture

Outline
Introduction
Problem statement
Solutions
Conclusion

Risk management in mobile payment

Motivations
The growing popularity of mobile payment
Attack surface of smartphone → user’s financial loss
Countermeasure:
– G1: authentication
– G2: risk management
Heavy usage of user privacy (location etc.)
Fragmentation

Goal
A learning-based mechanism for user fraud detection
– Least user privacy required, high detection accuracy
– High portability

Goal

Challenges
Lack of features
– Only based on acceleration sensor and gyroscope sensor
– Feature selection (6 values → 64 features)
Data availability
– Periodic data collection
– User motion detection
Imbalanced dataset
– Control of distribution of training set
– Random selection & stratified sampling
Surrounding noise
– Calibrate sensor data based on gravity direction
– Identify user motion state: sit or walk?
Unlabeled data
– Semi-supervised online learning
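The "random selection & stratified sampling" step for the imbalanced dataset could look roughly like the sketch below. It is only an illustration: the function name `stratified_sample`, the labels "owner"/"other", and the toy data are assumptions, not taken from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(samples, labels, per_class, seed=0):
    """Randomly draw up to `per_class` samples from each label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    picked = []
    for label in sorted(by_label):
        group = by_label[label]
        k = min(per_class, len(group))
        # Random selection inside each stratum keeps the class ratio under control.
        picked.extend((s, label) for s in rng.sample(group, k))
    rng.shuffle(picked)
    return picked

# Hypothetical imbalanced data: 20 "owner" samples vs. 200 "other" samples.
samples = list(range(220))
labels = ["owner"] * 20 + ["other"] * 200
train = stratified_sample(samples, labels, per_class=20)
```

With `per_class=20`, the resulting training set holds the same number of owner and non-owner samples, regardless of how skewed the raw data is.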

Data preprocessing
Filter the useless data on client
– -1.5 < X < 1.5 AND -1.5 < Y < 1.5 AND (9 < Z < 10 OR -10 < Z < -9)
Identify motion state on server
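A minimal sketch of the client-side filter, assuming the two Z ranges are alternatives (a single reading cannot fall in both) and that the thresholds describe a device lying flat and still. The function name `is_useless` is hypothetical.

```python
def is_useless(x, y, z):
    """Return True when an accelerometer reading matches the filter rule."""
    flat_xy = -1.5 < x < 1.5 and -1.5 < y < 1.5
    # Assumption: Z near +9..10 or -10..-9 means gravity is along the Z axis
    # (device face up or face down).
    gravity_on_z = 9 < z < 10 or -10 < z < -9
    return flat_xy and gravity_on_z

readings = [(0.1, -0.2, 9.8), (0.0, 0.3, -9.7), (3.2, 1.1, 7.5)]
kept = [r for r in readings if not is_useless(*r)]
```

Only the third reading survives, since the first two look like a phone resting on a table.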

Training set construction

ML algorithm selection
Algorithms compared: Decision Tree, kNN, Naïve Bayes, SVM
Criteria:
– Accuracy in general
– Speed of classification
– Tolerance to missing values
– Tolerance to irrelevant attributes
– Tolerance to redundant attributes
– Tolerance to noise
– Attempts for incremental learning
Kotsiantis, Sotiris B., I. Zaharakis, and P. Pintelas. "Supervised machine learning: A review of classification techniques." (2007): 3-24.

Semi-supervised online learning

Preliminary Evaluation
Metrics
– True positive (TP): owner is correctly identified
– False positive (FP): other is incorrectly identified as owner
– False negative (FN): owner is incorrectly identified as other
– True negative (TN): other is correctly identified
– Precision: TP / (TP + FP)
– Recall: TP / (TP + FN)
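For reference, precision and recall follow the standard definitions over the counts above. The sketch below uses made-up counts; the function name `precision_recall` is illustrative only.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for one user: 90 owner samples accepted,
# 30 non-owner samples wrongly accepted, 10 owner samples rejected.
p, r = precision_recall(tp=90, fp=30, fn=10)
```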

Accuracy
80 users; each user has 4K samples in training set and 1.2K samples in test set.
Average precision: 72.33%
Average recall: 73.49%

Robustness
Brute-force attack
– A set of 500K randomly generated samples
– Percentage of samples detected as not the owner
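The brute-force experiment can be sketched as below: feed random feature vectors to a per-user model and report the share that is rejected. Everything concrete here is an assumption for illustration: the value range, the `toy_model` stand-in classifier, and the helper name `rejection_rate`. The slides do not say how the random samples were drawn.

```python
import random

def rejection_rate(is_owner, n_samples=500_000, n_features=64, seed=0):
    """Fraction of random samples that the model labels as 'not the owner'."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_samples):
        # Assumption: features are drawn uniformly from [-1, 1].
        sample = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        if not is_owner(sample):
            rejected += 1
    return rejected / n_samples

def toy_model(sample):
    """Stand-in for a trained classifier; accepts only a narrow region."""
    return all(abs(v) < 0.5 for v in sample[:4])

# Smaller sample count than the 500K on the slide, to keep the demo fast.
rate = rejection_rate(toy_model, n_samples=10_000)
```

A higher rejection rate means a randomly guessing attacker is less likely to be accepted as the owner.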

DyDroid: Measuring dynamic code loading and its security in Android Applications

Motivation
Android allows developers to load external code dynamically
– ClassLoader: bytecode
– Java Native Interface (JNI): native code
Unpredictable, no security verification
Ineffective dynamic analysis system (Google Bouncer)

Motivation

Problems
Source
– Local/remote availability
– Responsible entity
Security benefits
– Obfuscation
Security risks/implications
– Vulnerabilities
– Privacy tracking
– Malware

Challenges
Dynamic code loading (DCL) recognition/interception
– Static analysis: false positive
– Dynamic analysis: high time latency
Obfuscation identification
– Bytecode encryption, loading interposed in app startup: general pattern?
Responsible entity analysis

DyDroid

DCL recognition/interception
Static analysis
– Check invocation of ClassLoader and JNI
Dynamic analysis
– Instrument system APIs: DexClassLoader, PathClassLoader, load, loadLibrary → complete mediation
– Path to loaded file, directory of ODEX code, call site class
– Android emulator based on QEMU
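The static pre-check ("does the app invoke ClassLoader or JNI loading at all?") might be approximated by scanning decompiled code for the API names listed above. This is a deliberately naive sketch: plain substring matching on source text is an assumption, and the helper `static_dcl_check` is hypothetical. A real static analysis would work on bytecode and, as the slides note, still produce false positives.

```python
# API names taken from the slide; matching them as substrings is an assumption.
DEX_MARKERS = ("DexClassLoader", "PathClassLoader")
NATIVE_MARKERS = ("System.loadLibrary", "System.load")

def static_dcl_check(code_text):
    """Flag whether decompiled code mentions bytecode or native loading APIs."""
    return {
        "dex": any(marker in code_text for marker in DEX_MARKERS),
        "native": any(marker in code_text for marker in NATIVE_MARKERS),
    }

snippet = 'new DexClassLoader(path, odexDir, null, parent); System.loadLibrary("foo");'
flags = static_dcl_check(snippet)
```

Apps flagged here would then be passed to the dynamic stage, where the instrumented APIs record what is actually loaded.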

Measurement summary
                       DEX               Native
Failure                11762 (30.58%)    5638 (20.84%)
Rewriting failure      618 (1.61%)       94 (0.35%)
Installation failure   30 (0.08%)        36 (0.13%)
No activity            2586 (6.72%)      2620 (9.68%)
Crash                  8528 (22.17%)     2888 (10.68%)
Exercised              26697 (69.42%)    21415 (79.16%)
Captured               19110 (49.69%)    16192 (59.85%)
Intercepted            462 (1.2%)        16185 (59.83%)

Source identification
Check loaded file with unzipped APK archive
Check call site class with application package name

         Remote           Local            Remote & Local   3rd-party        Own             3rd-party & Own
DEX      18986 (99.35%)   136 (0.71%)      35 (0.18%)       19089 (99.89%)   433 (2.27%)     412 (2.16%)
Native   93 (0.57%)       16151 (99.75%)   52 (0.32%)       14578 (90.03%)   2372 (14.65%)   758 (4.68%)
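The two checks on this slide can be sketched as follows: a loaded file counts as local if identical bytes exist inside the APK archive, and the responsible entity is the app itself if the call-site class lives under the app's package name. Comparing SHA-256 digests and the helper names `file_origin` and `responsible_entity` are assumptions; files that are stored encrypted or transformed inside the APK would not match with this simple comparison.

```python
import hashlib
import io
import zipfile

def file_origin(apk_bytes, loaded_bytes):
    """Return 'local' if the loaded file is byte-identical to an APK entry."""
    digest = hashlib.sha256(loaded_bytes).hexdigest()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if hashlib.sha256(apk.read(name)).hexdigest() == digest:
                return "local"
    return "remote"

def responsible_entity(call_site_class, package_name):
    """Return 'own' if the call site is under the app's package, else '3rd-party'."""
    return "own" if call_site_class.startswith(package_name + ".") else "3rd-party"

# Build a tiny in-memory archive standing in for an APK (hypothetical content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("assets/plugin.dex", b"dex-bytes")
apk = buf.getvalue()

origin_a = file_origin(apk, b"dex-bytes")
origin_b = file_origin(apk, b"downloaded-later")
entity = responsible_entity("com.adsdk.Loader", "com.example.app")
```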

Obfuscation
Technique            #Apps (%)         With DCL (%)
Lexical              89934 (89.95%)    35.47%
Reflection           82629 (82.64%)    38.27%
Native               16192 (16.20%)    100%
DEX encryption       127 (0.13%)       100%
Anti-decompilation   125 (0.13%)       N/A

Unknown malware variant detection
87 Apps found to load malicious code from 91 files

         Family                    #Apps   Sample App (#Download)
DEX      Swiss code monkeys        1       com.sktelecom.hoppin.mobile (10,000,000)
DEX      Adware airpush minimob    2       com.oshare.app (10,000)
Native   Chathook ptrace           84      com.com2us.tinyfarm.normal.freefull.google.global.android.common (10,000,000)

Vulnerabilities
The file dynamically loaded is writable by other parties

         Category                         #Apps   Sample App (#Download)
DEX      Internal storage of other Apps   12      com.keerby.mp3gain (100,000)
DEX      External storage                 5       com.fkccy.view (100,000)
Native   Internal storage of other Apps   10      fr.ikomobi.auchandrive (100,000)
Native   External storage                 0
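One way to picture the vulnerability check is to classify the path of each dynamically loaded file by who else could write to it. The directory prefixes below (`/data/data/<package>/`, `/sdcard/`, `/mnt/sdcard/`, `/storage/`) are common Android conventions used here as assumptions, and `load_path_category` is a hypothetical helper, not DyDroid's actual rule set.

```python
EXTERNAL_PREFIXES = ("/sdcard/", "/mnt/sdcard/", "/storage/")

def load_path_category(path, package_name):
    """Classify where a dynamically loaded file lives relative to the app."""
    if path.startswith(f"/data/data/{package_name}/"):
        return "own internal storage"
    if path.startswith("/data/data/"):
        # Another app's private directory: that app can replace the file.
        return "internal storage of other apps"
    if path.startswith(EXTERNAL_PREFIXES):
        # Shared storage: other apps may be able to write here.
        return "external storage"
    return "unknown"

category = load_path_category("/sdcard/plugins/a.dex", "com.example.app")
```

Only the last two categories correspond to the rows in the table above; a file in the app's own internal storage is not writable by other apps under the normal sandbox.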

Privacy tracking
Mark sensitive APIs and content providers as source, total 19 types of privacy

Type             #Apps (%)       Exclusively 3rd-party (%)
Location         276 (59.74%)    99.64%
IMEI             220 (47.62%)    99.55%
Phone number     29 (6.28%)      100%
Installed apps   98 (21.21%)     98.98%
Contact          68 (14.72%)     98.53%
Calendar         131 (28.35%)    100%
Image            124 (26.84%)    99.19%
…

Publication List
Zhengyang Qu, V. Rastogi, X. Zhang, Y. Chen, T. Zhu, Z. Chen, “AutoCog: Measuring the Description-to-permission Fidelity in Android Applications”, in ACM CCS 2014 (114/585 = 19.5%).
V. Rastogi, Zhengyang Qu, J. McClurg, Y. Cao, Y. Chen, W. Zhu, P. Xu, W. Chen, “Uranine: Real-time Privacy Leakage Detection and Prevention without System Modification for Android”, in SecureComm 2015 (30/108 = 27.8%).
Zhengyang Qu, G. Guo, Z. Shao, V. Rastogi, Y. Chen, H. Chen, W. Hong, “AppShield: A Proxy-based Data Access Mechanism in Enterprise Mobility Management”, under submission.
Zhengyang Qu, S. Alam, Y. Chen, X. Zhou, W. Hong, R. Riley, “DyDroid: Measuring Dynamic Code Loading and Its Security Implications in Android Applications”, under submission.
S. Alam, Zhengyang Qu, R. Riley, Y. Chen, V. Rastogi, “DroidNative: Semantic-Based Detection of Android Native Code Malware”, under submission.

Thank you! Questions?

Android Security Risks
Diagram labels: User, Smartphone, App marketplace, Download, Meta data, App usage
DCL: DyDroid
Malware: DroidNative
Bring-your-own-device: AppShield
Payment: Mobile Risk Management
Privacy: Uranine
User expectation vs. Permission: AutoCog

AutoCog: Measuring the description-to-permission fidelity of Android applications

Motivations
Android Permission System
– Access control by permission system
– Few users can understand security implications from requested permissions
User expectation vs. Application Behavior
– User expectation based on application description
– Permission defines application behavior
– Assess how well permissions align with the description

Challenges & Contributions
Inferring description semantics
– Similar meaning may be conveyed in a vast diversity of natural language text
– “friends”, “contact list”, “address book”
Correlating description semantics with permission semantics
– A number of functionalities described may map to the same permission
– “enable navigation”, “display map”, “find restaurant nearby”
1. Leverage state-of-the-art NLP techniques
2. Design a learning-based algorithm

System Overview

Ontology modeling
Logical dependency between verb phrase and noun phrase
– for CAMERA, for RECORD_AUDIO
Logical dependency between noun phrases
–,
Noun phrase with possessive
–,

Description Semantics Model (Contribution 1)
Extract Abstract Semantics
Explicit Semantic Analysis (ESA)
– Computes the semantic relatedness of texts
– Leverages a big document corpus (Wikipedia) as the knowledge base and constructs a vector representation
– Advantages: rich semantic information, quantitative representation of semantics
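In ESA, a text is represented as a weighted vector over knowledge-base concepts (Wikipedia articles), and relatedness is commonly measured as the cosine between two such vectors. The sketch below shows only that last step; the concept names and weights are invented for illustration, and building real ESA vectors requires an index over the corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dict: concept -> weight)."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy concept vectors (made-up weights, not real ESA output).
contact_list = {"Address book": 0.8, "Telephone directory": 0.6}
address_book = {"Address book": 0.9, "Contact manager": 0.3}
display_map = {"Cartography": 1.0}

related = cosine(contact_list, address_book)
unrelated = cosine(contact_list, display_map)
```

Phrases that share concepts, such as "contact list" and "address book", score higher than phrases with no concept in common, which gives the quantitative representation the slide refers to.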

Description-to-Permission Relatedness (DPR) Model (Contribution 2)
Learning-based method
– Input: application permission, application description
– Output: correlated with each sensitive permission

Samples in DPR Model
Permission, Semantic Patterns
WRITE_EXTERNAL_STORAGE
ACCESS_FINE_LOCATION
ACCESS_COARSE_LOCATION
GET_ACCOUNTS
RECEIVE_BOOT_COMPLETED
CAMERA
READ_CONTACTS
RECORD_AUDIO
WRITE_SETTINGS
WRITE_CONTACTS
READ_CALENDAR

Learning Algorithm for DPR
S1: Grouping noun phrases
– Create semantic relatedness score matrix
S2: Selecting Noun Phrases Correlated with Permissions
– Not biased to frequently occurring noun phrases
– Jointly consider conditional probabilities: P(perm | np) and P(np | perm)
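The two conditional probabilities in step S2 can be estimated by counting over a training set of apps. The sketch below is a toy illustration: the five-app corpus and the helper `conditional_probs` are assumptions, and the slides do not state how the two values are combined into a final score, so that part is left out.

```python
def conditional_probs(apps, perm, noun_phrase):
    """Estimate P(perm | np) and P(np | perm) from (permissions, noun_phrases) pairs."""
    with_perm = sum(1 for perms, nps in apps if perm in perms)
    with_np = sum(1 for perms, nps in apps if noun_phrase in nps)
    both = sum(1 for perms, nps in apps if perm in perms and noun_phrase in nps)
    p_perm_given_np = both / with_np if with_np else 0.0
    p_np_given_perm = both / with_perm if with_perm else 0.0
    return p_perm_given_np, p_np_given_perm

# Hypothetical corpus: each app is (requested permissions, noun phrases in its description).
apps = [
    ({"CAMERA"}, {"photo", "app"}),
    ({"CAMERA"}, {"photo"}),
    ({"CAMERA"}, {"app"}),
    (set(), {"app"}),
    (set(), {"app"}),
]

photo = conditional_probs(apps, "CAMERA", "photo")
app = conditional_probs(apps, "CAMERA", "app")
```

Here "app" occurs more often than "photo", yet P(CAMERA | "app") is only 0.5 while P(CAMERA | "photo") is 1.0. Looking at both probabilities is what keeps a frequent but uninformative noun phrase from being selected.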

Learning Algorithm for DPR (cont’d)
S3: Pairing np-counterpart with Noun Phrase
– “Retrieve Running Apps permission is required because, if the user is not looking at the widget actively (for e.g. he might using another app like Google Maps)”

Accuracy Comparison
System       Precision (%)   Recall (%)   F-score (%)   Accuracy (%)
AutoCog
Whyper [1]
[1] Whyper, Pandita et al., USENIX Security 2013