Download presentation
Presentation is loading. Please wait.
Published byIrene Walton Modified over 9 years ago
1
WHYPER: Towards Automating Risk Assessment of Mobile Applications Rahul Pandita, Xusheng Xiao, Wei Yang, William Enck, and Tao Xie ♠ Department of Computer Science North Carolina State University ♠ University of Illinois at Urbana-Champaign 0
2
Application Markets Apple App Store Google Play Microsoft Windows Phone “Application markets have played an important role in the popularity of smartphones and mobile devices.” 1
3
o Apple (Market’s Responsibility) o Apple performs manual inspection o Google (User’s Responsibility) o Users approve permissions for security/privacy o Bouncer (static/dynamic malware analysis) o Windows Phone (Hybrid) o Permissions / manual inspection Predominant approaches towards Market Security/Privacy 2
4
o Previous approaches look at permissions, code, and runtime behaviors o Caveat: what does the users expect? o GPS Tracker: record and send location o Phone-Call Recorder: record audio during phone call o One-Click Root: exploit a privilege escalation vulnerability o Others are more subtle Is Program Analysis sufficient? 3
5
o A first step in this direction o Focus on permission and application descriptions o permissions protecting user understandable resources should be discussed o low-level system permissions are unlikely to be mentioned Vision “Bridging the gap between user expectation and app behaviors” 4
6
WHYPER Overview 5
7
o Enhance user experience o while installing Apps o Functionality disclosure o enforce on part of developers o Complementing program analysis o to ensure more appropriate justifications Use Cases 6
8
Solution 7
9
Simple Solution? Keyword-based search on application descriptions Photo Credit: Ahora estoy en via Flickr 8
10
Problems with Ctrl + F o Confounding effects: o Certain keywords such as “contact” have a confounding meaning. o For instance, “... displays user contacts,...” vs “... contact me at abc@xyz.com”.abc@xyz.com o Semantic Inference: o Sentences often describe a sensitive operation such as reading contacts without actually referring to keyword “contact”. o For instance, “share yoga exercises with your friends via email, sms”. 9
11
NLP techniques help computers understand NL artifacts NLP is still difficult NLP on domain specific sentences with specific styles is feasible Natural Language Processing (NLP) 10
12
o Parts Of Speech (POS) Tagging o E.g., noun, verb, prepositions… o Phrase and Clause Parsing o E.g., noun phrases (basketball players) and verb phrases (make sure)… o Stanford-Typed Dependencies o E.g., subject, object, adverbial modifiers… o Named Entity Recognition o E.g., ‘Pandora Internet Radio’ is a name, ‘$5’ refers to a currency amount… NLP Preliminaries 11
13
WHYPER Framework APP Description APP Permission Semantic Graphs Preprocessor Intermediate Representation Generator Semantic Engine NLP Parser Semantic Graph Generator API Docs Annotated Description FOL Representation WHYPER Preprocessor 12
14
o Period Handling o Decimals, ellipsis, shorthand notations ( Mr., Dr. ) o Sentence Boundaries o Tabs, bullet points, delimiters (:) o Symbols (*,-) and enumeration sentence o Named Entity Handling o E.g., “ Pandora internet radio ”, o Abbreviation Handling o E.g., “ Instant Message (IM)” Preprocessor 13
15
WHYPER Framework APP Description APP Permission Semantic Graphs Preprocessor Intermediate Representation Generator Semantic Engine NLP Parser Semantic Graph Generator API Docs Annotated Description FOL Representation WHYPER 14
16
“Also you can share the yoga exercise to your friends via Email and SMS.” AlsoyoucanshareyogaexercisetoyourfriendsviaEmailandSMS Also you can share exercise your friends Email SMS VB RB PRPMDNNDTNNNNSPRPNNP the yoga advmod nsubj aux dobj det nn prep_to poss prep_via conj_and the share to you yoga exercise owned you via friends and email SMS 15
17
WHYPER Framework APP Description APP Permission Semantic Graphs Preprocessor Intermediate Representation Generator Semantic Engine NLP Parser Semantic Graph Generator API Docs Annotated Description FOL Representation WHYPER 16
18
“Also you can share the yoga exercise to your friends via Email and SMS.” share to you yoga exercise owned you via friends and email SMS email share WordNet Similarity 17
19
o WHY o to perform deep semantic analysis o HOW o infer graphs from API documents Semantic-Graph Generator 18
20
o Systematic approach to infer graphs o Find related API documents based on PScout [CCS 2012] o Identify resource associated with the permissions from the API class name oContactsContract.Contacts o Inspect the member variables and member methods to identify actions and subordinate resources oContactsContract.CommonDataKinds.Email Semantic-Graph Generator 19
21
o Subjects o Permissions: o READ_CONTACTS o READ_CALENDAR o RECORD_AUDIO o 581/600* application descriptions (only English descriptions) o 9,953 sentences o Research Questions (RQs) o RQ1: What are the precision, recall and F-Score of WHYPER in identifying permission sentences? o RQ2: How effective WHYPER is in identifying permission sentences, compared to keyword-based searching ? Evaluation 20
22
Permissions#N#SSpSp READ_CONTACTS1903,379235 READ_CALENDAR1912,752283 RECORD_AUDIO2003,822245 TOTAL5819,953763 Statistics of Subject Applications 21
23
o True Positive (TP): o WHYPER( sentence ) && Manual( sentence ) o False Positive (FP): o WHYPER( sentence ) && ! Manual( sentence ) o True Negative (TN): o ! WHYPER( sentence ) && ! Manual( sentence ) o False Negative (FN): o ! WHYPER( sentence ) && Manual( sentence ) Classification 22
24
RQ1 Results: Effectiveness of WHYPER Low FPs and FNs out of 9,061 sentences, only 129 are flagged as FPs among 581 applications, 109 applications (18.8%) contain at least one FP among 581 applications, 86 applications (14.8%) contain at least one FN PermissionSISI TPFPFNTNPrec.RecallF-ScoreAcc READ_CONTACTS20418618492,93091.279.284.897.9 READ_CALENDAR28824147422,42283.785.284.596.8 RECORD_AUDIO25919564503,47075.379.677.497.0 TOTAL7516221291419,06182.881.582.297.3 23
25
PermissionKeywords READ_CONTACTS contact, data, number, name, email READ_CALENDAR calendar, event, date, month, day, year RECORD_AUDIO record, audio, voice, capture, microphone RQ2 Results: Comparison to Keyword-based Search 24
26
RQ2 Results: Comparison to Keyword-based Search PermissionDelta Precision Delta Recall Delta F-score Delta Accuracy READ_CONTACTS50.41.331.27.3 READ_CALENDAR39.31.526.49.2 RECORD_AUDIO36.9-6.624.36.8 WHYPER Improvement41.6-1.227.27.7 25
27
Incorrect parsing “MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB” Synonym analysis “You can now turn recordings into ringtones.” Result Analysis (False Positives) 26
28
Incorrect parsing Incorrect identification of sentence boundaries and limitations of underlying NLP infrastructure Limitations of Semantic Graphs Manual Augmentation microphone-blow into and call-record significant improvement of Delta Recalls: -6.6% to 0.6% Automatic mining from user comments and forums Result Analysis (False Negatives) 27
29
Generalization to other permissions user-understandable permissions: calls, SMS location and phone identifiers internet Discussions 28
30
Conclusion We propose the use of NLP techniques to help bridge the semantic gap between what mobile applications do and what users expect them to do. Our evaluation demonstrates an improvement over a simple keyword-based searching. 29
31
Thank You 30
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.