Tao Xie, University of Illinois at Urbana-Champaign

Mobile App Markets
– Apple App Store
– Google Play
– Microsoft Windows Phone

App Store beyond Mobile Apps!

What If Formal Specs Are Written?!
APP DEVELOPERS
– App Functional Requirements (informal: app description, etc.)
– App Security Requirements (informal: permission list, etc.)
APP USERS
– User Functional Requirements
– User Security Requirements

Informal App Functional Requirements: App Description
[Diagram: App Code, App Permissions]

App Security Requirements: Permission List

Example Android App: Angry Birds

What If Formal Specs Are Written?!
APP DEVELOPERS
– App Functional Requirements (informal: app description, etc.)
– App Security Requirements (informal: permission list, etc.)
APP USERS
– User Functional Requirements
– User Security Requirements
In reality, few of these requirements are (formally) specified!!
Hope?!: Bring human into the loop: user perception + judgment

Our Yin-Yang View on Mobile App Security
User-Perceived Information [functional]: App Description, App UIs, App categories, App metadata, User forums, …
App Security Behavior [security]: App Code, App Permissions
o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Assuring Market Security/Privacy
o Apple (Market’s Responsibility)
   o Apple performs manual inspection
o Google (User’s Responsibility)
   o Users approve permissions for security/privacy
   o Bouncer (static/dynamic malware analysis)
o Windows Phone (Hybrid)
   o Permissions / manual inspection

Need More Than Program Analysis
o Previous approaches look at permissions ↔ code (runtime behaviors)
o What do users expect?
   o GPS Tracker: record and send location
   o Phone-Call Recorder: record audio during phone call
[Diagram: App Description, App Code, App Permissions]

Vision: “Bridging the gap between user expectation and app behaviors”
o User expectations
   o user perception + user judgment
o Focus on permission ↔ app descriptions
   o permissions (protecting user understandable resources) should be discussed
[Diagram: App Description Sentence, Permission, Linkage]

WHYPER Overview
o Enhance user experience while installing apps
o Enforce functionality disclosure on developers
o Complement program analysis to ensure justifications
Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security

Example Sentence in App Desc.
E.g., “Also you can share the yoga exercise to your friends via and SMS.”
– Implication of using the contact permission
– Permission sentences
Keyword-based search on application descriptions

Problems with Ctrl + F
Confounding effects:
– Certain keywords such as “contact” have a confounding meaning
– E.g., “... displays user contacts, ...” vs “... contact me at ...”
Semantic inference:
– Sentences often describe a sensitive operation without actually referring to keywords
– E.g., “share yoga exercises with your friends via and SMS”
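
The two failure modes above can be reproduced with a few lines of Python. This is only a sketch of naive keyword search, not the baseline used in the evaluation: the keyword list, the function name `keyword_match`, and the three test sentences (loosely adapted from the fragments on the slide) are assumptions made for illustration.

```python
import re

# Hypothetical keyword list for the contacts permission (illustrative only).
CONTACT_KEYWORDS = ["contact", "number", "name"]

def keyword_match(sentence, keywords=CONTACT_KEYWORDS):
    """Return True if any keyword occurs as a whole word (plural allowed)."""
    text = sentence.lower()
    return any(re.search(r"\b" + re.escape(k) + r"s?\b", text) for k in keywords)

sentences = [
    "The app displays user contacts.",                  # really about the contact list
    "If you find a bug, contact me.",                   # confounding use of "contact"
    "Share yoga exercises with your friends via SMS.",  # sensitive, but no keyword
]

# One flag per sentence: the second is a false positive, the third a false negative.
flags = [keyword_match(s) for s in sentences]
print(flags)
```

The second sentence is flagged although it says nothing about the contact list, and the third is missed although sharing with friends implies reading contacts, which is the gap that semantic analysis is meant to close.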

Natural Language Processing
Natural Language Processing (NLP) techniques help computers understand NL artifacts
In general, NLP is still difficult
NLP on domain specific sentences with specific styles is feasible
– Text2Policy: extraction of security policies from use cases [FSE 12]
– APIInfer: inferring contracts from API docs [ICSE 12]
– WHYPER: domain knowledge from API docs [USENIX Security 13]

Overview of WHYPER
[Diagram components: APP Description, APP Permission, API Docs, Preprocessor, NLP Parser, Intermediate Representation Generator, Semantic Graph Generator, Semantic Engine, Semantic Graphs, Domain Knowledge, FOL Representation, Annotated Description]

Preprocessor
Period Handling
– Decimals, ellipsis, shorthand notations (Mr., Dr.)
Sentence Boundaries
– Tabs, bullet points, delimiters (:)
– Symbols (*,-) and enumeration sentence
Named Entity Handling
– E.g., “Pandora internet radio”
Abbreviation Handling
– E.g., “Instant Message (IM)”
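
The period-handling step can be sketched as follows. This is a minimal illustration rather than the WHYPER preprocessor itself: the shorthand list, the placeholder token, and the function name `split_sentences` are assumptions, and bullet or enumeration handling is left out.

```python
import re

# Hypothetical shorthand list; a real preprocessor would need a larger one.
SHORTHAND = ["Mr.", "Dr."]
PLACEHOLDER = "<DOT>"

def split_sentences(text):
    """Split text into sentences without breaking on non-terminal periods."""
    # Protect periods inside shorthand notations such as "Dr."
    for s in SHORTHAND:
        text = text.replace(s, s.replace(".", PLACEHOLDER))
    # Protect decimal points such as "4.5"
    text = re.sub(r"(?<=\d)\.(?=\d)", PLACEHOLDER, text)
    # Protect ellipses
    text = text.replace("...", PLACEHOLDER * 3)
    # Split on the remaining sentence-ending punctuation followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text)
    # Restore the protected periods
    return [p.replace(PLACEHOLDER, ".").strip() for p in parts if p.strip()]

result = split_sentences(
    "Dr. Smith rated it 4.5 stars. Try it now... you will like it. Done!"
)
print(result)
```

Protecting the periods first and splitting afterwards keeps the splitting rule itself simple, at the cost of maintaining the shorthand list by hand.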

Intermediate-Representation Generator
[Figure: the sentence “Also you can share the yoga exercise to your friends via and SMS” annotated with part-of-speech tags (RB, PRP, MD, VB, DT, NN, NNS, NNP) and typed dependencies (advmod, nsubj, aux, dobj, det, nn, prep_to, poss, prep_via, conj_and), then converted into a tree rooted at the predicate “share” with governing and dependent entities (you, yoga exercise, friends, SMS)]

Semantic Engine
[Figure: the intermediate representation rooted at “share” (governing and dependent entities) is compared, using WordNet similarity, against a semantic graph inferred from API docs]

Semantic-Graph Generator
Systematic approach to infer graphs
o Identify resource associated with the permissions from the API class name
   o ContactsContract.Contacts
o Inspect the member variables and member methods to identify actions and subordinate resources
   o ContactsContract.CommonDataKinds.

Evaluation
Subjects
– Permissions: READ_CONTACTS, READ_CALENDAR, RECORD_AUDIO
– 581 application descriptions
– 9,953 sentences
Evaluation setup
– Manual annotation of the sentences
– WHYPER for identifying permission sentences
– Comparison to keyword-based searching

Evaluation Results
Precision and recall of WHYPER
– Average precision (82.8%) and recall (81.5%)
Comparison to keyword-based searching
– Improving precision (41.6%) and recall (-1.2%)
– E.g., microphone-blow into and call-record

Permission      Keywords
READ_CONTACTS   contact, data, number, name,
READ_CALENDAR   calendar, event, date, month, day, year
RECORD_AUDIO    record, audio, voice, capture, microphone
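
For readers who want to recompute such figures from their own annotations, the metrics follow the standard definitions. The sketch below uses made-up counts (80 true positives, 20 false positives, 20 false negatives), not the counts from this evaluation, and the function names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Illustrative counts only; they are not taken from the slides.
p, r = precision_recall(tp=80, fp=20, fn=20)
f = f_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
```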

Access Control Policies (ACP) in Requirements Document
Access control is often governed by security policies called Access Control Policies (ACP)
– Includes rules to control which principals have access to which resources
A policy rule includes four elements
– Subject: HCP
– Action: edit
– Resource: patient's account
– Effect: deny
ex. “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”
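
The four-element rule maps naturally onto a small record type. The sketch below is one possible encoding, not the Text2Policy model: the class name `AcpRule`, the helper `is_allowed`, the "permit" effect value, and the deny-overrides behaviour are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcpRule:
    subject: str
    action: str
    resource: str
    effect: str  # "permit" or "deny" (value names are assumptions)

def is_allowed(rules, subject, action, resource):
    """Assumed policy: any matching deny wins; otherwise an explicit permit is required."""
    matching = [
        r for r in rules
        if (r.subject, r.action, r.resource) == (subject, action, resource)
    ]
    if any(r.effect == "deny" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)

# The rule from the example sentence on the slide.
rules = [AcpRule("HCP", "edit", "patient's account", "deny")]
print(is_allowed(rules, "HCP", "edit", "patient's account"))
```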

Overview of Text2Policy
Linguistic Analysis -> Model-Instance Construction -> Transformation
An HCP should not change patient’s account.
An [subject: HCP] should not [action: change] [resource: patient’s account].

ACP Rule
Subject   Action            Resource            Effect
HCP       UPDATE - change   patient’s account   deny

Xiao et al. Automated Extraction of Security Policies from Natural-Language Software Documents. FSE

Example Technical Challenges in ACP Extraction
Semantic Structure Variance
– different ways to specify the same rule
Negative Meaning Implicitness
– verb could have negative meaning
ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Road Ahead: Yin-Yang View
User-Perceived Information [functional]: App Description, App UIs, App categories, App metadata, User forums, …
App Security Behavior [security]: App Code, App Permissions
o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Text Analytics for Mobile App Security and Beyond
[Diagram: App Description, App Code, App Permissions; App UIs, App categories, App metadata, User forums, …]
Acknowledgments: Supported in part by NSA Science of Security (SoS) Lablet, NSF SaTC, NSF SHF, NSF CAREER

Problems with Ctrl + F
o Confounding effects:
   o Certain keywords such as “contact” have a confounding meaning.
   o For instance, “... displays user contacts, ...” vs “... contact me at ...”
o Semantic Inference:
   o Sentences often describe a sensitive operation such as reading contacts without actually referring to keyword “contact”.
   o For instance, “share yoga exercises with your friends via , sms”.

Natural Language Processing (NLP)
o NLP techniques help computers understand NL artifacts
o NLP is still difficult
o NLP on domain specific sentences with specific styles is feasible

RQ1 Results: Effectiveness of WHYPER
Low FPs and FNs
– out of 9,061 sentences, only 129 are flagged as FPs
– among 581 applications, 109 applications (18.8%) contain at least one FP
– among 581 applications, 86 applications (14.8%) contain at least one FN
[Table: rows READ_CONTACTS, READ_CALENDAR, RECORD_AUDIO, TOTAL; columns SI, TP, FP, FN, TN, Prec., Recall, F-Score, Acc]

Result Analysis (False Positives)
Incorrect parsing
– “MyLink Advanced provides full synchronization of all Microsoft Outlook s (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB”
Synonym analysis
– “You can now turn recordings into ringtones.”

Result Analysis (False Negatives)
Incorrect parsing
– Incorrect identification of sentence boundaries and limitations of underlying NLP infrastructure
Limitations of Semantic Graphs
– Manual Augmentation
   – microphone-blow into and call-record
   – significant improvement of Delta Recalls: -6.6% to 0.6%
– Automatic mining from user comments and forums

Linguistic Analysis
Incorporate syntactic and semantic analysis
– syntactic structure -> noun group, verb group, etc.
– semantic meaning -> subject, action, resource, negative meaning, etc.
Provide new techniques for model extraction
– Identify ACP and AS (action-step) sentences
– Infer semantic meaning

Common Techniques
– Shallow parsing
– Domain dictionary
– Anaphora resolution
ex. An HCP can view patient’s account. He is disallowed to change the patient’s account.
[Figure: the example annotated with Subject, Main Verb Group, and Object chunks (NP, VG), the action UPDATE, and “He” resolved to HCP]

Technical Challenges (TC) in ACP Extraction
TC1: Semantic Structure Variance
– different ways to specify the same rule
TC2: Negative Meaning Implicitness
– verb could have negative meaning
ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Semantic-Pattern Matching
Address TC1 Semantic Structure Variance
Compose pattern based on grammatical function
ex. An HCP is disallowed to change the patient’s account.
– passive voice followed by to-infinitive phrase
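
As a rough illustration of the "passive voice followed by to-infinitive phrase" pattern, the sketch below uses a regular expression as a stand-in for the grammatical-function patterns that a shallow parser would supply. The regex, the group names, and the function `match_passive_infinitive` are assumptions; they handle only sentences shaped like the example.

```python
import re

# Stand-in for a grammatical-function pattern:
#   <subject> is <past participle> to <verb> <resource>
PASSIVE_TO_INFINITIVE = re.compile(
    r"^(?:An?|The)\s+(?P<subject>.+?)\s+is\s+(?P<main_verb>\w+ed)"
    r"\s+to\s+(?P<action>\w+)\s+(?P<resource>.+?)\.?$",
    re.IGNORECASE,
)

def match_passive_infinitive(sentence):
    """Return the matched elements as a dict, or None if the pattern does not apply."""
    m = PASSIVE_TO_INFINITIVE.match(sentence.strip())
    return m.groupdict() if m else None

elements = match_passive_infinitive(
    "An HCP is disallowed to change the patient's account."
)
print(elements)
```

Sentences in active voice, such as "An HCP cannot change patient's account.", do not match this pattern and would need a pattern of their own, which is the structure variance that TC1 describes.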

Negative-Expression Identification
Address TC2 Negative Meaning Implicitness
Negative expression
– “not” in subject:
   ex. No HCP can edit patient’s account.
– “not” in verb group:
   ex. HCP can not edit patient’s account. HCP can never edit patient’s account.
Negative meaning words in main verb group
   ex. An HCP is disallowed to change the patient’s account.
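
The three checks above can be sketched as a small effect-inference function. It assumes the sentence has already been chunked into a subject and a main verb group, and the word lists and the name `infer_effect` are hypothetical; double negation is not handled.

```python
import re

# Hypothetical word lists; a real domain dictionary would be larger.
NEGATIVE_IN_SUBJECT = {"no"}
NEGATIVE_IN_VERB_GROUP = {"not", "never", "cannot"}
NEGATIVE_MEANING_VERBS = {"disallowed", "prohibited", "forbidden"}

def infer_effect(subject, verb_group):
    """Return "deny" if a negative expression or negative verb is found, else "permit"."""
    subject_tokens = re.findall(r"[a-z']+", subject.lower())
    verb_tokens = re.findall(r"[a-z']+", verb_group.lower())
    if any(t in NEGATIVE_IN_SUBJECT for t in subject_tokens):
        return "deny"
    if any(t in NEGATIVE_IN_VERB_GROUP for t in verb_tokens):
        return "deny"
    if any(t in NEGATIVE_MEANING_VERBS for t in verb_tokens):
        return "deny"
    return "permit"

examples = [
    ("No HCP", "can edit"),
    ("HCP", "can not edit"),
    ("HCP", "can never edit"),
    ("An HCP", "is disallowed to change"),
    ("An HCP", "can view"),
]
effects = [infer_effect(s, v) for s, v in examples]
print(effects)
```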

AS: Syntactic-Pattern Matching
Syntactic elements
– Subject, Main verb, Object
Subject and Object Checking
– subject is not a user or object is not a resource
Filtering negative-meaning sentences
– Negative sentences tend not to describe ASs
ex. The prescription list should include medication, the name of the doctor...

ACP Model-Instance Construction
ex. An HCP is disallowed to change the patient’s account.
Identify subject, action, and resource:
– Subject: HCP
– Action: change
– Resource: patient’s account
Infer effect:
– Negative Expression: none
– Negative Verb: disallow
– Inferred Effect: deny

ACP Rule
Subject   Action            Resource            Effect
HCP       UPDATE - change   patient’s account   deny

AS Model-Instance Construction
Use case patterns
– industry use cases [DSN’09]
– public use cases
Model-Instance Construction
ex. The patient views access log.

Action Step
Actor     Action          Resource
patient   OUTPUT – view   access log

Technical Challenges in Action-Step Extraction
TC4: Transitive Subject
AS 1: He edits the account. (He = HCP)
AS 2: The system updates the account.
TC5: Perspective Variance
AS 3: The system displays the updated account. -> HCP views the updated account.

Subject Flow Tracking
Address TC4 Transitive Subject
Apply data flow to track non-system subject:
AS 1: The HCP edits the account.
AS 2: The system updates the account.
– Tracking: the non-system subject of AS 1 (HCP)
– Only system as subject (AS 2): replaced with HCP as subject
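
A minimal sketch of this tracking step, assuming action steps have already been extracted as (subject, action, resource) records. The `ActionStep` type, the function `track_subjects`, and the convention that the system is named "system" are assumptions made for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ActionStep:
    subject: str
    action: str
    resource: str

def track_subjects(steps, system="system"):
    """Replace a system subject with the most recent non-system subject."""
    last_actor = None
    tracked = []
    for step in steps:
        if step.subject.lower() != system:
            last_actor = step.subject          # remember the latest human actor
            tracked.append(step)
        elif last_actor is not None:
            tracked.append(replace(step, subject=last_actor))
        else:
            tracked.append(step)               # no actor seen yet; leave unchanged
    return tracked

steps = [
    ActionStep("HCP", "edits", "the account"),
    ActionStep("system", "updates", "the account"),
]
tracked = track_subjects(steps)
print(tracked)
```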

Perspective Conversion
Address TC5 Perspective Variance
Apply data flow to track non-system subject:
AS 1: The HCP edits the account.
AS 2: The system shows the updated account.
– Tracking: the non-system subject of AS 1 (HCP)
– Only system as subject and action is output (AS 2): converting to “HCP views the updated account”
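
The conversion can be sketched in the same style, here with plain tuples. The set of output verbs, the target verb "views", and the function name `convert_perspective` are assumptions; steps whose action is not an output action are left unchanged by this sketch.

```python
# Hypothetical set of verbs treated as output actions.
OUTPUT_VERBS = {"shows", "displays"}

def convert_perspective(steps, system="system"):
    """Rewrite 'system outputs X' as '<last actor> views X'."""
    last_actor = None
    converted = []
    for subject, verb, resource in steps:
        if subject.lower() != system:
            last_actor = subject               # remember the latest human actor
            converted.append((subject, verb, resource))
        elif verb in OUTPUT_VERBS and last_actor is not None:
            converted.append((last_actor, "views", resource))
        else:
            converted.append((subject, verb, resource))
    return converted

steps = [
    ("HCP", "edits", "the account"),
    ("system", "shows", "the updated account"),
]
converted = convert_perspective(steps)
print(converted)
```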

Evaluation – RQs
RQ1: How effectively does Text2Policy identify ACP sentences in NL documents?
RQ2: How effectively does Text2Policy extract ACP rules from ACP sentences?
RQ3: How effectively does Text2Policy extract action steps from action-step sentences?

Evaluation – Subject
iTrust open source project
– 448 use-case sentences (37 use cases)
– preprocessed use cases
Collected ACP sentences
– 100 ACP sentences
– From 17 sources (published papers and websites)
A module of an IBMApp (financial domain)
– 25 use cases

Evaluation – RQ1 ACP Sentence Identification
Apply Text2Policy to identify ACP sentences in iTrust use cases and IBMApp use cases
Text2Policy effectively identifies ACP sentences with precision and recall more than 88%
Precision on IBMApp use cases is better
– proprietary use cases are often of higher quality compared to open-source use cases

Evaluation – RQ2 Accuracy of Policy Extraction
Apply Text2Policy to extract ACP rules from ACP sentences
Text2Policy effectively extracts ACP model instances with accuracy above 86%

Evaluation – RQ3 Accuracy of Action-Step Extraction
Apply Text2Policy to extract action steps from iTrust and IBMApp use cases
Text2Policy effectively extracts AS model instances with accuracy above 81%
Limitations:
– Subordinate conjunction or else and long phrases

Detected Inconsistencies
No violations of the extracted ACPs by the ASs
Inconsistent names used for referring to the same entity (e.g., user) across different use cases
ex. editor used in UC 4 of iTrust use cases actually refers to HCP, admin, and all users in UCs 1, 2, and 4

Summary
Natural Language Processing (NLP) for domain-specific purposes is feasible
– Challenging for general documents
– Feasible for domain-specific sentences with specific styles
New techniques are required
– Addressing unique challenges in software engineering