Soteris Demetriou, Whitney Merrill, Wei Yang, Aston Zhang and Carl A

Free for All! Assessing User Data Exposure to Advertising Libraries on Android
Soteris Demetriou, Whitney Merrill, Wei Yang, Aston Zhang and Carl A. Gunter Christina Bell cbel

Introduction
Advertisers aim to generate ad conversions from their ad impressions.
Ad networks help by matching ads to users.
Assess potential risks across all possible behaviours, not only current behaviour.
Four major attack channels: unprotected APIs, protected APIs, access to host app files, and observing user input.
Developed the Pluto framework, which analyses an app and helps the developer assess potential risks.

Background: Mobile Advertising
Many developers monetise their apps through ads.
Data brokers embed ad libraries in applications, collect targeted data (user attributes and interests), and sell the resulting user profiles to advertising companies.
Data brokers collaborate with advertisers to create more suitable ads.
Accurate data → targeting the correct users → more clicks → $$$

Background: Android
Each app has a unique UID and PID, which extend to its ad library.
Host apps share their privileges and resources with their ad libraries.
The Linux DAC security system allows the ad library to access files generated by the host app.
Ad libraries have already been collating user information without the user's knowledge.

Background: NLP
Data miners use NLP to determine whether words are data points and which part of speech each word is.
Targeted data can be vague, so NLP is used to determine semantic meaning.
WordNet: an English semantic dictionary.
A similarity metric is used to determine whether words are associated.
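To make the idea of a similarity metric concrete, here is a minimal sketch that scores two words by how much their dictionary definitions overlap. It is only an illustration: the glosses in TOY_GLOSSES are hypothetical stand-ins for WordNet entries, the stopword list is arbitrary, and the Jaccard score is a generic choice rather than the metric used in the paper.

```python
import re

# Hypothetical glosses standing in for WordNet definitions (assumption).
TOY_GLOSSES = {
    "weight": "how heavy a person or body is, measured in kilograms",
    "height": "how tall a person or body is, measured in centimetres",
    "padding": "empty space placed around the content of a layout element",
}

# Small, arbitrary stopword list (assumption).
STOPWORDS = {"a", "an", "the", "of", "or", "in", "is", "how", "and"}


def content_words(text):
    """Lower-case the text and keep the words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def gloss_overlap(word_a, word_b, glosses=TOY_GLOSSES):
    """Jaccard overlap of the two words' gloss vocabularies, in [0, 1]."""
    a = content_words(glosses[word_a])
    b = content_words(glosses[word_b])
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(gloss_overlap("weight", "height"))
    print(gloss_overlap("weight", "padding"))
```

With these toy glosses, "weight" and "height" share several definition words while "weight" and "padding" share none, so a threshold on the score would associate the first pair only.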

Threat model
Risk: the potential compromise of an asset through the exploitation of a vulnerability by a threat.
Asset: targeted data.
Vulnerability: whatever allows ad libraries to gain sensitive information.
Threat: an opportunistic ad library.
Attack channels are divided into two categories.
In-app: dependent on the ad library's host app.
Out-app: independent of the host app.

In-app channels
Ad libraries can leverage their position within their host app to access exposed data:
1. Parse local files generated by the host app at runtime
2. Inherit the permissions granted to the host app
3. Peek at host app user input
Manual inspection of real-world free apps found several exposed data points.
“I’m Pregnant” (1-5 million downloads) exposed weight, height, and current pregnancy month and day.
“TalkLife” (10-50 thousand downloads) exposed address, birth date, first name and password in plain text.

In-app channels cont.
Level One Inspection (L1-I): an attack technique that examines local files and protected APIs. Manual inspection of 262 applications.
Level Two Inspection (L2-I): an attack technique that uses L1-I as well as eavesdropping on user input. Manual inspection of 35 applications.

Out-app channels
Ad libraries access targeted data independently of their host application.
Public APIs that return app bundles: getInstalledPackages() and getInstalledApplications().
12.54% of the apps examined incorporate ad libraries that call either of these methods.
This is an issue because the user is not informed.
Children's apps do not obtain a parent's permission, e.g. Radio Disney.

Pluto framework
A modular framework for estimating the in-app and out-app targeted data exposure of an app.
In-app Pluto focuses on analysing local files generated by the host app: layout files, resource files, the manifest file, and runtime-generated files.
Out-app Pluto uses app bundles: it predicts which apps will be installed together and uses machine learning to draw inferences.

In-app Pluto
Dynamic analysis module (DAM): runs the host app in an emulator, then decompiles the app and extracts its files.
File miners: take user attributes and interests as a matching goal. The goal is reached when a data point is present in a file. A content disambiguation layer uses an NLP similarity metric, droidLESK, to decide whether to accept a match.
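A file miner of the kind described above can be sketched as a scan over an extracted file for names that mention a targeted data point. This is a rough illustration, not Pluto's implementation: DATA_POINTS, SAMPLE_PREFS and mine_xml are hypothetical names, and the XML is a made-up preferences-style file.

```python
import xml.etree.ElementTree as ET

# Hypothetical targeted data points acting as matching goals (assumption).
DATA_POINTS = {"weight", "height", "address"}

# Made-up example of a file extracted from a host app (assumption).
SAMPLE_PREFS = """<map>
    <string name="user_weight">64</string>
    <string name="user_height">170</string>
    <int name="launch_count" value="12" />
</map>"""


def mine_xml(xml_text, data_points=DATA_POINTS):
    """Return {data_point: [element names that mention it]} for one XML file."""
    hits = {}
    root = ET.fromstring(xml_text)
    for element in root.iter():
        original_name = element.get("name", "")
        lowered = original_name.lower()
        for point in data_points:
            # Naive substring match: a goal is reached when the term appears.
            if point in lowered:
                hits.setdefault(point, []).append(original_name)
    return hits


if __name__ == "__main__":
    print(mine_xml(SAMPLE_PREFS))
```

Plain substring matching would also accept a key such as font_weight, which is the kind of false match a disambiguation step is meant to filter out.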

Out-app Pluto
Out-app Pluto aims to estimate the potential data exposure to an ad library that uses getInstalledPackages() and getInstalledApplications(), and to explore which data points can be exposed from this list.
Co-installation patterns (CIP): frequent pattern mining (FPM) is used to find application co-installation patterns, each with a confidence, e.g. Facebook => Skype, Viber with 70% confidence.
Supervised learning: infers user attributes from the CIP-estimated app bundles. Classifiers map the patterns found to the relevant data points.
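The confidence attached to a co-installation rule can be illustrated with the standard association-rule definition, conf(A => C) = support(A and C together) / support(A). The sketch below assumes that definition; the app bundles in BUNDLES are invented for the example and are not data from the study.

```python
def support(itemset, bundles):
    """Fraction of app bundles that contain every app in itemset."""
    itemset = set(itemset)
    return sum(itemset <= bundle for bundle in bundles) / len(bundles)


def confidence(antecedent, consequent, bundles):
    """conf(A => C): of the bundles containing A, the share that also contain C.

    Raises ZeroDivisionError if the antecedent never occurs in the bundles.
    """
    both = set(antecedent) | set(consequent)
    return support(both, bundles) / support(antecedent, bundles)


# Invented installed-app lists, one set per device (assumption).
BUNDLES = [
    {"facebook", "skype", "viber"},
    {"facebook", "skype", "viber", "maps"},
    {"facebook", "maps"},
    {"skype"},
]

if __name__ == "__main__":
    print(confidence({"facebook"}, {"skype", "viber"}, BUNDLES))
```

Here three bundles contain Facebook and two of those also contain Skype and Viber, so the rule Facebook => Skype, Viber holds with confidence 2/3 on this toy data.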

Criticism/recommendations
Pluto's NLP similarity metric takes advantage of software best practices: when extracting words, camel case and snake case are assumed to be used. E.g. userProfile and user_profile are detected, but not userprof, uProf or up. More research into other naming conventions is needed.
The research was limited to the four attack channels discussed. It could be expanded to other attack areas, such as the camera, gyroscope, accelerometer or audio.
The study was limited to the 2535 apps that were analysed, which is small compared to the 2.8 million apps on the Google Play store. [1]
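The naming-convention limitation can be seen in a small word extractor that assumes camel case and snake case. This is an illustrative sketch only; split_identifier is a hypothetical helper, not code from Pluto.

```python
import re


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lower-case words."""
    words = []
    for part in re.split(r"_+", identifier):
        # Acronym runs, capitalised or lower-case words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


if __name__ == "__main__":
    for name in ["userProfile", "user_profile", "userprof", "uProf", "up"]:
        print(name, split_identifier(name))
```

The first two identifiers yield the dictionary words "user" and "profile". The abbreviated forms yield fragments such as "userprof", "prof" or "up", which a dictionary-based metric cannot match.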

Criticism/recommendations
The ‘risk’ of an app reported by Pluto is generalised. Each user is different, yet the user has no input into which data points they consider “risky”.
If an app crashed, it was simply removed from the study without investigation.
Obfuscated applications or hidden ad libraries could pose a problem for Pluto.
Pluto produced a large number of false positives in evaluation, which the authors tried to combat with droidLESK.

Questions?
