Published by侗重 嵇 Modified over 6 years ago
1
Towards Obfuscation Resilient Software Plagiarism Detection
Sencun Zhu
Joint work with Fangfang Zhang, Xinran Wang, Yoon-Chan Jhi, Xiaoqi Jia, Dinghao Wu, Peng Liu
The Pennsylvania State University
2
Blossom of open source projects
SourceForge.net has over 430,000 registered open source projects as of March 2014
  3.7 million developers
  41.8 million users
  4.8 million downloads a day
Mobile app development: a fast-growing industry
  Over 1 million apps on the Google Play and iTunes stores at the end of 2013
3
Software Piracy/Theft/Plagiarism
The Business Software Alliance publishes a yearly study of illegal copying and unauthorized resale of applications; it indicated a loss of $51.4 billion in 2009
In 2012, Microsoft accused La Familia, a Mexican drug cartel, of suspected piracy of Office 2007 in Mexico; this unauthorized business earned $2.2 million every day
In 2005, IBM had to pay $400 million to Compuware because of code theft
4
Smartphone Application Repackaging
Mobile apps are repackaged to make a profit
App repackaging is also a favored vehicle for malware propagation, leveraging the popularity of mobile apps
5% to 13% of apps in third-party app markets are repackaged from apps in the official Android market [1]
1083 (or 86.0%) of 1260 malware samples were repackaged versions of legitimate apps with malicious payloads [2]
5
Algorithm Plagiarism
Patented algorithms
  Implemented by others
Detection is important when
  One wants to know whether the algorithm is used illegally
  Or one wants to prevent one's own employees from violating IP law
6
Related Work
User Interface
  PC apps: --
  Smartphone apps: ViewDroid
Code Logic
  PC apps:
    Static source code: [31]
    Static opcode: [7]
    Whole program path: [9]
    PDG: [32], GPLAG [4]
    API: [10, 33, 8, 11]
    System call: [18, 17]
    Clone detection: [35, 36, 37, 5, 38]
  Smartphone apps:
    Opcode: DroidMOSS [2], Juxtapp [25]
    AST: [34]
    PDG: DNADroid [24]
Program Semantics: VaPD [3], LoPD
Algorithm-level: ValPD
7
Problem Statement
Design detection methods that have the following features
  High accuracy
  Obfuscation resilience
  Scalability
Under the following attack models
  Lazy attack
  Amateur attack
  Malware
First, smartphone apps are user-behavior intensive and, on Android, event driven, and the interactions between users and apps are performed through user interfaces (i.e., app views). Some characteristics of views (e.g., the navigation between views) are unique to each independently developed app. Second, in both types of repackaging, because attackers want to leverage the popularity of a target app, they keep the repackaged app's look and feel similar to the original at the user-interface level. Specifically, the approach is built upon a robust birthmark called the view graph, a graph constructed from all views through static analysis that captures the navigation relations among app views.
4/10/2019
8
How do we model app’s look-and-feel?
Motivation
Observation 1: Apps are user-behavior intensive and, on Android, event driven
  Users interact with apps through the UI
Observation 2: Attackers leverage the popularity of a target app
  They keep the repackaged app's look and feel similar to the original at the user-interface level (i.e., app views)
  Some characteristics of views (e.g., the navigation between views) are unique to each independently developed app
9
Android App Background
.apk file: downloaded from an app market
  Manifest file: AndroidManifest.xml
  Resource files: files in the res directory
  A compiled Dalvik executable: classes.dex
Activities
Four components communicate through intent messages
  Activity: screen views, organized by a stack
  Service: background tasks, no user interface
  Broadcast Receiver: listens to broadcast messages
  Content Provider: manages data sharing, queries, etc.
The manifest file lists the package name, version number, critical components of the app, and the permissions associated with each component. The resource folder includes all the raw resource files, such as images and audio files, and the XML files that describe the layouts of user interfaces. The Dalvik executable contains all the classes that implement the functionality of the app's primary components.
10
Our Birthmark
View
  A user interface
  Its corresponding activity
View Graph
  A directed graph
  Nodes: views
  Edges <a, b>: view a navigates to view b
  Statically constructed
Feature View Graph
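As a rough illustration of the view graph described on this slide, the following Python sketch stores views as nodes and navigations as directed edges. The class name, method names, and the three example activities are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class ViewGraph:
    """Directed graph whose nodes are views and whose edges are navigations."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # set of (a, b) pairs

    def add_navigation(self, a, b):
        # Edge <a, b>: view a navigates to view b.
        self.nodes.update((a, b))
        self.edges.add((a, b))

    def successors(self, view):
        # Views reachable from `view` in one navigation step, in sorted order.
        return sorted(b for (a, b) in self.edges if a == view)


# Hypothetical three-view app (names are assumptions for illustration only).
g = ViewGraph()
g.add_navigation("MainActivity", "SettingsActivity")
g.add_navigation("MainActivity", "GameActivity")
g.add_navigation("GameActivity", "MainActivity")
```

Calling `g.successors("MainActivity")` lists the views that the main screen can navigate to. A feature view graph would additionally attach features to these nodes and edges.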
11
System Architecture
12
View Graph Construction
Generate view nodes
  Activity: onCreate()
  setContentView() / addPreferencesFromResource()
Extract view node features
  Invocation vector: Android framework specific APIs
Generate edges
  startActivity() / startActivityForResult()
  Intent objects as the parameter
Extract edge features
  onClick(), onTouch(), onItemSelected()
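The four construction steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes a static analysis pass has already produced one record per activity, and the record layout, the small API vocabulary, and the use of call counts for the invocation vector are all illustrative assumptions.

```python
# Hypothetical output of a static analysis pass over classes.dex; the record
# layout and the API vocabulary below are assumptions for illustration.
API_VOCABULARY = ["setContentView", "addPreferencesFromResource",
                  "findViewById", "startActivity", "startActivityForResult"]

records = {
    "MainActivity": {
        "calls": ["setContentView", "findViewById", "findViewById", "startActivity"],
        "navigations": [("SettingsActivity", "onClick")],
    },
    "SettingsActivity": {
        "calls": ["addPreferencesFromResource"],
        "navigations": [],
    },
}


def build_view_graph(records, vocabulary):
    """Return (node_features, edge_features) for the given activity records."""
    node_features = {}
    edge_features = {}
    for activity, info in records.items():
        # View node feature: how often each framework API is invoked
        # (counting is an assumption; a 0/1 vector would work the same way).
        node_features[activity] = [info["calls"].count(api) for api in vocabulary]
        for target, handler in info["navigations"]:
            # Edge feature: the event handler that triggers the navigation.
            edge_features.setdefault((activity, target), set()).add(handler)
    return node_features, edge_features


nodes, edges = build_view_graph(records, API_VOCABULARY)
```

Here `nodes` maps each view to its invocation vector and `edges` maps each navigation to the set of handlers that trigger it.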
13
View Graph Example
14
View Graph Example
15
Graph Similarity
VF2 algorithm
Pre-filters:
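For the matching step itself (the pre-filters are not covered here), the idea of embedding one view graph inside another can be sketched with a plain backtracking search. This is a simplified stand-in and not VF2: VF2 solves the same kind of matching problem but adds feasibility rules that prune the search. The function name and the example graphs are hypothetical, and node and edge features are ignored.

```python
def find_embedding(small_nodes, small_edges, big_nodes, big_edges):
    """Map every node of the small graph to a distinct node of the big graph
    so that each small edge is also an edge of the big graph.
    Returns the mapping as a dict, or None if no such mapping exists."""
    order = sorted(small_nodes)
    candidates = sorted(big_nodes)
    big_edge_set = set(big_edges)

    def extend(mapping, i):
        if i == len(order):
            return dict(mapping)
        u = order[i]
        for v in candidates:
            if v in mapping.values():
                continue  # each big-graph node may be used only once
            mapping[u] = v
            # Every small edge between already-mapped nodes must be preserved.
            ok = all((mapping[a], mapping[b]) in big_edge_set
                     for (a, b) in small_edges
                     if a in mapping and b in mapping)
            if ok:
                result = extend(mapping, i + 1)
                if result is not None:
                    return result
            del mapping[u]
        return None

    return extend({}, 0)


# Hypothetical original app and a repackaged copy with renamed views plus one
# extra view W reachable from Z.
original_edges = [("A", "B"), ("A", "C"), ("C", "A")]
repackaged_edges = [("X", "Y"), ("X", "Z"), ("Z", "X"), ("Z", "W")]
mapping = find_embedding({"A", "B", "C"}, original_edges,
                         {"W", "X", "Y", "Z"}, repackaged_edges)
```

In this example the search recovers the renaming of the original views (A to X, B to Y, C to Z) even though the repackaged graph contains an extra view.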
16
Evaluation
10,311 top Android apps from Google Play
20 categories
In total, 573,872 app pairs are compared.
17
Results
129 false positives
  112: common libraries
  17: views are too simple
Attack types
  262 lazy attacks
  187 amateur attacks
  93 malware
Most (112 out of 129) of the false matches are caused by invocations of ad libraries. When two apps share the same ad libraries and one app's graph is relatively small, the matched nodes related to the common ad libraries result in a high similarity score. These false matches can be eliminated by whitelisting known ad libraries. The other 17 false matches occur because one of the apps in each pair is very simple.
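One way to picture the whitelisting idea is to drop views that belong to known ad libraries before two graphs are compared. The sketch below is one possible reading, not the authors' method: the package prefixes are made-up placeholders, and identifying a view by its fully qualified class name is an assumption.

```python
# Hypothetical package prefixes standing in for a whitelist of ad libraries.
AD_LIBRARY_PREFIXES = ("com.example.adsdk.", "org.example.banner.")


def strip_whitelisted(nodes, edges, prefixes=AD_LIBRARY_PREFIXES):
    """Drop views implemented inside whitelisted libraries, plus their edges."""
    kept_nodes = {n for n in nodes if not n.startswith(prefixes)}
    kept_edges = {(a, b) for (a, b) in edges
                  if a in kept_nodes and b in kept_nodes}
    return kept_nodes, kept_edges


# A small hypothetical app with two views of its own and one ad view.
app_nodes = {"com.game.Main", "com.game.Board", "com.example.adsdk.AdActivity"}
app_edges = {("com.game.Main", "com.game.Board"),
             ("com.game.Main", "com.example.adsdk.AdActivity")}
clean_nodes, clean_edges = strip_whitelisted(app_nodes, app_edges)
```

After filtering, only the app's own views and the navigation between them remain, so shared ad views can no longer inflate the similarity score of a small graph.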
18
Malware
Reported by virustotal.com
Virus:BAT/Rbtg.gen
19
Repackaging Clustering
Keyword: Sudoku
Airpush adware, which aggressively shows ads in the Android notification bar
20
Repackaging Clustering
Keyword: Flashlight
21
Obfuscation Resilience
22
Thank you!