1
Dynamic information-flow tracking
Landon Cox March 24, 2017
2
Information flow
Crucial goal of secure system
  Prevent inappropriate information flows
  Can model “appropriateness” with a lattice of tags
    i.e., only allow “low” objects to flow into “high” objects
  Non-interference := all flows are appropriate
Information-flow analysis
  Helps track where sensitive data goes
  Getting this right is tricky
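The lattice rule on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions: the two-level lattice and the names level_t, flow_allowed, and join are hypothetical and do not come from the lecture.

```c
#include <stdbool.h>

/* Hypothetical two-point lattice: LOW sits below HIGH. */
typedef enum { LOW = 0, HIGH = 1 } level_t;

/* A flow from src to dst is appropriate only if it never moves
 * information downward in the lattice (src <= dst). */
bool flow_allowed(level_t src, level_t dst) {
    return src <= dst;
}

/* Least upper bound: the tag for data derived from two inputs. */
level_t join(level_t a, level_t b) {
    return a > b ? a : b;
}
```

With this ordering, a “low” object may flow into a “high” one, but not the reverse, and anything computed from a “high” input is itself “high”.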
3
Information flow
Building blocks
  Storage objects (information receptacles)
  Processes (move information to/from objects)
Tracking information
  Tag (or label) describes information sensitivity
  Each storage object is assigned a tag
  Need to update tags as processes execute
4
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged
  What must we assume about any of P’s outputs?
    Must assume that they contain sensitive information
  Which processes are allowed to communicate with P?
    Other processes that are allowed to read D
  Why is this problematic?
    Probably want P to communicate with processes that can’t access D
    Hard to do anything useful otherwise
5
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged

    accept uid/pw;
    if (pw not in file) {
      return error;
    } else {
      fork/exec shell;
    }

  [Diagram labels: SSH client, Password file]
6
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged

    accept uid/pw;
    if (pw not in file) {
      return error;
    } else {
      fork/exec shell;
    }

  [Diagram labels: SSH client, uid/pw, Password file]
9
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged

    accept uid/pw;
    if (pw not in file) {
      return error;
    } else {
      fork/exec shell;
    }

  [Diagram labels: SSH client, uid/pw, Password file]
  How do you solve this?
10
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged

    accept uid/pw;
    if (pw not in file) {
      return error;
    } else {
      fork/exec shell;
    }

  [Diagram labels: SSH client, uid/pw, Password file]
  How do you solve this?
    Often use a trusted “declassifier”
11
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged

    accept uid/pw;
    if (pw not in file) {
      return error;
    } else {
      fork/exec shell;
    }

  [Diagram labels: SSH client, uid/pw, Declassifier, Password file]
  Declassifier: small piece of code trusted to remove tags
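One way to picture the declassifier is as the single function allowed to clear a tag. The sketch below is an assumption-laden illustration, not code from the lecture: tagged_bool, check_password, and declassify_login_result are hypothetical names, and the password comparison is deliberately simplistic.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tagged value: one bit of data plus a sensitivity flag. */
typedef struct {
    bool value;
    bool tainted;
} tagged_bool;

/* Untrusted login logic. It reads the stored password, so under
 * process-grained tracking everything it produces is tagged. */
tagged_bool check_password(const char *input, const char *stored) {
    tagged_bool r = { strcmp(input, stored) == 0, true };
    return r;
}

/* Trusted declassifier: the only code permitted to drop the tag.
 * It releases nothing beyond the accept/reject bit. */
tagged_bool declassify_login_result(tagged_bool r) {
    r.tainted = false;
    return r;
}
```

The design point is that the trusted code stays small: only declassify_login_result needs auditing, while the rest of the login path can remain untrusted.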
12
Information flow
Issue 1: precision
  Say that storage object is an address space
  If process P reads sensitive data item D
    P’s entire address space is tagged
  What else could we do to improve precision?
    Use finer-grained storage objects
    Tag program variables or memory words
  What are the implications for performance?
    Have to update tags much more frequently
    i.e., every time an instruction executes
    Can introduce a lot of overhead
13
Tracking explicit flows
Propagate taint tags with data flows
  c ← a op b         taint(c) ← taint(a) ∪ taint(b)
  setTaint(a,t)      taint(a) ← {t}
  c = a + b          taint(c) ← {t} ∪ {} = {t}
  Send(c,foo.net)    Can foo.net see a?
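The propagation rule on this slide can be written out as a small C sketch. The encoding is an assumption: taint sets are represented as bitmasks (one bit per tag), and tagged_int, set_taint, tagged_add, and may_send are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t taint_t;   /* assumed encoding: one bit per taint tag */

typedef struct {
    int     value;
    taint_t taint;
} tagged_int;

/* setTaint(a, t): taint(a) <- {t} */
void set_taint(tagged_int *a, taint_t t) {
    a->taint = t;
}

/* c <- a + b: taint(c) <- taint(a) U taint(b) */
tagged_int tagged_add(tagged_int a, tagged_int b) {
    tagged_int c = { a.value + b.value, a.taint | b.taint };
    return c;
}

/* Sink check before sending to an untrusted host:
 * only untainted values may leave. */
bool may_send(tagged_int v) {
    return v.taint == 0;
}
```

Under this sketch, c = a + b inherits a's tag, so the check at Send(c, foo.net) flags that foo.net would learn something about a.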
14
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    int foo (int a){
      int b, w, x, y, z;
      a = 11;
      b = 5;
      w = a * 2;
      x = b + 1;
      y = w + 1;
      z = x + y;
      print (z);
    }

  Each line is an explicit flow from source operands to destination operand
15
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    int foo (int a){
      int b, w, x, y, z;
      a = 11;
      b = 5;
      w = a * 2;
      x = b + 1;
      y = w + 1;
      z = x + y;
      print (z);
    }

  Very easy to implement: just interpose on each instruction to update each var’s tag
16
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        x = 1;
      }
      y = 10;
      print (x);
      print (y);
    }

  Where is the implicit flow?
17
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        x = 1;
      }
      y = 10;
      print (x);
      print (y);
    }

  How would you update x’s tag?
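One common answer to the question on this slide is to track the taint of the current control context and fold it into every assignment made under a branch. The sketch below illustrates that idea only; it is not presented as the lecture's answer. The names tagged_int, assign_const, and foo_tracked are hypothetical, taint sets are assumed to be bitmasks, and x is initialized to 0 so the example is well defined.

```c
#include <stdint.h>

typedef uint32_t taint_t;   /* assumed encoding: one bit per taint tag */

typedef struct {
    int     value;
    taint_t taint;
} tagged_int;

/* Assigning a constant carries no data taint, but the destination
 * picks up the taint of the branch conditions that guard the write. */
static void assign_const(tagged_int *dst, int k, taint_t pc_taint) {
    dst->value = k;
    dst->taint = pc_taint;
}

/* Mirrors the slide's foo(): x is written only when a > 10. */
void foo_tracked(tagged_int a, tagged_int *x, tagged_int *y) {
    taint_t pc = 0;                 /* taint of the current control context */
    x->value = 0;                   /* assumption: x starts at 0, untainted */
    x->taint = 0;
    if (a.value > 10) {
        taint_t saved = pc;
        pc |= a.taint;              /* this region executes because of a */
        assign_const(x, 1, pc);
        pc = saved;                 /* leave the controlled region */
    }
    assign_const(y, 10, pc);        /* y does not depend on a */
}
```

The sketch also shows why the problem is hard: when a <= 10 the branch is never entered, x keeps an empty tag, and yet printing x still reveals that a was not greater than 10.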
18
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        x = 1;
      } else {
        y = 10;
      }
      print (x);
      print (y);
    }

  What is tricky about this code?
19
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        baz (&x);
      } else {
        bar (&y);
      }
      print (x);
      print (y);
    }

  What is trickier about this code?
20
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        exit(0);
      } else {
        exit(1);
      }
      y = 10;
      print (x);
      print (y);
    }

  Where is the implicit flow here?
21
Information flow
Issue 2: explicit vs implicit flows
  Two ways to propagate information
    Explicitly := direct transfer from one object to another
    Implicitly := indirect transfer usually via control flow

    // a is sensitive
    void foo (int a) {
      int x, y;
      if (a > 10) {
        exit(0);
      } else {
        exit(1);
      }
      y = 10;
      print (x);
      print (y);
    }

  How would you track this?
22
Hidden channels
Get system to communicate in unintended ways
Example: TENEX (supposedly secure OS)
  Created a team to break in
  Team had all passwords within 48 hours … oops.
  Goal: require 256^8 tries to see if password is right
Password checker

    for (i=0; i<8; i++) {
      if (input[i] != password[i]) {
        break;
      }
    }
23
Hidden channels: TENEX
Password checker

    for (i=0; i<8; i++) {
      if (input[i] != password[i]) {
        break;
      }
    }

How to break? (user passes in input buffer, virtual mem faults are visible)
  Specially arrange the input’s layout in memory
  Force a page fault if second character is read
  If you get a fault, the first character was right
  Do again for third, fourth, … eighth character
  Can check the password in 256*8 tries
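The attack can be simulated without any real page faults. In the sketch below, checker_reads is a hypothetical stand-in for the observable signal: it reports whether the checker's loop would reach a given input position, which is what the page fault reveals on the real system. For the last character there is no following byte to fault on, so the sketch assumes the attacker observes login success instead. All names are illustrative.

```c
#include <stddef.h>
#include <string.h>

#define PW_LEN 8

/* Returns 1 if the checker would read input[pos], i.e. every earlier
 * character matched. Stands in for "did the page fault happen?".
 * With pos == PW_LEN it stands in for "did the login succeed?". */
static int checker_reads(const unsigned char *input,
                         const unsigned char *password, size_t pos) {
    for (size_t i = 0; i < pos; i++) {
        if (input[i] != password[i]) {
            return 0;
        }
    }
    return 1;
}

/* Recovers the password one position at a time and returns the
 * number of guesses used (at most 256 per position). */
unsigned recover_password(const unsigned char *password,
                          unsigned char *out) {
    unsigned tries = 0;
    memset(out, 0, PW_LEN);
    for (size_t pos = 0; pos < PW_LEN; pos++) {
        for (unsigned guess = 0; guess < 256; guess++) {
            out[pos] = (unsigned char)guess;
            tries++;
            /* A read of the next position means this guess was right. */
            if (checker_reads(out, password, pos + 1)) {
                break;
            }
        }
    }
    return tries;
}
```

Because each position is confirmed independently, the total work is bounded by 256*8 guesses rather than 256^8.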
24
Course administration
Project proposals
  Due today (ok if you send it to me by Monday)
  Guidelines in the syllabus
  One page should be fine
Amount of work
  Three weeks of effort
  Focus on answering one interesting question
25
Cloud: large-scale analysis, collection, dissemination.
Mobile: present at work, home, and play.
Sensors: rich, personal data.
High-level overview of today’s modern phone-based system. Devices place computation, communication and sensing at the heart of nearly all human activity. Sensors have access to lots of rich, personal data. Connectivity to the cloud allows users to participate in large-scale services that make use of this rich, personal data.
26
App-centric operating systems
Apps access sensitive information in many contexts
  Location, images, and communication
  Home, work, and play
Apps run on behalf of many stakeholders
  Users, services, developers, platform providers, advertisers
How do we manage apps instead of users?
27
Monitoring app behavior
Permissions are coarse. No insight into what is collected and by whom.
28
Consumer: “Why is my wallpaper app sending my phone number to another country?”
29
Enterprise: “Who is collecting information about our workers?”
30
Wider interest in the issue
Earlier
31
Emerging malware threat
[Charts: new mobile malware (1); new mobile malware family or variant (2)]
(1) McAfee Threats Report: Q
(2) F-Secure Mobile Threat Report Q
32
Where does data go after you grant access?
Add a big picture of how tainting works. Anchor to what audience knows.
33
Monitoring goals
Monitor where apps send data
  What happens after you grant access?
  Is observed behavior expected?
Monitor apps at runtime
  Want users to monitor their own apps
  Must balance accuracy and efficiency
Solution: TaintDroid
  Original collaboration with Penn State, Intel
  Will Enck (NCSU), Jaeyeon Jung (Samsung), others
Better mesh intro to tainting
34
Taint tracking
TaintDroid: system-wide taint tracking for Android
  Records “explicit” data dependencies via taint tags
  Does not capture “implicit” data dependencies
[Diagram labels: Tag data as it enters app; Track how information propagates; Check tags of emitted data]
35
Taint tracking
TaintDroid: system-wide taint tracking for Android
  Records “explicit” data dependencies via taint tags
  Does not capture “implicit” data dependencies
Key issues for tag propagation
  How are tags stored?
  What is the tag-propagation logic?
  Is tracking precise and efficient?
Project website:
36
Tag propagation
Goal: balance precision and efficiency
[Chart axes: Slow to Fast; Imprecise to Precise]
  Process-grained (All outputs tainted)
  Instruction-grained (2-20x overhead)
  Ideal
37
Multi-level approach
  Variable-level tracking through Dalvik VM (DEX instructions)
  Patch state after native method invocation
  Extend tracking to IPC and file system
[Diagram labels: Application code on Dalvik VM (two apps), Variable-level tracking; msg between apps, Message-level tracking; Native system libraries, Method-level tracking; Network; File system, File-level tracking]
38
Variable-level tracking
Tag-propagation logic for Dalvik executables (DEX)
39
Variable-level tracking
Modified Dalvik VM
  Store and propagate 32-bit tags
Local vars and args
  Store tags adjacent to vars on stack
  Correspond to VM registers
  64-bit vars require two tags
Class fields
  Store tags inside heap objects
Arrays
  One tag per array
  Trade precision for efficient storage
Performance optimizations
  Per-variable tags reduce storage overhead
  Adjacent tags provide spatial locality
[Stack diagram labels, in order: out0, out0 taint tag, out1, out1 taint tag, SP, (unused), VM goop, FP, v0 == local0, v0 taint tag, v1 == local1, v1 taint tag, v2 == in0, …, v4 taint tag]
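The storage layout described on this slide can be approximated in C. This is a simplified sketch, not TaintDroid's actual code: slot_t, vm_add, tagged_array, array_put, and array_get are hypothetical names, and the array load rule here ignores any tag on the index.

```c
#include <stdint.h>

/* One VM register slot: a 32-bit value with its 32-bit tag stored
 * right next to it, so both usually share a cache line. */
typedef struct {
    uint32_t value;
    uint32_t taint;
} slot_t;

/* Hypothetical binary op: the result tag is the union of operand tags. */
void vm_add(slot_t *frame, unsigned dst, unsigned a, unsigned b) {
    frame[dst].value = frame[a].value + frame[b].value;
    frame[dst].taint = frame[a].taint | frame[b].taint;
}

/* An array carries a single tag shared by all of its elements. */
typedef struct {
    uint32_t  taint;
    uint32_t  length;
    uint32_t *data;
} tagged_array;

/* Store: the array's tag absorbs the stored value's tag. */
void array_put(tagged_array *arr, uint32_t idx, slot_t v) {
    arr->data[idx] = v.value;
    arr->taint |= v.taint;
}

/* Load: every element inherits the whole array's tag, which is where
 * precision is traded for compact storage. */
slot_t array_get(const tagged_array *arr, uint32_t idx) {
    slot_t s = { arr->data[idx], arr->taint };
    return s;
}
```

Note the false positive this design permits: after one tainted store, reading any other element of the same array also yields a tainted value.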
40
Method-grained tracking
Huge opportunity for performance gains
  JNI code is often CPU intensive
Challenge for method-grained tracking
  In worst case, must manually reason about side-effects
  Luckily, a very simple heuristic works most of the time

    class java.lang.Math {
      public static double cos (double d);
    }
41
Method-grained tracking
Tainting heuristic
  “Assign union of arguments’ tags to return value on exit.”
  Most JNI methods have no side effects
  Many JNI methods operate on native types
When it doesn’t work, use method profiles
  Generic framework for defining argument/retval dependencies
  So far, only needed to define for IBM charset converter
  See paper for more details …

    class java.lang.Math {
      public static double cos (double d);
    }
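The heuristic quoted on this slide amounts to wrapping each native call so that tags are patched once, on exit, instead of being tracked inside the native code. The sketch below assumes a side-effect-free native function; native_dist_sq and tracked_dist_sq are hypothetical stand-ins rather than real JNI methods, and taint sets are assumed to be bitmasks.

```c
#include <stdint.h>

typedef uint32_t taint_t;   /* assumed encoding: one bit per taint tag */

typedef struct {
    double  value;
    taint_t taint;
} tagged_double;

/* Stand-in for an untracked native method with no side effects. */
static double native_dist_sq(double dx, double dy) {
    return dx * dx + dy * dy;
}

/* Method-level wrapper: run the native code untracked, then assign
 * the union of the arguments' tags to the return value on exit. */
tagged_double tracked_dist_sq(tagged_double dx, tagged_double dy) {
    tagged_double r = {
        native_dist_sq(dx.value, dy.value),
        dx.taint | dy.taint
    };
    return r;
}
```

This is conservative for pure functions like the one above; a method that writes through an object reference would need an explicit profile, as the slide notes.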
42
Method-grained tracking
Found 2,844 JNI methods in Android source
  913 did not use Object references
  Others could induce false negatives
Third-party JNI is not supported
  Apps must be written entirely in Java
  Survey of Android Market, ~25% used .so file
  Subject of ongoing research
43
Evaluation
Is TaintDroid fast and precise?
[Chart axes: Slow to Fast; Imprecise to Precise]
  Process-grained (All outputs tainted)
  Instruction-grained (2-20x overhead)
  TaintDroid
44
Performance evaluation
[Benchmark charts (higher is better)]
  20% overhead (extra memory accesses)
  14% overhead
  Not shown: 4.4% memory overhead
45
Performance evaluation
[Benchmark chart (higher is better)]
Reasons for efficiency
  (1) Method-grained tracking of JNI calls
  (2) Spatial locality of taint tags
  (3) One tag per array
46
App study
Selected 30 apps from Android Market
  Biased toward popular apps
  Sampled from 12 categories
App permissions
  Access to Internet
  Access to location, camera, phone state, mic
  No native libraries
Ran apps manually under TaintDroid
47
App study Of 105 flagged connections, only 37 to expected servers
48
App study: location
15 of 30 apps shared location with ad server
  admob.com, ad.qwapi.com, ads.mobclix.com, data.flurry.com
Most traffic was plaintext (e.g., AdMob HTTP GET)
  data.flurry.com used binary format
In no cases were users informed by EULA
In one case, app sent location every 30 seconds

    ...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85 B717D9195A6722A9&d%5Bcoord%5D= %2C &...
49
App study: phone identifiers
7 apps sent device id (IMEI)
2 apps sent phone info (Ph. #, IMSI*, ICC-ID)
Done without informing the user
  One app’s EULA indicated the IMEI was sent
  Another app sent the hash of the IMEI
Frequency was app-specific
  One sent info every time the phone booted
50
appanalysis.org
Source code available
  http://appanalysis.org/
  Most recent version is for Android 4.3
Great platform for research
  Compatible with vast majority of Android apps
  Playground for all kinds of information-flow projects
Video demo by Peter Gilbert
51
TaintDroid demo
52
Media coverage Earlier
53
Limitations
Implicit flows
  Fundamentally difficult problem
  Can handle passwords (SpanDex, USENIX Sec)
Native code
  Ongoing work
  Talk to Ali!