Dynamic information-flow tracking

Landon Cox
March 24, 2017

Information flow
- Crucial goal of a secure system: prevent inappropriate information flows
  - Can model “appropriateness” with a lattice of tags, e.g., only allow “low” objects to flow into “high” objects
  - Non-interference := all flows are appropriate
- Information-flow analysis
  - Helps track where sensitive data goes
  - Getting this right is tricky
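Below is a minimal C sketch of a tag lattice. It assumes each tag is a set of labels encoded as bits in a 32-bit word; the names `tag_t`, `tag_join`, and `can_flow` are hypothetical. Set union serves as the lattice join. A flow from object A to object B counts as appropriate when A's tag is a subset of B's tag. Under this rule "low" data (the empty set) may flow into "high" objects, but the reverse is blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A tag is a set of sensitivity labels, one bit per label (hypothetical encoding). */
typedef uint32_t tag_t;

#define TAG_LOW  ((tag_t)0)        /* bottom of the lattice: no labels */
#define TAG_HIGH ((tag_t)1u << 0)  /* a single "high" (secret) label */

/* Join (least upper bound): combined data carries every label of both inputs. */
static tag_t tag_join(tag_t a, tag_t b) {
    return a | b;
}

/* A flow from src to dst is appropriate iff every label on src is also on dst. */
static bool can_flow(tag_t src, tag_t dst) {
    return (src & ~dst) == 0;
}
```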

Information flow: building blocks
- Storage objects (information receptacles)
- Processes (move information to/from objects)
- Tracking information
  - A tag (or label) describes information sensitivity
  - Each storage object is assigned a tag
  - Tags must be updated as processes execute

Information flow, issue 1: precision
- Say that the storage object is an address space
  - If process P reads sensitive data item D, P’s entire address space is tagged
- What must we assume about any of P’s outputs?
  - Must assume that they contain sensitive information
- Which processes are allowed to communicate with P?
  - Other processes that are allowed to read D
- Why is this problematic?
  - Probably want P to communicate with processes that can’t access D
  - Hard to do anything useful otherwise
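Below is a minimal C sketch of process-grained tracking. It assumes one tag per address space plus a per-process clearance, and all names are hypothetical. After P reads D, every later output of P carries D's tag. A receiver that is not cleared for D can then no longer hear from P at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;            /* set of labels, one bit per label */
#define TAG_D ((tag_t)1u << 0)     /* label for the sensitive item D */

/* Process-grained model: the whole address space shares a single tag. */
struct process {
    tag_t tag;        /* labels of everything the process has read */
    tag_t clearance;  /* labels the process is allowed to receive */
};

/* Reading an object taints the entire address space. */
static void proc_read(struct process *p, tag_t object_tag) {
    p->tag |= object_tag;
}

/* Every output inherits the process tag, so a send is allowed only if the
   receiver is cleared for all labels the sender has ever read. */
static bool proc_can_send(const struct process *from, const struct process *to) {
    return (from->tag & ~to->clearance) == 0;
}
```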

Information flow, issue 1: precision (example)
- Say that the storage object is an address space
  - If process P reads sensitive data item D, P’s entire address space is tagged
- Example: a login process accepts a uid/pw from an SSH client and checks it against the password file

    accept uid/pw;
    if (pw not in file) {
        return error;
    } else {
        fork/exec shell;
    }

- Reading the password file taints the whole process, including the shell it forks
- How do you solve this?
  - Often use a trusted “declassifier”
  - Small piece of code trusted to remove tags
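Below is a minimal C sketch of a declassifier for the login example. It assumes each value carries a tag alongside its data; the structs, the label `TAG_PWFILE`, and `check_password` are hypothetical. The comparison reads tainted password-file data. The trusted checker releases only the one-bit answer, with the password-file label cleared, so the code that forks the shell does not become tainted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t tag_t;
#define TAG_PWFILE ((tag_t)1u << 0)   /* label on data read from the password file */

struct tainted_str  { const char *data; tag_t tag; };
struct tainted_bool { bool value; tag_t tag; };

/* Trusted declassifier: a small, audited function allowed to drop TAG_PWFILE.
   It reveals only whether the password matched, never the stored password. */
static struct tainted_bool check_password(struct tainted_str input,
                                          struct tainted_str stored) {
    struct tainted_bool r;
    r.value = strcmp(input.data, stored.data) == 0;
    /* Ordinary propagation would give input.tag | stored.tag;
       the declassifier explicitly removes the password-file label. */
    r.tag = (input.tag | stored.tag) & ~TAG_PWFILE;
    return r;
}
```

Even the one-bit answer leaks a little about the password. The declassifier is trusted precisely to decide that this leak is acceptable.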

Information flow, issue 1: precision
- What else could we do to improve precision?
  - Use finer-grained storage objects
  - Tag program variables or memory words
- What are the implications for performance?
  - Have to update tags much more frequently, i.e., every time an instruction executes
  - Can introduce a lot of overhead

Tracking explicit flows
- Propagate taint tags with data flows
  - c ← a op b:  taint(c) ← taint(a) ∪ taint(b)
- Example
  - setTaint(a, t):  taint(a) ← {t}
  - c = a + b (b untainted):  taint(c) ← {t} ∪ {} = {t}
  - Send(c, foo.net):  can foo.net see a?
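Below is a minimal C sketch of the propagation rule above. It assumes a shadow tag stored next to each integer variable; `tval`, `set_taint`, `add`, and `send_allowed` are hypothetical names. The expression `c = a + b` unions its operands' tags, so `c` carries `a`'s tag `t`. A sink check on `Send(c, foo.net)` can therefore flag that information about `a` would reach foo.net.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;                  /* taint tag: a set of labels as bits */

/* A value paired with its shadow taint tag. */
typedef struct { int v; tag_t taint; } tval;

/* setTaint(a, t): taint(a) <- {t} */
static void set_taint(tval *a, tag_t t) { a->taint = t; }

/* c <- a + b: taint(c) <- taint(a) U taint(b) */
static tval add(tval a, tval b) {
    tval c = { a.v + b.v, a.taint | b.taint };
    return c;
}

/* Sink check: allow the send only if the data carries no label
   the destination is not trusted with. */
static bool send_allowed(tval c, tag_t dest_allowed) {
    return (c.taint & ~dest_allowed) == 0;
}
```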

Information flow, issue 2: explicit vs. implicit flows
- Two ways to propagate information
  - Explicitly := direct transfer from one object to another
  - Implicitly := indirect transfer, usually via control flow

    // a is sensitive
    void foo (int a) {
        int b, w, x, y, z;
        a = 11;
        b = 5;
        w = a * 2;
        x = b + 1;
        y = w + 1;
        z = x + y;
        print (z);
    }

- Each line is an explicit flow from source operands to destination operand
- Very easy to implement: just interpose on each instruction to update each variable’s tag
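The following minimal C sketch makes "interpose on each instruction" concrete. It is a tiny interpreter for `foo`'s body written as three-address instructions, and it assumes one shadow tag per variable. The instruction encoding and all names are hypothetical. The handler updates the destination's tag on every instruction, following the union rule for explicit flows.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;

enum opcode { MOV_IMM, ADD_IMM, MUL_IMM, ADD };
struct insn { enum opcode op; int dst, src1, src2; int imm; };

enum var { V_A, V_B, V_W, V_X, V_Y, V_Z, NVARS };

/* foo's body as three-address code. */
static const struct insn foo_body[] = {
    { MOV_IMM, V_A, 0,   0,   11 },  /* a = 11    */
    { MOV_IMM, V_B, 0,   0,   5  },  /* b = 5     */
    { MUL_IMM, V_W, V_A, 0,   2  },  /* w = a * 2 */
    { ADD_IMM, V_X, V_B, 0,   1  },  /* x = b + 1 */
    { ADD_IMM, V_Y, V_W, 0,   1  },  /* y = w + 1 */
    { ADD,     V_Z, V_X, V_Y, 0  },  /* z = x + y */
};
#define FOO_LEN ((int)(sizeof foo_body / sizeof foo_body[0]))

/* Execute each instruction and update the destination's shadow tag. */
static void run(const struct insn *prog, int n, int *val, tag_t *tag) {
    for (int i = 0; i < n; i++) {
        const struct insn *in = &prog[i];
        switch (in->op) {
        case MOV_IMM:                 /* dst = imm: constants are untainted */
            val[in->dst] = in->imm;
            tag[in->dst] = 0;
            break;
        case ADD_IMM:                 /* dst = src1 + imm */
            val[in->dst] = val[in->src1] + in->imm;
            tag[in->dst] = tag[in->src1];
            break;
        case MUL_IMM:                 /* dst = src1 * imm */
            val[in->dst] = val[in->src1] * in->imm;
            tag[in->dst] = tag[in->src1];
            break;
        case ADD:                     /* dst = src1 + src2: union of tags */
            val[in->dst] = val[in->src1] + val[in->src2];
            tag[in->dst] = tag[in->src1] | tag[in->src2];
            break;
        }
    }
}
```

Under this rule `a = 11` overwrites `a` with a constant, so `a` and everything computed from it end up untainted. If that first instruction is skipped, `a`'s tag flows through `w` and `y` into `z`, while `x` stays clean.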

Information flow, issue 2: explicit vs. implicit flows

    // a is sensitive
    void foo (int a) {
        int x = 0, y = 0;
        if (a > 10) {
            x = 1;
        }
        y = 10;
        print (x);
        print (y);
    }

- Where is the implicit flow?
- How would you update x’s tag?
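One common answer, offered here as a known technique and not necessarily the one the lecture intends, is to keep a taint tag for the program counter. While a branch whose condition depends on tainted data is executing, every assignment also picks up the condition's tag. Below is a minimal C sketch with hypothetical names, in which `foo`'s control flow is instrumented by hand.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;                   /* taint tag: set of labels as bits */
typedef struct { int v; tag_t taint; } tval;

/* Taint of the current control context (the "pc tag"). */
static tag_t pc_taint = 0;

/* Every assignment picks up its explicit source tag plus the pc tag. */
static void assign(tval *dst, int v, tag_t src_taint) {
    dst->v = v;
    dst->taint = src_taint | pc_taint;
}

/* The slide's foo, instrumented by hand (x and y returned via pointers). */
static void foo_tracked(tval a, tval *x, tval *y) {
    assign(x, 0, 0);
    assign(y, 0, 0);
    tag_t saved = pc_taint;
    pc_taint |= a.taint;             /* the if statement depends on a */
    if (a.v > 10) {
        assign(x, 1, 0);             /* constant, but control-dependent on a */
    }
    pc_taint = saved;                /* leaving the if restores the context */
    assign(y, 10, 0);                /* y = 10 does not depend on a */
}
```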

Information flow, issue 2: explicit vs. implicit flows

    // a is sensitive
    void foo (int a) {
        int x = 0, y = 0;
        if (a > 10) {
            x = 1;
        } else {
            y = 10;
        }
        print (x);
        print (y);
    }

- What is tricky about this code?
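Below is a minimal C sketch, with hypothetical names, of why a purely dynamic "pc tag" rule falls short here. Under that rule, assignments inside a tainted branch inherit the condition's tag. When `a <= 10`, the taken branch tags `y`, but the assignment to `x` never runs. So `x` keeps its initial value and an empty tag, yet printing it still reveals that `a <= 10`. Catching this typically requires reasoning about the branch that was not taken, which a dynamic tracker never executes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;
typedef struct { int v; tag_t taint; } tval;

static tag_t pc_taint = 0;                /* taint of the current control context */

/* Assignment under the current context: explicit tag plus pc tag. */
static void assign(tval *dst, int v, tag_t src_taint) {
    dst->v = v;
    dst->taint = src_taint | pc_taint;
}

/* The if/else version of foo, instrumented with a dynamic pc tag. */
static void foo_else(tval a, tval *x, tval *y) {
    assign(x, 0, 0);
    assign(y, 0, 0);
    tag_t saved = pc_taint;
    pc_taint |= a.taint;                  /* both arms depend on a */
    if (a.v > 10) {
        assign(x, 1, 0);
    } else {
        assign(y, 10, 0);
    }
    pc_taint = saved;
}
```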

Information flow, issue 2: explicit vs. implicit flows

    // a is sensitive
    void foo (int a) {
        int x = 0, y = 0;
        if (a > 10) {
            baz (&x);
        } else {
            bar (&y);
        }
        print (x);
        print (y);
    }

- What is trickier about this code?

Information flow, issue 2: explicit vs. implicit flows

    // a is sensitive
    void foo (int a) {
        int x = 0, y = 0;
        if (a > 10) {
            exit (0);
        } else {
            exit (1);
        }
        y = 10;
        print (x);
        print (y);
    }

- Where is the implicit flow here?
- How would you track this?

Hidden channels
- Get the system to communicate in unintended ways
- Example: TENEX (a supposedly secure OS)
  - Created a team to break in
  - Team had all passwords within 48 hours … oops.
- Goal: require 256^8 tries to see if a password is right
- Password checker:

    for (i = 0; i < 8; i++) {
        if (input[i] != password[i]) {
            break;
        }
    }

Hidden channels: TENEX
- Password checker:

    for (i = 0; i < 8; i++) {
        if (input[i] != password[i]) {
            break;
        }
    }

- How to break it? (The user passes in the input buffer, and virtual-memory faults are visible to the user)
  - Specially arrange the input’s layout in memory: put the first character at the end of a page, so the second character lies on a page that is not resident
  - This forces a page fault if the second character is read
  - If you get a fault, the first character was right
  - Do again for the third, fourth, … eighth character
  - Can check the password in 256*8 tries
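Below is a minimal C simulation of the attack. It assumes a checker that behaves like the loop above. A hypothetical `mapped` parameter stands in for the page boundary: any read at or past `input[mapped]` counts as a visible page fault. The secret and all names are made up for illustration. Each position needs at most 256 guesses, so the whole password falls in at most 256 × 8 = 2,048 tries, instead of up to 256^8 = 2^64.

```c
#include <assert.h>
#include <string.h>

#define LEN 8

/* Hypothetical secret, for the simulation only. */
static const char secret[LEN] = {'h','u','n','t','e','r','4','2'};

enum result { WRONG, RIGHT, FAULT };

/* Simulated checker: the same loop as the slide, except that touching
   input[mapped] or beyond is reported as a page fault. */
static enum result check(const char *input, int mapped) {
    for (int i = 0; i < LEN; i++) {
        if (i >= mapped) return FAULT;          /* read crossed onto the unmapped page */
        if (input[i] != secret[i]) return WRONG;
    }
    return RIGHT;
}

/* Recover the password one byte at a time; returns the number of attempts. */
static long attack(char *guess) {
    long tries = 0;
    memset(guess, 0, LEN);
    for (int pos = 0; pos < LEN; pos++) {
        for (int c = 0; c < 256; c++) {
            guess[pos] = (char)c;
            tries++;
            /* Only bytes 0..pos are "resident". A fault means the checker
               moved past pos, so guess[pos] is correct. */
            enum result r = check(guess, pos + 1);
            if (r == FAULT || r == RIGHT) break;
        }
    }
    return tries;
}
```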

Course administration
- Project proposals
  - Due today (ok if you send it to me by Monday)
  - Guidelines in the syllabus
  - One page should be fine
- Amount of work
  - Three weeks of effort
  - Focus on answering one interesting question

Cloud  large-scale analysis, collection, dissemination. Mobile  present at work, home, and play. Sensors  rich, personal data. High-level overview of today’s modern phone-based system. Devices place computation, communication and sensing at the heart of nearly all human activity Sensors have access to lots of rich, personal data. Connectivity to the cloud allows users to participate in large-scale services that make use of this rich, personal data. ••••••••• me@gmail.com Username Password

App-centric operating systems
- Apps access sensitive information in many contexts
  - Location, images, and communication
  - Home, work, and play
- Apps run on behalf of many stakeholders
  - Users, services, developers, platform providers, advertisers
- How do we manage apps instead of users?

Monitoring app behavior Permissions are coarse. No insight into what is collected and by whom.

Consumer: “Why is my wallpaper app sending my phone number to another country?” http://blog.mylookout.com/2010/07/mobile-application-analysis-blackhat/

Enterprise: “Who is collecting information about our workers?”

Wider interest in the issue Earlier http://online.wsj.com/article/SB20001424052748703806304576242923804770968.html

Emerging malware threat
[Chart: new mobile malware (1); new mobile malware families or variants (2)]
1 McAfee Threats Report: Q1 2012 - http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q1-2012.pdf
2 F-Secure Mobile Threat Report Q1 2012 - http://www.f-secure.com/weblog/archives/MobileThreatReport_Q1_2012.pdf

Where does data go after you grant access?
(Speaker note: Add a big picture of how tainting works. Anchor to what audience knows.)

Monitoring goals
- Monitor where apps send data
- Monitor apps at runtime
  - What happens after you grant access?
  - Is observed behavior expected?
- Want users to monitor their own apps
- Must balance accuracy and efficiency
Solution: TaintDroid
- Original collaboration with Penn State, Intel
- Will Enck (NCSU), Jaeyeon Jung (Samsung), others
(Speaker note: Better mesh intro to tainting)

Taint tracking
- TaintDroid: system-wide taint tracking for Android
  - Records “explicit” data dependencies via taint tags
  - Does not capture “implicit” data dependencies
[Figure: (1) tag data as it enters the app, e.g., from a login screen with Username and Password fields; (2) track how information propagates; (3) check tags of emitted data]

Taint tracking
- Key issues for tag propagation
  - How are tags stored?
  - What is the tag-propagation logic?
  - Is tracking precise and efficient?
- Project website: http://appanalysis.org

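Tag propagation
- Goal: balance precision and efficiency
[Figure: efficiency (fast to slow) vs. precision (imprecise to precise). Process-grained tracking (all outputs tainted) is fast but imprecise; instruction-grained tracking (2-20x overhead) is precise but slow; the ideal is fast and precise.]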

Multi-level approach
- Variable-level tracking through the Dalvik VM (DEX instructions)
- Patch state after native method invocation
- Extend tracking to IPC and the file system
[Figure: application code running in Dalvik VMs with variable-level tracking; a msg between apps with message-level tracking; native system libraries with method-level tracking; network and file system, with file-level tracking]
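Below is a minimal C sketch of the two coarsest levels. It assumes one tag per file and one tag per IPC message; the names are hypothetical, and this is not TaintDroid's implementation. Writing data to a file unions its tag into the file's tag, and anything read back inherits the whole file's tag. A message carries the union of the tags of the values packed into it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;

/* File-level tracking: a single tag covers the whole file. */
struct tracked_file { tag_t tag; };

static void  file_write(struct tracked_file *f, tag_t data_tag) { f->tag |= data_tag; }
static tag_t file_read(const struct tracked_file *f)            { return f->tag; }

/* Message-level tracking: one tag for the whole IPC message. */
struct message { tag_t tag; };

static void  msg_pack(struct message *m, tag_t value_tag) { m->tag |= value_tag; }
static tag_t msg_unpack(const struct message *m)          { return m->tag; }
```

This is where precision is traded for cost. Once any tainted data has been written to a file, every later read from that file is tainted, including bytes that were written untainted.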

Variable-level tracking Tag-propagation logic for Dalvik executables (DEX)

Variable-level tracking
- Modified Dalvik VM
  - Store and propagate 32-bit tags
- Local vars and args
  - Store tags adjacent to vars on the stack
  - Correspond to VM registers
  - 64-bit vars require two tags
- Class fields
  - Store tags inside heap objects
- Arrays
  - One tag per array
  - Trade precision for efficient storage
- Performance optimizations
  - Per-variable tags reduce storage overhead
  - Adjacent tags provide spatial locality
[Figure: interleaved stack frame; labels in order: out0, out0 taint tag, out1, out1 taint tag, SP, (unused), VM goop, FP, v0 == local0, v0 taint tag, v1 == local1, v1 taint tag, v2 == in0, …, v4 taint tag]
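Below is a minimal C sketch of the interleaved layout. It models a frame as an array of 32-bit slots, with each virtual register's value immediately followed by its 32-bit tag. This mirrors the idea in the figure, not Dalvik's actual frame code, and all names are hypothetical. A 64-bit value spans two registers and so gets two tags. The sketch also assumes that an array store unions the stored value's tag into the array's single tag.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;
#define MAX_REGS 8

/* Frame with interleaved slots: [v0, tag(v0), v1, tag(v1), ...]. */
struct frame {
    int nregs;
    uint32_t slots[2 * MAX_REGS];
};

static uint32_t *reg(struct frame *f, int i) {
    assert(i >= 0 && i < f->nregs);
    return &f->slots[2 * i];          /* value slot */
}

static tag_t *rtag(struct frame *f, int i) {
    assert(i >= 0 && i < f->nregs);
    return &f->slots[2 * i + 1];      /* tag sits right next to the value */
}

/* move vDst, vSrc: copy the value and its adjacent tag. */
static void op_move(struct frame *f, int dst, int src) {
    *reg(f, dst) = *reg(f, src);
    *rtag(f, dst) = *rtag(f, src);
}

/* A 64-bit value occupies registers i and i+1, so it carries two tags. */
static void set_wide(struct frame *f, int i, uint64_t v, tag_t t) {
    *reg(f, i) = (uint32_t)v;
    *reg(f, i + 1) = (uint32_t)(v >> 32);
    *rtag(f, i) = t;
    *rtag(f, i + 1) = t;
}

/* Arrays: one tag for the whole array trades precision for storage. */
struct tagged_array { int len; int32_t *elems; tag_t tag; };

static void array_store(struct tagged_array *arr, int idx, int32_t v, tag_t t) {
    assert(idx >= 0 && idx < arr->len);
    arr->elems[idx] = v;
    arr->tag |= t;                    /* one tainted element taints the array */
}
```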

Method-grained tracking
- Huge opportunity for performance gains
  - JNI code is often CPU-intensive
- Challenge for method-grained tracking
  - In the worst case, must manually reason about side effects
  - Luckily, a very simple heuristic works most of the time

    class java.lang.Math {
        public static double cos (double d);
    }

Method-grained tracking
- Tainting heuristic: “Assign union of arguments’ tags to return value on exit.”
  - Most JNI methods have no side effects
  - Many JNI methods operate on native types
- When it doesn’t work, use method profiles
  - Generic framework for defining argument/return-value dependencies
  - So far, only needed to define for IBM charset converter
  - See paper for more details …

    class java.lang.Math {
        public static double cos (double d);
    }
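Below is a minimal C sketch of the heuristic. It assumes a hypothetical hook that runs when a native method returns, and a small hypothetical profile table for methods where the default rule is wrong. By default the return value's tag is the union of all argument tags. A profile can instead list which arguments the return value depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t tag_t;

/* Hypothetical per-method profile: bit i set => return value depends on arg i. */
struct method_profile { const char *name; uint32_t arg_mask; };

static const struct method_profile profiles[] = {
    { "example.Native.lengthOf", 0x1 },   /* hypothetical entry: arg 0 only */
};

/* Called when a native (JNI) method returns: compute its return value's tag. */
static tag_t native_return_tag(const char *method, const tag_t *arg_tags, int nargs) {
    uint32_t mask = 0xFFFFFFFFu;          /* default heuristic: every argument */
    for (size_t p = 0; p < sizeof profiles / sizeof profiles[0]; p++) {
        if (strcmp(profiles[p].name, method) == 0) {
            mask = profiles[p].arg_mask;  /* profile overrides the default */
            break;
        }
    }
    tag_t t = 0;
    for (int i = 0; i < nargs && i < 32; i++) {
        if (mask & (1u << i)) t |= arg_tags[i];
    }
    return t;
}
```

The sketch covers only the return-value rule. It does not model side effects on Object arguments, which is where the heuristic can produce false negatives.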

Method-grained tracking
- Found 2,844 JNI methods in Android source
  - 913 did not use Object references
  - Others could induce false negatives
- Third-party JNI is not supported
  - Apps must be written entirely in Java
  - Survey of Android Market: ~25% of apps used a .so file
  - Subject of ongoing research

Evaluation
- Is TaintDroid fast and precise?
[Figure: the efficiency vs. precision chart again, with TaintDroid placed relative to process-grained tracking (all outputs tainted) and instruction-grained tracking (2-20x overhead)]

Performance evaluation
[Chart, higher is better; callouts: 14% overhead; 20% overhead (extra memory accesses)]
- Not shown: 4.4% memory overhead

Performance evaluation
- Reasons for efficiency
  - (1) Method-grained tracking of JNI calls
  - (2) Spatial locality of taint tags
  - (3) One tag per array
[Chart, higher is better]

App study
- Selected 30 apps from Android Market
  - Biased toward popular apps
  - Sampled from 12 categories
- App permissions
  - Access to Internet
  - Access to location, camera, phone state, mic
  - No native libraries
- Ran apps manually under TaintDroid

App study Of 105 flagged connections, only 37 to expected servers

App study: location
- 15 of 30 apps shared location with ad servers: admob.com, ad.qwapi.com, ads.mobclix.com, data.flurry.com
- Most traffic was plaintext (e.g., AdMob HTTP GET)
  - data.flurry.com used a binary format
- In no case were users informed by the EULA
- In one case, an app sent location every 30 seconds
Example request fragment:
...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85B717D9195A6722A9&d%5Bcoord%5D=47.661227890000006%2C-122.31589477&...

App study: phone identifiers
- 7 apps sent the device ID (IMEI)
- 2 apps sent phone info (Ph. #, IMSI*, ICC-ID)
- Done without informing the user
  - One app’s EULA indicated the IMEI was sent
  - Another app sent the hash of the IMEI
- Frequency was app-specific
  - One app sent info every time the phone booted

appanalysis.org
- Source code available: http://appanalysis.org/
  - Most recent version is for Android 4.3
- Great platform for research
  - Compatible with the vast majority of Android apps
  - Playground for all kinds of information-flow projects
- Video demo by Peter Gilbert

TaintDroid demo http://www.youtube.com/watch?v=qnLujX1Dw4Y

Media coverage Earlier

Limitations
- Implicit flows
  - Fundamentally difficult problem
  - Can handle passwords (SpanDex, USENIX Sec)
- Native code
  - Ongoing work
  - Talk to Ali!