Process Coloring: An Information Flow-Preserving Approach to Malware Investigation
Eugene Spafford, Dongyan Xu, Ryan Riley
Department of Computer Science and Center for Education and Research in Information Assurance and Security (CERIAS), Purdue University
Xuxian Jiang
Department of Computer Science, North Carolina State University
NICIAR PI Meeting, Washington, DC, September 24, 2008
Process Coloring (PC) Overview
One-sentence summary: Propagating and logging provenance information (“colors”) along OS-level information flows for malware detection and sensitive data protection.
PC Usage Scenario: Server-Side Malware Attack
[Figure labels: init, rc, s30sendmail, s45named, s55sshd, s80httpd, httpd, /bin/sh, wget, netcat, Rootkit, Local files, /etc/shadow, Confidential Info. Stages: Initial coloring, Coloring diffusion. Output: Syscall Log.]
- Capability 1: PC malware alert (“No shell process should have the color of Apache”)
- Capability 2: Color-based identification of malware break-in point
- Capability 3: Color-based log partition for contamination analysis
Demo at:
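The server-side scenario can be illustrated with a minimal sketch, assuming a simplified model in which every information flow copies the source's colors to the destination. The `Entity` class, the `diffuse` and `shell_alerts` helpers, the color names, and the syscall labels are all hypothetical and are not taken from the PC prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A process or file that can carry colors (hypothetical model)."""
    name: str
    is_shell: bool = False
    colors: set = field(default_factory=set)


LOG = []  # stand-in for the syscall log; each entry records the colors involved


def diffuse(src, dst, syscall):
    """Model one information flow src -> dst: dst picks up all of src's colors."""
    dst.colors |= src.colors
    LOG.append((syscall, src.name, dst.name, frozenset(dst.colors)))


def shell_alerts(entities, forbidden_color):
    """Policy check: no shell process may carry the forbidden color."""
    return [e.name for e in entities if e.is_shell and forbidden_color in e.colors]


# Initial coloring: each network-facing service gets its own color.
httpd = Entity("httpd", colors={"httpd"})
sshd = Entity("sshd", colors={"sshd"})

# Uncolored entities that appear during the attack.
sh = Entity("/bin/sh", is_shell=True)
wget = Entity("wget")
rootkit = Entity("rootkit-file")

# Coloring diffusion along the (assumed) attack path.
diffuse(httpd, sh, "execve")     # exploited httpd spawns a shell
diffuse(sh, wget, "fork")        # the shell runs wget
diffuse(wget, rootkit, "write")  # wget writes the rootkit to disk

alerts = shell_alerts([httpd, sshd, sh, wget, rootkit], "httpd")
print(alerts)          # shells carrying the Apache color
print(rootkit.colors)  # the file's color points back to the break-in service
```

In this sketch the alert fires on `/bin/sh`, and the color on the rootkit file names `httpd` as the break-in point, which mirrors Capabilities 1 and 2 in spirit only.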
PC Usage Scenario: Client-Side Malware Attack
[Figure labels: firefox, notepad, turbotax, warcraft; Web Browser, Tax, Editor, Games; Agobot; Tax files.]
- PC malware alert: “Web browser and tax colors should never mix”
Demo at:
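The client-side policy can be sketched as a check for forbidden color combinations, assuming that a process reading a file takes on the file's colors. The names `FORBIDDEN_MIXES`, `read_file`, and `violates`, and the color labels, are hypothetical illustrations rather than the prototype's interface.

```python
# Each forbidden mix is a set of colors that must never appear together.
FORBIDDEN_MIXES = [frozenset({"browser", "tax"})]


def read_file(proc_colors, file_colors):
    """A read flows information from file to process: colors are unioned."""
    return set(proc_colors) | set(file_colors)


def violates(colors, forbidden=FORBIDDEN_MIXES):
    """True if the color set contains every color of some forbidden mix."""
    return any(mix <= set(colors) for mix in forbidden)


tax_file_colors = {"tax"}

# Assumed attack path: Agobot arrives through the browser, so it starts
# with the browser color, then reads the tax files.
agobot_colors = read_file({"browser"}, tax_file_colors)

# Legitimate use: the tax application reads its own files.
turbotax_colors = read_file({"tax"}, tax_file_colors)

print(violates(agobot_colors))    # browser and tax colors mixed
print(violates(turbotax_colors))  # only the tax color is present
```

Expressing the policy over color sets, rather than process names, means the check still applies to a process the administrator has never seen before, such as the bot in this scenario.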
Heilmeier Question 1: What are you trying to do?
- Tracking and logging OS-level information flows
  - Being extended to both OS and language levels (“PC+DDFA”)
- Tainting processes and data with provenance information (“colors”) for
  - Detecting and investigating malware activities
  - Enforcing sensitive data protection policies
- Using virtualization for stronger tamper-resistance
Heilmeier Question 2: How is it done now?
Information flow tracking at multiple levels:
- OS level
  - Only considering direct causality in each system call
  - No provenance (“color”) tainting and propagation
- Language level
  - Only tracking information flow within a program
  - No information flow tracking across programs
- Instruction level
  - Difficult to understand attack semantics
  - Significant runtime performance overhead
Heilmeier Question 3: What’s new and why will it succeed?
- What’s new?
  - Color-based malware alert and sensitive data protection
  - Supporting on-line detection and off-line forensics
  - One of the first to combine OS and language-level information flows
- Why will it succeed?
  - Practical, deployable system based on classic theory
  - Running prototype showing effectiveness and practicality
  - Attracting external interest (SwRI, Lockheed Martin)
Heilmeier Question 4: If successful, what difference will it make?
- A system-level framework for attack/violation detection, investigation and recovery
- Specification and enforcement of color-based policies for malware alert and data protection
- Ready for virtualization-based infrastructures (e.g., honeynets, enterprises and data centers)
Heilmeier Question 5: Your timeline, cost and success metrics?
Timeline marks: 6/2007, 12/07, 6/08, 12/08
Milestone groups, in slide order:
- Basic PC prototype for server-side operation
- PC prototype for client-side operation (“brown problem” solution); set up “living lab” VM for evaluation
- Extensive evaluation; design, prototyping and demonstration of “PC+DDFA” integration
- Recovery and replay; PC across machines; data lifetime analysis for data theft defense
Summary of Achievement (Since April)
- Improved sink insulation implementation
- Cleaned up log management and visualization
- Set up “living lab” client VM for evaluation
- Performed benchmark evaluation of PC
- Started technology transfer activities
- Completed preliminary design and prototype for “PC+DDFA”
  - Joint presentation in a moment
“Living Lab” VM: End User’s View
“Living Lab” VM: Administrator’s View
Evaluation Metrics – Efficiency
Evaluation with Malware (Agobot, PUD bot…)
Process Coloring (PC) For Malware Alert and Investigation - An OS-level Information Flow Preserving Approach
LSSD
APPROACH
- Track OS-level information flows
- Taint processes/data based on their influence on each other
- Record color(s) in log entries
- Integrate with intra-process DDFA
PLAN / PROGRESS
- Model process color diffusion in real OS (done)
- Demonstrate PC prototype in a malware scenario
  - Includes both server (done) and client (done) side solutions
- Mitigate color saturation effect in malware alert
  - Profiling and visualization (done)
  - Reducing false positives caused by legitimate color mixing (done)
- Proof-of-concept demo of “PC+DDFA” (Dec. 08)
- Evaluate PC in “living lab” VMs (July 08 – Dec. 08)
NEW CAPABILITIES
- Color-based malware alert
- Color-based malware break-in point identification
- Color-based log partitioning
APPLICATIONS
- System monitoring and malware (e.g., bots) detection
- Malware forensics
- Sensitive data protection
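Because each log entry records its color(s), the log can be split per color so that an investigator reads only the entries tainted by one source. The sketch below assumes a hypothetical entry layout (`seq`, `proc`, `syscall`, `colors`) and made-up sample data; the actual PC log format is not shown on the slides.

```python
from collections import defaultdict

# Hypothetical syscall log; the field names and values are illustrative only.
SYSCALL_LOG = [
    {"seq": 1, "proc": "httpd", "syscall": "accept", "colors": {"httpd"}},
    {"seq": 2, "proc": "sshd", "syscall": "read", "colors": {"sshd"}},
    {"seq": 3, "proc": "/bin/sh", "syscall": "execve", "colors": {"httpd"}},
    {"seq": 4, "proc": "sendmail", "syscall": "write", "colors": {"sendmail"}},
    {"seq": 5, "proc": "wget", "syscall": "write", "colors": {"httpd"}},
]


def partition_by_color(log):
    """Group log entries by color; a multi-colored entry lands in each group."""
    partitions = defaultdict(list)
    for entry in log:
        for color in entry["colors"]:
            partitions[color].append(entry)
    return dict(partitions)


partitions = partition_by_color(SYSCALL_LOG)

# Contamination analysis for the compromised service only needs its partition.
httpd_seqs = [e["seq"] for e in partitions["httpd"]]
print(httpd_seqs)
```

Entries from unrelated services fall into other partitions, so the portion of the log to inspect shrinks to the entries carrying the color of the break-in point.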
Thank you! For more information about the Process Coloring project: