Enabling Internet Malware Investigation and Defense Using Virtualization
Dongyan Xu
Department of Computer Science and Center for Education and Research in Information Assurance and Security (CERIAS)
Purdue University
Collaborators: Florian Buchholz (James Madison U.), Xuxian Jiang (George Mason U.), Junghwan Rhee (Purdue U.), Ryan Riley (Purdue U.), Eugene H. Spafford (Purdue U.), AAron Walters (Fortify Research), Helen Wang (Microsoft Research), Yi-Min Wang (Microsoft Research)
Motivation: Rampant Malware Outbreaks
Internet malware remains a top threat. Malware includes viruses, worms, spyware, keyloggers, bots, …
[Figure: outbreaks of Blaster, Nimda, and CodeRed. Source: Symantec Internet Security Threat Report]
Motivation: Stealthy Malware
- Recruiting vulnerable nodes (e.g., to create botnets)
  - Zero-day exploits without software patches
  - Low-and-slow propagation
- New attack strategies
  - Exploiting vulnerable client-side software, such as IE
  - Propagating malware with RFID tags
- Providing “value-added” service (or rather, harm)
  - DDoS, spamming, identity theft, …
  - Selling/renting botnets for profit
Reality & Challenges
- Lack of an investigation platform that enables:
  - Early detection and capture of malware incidents
  - Replay and observation of malware behavior
- Such a platform is hard to build at Internet scale, given malware's increasing spreading speed, sophistication, and malice:
  - The Slammer worm infected 75,000 hosts in 10 minutes (Moore et al., 2003)
  - Stealthy malware, zero-day exploits, mutations, …
Our Integrated Malware Research Framework (Detection, Investigation, Defense)
- Front-End: Collapsar honeyfarm (malware trap)
- Back-End: vGround playground (malware playground)
- Further components: behavioral footprinting, contamination tracking, and system randomization
- Threats addressed: external infection and internal contamination
Publications: Collapsar: Security’04, NDSS’06, JPDC’06; vGround: RAID’05; Process Coloring: ICDCS’06; WORM’06
Part I: Malware Capture (Front-End: Collapsar)
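Existing Approach: Honeypots
[Figure: honeypots deployed separately in Domains A, B, and C, all connected to the Internet]
Two weaknesses:
- Trade-off between manageability and detection coverage
- Security risks: attacks occur on site, inside the deploying domain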
Our Approach: Collapsar
[Figure: redirectors in Domains A, B, and C forward traffic to the Collapsar Center, which hosts the front-end VM-based honeypots, a management station, and a correlation engine]
- Benefit 1: Centralized management of honeypots with distributed (virtual) presence
- Benefit 2: Off-site attack occurrences
- Benefit 3: New possibilities for real-time attack correlation and log mining
Collapsar as a Server-side Honeyfarm
[Figure: redirectors in Domains A, B, and C feed the front-end VM-based honeypots in the Collapsar Center]
- Passive honeypots with vulnerable server-side software:
  - Web servers (e.g., Apache, IIS, …)
  - Database servers (e.g., Oracle, MySQL, …)
- Examples of server-side worms: Blaster (2003), Sasser (2004), Zotob (2005)
Collapsar as a Client-side Honeyfarm
[Figure: front-end VM-based honeypots in the Collapsar Center, reached through redirectors in Domains A, B, and C, visit a malicious web server]
- Active honeypots with vulnerable client-side software:
  - Web browsers (e.g., IE, Firefox, …)
  - Other clients (e.g., Outlook, …)
- [HoneyMonkey, NDSS’06]
- PlanetLab (310 sites); 288 malicious sites / 2 zero-day exploits
A Real Incident: Exploitation of a Client-side Vulnerability
Upon clicking a malicious URL, 22 unwanted programs were installed without the user’s consent!
Fragments of the malicious page:
  * {CURSOR: url("
  try{ document.write('<object data=`ms-its:mhtml:file://C:\fo'+'o.mht!'+' 'm::/targ'+'et.htm` type=`text/x-scriptlet`> '); }catch(e){}
Related Work
[Table: Honeyd [Security’04], iSink [RAID’04], IMS [NDSS’05], honeyclient [RECON’05], Domino [NDSS’04], NetBait [’03], Potemkin [SOSP’05], GQ [’06], and Collapsar [Security’04, JPDC’06] compared on four criteria: high interaction with real services; off-site attack occurrences; aggregation of scattered unused address space; and honeypot type (passive, active, or passive & active)]
Part II: Malware Playground (Back-End: vGround)
Challenges
- Fidelity: experiments with real worms
- Confinement: destructive worms must not escape
- Scalability: reproducing epidemic propagation patterns
- Experimental efficiency
A Virtualization-Based Worm Playground
[Figure: paris.cs.purdue.edu]
- High fidelity: VM with full-system virtualization
- Strict confinement: VN (virtual network) using link-layer network virtualization
- Easy deployment: locally deployable
- Efficient experiments:
  - Image generation time: 60 seconds
  - Bootstrap time: 90 seconds
  - Tear-down time: 10 seconds
See also “A Worm Playground” and “Virtualization” in “Fighting Computer Virus Attacks”, Peter Szor, USENIX Security Symposium, 2004.
Challenge in Achieving Scalability
Three main techniques:
- VM footprint minimization: Red Hat 9.0 image reduced from 1 GB to 32 MB
- Delta virtualization (a.k.a. copy-on-write)
- Worm-driven vGround runtime expansion
[Figure: virtual nodes in 10 physical machines]
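Delta virtualization lets many virtual nodes share one read-only base image while each node stores only the blocks it changes. The Python sketch below assumes a disk image modeled as a list of blocks; the class `CowDisk` and its methods are hypothetical illustrations, not vGround's implementation.

```python
from typing import Dict, List


class CowDisk:
    """One virtual node's disk: a shared read-only base image plus a private delta."""

    def __init__(self, base: List[bytes]) -> None:
        self.base = base                    # shared by every node; never modified
        self.delta: Dict[int, bytes] = {}   # block index -> this node's private copy

    def read(self, block: int) -> bytes:
        # Prefer the node's own copy; fall back to the shared base block.
        return self.delta.get(block, self.base[block])

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: the write lands in the delta, leaving the base intact.
        self.delta[block] = data


base_image = [b"kernel", b"libc", b"httpd"]
node1, node2 = CowDisk(base_image), CowDisk(base_image)
node1.write(2, b"httpd (infected)")
print(node1.read(2), node2.read(2))        # b'httpd (infected)' b'httpd'
print(len(node1.delta), len(node2.delta))  # 1 0
```

In this model, per-node storage grows only with the blocks a node writes, which is why a single minimized base image can back many virtual nodes.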
Worm Expert’s Comments on vGround
vGround Impact & Applications
- Evaluation:
  - Correctness of documented worm/malware analyses
  - Effectiveness of defense mechanisms
- Education potential
Part III: Malware Defense (Contamination Tracking)
Malware Forensics
For each malware incident, we want to find out:
- Break-in point: How did the malware break into the system?
- Contaminations: What did the malware do after the break-in?
Current Approach
[Figure: process graph of an intrusion, with an alert raised on a rootkit: httpd → /bin/sh → wget → rootkit; /bin/sh → local files; /bin/sh → netcat, which reads /etc/shadow (confidential info)]
- Question 1: How did the malware break into the system?
- Question 2: What did the malware do after the break-in?
Current Approach, Step 1: Online Log Collection
Log:
- “httpd” READS an incoming request
- “httpd” CREATES a new process “/bin/sh”
- “/bin/sh” CREATES a new process “netcat”
- “netcat” READS the “/etc/shadow” file
- “/bin/sh” MODIFIES local files
- “/bin/sh” CREATES a new process “wget”
- “wget” CREATES local file(s): “Root kit”
An alert is raised.
Current Approach, Step 2: Offline Backward Tracking [King+, SOSP’03]
Starting from the alert on the rootkit, dependencies are traced backward through the log:
- “wget” CREATES local file(s): “Root kit”
- “/bin/sh” CREATES a new process “wget”
- “httpd” CREATES a new process “/bin/sh”
Break-in point: httpd!
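Backward tracking can be sketched as a reverse walk over a time-ordered dependency log, in the spirit of [King+, SOSP’03] but heavily simplified. This Python sketch assumes each log entry is a hypothetical `Event(time, src, dst, op)` recording information flowing from `src` to `dst`; the integer timestamps are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    time: int   # illustrative logical timestamp
    src: str    # entity the information flows from
    dst: str    # entity the information flows to
    op: str


LOG = [
    Event(1, "socket:80", "httpd", "READ"),
    Event(2, "httpd", "/bin/sh", "CREATE"),
    Event(3, "/bin/sh", "netcat", "CREATE"),
    Event(4, "/etc/shadow", "netcat", "READ"),
    Event(5, "/bin/sh", "local files", "MODIFY"),
    Event(6, "/bin/sh", "wget", "CREATE"),
    Event(7, "wget", "rootkit", "CREATE"),
]


def backward_track(log: List[Event], detection_point: str, detect_time: int) -> List[str]:
    """Return the entities the detection point depends on, earliest first."""
    # Latest time at which each tracked entity could have affected the detection point.
    deadline: Dict[str, int] = {detection_point: detect_time}
    for ev in sorted(log, key=lambda e: e.time, reverse=True):
        if ev.dst in deadline and ev.time <= deadline[ev.dst]:
            # Events are scanned newest first, so the first time seen is the latest.
            deadline.setdefault(ev.src, ev.time)
    return sorted(deadline, key=lambda name: deadline[name])


print(backward_track(LOG, "rootkit", 8))
# ['socket:80', 'httpd', '/bin/sh', 'wget', 'rootkit']
```

The chain ends at the network socket read by httpd, which marks httpd as the break-in point; note that even this small walk has to scan every entry in the log.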
Current Approach, Step 3: Offline Forward Tracking
Starting from the break-in point, dependencies are traced forward through the log:
- “httpd” CREATES a new process “/bin/sh”
- “/bin/sh” CREATES a new process “netcat”
- “netcat” READS the “/etc/shadow” file (confidential info)
- “/bin/sh” MODIFIES local files
- “/bin/sh” CREATES a new process “wget”
- “wget” CREATES local file(s): “Root kit”
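Forward tracking can be sketched the same way, scanning the log oldest first from the break-in point. Again this is a simplified Python sketch under assumptions: the `Event` records and timestamps are hypothetical, and an event is reported if information flows out of a contaminated entity or into one, so a contaminated netcat reading /etc/shadow is caught.

```python
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    time: int   # illustrative logical timestamp
    src: str    # entity the information flows from
    dst: str    # entity the information flows to
    op: str


LOG = [
    Event(1, "socket:80", "httpd", "READ"),
    Event(2, "httpd", "/bin/sh", "CREATE"),
    Event(3, "/proc/loadavg", "sendmail", "READ"),  # unrelated service activity
    Event(3, "/bin/sh", "netcat", "CREATE"),
    Event(4, "/etc/shadow", "netcat", "READ"),
    Event(5, "/bin/sh", "local files", "MODIFY"),
    Event(6, "/bin/sh", "wget", "CREATE"),
    Event(7, "wget", "rootkit", "CREATE"),
]


def forward_track(log: List[Event], break_in: str, start: int) -> Tuple[Dict[str, int], List[Event]]:
    """Return contaminated entities (with contamination time) and the events involving them."""
    tainted: Dict[str, int] = {break_in: start}
    involved: List[Event] = []
    for ev in sorted(log, key=lambda e: e.time):  # stable sort keeps same-time order
        if ev.src in tainted and ev.time >= tainted[ev.src]:
            tainted.setdefault(ev.dst, ev.time)   # information flows out of a contaminated entity
            involved.append(ev)
        elif ev.dst in tainted and ev.time >= tainted[ev.dst]:
            involved.append(ev)                   # a contaminated entity takes in information
    return tainted, involved


tainted, involved = forward_track(LOG, "httpd", 1)
print(sorted(tainted))  # ['/bin/sh', 'httpd', 'local files', 'netcat', 'rootkit', 'wget']
print(len(involved), len(LOG))  # 7 8
```

The sendmail entry is correctly left out, but finding that out still required reading the whole log, which is the cost the next slide points to.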
Weaknesses of Current Approach
- Backward tracking finds the break-in point; inputs: the detection point and the entire log
- Forward tracking finds the contaminations; inputs: the break-in point and the entire log
[Timeline: intrusion occurred → long detection period → intrusion detected]
Both must analyze the entire log, and log volume is high: 1.2 gigabytes per day under server workload.
Our Approach: Process Coloring
Main idea: information-flow-preserving logging
[Figure: a log with entries from Apache, Sendmail, DNS, and MySQL; a suspicious log entry is highlighted]
Our Approach: Process Coloring
1: Initial coloring: each service started from the rc/init scripts (s30sendmail, s45named, s55sshd, s80httpd) receives its own color
2: Coloring diffusion: colors propagate along information flows, e.g., from httpd to /bin/sh, netcat, wget, the rootkit, and local files
- Benefit 1: Immediate identification of the break-in point
- Benefit 2: Color-based log partition for contamination analysis
Color Diffusion Model
Based on OS-level information flow (Buchholz 2005). s1 and s2 denote subjects (processes), o1 denotes an object (e.g., a file or socket), and ∪ denotes set union.

Operation | Diffusion                          | System calls
CREATE    | color(o1) = color(s1)              | create, mkdir, link
          | color(s2) = color(s1)              | fork, vfork, clone
READ      | color(s1) = color(s1) ∪ color(o1)  | read, readv, recv
          | color(s1) = color(s1) ∪ color(s2)  | ptrace
WRITE     | color(o1) = color(s1) ∪ color(o1)  | write, writev, send
          | color(s2) = color(s1) ∪ color(s2)  | ptrace, wait, signal
DESTROY   | (none)                             | unlink, rmdir, close
          | (none)                             | exit, kill
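The diffusion rules in the table can be written as a small function over color sets. This is a minimal Python sketch, assuming colors are sets of strings and each operation names the acting subject s1 and the object or second subject it touches; the function `diffuse`, the color names, and the `/tmp/shared` file are hypothetical.

```python
from typing import Dict, FrozenSet

Colors = Dict[str, FrozenSet[str]]
EMPTY: FrozenSet[str] = frozenset()


def diffuse(colors: Colors, op: str, subject: str, target: str) -> None:
    """Apply one diffusion rule.

    subject is the acting process (s1); target is the object (o1) or the
    second process (s2) it operates on. Unseen entities start uncolored.
    """
    if op == "CREATE":      # create, mkdir, link / fork, vfork, clone
        colors[target] = colors.get(subject, EMPTY)
    elif op == "READ":      # read, readv, recv / ptrace
        colors[subject] = colors.get(subject, EMPTY) | colors.get(target, EMPTY)
    elif op == "WRITE":     # write, writev, send / ptrace, wait, signal
        colors[target] = colors.get(subject, EMPTY) | colors.get(target, EMPTY)
    elif op == "DESTROY":   # unlink, rmdir, close / exit, kill: no diffusion
        pass
    else:
        raise ValueError(f"unknown operation: {op}")


# Initial coloring: one color per service (names are illustrative).
colors: Colors = {"httpd": frozenset({"RED"}), "sendmail": frozenset({"BLUE"})}
for op, subject, target in [
    ("READ", "httpd", "socket:80"),
    ("CREATE", "httpd", "/bin/sh"),
    ("CREATE", "/bin/sh", "wget"),
    ("CREATE", "wget", "rootkit"),
    ("READ", "sendmail", "/proc/loadavg"),
    ("WRITE", "sendmail", "/tmp/shared"),
    ("WRITE", "/bin/sh", "/tmp/shared"),
]:
    diffuse(colors, op, subject, target)

print(sorted(colors["rootkit"]))      # ['RED']
print(sorted(colors["/tmp/shared"]))  # ['BLUE', 'RED']
```

The rootkit inherits httpd's color through the CREATE chain, so its origin is visible without any log search; the shared file shows how WRITE accumulates colors by union.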
Process Coloring Log – Slapper Worm
...
BLUE: 673["sendmail"]: 5_open("/proc/loadavg", 0, 438) = 5
BLUE: 673["sendmail"]: 192_mmap2(0, 4096, 3, 34, , 0) =
BLUE: 673["sendmail"]: 3_read(5, " ", 4096) = 25
BLUE: 673["sendmail"]: 6_close(5) = 0
BLUE: 673["sendmail"]: 91_munmap( , 4096) = 0
...
RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5
RED: 2568["httpd"]: 3_read(5, "\1281\1\0\2\0\24...", 11) = 11
RED: 2568["httpd"]: 3_read(5, "\7\0À\5\0\128\3\...", 40) = 40
RED: 2568["httpd"]: 4_write(5, 1090) = 1090
…
RED: 2568["httpd"]: 4_write(5, "\128\19Ê\136\18\...", 21) = 21
RED: 2568["httpd"]: 63_dup2(5, 2) = 2
RED: 2568["httpd"]: 63_dup2(5, 1) = 1
RED: 2568["httpd"]: 63_dup2(5, 0) = 0
RED: 2568["httpd"]: 11_execve("/bin//sh", bffff4e8, )
RED: 2568["sh"]: 5_open("/etc/ld.so.prelo...", 0, 8) = -2
RED: 2568["sh"]: 5_open("/etc/ld.so.cache", 0, 0) = 6
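Because every entry carries its color, partitioning the log for contamination analysis takes a single pass. This Python sketch assumes the entry format shown above (`COLOR: pid["name"]: nr_syscall(...)`); the regular expression and `partition_by_color` are hypothetical helpers, not part of the original tool.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Entry format assumed from the log above: COLOR: pid["name"]: nr_syscall(args) = ret
ENTRY = re.compile(r'^(?P<color>[A-Z]+): (?P<pid>\d+)\["(?P<proc>[^"]*)"\]: (?P<nr>\d+)_(?P<syscall>\w+)\(')


def partition_by_color(lines: List[str]) -> Dict[str, List[str]]:
    """Group log entries by color; lines that do not parse (e.g. '...') are skipped."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line)
        if m:
            parts[m.group("color")].append(line)
    return dict(parts)


SAMPLE = [
    r'BLUE: 673["sendmail"]: 5_open("/proc/loadavg", 0, 438) = 5',
    r'BLUE: 673["sendmail"]: 6_close(5) = 0',
    '...',
    r'RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5',
    r'RED: 2568["httpd"]: 63_dup2(5, 0) = 0',
    r'RED: 2568["httpd"]: 11_execve("/bin//sh", bffff4e8, )',
    r'RED: 2568["sh"]: 5_open("/etc/ld.so.cache", 0, 0) = 6',
]

parts = partition_by_color(SAMPLE)
red_calls = [ENTRY.match(line).group("syscall") for line in parts["RED"]]
print(red_calls)  # ['accept', 'dup2', 'execve', 'open']
```

An analyst investigating the Slapper incident would then inspect only the RED partition, which is where the reduced log volume in the evaluation comes from.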
Evaluation
Time period analyzed: 24 hours

                          Lion           Slapper         SARS
# worm-related entries    66,504         195,884         19,494
Exploited service         BIND (CVE )    Apache (CAN )   Samba (CAN )
% of log inspected        48.7%          65.9%           12.1%

- Benefit for backward tracking: immediate identification of the break-in point
- Benefit for forward tracking: reduced log volume for contamination analysis
Challenge in Log Collection
Question: Can we trust a compromised system to collect log information?
[Figure: logging by system call interception inside the OS kernel, beneath user processes 1 and 2]
[Figure: user processes 1 and 2 run in a virtual machine (guest OS kernel/UML); the host OS kernel + VMM intercepts their system calls via ptrace and performs the logging]
Interception on the system virtualization path, as in virtual machine introspection [Garfinkel+, NDSS’03], is more tamper-resistant.
On-going Work: Multi-Dimensional Worm Profiling & Identification
- Content fingerprinting: unique recurring content
- Behavioral footprinting: unique recurring behavior
[Figure: infection cycle: probing → exploitation → replication → payload]
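One simple way to look for recurring content is the longest byte string shared by every captured sample. The sketch below is a naive Python illustration with made-up sample payloads; it is not the profiling method used in this work, and it skips the check that the content is unique (absent from benign traffic).

```python
from typing import List


def recurring_content(samples: List[bytes], min_len: int = 4) -> bytes:
    """Longest byte string (at least min_len bytes) present in every sample, or b'' if none."""
    shortest = min(samples, key=len)
    # Try candidate substrings of the shortest sample, longest first.
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return b""


captured = [
    b"HDR1\x90\x90EXPLOIT-tail1",
    b"other\x90\x90EXPLOIT!!",
    b"\x90\x90EXPLOITzz",
]
print(recurring_content(captured))  # b'\x90\x90EXPLOIT'
```

The brute-force search is cubic in the sample length, which is fine for a few short payloads but not for real traffic volumes.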
MSBlaster/Windows Worm
[Figure: a Blaster-infected host attacking a target via RPC]
1. Exploits the target on port 135/TCP
2. Binds svchost.exe to port 4444/TCP via injected code
3. Connects to the target on port 4444/TCP
4. Creates a shell “cmd.exe” and binds it to port 4444/TCP
5. Creates a “TFTP Server” on port 69/UDP
6. Sends a “TFTP” command to the shell
7. Runs the TFTP command (>tftp -i GET msblast.exe); “teleports” the msblast.exe file
8. Sends the “START msblast.exe” command
9. Runs the worm on the target!
10. Closes the connection
11. Shell closes
Snort rule: alert ip $EXTERNAL_NET any -> $HOME_NET 135 (msg:"RPC DCOM exploit/ Blaster Worm Attack"; content:"| b d6 93 CD C2 94 EA 64 F0 21 8F A F2 EC 8C B CF 2E 39 0B |"; …)
Worm name | Infection vector | Behavioral footprints
MSBlaster | RPC-DCOM         | exploitation, replication
Content signature (Snort rule): alert ip $EXTERNAL_NET any -> $HOME_NET 135 (msg:"RPC DCOM exploit/ Blaster Worm Attack"; content:"| b d6 93 CD C2 94 EA 64 F0 21 8F A F2 EC 8C B CF 2E 39 0B |"; …)
Worm name | Infection vector
MSBlaster | RPC-DCOM
Welchia   | RPC-DCOM
Sasser    | LSASS
Ramen     | LPRng, WU-FTPD, NFS-UTILS
Lion      | BIND
Slapper   | Apache
SARS      | Samba
Summary
[Figure: redirectors in Domains A, B, and C connected to the Collapsar front-end, alongside vGround I and vGround II]
Design and evaluation of advanced malware defense mechanisms using our unique integrated malware research platform.
Thank you. For more information: URL:
Backup Slides
Another Example Incident: Windows XP Server-side Honeypot (VMware)
- Vulnerability: RPC DCOM (Microsoft Security Bulletin MS03-026)
- Timeline:
  - Deployed: 22:10:00, 11/26/03
  - MSBlast: 00:36:47, 11/27/03
  - Enbiei: 01:48:57, 11/27/03
  - Nachi: 07:03:55, 11/27/03
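The timeline shows how quickly the honeypot was hit after deployment. A small Python check of that arithmetic, using only the timestamps listed above, read as 24-hour times with dates in MM/DD/YY form:

```python
from datetime import datetime, timedelta
from typing import Dict

FMT = "%m/%d/%y %H:%M:%S"
DEPLOYED = "11/26/03 22:10:00"
COMPROMISES = {
    "MSBlast": "11/27/03 00:36:47",
    "Enbiei": "11/27/03 01:48:57",
    "Nachi": "11/27/03 07:03:55",
}


def time_to_compromise() -> Dict[str, timedelta]:
    """Elapsed time from honeypot deployment to each recorded infection."""
    start = datetime.strptime(DEPLOYED, FMT)
    return {name: datetime.strptime(ts, FMT) - start for name, ts in COMPROMISES.items()}


for name, delta in time_to_compromise().items():
    print(name, delta)
# MSBlast 2:26:47
# Enbiei 3:38:57
# Nachi 8:53:55
```

So the first infection arrived less than two and a half hours after deployment.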
vGround: Network Virtualization
[Figure: virtual machines 1 and 2, each running a guest OS on its own host OS/VMM, connected through virtual switch 1; IP-IP]
- Option 1: Network-layer virtualization (e.g., X-Bone)
- Option 2: Link-layer virtualization (e.g., VIOLIN)
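With link-layer virtualization, each VM attaches to a virtual switch that forwards Ethernet frames only among the VMs on that virtual network, which is what keeps worm traffic confined. The Python sketch below models only a MAC-learning switch's forwarding decision; the class, port names, and MAC strings are hypothetical, and the tunneling of frames between physical hosts is left out.

```python
from typing import Dict, List


class VirtualSwitch:
    """MAC-learning switch connecting the VMs of one virtual network."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports                    # only VM ports; no uplink to a real network
        self.mac_table: Dict[str, str] = {}   # MAC address -> port where it was last seen

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Return the ports a frame is forwarded to."""
        self.mac_table[src_mac] = in_port     # learn where the sender lives
        out = self.mac_table.get(dst_mac)
        if dst_mac == self.BROADCAST or out is None:
            return [p for p in self.ports if p != in_port]  # flood to the other VMs
        if out == in_port:
            return []                         # destination is on the sending port
        return [out]


sw = VirtualSwitch(["vm1", "vm2", "vm3"])
print(sw.receive("vm1", "aa:aa", "bb:bb"))  # ['vm2', 'vm3']
print(sw.receive("vm2", "bb:bb", "aa:aa"))  # ['vm1']
print(sw.receive("vm1", "aa:aa", "bb:bb"))  # ['vm2']
```

Because the port list contains only VMs, even flooded frames never leave the playground.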
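Logging Integrity -- Existing Approach
[Figure: a user-space call fork(“/bin/sh”) enters the kernel, where the system call dispatcher looks up entry 2 (fork) in the system call table. The table's entries (restart, exit, fork, read, write, ni_syscall → sys_restart_syscall, sys_exit, sys_fork, sys_read, sys_write, sys_ni_syscall) are replaced with logging wrappers (log_restart_syscall, log_exit, log_fork, log_read, log_write, log_ni_syscall) that record the call and return its result]
System call interception inside the kernel is unreliable: the logging runs in the same kernel that a compromised system controls.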
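The weakness can be illustrated with a toy model of system call table hooking. In this Python sketch, all names are hypothetical and real interception patches kernel memory rather than a dictionary: logging wrappers replace table entries, and anyone with the same kernel-level access can silently restore the originals.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


def sys_fork(arg: str) -> str:
    return f"fork({arg})"


def sys_read(arg: str) -> str:
    return f"read({arg})"


# Toy system call table: number -> handler (fork is 2 and read is 3 on 32-bit x86 Linux).
syscall_table: Dict[int, Handler] = {2: sys_fork, 3: sys_read}
audit_log: List[str] = []


def make_logger(nr: int, handler: Handler) -> Handler:
    def log_handler(arg: str) -> str:
        audit_log.append(f"{nr}_{handler.__name__}({arg})")  # record, then run the real handler
        return handler(arg)
    return log_handler


def install_logging(table: Dict[int, Handler]) -> Dict[int, Handler]:
    """Wrap every entry with a logger; return the original entries."""
    originals = dict(table)
    for nr, handler in originals.items():
        table[nr] = make_logger(nr, handler)
    return originals


def dispatch(nr: int, arg: str) -> str:
    return syscall_table[nr](arg)


originals = install_logging(syscall_table)
dispatch(2, "/bin/sh")           # logged
syscall_table.update(originals)  # attacker with kernel access restores the original entries
dispatch(2, "/bin/sh")           # still runs, but nothing is logged
print(audit_log)  # ['2_sys_fork(/bin/sh)']
```

The second call succeeds without leaving a trace, which is why the logging point is moved out of the guest kernel.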
Virtual Machine Introspection [Garfinkel+, NDSS’03]: Interception on the System Virtualization Path
[Figure: a Type 1 VMM runs directly on the hardware and hosts guest OS 1 and guest OS 2; a Type 2 VMM runs on a host OS and hosts guest OS 1 and guest OS 2]
Logging for guest OS 2 happens in the VMM, outside the guest: tamper-resistant!
Process Coloring -- Slapper Worm
[Figure: process-coloring graph for the Slapper worm. Nodes: inet_sock(80); fd 5; 2568: httpd; 2568 (execve): /bin//sh; 2568 (execve): /bin/bash -i; 2586: /bin/rm -rf /tmp/.bugtraq.c; 2587: /bin/cat /tmp/.uubugtraq; /tmp/.bugtraq.c. Edges are labeled accept; recv; dup2, read; execve; fork, execve; open, dup2, write; unlink.]
Counter-attacks against Process Coloring
- Coloring-mixing attack
  - Good news: mixing is itself an important anomaly
  - Bad news: advanced filtering policies are needed
- Low-level attack
  - Kernel integrity checking (e.g., CoPilot, Livewire, Pioneer)
  - Shadow structures via the VMM
- Diffusion-cutting attack
  - Covert channels
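Since a coloring-mixing attack leaves entities carrying several service colors, a first-cut detector can simply flag them. This is a hypothetical Python sketch over a color map. Legitimate sharing, such as a common log file, also mixes colors, and that is why advanced filtering policies are still needed.

```python
from typing import Dict, FrozenSet, List


def mixed_entities(colors: Dict[str, FrozenSet[str]]) -> List[str]:
    """Entities carrying more than one service color (candidates for a mixing attack)."""
    return sorted(name for name, c in colors.items() if len(c) > 1)


colors = {
    "httpd": frozenset({"RED"}),
    "/bin/sh": frozenset({"RED", "BLUE"}),
    "sendmail": frozenset({"BLUE"}),
}
print(mixed_entities(colors))  # ['/bin/sh']
```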
Footprinting Representation: MSBlaster Worm
[Figure: MSBlaster footprint built from these events: 1st TCP handshake on 135/TCP; 2nd TCP handshake on 4444/TCP (shell); sending “tftp …”; 69/UDP (tftp); RST segments]
Content signature: alert ip $EXTERNAL_NET any -> $HOME_NET 135 (msg:"RPC DCOM exploit/ Blaster Worm Attack"; content:"| b d6 93 CD C2 94 EA 64 F0 21 8F A F2 EC 8C B CF 2E 39 0B |"; …)
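A behavioral footprint can be represented as an ordered sequence of network events and matched as a subsequence of observed traffic, so unrelated packets in between do not break the match. This Python sketch assumes a simplified `(protocol, port, action)` event tuple, which is hypothetical. The MSBlaster footprint follows the worm's steps listed earlier: 135/TCP, then 4444/TCP, then 69/UDP.

```python
from typing import List, Tuple

Event = Tuple[str, int, str]  # (protocol, port, action): a hypothetical simplification

MSBLASTER_FOOTPRINT: List[Event] = [
    ("tcp", 135, "handshake"),   # exploit via RPC DCOM
    ("tcp", 4444, "handshake"),  # connect to the remote shell
    ("tcp", 4444, "send"),       # send the "tftp ..." command
    ("udp", 69, "tftp"),         # fetch msblast.exe over TFTP
]


def matches_footprint(observed: List[Event], footprint: List[Event]) -> bool:
    """True if the footprint occurs in order (not necessarily contiguously) in observed."""
    remaining = iter(observed)
    # `step in remaining` consumes the iterator up to the match, which enforces order.
    return all(step in remaining for step in footprint)


trace = [
    ("tcp", 80, "handshake"),
    ("tcp", 135, "handshake"),
    ("tcp", 135, "rst"),
    ("tcp", 4444, "handshake"),
    ("tcp", 4444, "send"),
    ("udp", 69, "tftp"),
    ("tcp", 4444, "rst"),
]
print(matches_footprint(trace, MSBLASTER_FOOTPRINT))        # True
print(matches_footprint(trace[::-1], MSBLASTER_FOOTPRINT))  # False
```

Matching on ordered behavior rather than payload bytes is what lets a footprint survive changes to the exploit content that would evade the content signature.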