Rozzle De-Cloaking Internet Malware Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy Microsoft Research.

Slides:



Advertisements
Similar presentations
Cristian Cadar, Peter Boonstoppel, Dawson Engler RWset: Attacking Path Explosion in Constraint-Based Test Generation TACAS 2008, Budapest, Hungary ETAPS.
Advertisements

Cookies, Sessions. Server Side Includes You can insert the content of one file into another file before the server executes it, with the require() function.
Introducing JavaScript
Rozzle De-Cloaking Internet Malware Presenter: Yinzhi Cao Slides by Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy Microsoft.
Java Script Session1 INTRODUCTION.
GATEKEEPER MOSTLY STATIC ENFORCEMENT OF SECURITY AND RELIABILITY PROPERTIES FOR JAVASCRIPT CODE Salvatore Guarnieri & Benjamin Livshits Presented by Michael.
Nozzle: A Defense Against Heap-spraying Code Injection Attacks
Nozzle: A Defense Against Heap Spraying Attacks Ben Livshits Paruj Ratanaworabhan Ben Zorn.
Fast and Precise In-Browser JavaScript Malware Detection
Finding Malware on a Web Scale Ben Livshits Microsoft Research Redmond, WA.
By Philipp Vogt, Florian Nentwich, Nenad Jovanovic, Engin Kirda, Christopher Kruegel, and Giovanni Vigna Network and Distributed System Security(NDSS ‘07)
JShield: Towards Real-time and Vulnerability-based Detection of Polluted Drive-by Download Attacks Yinzhi Cao*, Xiang Pan**, Yan Chen** and Jianwei Zhuge***
Working with JavaScript. 2 Objectives Introducing JavaScript Inserting JavaScript into a Web Page File Writing Output to the Web Page Working with Variables.
Nozzle: A Defense Against Heap-spraying Code Injection Attacks Paruj Ratanaworabhan, Cornell University Ben Livshits and Ben Zorn, Microsoft Research (Redmond,
Tutorial 10 Programming with JavaScript
Leveraging User Interactions for In-Depth Testing of Web Applications Sean McAllister, Engin Kirda, and Christopher Kruegel RAID ’08 1 Seoyeon Kang November.
Computer Security and Penetration Testing
Automatic Creation of SQL Injection and Cross-Site Scripting Attacks 2nd-order XSS attacks 1st-order XSS attacks SQLI attacks Adam Kiezun, Philip J. Guo,
 What I hate about you things people often do that hurt their Web site’s chances with search engines.
Norman SecureSurf Protect your users when surfing the Internet.
JShield: Towards Real-time and Vulnerability-based Detection of Polluted Drive-by Download Attacks Yinzhi Cao*, Xiang Pan**, Yan Chen** and Jianwei Zhuge***
FLOWFOX A WEB BROWSER WITH FLEXIBLE AND PRECISE INFORMATION CONTROL.
Jarhead Analysis and Detection of Malicious Java Applets Johannes Schlumberger, Christopher Kruegel, Giovanni Vigna University of California Annual Computer.
Presentation by Kathleen Stoeckle All Your iFRAMEs Point to Us 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008 Google Technical Report.
Charles Curtsinger UMass at Amherst Benjamin Livshits and Benjamin Zorm Microsoft Research Christian Seifert Microsoft 20 th USENIX Security Symposium.
Dynamic Web Pages (Flash, JavaScript)
VEX: VETTING BROWSER EXTENSIONS FOR SECURITY VULNERABILITIES XIANG PAN.
Finding Malware on a Web Scale
1 All Your iFRAMEs Point to Us Mike Burry. 2 Drive-by downloads Malicious code (typically Javascript) Downloaded without user interaction (automatic),
Created by, Author Name, School Name—State FLUENCY WITH INFORMATION TECNOLOGY Skills, Concepts, and Capabilities.
5 Chapter Five Web Servers. 5 Chapter Objectives Learn about the Microsoft Personal Web Server Software Learn how to improve Web site performance Learn.
Client Scripting1 Internet Systems Design. Client Scripting2 n “A scripting language is a programming language that is used to manipulate, customize,
Ideas to Improve SharePoint Usage 4. What are these 4 Ideas? 1. 7 Steps to check SharePoint Health 2. Avoid common Deployment Mistakes 3. Analyze SharePoint.
Finding Malware on a Web Scale
CMPS 211 JavaScript Topic 1 JavaScript Syntax. 2Outline Goals and Objectives Goals and Objectives Chapter Headlines Chapter Headlines Introduction Introduction.
XP Tutorial 10New Perspectives on Creating Web Pages with HTML, XHTML, and XML 1 Working with JavaScript Creating a Programmable Web Page for North Pole.
Tutorial 10 Programming with JavaScript. XP Objectives Learn the history of JavaScript Create a script element Understand basic JavaScript syntax Write.
Tutorial 10 Programming with JavaScript
Computer Science Detecting Memory Access Errors via Illegal Write Monitoring Ongoing Research by Emre Can Sezer.
AjaxScope & Doloto: Towards Optimizing Client-side Web 2.0 App Performance Ben Livshits Microsoft Research (joint work with Emre.
Defending Browsers against Drive-by Downloads:Mitigating Heap-Spraying Code Injection Attacks Authors:Manuel Egele, Peter Wurzinger, Christopher Kruegel,
Detecting Targeted Attacks Using Shadow Honeypots Authors: K.G. Anagnostakis, S. Sidiroglou, P. Akritidis, K. Xinidis, E. Markatos, A.D. Keromytis Published:
Chapter 6 Server-side Programming: Java Servlets
Client-side processing in JavaScript.... JavaScript history Motivations –lack of “dynamic content” on web pages animations etc user-customised displays.
XP Tutorial 10New Perspectives on HTML and XHTML, Comprehensive 1 Working with JavaScript Creating a Programmable Web Page for North Pole Novelties Tutorial.
DETECTING TARGETED ATTACKS USING SHADOW HONEYPOTS AUTHORS: K. G. Anagnostakisy, S. Sidiroglouz, P. Akritidis, K. Xinidis, E. Markatos, A. D. Keromytisz.
Web Security Lesson Summary ●Overview of Web and security vulnerabilities ●Cross Site Scripting ●Cross Site Request Forgery ●SQL Injection.
1 Isolating Web Programs in Modern Browser Architectures CS6204: Cloud Environment Spring 2011.
Zozzle: Low-overhead Mostly Static JavaScript Malware Detection.
© 2011 IBM Corporation Hybrid Analysis for JavaScript Security Assessment Omer Tripp Omri Weisman Salvatore Guarnieri IBM Software Group Sep 2011.
Tutorial 10 Programming with JavaScript. 2New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition Objectives Learn the history of JavaScript.
Tutorial 10 Programming with JavaScript. 2New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition Objectives Learn the history of JavaScript.
Perfecto Mobile Automation
Introduction to JavaScript MIS 3502, Spring 2016 Jeremy Shafer Department of MIS Fox School of Business Temple University 2/2/2016.
SpyProxy SpyProxy Execution-based Detection of MaliciousWeb Content Execution-based Detection of MaliciousWeb Content Hongjin, Lee.
JavaScript and AJAX 2nd Edition Tutorial 1 Programming with JavaScript.
XP Tutorial 10New Perspectives on HTML, XHTML, and DHTML, Comprehensive 1 Working with JavaScript Creating a Programmable Web Page for North Pole Novelties.
By: Chuqing He. Android Overview - Purchased by Google in First Android Phone was sold in Oct Linux-based - Holds 75% of the worldwide.
Powerpoint presentation on Drive-by download attack -By Yogita Goyal.
Heat-seeking Honeypots: Design and Experience John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy and Martin Abadi WWW 2011 Presented by Elias P.
Windows Vista Configuration MCTS : Internet Explorer 7.0.
Constraint Framework, page 1 Collaborative learning for security and repair in application communities MIT site visit April 10, 2007 Constraints approach.
Applications Active Web Documents Active Web Documents.
Application Communities
CMSC 345 Defensive Programming Practices from Software Engineering 6th Edition by Ian Sommerville.
Introduction to Dynamic Web Programming
Detecting Targeted Attacks Using Shadow Honeypots
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
HYPERTEXT PREPROCESSOR BY : UMA KAKKAR
Exploring DOM-Based Cross Site Attacks
Presentation transcript:

Rozzle De-Cloaking Internet Malware Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy Microsoft Research

Static – Dynamic Analysis Spectrum 2 Entirely staticEntirely runtime + High coverage - Low precision - May not scale + High coverage - Low precision - May not scale + High precision - High overhead - Low coverage + High precision - High overhead - Low coverage Symbolic execution DART, SAGE, KLEE + High precision + Scales reasonably well ? High coverage DART, SAGE, KLEE + High precision + Scales reasonably well ? High coverage Multi-execution + High precision + High scalability + High coverage - Watch out for resource usage + High precision + High scalability + High coverage - Watch out for resource usage

Blacklisting Malware in Search Results 3

Motivation 4 Haha, I cannot belive this guy actually does this!! LOL

Drive-by Malware Detection Landscape 5 runtime static online (browser-based) offline (honey-monkey) Nozzle [Usenix Security ’09] Zozzle [Usenix Security ’11] Instrumented browser Looks for heap sprays Moderately high overhead Instrumented browser Looks for heap sprays Moderately high overhead Mostly static detection Low overhead, high reach Can be deployed in browser Mostly static detection Low overhead, high reach Can be deployed in browser

6 Search Engine Crawling

detect crawler Server side 7 Malware Cloaking Source IP Request (User- Agent, Browser ID) detect vulnerable target Client side Fingerprint browser & plugin versions Do this using JavaScript if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); }

Client-side Cloaking Defense Traditional Rozzle Single browser, one visit Appear as vulnerable as possible Single browser, one visit Appear as vulnerable as possible 8

Background & Motivation: Cloaking Detecting Internet Malware Rozzle: Fighting Evasion Experiments Overview

Detecting Internet Malware 10 Dynamic Detection Nozzle Dynamic Detection Nozzle Static Detection Zozzle Static Detection Zozzle Zozzle: Low-overhead Mostly Static JavaScript Malware Detection [Usenix Security 2011] Bayesian classification of hierarchical features of the JavaScript abstract syntax tree. In the browser (after unpacking) Zozzle: Low-overhead Mostly Static JavaScript Malware Detection [Usenix Security 2011] Bayesian classification of hierarchical features of the JavaScript abstract syntax tree. In the browser (after unpacking)

Nozzle: Runtime Heap Spraying Detection 11 Normalized attack surface (NAS) good bad

Object Surface Area Calculation Each block starts with its own size as weight Weights are propagated forward with flow Invalid blocks don’t propagate Iterate until a fixpoint is reached Compute block with highest weight 12 An example object from visiting google.com

// Shellcode var shellcode=unescape(‘%u9090%u9090%u9090%u9090%uceba%u11fa%u291f%ub1c9%udb33 […]′); bigblock=unescape(“%u0D0D%u0D0D”); headersize=20;shellcodesize=headersize+shellcode.length; while(bigblock.length<shellcodesize){bigblock+=bigblock;} heapshell=bigblock.substring(0,shellcodesize); nopsled=bigblock.substring(0,bigblock.length-shellcodesize); while(nopsled.length+shellcodesize<0×25000){nopsled=nopsled+nopsled+heapshell} // Spray var spray=new Array(); for(i=0;i<500;i++){spray[i]=nopsled+shellcode;} // Trigger function trigger(){ var varbdy = document.createElement(‘body’); varbdy.addBehavior(‘#default#userData’); document.appendChild(varbdy); try { for (iter=0; iter<10; iter++) { varbdy.setAttribute(‘s’,window); } } catch(e){ } window.status+=”; } document.getElementById(‘butid’).onclick(); shellcode unescape bigblock unescape %u0D0D%u0D0D shellcodesize shellcode.length bigblock.substring shellcodesize nopsled bigblock.substring bigblock.length shellcodesize nopsled.length shellcodesize nopsled nopsled nopsled heapshell spray spray nopsled shellcode varbdy.addBehavior #default#userData document.appendChild varbdy varbdy.setAttribute butid Zozzle: Static/Statistical Detection 13

Naïve Bayes Classification 14 * P(malicious) eval(""+O( )+O( )+O( )+O(648464)+O ( )+O( )+O( )+O( )+O( )+O( )+O( )+O( )+O( )+O( )+O( )+O(644519)+O ( )+O( )+O( )+O( )+O( )+O( )+O( )+O( )+O(809214)+O( )+O(747602)+O( )+O( )+O( )+O( )+O( )+O(745897)+ O( )+O( )+O( )+O( )+O( )+O(748985)+...);

Background & Motivation: Cloaking Detecting Internet Malware Rozzle: Fighting Evasion Experiments Overview

Environment Fingerprinting Prevents Detection Nozzle Zozzle if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } 16 var adobe=new ActiveXObject(‘AcroPDF.PDF’); var adobeVersion=adobe.GetVariable (‘$version’); if (navigator.userAgent.indexOf(‘IE 6’)>=0 && adobeVersion == ’9.1.3’) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } var adobe=new ActiveXObject(‘AcroPDF.PDF’); var adobeVersion=adobe.GetVariable (‘$version’); if (navigator.userAgent.indexOf(‘IE 6’)>=0 && adobeVersion == ’9.1.3’) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } Is this a practical problem for our malware detectors? In 7.7% of JS files, code gets a reference to environment In 1.2%, code branches on such sensitive values 89.5% of malicious JS branches on such values In 7.7% of JS files, code gets a reference to environment In 1.2%, code branches on such sensitive values 89.5% of malicious JS branches on such values

Typical Malware Cloaking 17

More Complex Fingerprinting 18 Fingerprint: Q F127J14

Avoiding Dynamic Crawlers 19

Avoiding Static Detection 20

How to Allocate Detection Resources? … … Rozzle Clearly does not scale How many resources should be allocated to filter malicious sites? What if the site simply is not malicious? What if the site simply is not malicious? 21

Execute individual branches sequentially to increase coverage Static analysis: Retain much of runtime precision Branch on environment- sensitive checks No forking No snapshotting Branch on environment- sensitive checks No forking No snapshotting Symbolic execution: re- verting to a previous state similar to running multiple browsers in parallel Rozzle Multi-path execution framework for JavaScript Multiple browser profiles on single machine What it is/does Cluster of machines: too resource consuming What it is not 22

Multi-Execution in Rozzle var adobe=new ActiveXObject(‘AcroPDF.PDF’); var adobeVersion=adobe.GetVariable (‘$version’); if (navigator.userAgent.indexOf(‘IE 7’)>=0 && adobeVersion == ’9.1.3’) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } else if (adobeVersion == ’8.0.1’) { var x=unescape(‘%u4073%u8279%u77 […]’); eval(x); } … 23

Challenges 24 Consistent updates of variables Introduce concept of Symbolic Memory: Multiple concrete values associated with one variable New JavaScript data type Symbolic 3 subtypes symbolic value / formula / conditional Weak updates for conditional assignments Introduce concept of Symbolic Memory: Multiple concrete values associated with one variable New JavaScript data type Symbolic 3 subtypes symbolic value / formula / conditional Weak updates for conditional assignments

Symbolic Memory var userAgentString=0; userAgentString = navigator.userAgent; var isIE; isIE = (userAgentString.indexOf(‘IE’)>=0); … Variable: userAgentString Value: 0 Symbolic: no Variable: userAgentString Value: 0 Symbolic: no Variable: userAgentString Value: Symbolic: yes Variable: userAgentString Value: Symbolic: yes Hooks into engine, return symbolic values for Sensitive global objects: navigator.userAgent, navigator.platform, … Sensitive functions: ScriptEngine(), allocation of ActiveXObject, … Hooks into engine, return symbolic values for Sensitive global objects: navigator.userAgent, navigator.platform, … Sensitive functions: ScriptEngine(), allocation of ActiveXObject, … 25 Variable: isIE Value: = 0 > Symbolic: yes Variable: isIE Value: = 0 > Symbolic: yes

Symbolic Memory var isIE=false; var isIE7=false; if (navigator.userAgent.indexOf(‘IE’)>=0) { isIE=true; if (navigator.userAgent.indexOf(‘IE 7’)>=0) { isIE7=true; } if (isIE7) { … Variable: isIE7 Value: false Symbolic: no Variable: isIE7 Value: false Symbolic: no Variable: isIE Value: false Symbolic: no Variable: isIE Value: false Symbolic: no Current path predicate Value: =0 > Symbolic: yes Current path predicate Value: =0 > Symbolic: yes Variable: isIE Value: =0 > ? true : false Symbolic: yes Variable: isIE Value: =0 > ? true : false Symbolic: yes Current path predicate Value: =0 > && =0 > Symbolic: yes Current path predicate Value: =0 > && =0 > Symbolic: yes Variable: isIE7 Value: Symbolic: yes Variable: isIE7 Value: Symbolic: yes 26

Symbolic Memory index Of 0 0 ‘IE’ navigator. userAgent navigator. userAgent Variable: isIE Value: =0 > ? true : false Symbolic: yes Variable: isIE Value: =0 > ? true : false Symbolic: yes ? ? >= true false 27

Challenges Consistent updates of variables Handling loops Indirect control flow: Exception handling I/O Consistent updates of variables Handling loops Indirect control flow: Exception handling Loop condition might be symbolic, number of iterations unknown! Unroll k iterations (currently k=1) Instruction pointer checks (endless loops/recursion) Loop condition might be symbolic, number of iterations unknown! Unroll k iterations (currently k=1) Instruction pointer checks (endless loops/recursion) try -blocks regularly used to test availability of plugins ( ActiveXObjects ) catch -blocks set default values, cannot be ignored Execute catch -statement similar to else branch, add virtual if -condition: “ ActiveX supported ” try -blocks regularly used to test availability of plugins ( ActiveXObjects ) catch -blocks set default values, cannot be ignored Execute catch -statement similar to else branch, add virtual if -condition: “ ActiveX supported ” Handling symbolic values when they are… —… written to the DOM —… sent to a remote server —… executed (as part of eval ) Lazy evaluation to concrete values (only when needed) Handling symbolic values when they are… —… written to the DOM —… sent to a remote server —… executed (as part of eval ) Lazy evaluation to concrete values (only when needed) 28

Experiments Offline Controlled Experiment 7x more Nozzle detections Online Similar to Bing crawling Almost 4x more Nozzle detections 10.1% more Zozzle detections Overhead 1.1% runtime overhead 1.4% memory overhead 29

+595% runtime detections Offline Exploits hosted on our server Minimize external influences 70,000 known malicious scripts (flagged by Zozzle) Fully unrolled/de-obfuscated exploits, wrapped in HTML 30

List of URLs recently crawled by Bing Pre-filtering: Increase likelihood of finding malicious sites 57,000 URLs over the last week Online Dedicated machine for crawling the web Clone of the Bing malware crawler 31 Nozzle Detections % runtime detections Zozzle Detections 225 2,

List of URLs recently crawled by Bing Pre-filtering: Increase likelihood of finding malicious sites 57,000 URLs over the last week Online Dedicated machine for crawling the web Clone of the Bing malware crawler

Overhead 500 randomly selected URLs crawled by Bing Slightly biased towards malicious sites (pre-filtering) Memory Overhead Median:0.6% 80 th Percentile:1.4% Runtime Overhead Median:0.0% 80 th Percentile:1.1% Average numbers of 3 repeated runs per configuration Base runs (cookie setup) 33

Overhead Numbers

For most sites, virtually no overhead Take Away 35 Tremendous impact on runtime detector due to increased path coverage Tremendous impact on runtime detector due to increased path coverage Visible impact on static detector Visible impact on static detector More important with growing trend to obfuscation Also improves other existing tools: Exposes detectors to additional site content

if (navigator.userAgent.toLowerCase().indexOf( "\x6D"+"\x73\x69\x65"+"\x20\x36")>0) document.write(" "); if (navigator.userAgent.toLowerCase().indexOf( "\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37")>0) document.write(" "); try { var a; var aa=new ActiveXObject("Sh"+"ockw"+"av"+"e"+"Fl"+[…]); } catch(a) { } finally { if (a!="[object Error]") document.write(" "); } try { var c; var f=new ActiveXObject("O"+"\x57\x43"+"\x31\x30\x2E\x53"+[…]); } catch(c) { } finally { if (c!="[object Error]") { aacc = " "; setTimeout("document.write(aacc)", 3500); } Online … an example pulled from our DB… "\x6D"+"\x73\x69\x65 "+"\x20\x36" = "msie 6" "\x6D"+"\x73\x69\x65 "+"\x20\x36" = "msie 6" "\x6D"+"\x73"+"\x69"+"\ x65"+"\x20"+"\x37" = "msie 7" "\x6D"+"\x73"+"\x69"+"\ x65"+"\x20"+"\x37" = "msie 7" "O"+"\x57\x43"+"\x31\x30\x2E\x5 3"+"pr"+"ea"+"ds"+"he"+"et" = "OWC10.Spreadsheet" "O"+"\x57\x43"+"\x31\x30\x2E\x5 3"+"pr"+"ea"+"ds"+"he"+"et" = "OWC10.Spreadsheet" 36

Malware Roulette 37

Related Work Multipath Execution Symbolic Execution JavaScript Analysis Cloaking and Fingerprinting Multipath Execution Symbolic Execution JavaScript Analysis Godefroid et al, SAGE Saxena et al., A Symbolic Execution Framework for JavaScript Godefroid et al, SAGE Saxena et al., A Symbolic Execution Framework for JavaScript Cova et al, Detection and Analysis of Drive-by- Download Attacks and Malicious JavaScript Code Chen et al, WebPatrol: Automated Collection and Replay of Web-based Malware Scenarios Bisht et al, NoTamper: Automatic blackbox detection of parameter tampering opportunities in web app.s Cova et al, Detection and Analysis of Drive-by- Download Attacks and Malicious JavaScript Code Chen et al, WebPatrol: Automated Collection and Replay of Web-based Malware Scenarios Bisht et al, NoTamper: Automatic blackbox detection of parameter tampering opportunities in web app.s Eckersley, How Unique Is Your Web Browser? Plenty related research in binary malware analysis Eckersley, How Unique Is Your Web Browser? Plenty related research in binary malware analysis Moser et al., Exploring Multiple Execution Paths for Malware Analysis Brumley et al., MineSweeper Moser et al., Exploring Multiple Execution Paths for Malware Analysis Brumley et al., MineSweeper 38

Summary Rozzle: Multi-profile execution – Look as vulnerable as possible – Improve existing malware detectors Implementation: – Implemented on top of IE9’s JavaScript engine – Still some flaws, promising results Idea of multi-execution is promising in other contexts 39

Static – Dynamic Analysis Spectrum 40 Entirely staticEntirely runtime + High coverage - Low precision - May not scale + High coverage - Low precision - May not scale + High precision - High overhead - Low coverage + High precision - High overhead - Low coverage Symbolic execution DART, SAGE + High precision + Scales reasonably well ? High coverage DART, SAGE + High precision + Scales reasonably well ? High coverage Multi-execution + High precision + High scalability + High coverage - Watch out for resource usage + High precision + High scalability + High coverage - Watch out for resource usage