A Crawler-based Study of Spyware on the Web A. Moshchuk, T. Bragin, S. D. Gribble, H. M. Levy NDSS, 2006 Presented by Justin Miller on 3/6/07
A Quick Joke… “I caught a little of that computer virus that’s been going around… I haven’t been myself since”
Overview vs. User visits website Web spyware infects computer Computer is unhappy
Background Spyware study Infected 80% of AOL users 93 spyware components (known) Goals Locate spyware on the internet Gather Internet spyware statistics Quantitative analysis of spyware-laden content on the web
Outline What is spyware? Crawling the web Web executables Drive-by downloads Results Improvements
Definition Spyware – software that collects personal information about users No user knowledge Spyware techniques: Log keystrokes Collect web history Scan documents on hard disk
Types of Spyware Spyware-infected executables Content-type header URL extension Drive-by downloads Malicious web content Produce event triggers
Part I: Executable files Finding executables Content-type (HTTP header) contains .exe URL contains .exe, .cab, or .msi Hidden executables Embedded file (.zip) URL hidden in JavaScript Missed executables Hidden URL on dynamic page
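The two checks on this slide can be sketched as a small filter. This is only an illustration, not the paper's crawler code: the function name `looks_like_executable` and the list of content types are assumptions, and only the three extensions come from the slide.

```python
from urllib.parse import urlparse

# Extensions listed on the slide.
EXECUTABLE_EXTENSIONS = (".exe", ".cab", ".msi")
# Assumption: content types commonly used for Windows binaries (not from the slides).
EXECUTABLE_CONTENT_TYPES = ("application/octet-stream", "application/x-msdownload")


def looks_like_executable(url: str, content_type: str = "") -> bool:
    """Return True if the URL path or the Content-Type header suggests an executable."""
    # Check only the path, so query strings such as ?id=3 are ignored.
    path = urlparse(url).path.lower()
    if path.endswith(EXECUTABLE_EXTENSIONS):
        return True
    header = content_type.lower()
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = header.split(";")[0].strip()
    return media_type in EXECUTABLE_CONTENT_TYPES or ".exe" in header
```

A filter like this still misses the cases named on the slide: executables inside archives and URLs that only appear after JavaScript runs or on dynamically generated pages.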
Part I: Executable files DL, install, run in a clean VM Tool to automate installer framework EULA agreements Radio buttons and check boxes Analyze file Ad-Aware software Log identifies spyware program
Web Crawling Heritrix public domain Web crawler Search 2,500+ web sites c|net’s download.com for DL executables Randomly selected web sites Google keyword search Depth of 3 links Find .exe hosted on separate Web servers
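The "depth of 3 links" limit can be illustrated with a breadth-first traversal over a toy link graph. This is a sketch only and does not use Heritrix; the `crawl` function and the in-memory `toy_links` dictionary are hypothetical stand-ins for fetching and parsing real pages.

```python
from collections import deque


def crawl(seed, links, max_depth=3):
    """Breadth-first crawl from seed, visiting pages at most max_depth links away."""
    seen = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


# Toy link graph (made up): a chain a -> b -> c -> d -> e.
toy_links = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reachable = crawl("a", toy_links)
```

With the default limit, page "e" is four links from the seed and is never visited.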
Changing Spyware Environment 2 separate program crawls May, October 2005 Generated list of crawling seeds Most recent anti-spyware program used October crawl detects more vulnerabilities
Executable Results 2 separate program crawls May 2005 – 18 million URLs Oct 2005 – 22 million URLs No appreciable change in spyware One site dropped # of infected executables
Executable Results Overall spyware 3.8% in May % in Oct 2005 Individual programs 82 in May in Oct 2005
Infected Executables (figures: May 2005, October 2005)
Web Categories Web categories infected with spyware
Spyware Functions Spyware-infected executables Contain various spyware functions Executables may have multiple functions
Spyware Upgrades Spyware-infected executables May have multiple spyware functions 1,294 infected .exe found in Oct detected 414 variants
Blacklisting Spyware Block clients from accessing listed sites Done by firewall or proxy Blacklisting is ineffective
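The blocking step a firewall or proxy would perform can be sketched as a host lookup against a domain list. The function name `is_blocked` and the example domains are hypothetical; this is an illustration of the mechanism, not a tool from the study.

```python
from urllib.parse import urlparse


def is_blocked(url, blacklist):
    """Return True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not lookalikes such as notbad.example.
    return any(host == d or host.endswith("." + d) for d in blacklist)


# Made-up blacklist for illustration.
blacklist = {"bad.example"}
```

A list like this only covers domains that are already known: a host that is not on the list passes through unchecked.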
Part II: Drive-by Downloads Spyware from visiting a web page JavaScript embedded in HTML Modifies files System/registry Render web pages with unmodified browser
Event Triggers for DB-DLs Event occurs that matches a trigger Trigger Conditions Process creation File activity (creation) Suspicious process (file modification) Registry file modified Browser/OS crash
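The trigger idea can be sketched as matching observed events against a fixed set of trigger conditions. The event names, the `Event` class, and both helper functions are hypothetical labels for the conditions on the slide, not the study's actual monitoring code.

```python
from dataclasses import dataclass

# Assumed labels for the trigger conditions listed on the slide.
TRIGGERS = {
    "process_created",
    "file_created",
    "file_modified_by_suspicious_process",
    "registry_modified",
    "browser_or_os_crash",
}


@dataclass(frozen=True)
class Event:
    kind: str         # what happened, e.g. "file_created"
    detail: str = ""  # free-form context, e.g. a path or process name


def fired_triggers(events):
    """Return the events whose kind matches one of the trigger conditions."""
    return [e for e in events if e.kind in TRIGGERS]


def is_suspicious(events):
    """A page visit is flagged if at least one trigger fired."""
    return bool(fired_triggers(events))
```

A visit that only produces events outside the trigger set, such as ordinary page rendering, would not be flagged in this sketch.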
Complex Web Content “Time Bomb” attack Speed up virtual time of guest OS JavaScript when page closes Fetch a clean URL before closing Pop-up windows Allow all to open before closing
IE Browser Configuration Security-related IE dialog boxes
Drive-by Results 3 web crawls May 2005 – 45K URLs Oct 2005 – Same URLs Oct 2005 – New URLs Decrease in infectious URLs Increase in unique spyware programs
Origin of Drive-by DLs Top 6 web categories (IE): Pirate sites Celebrity Music Adult Games Wallpaper
Spyware Top 10 (tables: May 2005, October 2005)
Spyware Trends Decline in total # of spyware programs Increase of anti-spyware tools Automated patch installations Lawsuits against spyware distributors
IE vs Firefox Security Internet Explorer v cfg_y 92 - cfg_n Firefox v cfg_y 0 - cfg_n
Drive-by Summary Performed 3 URL crawls Reduction in % of domains hosting DB-DLs Small # of domains host majority of infectious links Drive-by DLs attempted in 0.4% of URLs Drive-by attacks in 0.2% of URLs
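The observation that a small number of domains host most infectious links can be checked with a simple count per domain. This is a sketch on made-up URLs; the function name `domains_covering` and the sample data are assumptions, not results from the paper.

```python
from collections import Counter
from urllib.parse import urlparse


def domains_covering(urls, share=0.5):
    """Return the fewest top domains that together host at least `share` of the URLs."""
    counts = Counter(urlparse(u).hostname for u in urls)
    needed = share * len(urls)
    covered, chosen = 0, []
    # Walk domains from most to least frequent until the target share is reached.
    for domain, n in counts.most_common():
        if covered >= needed:
            break
        chosen.append(domain)
        covered += n
    return chosen


# Made-up sample: 6 infectious URLs on one domain, 2 each on two others.
sample = (
    [f"http://a.example/p{i}" for i in range(6)]
    + [f"http://b.example/p{i}" for i in range(2)]
    + [f"http://c.example/p{i}" for i in range(2)]
)
top = domains_covering(sample)
```

In this sample a single domain already accounts for more than half of the URLs, which is the kind of concentration the slide describes.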
Strengths Analysis method Studies density of spyware on the Web Produces spyware trends over time Calculated frequency of spyware on web Distinguished security prompts (y/n) Found 14% of spyware is malicious Density of spyware is substantial
Weaknesses Missed executables URL hidden in JavaScript, dynamic page Limited by what Ad-Aware is able to detect Method weakness Different anti-spyware programs (May/Oct) Did not crawl entire web Cannot relate density of spyware on the Web and the presence of threats on desktops
Improvements Test multiple browsers Additional anti-spyware programs Crawl more URLs Find geographic patterns of hosts
Questions? Ask me! Reasons to ask questions: Class discussion is 20% of your grade You can’t leave until 5:45 anyway Of the two of us, I’m probably the only one that read the entire paper (except Dr. Zou)