All Your iFRAMEs Point to Us Cheng Wei
Acknowledgement This presentation is extended and modified from: – The presentation by Bruno Virlet, All Your iFRAMEs Point to Us – The presentation by YouZhi Bao
Motivation – Generally improve the safety of web browsing – Report owners of malicious networks to authorities – Study the distribution of malicious sites – Study the relationship between user browsing habits and exposure to malware – Study malware distribution networks
Introduction What is a drive-by download and why should we even care? Malware delivery: – Social engineering (attackers use various social engineering techniques to entice visitors of a website to download and run malware) – Browser vulnerabilities (malware is downloaded and run automatically); attackers lure users to connect to malicious servers
Injection techniques Adversaries use a number of techniques to inject content under their control into benign websites (adversaries exploit web servers via vulnerable scripting applications and use invisible HTML components such as IFrames to hide the injected content). Advertisements (adversaries inject content into ads): – Particularly dangerous as they target popular sites – 75% of malicious landing sites delivered malware via ads Another common technique is to use websites that allow users to contribute their own content, for example using forums, blogs or advertisements to inject exploit URLs (we focus on this)
What are the characteristics of malicious sites?
Infrastructure and Methodology Our primary goal is to identify malicious web sites and help improve the safety of the Internet. Useful terms: – Malicious URL: a URL that initiates a drive-by download when users visit it – Landing site: we group malicious URLs according to their top-level domain names and refer to the resulting set as the landing sites – Distribution site: the host of the malicious payload (loaded via an IFRAME or a script from a remote site)
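The grouping of malicious URLs into landing sites can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `landing_site` and `group_by_landing_site` are hypothetical, and using the last two labels of the host name as the site key is an assumption that ignores multi-label suffixes (such as co.uk) and raw IP addresses.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def landing_site(url: str) -> str:
    """Return a naive site key for a URL.

    Assumption: the last two labels of the host name identify the site.
    A production system would consult a public suffix list instead.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def group_by_landing_site(urls):
    """Map each site key to the list of malicious URLs that fall under it."""
    groups = defaultdict(list)
    for url in urls:
        groups[landing_site(url)].append(url)
    return dict(groups)
```

For example, `group_by_landing_site` places http://forum.example.com/a and http://www.example.com/b under the single key example.com.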
Preprocessing phase Our goal is to inspect URLs from this repository and identify the ones that trigger drive-by downloads. The web repository is maintained by Google (exhaustive inspection of each URL is expensive due to the large number of URLs in the repository, so we use lightweight techniques to extract URLs that are likely malicious and then subject them to the verification phase). For each website extract: – Out-of-place iFrames – Obfuscated JavaScript – iFrames to known distribution sites Pages that proceed to the more expensive verification process: – Those labeled as suspicious by the above procedure (1 million / day) – A random selection of several hundred thousand URLs – URL reported to
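The three lightweight features listed above could be extracted along the following lines. This is only a sketch under stated assumptions: the slides do not define "out of place", so it is approximated here by zero-size or hidden IFRAMEs; the obfuscation pattern is an illustrative heuristic; and `KNOWN_DISTRIBUTION_HOSTS`, `IframeCollector`, `is_hidden` and `extract_features` are hypothetical names, not part of the original system.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical blocklist; a real deployment would load verified distribution sites.
KNOWN_DISTRIBUTION_HOSTS = {"malware.example"}

# Illustrative hints of obfuscation: dynamic evaluation or long escape runs.
OBFUSCATION_HINTS = re.compile(
    r"eval\s*\(|unescape\s*\(|fromCharCode|(?:%u?[0-9a-fA-F]{2,4}){8,}"
)


class IframeCollector(HTMLParser):
    """Collect the attributes of every <iframe> and the body of every <script>."""

    def __init__(self):
        super().__init__()
        self.iframes = []
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))
        elif tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def is_hidden(attrs):
    """Assumption: an IFRAME is 'out of place' if it is tiny or styled invisible."""
    tiny = attrs.get("width") in {"0", "1"} or attrs.get("height") in {"0", "1"}
    style = (attrs.get("style") or "").replace(" ", "").lower()
    return tiny or "display:none" in style or "visibility:hidden" in style


def extract_features(html):
    """Return counts for the three pre-processing features of one page."""
    parser = IframeCollector()
    parser.feed(html)
    parser.close()
    hosts = [urlsplit(f.get("src") or "").hostname for f in parser.iframes]
    return {
        "hidden_iframes": sum(is_hidden(f) for f in parser.iframes),
        "obfuscated_scripts": sum(
            bool(OBFUSCATION_HINTS.search(s)) for s in parser.scripts
        ),
        "known_distribution_iframes": sum(
            h in KNOWN_DISTRIBUTION_HOSTS for h in hosts
        ),
    }
```

Counting features rather than returning a verdict keeps this step cheap; deciding what counts as suspicious is left to a later scoring stage.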
Pre-processing Phase – Extract several features and translate them into a likelihood score using a machine learning framework – MapReduce – 5-fold cross-validation – These URLs are randomly sampled from popular URLs as well as from the global index; we also process URLs reported by users – 1 billion -> 1 million
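The 5-fold cross-validation mentioned on this slide can be illustrated with a small, model-agnostic sketch. The slides do not say which learning algorithm is used, so the model is passed in as a hypothetical `fit` callback; `k_fold_splits` and `cross_validate` are illustrative names, and plain accuracy is an assumed evaluation metric.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in (folds[:i] + folds[i + 1:]) for j in fold]
        yield train, test


def cross_validate(samples, labels, fit, k=5):
    """Return the mean held-out accuracy over k folds.

    `fit(train_samples, train_labels)` is a caller-supplied (hypothetical)
    function that returns a `predict(sample)` callable.
    Requires at least k samples so that no fold is empty.
    """
    scores = []
    for train, test in k_fold_splits(len(samples), k):
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(samples[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k
```

Each sample is held out exactly once, so the averaged score estimates how the classifier behaves on URLs it was not trained on.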
Preprocessing phase
Verification Process This phase aims to verify whether a candidate URL from the pre-processing phase is malicious. – Equipment: a large-scale web honeynet that runs Microsoft Windows images in virtual machines – Method: execution-based heuristics and results from anti-virus engines (to detect malicious URLs) – For each visited URL, we run the VM for 2 minutes and monitor system behavior for abnormal state changes Heuristic score: the number of created processes, the number of observed registry changes, and the number of file system changes. Met threshold: suspicious
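The heuristic scoring step can be sketched as below. Several details are assumptions rather than facts from the slides: the score is taken to be the plain sum of the three counts, `SCORE_THRESHOLD` is a made-up value, and upgrading a suspicious URL to "malicious" when an anti-virus engine also flags it is one plausible way to combine the two signals. `VisitObservation`, `heuristic_score` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VisitObservation:
    """State changes observed while the VM visited one URL."""
    new_processes: int
    registry_changes: int
    file_system_changes: int
    antivirus_flagged: bool = False


# Hypothetical threshold; the slides do not state the real value.
SCORE_THRESHOLD = 3


def heuristic_score(obs: VisitObservation) -> int:
    """Assumed scoring rule: sum of the three abnormal-state-change counts."""
    return obs.new_processes + obs.registry_changes + obs.file_system_changes


def classify(obs: VisitObservation) -> str:
    """Label a visit as 'malicious', 'suspicious' or 'benign'."""
    meets_threshold = heuristic_score(obs) >= SCORE_THRESHOLD
    if meets_threshold and obs.antivirus_flagged:
        return "malicious"  # assumed combination of heuristic and AV verdict
    if meets_threshold:
        return "suspicious"
    return "benign"
```

Under these assumptions, a visit that spawns two processes and touches the registry and file system once each scores 4 and is labeled suspicious unless an anti-virus engine also flags it.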
Constructing the Malware Distribution Network Malware distribution network => a set of malware delivery trees, each consisting of landing sites (leaves), hop points (intermediate nodes) and a distribution site (root). To construct a delivery tree, we extract the edges connecting these nodes by inspecting the Referer header of HTTP requests.
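Recovering a delivery path from Referer headers can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the request log is available as (requested URL, Referer URL) pairs, keeps only the first cross-host edge seen for each referring host, and uses the hypothetical helper names `delivery_edges` and `path_to_root`.

```python
from urllib.parse import urlsplit


def _host(url):
    """Host name of a URL, or an empty string if it is missing."""
    return urlsplit(url or "").hostname or ""


def delivery_edges(requests):
    """Build a child -> parent map from (requested_url, referer_url) pairs.

    A request for U carrying Referer R means the page on R caused the fetch
    of U, so U's host is one hop closer to the distribution site (the root).
    Simplification: only the first cross-host edge per referring host is kept.
    """
    parent = {}
    for url, referer in requests:
        src, dst = _host(referer), _host(url)
        if src and dst and src != dst:
            parent.setdefault(src, dst)
    return parent


def path_to_root(parent, landing_host):
    """Follow edges from a landing site through hop points to the root."""
    path = [landing_host]
    seen = {landing_host}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # guard against referer loops
            break
        path.append(nxt)
        seen.add(nxt)
    return path
```

The last element of the returned path is the candidate distribution site; landing sites that end at the same root belong to the same delivery tree.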
Constructing the Malware Distribution Network
Prevalence of drive-by downloads 1.3% of the overall incoming search queries in Google return at least one malicious result, based on data collected over a period of 10 months. Of the top 1 million URLs appearing in Google search engine results, about 6,000 belong to sites that are verified as malicious (the most popular landing page had a rank of 1,588)
Prevalence of Drive-by Downloads [Figure: Jan to Oct] In the top 1 million: uniformly distributed
Geographic locality of web based malware The above findings provide evidence of poor security practices by administrators (running outdated and/or unpatched versions of web server software). Correlation between distribution site and landing site: the malware distribution networks are highly localized within common geographical boundaries.
Malware Distribution Infrastructure 45% of the detected malware distribution sites used only a single landing site at a time. 70% of the malware distribution sites have IP addresses within 58.* * and 209.* * network ranges.
Impact of browsing habits DMOZ: knowledge base (used to measure the prevalence of malicious websites across different website functional categories, for about 50% of URLs) Random selection of 7.2 million URLs mapped to their corresponding DMOZ categories
Detecting malicious s
Malicious content injection: Drive-by Downloads via Ads The majority of web advertisements are distributed in the form of third-party content to the advertising web site. A web page is only as secure as its weakest component! Insecure ad content poses a risk (even if the web page itself does not contain any exploits, insecure ad content poses a risk to advertising web sites). A frequent pattern: – An advertiser sells advertising space => to another advertising company => who sells the advertising space to yet another company, and so it goes… Somewhere along the chain something can go wrong
Related Work This paper differs from all of these works in that it offers a far more comprehensive analysis of the different aspects of the problem posed by web-based malware, including an examination of its prevalence, the structure of the distribution networks, and the major driving forces.
Conclusion Our study uses a large-scale data collection infrastructure that continuously detects and monitors the behavior of websites that perpetrate drive-by downloads. Our analysis reveals several forms of relations between some distribution sites and networks. We show that merely avoiding the dark corners of the Internet does not limit exposure to malware (even the anti-virus engines are lacking in their ability to protect against drive-by downloads).
Thank you Questions?