1
Search Worms
N. Provos, J. McClain, K. Wang
ACM Workshop on Recurring Malcode (WORM) 2006
Presented by Dhruv Sharma, dhruvs@usc.edu
2
A worm is malicious code that propagates over a network, with or without human assistance
Worm authors are looking for new ways to acquire vulnerable targets
Search worms propagate automatically by copying themselves to target systems
Search worms can severely harm search engines
Search worms send carefully crafted queries to search engines, evading identification mechanisms that assume random scanning
3
Search worms generate search queries, analyze the search results, and infect the identified targets
Return as many unique targets as possible using a list of prepared queries
Search for popular domains to extract email addresses
Prune search results: remove duplicates and ignore URLs that belong to the search engine itself (sketch below)
Exploit identified targets by reformatting URLs to include the exploit and bootstrapping code
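The pruning step can be made concrete with a short sketch. This is a minimal Python illustration; the function name, the search-engine domain list, and the one-result-per-host policy are assumptions for exposition, not details taken from the paper.

from urllib.parse import urlparse

# Domains the worm must skip so it does not target the search engine
# itself (illustrative list, not from the paper).
SEARCH_ENGINE_DOMAINS = {"google.com", "search.yahoo.com", "search.msn.com"}

def prune_results(result_urls):
    """Deduplicate results by host and drop URLs owned by the search engine."""
    seen_hosts = set()
    targets = []
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        if any(host == d or host.endswith("." + d) for d in SEARCH_ENGINE_DOMAINS):
            continue  # URL belongs to the search engine itself
        if host in seen_hosts:
            continue  # duplicate host
        seen_hosts.add(host)
        targets.append(url)
    return targets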
4
MyDoom.O, a type of search worm, requires human intervention to spread
Spreads via email containing an executable file as an attachment
Searches the local hard drive for email addresses
[Figure: number of infected hosts and number of MyDoom.O queries that Google received per second]
Peak scan rate: more than 30,000 queries per second
5
Santy is the first search worm to propagate automatically, without any human intervention
Written in Perl; exploits a bug in the phpBB bulletin board system
After injecting arbitrary code into a Web server running phpBB, it uses Google to search for more targets and connects the infected machine to an IRC botnet
[Graph: time-line of infected IP addresses for three different Santy variants in December 2004]
Each variant manages to infect about four thousand different IP addresses
6
Graphical description of the dependencies between different Santy variants, collected using a honeypot
Shows the dependencies between Santy variants from August 2005 to May 2006
Each node is labelled by the filename downloaded to the infected host; two nodes are connected by an edge if their line difference, computed via diff, is minimal with respect to all other variants (sketch below)
This graph shows that some variants of Santy have been continuously modified for over six months
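A rough sketch of how such a dependency graph could be reconstructed, assuming the line difference is the number of added plus removed lines that diff would report; variant_graph and the data layout are assumptions, not the authors' code.

import difflib

def line_diff_size(a_lines, b_lines):
    """Added plus removed lines between two variants, as diff would count them."""
    sm = difflib.SequenceMatcher(None, a_lines, b_lines)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")

def variant_graph(variants):
    """variants: dict of filename -> list of source lines.
    Connect each variant to the variant it differs from the least."""
    edges = set()
    for name, lines in variants.items():
        dist, nearest = min((line_diff_size(lines, other), o)
                            for o, other in variants.items() if o != name)
        edges.add(tuple(sorted((name, nearest))))
    return edges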
7
The architecture of the worm mitigation system is split into three phases:
Anomaly identification
Signature generation
Index-based filtering
8
The anomaly identification step automatically blocks part of the worm traffic by observing IP addresses
Classifies the IP addresses responsible for abnormal traffic
Maintains a map of frequent words, which is used to compute the compound probability of a query
Flags as abnormal any IP address that sends too many low-probability queries (sketch below)
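A minimal sketch of the scoring idea, assuming the compound probability of a query is the product of its per-word frequencies (computed in log space to avoid underflow); the word frequencies, thresholds, and per-IP budget below are illustrative assumptions.

import math
from collections import defaultdict

# Word frequencies learned from normal query traffic; unseen words get
# a small floor probability (values here are made up for illustration).
word_prob = defaultdict(lambda: 1e-6, {"weather": 0.01, "news": 0.02})

LOW_PROB_THRESHOLD = math.log(1e-8)  # below this, a query counts as low probability
MAX_LOW_PROB_QUERIES = 100           # per-IP budget before flagging (assumption)

low_prob_counts = defaultdict(int)

def query_log_prob(query):
    """Compound probability of a query as a product of word probabilities."""
    return sum(math.log(word_prob[w]) for w in query.lower().split())

def observe(ip, query):
    """Count low-probability queries per IP; returns True once the IP is flagged."""
    if query_log_prob(query) < LOW_PROB_THRESHOLD:
        low_prob_counts[ip] += 1
    return low_prob_counts[ip] > MAX_LOW_PROB_QUERIES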
9
The signature generation step generates signatures based on Polygraph
Extracts tokens from bad queries to create signatures matching the bad traffic
Hierarchical clustering is used to merge signatures until a predefined false-positive threshold is reached
False positives are computed by matching signatures against a good query set
The following signature was generated in an experiment; token extraction ran on a cluster of 85 2.4 GHz Intel Xeon machines:
GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=
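The generated signature is an ordinary regular expression over HTTP request lines, so applying it is straightforward; a small Python check, where the example request lines are made up:

import re

# The signature reported on this slide, compiled as-is.
SIGNATURE = re.compile(r"GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=")

def is_worm_request(request_line):
    """True if an HTTP request line matches the generated signature."""
    return SIGNATURE.search(request_line) is not None

print(is_worm_request("GET /search?q=abc+-modules&num=100&start=0 HTTP/1.0"))  # True
print(is_worm_request("GET /search?q=weather&num=10&start=0 HTTP/1.0"))        # False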
10
Index-based filtering modifies the search index to handle multiple search queries that map to similar result pages
A search worm relies on a search engine to obtain a list of potentially vulnerable targets; if the search engine does not return any vulnerable targets in the search results, the worm fails to spread
While crawling, tag all pages that seem to contain vulnerable information
Query results are not returned if they span pages from many hosts and the majority of those pages are tagged as vulnerable (sketch below)
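A minimal sketch of the filtering decision, assuming a simple majority rule; the host threshold and the Page record are illustrative assumptions, not the engine's actual data structures.

from dataclasses import dataclass

@dataclass
class Page:
    url: str
    host: str
    tagged_vulnerable: bool  # set while crawling

MIN_DISTINCT_HOSTS = 10  # "many hosts" threshold (assumption)

def suppress_results(results):
    """Withhold a result set that spans many hosts, mostly tagged vulnerable."""
    if len({p.host for p in results}) < MIN_DISTINCT_HOSTS:
        return False
    vulnerable = sum(p.tagged_vulnerable for p in results)
    return vulnerable > len(results) / 2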
11
Conclusion
Search worms spread by querying a search engine for new targets to infect, abusing the information collected by search engines
Signature generation combined with anomaly identification is not effective at preventing a worm from spreading
The proposed solution, index-based filtering, is CPU-efficient and query-independent; it classifies web pages as vulnerable if they belong to an exploitable server or contain potential infection targets
12
Pros and Cons
Pros:
Index-based filtering is query-independent
Uses word-based features (tokenization); phishing URLs contain several suggestive word tokens
Cons:
The signature-based approach is a good option only when given good seed queries; it cannot find new attacks for which we have no prior knowledge
Lacks a module that could analyze malicious pages to automatically extract the searches, which in turn could help in finding vulnerable targets