John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy
University of California, Santa Cruz
USENIX Security Symposium, August 2010
A Presentation at Advanced Defense Lab
Outline
  Introduction
  Related Work
  Architecture
  Implementation – Stage 1
  Implementation – Stage 2
  Attack 1: Identifying Vulnerable Web Sites
  Attack 2: Forum Spamming
  Attack 3: Windows Live Messenger Phishing
  Conclusion
Introduction
  SearchAudit is a framework that identifies malicious queries in massive search engine logs to uncover their relationship with potential attacks.
  It uses a small set of malicious queries as seeds and generates regular expressions for detecting new malicious queries.
Introduction
  Two stages:
    Identification: SearchAudit identifies malicious queries.
    Investigation: analyze those queries and the attacks of which they are part.
Introduction
  Enhanced detection capability: 400 queries become 4 million.
  Low false-positive rate: 2%.
  Ability to detect new attacks: forum spamming.
  Facilitation of attack analysis: analyze a series of phishing attacks that lasted for more than one year.
Related Work
  There is a significant amount of automated Web traffic on the Internet.
  Prior research showed that more than 3% of all search traffic may be generated by stealthy search bots.
  What motivates those search bots?
    Search engine competitors
    Studying search quality
    Click fraud for monetary gain
    Spreading infection (MyDoom, Santy)
    Identifying victims
Related Work
  Using regular expression patterns
    Honeycomb
    Polygraph
    Hamsa
    AutoRE (a way to generate regular expressions, from earlier research)
Architecture
  Let attackers be our guides.
  Follow their activities and predict their future attacks.
Architecture
  Platform: Dryad/DryadLINQ
  Query Expansion
    Take a small set of seed queries and expand them.
    Extract the IPs that issued them and search again.
  Regular Expression Generation
    Signature generation (AutoRE)
    Eliminating redundancies
  Eliminating Proxies
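The query expansion step can be sketched in a few lines. This is only an illustration under assumptions: the log is modeled as an in-memory list of hypothetical `(ip, query)` pairs, and the function name `expand_queries` is made up here. The real system runs on Dryad/DryadLINQ over far larger logs.

```python
def expand_queries(log, seeds):
    """One round of query expansion over a list of (ip, query) pairs.

    Step 1: find every IP that issued at least one seed query.
    Step 2: collect every query issued by those IPs.
    """
    seeds = set(seeds)
    suspect_ips = {ip for ip, query in log if query in seeds}
    expanded = {query for ip, query in log if ip in suspect_ips}
    return suspect_ips, expanded


# Toy log: two IPs issue the seed query, one unrelated IP does not.
log = [
    ("1.2.3.4", "seed query"),
    ("1.2.3.4", "another suspicious query"),
    ("5.6.7.8", "seed query"),
    ("9.9.9.9", "weather today"),
]
ips, queries = expand_queries(log, ["seed query"])
```

The expanded set still contains whatever else those IPs searched for, which is why the later steps (regular expression generation and proxy elimination) are needed to filter it.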
Architecture – Eliminating Redundancies
  Algorithm REGEX_CONSOLIDATE
Architecture – Eliminating Proxies
  Most users in a geographical region have similar query patterns.
  Queries from mostly legitimate users will overlap heavily with the popular queries from the same /16 IP prefix.
  We label an IP as a proxy if the K most popular queries from that IP and the K most popular queries from its /16 prefix overlap in at least m queries.
  K = 100, m = 5
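The proxy test above can be sketched as follows. The data layout (a list of `(ip, query)` pairs), the helper names, and the "at least m" reading of the overlap rule are assumptions for illustration. As a simplification, the prefix statistics here include the IP's own queries.

```python
from collections import Counter


def top_k(queries, k):
    """Return the set of the k most frequent queries."""
    return {q for q, _ in Counter(queries).most_common(k)}


def slash16(ip):
    """Return the /16 prefix of a dotted IPv4 address, e.g. '10.1'."""
    first, second = ip.split(".")[:2]
    return f"{first}.{second}"


def find_proxies(log, k=100, m=5):
    """Label an IP as a proxy if its top-k queries share at least m
    entries with the top-k queries of its /16 prefix."""
    by_ip, by_prefix = {}, {}
    for ip, query in log:
        by_ip.setdefault(ip, []).append(query)
        by_prefix.setdefault(slash16(ip), []).append(query)
    prefix_top = {p: top_k(qs, k) for p, qs in by_prefix.items()}
    return {
        ip
        for ip, qs in by_ip.items()
        if len(top_k(qs, k) & prefix_top[slash16(ip)]) >= m
    }


# Toy example with small k and m: one IP mirrors the popular queries
# of its prefix, the other issues only unusual queries.
toy_log = (
    [("10.1.9.9", "weather")] * 5
    + [("10.1.9.9", "news")] * 4
    + [("10.1.9.9", "maps")] * 3
    + [("10.1.7.7", "index.php?id=1"), ("10.1.7.7", "powered by foo")]
)
proxies = find_proxies(toy_log, k=3, m=2)
```

With K = 100 and m = 5 as on the slide, a fairly small overlap is enough to mark an IP as a proxy, which favors removing benign shared addresses over keeping them.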
Data Description and System Setup
  Use 3 months of search logs from the Bing search engine:
    February 2009 (when it was known as Live Search)
    December 2009
    January 2010
  Each month of sampled data contains around 2 billion pageviews.
  The 500 seed malicious queries are obtained from the hacker Web site milw0rm.com.
  It takes about 7 hours to process the 1.2 TB of sampled data.
Selection of RE
  Use cookies to identify the malicious queries.
  Benign proxies are eliminated.
  Use a threshold to pick regular expressions based on their scores.
Detection Results: Effect of Query Expansion and Regular Expression Matching
  Feeding the 500 malicious queries into SearchAudit, we find that 122 of them appear in the February 2009 dataset.
  174 IPs issued these queries.
  Feeding this result into our system again yields 800 unique queries from 264 IPs.
Detection Results
Effect of Incomplete Seeds
  Split the 122 seed queries into two sets:
    100 queries that were first posted on milw0rm.com before 2009
    22 queries that were posted in 2009
Looping Back Seed Queries
  Use the derived REs as new seeds and feed them back as input to SearchAudit.
Overall Matching Statistics
Verification of Malicious Queries
  We lack ground truth about whether a query is malicious, so we verify in two ways:
    Check whether the query is reported on any hacker Web sites.
    Check whether the query's behavior matches individual bot or botnet features.
  For each query q returned by SearchAudit, issue the query “q AND (dork OR vulnerability)” to the search engine and save the results.
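As a small illustration of the first check, the verification query can be built and issued as below. The `search` parameter is a hypothetical callable standing in for a search engine client (it is not a real API), and treating any non-empty result list as "reported" is an assumption made for this sketch.

```python
def verification_query(q):
    """Build the verification query for a query q returned by SearchAudit."""
    return f"{q} AND (dork OR vulnerability)"


def reported_queries(queries, search):
    """Return the queries whose verification search yields any result.

    `search` is a hypothetical callable: it takes a query string and
    returns a list of result URLs.
    """
    return [q for q in queries if search(verification_query(q))]


# Stand-in for a search engine, for demonstration only.
fake_index = {
    "index.php?id= AND (dork OR vulnerability)": ["http://example.org/post"],
}
found = reported_queries(
    ["index.php?id=", "weather today"],
    lambda query: fake_index.get(query, []),
)
```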
Verification of Queries Generated by Individual Bots
  Two features help us distinguish bot queries from human queries:
    Cookie: Most bot queries do not enable cookies, resulting in an empty cookie field. For normal users who do not clear their cookies, all queries carry the old cookies.
    Link clicked: Many bots do not click any link on the result page. Instead, they scrape the results off the page.
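The two features can be computed per group of queries roughly as follows. The record layout (dictionaries with hypothetical `cookie` and `clicks` fields) is an assumption for illustration and not the actual log schema.

```python
def bot_features(records):
    """Return (fraction with empty cookie, fraction with no clicks)
    for a list of query records.

    Each record is assumed to be a dict with a 'cookie' string and a
    'clicks' count; both field names are hypothetical.
    """
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    no_cookie = sum(1 for r in records if not r["cookie"])
    no_click = sum(1 for r in records if r["clicks"] == 0)
    return no_cookie / total, no_click / total


sample = [
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 1},
    {"cookie": "abc123", "clicks": 2},
]
empty_cookie_rate, no_click_rate = bot_features(sample)
```

High values of both fractions for the queries matching a regular expression would be consistent with bot traffic, while human traffic would be expected to show lower values.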
Verification of Queries Generated by Botnets
  If most of the IPs that issued malicious queries exhibit similar behavior, then it is likely that all these IPs were running the same script.
    User agent: contains information about the browser and the version used
    Metadata: records certain metadata that comes with the request
    Pages per query: records the number of search result pages retrieved per query
    Inter-query interval: denotes the time between queries issued by the same IP
Analysis of Detection Results
  Large countries such as the USA, Russia, and China are responsible for almost half of the IPs issuing malicious queries.
  Vulnerable Web sites
    Try to exploit these Web sites by SQL injection
      index.php?content=[^?=#+;&:]{1,10}
    Try to find particular software with known vulnerabilities
      “Powered by”
  Forum spamming
    “/includes/joomla.php” site:.[a-zA-Z]{2,3}
  Windows Live Messenger phishing
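To show how such patterns select queries, here is a minimal sketch using Python's `re` module. Two things are assumptions: the literal `.` and `?` characters are escaped so that Python treats them literally, and straight quotes replace the slide's curly quotes. Whole-query matching via `fullmatch` is also a choice made for this example.

```python
import re

# Patterns adapted from the slide, with '.' and '?' escaped (assumption).
VULN_SEARCH = re.compile(r"index\.php\?content=[^?=#+;&:]{1,10}")
JOOMLA_SEARCH = re.compile(r'"/includes/joomla\.php" site:\.[a-zA-Z]{2,3}')


def matching_queries(pattern, queries):
    """Return the queries that the pattern matches in full."""
    return [q for q in queries if pattern.fullmatch(q)]


queries = [
    "index.php?content=news",
    "index.php?content=",
    '"/includes/joomla.php" site:.com',
    "weather today",
]
vuln_hits = matching_queries(VULN_SEARCH, queries)
joomla_hits = matching_queries(JOOMLA_SEARCH, queries)
```

The second query is rejected because the character class requires between 1 and 10 characters after `content=`.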
Identifying Vulnerable Web Sites
  Applications of Vulnerability Searches
    Sample 5,000 queries returned by SearchAudit.
    For every query q we issue the query “q -dork -vulnerability”.
    Obtain 80,490 URLs from 39,475 unique Web sites.
    Compare this list of random Web sites against lists of known phishing or malware sites:
      PhishTank
      Microsoft
    Test and show that many of these sites indeed have SQL injection vulnerabilities.
SQL Injection Vulnerabilities
  For the malicious queries, we look at the search results and crawl each link twice.
    The first time, we crawl the link as is.
    The second time, we append a single quote (').
  If the two pages are identical, this suggests that there is no obvious SQL injection vulnerability.
  If the second page contains any kind of SQL error, then there might be an SQL injection vulnerability.
  Of 14,500 URLs, we find that 1,760 (12%) may have an SQL injection vulnerability.
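The comparison step of this check can be sketched as a pure function over the two page bodies, leaving the crawling itself aside. The list of error markers is an illustrative assumption, as is the third "inconclusive" outcome for pages that differ without showing an SQL error. The slides do not say which error strings were used.

```python
# Hypothetical substrings that often appear in database error pages.
SQL_ERROR_MARKERS = (
    "sql syntax",
    "mysql_fetch",
    "odbc",
    "unclosed quotation mark",
)


def classify(original_page, quoted_page):
    """Compare the page fetched as is with the page fetched after
    appending a single quote, following the rule on the slide."""
    if original_page == quoted_page:
        return "no obvious vulnerability"
    lowered = quoted_page.lower()
    if any(marker in lowered for marker in SQL_ERROR_MARKERS):
        return "possible SQL injection"
    return "inconclusive"


same = classify("<html>news</html>", "<html>news</html>")
error = classify(
    "<html>news</html>",
    "You have an error in your SQL syntax near ''' at line 1",
)
```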
Forum-Spamming Attacks
  We manually identified 46 REs that are associated with forum spamming.
Applications of Forum-Searching Queries
  Use Project Honey Pot to identify Web spamming.
Windows Live MSN Phishing
  What is MSN phishing?
  / ?user=[a-zA-Z0-9._]*
Characteristics of Compromised Accounts
Conclusion