PhishNet: Predictive Blacklisting to Detect Phishing Attacks
Reporter: Gia-Nan Gao
Advisor: Chin-Laung Lei
2010/4/26


1 PhishNet: Predictive Blacklisting to Detect Phishing Attacks
Reporter: Gia-Nan Gao
Advisor: Chin-Laung Lei
2010/4/26

2 Reference
Pawan Prakash, Manish Kumar, Ramana Rao Kompella, and Minaxi Gupta, “PhishNet: Predictive Blacklisting to Detect Phishing Attacks,” in IEEE INFOCOM 2010.

3 Outline
Introduction
Two Major Components of PhishNet
◦ URL prediction component
◦ Approximate URL matching component
Evaluation
Conclusion

4 Introduction
Phishing attacks
◦ Set up fake web sites that mimic real businesses in order to lure innocent users into revealing sensitive information
Blacklisting
◦ Match a given URL against a list of known malicious URLs (the blacklist)
Problem of blacklisting
◦ A malicious URL cannot be blacklisted until it has reached a certain level of prevalence in the wild

5 Two Major Components of PhishNet
URL prediction component
◦ Generate new URLs (children) from known phishing URLs (parents) by employing various heuristics
◦ Test whether the newly generated URLs are indeed malicious
Approximate URL matching component
◦ Perform an approximate match of a new URL against the existing blacklist

6 Component 1: Heuristics for Generating New URLs
Typical structure of a blacklisted URL
◦ http://domain.TLD/directory/filename?query string
H1: Replacing TLDs
H2: IP address equivalence
H3: Directory structure similarity
H4: Query string substitution
H5: Brand name equivalence

7 Heuristics for Generating New URLs
H1: Replacing TLDs
◦ 3,210 effective top-level domains (TLDs)
◦ Replace the effective TLD of the parent URL with each of the 3,209 other effective TLDs
H2: IP address equivalence
◦ Phishing URLs that have the same IP address are grouped together into clusters
◦ Create new URLs by considering all combinations of hostnames and pathnames within a cluster
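H1 and H2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `replace_tld` and `combine_cluster` are hypothetical, the caller is assumed to already know the parent's effective TLD and to supply the candidate TLD list, and child URLs in H2 are assumed to use the `http` scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_tld(url, current_tld, candidate_tlds):
    """H1 sketch: swap the parent's effective TLD for every other candidate."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    suffix = "." + current_tld
    if not host.endswith(suffix):
        return []
    stem = host[: -len(suffix)]
    children = []
    for tld in candidate_tlds:
        if tld == current_tld:
            continue  # skip the parent's own TLD
        new_host = f"{stem}.{tld}"
        children.append(
            urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))
        )
    return children


def combine_cluster(urls):
    """H2 sketch: urls share one IP; pair every hostname with every path."""
    hosts, paths = [], []
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc not in hosts:
            hosts.append(parts.netloc)
        if parts.path not in paths:
            paths.append(parts.path)
    known = set(urls)
    children = []
    for host in hosts:
        for path in paths:
            child = f"http://{host}{path}"  # assumption: http scheme
            if child not in known:
                children.append(child)
    return children
```

For example, `combine_cluster(["http://a.com/x.htm", "http://b.com/y.htm"])` yields the two cross combinations that are not already in the cluster.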

8 Heuristics for Generating New URLs (cont’d)
H3: Directory structure similarity
◦ URLs with similar directory structure are grouped together
◦ Build new URLs by exchanging the filenames among URLs belonging to the same group
◦ Parent
  ▪ www.abc.com/online/signin/paypal.htm
  ▪ www.xyz.com/online/signin/ebay.htm
◦ Child
  ▪ www.abc.com/online/signin/ebay.htm
  ▪ www.xyz.com/online/signin/paypal.htm
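The filename exchange in H3 can be sketched as below. This is an illustration under assumptions: "similar" directory structure is simplified to an identical directory path, URLs are given without a scheme (as on the slide), and the name `exchange_filenames` is hypothetical.

```python
from collections import defaultdict


def exchange_filenames(urls):
    """H3 sketch: group by directory path, then swap filenames in each group."""
    groups = defaultdict(list)  # directory path -> [(host, filename), ...]
    for url in urls:
        host, _, path = url.partition("/")
        directory, _, filename = path.rpartition("/")
        groups[directory].append((host, filename))

    known = set(urls)
    children = []
    for directory, members in groups.items():
        hosts = [host for host, _ in members]
        filenames = [name for _, name in members]
        for host in hosts:
            for name in filenames:
                # Drop empty pieces so "host/file" does not become "host//file".
                child = "/".join(p for p in (host, directory, name) if p)
                if child not in known and child not in children:
                    children.append(child)
    return children


parents = [
    "www.abc.com/online/signin/paypal.htm",
    "www.xyz.com/online/signin/ebay.htm",
]
children = exchange_filenames(parents)
```

With the two parent URLs from the slide, `children` holds the two child URLs listed there.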

9 Heuristics for Generating New URLs (cont’d)
H4: Query string substitution
◦ Build new URLs by exchanging the query strings among URLs
◦ Parent
  ▪ www.abc.com/online/signin/ebay?XYZ
  ▪ www.xyz.com/online/signin/paypal?ABC
◦ Child
  ▪ www.abc.com/online/signin/ebay?ABC
  ▪ www.xyz.com/online/signin/paypal?XYZ
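A rough sketch of H4 follows. It assumes the input URLs have already been grouped (for instance by directory structure) and simply pairs every URL base with every query string seen in the group; `exchange_query_strings` is a hypothetical name.

```python
def exchange_query_strings(urls):
    """H4 sketch: pair each URL base with each query string in the group."""
    bases, queries = [], []
    for url in urls:
        base, _, query = url.partition("?")
        if base not in bases:
            bases.append(base)
        if query and query not in queries:
            queries.append(query)

    known = set(urls)
    children = []
    for base in bases:
        for query in queries:
            child = f"{base}?{query}"
            if child not in known:
                children.append(child)
    return children


parents = [
    "www.abc.com/online/signin/ebay?XYZ",
    "www.xyz.com/online/signin/paypal?ABC",
]
children = exchange_query_strings(parents)
```

For the slide's two parents this produces the two children shown on the slide.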

10 Heuristics for Generating New URLs (cont’d)
H5: Brand name equivalence
◦ Build new URLs by substituting brand names occurring in phishing URLs with other brand names
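H5 might look like the sketch below. The brand list is a made-up example (the slide does not say which brands are used), matching is assumed to be a case-insensitive substring match, and `substitute_brands` is a hypothetical name.

```python
import re


def substitute_brands(url, brands):
    """H5 sketch: replace each brand found in the URL with every other brand."""
    children = []
    lowered = url.lower()
    for present in brands:
        if present.lower() not in lowered:
            continue
        pattern = re.compile(re.escape(present), flags=re.IGNORECASE)
        for other in brands:
            if other == present:
                continue
            # A function replacement avoids backslash processing in `other`.
            child = pattern.sub(lambda _match, repl=other: repl, url)
            children.append(child)
    return children


# Hypothetical brand list, for illustration only.
example = substitute_brands(
    "www.abc.com/online/signin/paypal.htm", ["paypal", "ebay"]
)
```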

11 Component 1: Verification
Conduct a DNS lookup to filter out sites that cannot be resolved
For each of the resolved URLs
◦ Try to establish a connection to the corresponding server
For each successful connection
◦ Initiate an HTTP GET request to obtain content from the server
If the HTTP header from the server has status code 200/202 (successful request)
◦ Compute the content similarity between the parent and the child URLs
If the child URL’s content closely resembles the parent URL’s content (above, say, 90%)
◦ Conclude that the child URL is a bad site
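The verification steps above can be sketched as a short pipeline. This is only an outline under assumptions: the DNS lookup and HTTP fetch are passed in as callables (`resolve` and `fetch`, both hypothetical) so the logic can be exercised without network access, and `difflib.SequenceMatcher` stands in for the content similarity measure, which the slide does not specify.

```python
from difflib import SequenceMatcher


def content_similarity(parent_content, child_content):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, parent_content, child_content).ratio()


def verify_child(child_url, parent_content, resolve, fetch, threshold=0.90):
    """Return True if the child URL passes every verification step.

    resolve(url) -> bool: whether the hostname resolves in DNS.
    fetch(url) -> (status_code, body), or None if the connection fails.
    """
    if not resolve(child_url):
        return False  # filtered out by the DNS lookup
    response = fetch(child_url)
    if response is None:
        return False  # could not connect to the server
    status, body = response
    if status not in (200, 202):
        return False  # request was not successful
    return content_similarity(parent_content, body) > threshold


# Offline usage with stubbed network functions.
page = "<html>Sign in to your account</html>"
same = verify_child("www.abc.com/x", page, lambda u: True, lambda u: (200, page))
gone = verify_child("www.abc.com/x", page, lambda u: True, lambda u: (404, page))
dead = verify_child("www.abc.com/x", page, lambda u: False, lambda u: (200, page))
```

Injecting the network functions keeps the decision logic separate from I/O; a real deployment would plug in an actual resolver and HTTP client.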

12 Component 2: Approximate Matching
Determine whether a given URL is a phishing site or not

13 M1: Matching IP Address
Perform a direct match of the IP address of the URL with the IP addresses of the blacklist entries
Assign a normalized score based on the number of blacklist entries that map to a given IP address
◦ n_i: the number of blacklisted URLs that share IP address IP_i
◦ min{n_i} (max{n_i}): the minimum (maximum) of the number of phishing URLs hosted by blacklisted entries of IP addresses
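The exact scoring formula is not preserved in this transcript. As an illustration only, the sketch below assumes a plain min-max normalization of n_i between min{n_i} and max{n_i}; the name `ip_scores` and the resulting [0, 1] range are assumptions, not the paper's definition.

```python
from collections import Counter


def ip_scores(blacklist_ips):
    """Map each blacklisted IP to a min-max normalized score in [0, 1].

    blacklist_ips holds one IP address per blacklisted URL, so the count
    of an address is n_i. Min-max scaling is an assumed formula.
    """
    counts = Counter(blacklist_ips)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    if span == 0:
        return {ip: 1.0 for ip in counts}  # all IPs host the same number
    return {ip: (n - lowest) / span for ip, n in counts.items()}


scores = ip_scores(["1.1.1.1"] * 3 + ["2.2.2.2"] + ["3.3.3.3"] * 2)
```

Under plain min-max scaling the least-used blacklisted IP scores 0, so a practical scheme would probably add a baseline for any IP that matches at all.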

14 M2: Matching Hostname
Perform a hostname match with those in the blacklist
Domains of phishing URLs
◦ Specifically registered for hosting phishing sites
◦ Hosted on free/paid-for web-hosting services (WHS)
Identify whether the hostname of an incoming URL belongs to a WHS or not
◦ Matching WHSes
◦ Matching non-WHSes

15 M2: Matching Hostname (cont’d)

16 M3: Matching Directory Structure
Perform a directory structure match with those in the blacklist
Philosophy of this design
◦ H3 (directory structure similarity)
◦ H4 (query string substitution)
n_i: the number of URLs corresponding to a directory structure

17 M4: Matching Brand Names
Check for the existence of brand names in the pathname and query string of URLs
n_i: the number of occurrences of the brand name
Compute a final cumulative score
◦ Assign different weights to different modules

18 Evaluation: Component 1
Collect 6,000 URLs from PhishTank (2009/7/2 ~ 2009/7/25)

19 Evaluation: Component 2
Measure how many benign sites are flagged as malicious (false positives) and how many malicious sites are not flagged (false negatives)
Data source
◦ Phishing URLs
  ▪ PhishTank (about 18,000 URLs)
  ▪ SpamScatter (14,000 URLs)
◦ Benign URLs
  ▪ DMOZ (100,000 benign URLs)
  ▪ 20,000 benign URLs from Yahoo Random URL generator (YRUG)
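The two error measures can be computed as in the small sketch below. The flag lists are invented toy data, not results from the paper, and `error_rates` is a hypothetical helper.

```python
def error_rates(benign_flags, phishing_flags):
    """Return (false_positive_rate, false_negative_rate).

    Each list holds one boolean per test URL: True if the matcher
    flagged that URL as malicious.
    """
    false_positives = sum(1 for flagged in benign_flags if flagged)
    false_negatives = sum(1 for flagged in phishing_flags if not flagged)
    return (
        false_positives / len(benign_flags),
        false_negatives / len(phishing_flags),
    )


# Toy data for illustration only.
fp_rate, fn_rate = error_rates(
    benign_flags=[False, False, True, False],
    phishing_flags=[True, True, False, True],
)
```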

20 Evaluation: Component 2 (cont’d)
Training phase
◦ Create various data structures using the phishing URLs
Testing phase
◦ An input URL is flagged as a phishing or a benign site
Weight of individual modules
◦ W(M1, M2, M3, M4) = (1.0, 1.0, 1.5, 1.5)
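One way to read the weights above is as coefficients of a weighted sum over the four module scores. The sketch below assumes exactly that; the combination rule and the decision threshold are assumptions for illustration, since the slide gives only the weights.

```python
# Weights from the slide: W(M1, M2, M3, M4) = (1.0, 1.0, 1.5, 1.5).
WEIGHTS = {"M1": 1.0, "M2": 1.0, "M3": 1.5, "M4": 1.5}


def cumulative_score(module_scores, weights=WEIGHTS):
    """Weighted sum of module scores; a missing module counts as 0."""
    return sum(w * module_scores.get(name, 0.0) for name, w in weights.items())


def is_phishing(module_scores, threshold):
    """Flag the URL when the cumulative score reaches a chosen threshold.

    The threshold is a hypothetical tuning parameter.
    """
    return cumulative_score(module_scores) >= threshold


total = cumulative_score({"M1": 1.0, "M2": 0.0, "M3": 0.5, "M4": 0.0})
```

Here the IP match contributes 1.0 and the directory match contributes 1.5 * 0.5, so `total` is 1.75.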

21 Evaluation: Component 2 (cont’d)

22 Conclusion
Address major problems associated with blacklists
Two major components of PhishNet
◦ URL prediction component
◦ Approximate URL matching component
Flag new URLs effectively

