PhishNet: Predictive Blacklisting to Detect Phishing Attacks. Pawan Prakash, Manish Kumar, Ramana Rao Kompella, Minaxi Gupta. Purdue University, Indiana University.


1 PhishNet: Predictive Blacklisting to Detect Phishing Attacks. Pawan Prakash, Manish Kumar, Ramana Rao Kompella, Minaxi Gupta. Purdue University, Indiana University. INFOCOM (March 2010). 2010/8/03

2 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

3 Phishing Attacks. The simplicity and ubiquity of the Web attract many miscreants – they lure innocent users into revealing sensitive information. Above all, one such attack that is common today and increasing by the day is phishing – http://www.antifishing.org/events/events.html – APWG’s anti-phishing work

4 Popular Solution. Add additional features within the Internet browser, often provided by a mechanism known as blacklisting – simple to design and easy to implement. Major problem: incompleteness – cyber-criminals are extremely savvy and can easily evade blacklists

5 Observation. Malicious URLs often occur in groups that are close to each other, either syntactically or semantically – www1.rogue.com, www2.rogue.com – two URLs whose hostnames resolve to the same IP address

6 Implication. First, discover new sources of maliciousness in and around the original blacklist entries and add them to the blacklist. Second, relax the exact-match implementation of a blacklist to an approximate match that is aware of several of the legal mutations that often exist within these URLs

7 PhishNet. Comprises two major components: A. a URL prediction component; B. an approximate URL matching component

8 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

9 Predicting Malicious URLs. Predict new URLs from existing blacklist entries – e.g., PhishTank (http://www.phishtank.com/index.php). Use five heuristics for generating new URLs. Basic idea – combine pieces of known phishing URLs (parents) from a blacklist to generate new URLs (children). Then test the existence of these child URLs using a verification process

10 Heuristics. H1, Replacing the TLD (top-level domain) – find variants of the original blacklist entries obtained by changing their TLDs – use 3,210 effective TLDs (e.g., co.in). H2, IP address equivalence – URLs that have the same IP address are grouped together into clusters – create new URLs by considering all combinations of hostnames and pathnames
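
A minimal sketch of H1 and H2 as described on this slide, not the authors' implementation. The function names, the `.example` hostnames, and the choice to always emit `http://` children in H2 are assumptions made for illustration; the caller is assumed to already know the parent's effective TLD.

```python
from itertools import product
from urllib.parse import urlsplit


def replace_tld(url, current_tld, candidate_tlds):
    """H1 sketch: swap the hostname's effective TLD for each candidate TLD."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    suffix = "." + current_tld
    if not host.endswith(suffix):
        return []
    stem = host[: -len(suffix)]
    children = []
    for tld in candidate_tlds:
        if tld == current_tld:
            continue  # that would just reproduce the parent URL
        # Assumption: ports and credentials in the netloc are not preserved.
        children.append(parts._replace(netloc=stem + "." + tld).geturl())
    return children


def ip_cluster_children(cluster_urls):
    """H2 sketch: cross every hostname with every path seen in one IP cluster."""
    hosts, paths = [], []
    for url in cluster_urls:
        parts = urlsplit(url)
        if parts.netloc not in hosts:
            hosts.append(parts.netloc)
        if parts.path not in paths:
            paths.append(parts.path)
    known = set(cluster_urls)
    children = []
    for host, path in product(hosts, paths):
        child = "http://" + host + path  # assumption: plain http scheme
        if child not in known:
            children.append(child)
    return children
```

For example, `replace_tld("http://rogue.com/login.php", "com", ["com", "net", "co.in"])` yields the `.net` and `.co.in` variants of the parent, which would then go through verification.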

11 Heuristics (cont.). H3, Directory structure similarity – URLs with similar directory structures are grouped together – build new URLs by exchanging the filenames among URLs belonging to the same group. H4, Query string substitution – build new URLs by exchanging the query strings among URLs
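
H3 and H4 can be sketched along the same lines. This is an illustrative sketch under stated assumptions: "similar" directory structure is simplified to "identical directory path", and the function names and `.example` hosts are hypothetical.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def directory_children(urls):
    """H3 sketch: swap filenames among URLs that share a directory path."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        directory, filename = posixpath.split(parts.path)
        groups[directory].append((parts.scheme, parts.netloc, filename))
    known = set(urls)
    children = []
    for directory, members in groups.items():
        filenames = sorted({name for _, _, name in members if name})
        for scheme, netloc, _ in members:
            for name in filenames:
                child = f"{scheme}://{netloc}{directory.rstrip('/')}/{name}"
                if child not in known and child not in children:
                    children.append(child)
    return children


def query_children(urls):
    """H4 sketch: attach every observed query string to every base URL."""
    bases, queries = [], []
    for url in urls:
        parts = urlsplit(url)
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if base not in bases:
            bases.append(base)
        if parts.query and parts.query not in queries:
            queries.append(parts.query)
    known = set(urls)
    return [f"{b}?{q}" for b in bases for q in queries if f"{b}?{q}" not in known]
```

Both functions only propose candidates; none of the children is known to exist until it passes verification.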

12 Heuristics (cont.). H5, Brand name equivalence – phishers often target multiple brand names using the same URL structure – build new URLs by substituting brand names occurring in phishing URLs with other brand names
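
A small sketch of H5, assuming brand names are detected by plain, case-sensitive substring search. The brand list passed in and the function name are hypothetical; the slides do not say how brand names are located in a URL.

```python
def brand_children(url, brands):
    """H5 sketch: replace each brand found in the URL with every other brand."""
    children = []
    for present in brands:
        if present not in url:
            continue
        for other in brands:
            if other == present:
                continue
            child = url.replace(present, other)
            if child != url and child not in children:
                children.append(child)
    return children
```

With brands `["paypal", "ebay"]`, the parent `http://a.example/paypal/login.php` produces the single child `http://a.example/ebay/login.php`.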

13 Verification. Eliminate URLs that are either non-existent or non-phishing sites – conduct a DNS lookup – establish a connection to the corresponding server – initiate an HTTP GET request to obtain content from the server – if the request is successful, use a publicly available detection tool
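
The verification steps above can be sketched as a filter over the child URLs. This is an assumption-laden illustration: `resolve_host`, `fetch_page`, and `verify` are hypothetical names, `urlopen` performs the connection and the GET in one call rather than as separate steps, and the detection tool is passed in as a callback because the slide does not name it.

```python
import http.client
import socket
from urllib.parse import urlsplit
from urllib.request import urlopen


def resolve_host(host):
    """DNS lookup; returns an IP string, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


def fetch_page(url, timeout=5.0):
    """Connect and issue an HTTP GET; returns the body, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            if response.status == 200:
                return response.read()
    except (OSError, ValueError, http.client.HTTPException):
        return None
    return None


def verify(child_urls, looks_like_phishing, resolver=resolve_host, fetcher=fetch_page):
    """Keep only children that resolve, can be fetched, and are flagged."""
    confirmed = []
    for url in child_urls:
        host = urlsplit(url).hostname
        if not host or resolver(host) is None:
            continue  # non-existent host
        body = fetcher(url)
        if body is None:
            continue  # server unreachable or request failed
        if looks_like_phishing(url, body):
            confirmed.append(url)
    return confirmed
```

Passing the resolver and fetcher as parameters keeps the filter testable without network access; in a real run the defaults would be used.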

14 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

15 Approximate Matching. Determine whether a given URL is a phishing site or not. Perform an approximate match of a given URL against the entries in the blacklist by first breaking the input URL into four different entities – IP address – hostname – directory structure – brand name
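
As a rough illustration of the decomposition step, the sketch below splits a URL into the four entities named on this slide. The function name, the dictionary layout, the injected `resolver` callback, and substring-based brand detection are all assumptions, not details from the talk.

```python
import posixpath
from urllib.parse import urlsplit


def decompose(url, brands, resolver):
    """Split a URL into IP address, hostname, directory structure, brand names."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "ip": resolver(host),  # None if the hostname does not resolve
        "hostname": host,
        "directory": posixpath.dirname(parts.path),
        "brands": [b for b in brands if b in lowered],
    }
```

Each entity would then be handed to its own matcher (M1 to M4 on the following slides).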

16 Approximate Matching (cont.)

17 Approximate Matching (cont.). M1: Matching IP address – direct match – assign a normalized score based on the number of blacklist entries that map to a given IP address – IP address IP_i is common to n_i URLs – the score is computed by equation (1) [equation image not included in the transcript]
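
Equation (1) is not present in this transcript, so the following is only a sketch that assumes a min-max normalization of the counts n_i over all blacklisted IP addresses. The function name and the handling of the all-equal case are illustrative choices.

```python
from collections import Counter


def normalized_scores(keys):
    """Map each key (e.g. an IP address) to a score in [0, 1] from its count."""
    counts = Counter(keys)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    if highest == lowest:
        # Assumption: when every key is equally common, give all of them 1.0.
        return {key: 1.0 for key in counts}
    return {key: (n - lowest) / (highest - lowest) for key, n in counts.items()}
```

The same helper could be reused for the other matchers, since the later slides compute their confidence scores with (1) over different counts n_i.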

18 Approximate Matching (cont.). M2: Matching hostname – classify hostnames as WHS (Web hosting service) or non-WHS

19 Approximate Matching (cont.). M2: Matching hostname – A. Matching WHSes: if the match succeeds, the confidence score is computed using (1), based on the number of URLs that have the same primary domain – B. Matching non-WHSes: based on syntactic similarity across labels; if the match succeeds, the confidence score is computed using (1), with n_i referring to the number of URLs that match a given regular expression
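
The slide does not say how the regular expressions for non-WHS hostnames are derived. As one possible reading, the sketch below generalizes every run of digits in a blacklisted hostname, so that www1.rogue.com and www2.rogue.com share a pattern, and returns n_i as the number of blacklisted hostnames behind the matching pattern. All names here are hypothetical.

```python
import re
from collections import Counter


def hostname_regex(hostname):
    """Build a pattern string in which each digit run becomes a wildcard."""
    pieces = re.split(r"(\d+)", hostname)
    # Odd positions hold the captured digit runs; even positions are literals.
    return "".join(r"\d+" if i % 2 else re.escape(p) for i, p in enumerate(pieces))


def build_patterns(blacklist_hosts):
    """Count how many blacklisted hostnames collapse onto each pattern."""
    return Counter(hostname_regex(host) for host in blacklist_hosts)


def match_hostname(host, patterns):
    """Return n_i for the first pattern the host matches, or 0 if none does."""
    for pattern, count in patterns.items():
        if re.fullmatch(pattern, host):
            return count
    return 0
```

The returned count would then be normalized with (1) to obtain the confidence score.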

20 Approximate Matching (cont.). M3: Matching directory structure – if the match succeeds, the confidence score is computed using (1), with n_i representing the number of URLs corresponding to a directory structure in the hash map. M4: Matching brand names – if the match succeeds, the confidence score is computed using (1), with n_i being the number of occurrences of the brand name

21 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

22 Predicting Malicious URLs. Collected URLs over a period of 24 days, from 2 July 2009 to 25 July 2009. Generated almost 1.55 million child URLs from approximately 6,000 parent URLs. Of the 1.55 million, 34,489 (about 2%) could be fetched and were compared with their parent URLs using a page similarity tool – http://www.webconfs.com – children with greater than 90% similarity are reported as new malicious URLs

23 Approximate Matching. Evaluate effectiveness and URL processing time. Use data from four sources. The experimental setup consists of two phases: training and testing. For evaluation, use the following weights for the different normalized scores: – W(M1) = 1.0 – W(M2) = 1.0 – W(M3) = 1.5 – W(M4) = 1.5
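
One plausible way to use these weights is a weighted sum of the four normalized scores compared against a threshold. The slide lists only the weights, so the combination rule, the function names, and the `threshold` parameter are assumptions in this sketch.

```python
# Weights taken from the slide above.
WEIGHTS = {"M1": 1.0, "M2": 1.0, "M3": 1.5, "M4": 1.5}


def combined_score(scores, weights=WEIGHTS):
    """Weighted sum of normalized scores; a missing matcher contributes 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


def is_flagged(scores, threshold):
    """Assumption: a URL is flagged when its combined score reaches the threshold."""
    return combined_score(scores) >= threshold
```

For instance, a URL with an M1 score of 1.0 and an M3 score of 0.5 gets a combined score of 1.0 + 1.5 × 0.5 = 1.75.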

24 Approximate Matching

25 Approximate Matching. [Figure annotations: 14% of URLs Gen; 80% of 14% of URLs Gen (11.2% of URLs Gen); 2.2% of 14% of URLs Gen (2% of URLs Gen)]

26 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

27 Related Work. APWG regularly publishes facts and figures about phishing, such as lists of targeted TLDs and brand names and trends in phishing URL structure. Highly predictive blacklisting – relies on tens of thousands of features based on extra information from outside sources such as WHOIS and registrar information

28 Outline: Introduction, Component 1, Component 2, Evaluation, Related Work, Conclusion

29 Conclusion. Blacklisting is the most common technique to defend against phishing attacks. PhishNet incurs few false positives and is remarkably effective at flagging new URLs that were not part of the original blacklist

30 THANK YOU

