1 CANTINA : A Content-Based Approach to Detecting Phishing Web Sites WWW 2007 2008.09.09 Yue Zhang, Jason Hong, and Lorrie Cranor.

2 CS710 | KAIST Agenda
- Phishing Attacks
- Motivation & Goal
- Related Work
- CANTINA
- Evaluation
- Conclusion

3 Phishing Attacks (1/2)
- The act of stealing personal information via the Internet for the purpose of committing financial fraud
  - Create a fake site similar to an original site, such as a bank's
  - Deliver it to users by various methods: spam e-mail, XSS vulnerabilities, malware, …
- Technical issues
  - URL obfuscation: similar domain names, encoded URLs, …
  - DNS hijacking: modifying the hosts file, DNS server settings, …
  - Malware: BHOs (Browser Helper Objects), browser toolbars, keyloggers, …
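
URL obfuscation works because the text a user reads is not always the host the browser contacts. The sketch below is a minimal illustration, not part of CANTINA: `real_host` is a hypothetical helper built on Python's standard `urllib.parse` that returns the host named by a URL, skipping any user info before `@` and decoding percent-escapes in the host.

```python
from urllib.parse import unquote, urlparse


def real_host(url):
    """Approximate the host a browser would contact for `url` (hypothetical helper).

    urlparse treats everything before the last '@' in the authority as
    user info, so "www.bank.com@evil.example" actually names evil.example.
    Percent-escapes in the host (e.g. "%65" for "e") are decoded afterwards.
    """
    host = urlparse(url).hostname or ""
    return unquote(host).lower()


# Both URLs name the same host, although neither shows it plainly.
print(real_host("http://www.bank.com@evil.example/login"))  # evil.example
print(real_host("http://%65vil.example/"))                   # evil.example
```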

4 Phishing Attacks (2/2)
- Criminals often create phishing sites by copying and then modifying a legitimate site's web pages
  - The copy looks similar to the original web site
  - It often contains brand names and other terms that are common on the given web page, such as the owner's brands

5 Motivation & Goal
- Phishing is a rapidly growing problem, with 9,255 unique phishing sites reported in 2006
- 84 anti-phishing toolbars
  - Low accuracy
- There is a strong need for better automated detection algorithms
- Goal: a novel content-based approach for detecting phishing web sites
  - Achieve higher accuracy than existing approaches

6 Related Work (1/3)
- Anti-phishing work falls into four categories
  - Why people fall for phishing attacks: studies examining the reasons people fall for phishing attacks
  - Educating people about phishing attacks: focused on online training materials, testing, and situated learning
  - Anti-phishing user interfaces: focused on developing better user interfaces for anti-phishing tools
  - Automated detection of phishing

7 Related Work (2/3)
- Anti-phishing user interfaces
  - Toolbar-based approaches
  - Browser extensions: Dynamic Security Skins, Web Wallet

8 Related Work (3/3)
- Automated detection of phishing
  - Use heuristics to judge whether a page has phishing characteristics: host name, domain name, URLs, …
  - Use a blacklist of reported phishing URLs

9 CANTINA | Basic Concept
- Criminals often create phishing sites by copying and then modifying a legitimate site's web pages
  - The copies contain the brand names and terms of the legitimate pages
- Robust Hyperlinks
  - Goal: find the pages behind broken links
  - Add a lexical signature to URLs; if a link no longer works, feed the signature to a search engine
  - Ex. http://aaa.com/a.html?lexical-signature="word1+word2+...+word5"
- TF-IDF (term frequency/inverse document frequency)
  - A frequency-based algorithm
  - A basic algorithm used by search engines to compare and classify documents
  - A term has a high TF-IDF weight when it has a high term frequency in the given document and a low document frequency across the whole collection
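
To make the TF-IDF definition concrete, here is a minimal Python sketch that scores the terms of one tokenized page and returns its lexical signature. It assumes the common weighting tf = (occurrences of the term in the page) / (terms in the page) and idf = log(N / df); the corpus is a toy stand-in for a real document collection, which the slides do not specify.

```python
import math
from collections import Counter


def tf_idf(doc_terms, corpus):
    """Return {term: TF-IDF weight} for one tokenized page.

    doc_terms: tokens of the page being scored.
    corpus: list of token lists used to estimate document frequency
    (a toy stand-in; it must include the page so that every df >= 1).
    """
    doc_sets = [set(d) for d in corpus]
    counts = Counter(doc_terms)
    total = len(doc_terms)
    weights = {}
    for term, n in counts.items():
        tf = n / total                                   # term frequency in this page
        df = sum(1 for d in doc_sets if term in d)       # documents containing the term
        weights[term] = tf * math.log(len(corpus) / df)  # idf = log(N / df)
    return weights


def lexical_signature(doc_terms, corpus, k=5):
    """The k highest-weighted terms; CANTINA uses the top five."""
    w = tf_idf(doc_terms, corpus)
    # Sort by descending weight, breaking ties alphabetically for stability.
    return sorted(w, key=lambda t: (-w[t], t))[:k]
```

For example, with `page = ["ebay", "ebay", "sign", "user", "the"]` and a corpus made of that page plus `["the", "news"]` and `["the", "sign"]`, the two-term signature is `["ebay", "user"]`: "the" occurs in every document, so its idf, and therefore its weight, is zero.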

10 CANTINA | Basic Concept
- Take a web page and calculate the TF-IDF weight of each term
- Take the five terms with the highest TF-IDF weights (the lexical signature)
- Search Google for the top five terms (term1+term2+...)
- Compare the domain name of the current page with the domain names of the top N search results (N = 30)
- Phishing site: the domain name of the current page does not match any of the top N results
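
The decision step can be sketched in Python as follows. This is an illustration under stated assumptions, not CANTINA's implementation: `search` is a hypothetical callable standing in for Google that returns result URLs, `base_domain` naively keeps the last two labels of the host name, and a page with no results is left unflagged because the slides handle that case separately with ZMP.

```python
from urllib.parse import urlparse


def base_domain(url):
    # Naive assumption: the last two host labels form the domain
    # (signin.ebay.com -> ebay.com); this is wrong for suffixes like .co.uk.
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_phishing(page_url, signature, search, n=30):
    """Basic CANTINA check: True means the page is classified as phishing.

    signature: the page's top five TF-IDF terms.
    search: hypothetical callable(query) -> list of result URLs.
    """
    results = search(" ".join(signature))[:n]
    if not results:
        return False  # assumption: no decision without results
    # Phishing if the page's domain is not among the top-n result domains.
    return base_domain(page_url) not in {base_domain(r) for r in results}


def fake_search(query):
    # Stand-in for a search engine; ignores the query and returns fixed URLs.
    return ["https://www.ebay.com/", "https://pages.ebay.com/help",
            "https://en.wikipedia.org/wiki/EBay"]


signature = ["ebay", "user", "sign", "help", "forgot"]
print(is_phishing("http://signin.ebay.com/ws/", signature, fake_search))      # False
print(is_phishing("http://ebay-signin.example.net/", signature, fake_search))  # True
```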

11 CANTINA | Basic Concept
- Faked page, TF-IDF top 5: eBay, user, sign, help, forgot

12 CANTINA | Basic Concept
- Real page, TF-IDF top 5: eBay, user, sign, help, forgot

13 CANTINA | Basic Concept

14 CANTINA | Additional Solutions
- Basic CANTINA produces a number of false positives
- Solutions
  - Add the current domain name to the lexical signature
  - ZMP (Zero results Means Phishing): if Google returns zero search results, classify the page as phishing (e.g., meaningless domains such as "u-s-j.be")
  - A larger set of heuristics based on related work, drawn from existing approaches (e.g., SpoofGuard, PILFER): age of domain, known images, suspicious URL, …
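
The first two fixes can be folded into the basic check. The Python sketch below is an assumption-laden illustration: `search` is a hypothetical stand-in for Google, `base_domain` naively keeps the last two host labels, and the page's domain is simply appended to the query as one more term.

```python
from urllib.parse import urlparse


def base_domain(url):
    # Naive assumption: the last two host labels form the domain (wrong for .co.uk etc.).
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def cantina_domain_zmp(page_url, signature, search, n=30):
    """TF-IDF + domain + ZMP: True means the page is classified as phishing.

    search: hypothetical callable(query) -> list of result URLs (no real API).
    """
    domain = base_domain(page_url)
    query = " ".join(list(signature) + [domain])  # domain added to the signature
    results = search(query)[:n]
    if not results:
        return True  # ZMP: zero results means phishing
    return domain not in {base_domain(r) for r in results}
```

Because the query now includes the page's own domain, a legitimate site is likely to appear in its own results, which is how this change targets false positives; a meaningless domain such as "u-s-j.be" tends to return nothing, which ZMP then flags.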

15 Evaluation | Effectiveness #1 (1/2)
- Four conditions
  - Basic TF-IDF
  - Basic TF-IDF + domain name
  - Basic TF-IDF + ZMP
  - Basic TF-IDF + domain + ZMP
- 100 phishing URLs and 100 legitimate URLs
  - Phishing URLs: PhishTank.com
  - Legitimate URLs: from a previous study

16 Evaluation | Effectiveness #1 (2/2)
- Basic TF-IDF + ZMP + domain: false positives are still somewhat high
- This configuration is referred to as Final-TF-IDF

17 Evaluation | Effectiveness #2 (1/2)
- Goal: reduce false positives
- Approach: combine several heuristics

18 Evaluation | Effectiveness #2 (2/2)
- Determining the best weights for these heuristics is a typical classification problem
- Use a simple forward linear model
- Used 100 phishing URLs and 100 legitimate URLs to find the weights
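
A forward linear model of this kind can be sketched as a weighted vote. The Python below is illustrative only: it assumes each heuristic reports +1 (phishing-like) or -1 (legitimate-looking) and that the weighted sum is compared against a threshold of 0; the heuristic names and weights are hypothetical, not the weights learned from the 100 + 100 URLs.

```python
def linear_score(heuristics, weights):
    """Weighted sum of heuristic outputs.

    heuristics: dict name -> +1 (looks like phishing) or -1 (looks legitimate).
    weights: dict name -> weight for that heuristic.
    """
    return sum(weights[name] * value for name, value in heuristics.items())


def classify(heuristics, weights, threshold=0.0):
    # Step function on the weighted sum: above the threshold means phishing.
    return linear_score(heuristics, weights) > threshold


# Hypothetical weights, not the values from the paper.
weights = {"tf_idf": 0.5, "domain_age": 0.25, "suspicious_url": 0.25}

print(classify({"tf_idf": 1, "domain_age": 1, "suspicious_url": -1}, weights))   # True
print(classify({"tf_idf": 1, "domain_age": -1, "suspicious_url": -1}, weights))  # False
```

In the second case the TF-IDF vote alone is not enough to outweigh the other two heuristics, which illustrates how combining heuristics can lower false positives at some cost in true positives.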

19 Evaluation | Effectiveness #3 (1/2)
- Evaluate the effectiveness of Final-TF-IDF, Final-TF-IDF + heuristics, SpoofGuard, and Netcraft
  - SpoofGuard: the highest true positive rate; relies entirely on heuristics
  - Netcraft: one of the best toolbars overall; uses a combination of heuristics and an extensive blacklist
- 100 phishing URLs from PhishTank.com
- 100 legitimate URLs
  - 35 sites often attacked (e.g., Citibank, PayPal)
  - 35 top pages from Alexa (the most popular sites)
  - 30 random web pages from random.yahoo.com

20 Evaluation | Effectiveness #3 (2/2)
- Combining Final-TF-IDF with simple heuristics reduced false positives from 6% to 1%
- However, the true positive rate also decreased (from 97% to 89%)

21 Discussion
- Limitations
  - Does not apply to non-English web sites
  - System performance depends on the performance of the Google search engine
- Attacks by criminals
  - Use images instead of words
  - Add invisible text
  - Circumvent TF-IDF and PageRank using "Google bombs"
  - Attempt a DoS attack on Google

22 Conclusion
- CANTINA combines TF-IDF, search engines, and heuristics to detect phishing web sites
- 97% true positives with 6% false positives
- 89% true positives with 1% false positives
- Shifts the problem of identifying phishing sites to a search engine problem
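
As a quick check on what these percentages mean, the Python below computes true and false positive rates from confusion-matrix counts. The counts are not reported on the slides; they are simply the numbers implied by applying the stated rates to a test set of 100 phishing and 100 legitimate URLs, the size used in the evaluation.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # share of phishing pages that were flagged
    fpr = fp / (fp + tn)  # share of legitimate pages wrongly flagged
    return tpr, fpr


# Counts implied by the stated rates on 100 phishing + 100 legitimate URLs.
print(rates(tp=97, fn=3, fp=6, tn=94))   # (0.97, 0.06)
print(rates(tp=89, fn=11, fp=1, tn=99))  # (0.89, 0.01)
```

On such a set, moving from the first operating point to the second trades 8 additional missed phishing sites for 5 fewer false alarms.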

23 Q&A

