Anti-Phishing Approaches
Lifeng Hu

What is Phishing?
- A social engineering attack
- An attempt to trick individuals into revealing personal credentials (user name, password, credit card information, etc.)
- Based on faked emails and websites
- A threat to internet users
- Damages:
  - 73 million US adults received more than 50 phishing emails a year
  - $2.8 billion lost a year

Phishing Methods
- Set up websites whose interface or URL closely resembles that of a well-known website
- Set up fraudulent websites to collect users' personal information
- Set up a transparent website that sits between the original website and its users
- Send emails containing malicious URLs
- Send emails containing embedded malicious Flash or picture files to evade anti-phishing text checking

False Positive/Negative Rates of Anti-Phishing Approaches
- False negative rate: the fraction of all phishing websites that are classified as legitimate
- False positive rate: the fraction of all legitimate websites that are classified as phishing
- The lower both rates are, the better the anti-phishing approach

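The two rates defined above can be computed from a labeled test set. The following is a minimal sketch; the function name `error_rates` and the sample data are illustrative assumptions, not part of the original slides.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels and predictions are equal-length sequences of booleans,
    where True means "phishing" and False means "legitimate".
    """
    pairs = list(zip(labels, predictions))
    # Legitimate sites wrongly flagged as phishing.
    false_pos = sum(1 for y, p in pairs if not y and p)
    # Phishing sites wrongly accepted as legitimate.
    false_neg = sum(1 for y, p in pairs if y and not p)
    legit_total = sum(1 for y in labels if not y)
    phish_total = sum(1 for y in labels if y)
    fpr = false_pos / legit_total if legit_total else 0.0
    fnr = false_neg / phish_total if phish_total else 0.0
    return fpr, fnr


# Hypothetical test set: 4 legitimate sites followed by 4 phishing sites.
labels = [False, False, False, False, True, True, True, True]
predictions = [False, False, False, True, True, True, False, False]
print(error_rates(labels, predictions))  # (0.25, 0.5)
```

Here one of four legitimate sites is flagged (false positive rate 0.25) and two of four phishing sites are missed (false negative rate 0.5).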
Anti-Phishing Approaches for Specific Websites
- Typically designed by the company that operates the website
- Example: the SiteKey mechanism of Bank of America online
- Pro:
  - False negative rate is low
  - False positive rate can be zero
- Con:
  - Not applicable to phishing emails

Anti-Phishing Approaches Based on Database
- Anti-phishing firewall: Kaspersky
- Anti-phishing toolbar: Netcraft
- Both rely on an online database
- The toolbar can provide statistics about a URL in advance
- Pro:
  - Applicable to both websites and emails
  - False negative rate can be low
  - False positive rate is low
- Con:
  - Needs frequent updates
  - Relatively hard to implement
  - False negative rate increases if the database is not up to date

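The core of a database-based check is a lookup of the URL's host against a list of known phishing hosts. The sketch below only illustrates that idea and is not how Kaspersky or Netcraft implement it; the `BLACKLIST` entries and function names are hypothetical, and a real product would synchronize the list from its online database.

```python
from urllib.parse import urlsplit

# Hypothetical local snapshot of a phishing-host database.
BLACKLIST = {"paypa1-login.example", "secure-bank.example"}


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return urlsplit(url).hostname or ""


def is_blacklisted(url, blacklist=BLACKLIST):
    """True if the URL's host, or any parent domain of it, is in the blacklist."""
    parts = host_of(url).split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


print(is_blacklisted("http://www.paypa1-login.example/signin"))
print(is_blacklisted("https://www.example.org/"))
```

A host that is missing from the snapshot passes the check, which is why the false negative rate rises when the database is stale.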
Anti-Phishing Approaches Based on Content
- PILFER: machine-learning-based phishing email detection that combines 10 filters, including:
  - IP-based URL: /paypal.cgi?fix=account
  - Domain age from whois.net
  - Non-matching URL: paypal.com
  - HTML hidden URLs
  - Malicious JavaScript
  - …
- Pro:
  - In practice, false positive and false negative rates are relatively low
  - Machine learning methods make it possible to improve accuracy
  - No constant updates are needed
- Con:
  - Training data and filters still need updating to adapt to new styles of phishing emails
  - Network cost is a problem

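Two of the filters listed above, the IP-based URL and the non-matching URL, can be sketched with the Python standard library. This is an illustrative approximation rather than PILFER's actual implementation; the function names and the sample address `192.0.2.7` (a documentation-only address) are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def has_ip_host(url):
    """True if the URL's host is a literal IP address rather than a domain name."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def is_nonmatching_link(link_text, href):
    """True if the visible link text names a host different from the real target."""
    text = link_text.strip()
    # Only compare when the visible text itself looks like a URL or host name.
    if " " in text or "." not in text:
        return False
    shown = urlsplit(text if "//" in text else "//" + text).hostname
    actual = urlsplit(href).hostname
    if not shown or not actual:
        return False
    return shown != actual


# Hypothetical link: the text shows paypal.com but the target is an IP address.
href = "http://192.0.2.7/paypal.cgi?fix=account"
print(has_ip_host(href), is_nonmatching_link("paypal.com", href))
```

In a PILFER-style system each such check would become one feature of the vector passed to the trained classifier, rather than a verdict on its own.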
Anti-Phishing Approaches Based on Content (cont.)
- CANTINA: phishing website detection based on TF-IDF weights
  - TF: the number of times a given term appears in a specific document
  - IDF: a measure of the general importance of the term across all documents
  - TF-IDF = TF × IDF; a high weight marks a term that is frequent in the given document but rare across the other documents
  - Search for the five terms with the highest TF-IDF weight on the current web page in a search engine such as Google
  - The current web page should appear in the top N (30) search results to be considered legitimate
- CANTINA also uses filters similar to those of PILFER to reduce false positives
- Pro:
  - False positive and false negative rates are very low
  - No constant updates are needed
  - Search engine rankings are relatively hard to manipulate
- Con:
  - Network cost is a problem
  - A large number of searches for phishing websites may affect those websites' ranking

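The TF-IDF step and the top-N check can be sketched as follows. This is a simplified illustration, not CANTINA's implementation: the tokenizer, the IDF formula log(N / df), the tiny reference corpus, and both function names are assumptions, and the search results are passed in by the caller instead of being fetched from a real search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tfidf_terms(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    corpus is a list of reference documents used only to estimate IDF.
    """
    docs = [set(tokenize(d)) for d in corpus]
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())
    n_docs = len(docs) + 1  # the page itself counts as one document
    scores = {}
    for term, count in counts.items():
        tf = count / total
        df = 1 + sum(1 for d in docs if term in d)  # the page contains the term
        scores[term] = tf * math.log(n_docs / df)
    # Highest weight first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def appears_in_top_results(page_domain, result_domains, n=30):
    """True if the page's domain is among the first n search result domains."""
    return page_domain in result_domains[:n]


# Hypothetical reference corpus and page text.
corpus = ["the bank of the city", "sign in to the site", "the account page"]
page = "paypal account login paypal secure the the"
print(top_tfidf_terms(page, corpus, k=3))  # ['paypal', 'login', 'secure']
```

The word "the" occurs in every document, so its IDF is zero and it never reaches the query, while "paypal" is both frequent on the page and absent elsewhere. A phishing copy of a well-known page tends to produce the same query terms as the original, yet its own domain is unlikely to be among the top results.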
Summary of the Anti-Phishing Approaches Mentioned

Approach                    False Positive  False Negative  Implementation Effort  Adaptation         Update Cycle
For specific websites       Zero            Low             Easy                   Specific website   None
Firewall based on database  Low             Low             Medium                 General web/email  Very frequent
Toolbar based on database   Low             Low             Hard                   General web/email  Very frequent
PILFER                      Low             Low             Medium                 General email      Sometimes
CANTINA                     Very low        Low             Medium                 General websites   Rare

Thanks!