Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application January 31st, 2017 Samuel Marchal*, Giovanni Armano*, Kalle Saari*, Tommi Gröndahl*, Nidhi Singh†, N.Asokan* *Aalto University - †Intel Security samuel.marchal@aalto.fi
Requirements for phishing detection
Accuracy: high detection rate with few legitimate webpages misidentified as phish.
Context-independent detection: not dependent on the languages or brands seen in training data.
Temporal resilience: accuracy does not degrade over time.
Resilience to dynamic phish: different content can be delivered to different users.
User privacy: no disclosure of browsing history.
Effective protection: fast decision and effective warning.
Client-side implementation
Decision relies only on information available to a web browser:
Starting URL, landing URL, redirection chain, logged links
HTML source code: text, title, HREF links, copyright
Benefits: privacy preservation; resilience to dynamic phish
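The data sources listed above could be gathered into one record per page visit. A minimal sketch using only the Python standard library; the `PageData` container, the `build_page_data` helper and the copyright heuristic (text starting at ©, "(c)" or "copyright", up to the next period) are hypothetical illustrations, not the Off-the-Hook implementation, which runs inside the browser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class PageData:
    """Hypothetical record of the data sources a browser can observe for one visit."""
    starting_url: str
    landing_url: str
    redirection_chain: list[str] = field(default_factory=list)
    logged_links: list[str] = field(default_factory=list)  # externally loaded content
    title: str = ""
    text: str = ""
    href_links: list[str] = field(default_factory=list)
    copyright: str = ""


class _PageParser(HTMLParser):
    """Collects title, visible text and HREF links; skips <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: list[str] = []
        self.text_parts: list[str] = []
        self.hrefs: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        (self.title_parts if self._in_title else self.text_parts).append(data)


# Heuristic (assumption): copyright notice = ©, (c) or "copyright" up to the next period.
_COPYRIGHT = re.compile(r"(?:©|\(c\)|copyright)[^.]*", re.IGNORECASE)


def build_page_data(starting_url, landing_url, redirection_chain, logged_links, html):
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(p.strip() for p in parser.text_parts if p.strip())
    match = _COPYRIGHT.search(text)
    return PageData(
        starting_url=starting_url,
        landing_url=landing_url,
        redirection_chain=list(redirection_chain),
        logged_links=list(logged_links),
        title=" ".join(parser.title_parts).strip(),
        text=text,
        href_links=parser.hrefs,
        copyright=match.group(0).strip() if match else "",
    )


page = build_page_data(
    "http://short.example/x",
    "https://login.examp1e-bank.net/",
    ["http://short.example/x", "https://login.examp1e-bank.net/"],
    ["https://www.example-bank.com/logo.png"],
    "<title>Sign in</title><script>var x = 1;</script>"
    "<p>Welcome to Example Bank</p><a href='https://www.example-bank.com/help'>Help</a>"
    "<footer>© 2017 Example Bank. All rights reserved.</footer>",
)
print(page.title, page.href_links, page.copyright)
```

Keeping everything the browser already has in a single record is what lets the decision stay on the client, in line with the privacy requirement.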
Modeling phisher limitations
Phishers have different levels of control and face constraints when building a webpage:
Control: externally loaded content (logged links) and external HREF links are not controlled by the page owner.
Constraints: the registered domain name (RDN) part of the URL cannot be freely chosen; it is constrained by registration (DNS) policies.
Benefits: accurate decision; temporal resilience
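One way to turn the control assumption into a feature is to measure how many of a page's links point outside the registered domain of its landing URL. A minimal sketch; `registered_domain` and `external_link_ratio` are hypothetical helpers, and taking the last two host labels as the RDN is a simplification (a faithful version would consult the Public Suffix List, for example to handle `.co.uk`).

```python
from __future__ import annotations

from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Approximate the registered domain name (RDN) by the last two host labels.

    Simplification (assumption): multi-label public suffixes such as co.uk are
    not handled; a real implementation should use the Public Suffix List.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def external_link_ratio(landing_url: str, links: list[str]) -> float:
    """Share of links whose RDN differs from the landing page's RDN.

    Relative links (no host) are counted as internal, since they resolve to
    the page owner's own domain.
    """
    if not links:
        return 0.0
    landing_rdn = registered_domain(landing_url)
    external = sum(
        1 for link in links
        if urlsplit(link).hostname and registered_domain(link) != landing_rdn
    )
    return external / len(links)


links = [
    "https://www.example-bank.com/help",     # external: controlled by its owner, not the phisher
    "https://www.example-bank.com/privacy",
    "/assets/logo.png",                       # relative link: internal
    "https://cdn.examp1e-bank.net/kit.js",    # same RDN as the landing page
]
print(external_link_ratio("https://login.examp1e-bank.net/signin", links))  # 0.5
```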
Use few but dynamic features
210 dynamic features computed from the data sources:
URL features (106)
Term usage consistency (66)
Usage of starting and landing main level domain (mld) (22)
RDN usage (13)
Webpage content (5)
Gradient Boosting classification (supervised)
Benefits: context-independent decision; fast decision
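Term usage consistency features compare the terms that appear in different data sources of the same page (for example title, body text and landing URL); the intuition, presumably tied to the phisher limitations above, is that a phish's sources agree less with each other. A minimal sketch of one such comparison, assuming terms are lowercase alphanumeric tokens of at least three characters and consistency is measured by the Hellinger distance between term-frequency distributions; the tokenization, the distance and the helper names are illustrative assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from itertools import combinations


def term_distribution(text: str) -> dict[str, float]:
    """Relative frequency of each term (lowercase alphanumeric token, >= 3 chars)."""
    counts = Counter(re.findall(r"[a-z0-9]{3,}", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def hellinger_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Hellinger distance in [0, 1]; an empty source counts as fully inconsistent."""
    if not p or not q:
        return 1.0
    squared = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
        for t in set(p) | set(q)
    )
    return math.sqrt(squared / 2)


def consistency_features(sources: dict[str, str]) -> dict[tuple[str, str], float]:
    """One distance per pair of data sources, e.g. ('text', 'title')."""
    dists = {name: term_distribution(text) for name, text in sources.items()}
    return {
        (a, b): hellinger_distance(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


features = consistency_features({
    "title": "Example Bank login",
    "text": "Welcome to Example Bank. Please login to your bank account.",
    "landing_url": "https://login.examp1e-bank.net/secure/verify",
})
for pair, distance in features.items():
    print(pair, round(distance, 3))
```

Distances like these would then be fed, together with the other feature groups, to the supervised Gradient Boosting classifier.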
Relevant warnings: redirection to the target of the phish; no technical jargon
System Accuracy (language independence)
Classifier training: 4,531 English legitimate webpages; 1,036 phishing webpages
Assessment:
Legitimate webpages: 100,000 in English and 10,000 each in French, German, Italian, Portuguese and Spanish
Phishing webpages: 1,216
System Accuracy (language independence)
Figures: ROC curve; precision vs. recall
100,000 English legitimate / 1,216 phishing webpages (≈ real-world distribution)
Precision Recall FP Rate AUC Accuracy
0.956 0.958 0.0005 0.999
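The reported rates follow from confusion-matrix counts. A minimal sketch; the counts below are hypothetical, chosen only so that the rounded rates are consistent with the slide for 100,000 legitimate and 1,216 phishing webpages, and are not reported results. AUC is omitted because it requires per-page classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # phishing pages flagged as phishing
    fp: int  # legitimate pages flagged as phishing
    tn: int  # legitimate pages accepted
    fn: int  # phishing pages missed

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def fp_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts: 100,000 legitimate (fp + tn) and 1,216 phishing (tp + fn) pages.
c = Confusion(tp=1165, fp=54, tn=99946, fn=51)
print(round(c.precision, 3), round(c.recall, 3), round(c.fp_rate, 4), round(c.accuracy, 3))
# 0.956 0.958 0.0005 0.999
```

With 100,000 legitimate pages, each 0.0001 of FP rate adds about 10 false alarms, so on a real-world distribution precision is far more sensitive to the FP rate than accuracy is.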
Accuracy comparison (columns: FPR, Precision, Recall, Accuracy)
Cantina (CMU): 0.03, 0.212, 0.89, 0.969
Cantina+ (CMU): 0.013, 0.964, 0.955, 0.97
Ma et al. (UCB): 0.001, 0.998, 0.924
Whittaker et al. (Google): 0.0001, 0.989, 0.915, 0.999
Monarch (UCB): 0.003, 0.961, 0.734, 0.866
Our method: 0.0005, 0.956, 0.958
Performance
Memory footprint: 295 MB
Impact on web surfing:
Phishing webpages: interaction blocked in < 0.2 seconds; warning displayed (and target identified) in < 2 seconds
Legitimate webpages: none (except for false positives)
Thank You https://ssg.aalto.fi/projects/phishing/