Published by Raymond Coulston. Modified over 10 years ago.
1
PhishDef: URL Names Say It All
Anh Le, Athina Markopoulou (University of California, Irvine, USA)
Michalis Faloutsos (University of California, Riverside, USA)
2
What is Phishing?
Social engineering and technical means to steal consumers’ personal identity, data, etc.
Causes billions of dollars in losses annually.
Anh Le - UC Irvine - PhishDef
3
Antiphishing.org
4
Example of a Phishing Site
5
Current Protection
o Google Safe Browsing
o Microsoft SmartScreen
o Third-Party
6
Current Protection Model
Google Safe Browsing
Motivation: Blacklist-based protection is reactive, so it cannot protect against zero-day phishing.
7
Outline
o Phishing Background
o Motivation
o Our proposal
o New Protection Model
o Learning Algorithms
o Dataset
o Feature Selection
o Evaluation Results
o Concluding Remarks
8
Our Proposed Protection Model
Main challenges: accuracy and classification latency
o Which classification algorithm works best?
o Which set of features works best?
9
Prior Work
o Whittaker et al. [NDSS ’10]
  o Google Safe Browsing
o Ma et al. [SIGKDD ’09]
  o Batch-based Classification
o Ma et al. [ICML ’09]
  o Batch-based vs. Online Learning
Server-Side Classification
10
Main Contributions
o New Protection Model:
  o Client-side classification
o Propose using Adaptive Regularization of Weights (AROW)
  o High accuracy
  o Resilient to noise
o Set of Lexical Features
  o Fast to extract at client side
  o Obfuscation resistant
11
Machine Learning Algorithms
o Batch-based
  o Support Vector Machine
o Online
  o Perceptron
  o Confidence-Weighted (CW) [Dredze et al., ICML 2008]
  o Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]
12
Online Classification
Online Perceptron: maintain a weight vector and use it for classification
[Diagram labels: Client Side; Server Side; Trained Beforehand; Extract In Real Time]
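The weight-vector idea on this slide can be illustrated with a minimal online perceptron. This is a sketch under stated assumptions, not the authors' implementation: the sparse feature names (`token:paypal`, `bias`, and so on), the toy stream, and the helper names `predict`, `perceptron_update`, and `train` are all hypothetical.

```python
from collections import defaultdict

def predict(weights, features):
    """Return +1 (phishing) or -1 (benign) from the sign of w . x."""
    score = sum(weights[name] * value for name, value in features.items())
    return 1 if score >= 0 else -1

def perceptron_update(weights, features, label):
    """Classic perceptron rule: change the weights only after a mistake."""
    if predict(weights, features) == label:
        return False
    for name, value in features.items():
        weights[name] += label * value
    return True

def train(stream, passes=1):
    """Feed labelled examples one at a time; return the learned weights."""
    weights = defaultdict(float)
    for _ in range(passes):
        for features, label in stream:
            perceptron_update(weights, features, label)
    return weights

# Hypothetical toy stream: (sparse feature dict, label).
STREAM = [
    ({"token:paypal": 1.0, "token:login": 1.0, "bias": 1.0}, 1),
    ({"token:wikipedia": 1.0, "bias": 1.0}, -1),
]

weights = train(STREAM, passes=3)
for features, label in STREAM:
    print(label, predict(weights, features))
```

Each update touches only the features present in the current URL, so the per-example cost is proportional to the number of non-zero features rather than to the size of the whole vocabulary.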
13
Online Classification
o Confidence-Weighted (CW): minimum change, enough to correct last mistake
o Adaptive Regularization of Weights (AROW): minimum change, penalty for mistake, increasing confidence
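To make the AROW update concrete, here is a minimal sketch of the update rule from Crammer et al. using a diagonal covariance approximation. The diagonal simplification, the class name `DiagonalAROW`, the default `r=1.0`, and the sparse-dict feature format are assumptions made for illustration; the slides do not say which variant or parameters were used.

```python
from collections import defaultdict

class DiagonalAROW:
    """AROW with a per-feature variance instead of a full covariance matrix."""

    def __init__(self, r=1.0):
        self.r = r                              # regularization parameter (assumed value)
        self.mean = defaultdict(float)          # weight means, start at 0
        self.var = defaultdict(lambda: 1.0)     # weight variances, start at 1

    def margin(self, features):
        return sum(self.mean[k] * v for k, v in features.items())

    def predict(self, features):
        return 1 if self.margin(features) >= 0 else -1

    def update(self, features, label):
        m = label * self.margin(features)
        if m >= 1.0:
            return                              # no hinge loss, no change
        # Uncertainty of this example under the current variances: x^T Sigma x.
        confidence = sum(self.var[k] * v * v for k, v in features.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - m) * beta                # step size scaled by the hinge loss
        for k, v in features.items():
            sigma_x = self.var[k] * v           # uses the variance before this update
            self.mean[k] += alpha * label * sigma_x
            self.var[k] -= beta * sigma_x * sigma_x   # variance shrinks: more confident

model = DiagonalAROW(r=1.0)
model.update({"token:login": 1.0}, 1)
print(model.mean["token:login"], model.var["token:login"])
```

Because the step size is divided by `confidence + r`, a single mislabelled example cannot force the margin all the way to 1. That soft update is the usual explanation for AROW tolerating label noise better than CW, which enforces its margin constraint on every example.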
14
Dataset
o Phishing URLs
  o PhishTank (4,082)
  o MalwarePatrol (2,001)
o Benign URLs
  o Open directory (4,012)
  o Yahoo directory (4,143)
o Time period: June 2010
15
Feature Selection
o Lexical Features
o External Features
  o Country, AS number, registration date, registrant, registrar, etc.
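Lexical features are computed from the URL string alone, with no network lookups. The slide does not enumerate the exact features, so the ones below (lengths, dot count, and bag-of-words tokens from host and path) are illustrative assumptions, as is the function name `lexical_features`. The sample URL is the one shown on the example slide of this deck.

```python
import re
from urllib.parse import urlsplit

TOKEN_SPLIT = re.compile(r"[^a-z0-9]+")

def lexical_features(url):
    """Build a sparse feature dict from the URL string only (no DNS or WHOIS)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = (parts.path or "").lower()
    features = {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "host_dots": float(host.count(".")),
        "path_length": float(len(path)),
    }
    # Bag-of-words tokens, kept separate for host and path.
    for token in TOKEN_SPLIT.split(host):
        if token:
            features["host_token:" + token] = 1.0
    for token in TOKEN_SPLIT.split(path):
        if token:
            features["path_token:" + token] = 1.0
    return features

url = "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
for name, value in sorted(lexical_features(url).items()):
    print(name, value)
```

External features such as registration date or AS number would each require a remote query, which is where the added latency and the dependency on a remote server come from.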
16
Outline
o Phishing Background
o Motivation
o Our proposal
o New Protection Model
o Learning Algorithms
o Dataset
o Feature Selection
o Evaluation Results
o Concluding Remarks
17
Evaluation Results: Lexical vs. Full Features
(+) ~ 1%
(-) Dependency on Remote Server
(-) Avg. Latency: 1.64 s
Lexical features alone are better suited than full features for client-side phishing classification.
18
Evaluation Results: CW vs. AROW
AROW is more resilient to noise than CW.
19
Conclusion: PhishDef
o Client-side phishing classification system
  o Proactive, on-the-fly classification of zero-day phishing URLs
  o Low client-side delay (ms), high accuracy (97%)
  o Resilient to noisy data
o Future Work:
  o Develop an add-on for Firefox
20
Questions
22
Example of a Phishing Site
http://www.hmrc.gov.uk/intro-income-tax.htm
http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm
23
Evaluation Results: Batch-Based vs. Online Learning
Online learning outperforms batch-based learning for phishing classification.
24
Chrome 11 > Firefox 4