Towards Eradicating Phishing Attacks Stefan Saroiu University of Toronto
Today’s anti-phishing tools have done little to stop the proliferation of phishing
Many Anti-Phishing Tools Exist
Phishing is Gaining Momentum
Current Anti-Phishing Tools Are Not Effective: Let’s look at new approaches and new insights! Part 1: a new approach, user-assistance. Part 2: the need for a new measurement system.
Part 1 iTrustPage: A User-Assisted Anti-Phishing Tool
The Problems with Automation: Many anti-phishing tools use automatic detection. Automatic detection makes tools user-friendly, but it is subject to false negatives. Each false negative puts a user at risk.
What are False Negatives & False Positives? Example of a false negative: phishing e-mail not detected by filter heuristics. Example of a false positive: legitimate e-mail dropped by filter heuristics.
Current Anti-Phishing Tools Are Not Effective: Most anti-phishing tools use automatic detection. Automatic detection makes tools user-friendly, but it is subject to false negatives. Each false negative puts a user at risk.
Can false negatives be eliminated?
Case Study: SpamAssassin. SpamAssassin: one way to stop phishing. Methodology: two corpora, phishing (1,423 e-mails, Nov Aug. 06) and legitimate (478 e-mails from our Sent Mail folders); SpamAssassin version; various levels of aggressiveness.
False Negatives Can’t Be Eliminated
Trade-off between False Negatives and False Positives: reducing false negatives increases false positives
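The trade-off on this slide can be illustrated with a small threshold sweep. This is only a sketch: the scores below are made-up numbers, not data from the SpamAssassin case study, and `error_rates` is a hypothetical helper name. A message is flagged as phishing when its score reaches the threshold, so lowering the threshold makes the filter more aggressive.

```python
# Hypothetical filter scores (higher = more phishing-like); NOT data from the talk.
phishing_scores = [9.1, 7.4, 6.2, 4.8, 3.5, 2.1]
legitimate_scores = [0.4, 1.2, 2.6, 3.9, 5.3]


def error_rates(threshold):
    """Return (false_negative_rate, false_positive_rate) for a threshold.

    A message is flagged as phishing when its score is >= threshold.
    """
    # Phishing messages that slip under the threshold are false negatives.
    false_negatives = sum(1 for s in phishing_scores if s < threshold)
    # Legitimate messages that reach the threshold are false positives.
    false_positives = sum(1 for s in legitimate_scores if s >= threshold)
    return (false_negatives / len(phishing_scores),
            false_positives / len(legitimate_scores))


# Sweep from a conservative filter (high threshold) to an aggressive one.
for threshold in (8.0, 5.0, 2.0):
    fn_rate, fp_rate = error_rates(threshold)
    print(f"threshold={threshold}: FN rate={fn_rate:.2f}, FP rate={fp_rate:.2f}")
```

With these illustrative numbers, each step toward a more aggressive threshold lowers the false-negative rate and raises the false-positive rate.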
Summary: Automatic Detection. False negatives put users at risk. Hard to eliminate false negatives. Making automatic detection more aggressive increases the rate of false positives. Appears to be a fundamental trade-off. Let’s look at new approaches.
New Approach: User-Assistance. Involve the user in the decision-making process. Benefits: 1. False positives are unlikely and more tolerable; combine with conservative automatic detection. 2. Use detection that is hard-for-computers but easy-for-people.
Outline: Motivation; Design of iTrustPage; Evaluation of iTrustPage; Summary of Part 1
Two Observations about Phishing 1. Users intend to visit a legitimate page, but they are misdirected to an illegitimate page 2. If two pages look the same, one is likely phishing the other [Florêncio & Herley - HotSec ‘06] Idea: use these observations to detect phishing
Involving Users. Determine “intent”: ask the user to describe the page as if entering search terms. Determine whether pages “look alike”: ask the user to detect visual similarity between two pages. Tasks are hard-for-computers but easy-for-people.
iTrustPage’s Validation: when the user enters input on a Web page, a two-step validation process runs. 1. Conservative automatic validation: a simple whitelist (the top 500 most popular Web sites) and a cache (avoids “re-validation”). 2. Flag the page as “suspicious”; rely on user-assistance.
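The two-step process above can be sketched roughly as follows. This is not iTrustPage's code: the whitelist entries, the host-based matching, and the names `validate_on_input` and `ask_user` are all assumptions made for illustration, and the sketch assumes that step 2 runs only when the whitelist and cache both miss.

```python
from urllib.parse import urlparse

# Stand-in for the whitelist of popular sites (hypothetical entries).
WHITELIST = {"example-bank.com", "example-mail.com"}
# Stand-in for the cache of pages the user has already validated.
validated_cache = set()


def host_of(url):
    """Extract the lower-cased host name from a URL."""
    return (urlparse(url).hostname or "").lower()


def validate_on_input(url, ask_user):
    """Decide what to do when the user starts entering input on a page.

    ask_user is a hypothetical callback that runs the user-assisted
    step (search plus visual comparison) and returns True when the
    user confirms the page is the one they intended to visit.
    """
    host = host_of(url)
    # Step 1: conservative automatic validation.
    if host in WHITELIST:
        return "validated (whitelist)"
    if host in validated_cache:
        return "validated (cache)"
    # Step 2: the page is suspicious, so rely on the user.
    if ask_user(url):
        validated_cache.add(host)  # avoid re-validation on later visits
        return "validated (user)"
    return "suspicious"
```

Keeping step 1 conservative means the automatic part never has to guess: anything it cannot vouch for is handed to the user.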
iTrustPage: Validating Site
Step 1: Filling Out a Form
Step 2: Page Validated
iTrustPage: Avoid Phishing Site
Step 1: Filling Out a Suspicious Page
Step 2: Visual Comparison
Step 3: Attack Averted
Two Issues: Revise & Bypass. What if users can’t find the page on Google? (Visiting an un-indexed page; wrong/ambiguous keywords for search.) iTrustPage supports two options: revise search terms; bypass validation process (similar to false negatives in automatic tools).
Methodology. Instrumented code sends anonymized logs: info about iTrustPage usage. High-level stats: June 27th to August 9th, 2007; 5,184 unique installations; 2,050 users with 2+ weeks of activity.
Evaluation Questions: How disruptive is iTrustPage? Are users willing to help iTrustPage’s validation? Did iTrustPage prevent any phishing attacks? How many searches are needed until validation? How effective are the whitelist and cache? How often do users visit pages accepting input?
How disruptive is iTrustPage?
iTrustPage is not disruptive: users are interrupted on less than 2% of pages; after the first day of use, 50+% of users are never interrupted
Are users willing to help iTrustPage’s validation?
Many Users are Willing to Participate Half the users willing to assist the tool in validation
Did iTrustPage prevent any phishing attacks?
An Upper Bound. Anonymization of logs prevents us from measuring iTrustPage’s effectiveness. 291 visually similar pages chosen instead; 1/3 occurred after two weeks of use.
Summary of Evaluation: Not disruptive; disruption rate decreasing over time. Half the users are willing to participate in validation. Pages with input are very common on the Internet. iTrustPage is easy to use.
Summary of Part 1. An alternative approach to automation: have the user assist the tool to provide better protection. Our evaluation has shown our tool’s benefits while avoiding pitfalls of automated tools. iTrustPage protects users who always participate in page validation.
What is the Take-Away Point? [Figure: a usability-security axis, with Automatic Detection (many of today’s tools) and User-Assistance (iTrustPage) placed along it]
Part 2 Bunker: A System for Gathering Anonymized Traces
Motivation. Two ways to anonymize network traces: offline (anonymize trace after raw data is collected) and online (anonymize while it is collected). Today’s traces require deep packet inspection: privacy risks make offline anonymization unsuitable. Phishing involves sophisticated analysis: performance needs make online anonymization unsuitable.
Simple Tasks are Very Slow. Regular expression for phishing: "((password)|(<form)|(<input)|(PIN)|(username)|(<script)|(user id)|(sign in)|(log in)|(login)|(signin)|(log on)|(signon)|(signon)|(passcode)|(logon)|(account)|(activate)|(verify)|(payment)|(personal)|(address)|(card)|(credit)|(error)|(terminated)|(suspend))[^A-Za-z]". libpcre: 5.5 s for 30 MB ≈ 44 Mbps max.
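As a rough check of the figure on this slide, the sketch below rebuilds the regular expression with Python's `re` module (not libpcre, which the slide measured) and redoes the throughput arithmetic. It assumes the slide's "30 M" means 30 megabytes; the names `KEYWORDS`, `PHISHING_RE`, and `throughput_mbps` are introduced here for illustration, and the keyword the slide lists twice is included once.

```python
import re

# Keywords from the slide's regular expression.
KEYWORDS = [
    "password", "<form", "<input", "PIN", "username", "<script",
    "user id", "sign in", "log in", "login", "signin", "log on",
    "signon", "passcode", "logon", "account", "activate", "verify",
    "payment", "personal", "address", "card", "credit", "error",
    "terminated", "suspend",
]

# Same shape as the slide's pattern: one of the keywords, followed by a
# character that is not an ASCII letter.
PHISHING_RE = re.compile(
    "(" + "|".join(f"({re.escape(k)})" for k in KEYWORDS) + ")[^A-Za-z]"
)


def throughput_mbps(megabytes, seconds):
    """Convert 'megabytes scanned in seconds' to megabits per second."""
    return megabytes * 8 / seconds


# Assumption: "30 M" on the slide means 30 megabytes of payload.
print(f"{throughput_mbps(30, 5.5):.1f} Mbps")
print(bool(PHISHING_RE.search("Please verify your account.")))
```

Under that assumption, 30 MB in 5.5 s works out to just under 44 Mbps, which matches the slide's stated maximum.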
Motivation: Need a new tool to combine the best of both worlds
Threat Model. Accidental disclosure: risk is substantial whenever humans are handling data. Subpoenas: attacker has physical access to tracing system; subpoenas force researchers and ISPs to cooperate, as long as cooperation is not “unduly burdensome”. Implication: nobody can have access to raw data.
Is Developing Bunker Legal?
It Depends on Intent of Use. Developing Bunker is like developing encryption. Must consider purpose and uses of Bunker. Developing Bunker for user privacy is legal. Misuse of Bunker to bypass the law is illegal.
Our solution: Bunker. Combines best of both worlds: same privacy benefits as online anonymization; same engineering benefits as offline anonymization. Pre-load analysis and anonymization code. Lock it and throw away the key (tamper-resistance).
Outline: Motivation; Design of Bunker; Evaluation of Bunker; Summary of Part 2
Logical Design [Figure: components shown are Capture Hardware; capture (online); assemble, parse, anonymize (offline); Anon. Key; One-Way Interface (anon. data)]
VM-based Implementation [Figure: components shown are Capture Hardware, NIC, and Hypervisor; a closed-box VM with capture (online) and assemble, parse, anonymize (offline), encrypt/decrypt of Encrypted Raw Data with an Enc. Key, and an Anon. Key; a One-Way Socket; an open-box VM for save trace, logging, and maintenance]
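The "closed box with a one-way output" idea can be sketched in a few lines. This is an illustration only, not Bunker's implementation: Bunker relies on VM isolation and tamper-resistance, which a Python closure does not provide. The record fields (`src_ip`, `payload`), the keyed-hash pseudonyms, and the name `make_closed_box` are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def make_closed_box():
    """Return a function that turns raw records into anonymized ones.

    The anonymization key is generated inside and never returned, so
    callers only ever see the anonymized output (the "one-way" side).
    """
    anon_key = secrets.token_bytes(32)  # lives only inside this closure

    def anonymize(record):
        # record is a hypothetical dict with 'src_ip' and 'payload' fields.
        pseudonym = hmac.new(
            anon_key, record["src_ip"].encode(), hashlib.sha256
        ).hexdigest()[:16]
        # Only derived, non-identifying facts leave the box.
        return {"src": pseudonym, "has_form": "<form" in record["payload"]}

    return anonymize


box = make_closed_box()
first = box({"src_ip": "10.0.0.1", "payload": "<form action=x>"})
second = box({"src_ip": "10.0.0.1", "payload": "plain text"})
print(first["src"] == second["src"])  # same source maps to same pseudonym
```

The analysis logic (here, the `has_form` check) is fixed before any data arrives, which mirrors the slide's point that the analysis and anonymization code is pre-loaded.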
Software Engineering Benefits: One order of magnitude between online and offline. Development time: Bunker, 2 months; UW/Toronto, years.
Summary of Part 2. Bunker combines: privacy benefits of online anonymization; software engineering benefits of offline anonymization. Ideal tool for characterizing phishing.
Our Current Use of Bunker. Few “hard facts” known about phishing: banks have no incentive to disclose info; must focus on victims rather than on phishing attacks. Preliminary study of Hotmail users: How often do people click on links in their e-mails? Do the same people fall victim to phishing? How cautious are people who click on links in e-mails?
Our Contributions. iTrustPage: new approach to anti-phishing. Bunker: system for gathering anonymized traces.
Acknowledgements. Graduate students at Toronto: Andrew Miklas, Troy Ronda. Researchers: Alec Wolman (MSR Redmond). Faculty: Angela Demke Brown (Toronto).
Questions? iTrustPage:
Research Interests. Building systems leveraging social networks: exploiting social interactions in mobile systems; rethinking access control for Web 2.0. Making the Internet more secure: characterizing spread of Bluetooth worms; iTrustPage + Bunker. Characterizing network environments in the wild: characterizing residential broadband networks; evaluating emerging “last-meter” Internet apps.
Circumventing iTrustPage. “Google bomb”: increasing a phishing page’s rank (this is not enough to circumvent iTrustPage). Breaking into a popular site that is already in iTrustPage’s whitelist or cache. Compromising a user’s browser.
Problems with Password Managers. When a password field is present: ask user to select from a list of passwords; remember password selection for re-visits. Challenges: automatic detection of password fields can be “fooled”; such tools increase amount of confidential info; don’t assist users on how to handle phishing.
Downloads Released on Mozilla.org
Most Searches Don’t Need Revision: users can find their page the majority of the time
Outcomes of Validation Process 1/3 of time, users choose to bypass validation
Forms and Scripts are Prevalent Many Web pages have multiple forms
Whitelist’s Hit Rate Hit rate remains flat at 55%
Cache’s Hit Rate Hit rate reaches 65% after one week
Our solution. Combines best of both worlds: stronger privacy benefits than online anonymization; same engineering benefits as offline anonymization. Experimenter must commit to an anonymization process before trace begins.
Illustrating the Arms Race: SpamAssassin is adapting to phishing attacks; attackers are also adapting to SpamAssassin
Offline Anonymization. Trace anonymized after raw data is collected: privacy risk until raw data is deleted. Today’s traces require deep packet inspection: headers insufficient to understand phishing; payload traces pose a serious privacy risk. Risk to user privacy is too high: two universities rejected offline anonymization.
Online Anonymization. Trace anonymized online: raw data resides in RAM only. Difficult to meet performance demands: extraction and anonymization must be done at line speeds. Code is frequently buggy and difficult to maintain: low-level languages (e.g. C) + “home-made” parsers. Small bugs cause large amounts of data loss: introduces consistent bias against long-lived flows.
Motivation. Two ways to anonymize traces: offline (trace anonymized after raw data is collected) and online (trace anonymized while raw data is collected). Deep packet inspection killed us with phishing: a game changer. Motivation: try to get the best of both worlds. Before I tell you about the design let me elaborate on the security concerns.
Related Work: iTrustPage. Spam filters and blacklists: Exchange, Outlook, SpamAssassin; IE7, Firefox, Opera. New Web authentication tools: out-of-band [JDM06, PKA06] (MITM); password managers [HWF05, RJM+05, YS06]. New Web interfaces: Passpet, WebWallet, CANTINA. Centralized approaches: central server for password similarity [FH06]; central server for valid sites [LDHF 05].
Related Work: User Studies. Web password habits [FH07]: huge password management problems. People fall for simple attacks [DTH06]. Warnings more effective than passive cues [WMG06]. Personalized attacks are very successful [JJJM06]. Security tools must be intuitive and simple to use [CO06].
Related Work: Bunker. Network tracing systems: Httpdump [WWB96], BLT [Fe00], UWTrace [Wo02], CoMo [Ia05]. Anonymization schemes: prefix-preserving [XFA+01]; high-level anonymization languages [PV03]. Secure VMs: tamper-resistant hardware [LTH03]; small VMMs + formal verification [Ka06, Ru08]; PL techniques for memory safety + control flow [KBA02, CLD+07]; hardware memory protection [SLQ07].