CANTINA: A Content-Based Approach to Detecting Phishing Web Sites Yue Zhang University of Pittsburgh Jason I. Hong, Lorrie F. Cranor Carnegie Mellon University.

Slides:



Advertisements
Similar presentations
PhishZoo: Detecting Phishing Websites By Looking at Them
Advertisements

Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
1 CANTINA : A Content-Based Approach to Detecting Phishing Web Sites WWW Yue Zhang, Jason Hong, and Lorrie Cranor.
CSCD 303 Essential Computer Security Winter 2014 Lecture 3 - Social Engineering1 Phishing Reading: See links at end of lecture.
C MU U sable P rivacy and S ecurity Laboratory Anti-Phishing Phil The Design and Evaluation of a Game That Teaches People Not to.
Design and Evaluation of a Real-Time URL Spam Filtering Service
PHAD- A Phishing Avoidance and Detection Tool Using Invisible Digital Watermarking By Sonali Batra Web 2.0 Security and Privacy 2014.
Design and Evaluation of a Real- Time URL Spam Filtering Service Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, Dawn Song University of California,
“ The Anatomy of a Large-Scale Hypertextual Web Search Engine ” Presented by Ahmed Khaled Al-Shantout ICS
User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University.
Jason Hong, PhD Carnegie Mellon University Wombat Security Technologies Teaching Johnny Not to Fall for Phish.
The PageRank Citation Ranking “Bringing Order to the Web”
User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University.
CMU Usable Privacy and Security Laboratory A Brief History of Semantic Attacks or How Not to Get Screwed Online Serge Egelman.
Usable Privacy and Security: Protecting People from Online Phishing Scams Alessandro Acquisti Lorrie Cranor Julie Downs Jason Hong Norman Sadeh Carnegie.
User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University.
User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University.
Usable Privacy and Security Jason I. Hong Carnegie Mellon University.
Verma - ICISS 2014 R easoning M ining NLP Defense Rakesh M. Verma ReMiND Laboratory Catching Classical and Hijack-based Phishing Attacks.
1/16 Final project: Web Page Classification By: Xiaodong Wang Yanhua Wang Haitang Wang University of Cincinnati.
“ The Initiative's focus is to dramatically advance the means to collect,store,and organize information in digital forms,and make it available for searching,retrieval,and.
By: Bihu Malhotra 10DD.   A global network which is able to connect to the millions of computers around the world.  Their connectivity makes it easier.
Examining the Effectiveness and Techniques of the Anti-Phishing Technology in Leading Web Browsers and Security Toolbars. Wesley W. Owen
Presented By Jay Dani.  Web Spoofing is a security attack that allows an adversary to observe and modify all web pages sent to the victim's machine,
GONE PHISHING ECE 4112 Final Lab Project Group #19 Enid Brown & Linda Larmore.
Web Spoofing John D. Cook Andrew Linn. Web huh? Spoof: A hoax, trick, or deception Spoof: A hoax, trick, or deception Discussed among academics in the.
PhishNet: Predictive Blacklisting to Detect Phishing Attacks Pawan Prakash Manish Kumar Ramana Rao Kompella Minaxi Gupta Purdue University, Indiana University.
Web-Phishing – Techniques and Countermeasures CIS5370 Computer Security Fall 2008 Muhammad Khalil / Marcus Wolff.
User Interfaces and Algorithms for Fighting Phishing Jason I. Hong Carnegie Mellon University.
Visual-Similarity-Based Phishing Detection Eric Medvet, Engin Kirda, Christopher Kruegel SecureComm 2008 Sep.
WEB SPOOFING by Miguel and Ngan. Content Web Spoofing Demo What is Web Spoofing How the attack works Different types of web spoofing How to spot a spoofed.
Ph.D. candidate: Guang Xiang Committee: Jason Hong (chair), Carolyn Rosé (co-chair), Alexander Hauptmann, Christos Faloutsos, Markus Jakobsson.
KAIST Web Wallet: Preventing Phishing Attacks by Revealing User Intentions Min Wu, Robert C. Miller and Greg Little Symposium On Usable Privacy and Security.
Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs Justin Ma, Lawrence Saul, Stefan Savage, Geoff Voelker Computer Science.
Reporter: Li, Fong Ruei National Taiwan University of Science and Technology 9/19/2015Slide 1 (of 32)
CMU Usable Privacy and Security Laboratory Phinding Phish: An Evaluation of Anti-Phishing Toolbars Yue Zhang, Serge Egelman, Lorrie.
11 CANTINA: A Content- Based Approach to Detecting Phishing Web Sites Reporter: Gia-Nan Gao Advisor: Chin-Laung Lei 2010/6/7.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
Data Structures & Algorithms and The Internet: A different way of thinking.
Anti-Phishing Approaches Lifeng Hu
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Cloak and Dagger: Dynamics of Web Search Cloaking David Y. Wang, Stefan Savage, and Geoffrey M. Voelker University of California, San Diego 左昌國 Seminar.
11 A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval Reporter: 林佳宜 /10/17.
윤언근 DataMining lab.  The Web has grown exponentially in size but this growth has not been isolated to good-quality pages.  spamming and.
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
Phishing Webpage Detection Jau-Yuan Chen COMS E6125 WHIM March 24, 2009.
C MU U sable P rivacy and S ecurity Laboratory User Interfaces and Algorithms for Fighting Phishing Steve Sheng Doctoral Candidate,
Improving Cloaking Detection Using Search Query Popularity and Monetizability Kumar Chellapilla and David M Chickering Live Labs, Microsoft.
The Anatomy of a Large-Scale Hyper textual Web Search Engine S. Brin, L. Page Presenter :- Abhishek Taneja.
Leveraging Asset Reputation Systems to Detect and Prevent Fraud and Abuse at LinkedIn Jenelle Bray Staff Data Scientist Strata + Hadoop World New York,
BY : MUHAMMAD KHUZAIMI B. ISHAK 4 ADIL PUAN MAZITA INFORMATION AND COMMUNICATION OF TECHNOLOGY.
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
Phishing Dennis Schmidt, CISSP Director, Office of Information Systems HIPAA Security Officer UNC School of Medicine UNC School of Medicine.
C MU U sable P rivacy and S ecurity Laboratory Protecting People from Phishing: The Design and Evaluation of an Embedded Training.
Phishing & Pharming. 2 Oct to July 2005 APWG.
Usable Privacy and Security and Mobile Social Services Jason Hong
Wikispam, Wikispam, Wikispam PmWiki Patrick R. Michaud, Ph.D. March 4, 2005.
 SEO Terms A few additional terms Search site: This Web site lets you search through some kind of index or directory of Web sites, or perhaps both an.
Microsoft Office 2008 for Mac – Illustrated Unit D: Getting Started with Safari.
1 Phinding Phish : Evaluating Anti- Phishing Tools Yue Zhang,Jason Hong (2007) Carnegie Mellon University.
Web Spam Taxonomy Zoltán Gyöngyi, Hector Garcia-Molina Stanford Digital Library Technologies Project, 2004 presented by Lorenzo Marcon 1/25.
ISYM 540 Current Topics in Information System Management
CANTINA: A Content-Based Approach to Detecting Phishing Web Sites
Conveying Trust Serge Egelman.
CSCD 303 Essential Computer Security Fall 2017
A New Phishing Detection Approach
Presentation transcript:

CANTINA: A Content-Based Approach to Detecting Phishing Web Sites Yue Zhang University of Pittsburgh Jason I. Hong, Lorrie F. Cranor Carnegie Mellon University

Phishing Subject: eBay: Urgent Notification From Billing Department We regret to inform you that your eBay account could be suspended if you don’t update your account information.

Phishing is a Plague on the Internet Estimated 3.5 million people have fallen for phishing Estimated to cost $1-2.8 billion a year (and growing) 9255 unique phishing sites reported in June 2006 Easier (and safer) to phish than rob a bank

Strategies to Counter Phishing Make it invisible –Taking down phishing web pages –Filtering out phishing –Detecting phishing web pages (SpoofGuard, etc) Provide better user interfaces –Extended certificate verification –Anti-phishing toolbars (SpoofGuard, eBay, Netcraft, etc) Train the users –Embedded training (Kumaguru et al, CHI 2007) –Games (Sheng et al, SOUPS 2007)

Two Ways of Detecting Phishing Pages Human-verified Blacklists –No false positives, easy to implement, robust to new attacks –But tedious, slow to update, and not comprehensive –Only one toolbar found more than 60% phishing sites (Egelman et al, NDSS 2007) Heuristics –Fast to find new phishing sites (zero-day) –But false positives, may be fragile to new attacks –Not much work in this area –Our work contributes to the understanding of heuristics

Our Solution: CANTINA CANTINA uses a simple content-based approach –Examines content of a web page and creates a “fingerprint” –Sends that fingerprint as a query to a search engine –Sees if the web page in question is in the top search results If so, then we label it legitimate Otherwise, we label it phishing Nice properties: –Fast –Scales well –No maintenance by us (done by search engines) –Highly accurate

Talk Overview Problem Statement and Overview Using Robust Hyperlinks for Fingerprinting CANTINA Iteration #1 CANTINA Iteration #2 Conclusions

How Robust Hyperlinks Work Developed by Phelps and Wilensky to solve “404 not found” problem (D-Lib Magazine 2000) Add lexical signature to URLs –If link doesn’t work, then feed signature to search engine –Ex. How to generate useful signatures? –Term Frequency / Inverse Document Frequency (TF-IDF) –Their informal evaluation found using top five words as scored by TF-IDF was surprisingly effective

Adapting TF-IDF for Anti-Phishing Can same basic approach be used for anti-phishing? 1.Scammers often directly copy legitimate web pages or include keywords like name of legitimate organization Fake

Adapting TF-IDF for Anti-Phishing Can same basic approach be used for anti-phishing? 1.Scammers often directly copy legitimate web pages or include keywords like name of legitimate organization Real

Adapting TF-IDF for Anti-Phishing Can same basic approach be used for anti-phishing? 1.Scammers often directly copy legitimate web pages or include keywords like name of legitimate organization 2.With Google, phishing site should have low page rank APWG states that phishing sites alive 4.5 days Few sites link to phishing sites Hence, phishing sites unlikely to be in top search results Hypothesis: –CANTINA will be able to discriminate between legitimate and phishing sites quite well

How CANTINA Works (Iteration #1) Given a web page, calculate TF-IDF score for each word in that page Take five words with highest TF-IDF weights Feed these five words into a search engine (Google) If domain name of current web page is in top N search results, we consider it legitimate –N=30 worked well –No improvement by increasing N

Fake eBay, user, sign, help, forgot

Real eBay, user, sign, help, forgot

Evaluating Effectiveness of CANTINA In past work, built testbed to evaluate toolbars –Manual testing tedious and required too much pizza –See Egelman et al (NDSS 2007)

Evaluating CANTINA (Iteration #1) 100 phishing URLs from PhishTank.com –We used unverified URLs, manually verified them ourselves 100 legitimate URLs from another study on phishing –From 3Sharp, popular web sites, banks, etc Four conditions –Basic TF-IDF –Basic TF-IDF + domain name (ebay.com -> “ebay”) –Basic TF-IDF + ZMP (zero results means phishing) –Basic TF-IDF + domain name + ZMP

Evaluating CANTINA (Iteration #1) Good results False positives a little high Let’s call this Final TF-IDF

Talk Overview Problem Statement and Overview How Robust Hyperlinks Work CANTINA Iteration #1 CANTINA Iteration #2 Conclusions

How CANTINA Works (Iteration #2) Wanted to reduce false positives Added several heuristics from SpoofGuard and PILFER (see next talk) –Age of domain –Known images (logos) –Page is at suspicious URL or -) –Page contains suspicious links (see above) –IP Address in URL –Dots in URL (>= 5 dots) –Page contains text entry fields –TF-IDF

How CANTINA Works (Iteration #2) Used simple forward linear model to weight these –The more effective a heuristic, the larger the weight –Used 100 phishing URLs, 100 legitimate to find weights

Evaluating CANTINA (Iteration #2) Compared CANTINA to SpoofGuard and NetCraft –SpoofGuard uses all heuristics –NetCraft uses heuristics (?) and extensive blacklist 100 phishing URLs from PhishTank.com 100 legitimate URLs –35 sites often attacked (citibank, paypal) –35 top pages from Alexa (most popular sites) –30 random web pages from random.yahoo.com

Evaluating CANTINA (Iteration #2)

Discussion of Evaluation Good results again for CANTINA (iteration #2) –97% with 6% false positive, 89% with 1% false positive –1% false positive due to JavaScript phishing site CANTINA close to Netcraft (human-verified) Conducted another evaluation on URLs gathered from –Versus those from a phishing feed –CANTINA still pretty good, see paper for details

Discussion of CANTINA Overall Limitations –Does not work well for non-English web sites (TF-IDF) –System performance (querying Google each time) Early results from our latest work => low latency crucial CANTINA may be better for backend work than browser Attacks by criminals –Using images instead of words But has to look legitimate (no CAPTCHAs) –Invisible text –But phishing page still has to be in top search results Circumventing TF-IDF and PageRank (hard in practice?)

Conclusions CANTINA uses TF-IDF + search engines + heuristics to find phishing web sites –~97% true positives with 6% false positives –~89% true positives with 1% false positives Shifts problem of identifying phishing sites to a search engine problem Part of Carnegie Mellon’s effort to fight phishing –Better algorithms –Better user interfaces –Better training –See for more infohttp://cups.cs.cmu.edu

Acknowledgments NSF, ARO, CyLab Tom Phelps Related Conferences –SOUPS (July in Pittsburgh) –APWG e-Crime summit (Oct 4-5 in Pittsburgh)

Other Work by Our Research Group Algorithms –PILFER –CANTINA –Automated evaluation of toolbars (NDSS 2007) User Interfaces Training people not to fall for Phish –Embedded training system (CHI 2007) –Anti-phishing Phil game (SOUPS 2007)