CANTINA: A Content-Based Approach to Detecting Phishing Web Sites. WWW 2007. 2008.09.09. Yue Zhang, Jason Hong, and Lorrie Cranor.



CS710 | KAIST
Agenda
- Phishing Attacks
- Motivation & Goal
- Related Work
- CANTINA
- Evaluation
- Conclusion

Phishing Attacks (1/2)
- The act of stealing personal information via the Internet for the purpose of committing financial fraud
- Criminals create a fake site that mimics an original site, such as a bank's
- They lure users to it through various methods: spam, XSS vulnerabilities, malware, ...
- Technical issues
  - URL obfuscation: similar-looking domains, encoded URLs, ...
  - DNS hijacking: modifying the hosts file or DNS server settings, ...
  - Malware: BHOs (Browser Helper Objects), browser toolbars, keyloggers, ...
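
The URL obfuscation tricks above can be illustrated with Python's standard urllib.parse. This is a minimal sketch, not part of CANTINA: the helper actual_host and the domain evil.example are hypothetical, and the point is only that the host a browser contacts can differ from the brand name a user reads in the link.

```python
from urllib.parse import unquote, urlsplit


def actual_host(url: str) -> str:
    """Return the host name a URL really points to, with percent-encoding decoded."""
    # urlsplit treats anything before '@' in the netloc as user info,
    # so .hostname is the real destination, not the brand name the user sees.
    host = urlsplit(url).hostname or ""
    # Browsers percent-decode host names, so decode here as well.
    return unquote(host)


if __name__ == "__main__":
    # "User info" trick: reads like paypal.com, but the host is evil.example.
    print(actual_host("http://www.paypal.com@evil.example/login"))  # evil.example
    # Percent-encoded host name that decodes to evil.example.
    print(actual_host("http://%65%76%69%6c.example/login"))  # evil.example
```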

Phishing Attacks (2/2)
- Criminals often create phishing sites by copying and then modifying a legitimate site's web pages
  - The result looks similar to the original web site
  - It often contains brand names and other terms that are common on the legitimate page
  - e.g., the owner's brands

Motivation & Goal
- Phishing is a rapidly growing problem, with 9,255 unique phishing sites reported in 2006
- 84 anti-phishing toolbars, but with low accuracy
- There is a strong need for better automated detection algorithms
- Goal: a novel content-based approach for detecting phishing web sites
  - Achieve higher accuracy than existing approaches

Related Work (1/3)
- Anti-phishing research falls into four categories
  - Why people fall for phishing attacks: studies examining the reasons people fall for phishing
  - Educating people about phishing attacks: online training materials, testing, and situated learning
  - Anti-phishing user interfaces: developing better user interfaces for anti-phishing tools
  - Automated detection of phishing

Related Work (2/3)
- Anti-phishing user interfaces
  - Toolbar-based approaches
  - Browser extensions: Dynamic Security Skins, Web Wallet

Related Work (3/3)
- Automated detection of phishing
  - Heuristics that judge whether a page has phishing characteristics (host name, domain name, URLs, …)
  - Blacklists of reported phishing URLs
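
The two automated approaches can be contrasted in a few lines. This is a minimal sketch under stated assumptions: BLACKLIST, is_blacklisted, and looks_suspicious are hypothetical names, and the URL rules (IP-address host, '@' in the URL, five or more dots in the host) are illustrative examples, not the exact rules of any tool cited on the slide.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blacklist; real blacklists are fed by reports (e.g., PhishTank).
BLACKLIST = {"http://evil.example/login"}


def is_blacklisted(url: str, blacklist: set) -> bool:
    """Blacklist approach: flag only URLs that have already been reported."""
    return url in blacklist


def looks_suspicious(url: str, max_dots: int = 5) -> bool:
    """Heuristic approach: judge the URL itself (illustrative rules only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)  # raw IP address instead of a domain name
        return True
    except ValueError:
        pass
    # '@' hides the real host; very many dots suggest a deceptive host name.
    return "@" in parts.netloc or host.count(".") >= max_dots
```

A blacklist cannot flag a site that has not been reported yet, whereas heuristics can judge unseen URLs but may misfire on unusual legitimate ones.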

CS710 | KAIST CANTINA | Basic Concept  Criminals often create phishing sites by copying and then modifying a legitimate site’s web pages  Contain brand names and terms of legitimate pages  Robust Hyperlinks  To find a broken links  Add lexical signature to URLs If link doesn’t work, then feed signature to search engine Ex.  TF/IDF (Term frequency/Inverse document frequency)  Frequency based algorithm.  Basic algorithm for search engine comparing and classifying documents A term has a high TF-IDF weight by having a high term frequency in a given document 9

CANTINA | Basic Concept
1. Given a web page, calculate the TF-IDF weight of each term
2. Take the five terms with the highest TF-IDF weights (the lexical signature)
3. Search for these five terms (term1+term2+...) using Google
4. Compare the domain name of the current page with the domain names of the top search results
5. Phishing site: the domain name of the current page does not match the domain name of any of the top N search results (N = 30)
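
The steps above can be sketched as a small function. This is a minimal sketch, assuming a hypothetical search callable that returns ranked result URLs (no real Google API is called) and a hypothetical domain_of helper that crudely keeps the last two host labels; the five signature terms are passed in rather than computed here.

```python
from typing import Callable, List
from urllib.parse import urlsplit

# Hypothetical search interface: query string -> ranked list of result URLs.
SearchFn = Callable[[str], List[str]]


def domain_of(url: str) -> str:
    """Crude domain: last two labels of the host (a public-suffix list would be more accurate)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def basic_cantina(page_url: str, signature: List[str], search: SearchFn, n: int = 30) -> bool:
    """Return True if the page is judged to be phishing."""
    query = " ".join(signature)  # e.g. the five top TF-IDF terms
    top_results = search(query)[:n]  # keep only the top N results
    page_domain = domain_of(page_url)
    # Legitimate only if the page's own domain appears among the top N results.
    return not any(domain_of(r) == page_domain for r in top_results)


if __name__ == "__main__":
    def fake_search(query: str) -> List[str]:  # stands in for a real search engine
        return ["https://www.ebay.com/", "https://pages.ebay.com/help/"]

    sig = ["ebay", "user", "sign", "help", "forgot"]
    print(basic_cantina("https://signin.ebay.com/ws/", sig, fake_search))  # False
    print(basic_cantina("http://evil.example/ebay/", sig, fake_search))  # True
```

Because a copied page yields the same signature as the real one, the search results point to the real site's domain, which the phishing page's domain fails to match.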

CANTINA | Basic Concept
- Fake page, TF-IDF top 5: eBay, user, sign, help, forgot

CANTINA | Basic Concept
- Real page, TF-IDF top 5: eBay, user, sign, help, forgot


CANTINA | Additional Solutions
- Basic CANTINA produces a number of false positives
- Solutions
  - Add the current page's domain name to the lexical signature
  - ZMP (Zero results Means Phishing): if Google returns zero search results, label the page as phishing (e.g., meaningless domains such as “u-s-j.be”)
  - A larger set of heuristics based on related work, taken from existing approaches (e.g., SpoofGuard, PILFER): age of domain, known images, suspicious URL, …
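
The domain-name and ZMP additions can be layered onto the domain-matching test. This is a hedged sketch: search and domain_of are hypothetical stand-ins (a callable returning ranked result URLs and a crude last-two-labels domain extractor), and the ZMP rule is written as an explicit early return so it is visible, even though in this simplified sketch an empty result list would also fail the domain-match test.

```python
from typing import Callable, List
from urllib.parse import urlsplit

SearchFn = Callable[[str], List[str]]  # hypothetical: query -> ranked result URLs


def domain_of(url: str) -> str:
    """Crude domain: last two labels of the host (a public-suffix list would be more accurate)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def cantina_domain_zmp(page_url: str, signature: List[str], search: SearchFn, n: int = 30) -> bool:
    """Return True if the page is judged to be phishing."""
    page_domain = domain_of(page_url)
    # Addition 1: append the current page's domain name to the lexical signature.
    query = " ".join(signature + [page_domain])
    top_results = search(query)[:n]
    # Addition 2 (ZMP): zero results means phishing.
    if not top_results:
        return True
    return not any(domain_of(r) == page_domain for r in top_results)
```

For a meaningless domain such as u-s-j.be, adding the domain to the query tends to leave the search engine with nothing to return, which ZMP turns into a phishing verdict.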

Evaluation | Effectiveness #1 (1/2)
- Four conditions
  - Basic TF-IDF
  - Basic TF-IDF + domain name
  - Basic TF-IDF + ZMP
  - Basic TF-IDF + domain name + ZMP
- 100 phishing URLs and 100 legitimate URLs
  - Phishing URLs: PhishTank.com
  - Legitimate URLs: from a previous study

Evaluation | Effectiveness #1 (2/2)
- Basic TF-IDF + ZMP + domain name is adopted as Final-TF-IDF
- Its false positive rate is still a little high

Evaluation | Effectiveness #2 (1/2)
- Goal: reduce false positives
- Method: combine Final-TF-IDF with several heuristics

Evaluation | Effectiveness #2 (2/2)
- Determining the best weights for these heuristics is a typical classification problem
- Use a simple forward linear model
- Used 100 phishing URLs and 100 legitimate URLs to find the weights
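
One way to read "simple forward linear model" is a weighted vote over heuristic outputs followed by a threshold. This is a minimal sketch under that assumption: each heuristic (named after the ones on the slides) outputs +1 for phishing-like or -1 for legitimate-like, and the WEIGHTS values are hypothetical placeholders, not the weights learned from the 100 + 100 URLs.

```python
from typing import Dict

# Hypothetical weights for illustration only; the slides do not list the learned values.
WEIGHTS = {"final_tf_idf": 0.5, "age_of_domain": 0.2, "known_images": 0.1, "suspicious_url": 0.2}


def classify(outputs: Dict[str, int], weights: Dict[str, float], threshold: float = 0.0) -> int:
    """Weighted linear vote over heuristics that each output +1 or -1.

    Returns +1 (phishing) if the weighted sum exceeds the threshold, else -1 (legitimate).
    """
    score = sum(weights[name] * value for name, value in outputs.items())
    return 1 if score > threshold else -1


if __name__ == "__main__":
    page = {"final_tf_idf": 1, "age_of_domain": 1, "known_images": -1, "suspicious_url": -1}
    print(classify(page, WEIGHTS))  # 1 (score is about 0.4)
```

Learning the weights then amounts to choosing WEIGHTS (and possibly the threshold) so that the labeled phishing and legitimate URLs are separated as well as possible.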

Evaluation | Effectiveness #3 (1/2)
- Evaluate the effectiveness of Final-TF-IDF, Final-TF-IDF+heuristics, SpoofGuard, and Netcraft
  - SpoofGuard: the highest true positive rate; relies entirely on heuristics
  - Netcraft: one of the best toolbars overall; uses a combination of heuristics and an extensive blacklist
- 100 phishing URLs from PhishTank.com
- 100 legitimate URLs
  - 35 sites that are often attacked (e.g., Citibank, PayPal)
  - 35 top pages from Alexa (most popular sites)
  - 30 random web pages from random.yahoo.com

Evaluation | Effectiveness #3 (2/2)
- Combining Final-TF-IDF with simple heuristics reduced false positives from 6% to 1%
- However, the true positive rate also decreased, from 97% to 89%
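
With 100 phishing and 100 legitimate URLs, each percentage point corresponds to exactly one URL, so 6% to 1% false positives means going from 6 to 1 legitimate pages wrongly flagged. A small sketch of the two rates (detection_rates is a hypothetical helper name):

```python
from typing import List, Tuple


def detection_rates(flags_on_phish: List[bool], flags_on_legit: List[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate).

    flags_on_phish: detector output ("flagged as phishing?") for each known phishing URL
    flags_on_legit: detector output for each known legitimate URL
    """
    tp_rate = sum(flags_on_phish) / len(flags_on_phish)  # share of phishing URLs caught
    fp_rate = sum(flags_on_legit) / len(flags_on_legit)  # share of legitimate URLs wrongly flagged
    return tp_rate, fp_rate


if __name__ == "__main__":
    # 89 of 100 phishing URLs caught, 1 of 100 legitimate URLs wrongly flagged.
    print(detection_rates([True] * 89 + [False] * 11, [True] + [False] * 99))  # (0.89, 0.01)
```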

Discussion
- Limitations
  - Does not apply to non-English web sites
  - System performance depends on the performance of the Google search engine
- Attacks by criminals
  - Use images instead of words
  - Add invisible text
  - Circumvent TF-IDF and PageRank using “Google bombs”
  - Attempt a DoS attack on Google

Conclusion
- CANTINA combines TF-IDF, search engines, and heuristics to find phishing web sites
  - 97% true positives with 6% false positives
  - 89% true positives with 1% false positives
- Shifts the problem of identifying phishing sites to a search engine problem

Q&A