Slide 1 (of 32)
Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology, 9/19/2015

Slide 2 (of 32)
Large-Scale Automatic Classification of Phishing Pages
Colin Whittaker, Brian Ryner, Marria Nazif. NDSS '10.

Slide 3 (of 32)
• Introduction
• Phishing Classifier Infrastructure
• Evaluation
• Conclusion

Slide 4 (of 32)
• Phishing is a form of identity theft that uses social engineering techniques and sophisticated attack vectors to harvest financial information from unsuspecting consumers.
• Often a phisher tries to lure her victim into clicking a URL that points to a rogue page.

Slide 5 (of 32)
Overall System Design
• Our system classifies web pages submitted by end users and collected from Gmail's spam filters.
• It extracts features describing the composition of:
  ▪ the web page's URL
  ▪ the hosting of the page
  ▪ the page's HTML content as collected by a crawler

Slide 6 (of 32)
Classification Workflow
• The first process extracts features from the URL of the page.
• The second process obtains domain information about the page and crawls it.
• The final process uses the collected features to assign the page a score that represents the probability that the page is phishing.
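
The three-process workflow can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Candidate` and `run_workflow` and the three stage callables are hypothetical. The content stage is a stub that does no network access. A URL stage that returns `None` stands for a URL that is dropped early, such as a whitelisted one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Features = Dict[str, float]


@dataclass
class Candidate:
    """A candidate URL moving through the (hypothetical) workflow."""
    url: str
    features: Features = field(default_factory=dict)
    score: Optional[float] = None  # stays None if the URL is dropped


def run_workflow(
    candidate: Candidate,
    url_stage: Callable[[str], Optional[Features]],
    content_stage: Callable[[str], Features],
    scoring_stage: Callable[[Features], float],
) -> Candidate:
    # Process 1: features computed from the URL alone; None means "drop".
    url_features = url_stage(candidate.url)
    if url_features is None:
        return candidate
    candidate.features.update(url_features)
    # Process 2: domain/hosting information and crawled page content.
    candidate.features.update(content_stage(candidate.url))
    # Process 3: a score read as the probability that the page is phishing.
    candidate.score = scoring_stage(candidate.features)
    return candidate


if __name__ == "__main__":
    result = run_workflow(
        Candidate("http://example.test/login"),
        url_stage=lambda u: {"path_has_login": 1.0 if "login" in u else 0.0},
        content_stage=lambda u: {"has_password_field": 1.0},  # stub, no crawl
        scoring_stage=lambda f: 0.9 if f["path_has_login"] else 0.1,
    )
    print(result.score)  # 0.9
```

The stages are passed in as callables, which keeps their order explicit and lets each one be swapped out. The described system runs these steps as separate processes rather than as function calls.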

Slide 7 (of 32)
Candidate URL Collection
• We receive new potential phishing URLs in reports:
  ▪ from users of our blacklist
  ▪ from spam messages collected by Gmail

 URL Feature Extraction  The first process in the workflow, the URL Feature Extractor, looks only at the URL of the page to determine features.  If it matches a whitelist of high profile, safe sites, then the URL Feature Extractor drops the URL from the workflow entirely.  We manually compile this whitelist of 2778 sites 9/19/2015Slide 8 (of 32)

 URL Feature Extraction  One feature this process extracts is whether the URL contains an IP address for its hostname. 9/19/2015Slide 9 (of 32)

Slide 10 (of 32)
URL Feature Extraction
• Another feature this process extracts is whether the page's hostname has many host components.
• Phishers commonly use a long hostname, prepending an authentic-sounding host to their fixed domain name, to confuse viewers into believing that the page is legitimate.
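
Below is a sketch of the host-component feature. Counting dot-separated labels is one assumed reading of "host components". The cut-off `MANY_COMPONENTS = 5` is a hypothetical value, because the slides do not say how many components count as "many". The example hostname follows the pattern described: an authentic-sounding prefix (`www.paypal.com`) prepended to a domain the phisher controls (`example.net`).

```python
from urllib.parse import urlsplit

# Hypothetical cut-off; the slides do not say how many components count as "many".
MANY_COMPONENTS = 5


def host_component_count(url: str) -> int:
    """Number of dot-separated labels in the URL's hostname."""
    host = urlsplit(url).hostname or ""
    return len([label for label in host.split(".") if label])


def has_many_host_components(url: str, threshold: int = MANY_COMPONENTS) -> bool:
    return host_component_count(url) >= threshold


url = "http://www.paypal.com.secure-login.example.net/"
print(host_component_count(url), has_many_host_components(url))  # 6 True
```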

 URL Feature Extraction  Phishers often include characteristic strings in their URLs to mislead viewers.  These can include the trademarks of the phishing target, like “abbeynational” in the example above, or more general phrases associated with phishing targets, like “login”.  The feature extractor transforms each of these tokens into a boolean feature, such as “The path contains the token ‘login.’” 9/19/2015Slide 11 (of 32)

Slide 12 (of 32)
Fetching Page Content
• The URL Feature Extractor also collects URL metadata, including PageRank, from Google's proprietary infrastructure.
• We also use a domain reputation score computed by the Gmail anti-spam system as a feature.
  ▪ This score is roughly the percentage of emails from a domain that are not spam.
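
The arithmetic behind the reputation score can be sketched as follows. The function name and the message counts are hypothetical. The real score is computed inside Gmail's anti-spam system, and the slide describes it only as "roughly" this percentage.

```python
from typing import Optional


def domain_reputation(non_spam: int, total: int) -> Optional[float]:
    """Percentage of a domain's messages that were not spam.

    Only the arithmetic from the slide; the real score comes from Gmail's
    anti-spam system and is described as "roughly" this ratio.
    """
    if total <= 0:
        return None  # no mail seen from the domain, so no reputation
    if not 0 <= non_spam <= total:
        raise ValueError("non_spam must be between 0 and total")
    return 100.0 * non_spam / total


print(domain_reputation(950, 1000))  # 95.0
```

Returning `None` when no mail has been seen keeps "unknown domain" distinct from "domain that sends only spam" (0.0).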

 Hosting and Page Feature Extraction  The Content Fetcher process crawls the page and gathers its hosting information. ▪ It records the returned IPs, name servers, and name server IPs. ▪ It also geo locates these IPs, recording the city, region, and country 9/19/2015Slide 13 (of 32)

Machine Learning and Bioinformatics Laboratory  Hosting and Page Feature Extraction  The Content Fetcher sends the URL to a pool of headless web browsers to render the page content.  After the browser renders the page, the Content Fetcher receives and records the page HTML, as well as all iframe, image, and javascript content embedded in the page 9/19/2015Slide 14 (of 32)

Machine Learning and Bioinformatics Laboratory  Page Classification  To compute the score for the page in log odds, the classifier combines these values using a logistic regression  The score translates to the computed probability that the page is phishing 9/19/2015Slide 15 (of 32)

 Page Classification  Before the classifier automatically blacklists the page, it checks to make sure that the page does not have a high PageRank 9/19/2015Slide 16 (of 32)

Slide 17 (of 32)
Evaluation Dataset
• First dataset:
  ▪ contains data collected between April 16, 2009 and July 14, 2009, with labels from July 15, 2009
  ▪ used to examine our selected features and train our evaluation models
• Second dataset:
  ▪ collected during the first two weeks of August 2009 and used as a validation dataset
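
A time-based split of this kind can be sketched as shown below. The record format `(collected_on, example)` and the function name are hypothetical. Reading "first two weeks of August 2009" as August 1 to 14 is an assumption. The July 15 labeling date is not modeled.

```python
from datetime import date

# Collection windows from the slide; "first two weeks of August 2009" is read
# here as August 1-14, which is an assumption.
TRAIN_WINDOW = (date(2009, 4, 16), date(2009, 7, 14))
VALID_WINDOW = (date(2009, 8, 1), date(2009, 8, 14))


def split_by_collection_date(records):
    """Split (collected_on, example) pairs into training and validation lists."""
    train, valid = [], []
    for collected_on, example in records:
        if TRAIN_WINDOW[0] <= collected_on <= TRAIN_WINDOW[1]:
            train.append(example)
        elif VALID_WINDOW[0] <= collected_on <= VALID_WINDOW[1]:
            valid.append(example)
        # Anything outside both windows is left out of the evaluation.
    return train, valid


records = [(date(2009, 5, 2), "a"), (date(2009, 7, 20), "b"), (date(2009, 8, 10), "c")]
print(split_by_collection_date(records))  # (['a'], ['c'])
```

Splitting by collection date instead of sampling at random keeps the validation pages strictly later than the training pages. This mirrors how the classifier would face new phishing pages in deployment.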

Slides 18-30 (of 32): no transcribed text.

Slide 31 (of 32)
• We describe our large-scale system for automatically classifying phishing pages, which maintains a false positive rate below 0.1%.
• Our classification system examines millions of potential phishing pages daily, in a fraction of the time a manual review process would take.
