Off the Hook: Real-Time Client-Side Phishing Prevention System. July 28th, 2016, University of Helsinki. Samuel Marchal*, Giovanni Armano*, Kalle Saari*, Nidhi Singh†, N. Asokan*

Presentation transcript:

Off the Hook: Real-Time Client-Side Phishing Prevention System
July 28th, 2016, University of Helsinki
Samuel Marchal*, Giovanni Armano*, Kalle Saari*, Nidhi Singh†, N. Asokan*
*Aalto University - †Intel Security

2 Outline
Phishing detection system
–minimal training data, language independence, scalability, resilience to adaptive attacks
–highly accurate & fast (comparable to state-of-the-art)
–locally computable
Target identification mechanism
–language-independent, fast
–highly accurate (comparable to state-of-the-art)
Browser add-on
–client-side computation, redirection to target


4 Phishing Website

5 Data Sources
Starting URL
Landing URL
Redirection chain
Logged links
HTML source code:
–Text
–Title
–HREF links
–Copyright
…

6 Phisher’s Control & Constraints
Phishers have different levels of control over a webpage and build it under some constraints:
Control: Externally loaded content (logged links) and external HREF links are not controlled by the page owner.
Constraints: The registered domain name part of a URL cannot be freely defined: it is constrained by registration (DNS) policies.

7 Conjectures
By modeling control/constraints in a feature set, we can improve the identification of phishing webpages
–The resulting system will generalize well, be language-independent and be difficult to circumvent.
By analyzing the terms used in controlled and constrained sources, we can identify the target of a phish

8 URL Structure
protocol://[subdomains.]mld.ps[/path][?query]
FQDN = [subdomains.]mld.ps
RDN = mld.ps
FreeURL = subdomains, path and query
Example:
Protocol = https
FQDN =
RDN = amazon.co.uk
mld = amazon
FreeURL = {www, /ap/signin?_encoding=UTF8}
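The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: `split_url` and `PUBLIC_SUFFIXES` are hypothetical names, `ps` is read here as the public suffix (an assumption), the tiny suffix set stands in for a real public suffix list, and the example URL is reassembled from the components listed on the slide.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a real public suffix list (assumption for this sketch).
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def split_url(url):
    """Split a URL into protocol, FQDN, RDN, mld and FreeURL parts."""
    parts = urlsplit(url)
    fqdn = parts.hostname or ""
    labels = fqdn.split(".") if fqdn else []

    # Look for the longest known public suffix at the end of the FQDN;
    # fall back to treating the last label as the suffix.
    suffix_len = 1
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[-n:]) in PUBLIC_SUFFIXES:
            suffix_len = n
            break

    if len(labels) > suffix_len:
        mld = labels[-(suffix_len + 1)]
        rdn = mld + "." + ".".join(labels[-suffix_len:])
        subdomains = ".".join(labels[:-(suffix_len + 1)])
    else:
        # Single-label or empty host: nothing to separate.
        mld, rdn, subdomains = fqdn, fqdn, ""

    path_query = parts.path + ("?" + parts.query if parts.query else "")
    # FreeURL collects the parts the page owner can choose freely.
    free_url = [p for p in (subdomains, path_query) if p]
    return {
        "protocol": parts.scheme,
        "fqdn": fqdn,
        "rdn": rdn,
        "mld": mld,
        "free_url": free_url,
    }


result = split_url("https://www.amazon.co.uk/ap/signin?_encoding=UTF8")
print(result)
```

A production version would need a complete public suffix list, since the RDN boundary cannot be derived from the hostname alone.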

9 Data Sources: Control & Constraints
Control / Constraint separation:
–RDNs are constrained in composition
–FreeURL, text, title, etc. are not constrained
–RDNs in the redirection chain are controlled (internal) by the page owner
–Other RDNs (HREFs and logged links) are not controlled (external)
Data sources separation:
              Unconstrained                             Constrained
Controlled    Text, Title, Copyright, Internal FreeURL  Internal RDNs
Uncontrolled  External FreeURL                          External RDNs
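A minimal sketch of the internal/external split described on this slide, assuming the RDNs have already been extracted from the redirection chain, the HREF links and the logged links. The function name `separate_rdns` and the example domains are hypothetical.

```python
def separate_rdns(chain_rdns, href_rdns, logged_rdns):
    """Split RDNs into internal (controlled) and external (uncontrolled).

    RDNs seen in the redirection chain are treated as controlled by the
    page owner; every other RDN found in HREF links or logged links is
    treated as external.
    """
    internal = set(chain_rdns)
    external = (set(href_rdns) | set(logged_rdns)) - internal
    return sorted(internal), sorted(external)


internal, external = separate_rdns(
    chain_rdns=["example.net", "example.org"],      # starting -> landing
    href_rdns=["example.org", "example.com"],       # links in the HTML
    logged_rdns=["example.com", "example.edu"],     # externally loaded content
)
print(internal)
print(external)
```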

10 Phishing Classification System
Feature extraction (212) from data sources:
–URL features (106)
–Term usage consistency (66)
–Usage of starting and landing mld (22)
–RDN usage (13)
–Webpage content (5)
Gradient Boosting classification:
–Feature selection and weighting
–Robustness to over-fitting (generalizability)
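To make the "term usage consistency" feature group more concrete, here is one way such a feature could be computed for a pair of data sources. The slide does not name the measure; the Hellinger distance between term distributions is used here purely as an assumption, and `term_distribution` and `hellinger` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def term_distribution(text):
    """Relative frequency of each term (runs of 3 or more letters, any script)."""
    terms = re.findall(r"[^\W\d_]{3,}", text.lower())
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    0.0 means identical term usage; 1.0 means no shared terms.
    """
    terms = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2 for t in terms
    )
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical page: compare the title with the visible text.
title = term_distribution("Example Bank secure login")
text = term_distribution("Log in to your Example Bank account")
print(hellinger(title, text))
```

Computing one such value per pair of data sources gives a fixed-length block of features regardless of the page's language, which fits the language-independence goal stated in the outline.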

11 Classification Performance (language independence)
Classifier training:
–4,531 English legitimate webpages (Intel Security)
–1,036 phishing webpages (PhishTank)
Assessment:
–Legitimate webpages (Intel Security):
  100,000 English
  10,000 each in French, German, Italian, Portuguese and Spanish
–1,216 phishing webpages (PhishTank)

12 Classification Performance (language independence)
Figures: ROC curve; precision vs. recall
100,000 English legitimate / 1,216 phishes (≈ real-world distribution)
Table columns: Precision, Recall, FP rate, AUC, Accuracy
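For reference, the threshold-based metrics named on this slide follow directly from confusion-matrix counts. The sketch below uses made-up counts, not the results of the evaluation; AUC is left out because it needs the classifier's scores rather than counts. `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, FP rate and accuracy from confusion-matrix counts.

    tp: phishing pages flagged as phishing
    fp: legitimate pages flagged as phishing
    tn: legitimate pages passed as legitimate
    fn: phishing pages passed as legitimate
    """
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fp_rate": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Placeholder counts chosen only to exercise the formulas.
print(classification_metrics(tp=90, fp=10, tn=890, fn=10))
```

With a heavily imbalanced test set like the one on this slide, the FP rate and precision are more informative than accuracy alone.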


14 Target Identification
Target identification: identify a set of terms representing the impersonated service and brand (keyterms)
Assumption: keyterms appear in several data sources
Query a search engine with the top keyterms to identify:
–whether the website is legitimate (it appears in the top search results)
–the potential targets of the phishing website
Intersect the sets of terms extracted from different visible data sources (title, text, starting/landing URL, copyright, HREF links)
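The keyterm idea can be sketched as follows, assuming a simplified reading in which a term counts as a keyterm candidate when it appears in at least two data sources. The function names, the `min_sources` threshold and the example page are hypothetical, and the search-engine query step is not shown.

```python
import re
from collections import Counter


def extract_terms(text):
    """Set of distinct terms (runs of 3 or more letters, any script)."""
    return set(re.findall(r"[^\W\d_]{3,}", text.lower()))


def keyterms(sources, min_sources=2, top=5):
    """Rank terms by the number of data sources they appear in.

    sources: mapping of data-source name -> raw text of that source.
    Only terms shared by at least `min_sources` sources are kept.
    """
    counts = Counter()
    for text in sources.values():
        counts.update(extract_terms(text))
    shared = [t for t, c in counts.items() if c >= min_sources]
    # Most widely shared terms first; ties broken alphabetically.
    shared.sort(key=lambda t: (-counts[t], t))
    return shared[:top]


# Hypothetical phishing page impersonating "Example Bank".
page = {
    "title": "Example Bank Login",
    "text": "Sign in to your Example Bank account",
    "landing_url": "login.example-bank.example/secure",
    "copyright": "2016 Example Bank",
}
print(keyterms(page))  # ['bank', 'example', 'login']
```

The top-ranked terms would then form the search query whose results are compared against the page's own domain.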

15 Target Identification Performance
600 phishing webpages with identified target:
–(unverified phishes listed by PhishTank; identification done manually)
Targets  Identified  Unknown  Missed  Success rate
Top %
Top %
Top %
Complementarity with phishing detection:
–53 mislabeled legitimate webpages ( FP rate)
–39 identified as legitimate in target identification
Reduction of FP rate to (0.01%)


17 Add-on Implementation
Client-side implementation
–Privacy friendly
–Resilient to adaptive attacks
Multi-browser
–Chrome, Firefox, Safari (in progress)
Cross-platform
–Windows (>= 8), Mac OS X (>= 10.8), Ubuntu (>= 12.04)
Phishing warning
–Redirection to target
–Suspicious webpage displayed (user education)

18 Phishing warning

19 Performance
Memory usage
–256 MB
Impact on Web surfing
–Phishing webpages:
  Interaction blocked in < 0.5 seconds
  Warning displayed (and target identified) in < 2 seconds
–Legitimate webpages:
  None (except for false positives)

20 Summary
Phishing website detection system:
–Language independent / resilient to adaptive attacks
–Fast (< 0.5 second per webpage)
–> 99.9% accuracy with < 0.05% false positives
Target identification system:
–Fast (< 2 seconds per webpage)
–Success rate > 90% for 1 target / 97.3% for a set of targets
Phishing detection add-on:
–Guidance towards likely target
–Privacy friendly (client-side-only implementation)

21 Questions?