Lexical Feature Based Phishing URL Detection Using Online Learning
Reporter: Jing Chiu
Advisor: Yuh-Jye Lee
Data Mining and Machine Learning Lab, 2011/3/17

Paper Information
- Authors:
  - Aaron Blum (University of Alabama, Birmingham)
  - Brad Wardman (University of Alabama, Birmingham)
  - Thamar Solorio (University of Alabama, Birmingham)
- Source: 3rd ACM Workshop on Artificial Intelligence and Security (AISec)

Outline
- Introduction
- Related Work
- Approach
- Data
- Evaluation
- Conclusion

Introduction
- Phishing
  - A cybercrime carried out through spammed e-mails and fraudulent websites
  - Entices victims to provide sensitive information
  - The information is used to steal identities or gain access to money
- Characteristics
  - Highly dynamic environment
  - Models need to be updated frequently
- New ideas
  - Combine online learning with a content-inspection-based approach
  - Train the model on largely lexical features only (no host-based features)
  - Show that URL-inspection-based detection performs as well as content-inspection-based detection

Related Work
- Content-based phishing URL detection
  - Uses the similarity between websites' content files to detect phishing websites
- Purely URL-based malicious URL detection
  - Uses host information and URL lexical features with online learning algorithms
- PhishNet
  - Extends the usability of blacklists
- Domain blacklisting
  - Expands a blacklist using DNS zone file data and WHOIS information

Approach
- Feature extraction
  - Delimiters: "/", "?", ".", "=" and "_"
  - Bigram combination
  - Lexical feature groups (see the "Lexical Feature Group" backup slide)
- Learning algorithm
  - Confidence-Weighted (CW) algorithm
  - Updates each feature's weight by a different amount depending on how often the feature has occurred (rarely seen features change more)
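
The feature-extraction step can be sketched as a tokenizer that splits a URL on the listed delimiters and adds bigrams of adjacent tokens. This is a minimal sketch, not the paper's code: it assumes the scheme is dropped and the URL is lowercased, and the function name `lexical_features` and the `tok:`/`bi:` feature prefixes are hypothetical. The paper's lexical feature groups are not reproduced.

```python
import re

# Delimiters listed on the slide: "/", "?", ".", "=", "_"
DELIMITERS = re.compile(r"[/?.=_]")


def lexical_features(url: str) -> set[str]:
    """Hypothetical lexical feature extractor: delimiter tokens plus
    bigrams of adjacent tokens, returned as a set of binary features."""
    # Assumption: drop the scheme so "http:" does not become a token.
    url = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    tokens = [t for t in DELIMITERS.split(url.lower()) if t]
    features = {f"tok:{t}" for t in tokens}
    # Bigram combination: pair each token with its right-hand neighbor.
    features |= {f"bi:{a}|{b}" for a, b in zip(tokens, tokens[1:])}
    return features


print(sorted(lexical_features("http://paypal.com.evil.example/login.php?user=a_b")))
```

Returning a set makes every feature binary (present or absent), which is the form a linear online learner such as CW consumes directly.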

Approach (cont.)
- MD5 matching
  - Compares files by their MD5 checksums to detect identical content
  - Easy to evade by varying the content (see backup slides for examples)
- Deep MD5 matching
  - Downloads all content files associated with a website
  - Measures the similarity between two websites' sets of content files with the Kulczynski 2 coefficient
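
The Kulczynski 2 coefficient averages, over the two websites, the fraction of each site's content files that also appear in the other: K2(A, B) = 0.5 * (|A ∩ B| / |A| + |A ∩ B| / |B|). A minimal sketch, assuming each site is given as a mapping from file name to downloaded bytes and files are matched by MD5 digest; `md5_set` and `kulczynski2` are hypothetical names, and the similarity threshold used in the paper is not given on the slide.

```python
import hashlib


def md5_set(files: dict[str, bytes]) -> set[str]:
    """MD5 checksum of every downloaded content file of one website."""
    return {hashlib.md5(data).hexdigest() for data in files.values()}


def kulczynski2(a: set[str], b: set[str]) -> float:
    """Kulczynski 2 coefficient of two sets of file checksums."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))


# Toy sites: only the main page differs, shared images/stylesheets are identical.
site_a = {"index.html": b"<html>A</html>", "logo.png": b"PNG1", "style.css": b"body{}"}
site_b = {"index.html": b"<html>B</html>", "logo.png": b"PNG1",
          "style.css": b"body{}", "extra.js": b"x"}
print(kulczynski2(md5_set(site_a), md5_set(site_b)))
```

Here plain MD5 matching of `index.html` finds no match because the page bytes differ, while deep matching still scores 0.5 * (2/3 + 2/4) = 7/12, about 0.58, because two of the content files are shared.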

Data
- Data sources
  - UAB Phishing Data Mine
    - Collected over two and a half years
    - Benign URLs may look "phishy" (see the "Data in UAB Phishing Data Mine" backup slide)
    - 9,506 unique domains
    - 25,203 URLs (6,114 malicious)
  - Cyveillance
    - 18,990 unique domains
    - 34,234 URLs (all malicious)
  - All feeds are fully de-duplicated
- Datasets
  - UAB feeds
  - Cyveillance full
  - Cyveillance abridged
  - Mixed
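
The de-duplication and the URL/domain counts above can be sketched as follows. This assumes de-duplication by exact URL string (after trimming whitespace) and counts a "domain" as the URL's hostname, with no registered-domain or public-suffix handling; the slides do not state the paper's exact rules, and `dedupe` and `domain_stats` are hypothetical names.

```python
from urllib.parse import urlsplit


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = url.strip()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def domain_stats(urls: list[str]) -> tuple[int, int]:
    """Return (unique URLs, unique hostnames) for a feed."""
    unique = dedupe(urls)
    hosts = {urlsplit(u).hostname for u in unique}  # hostname is lowercased
    hosts.discard(None)
    return len(unique), len(hosts)


feed = ["http://a.example/login", "http://a.example/login",
        "http://A.example/verify", "http://b.example/", " http://b.example/ "]
print(domain_stats(feed))
```

Applied to the slide's own counts, the ratios work out to about 2.65 URLs per domain for the UAB feed (25,203 / 9,506) and about 1.80 for Cyveillance (34,234 / 18,990), with roughly 24% of UAB URLs malicious (6,114 / 25,203).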

Data (cont.)
[Figure: percentage of total URLs vs. individual domains]

Evaluation
- Experimental setting
  - Training and testing were conducted on daily batches
  - Initial training used the UAB data
  - The model is then updated with a daily URL blacklist/whitelist feed
  - False positive and false negative error rates were computed after every prediction
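
The daily protocol can be sketched as a predict-then-update loop around a confidence-weighted learner. This is a minimal sketch under several assumptions not stated on the slides: binary token features (the Approach delimiters only, no bigrams), the diagonal "variance" CW update from Dredze, Crammer and Pereira (2008) with eta = 0.9, labels +1 for phishing and -1 for benign, and each day's batch being tested before the model learns from it. `DiagonalCW`, `evaluate_daily` and the toy URLs are hypothetical; the paper's CW hyperparameters are not given.

```python
import math
import re
from statistics import NormalDist


def tokens(url: str) -> set[str]:
    """Binary lexical features: lowercase URL tokens split on / ? . = _"""
    return {t for t in re.split(r"[/?.=_]", url.lower()) if t}


class DiagonalCW:
    """Confidence-weighted linear classifier, diagonal covariance,
    'variance' closed-form update (Dredze, Crammer & Pereira, 2008)."""

    def __init__(self, eta: float = 0.9, init_var: float = 1.0):
        self.phi = NormalDist().inv_cdf(eta)  # phi = Phi^{-1}(eta); needs eta > 0.5
        self.init_var = init_var
        self.mu: dict[str, float] = {}     # mean weight per feature
        self.sigma: dict[str, float] = {}  # weight variance per feature

    def score(self, feats: set[str]) -> float:
        """Mean margin mu . x for 0/1 features (unseen features weigh 0)."""
        return sum(self.mu.get(f, 0.0) for f in feats)

    def update(self, feats: set[str], y: int) -> None:
        """One CW step; y is +1 (phishing) or -1 (benign)."""
        feats_list = list(feats)
        var = [self.sigma.get(f, self.init_var) for f in feats_list]
        v = sum(var)                      # V = x^T Sigma x for 0/1 features
        if v == 0.0:
            return                        # no features, nothing to learn
        m = y * self.score(feats)         # signed mean margin M
        phi = self.phi
        b = 1.0 + 2.0 * phi * m
        gamma = (-b + math.sqrt(b * b - 8.0 * phi * (m - phi * v))) / (4.0 * phi * v)
        alpha = max(0.0, gamma)
        if alpha == 0.0:
            return                        # confidence constraint already met
        for f, s in zip(feats_list, var):
            # mu += alpha * y * Sigma x
            self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s
            # Sigma^{-1} += 2 * alpha * phi * diag(x**2): seen features grow more confident
            self.sigma[f] = 1.0 / (1.0 / s + 2.0 * alpha * phi)


def evaluate_daily(model: DiagonalCW,
                   days: list[list[tuple[str, int]]]) -> list[tuple[float, float]]:
    """Predict each day's URLs with the current model, then learn from that
    day's labels. Returns cumulative (FP rate, FN rate) after each day."""
    fp = fn = neg = pos = 0
    history = []
    for batch in days:
        for url, y in batch:              # test first ...
            pred = 1 if model.score(tokens(url)) > 0 else -1
            if y == 1:
                pos += 1
                fn += int(pred == -1)
            else:
                neg += 1
                fp += int(pred == 1)
        for url, y in batch:              # ... then update with the daily feed
            model.update(tokens(url), y)
        history.append((fp / max(neg, 1), fn / max(pos, 1)))
    return history


# Usage with toy, made-up URLs (not from the paper's datasets)
model = DiagonalCW()
warm_start = [("paypal.com.login.verify.example/secure", 1),
              ("news.example.org/sports", -1)] * 5
for url, y in warm_start:                 # initial training (UAB data in the slides)
    model.update(tokens(url), y)
print(evaluate_daily(model, [warm_start[:2]]))
```

The slide computes error rates after every prediction; moving the `history.append` into the inner test loop gives that finer-grained curve. CW is a natural fit here because features seen rarely keep a high variance, so a newly appearing phishing token can shift the model quickly without disturbing well-established weights.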

Evaluation (cont.)

Conclusion
- Learning on lexical features with the CW algorithm provides robust performance
- High-quality, diverse training data can achieve accuracy higher than 97%
- For the proposed system
  - Training data can be collected from any blacklist
  - Easy to implement, with robust performance

Thanks for your attention
Q&A?

Lexical Feature Group

URLs including the recipient's e-mail

Data in UAB Phishing Data Mine