Reporter: Jing Chiu
Advisor: Yuh-Jye Lee
Data Mining and Machine Learning Lab., 2011/3/17



- Authors:
  - Anh Le, Athina Markopoulou (University of California, Irvine)
  - Michalis Faloutsos (University of California, Riverside)
- Source:
  - To appear in IEEE INFOCOM 2011 Mini Conference, Shanghai, China, April 10-15, 2011 (poster, tech report)

- Introduction
- Dataset and Feature Extraction
- Classification Algorithms
- Evaluation Results
- System Deployment
- Conclusion

- "How well can one detect phishing URLs using only lexical features compared to using full features?"
- PhishDef properties:
  - High accuracy: 96%-97%
  - Lightweight: low latency, imposes only a modest overhead
  - Proactive approach, as opposed to reactively relying on blacklists
  - Resilience to noise: accuracy falls from 95% to 86% as noise rises from 5% to 45%

- Dataset
  - Malicious URLs: PhishTank, MalwarePatrol
  - Legitimate URLs: Yahoo Directory, Open Directory (DMOZ)
- External Feature Collection
  - WHOIS
  - Team Cymru

- Feature Extraction
  - Automatically selected features
    - Delimiters: '/', '?', '.', '=', '_', '&' and '-'
    - Four parts: domain name, directory, file name, argument
  - Obfuscation-resistant lexical features
    - Four different URL obfuscation techniques
    - Five categories of hand-selected lexical features
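The automatic feature extraction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each URL is split into the four parts named on the slide, each part is tokenized on the listed delimiters, and every (part, token) pair becomes one binary feature. The function names `split_url` and `bag_of_words` and the example URL are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Delimiters listed on the slide: / ? . = _ & -
DELIMITERS = re.compile(r"[/?.=_&\-]")


def split_url(url):
    """Split a URL into the four parts named on the slide."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    # Everything before the last '/' is treated as the directory,
    # everything after it as the file name (an assumption of this sketch).
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "domain": parts.netloc,
        "directory": directory,
        "file": file_name,
        "argument": parts.query,
    }


def bag_of_words(url):
    """Return the set of 'part:token' features found in the URL."""
    features = set()
    for part, text in split_url(url).items():
        for token in DELIMITERS.split(text):
            if token:  # skip empty strings produced by adjacent delimiters
                features.add(f"{part}:{token}")
    return features


if __name__ == "__main__":
    # Hypothetical URL, used only for illustration.
    print(sorted(bag_of_words("http://www.example.com/a/b/login.php?user=bob&id=1")))
```

Prefixing each token with its part keeps, for example, "login" in the domain name distinct from "login" in the file name.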

- (I) Obfuscating the host with an IP address
- (II) Obfuscating the host with another domain
- (III) Obfuscating with large host names
- (IV) Domain unknown or misspelled

- Features related to the full URL
  - Length of the URL (Type II)
  - Number of dots in the URL (Type II)
  - Blacklisted words (Type IV): confirm, account, banking, secure, ebayisapi, webscr, login and signin; Paypal, free, lucky and bonus
- Features related to the domain name
  - Length of the domain name (Type III)
  - IP or port number is used in the domain name (Type I)
  - Number of tokens of the domain name (Type III)
  - Number of hyphens used in the domain name (Type III)
  - The length of the longest token (Type III)
- Features related to the directory
  - Length of the directory (Type II)
  - Number of sub-directory tokens (Type II)
  - Length of the longest sub-directory token (Type II)
  - Maximum number of dots and other delimiters used in a sub-directory token (Type II)

- Features related to the file name
  - Length of the file name (Type II)
  - Number of dots and other delimiters used in the file name (Type II)
- Features related to the argument part
  - Length of the argument part
  - Number of variables
  - Length of the longest variable value
  - The maximum number of delimiters used in a value
- Summary of dataset
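The five categories of hand-selected features can be computed with a few string operations. The sketch below is one possible reading of the slide, under several assumptions that the slide does not spell out: domain tokens are separated by '.', sub-directory tokens by '/', variables by '&', and the blacklisted-word feature is a count of matching words. The feature names and the example URL are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Word list copied from the slide, lowercased for matching.
BLACKLISTED_WORDS = ("confirm", "account", "banking", "secure", "ebayisapi",
                     "webscr", "login", "signin", "paypal", "free", "lucky",
                     "bonus")
TOKEN_DELIMS = "?.=_&-"  # delimiters counted inside a token
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def count_delims(text):
    """Count dots and other delimiters in a piece of text."""
    return sum(text.count(d) for d in TOKEN_DELIMS)


def longest(items):
    """Length of the longest string, or 0 for an empty list."""
    return max((len(item) for item in items), default=0)


def lexical_features(url):
    parts = urlsplit(url if "://" in url else "http://" + url)
    # Drop any user-info, then separate host and port (IPv6 is not handled).
    host, _, port = parts.netloc.rpartition("@")[2].partition(":")
    directory, _, file_name = parts.path.rpartition("/")
    host_tokens = [t for t in host.split(".") if t]
    dir_tokens = [t for t in directory.split("/") if t]
    values = [p.partition("=")[2] for p in parts.query.split("&") if p]
    lowered = url.lower()
    return {
        # Full URL
        "url_length": len(url),
        "url_dots": url.count("."),
        "blacklisted_words": sum(w in lowered for w in BLACKLISTED_WORDS),
        # Domain name
        "domain_length": len(host),
        "domain_ip_or_port": int(bool(IPV4.match(host)) or bool(port)),
        "domain_tokens": len(host_tokens),
        "domain_hyphens": host.count("-"),
        "domain_longest_token": longest(host_tokens),
        # Directory
        "dir_length": len(directory),
        "dir_tokens": len(dir_tokens),
        "dir_longest_token": longest(dir_tokens),
        "dir_max_delims": max((count_delims(t) for t in dir_tokens), default=0),
        # File name
        "file_length": len(file_name),
        "file_delims": count_delims(file_name),
        # Argument part
        "arg_length": len(parts.query),
        "arg_variables": len(values),
        "arg_longest_value": longest(values),
        "arg_max_delims": max((count_delims(v) for v in values), default=0),
    }
```

Calling `lexical_features` on a URL returns a flat dictionary that can be merged with the bag-of-words features before training.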

- Batch Learning
  - Support Vector Machine (SVM)
- Online Learning
  - Online Perceptron (OP)
  - Confidence Weighted (CW)
  - Adaptive Regularization of Weights (AROW)
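To make the difference between the online learners concrete, here is a minimal sketch of the perceptron update and of an AROW update. It is not the implementation used in the talk: the AROW variant keeps only a diagonal covariance (a common simplification for sparse, high-dimensional features), the regularization parameter `r=1.0` is an arbitrary choice, and features are sparse dictionaries with labels +1 (phishing) and -1 (legitimate).

```python
class OnlinePerceptron:
    """Classic perceptron: add y * x to the weights on every mistake."""

    def __init__(self):
        self.w = {}

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        mistake = y * self.score(x) <= 0
        if mistake:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + y * v
        return mistake


class DiagonalAROW:
    """AROW with a diagonal covariance; r is the regularization parameter."""

    def __init__(self, r=1.0):
        self.r = r
        self.mu = {}     # mean weight per feature
        self.sigma = {}  # variance per feature, implicitly 1.0 when unseen

    def score(self, x):
        return sum(self.mu.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        margin = y * self.score(x)
        if margin < 1.0:  # update whenever the hinge loss is positive
            confidence = sum(self.sigma.get(f, 1.0) * v * v for f, v in x.items())
            beta = 1.0 / (confidence + self.r)
            alpha = (1.0 - margin) * beta
            for f, v in x.items():
                s = self.sigma.get(f, 1.0)
                # Uncertain features (large variance) move the most.
                self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s * v
                # Variance shrinks each time the feature is observed.
                self.sigma[f] = s - beta * s * s * v * v
        return margin <= 0  # True if the example was misclassified
```

Unlike the perceptron, AROW also updates on correctly classified examples whose margin is below 1, and it scales each step by the per-feature variance, so frequently seen features change less over time.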

- Batch-based vs. online algorithms
  - SVM vs. AROW
  - Yahoo-Phish
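Online learners are commonly compared by their cumulative error rate: each URL is first classified with the current model, the prediction is scored, and only then is the true label used for an update. The slides do not state the exact protocol, so the sketch below is an assumption; `TinyPerceptron` is a hypothetical stand-in for any learner exposing `score` and `update`.

```python
class TinyPerceptron:
    """Hypothetical stand-in learner with score/update methods."""

    def __init__(self):
        self.w = {}

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        if y * self.score(x) <= 0:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + y * v


def cumulative_error_rates(learner, stream):
    """Predict, score, then update; return the error rate after each URL."""
    mistakes = 0
    rates = []
    for seen, (x, y) in enumerate(stream, start=1):
        predicted = 1 if learner.score(x) > 0 else -1
        if predicted != y:
            mistakes += 1
        learner.update(x, y)  # the label is revealed only after predicting
        rates.append(mistakes / seen)
    return rates


if __name__ == "__main__":
    # Toy stream with labels +1 (phishing) and -1 (legitimate).
    toy = [({"login": 1.0}, 1), ({"news": 1.0}, -1)] * 2
    print(cumulative_error_rates(TinyPerceptron(), toy))
```

The last element of the returned list is the error rate after the last URL. A batch learner such as SVM would instead be retrained periodically and evaluated on the URLs that arrive before the next retraining.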

- Lexical features vs. full features
  - OP, CW and AROW
  - Yahoo-Phish

- Obfuscation-resistant (OR) lexical features
  - Performance of AROW with and without OR features, measured after the last URL

- The resilience of AROW to noisy data
  - AROW and CW
  - Yahoo-Phish
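One way to reproduce a noise experiment of this kind is to flip a fraction of the training labels before feeding the stream to the learner. The slides do not say how noise was injected, so label flipping is an assumption here, and `add_label_noise` is a hypothetical helper.

```python
import random


def add_label_noise(stream, noise_rate, seed=0):
    """Flip each +1/-1 label independently with probability noise_rate."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return [(x, -y if rng.random() < noise_rate else y) for x, y in stream]


if __name__ == "__main__":
    clean = [({"login": 1.0}, 1)] * 10
    for rate in (0.05, 0.25, 0.45):
        noisy = add_label_noise(clean, rate)
        flipped = sum(1 for _, y in noisy if y == -1)
        print(rate, flipped)
```

The learner is trained on the noisy stream, while its predictions are scored against the clean labels.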

- Minimum/maximum URL similarity distance distribution


- Proposed PhishDef, a proactive defense scheme against phishing attacks
- PhishDef detects phishing URLs on the fly
- PhishDef uses only lexical features
- High accuracy (97%)
- Low overhead
- Resilient to noisy training data
- Firefox and Chrome add-on implementations

Q&A
