Malicious URL Protection based on Attackers’ Habitual Behavioral Analysis
Source: Computers & Security, Vol. 77, No. 1, pp. 790-806, Aug. 2018.
Authors: Sungjin Kim, Jinkook Kim, and Brent ByungHoon Kang
Speaker: Ren-Kai Yang
Date: 2019/02/14
Outline
Introduction
Related works
Proposed scheme
Performance evaluation
Conclusions
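Introduction(1/3)
www.youtub.com
www.facebookc.om
Which one is the real Google site?
1. www.google.com
2. www.googIe.com
3. www.goog1e.com
Malicious URL (Uniform Resource Locator)
Injected URLs (new pages are created on the website)
Injected content
Injected code (JavaScript is planted in the HTML so that visitors to your website are redirected to a pre-built malicious site)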
Introduction(2/3) Phishing email
Introduction(3/3) Source: https://blog.darkthread.net/blog/iframe-clickjacking/
Related works(1/4) Web-filtering
Related works(2/4) WHOIS
Related works(3/4) Alexa 101-1000
Related works(4/4)
Feature-based
URL: 140.134.131.145/discussion/Query.php
Hostname: 140.134.131.145
Pathname: discussion
Filename: Query.php
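The hostname / pathname / filename split above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's parser: the helper name `split_url` is hypothetical, and treating everything after the last `/` as the filename is an assumption.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str, str]:
    """Split a URL into (hostname, pathname, filename).

    Hypothetical helper: the last path segment is assumed to be the filename.
    """
    # urlsplit only recognizes the host when a scheme or "//" is present,
    # so prepend "//" for scheme-less inputs like the one on the slide.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Everything before the last "/" is the directory part, the rest the filename.
    directory, _, filename = parts.path.rpartition("/")
    return host, directory.strip("/"), filename


print(split_url("140.134.131.145/discussion/Query.php"))
# ('140.134.131.145', 'discussion', 'Query.php')
```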
Proposed scheme(1/4) Fuzzy-based similarity matching
Proposed scheme(2/4)
Feature extraction and grouping
204-222 (39%), 50-70 (19%), 110-121 (17%), 173-175 (10%)
Training: optimizing URLs to three malicious pools
1. Domain pool
2. Path pool
3. Filename pool
Classifier: based on similarity matching
Example URLs (Domain / Pathname / Filename):
211.24.196.113/images/index.html
110.34.196.114/PEG/ad/index1.html
110.34.196.115/PEG/js/index.php
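One way to picture the training step is as grouping the parts of known malicious URLs into three pools. The sketch below is an assumption-laden illustration: `build_pools` is a hypothetical name, and plain set deduplication stands in for whatever "optimizing" involves in the paper.

```python
from urllib.parse import urlsplit


def build_pools(malicious_urls):
    """Group URL parts into domain, path and filename pools (deduplicated)."""
    pools = {"domain": set(), "path": set(), "filename": set()}
    for url in malicious_urls:
        # Prepend "//" so scheme-less URLs are parsed with a host component.
        parts = urlsplit(url if "://" in url else "//" + url)
        directory, _, filename = parts.path.rpartition("/")
        pools["domain"].add(parts.hostname or "")
        pools["path"].add(directory.strip("/"))
        pools["filename"].add(filename)
    return pools


# The three example URLs from the slide.
training_urls = [
    "211.24.196.113/images/index.html",
    "110.34.196.114/PEG/ad/index1.html",
    "110.34.196.115/PEG/js/index.php",
]
pools = build_pools(training_urls)
for name, entries in pools.items():
    print(name, sorted(entries))
```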
Proposed scheme(3/4)
Similarity measure and modeling
Input URL: 110.34.196.113/PEG/js/index2.html
Parsing (a parsed URL):
1. Domain string
2. Path string
3. Filename string
Fuzzing -> Classifier -> Result
Output: new URLs (Malicious & Benign)
Similarity metric: Levenshtein distance
Domain            Pathname   Filename
211.24.196.113    images     index.html
110.34.196.114    PEG/js     index1.html
110.34.196.115    PEG/ad     index.php
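For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming version. The `similarity` helper, which normalizes by the longer string's length, is an assumption, since the slides do not state the normalization used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("index2.html", "index.html"))  # 1
print(round(similarity("index2.html", "index1.html"), 2))  # 0.91
```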
Proposed scheme(4/4)
Similarity measure and modeling (cont.)
Malicious or Benign? 110.34.196.220/PEG/jslab/index2.html
* Threshold = 0.9
* Levenshtein distance = 7 (0.93)
Domain pool       Compared string    Score
211.24.196.113    110.34.196.113     0.45
110.34.196.114    110.34.196.113     0.72
110.34.196.115    110.34.196.113     0.72
Pathname pool     Compared string    Score
images            PEG/jslab          0
PEG/js            PEG/jslab          0.66
PEG/ad            PEG/jslab          0.55
Filename pool     Compared string    Score
index.html        index2.html        0.9
index1.html       index2.html        0.9
index.php         index2.html        0.54
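The matching step can be sketched as follows: score each parsed feature against its pool, keep the best score per pool, and compare it with the threshold. Several things here are assumptions rather than the paper's method: the normalization (1 minus distance over the longer length), the rule for combining the three scores, and the hypothetical `min_hits` parameter. The scores it produces will not necessarily equal every number on the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Pools copied from the slide.
POOLS = {
    "domain": ["211.24.196.113", "110.34.196.114", "110.34.196.115"],
    "path": ["images", "PEG/js", "PEG/ad"],
    "filename": ["index.html", "index1.html", "index.php"],
}


def classify(features, pools, threshold=0.9, min_hits=1):
    """Label a parsed URL; min_hits is a hypothetical combination rule."""
    scores = {name: max(similarity(features[name], entry) for entry in pools[name])
              for name in pools}
    hits = [name for name, score in scores.items() if score >= threshold]
    label = "malicious" if len(hits) >= min_hits else "benign"
    return label, scores


features = {"domain": "110.34.196.220", "path": "PEG/jslab", "filename": "index2.html"}
label, scores = classify(features, POOLS)
print(label, {name: round(score, 2) for name, score in scores.items()})
```

With `min_hits=1` a single feature reaching the threshold is enough to flag the URL. Requiring all three (`min_hits=3`) is the stricter alternative, and which rule the paper uses is not stated on the slide.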
Performance evaluation(1/3) The average of the similarity probability ratio related to three finite feature sets.
Performance evaluation(2/3)
Variation in detection rate according to manipulation of FW threshold.
(Figure legend: Same, Different)
Performance evaluation(3/3)
Performance results
             Test    Fuzzy
Benign       573     6.885s
Malicious    1301    56.083s
Total        1874    62.968s
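To put the table in per-URL terms, the short calculation below divides each elapsed time by the number of URLs. It assumes the "Test" column is the number of test URLs and the "Fuzzy" column is the total time taken by the fuzzy classifier, which the slide does not state explicitly.

```python
# (label, number of test URLs, elapsed seconds) copied from the results table.
RESULTS = [
    ("Benign", 573, 6.885),
    ("Malicious", 1301, 56.083),
    ("Total", 1874, 62.968),
]


def ms_per_url(count: int, seconds: float) -> float:
    """Average processing time per URL in milliseconds."""
    return seconds / count * 1000.0


for label, count, seconds in RESULTS:
    print(f"{label:<10}{ms_per_url(count, seconds):6.1f} ms per URL")
```

Under these assumptions the averages come to roughly 12 ms per benign URL, 43 ms per malicious URL, and 34 ms overall.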
Conclusions Behaviors
Optimizing URLs to three malicious pools
Training step
Dataset selection: Malicious URLs, Distribution URLs
Feature extraction
Optimizing URLs to three malicious pools:
1. Domain pool
2. Path pool
3. Filename pool
Classifier: based on similarity matching
Test step
Input URL
Parsing (a parsed URL):
1. Domain string
2. Path string
3. Filename string
Fuzzing -> Classifier -> Result
Output: new URLs (Malicious & Benign)