Accurately Detect Parked Domain Typo-squatting Attacks
Mishari Almishari and Xiaowei Yang
University of California, Irvine
Donald Bren School of Information and Computer Sciences, Computer Science Department
malmisha,
Introduction
- Typo-squatting refers to the act of registering domain names that are typographical errors of other popular domain names (target domains) in order to hijack the traffic intended for those popular domains
  - Hijacking for malicious purposes
  - Hijacking for financial purposes
Goals & Contributions
- Accurately identify typo-squatting domains
- Measure the amount of traffic hijacked by squatters
- Build a system that would reduce the amount of traffic to such domains
Methodology: Identifying Typos
- Use an edit distance of 1 as our typo definition
  - Less controversial as a typo definition
  - Users are more likely to make a single error than two or more: a study shows that 90-95% of spelling errors involve a single mistake
- Nevertheless, extending the typo definition is worth pursuing
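A minimal Python sketch of the edit-distance-1 test follows. Because the typo categories reported later include swapping two characters, this sketch assumes a restricted Damerau-Levenshtein (optimal string alignment) distance, in which an adjacent swap costs 1, rather than plain Levenshtein distance, in which it would cost 2. The function names, and the choice to compare whole host names so that a missing dot also counts as one edit, are illustrative assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Insertions, deletions, substitutions, and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]


def is_typo(candidate: str, target: str) -> bool:
    """True if candidate is exactly one edit away from target (case-insensitive)."""
    return osa_distance(candidate.lower(), target.lower()) == 1


print(is_typo("gogle.com", "google.com"))          # True  (missing character)
print(is_typo("goolge.com", "google.com"))         # True  (swapped characters)
print(is_typo("wwwgoogle.com", "www.google.com"))  # True  (missing dot)
print(is_typo("yahoo.com", "google.com"))          # False
```

Counting an adjacent swap as a single edit keeps the "swapping two characters" category inside the edit-distance-1 definition; under plain Levenshtein distance those typos would fall outside it.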
Methodology: Identifying Hijacking Attempts
- Is being a typo domain enough? No: 55% of typo domains are not squatting
- What are the common hijacking indicators?
  - Parked domain / ads listing (88.5%)
  - Offensive adult content (3.1%)
  - Domain for sale (2.1%)
  - Forwarding to another domain (8.3%)
- How to identify parked domains? Use a machine learning classifier (96%) (100%)
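The indicator check can be sketched as a simple rule: a typo domain counts as squatting if it shows at least one of the four indicators. The `TypoDomainObservation` record and its fields are hypothetical stand-ins for what a crawler and the parked-domain classifier would report. Treating a redirect to the target itself as benign (for example, a defensive registration by the target's owner) is an assumption of this sketch, not something the slides state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TypoDomainObservation:
    """Hypothetical crawl record for one typo domain."""
    domain: str
    target: str
    is_parked: bool = False             # parked page / ads listing (e.g. classifier output)
    has_adult_content: bool = False     # offensive adult content
    for_sale: bool = False              # "domain for sale" page
    redirects_to: Optional[str] = None  # final host after any redirects


def hijacking_indicators(obs: TypoDomainObservation) -> List[str]:
    """Return the hijacking indicators observed for a typo domain."""
    found = []
    if obs.is_parked:
        found.append("parked domain / ads listing")
    if obs.has_adult_content:
        found.append("offensive adult content")
    if obs.for_sale:
        found.append("domain for sale")
    # Assumption: redirecting to the target itself is treated as benign,
    # so only redirects to some other host count as forwarding.
    if obs.redirects_to and obs.redirects_to not in (obs.domain, obs.target):
        found.append("forwarding to another domain")
    return found


def is_squatting(obs: TypoDomainObservation) -> bool:
    """A typo domain counts as squatting if any indicator is present."""
    return bool(hijacking_indicators(obs))
```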
Experiment: Measuring the Amount of Hijacked Traffic
- UCI DNS traces covering 8 months
- 500 popular domains from the Alexa website
- Steps
  1. Pre-processing of DNS queries
  2. Finding typo domains
  3. Finding typo-squatting domains
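Steps 1 and 2 can be sketched as follows, assuming that pre-processing amounts to normalizing query names and that typo domains are found by generating every edit-distance-1 variant of each target (deletion, insertion, substitution, adjacent swap) and matching queries against that set. The alphabet, function names, and the tie-break when one variant is a typo of two targets are illustrative assumptions. Step 3 (deciding which typo domains are actually squatting) requires crawling the pages and is not shown.

```python
import string
from collections import Counter
from typing import Dict, Iterable, Set

# Characters allowed in a host name (letters, digits, hyphen, dot).
ALPHABET = string.ascii_lowercase + string.digits + "-."


def normalize(qname: str) -> str:
    """Pre-processing (assumed): lower-case and drop the trailing root dot."""
    return qname.strip().lower().rstrip(".")


def edit1_variants(domain: str) -> Set[str]:
    """Every string one deletion, insertion, substitution, or adjacent swap away."""
    splits = [(domain[:i], domain[i:]) for i in range(len(domain) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    subs = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return (deletes | swaps | subs | inserts) - {domain}


def count_typo_hits(queries: Iterable[str], targets: Iterable[str]) -> Counter:
    """Count DNS queries per (target, typo domain) pair."""
    target_set = {normalize(t) for t in targets}
    typo_to_target: Dict[str, str] = {}
    for target in sorted(target_set):
        for variant in edit1_variants(target):
            if variant not in target_set:  # a correctly spelled target is not a typo
                typo_to_target.setdefault(variant, target)  # first target wins on ties
    hits: Counter = Counter()
    for q in queries:
        name = normalize(q)
        if name in typo_to_target:
            hits[(typo_to_target[name], name)] += 1
    return hits
```

Generating the variants once per target and matching each query with a single dictionary lookup keeps the per-query cost constant, which matters when scanning months of DNS traffic.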
Measurement Results
- Typo-squatting hits: 23,989 in total, ranging from 1,675 to 3,621
- Typo-squatting domains: 1,786 domains in total, ranging from 347 to 530 domains
Measurement Results
- Maximum hits to typo-squatting domains: up to 649 hits for a single domain in one month
- Average hijack ratio: low, 0.33% to 1%
Measurement Results
- Maximum hijack ratio: from 82% to 100%
- Most squatted domains
  - Most hijacked is
  - 2nd most hijacked is
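The slides do not define the hijack ratio precisely. One plausible reading, used in this hypothetical sketch, is the fraction of all requests for a target (its correct spelling plus its typo-squatting domains) that went to squatters. Averaging and taking the maximum of this per-target ratio then yields summary figures of the kind reported above; the function names and input layout are assumptions.

```python
from statistics import mean
from typing import Dict, Tuple


def hijack_ratio(target_hits: int, squat_hits: int) -> float:
    """Assumed definition: share of a target's total requests (correct
    spelling plus its typo-squatting domains) that went to squatters."""
    total = target_hits + squat_hits
    return squat_hits / total if total else 0.0


def summarize(per_target: Dict[str, Tuple[int, int]]) -> Tuple[float, str, float]:
    """per_target maps target -> (target_hits, squat_hits).

    Returns (average ratio, most hijacked target, its ratio).
    """
    ratios = {t: hijack_ratio(th, sh) for t, (th, sh) in per_target.items()}
    worst = max(ratios, key=ratios.get)
    return mean(list(ratios.values())), worst, ratios[worst]
```

Under this definition the ratio is capped at 100%, reached only when every observed request for a target went to squatters; a definition that divides by the target's hits alone could exceed 100%.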
Measurement Results: Typo Characterization

Typo Category                 Ratio
Missing One Character         32%
Adding One Character          33%
Substituting One Character    22%
Swapping Two Characters       13%

- 14% of Cat 1 typos are a missing dot
- 66% of Cat 2 typos come from neighboring keys
- 26% of Cat 2 typos repeat the character immediately before or after
- 42% come from neighboring keys
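Typo characterization can be sketched by comparing each typo with its target to recover the single edit, then tagging neighbor-key, repeated-character, and missing-dot cases. The keyboard model is a crude QWERTY grid that ignores key stagger, and the tag names, like the choice of the first differing position when a doubled letter makes the edit ambiguous, are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

# Crude QWERTY grid (no key stagger): keys are neighbors if their row and
# column indices each differ by at most one.
QWERTY_ROWS = ["1234567890-", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def _build_neighbors() -> Dict[str, Set[str]]:
    pos = {c: (r, i) for r, row in enumerate(QWERTY_ROWS) for i, c in enumerate(row)}
    return {
        c: {d for d, (r2, i2) in pos.items() if d != c and abs(r - r2) <= 1 and abs(i - i2) <= 1}
        for c, (r, i) in pos.items()
    }


NEIGHBORS = _build_neighbors()


def _first_diff(a: str, b: str) -> int:
    """Index of the first differing position (or the shorter length)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


def characterize(typo: str, target: str) -> Tuple[str, Set[str]]:
    """Return (category, tags) for a single-edit typo of target."""
    tags: Set[str] = set()
    i = _first_diff(typo, target)
    if len(typo) == len(target) - 1 and typo == target[:i] + target[i + 1:]:
        if target[i] == ".":
            tags.add("missing dot")
        return "missing", tags
    if len(typo) == len(target) + 1 and typo[:i] + typo[i + 1:] == target:
        added = typo[i]
        around = {typo[j] for j in (i - 1, i + 1) if 0 <= j < len(typo)}
        if added in around:
            tags.add("repeats adjacent char")
        if any(added in NEIGHBORS.get(c, set()) for c in around):
            tags.add("neighbor key")
        return "adding", tags
    if len(typo) == len(target) and i < len(typo):
        if typo[i + 1:] == target[i + 1:]:
            if typo[i] in NEIGHBORS.get(target[i], set()):
                tags.add("neighbor key")
            return "substituting", tags
        if (i + 1 < len(typo) and typo[i] == target[i + 1]
                and typo[i + 1] == target[i] and typo[i + 2:] == target[i + 2:]):
            return "swapping", tags
    raise ValueError(f"{typo!r} is not a single-edit typo of {target!r}")
```

Tallying the returned categories and tags over all detected typo domains gives ratios like those in the table above.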
Comparison With Other Typo-Correctors
- Google & Yahoo typo-correction web services
- 15% (12%) missed by Google (Yahoo)
- 99.6% (98%) of the missed domains are real parked domains
- 23% (31%) forward to the same target domain
System Implementation
- Successfully integrated our methodology with the Mozilla Firefox browser
- Second set: 94% <= 167 ms
- Non-typo domains: 10 ms on average, 25 ms at most
Classifier
- Data set of 2,800 samples: 700 parked domains and 2,100 general-purpose domains from the Yahoo Directory
- Identify distinguishing features
- Compute their distributions for verification
- Use the WEKA library to try different classification algorithms; Random Forest performed best
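The slides do not list the distinguishing features, so the ones below are hypothetical examples of signals a parked ads page might carry (many off-site links, stock phrases such as "related searches"). This sketch extracts them from raw HTML with regular expressions, which is crude compared with a real HTML parser, and computes per-class feature means as a simple stand-in for the distribution check. Training the Random Forest itself is left to a library such as WEKA's `RandomForest` or scikit-learn's `RandomForestClassifier`.

```python
import re
from statistics import mean
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical stock phrases; the actual features used are not listed in the slides.
PARKING_PHRASES = ("related searches", "sponsored listings", "this domain may be for sale")


def extract_features(html: str, host: str) -> Dict[str, float]:
    """Turn one crawled page into a numeric feature vector (illustrative only)."""
    links = re.findall(r"""href=["']([^"']+)["']""", html, flags=re.IGNORECASE)
    external = [u for u in links if urlparse(u).netloc.lower() not in ("", host)]
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    return {
        "num_links": float(len(links)),
        "external_link_ratio": len(external) / len(links) if links else 0.0,
        "parking_phrases": float(sum(p in text for p in PARKING_PHRASES)),
        "num_words": float(len(text.split())),
    }


def per_class_means(samples: List[Tuple[Dict[str, float], str]]) -> Dict[str, Dict[str, float]]:
    """Mean of each feature per class label: a quick check that a feature separates classes."""
    by_label: Dict[str, List[Dict[str, float]]] = {}
    for feats, label in samples:
        by_label.setdefault(label, []).append(feats)
    return {
        label: {k: mean(f[k] for f in rows) for k in rows[0]}
        for label, rows in by_label.items()
    }
```

With the data set above, the classes are imbalanced (700 of 2,800 samples, or 25%, are parked), so a per-class comparison like this is more informative than a single pooled mean.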
Conclusion
- Defined and implemented an accurate identification methodology
- Performed measurements showing that typo-squatters are moderately successful
- Integrated the methodology with the Firefox browser to detect typo-squatting domains on the fly