1
Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari
2
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
3
Introduction - Motivation Traffic is important to web domains! No point in launching without incoming traffic Losing/gaining traffic means losing/gaining money One way to price ads is the Pay Per Click model Traffic diversion could be a serious threat to a domain
4
Introduction - Motivation Typos may attract traffic Users vulnerable to making typos Users may forget about visiting target domain Threat to Target Domain! Intentionally registering such typo domains is called Typo-squatting
5
Introduction - Goal To study how much traffic typo-squatters can get from target domains Are those domains attracting much traffic? There are many typo-squatting domains registered (Banerjee et al., 08) Search engines' typo-corrections and browser auto-completions! How much traffic are target domains losing? Is it a negligible ratio or a serious threat? Do users go back to target domains or get distracted?
6
Introduction - Challenges How to identify typo-squatting domains? Does Typo mean Typo-squatting? Short Domains www.abc.com and www.abd.com Longer Domains www.walmart.com and www.walkmart.com If not, how can we? Hijacking indicator
7
Introduction - Contribution Automatic and accurate identification of typo-squatting domains (Measurement Methodology) Bound on how much traffic target domains are losing to typo-squatting domains (Measurement Results)
8
Outline Introduction Background Methodology Parked Domain Classifier Measurements Related Work Future Work Conclusion
9
Background – Domain Parking Domain Parking is the practice of showing a temporary page for an unused domain before launching it
10
Background - Domain Parking
11
Background – Domain Parking
13
Domain Parking Service Parks and hosts unused domains Monetizes the traffic by showing ads Many Typo-squatting domains are parked domains (Wang et al, 06), (Keats, 07)
14
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
15
Methodology Data Collection Identifying Typo-Squatting Domains
16
Methodology - Data Collection DNS traces @ UCI Resolvers Internal requests to domain names DNS query precedes HTTP request Caching limitation Our study represents a lower-bound
17
Methodology - Data Collection UCI NET INTERNET UCI Resolver Our Machine DATE TIME HASHED-IP DOMAIN TYPE CLASS USER QUERY
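The record layout on this slide (DATE TIME HASHED-IP DOMAIN TYPE CLASS) can be read into a small structure as sketched below. This is only an illustration: the slides do not state the delimiter, so whitespace-separated fields are an assumption, and the sample line, the `DnsQuery` class, and `parse_record` are hypothetical names that do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class DnsQuery:
    date: str
    time: str
    hashed_ip: str   # the user's IP is hashed in the trace
    domain: str
    qtype: str       # DNS record type, e.g. "A"
    qclass: str      # DNS class, e.g. "IN"


def parse_record(line: str) -> DnsQuery:
    """Parse one trace line, assuming six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return DnsQuery(*fields)


# Hypothetical sample line, made up for illustration only.
sample = "2008-03-01 12:00:05 a1b2c3 www.wamart.com A IN"
query = parse_record(sample)
print(query.domain)
```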
18
Methodology – Identify Typo-squatting Domain Identify Similar Domains a. Single Error Typo Single error accounts for 90-95% of spelling/typo errors (Pollock et al, 83) www.walmart.com and www.wamart.com b. gTLD substitution www.amazon.com and www.amazon.org
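The two similarity rules on this slide can be sketched as a pairwise check. The slides do not say which edit operations count as a "single error", so this sketch assumes one insertion, deletion, substitution, or swap of adjacent characters. The `GTLDS` set and all function names are illustrative assumptions, not the authors' implementation.

```python
GTLDS = {"com", "org", "net", "info", "biz"}  # assumed gTLD list, not from the slides


def is_single_error(a: str, b: str) -> bool:
    """True if b differs from a by exactly one insertion, deletion,
    substitution, or transposition of adjacent characters."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    # Find the first position where the strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one substituted character
        return (i + 1 < len(a) and a[i] == b[i + 1] and a[i + 1] == b[i]
                and a[i + 2:] == b[i + 2:])  # two adjacent characters swapped
    # Lengths differ by one: dropping one character from the longer string
    # must yield the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]


def split_domain(domain: str) -> tuple[str, str]:
    """Split 'www.amazon.com' into ('www.amazon', 'com')."""
    name, _, tld = domain.lower().rpartition(".")
    return name, tld


def is_similar(target: str, candidate: str) -> bool:
    """Rule (a): single-error typo in the name; rule (b): gTLD substitution."""
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS


print(is_similar("www.walmart.com", "www.wamart.com"))
print(is_similar("www.amazon.com", "www.amazon.org"))
```

Note that this check also accepts pairs such as www.abc.com and www.abd.com, which is why similarity alone cannot establish typo-squatting.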
19
Methodology – Identify Typo-squatting Domains But a similar domain is not enough! www.abc.com and www.abd.com www.walmart.com and www.walkmart.com www.usps.com and www.usps.org Random Sample More than 54% are not Typo-squatting Need to Identify Hijacking Intention
20
Methodology – Identify Typo-squatting Domain Identify Hijacking Indicator Parked Domain (Ads – listing) ~ 88% Forwarding to other domains ~ 8% Others: Inappropriate Content, … Parked Domain as the indicator
21
Methodology – Identify Typo-squatting Domain Similar Domain + Parked Domain → Typo-Squatting Domain
22
Methodology – Identify Typo-squatting Domain How to identify Parked Domain? Parked Domain Classifier 96% Presence of Parking signatures Well-known parking signatures (domain names/urls)
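The signature check, and the way it combines with the similarity step, might look roughly like the sketch below. It covers only the "presence of parking signatures" part, not the trained classifier. The signature strings, the page contents, and the function names are hypothetical placeholders; the slides do not list the actual signatures used.

```python
# Hypothetical placeholder signatures; the real list is not given in the slides.
PARKING_SIGNATURES = ["parking-service.example", "parked.example/landing"]


def has_parking_signature(page_html: str, signatures=PARKING_SIGNATURES) -> bool:
    """True if the page source mentions any known parking-service domain or URL."""
    text = page_html.lower()
    return any(sig in text for sig in signatures)


def typo_squatting_domains(similar_domains, pages):
    """Keep only the similar domains whose fetched page looks parked.

    `pages` maps a domain name to its page source (assumed already downloaded).
    """
    return [d for d in similar_domains
            if has_parking_signature(pages.get(d, ""))]


# Made-up pages for illustration.
pages = {
    "www.wamart.com": '<a href="http://parking-service.example/ads">Ads</a>',
    "www.abd.com": "<h1>A legitimate, unrelated site</h1>",
}
print(typo_squatting_domains(["www.wamart.com", "www.abd.com"], pages))
```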
23
Methodology - Summary Identify Similar Domains Identify Parked Domains List of Typo-squatting Domains
24
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
25
Parked Domain Classifier Build Data Set Extract Core Features Combine Into Classifier
26
Data Set Data Set consists of 2,800 domains 700 are parked domains Collected from MS Strider Website 2,100 are non-parked domains Collected from the fourteen Yahoo Directory Top Categories
27
Feature Selection Heuristically, Identify common features in parked domain Compute the distribution of those features for verification Common Link Ratio Max
28
Feature Selection
29
Combining Features Into Classifier Tried Different Classifier Algorithms Decision Tree SVM K-Nearest Neighbor Random Forest The best performance
30
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
31
DATA Sets DNS Traces Four Months ~ 30 million domains ( ~ 2 billion hits ) ( ~ 30,000 users ) Target Domain Set Alexa’s Top 500 popular domains ~53,000,000 hits
32
Typo-Squatting Domains & Hits 1,332 typo-squatting domains 13,431 hits (~ 110 a day) Is it Large or Small? 500 Target Domains 4 Month Period ~ 30,000 users Given Similar Ratio may translate to non-trivial number 30,000 => 110 Per Day 300,000 => 1,100 Per Day 3,000,000 => 11,000 (X 365 = ~ 4,000,000 A YEAR)
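The extrapolation on this slide is a linear scaling of the observed daily hits by the number of users. A minimal sketch of that arithmetic follows; it assumes "four months" means about 122 days and that hits grow in proportion to the user population, both of which are simplifying assumptions.

```python
OBSERVED_USERS = 30_000
OBSERVED_HITS = 13_431
OBSERVED_DAYS = 122  # assumption: four months taken as roughly 122 days


def hits_per_day(hits: int, days: int) -> float:
    """Average number of typo-squatting hits per day over the trace."""
    return hits / days


def project_daily_hits(users: int, base_daily_hits: int = 110,
                       base_users: int = OBSERVED_USERS) -> float:
    """Scale the rounded daily rate linearly with the user population."""
    return base_daily_hits * users / base_users


print(round(hits_per_day(OBSERVED_HITS, OBSERVED_DAYS)))  # about 110 a day
for users in (30_000, 300_000, 3_000_000):
    daily = project_daily_hits(users)
    print(users, daily, daily * 365)
```

For 3,000,000 users this gives 11,000 hits a day, or 4,015,000 a year, which the slide rounds to about 4,000,000.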
33
Typo-squatting Ratio 0.025% of total number of queries (89%, ≤ 1%) (70%, ≤ 0.1%) ( 57%, ≤ 0.01%)
34
User Correction Ratio – Alexa-500 54% of typo-squatting queries are corrected ~ 51% squatted target domains have most squat hits corrected
35
Potential Hit Loss Potential Hit Loss Ratio = 0.012% (92%, ≤ 1%) (78%, ≤ 0.1%) (64%, ≤ 0.01%)
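The slides do not give the formulas behind these ratios. One reading that reproduces the reported figures is sketched below: the typo-squatting ratio as typo-squatting hits over the ~53,000,000 hits to the target domain set, and the potential hit loss as the share of those hits that users do not correct. Treat both formulas as assumptions rather than the authors' definitions.

```python
TARGET_HITS = 53_000_000   # hits to the Alexa top-500 target set
SQUAT_HITS = 13_431        # hits to typo-squatting domains
CORRECTED_SHARE = 0.54     # share of typo-squatting queries later corrected

# Assumed definition: typo-squatting hits as a percentage of target hits.
squat_ratio_pct = SQUAT_HITS / TARGET_HITS * 100

# Assumed definition: only uncorrected typo-squatting hits count as lost.
potential_loss_pct = squat_ratio_pct * (1 - CORRECTED_SHARE)

print(f"typo-squatting ratio: {squat_ratio_pct:.3f}%")
print(f"potential hit loss:   {potential_loss_pct:.3f}%")
```

Under these assumptions the two values round to 0.025% and 0.012%, matching the figures on the slides.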
36
Potential Money Loss ~75% do not point to target domains Referring Typo-Sqt Ratio = 0.008% (96%, ≤ 1%) (91%, ≤ 0.1%) ( 81%, ≤ 0.01%)
37
Non-existing Similar Domains 8,285 potential hits (~ 500 non-existing typo domain) 0.015% of total number of queries (96%, ≤ 1%) (83%, ≤ 0.1%) (66%, ≤ 0.01%)
38
Typo-Squatting Distribution 19 % of all Typo-squatting hits
39
Top Ten Typo-squatting Domains 19 % of all Typo-squatting hits
40
Top Ten Target Domains Responsible for 55% of all typo-squatting queries of Alexa-500 50 Million hits of “www.facebook.com”
41
Typo Characterization Most Typos are single errors ( 95% VS 5%) Most gTLD sub are “com” to “org” (50%) Add – 37 % are of non-adjacent keys Sub – 77% are of non-adjacent keys Sub – 13% of substitutions are “a” and “o” Spelling error
42
Typo-squatting Domains – TP60 15,499 hits 0.045% of total number of queries (76%, ≤ 1%) (60%, ≤ 0.5%)
43
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
44
Future Work How much of the ads budget goes to squatters? Enhance our identification technique See if the results hold at other ISPs Typo Modeling for getting traffic back
45
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
46
Related Work MS Strider Project [Wang et al. Sruti06] McAfee Study [Keats McAfee White Paper 07] JAAL project [Banerjee et al. Infocom 08]
47
Outline Introduction Background Methodology Parked Domain Classifier Measurements Future Work Related Work Conclusion
48
Accurately and automatically identify typo-squatting domains How much traffic goes to typo-squatters Bound on how much traffic the target domain is losing to typo-squatting: inconsequential