Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari.

1 Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari

2 Outline Introduction Background Methodology Parked Domain Classifier Data Sets Results Future Work Related Work Conclusion

3 Introduction - Motivation Traffic is important to domains! There is no point in launching a domain without incoming traffic. Losing/gaining traffic => losing/gaining money. One way to price ads is PPC (pay-per-click), which shows how important traffic is. Traffic diversion could be a serious threat to a domain.

4 Introduction - Motivation Typos may divert the traffic. Users are prone to making typos. Users may forget about visiting the target domain. Threat to the target domain! Intentionally registering such typo domains is called typo-squatting.

5 Introduction - Goal To study how much traffic typo-squatters can get from target domains. Are those domains attracting much traffic? Search engines' typo corrections! Browser auto-completions! How much traffic are target domains losing? Is it a negligible ratio or a serious threat? Do users go back to target domains or get distracted?

6 Introduction - Challenges How to identify typo-squatting domains? Does a typo mean typo-squatting? Short domains: www.abc.com and www.abd.com. Longer domains: www.walmart.com and www.walkmart.com. If not, how can we? Hijacking indicator

7 Introduction - Contribution Automatic and accurate identification of typo-squatting domains. Show how much traffic target domains are losing to typo-squatting domains.

8 Outline Introduction Background Methodology Parked Domain Classifier Data Results Related Work Future Work Conclusion

9 Background – Domain Parking Domain parking: showing a temporary page for an unused domain before it is launched

10 Background - Domain Parking

11 Background – Domain Parking

12

13 Domain Parking Service Parks and hosts unused domains. Monetizes the traffic by showing ads. Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)

14 Outline Introduction Background Methodology Parked Domain Classifier Data Results Future Work Related Work Conclusion

15 Methodology Data Collection Identifying Typo-Squatting Domains

16 Methodology - Data Collection DNS traces @ UCI resolvers. Internal requests to domain names. A DNS query precedes an HTTP request. Caching limitation. Our study represents a lower bound.

17 Methodology – Identify Typo-squatting Domain 1. Identify Similar Domains a. Single-error typo: single errors account for 90-95% of spelling errors (www.walmart.com and www.walkmart.com) b. gTLD substitution (www.amazon.com and www.amazon.org)
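
The similarity test on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes a "single error" means one insertion, deletion, substitution, or swap of two neighbouring characters, and the `GTLDS` tuple and all function names are hypothetical.

```python
GTLDS = ("com", "org", "net")  # assumed list, for illustration only


def split_domain(domain):
    """Split 'www.walmart.com' into ('www.', 'walmart', 'com')."""
    prefix = "www." if domain.startswith("www.") else ""
    name, _, tld = domain[len(prefix):].rpartition(".")
    return prefix, name, tld


def is_single_error(a, b):
    """True if a and b differ by exactly one insertion, deletion,
    substitution, or swap of two neighbouring characters."""
    if a == b:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one substitution
        # one transposition of neighbouring characters
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # one insertion/deletion: dropping one character from the longer
    # name must yield the shorter one
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def is_similar(target, candidate):
    """Similar = single-error typo in the name, or same name with a
    different gTLD (both drawn from the assumed GTLDS list)."""
    _, t_name, t_tld = split_domain(target)
    _, c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS
```

Note that `is_similar("www.abc.com", "www.abd.com")` is also true, which is exactly the short-domain problem raised in the Challenges slide: similarity alone does not establish typo-squatting.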

18 Methodology – Identify Typo-squatting Domains But a similar domain is not enough! www.walmart.com and www.walkmart.com Random sample: more than 54% are not typo-squatting

19 Methodology – Identify Typo-squatting Domain 2. Identify Hijacking Indicator: Inappropriate Content; Domain For Sale; Forwarding to other domains; Ads listing (Parked Domain); More than 80%

20 Methodology – Identify Typo-squatting Domain Similar Domain + Parked Domain => Typo-Squatting Domain

21 Methodology – Identify Typo-squatting Domain How to identify a parked domain? Parked Domain Classifier. Presence of parking signatures. Well-known parking signatures (domain names/URLs)
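
A signature check of the kind mentioned here might look like the sketch below. The slides do not list the actual signatures, so the entries in `PARKING_SIGNATURES` are placeholders, and the regex-based link extraction is a simplification chosen for brevity rather than the method used in the study.

```python
import re

# Placeholder signatures for illustration; the real list is not given
# in the slides.
PARKING_SIGNATURES = ("parking-service.example", "ads.parked.example/landing")


def extract_urls(html):
    """Return the values of href/src attributes found in the page."""
    pattern = r'(?:href|src)\s*=\s*["\']([^"\']+)["\']'
    return re.findall(pattern, html, flags=re.IGNORECASE)


def has_parking_signature(html, signatures=PARKING_SIGNATURES):
    """True if any linked URL contains a known parking signature."""
    urls = [url.lower() for url in extract_urls(html)]
    return any(sig in url for url in urls for sig in signatures)
```

A page that links to `http://parking-service.example/click?id=1` would be flagged, while a page with only ordinary links would not.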

22 Methodology - Summary Identify Similar Domains Identify Parked Domains List of Typo-squatting Domains

23 Outline Introduction Background Methodology Parked Domain Classifier Data Results Future Work Related Work Conclusion

24 Parked Domain Classifier Build Data Set Extract Core Features Combine Into Classifier

25 Data Set The data set consists of 2,800 domains. 700 are parked domains, collected from the MS Strider website. 2,100 are non-parked domains, collected from the fourteen Yahoo Directory top categories

26 Feature Selection Heuristically, Identify common features in parked domain Compute the distribution of those features for verification Common Link Ratio Max

27 Combining Features Into Classifier Tried Different Classifier Algorithms Decision Tree SVM K-Nearest Neighbor Random Forest The best performance

28 Outline Introduction Background Methodology Parked Domain Classifier Data Sets Results Future Work Related Work Conclusion

29 DATA Sets DNS Traces Four Months Anonymous CNAME and A ~ 30 million domains ( ~ 2 billion hits ) ( ~ 30,000 users ) Target Domain Set Alexa’s Top 500 popular domains

30 Typo-Squatting Domains & Hits 1,332 typo-squatting domains, 13,431 hits. Is it large or small? 500 target domains, 4-month period, ~ 30,000 users. Given a similar ratio, this may translate to a large number: 30,000 => 13,000; 300,000 => 130,000; 3,000,000 => 1,300,000
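
The extrapolation on this slide is a linear scaling of the observed hits with the user population. A short sketch of that arithmetic, assuming the hits-per-user ratio stays constant (the function name is illustrative):

```python
OBSERVED_USERS = 30_000  # approximate user count from the slide
OBSERVED_HITS = 13_431   # typo-squatting hits over the four-month trace


def projected_hits(users):
    """Scale the observed hits linearly with the number of users.

    Linear scaling is an assumption; it presumes the same hits-per-user
    ratio holds for larger populations.
    """
    return OBSERVED_HITS * users / OBSERVED_USERS


for users in (30_000, 300_000, 3_000_000):
    print(f"{users:>9,} users => ~{projected_hits(users):,.0f} hits")
```

The slide rounds these projections down to two significant figures (13,000, 130,000, 1,300,000).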

31 Typo-squatting Ratio 0.025% of total number of queries 89% LE 1% (70% LE 0.1%) ( 57% LE 0.01%)
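
One way to read figures such as "89% LE 1%" is as the share of target domains whose typo-squatting ratio is less than or equal to 1%. The sketch below computes that kind of summary under this reading. The ratio definition (typo hits over typo plus target hits) is an assumption, and the counts are made up purely to show the calculation.

```python
def typo_ratio(typo_hits, target_hits):
    """Fraction of a target's combined queries that went to typo domains."""
    total = typo_hits + target_hits
    return typo_hits / total if total else 0.0


def share_at_or_below(ratios, threshold):
    """Fraction of targets whose ratio is <= threshold."""
    return sum(r <= threshold for r in ratios) / len(ratios)


# Made-up counts, not data from the study: target -> (typo hits, target hits)
counts = {
    "a.example": (5, 99_995),
    "b.example": (50, 9_950),
    "c.example": (300, 9_700),
    "d.example": (0, 1_000),
}
ratios = [typo_ratio(typo, target) for typo, target in counts.values()]

for threshold in (0.01, 0.001, 0.0001):
    share = share_at_or_below(ratios, threshold)
    print(f"{share:.0%} of targets LE {threshold:.2%}")
```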

32 User Correction Ratio – Alexa-500 On average, 54% of typo-squatting queries are corrected

33 Potential Hit Loss 0.012% 92% LE 1% (78% LE 0.1%) (64% LE 0.01%)

34 Potential Money Loss 0.008% 96% LE % (91% LE 0.1%) ( 81% LE 0.01%)

35 Non-existing Similar Domains 463 potential typo-squatting 8,285 potential hits 0.015% of total number of queries 96% LE 1% (83% LE 0.1%) (66% LE 0.01%)

36 Typo-squatting Domains – TP60 629 typo-squatting 15,499 hits 0.045% of total number of queries 76% LE 1% (60% LE 0.5%)

37 Top Ten Typo-squatting Domains 19 % of all Typo-squatting hits

38 Top Ten Target Domains Responsible for 55% of all typo-squatting queries of Alexa-500. 50 million hits of “www.facebook.com”

39 Typo Characterization Most Typos are single errors ( 95% VS 5%) Most gTLD sub are “com” to “org” (50%) Add - 63% are of adjacent keys Sub – 23% are of adjacent keys Sub – 13% of substitutions are “a” and “o” Spelling error
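
The adjacent-key breakdown can be illustrated with a small check for whether a substitution swaps two neighbouring keys. The sketch assumes a standard QWERTY letter layout with the usual row stagger; the slides do not say which adjacency model was used, and the function names are illustrative.

```python
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def keys_adjacent(x, y):
    """True if x and y are neighbouring letter keys on a QWERTY layout."""
    if x == y or x not in POS or y not in POS:
        return False
    (r1, c1), (r2, c2) = POS[x], POS[y]
    if r1 == r2:
        return abs(c1 - c2) == 1
    if abs(r1 - r2) == 1:
        # Because of the row stagger, a key touches the keys in the row
        # above at the same column and one column to the right.
        upper_c, lower_c = (c1, c2) if r1 < r2 else (c2, c1)
        return upper_c - lower_c in (0, 1)
    return False


def substitution_is_adjacent(target, typo):
    """For two equal-length names differing in exactly one position,
    report whether the two differing characters are adjacent keys."""
    diffs = [(a, b) for a, b in zip(target, typo) if a != b]
    if len(target) != len(typo) or len(diffs) != 1:
        raise ValueError("not a single substitution")
    return keys_adjacent(*diffs[0])
```

Under this model, "abc" versus "abd" is an adjacent-key substitution, while swapping "o" for "a" (as in "google" versus "goagle") is not, which fits the slide's labelling of "a"/"o" substitutions as spelling errors rather than slips of the finger.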

40 Outline Introduction Background Methodology Parked Domain Classifier Data Sets Results Future Work Related Work Conclusion

41 Future Work How much are target domains paying squatters? Enhance our identification technique. Typo modeling for getting traffic back. Why do people go to parked domains? How can you increase the traffic?

42 Outline Introduction Background Methodology Parked Domain Classifier Data Sets Results Future Work Related Work Conclusion

43 Related Work MS Strider Project [Wang et al. Sruti06] McAfee Study [Keats McAfee White Paper 07] JAAL project [Banerjee et al. Infocom 08]

44 Outline Introduction Background Methodology Parked Domain Classifier Data Sets Results Future Work Related Work Conclusion

45 Conclusion Accurately and automatically identify typo-squatting domains. How much traffic goes to typo-squatters. Bound on how much traffic the target domain is losing to typo-squatting: inconsequential

