Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari.



Outline
- Introduction
- Background
- Methodology
- Parked Domain Classifier
- Measurements
- Future Work
- Related Work
- Conclusion

Introduction - Motivation
- Traffic is important to web domains: there is no point in launching a site without incoming traffic
- Losing or gaining traffic means losing or gaining money
- One way to price ads is the pay-per-click model
- Traffic diversion could be a serious threat to a domain

Introduction - Motivation
- Typos may attract traffic
- Users are prone to making typos
- Users may forget about visiting the target domain
- This is a threat to the target domain
- Intentionally registering such typo domains is called typo-squatting

Introduction - Goal
- To study how much traffic typo-squatters can get from target domains
- Are those domains attracting much traffic?
- There are many typo-squatting domains registered (Banerjee et al., 08)
- But search engines offer typo corrections and browsers offer auto-completion
- How much traffic are target domains losing?
- Is it a negligible ratio or a serious threat?
- Do users go back to target domains or get distracted?

Introduction - Challenges
- How to identify typo-squatting domains?
- Does a typo mean typo-squatting? Consider short domains and longer domains
- If not, how can we tell? Hijacking indicator

Introduction - Contribution
- Automatic and accurate identification of typo-squatting domains (Measurement Methodology)
- Bound on how much traffic target domains are losing to typo-squatting domains (Measurement Results)


Background – Domain Parking Domain Parking is the practice of showing a temporary page for an unused domain before launching it


Domain Parking Service
- Parks and hosts unused domains
- Monetizes the traffic by showing ads
- Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)


Methodology
- Data Collection
- Identifying Typo-Squatting Domains

Methodology - Data Collection
- DNS traces from the UCI resolvers
- Internal requests to domain names
- A DNS query precedes the HTTP request
- Caching limitation: our study represents a lower bound

Methodology - Data Collection
- Figure: user query, UCI net, UCI resolver, Internet, our machine
- Logged record fields: DATE TIME HASHED-IP DOMAIN TYPE CLASS
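The record layout on this slide suggests a whitespace-separated query log. The following is a minimal parsing sketch under that assumption; the sample lines, the date and time formats, and the names `DnsQuery`, `parse_query`, and `count_hits` are hypothetical and not taken from the study.

```python
from collections import Counter
from typing import NamedTuple, Optional


class DnsQuery(NamedTuple):
    """One logged query, following the field order shown on the slide."""
    date: str
    time: str
    hashed_ip: str
    domain: str
    qtype: str
    qclass: str


def parse_query(line: str) -> Optional[DnsQuery]:
    """Parse one whitespace-separated log line; return None if malformed."""
    fields = line.split()
    if len(fields) != 6:
        return None
    date, time, hashed_ip, domain, qtype, qclass = fields
    # Domain names are case-insensitive; drop a trailing root dot if present.
    return DnsQuery(date, time, hashed_ip, domain.lower().rstrip("."), qtype, qclass)


def count_hits(lines):
    """Count queries per domain, skipping malformed lines."""
    hits = Counter()
    for line in lines:
        query = parse_query(line)
        if query is not None:
            hits[query.domain] += 1
    return hits


# Hypothetical sample records (invented values, assumed formats).
sample = [
    "2008-03-01 10:15:02 a1b2c3 Example.com. A IN",
    "2008-03-01 10:15:09 d4e5f6 example.com A IN",
    "malformed line",
]
print(count_hits(sample))
```

Because resolvers cache answers, per-domain counts built this way undercount actual visits, which is consistent with the lower-bound remark on the previous slide.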

Methodology – Identify Typo-squatting Domain
- Identify similar domains
- a. Single-error typo: a single error accounts for 90-95% of spelling/typo errors (Pollock et al., 83)
- b. gTLD substitution
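One way to read "similar domain" is as any name one edit away from the target, plus the same name under a different gTLD. The sketch below generates such candidates; the four edit types (addition, deletion, substitution, adjacent transposition), the `GTLDS` subset, and the function names are assumptions made for illustration, not the study's exact definition.

```python
import string

# Characters allowed in a hostname label.
ALPHABET = string.ascii_lowercase + string.digits + "-"
# Assumed subset of gTLDs; the slides do not list which ones were used.
GTLDS = ("com", "net", "org")


def single_error_typos(name: str) -> set:
    """Return every string one edit away from name."""
    typos = set()
    for i in range(len(name) + 1):
        for ch in ALPHABET:
            typos.add(name[:i] + ch + name[i:])              # addition
    for i in range(len(name)):
        typos.add(name[:i] + name[i + 1:])                   # deletion
        for ch in ALPHABET:
            typos.add(name[:i] + ch + name[i + 1:])          # substitution
    for i in range(len(name) - 1):
        typos.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # transposition
    typos.discard(name)
    # Drop empty labels and labels that start or end with a hyphen.
    return {t for t in typos if t and not t.startswith("-") and not t.endswith("-")}


def similar_domains(domain: str) -> set:
    """Candidates for a 'name.tld' domain: single-error typos and gTLD swaps."""
    name, _, tld = domain.lower().rpartition(".")
    candidates = {f"{t}.{tld}" for t in single_error_typos(name)}
    candidates |= {f"{name}.{g}" for g in GTLDS if g != tld}
    return candidates


candidates = similar_domains("example.com")
print(len(candidates), "exmaple.com" in candidates, "example.org" in candidates)
```

Generating candidates up front makes the later lookup a simple set-membership test against the domains seen in the traces.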

Methodology – Identify Typo-squatting Domains
- But a similar domain is not enough!
- Random sample: more than 54% are not typo-squatting
- Need to identify hijacking intention

Methodology – Identify Typo-squatting Domain
- Identify hijacking indicator
  - Parked domain (ads listing): ~88%
  - Forwarding to other domains: ~8%
  - Others: inappropriate content, …
- Parked domain as the indicator

Methodology – Identify Typo-squatting Domain
- Similar Domain + Parked Domain = Typo-Squatting Domain

Methodology – Identify Typo-squatting Domain
- How to identify a parked domain?
- Parked Domain Classifier: 96%
- Presence of parking signatures: well-known parking signatures (domain names/URLs)

Methodology - Summary
- Identify similar domains
- Identify parked domains
- List of typo-squatting domains
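The summary amounts to intersecting two tests: a queried domain is flagged when it is similar to a target and is parked. A minimal sketch of that combination follows, assuming the single-error and gTLD-substitution rules from the earlier slide; `is_parked` is a hypothetical stand-in for the parked domain classifier, and all function names are illustrative.

```python
def is_single_error(a: str, b: str) -> bool:
    """True if b differs from a by one addition, deletion, substitution,
    or transposition of two adjacent characters."""
    if a == b:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True                                   # substitution
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]]
                and a[diff[1]] == b[diff[0]])             # transposition
    if abs(len(a) - len(b)) != 1:
        return False
    short, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition or deletion: removing one character from the longer gives the shorter.
    return any(longer[:i] + longer[i + 1:] == short for i in range(len(longer)))


def is_similar(candidate: str, target: str) -> bool:
    """Similar means a single-error typo of the name, or the same name under another TLD."""
    c_name, _, c_tld = candidate.rpartition(".")
    t_name, _, t_tld = target.rpartition(".")
    if c_tld == t_tld:
        return is_single_error(t_name, c_name)
    return c_name == t_name


def find_typo_squatting(queried, targets, is_parked):
    """Map each queried domain that is similar to a target AND parked to that target."""
    targets = set(targets)
    found = {}
    for domain in queried:
        if domain in targets:
            continue
        for target in targets:
            if is_similar(domain, target) and is_parked(domain):
                found[domain] = target
                break
    return found


# Hypothetical data: a fixed set plays the role of the classifier.
parked = {"exmaple.com", "example.org"}
queried = ["exmaple.com", "example.org", "example.com", "exampel.com", "unrelated.com"]
print(find_typo_squatting(queried, ["example.com"], parked.__contains__))
```

Here `exampel.com` is similar to the target but is not flagged because the stand-in classifier does not mark it as parked, mirroring the point that similarity alone is not enough.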


Parked Domain Classifier
- Build data set
- Extract core features
- Combine into classifier

Data Set
- The data set consists of 2,800 domains
- 700 are parked domains, collected from the MS Strider website
- 2,100 are non-parked domains, collected from the fourteen Yahoo Directory top categories

Feature Selection
- Heuristically identify common features in parked domains
- Compute the distribution of those features for verification
- Common Link Ratio Max


Combining Features Into Classifier
- Tried different classifier algorithms
  - Decision Tree
  - SVM
  - K-Nearest Neighbor
  - Random Forest (the best performance)

Measurements

Data Sets
- DNS traces: four months, ~30 million domains (~2 billion hits, ~30,000 users)
- Target domain set: Alexa's top 500 popular domains, ~53,000,000 hits

Typo-Squatting Domains & Hits
- 1,332 typo-squatting domains
- 13,431 hits (~110 a day)
- Is it large or small? Context: 500 target domains, 4-month period, ~30,000 users
- Given a similar ratio, this may translate to a non-trivial number:
  - 30,000 users => 110 per day
  - 300,000 users => 1,100 per day
  - 3,000,000 users => 11,000 per day (x 365 = ~4,000,000 a year)
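The extrapolation on this slide is a linear scaling of the observed per-user rate. The sketch below reproduces the arithmetic from the slide's own figures, assuming that four months is about 120 days and that the per-user rate stays constant as the population grows; both assumptions are simplifications, and the variable names are illustrative.

```python
OBSERVED_HITS = 13_431      # typo-squatting hits in the trace (from the slide)
OBSERVED_USERS = 30_000     # approximate user population (from the slide)
OBSERVED_DAYS = 4 * 30      # assumption: four months is roughly 120 days

hits_per_day = OBSERVED_HITS / OBSERVED_DAYS        # close to the slide's ~110 a day
rate_per_user_day = hits_per_day / OBSERVED_USERS


def projected_hits_per_day(users: int) -> float:
    """Scale the observed per-user daily rate linearly to a larger population."""
    return rate_per_user_day * users


for users in (30_000, 300_000, 3_000_000):
    per_day = projected_hits_per_day(users)
    print(f"{users:>9,} users: {per_day:>9,.0f} hits/day, {per_day * 365:>11,.0f} hits/year")
```

The slide rounds the daily rate down to 110 before scaling, so its yearly figure of about 4,000,000 is slightly lower than what the unrounded rate gives.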

Typo-squatting Ratio
- 0.025% of total number of queries
- (89%, ≤ 1%) (70%, ≤ 0.1%) (57%, ≤ 0.01%)
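The 0.025% figure is consistent with dividing the 13,431 typo-squatting hits by the ~53,000,000 hits to the target domains. The sketch below checks that and shows how pairs such as (89%, ≤ 1%) could be computed, under the assumption that they give the share of target domains whose individual ratio is at or below the threshold; the per-domain counts and domain names are invented for illustration.

```python
SQUAT_HITS = 13_431          # from the slides
TARGET_HITS = 53_000_000     # approximate hits to the target domains, from the slides

overall_ratio = SQUAT_HITS / TARGET_HITS
print(f"overall typo-squatting ratio: {overall_ratio:.3%}")


def share_at_or_below(ratios, threshold: float) -> float:
    """Fraction of per-domain ratios that are <= threshold."""
    ratios = list(ratios)
    if not ratios:
        return 0.0
    return sum(r <= threshold for r in ratios) / len(ratios)


# Hypothetical per-target counts: (typo-squatting hits, target hits).
per_domain = {
    "a.example": (5, 1_000_000),
    "b.example": (40, 20_000),
    "c.example": (300, 10_000),
    "d.example": (0, 500_000),
}
ratios = [squat / target for squat, target in per_domain.values()]

for threshold in (0.01, 0.001, 0.0001):
    share = share_at_or_below(ratios, threshold)
    print(f"{share:.0%} of targets have a ratio <= {threshold:.2%}")
```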

User Correction Ratio – Alexa
- % of typo-squatting queries are corrected
- ~51% of squatted target domains have most squat hits corrected

Potential Hit Loss
- Potential hit loss ratio = 0.012%
- (92%, ≤ 1%) (78%, ≤ 0.1%) (64%, ≤ 0.01%)

Potential Money Loss
- ~75% do not point to target domains
- Referring Typo-Sqt Ratio = 0.008%
- (96%, ≤ 1%) (91%, ≤ 0.1%) (81%, ≤ 0.01%)

Non-existing Similar Domains
- 8,285 potential hits (~500 non-existing typo domains)
- 0.015% of total number of queries
- (96%, ≤ 1%) (83%, ≤ 0.1%) (66%, ≤ 0.01%)

Typo-Squatting Distribution 19 % of all Typo-squatting hits

Top Ten Typo-squatting Domains 19 % of all Typo-squatting hits

Top Ten Target Domains
- Responsible for 55% of all typo-squatting queries of Alexa
- Million hits of “

Typo Characterization
- Most typos are single errors (95% vs. 5%)
- Most gTLD substitutions are "com" to "org" (50%)
- Additions: 37% are of non-adjacent keys
- Substitutions: 77% are of non-adjacent keys
- Substitutions: 13% are "a" and "o" (spelling error)
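Statistics like "77% of substitutions are of non-adjacent keys" need a notion of key adjacency. The sketch below uses a simplified QWERTY grid in which two letters are adjacent when their row and column indices each differ by at most one; this ignores the physical stagger between rows and is an assumption, as are the function names.

```python
# Letter rows of a QWERTY keyboard, treated as an unstaggered grid (simplification).
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def key_position(ch: str):
    """Return (row, column) of a letter on the grid, or None if it is not a letter key."""
    for row_index, row in enumerate(QWERTY_ROWS):
        col_index = row.find(ch)
        if col_index != -1:
            return row_index, col_index
    return None


def are_adjacent(a: str, b: str) -> bool:
    """True if the two keys are distinct and neighbors on the simplified grid."""
    pos_a, pos_b = key_position(a), key_position(b)
    if pos_a is None or pos_b is None or a == b:
        return False
    return abs(pos_a[0] - pos_b[0]) <= 1 and abs(pos_a[1] - pos_b[1]) <= 1


def describe_substitution(target: str, typo: str):
    """For a single-character substitution, return (intended, typed, adjacent?).

    Returns None when the two names do not differ by exactly one substitution.
    """
    if len(target) != len(typo):
        return None
    diffs = [i for i in range(len(target)) if target[i] != typo[i]]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    return target[i], typo[i], are_adjacent(target[i], typo[i])


print(describe_substitution("example", "exomple"))  # "a" typed as "o": far apart keys
print(describe_substitution("example", "exsmple"))  # "a" typed as "s": neighboring keys
```

An "a"/"o" swap lands on opposite sides of the keyboard, which fits the slide's reading of such substitutions as spelling errors rather than finger slips.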

Typo-squatting Domains – TP60
- 15,499 hits
- 0.045% of total number of queries
- (76%, ≤ 1%) (60%, ≤ 0.5%)


Future Work
- How much of the ads budget goes to squatters?
- Enhance our identification technique
- See if the results hold at other ISPs
- Typo modeling for getting traffic back


Related Work
- MS Strider Project [Wang et al., SRUTI 06]
- McAfee Study [Keats, McAfee White Paper 07]
- JAAL project [Banerjee et al., Infocom 08]

Conclusion

- Accurately and automatically identify typo-squatting domains
- Measure how much traffic goes to typo-squatters
- Bound on how much traffic the target domain is losing to typo-squatting: inconsequential