TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation By Prudhvi raju G id: 401689301.

OBJECTIVE: Provide a reliable and reproducible ranking list for research. Tranco is a research-oriented top sites ranking hardened against manipulation: a stable, reproducible list that stays consistent with existing rankings and resists manipulation.

Background: Multiple commercial providers publish rankings of popular domains, each composed with a different method. Around 133 top-tier studies over the past 4 years based their experiments and conclusions on data from these providers. The providers covered by the research are Alexa, Majestic, Cisco Umbrella, and Quantcast.

Background: Alexa and Cisco Umbrella. Alexa: An American web traffic analysis company and a subsidiary of Amazon. Its ranks are based on traffic data from a global data panel. Cisco Umbrella: Its ranks are based on DNS traffic to its two DNS resolvers (marketed as OpenDNS), claimed to amount to over 100 billion daily requests from 65 million users.

Background: Majestic and Quantcast. Majestic: Has published the daily updated ‘Majestic Million’ list of one million websites since October 2012. Its ranks are based on backlinks: sites are ranked by the number of class C (IPv4 /24) subnets that refer to the site at least once. Quantcast: Its list covers sites where Quantcast directly measures traffic through a tracking script, as well as sites where it estimates traffic based on data from ‘ISPs and toolbar providers’, including the number of users. The list also includes ‘hidden profiles’, where a site is ranked but its domain is hidden.
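
To make the /24 counting concrete, here is a minimal Python sketch of ranking by distinct referring subnets. It is an illustration only, not Majestic's actual pipeline: the function name `rank_by_referring_subnets`, the `(referring_ip, target_domain)` input format, and the sample addresses are all assumptions introduced for this example.

```python
import ipaddress
from collections import defaultdict


def rank_by_referring_subnets(backlinks):
    """Rank domains by how many distinct IPv4 /24 subnets link to them.

    `backlinks` is a hypothetical iterable of (referring_ip, target_domain)
    pairs; a real crawler would produce something far richer.
    """
    subnets = defaultdict(set)
    for ip, domain in backlinks:
        # Collapse the referring address to its enclosing /24 network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        subnets[domain].add(network)
    # More distinct subnets gives an earlier position; ties are broken by name.
    return sorted(subnets, key=lambda d: (-len(subnets[d]), d))


backlinks = [
    ("192.0.2.10", "example.org"),
    ("192.0.2.77", "example.org"),    # same /24 as the line above, counted once
    ("198.51.100.5", "example.org"),
    ("203.0.113.9", "example.net"),
]
print(rank_by_referring_subnets(backlinks))
```

Because many links from one subnet count only once, an attacker has to obtain links from hosts spread across many subnets rather than many links from a single host.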

Background: Ideally, domain rankings would perfectly reflect the popularity of websites, free from any biases. The properties of these lists to consider when using them for security research are: similarity, stability, representativeness, responsiveness, and benignness.

Problem: Classification of list usage. Security studies often rely on lists of top-ranked sites. Around 133 research papers use these lists for various purposes in their research. The papers are classified by four purposes of list usage. Prevalence: the proportion of sites affected by an issue. Evaluation: the list serves to test an attack or defense. Whitelist: the list is a source of benign websites. Ranking: the exact ranks of sites are mentioned.

Problem: Influence on security studies. Most studies do not say when the list was downloaded, which hampers their reproducibility. Incentives: malicious users have an incentive to manipulate the lists in order to influence studies that inform policy makers and governments. Case study: a long tail of fingerprinting scripts is largely unblocked by current privacy tools.

Problem: Large-scale manipulation. Manipulating rankings becomes a prime vector for influencing security research. These manipulations can be carried out with minimal effort and at low cost. A few such manipulation techniques, applied to each list, follow.

Alexa: manipulation. Alexa ranks domains based on traffic data from two sources. Traffic Rank: a browser extension that reports all page visits. Certify: an analytics service that uses a tracking script to count all visits on subscribing websites. Extension: the extension was installed in a Chrome browser instance to get a domain included in the ranking list, achieving a rank as high as 370461 with 12 requests. Certify: using the service requires a subscription; it achieved a rank of up to 28798 within 52 days.

Cisco Umbrella: manipulation. Ranks websites by the number of unique client IPs issuing DNS requests. Cloud providers: a pool of IP addresses obtained from service instances (AWS) achieved a rank of 200000. Alternatives: Tor and IP spoofing.

Majestic: manipulation. Ranks are based on the number of subnets hosting a website that links to the ranked domain. Backlinks: the same links that are used in SEO to reach a higher rank position. Reflected URLs: pages that reflect a URL supplied as a GET parameter into a link. Alternatives: hosting one's own sites, and pingbacks.

Quantcast: manipulation. Ranks are based on traffic data. Quantified: Quantcast mainly obtains traffic data through its tracking script, which webmasters install on their websites. Alternatives: possibly influencing the data that Quantcast receives from ISPs and toolbar providers.

Solution: Defend existing rankings against manipulation with TRANCO, an improved, efficient, and stable ranking that is suitable for research and hardened against manipulation. Combination options: the Borda count and the Dowdall rule. Filters can be added to create a list that represents a desired subset of popular domains; the available filter options include status code, domain length, and content length.
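
The two combination options can be sketched as follows. This is a minimal illustration of the textbook rules, assuming that a domain missing from a list scores zero for that list; the function name `combine_rankings` and the sample domains are made up for the example, and this is not claimed to be the exact code Tranco runs.

```python
from collections import defaultdict


def combine_rankings(rankings, method="dowdall"):
    """Merge several ranked lists into one by summing per-list scores.

    Borda count: position p in a list of length n earns n - p + 1 points.
    Dowdall rule: position p earns 1 / p points.
    Assumption: a domain absent from a list earns nothing from that list.
    """
    if method not in ("borda", "dowdall"):
        raise ValueError(f"unknown method: {method}")
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, domain in enumerate(ranking, start=1):
            if method == "borda":
                scores[domain] += n - position + 1
            else:
                scores[domain] += 1.0 / position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-entry lists from three providers.
list_a = ["a.example", "b.example", "c.example"]
list_b = ["b.example", "a.example", "d.example"]
list_c = ["a.example", "d.example", "b.example"]

print(combine_rankings([list_a, list_b, list_c], method="borda"))
print(combine_rankings([list_a, list_b, list_c], method="dowdall"))
```

The same function accepts any number of input lists, so it could also be fed the daily lists of one provider to average ranks over a time window.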

Solution: Malicious domains: domains on the Google Safe Browsing list are removed from the generated list. Evaluation: the characteristics relevant to security studies are validated. Similarity: the final result is unbiased and is about equally similar to each of the source lists. Stability: averaging the rankings over 30 days provides a more stable list. Reproducibility: a citation template and a short link are generated for every list. Manipulation: the manipulation effort needs to be quadrupled to insert a website into the list.
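
Two of these steps are simple enough to sketch in Python: dropping flagged domains and measuring how much of a list survives from one day to the next. The `flagged` set below is a hand-written stand-in for a blocklist such as Google Safe Browsing (no real API is called), and both function names and all sample data are assumptions made for illustration.

```python
def remove_flagged(ranking, flagged):
    """Return the ranking without flagged domains, preserving order."""
    return [domain for domain in ranking if domain not in flagged]


def overlap(day_one, day_two):
    """Fraction of day_one's domains that are still present in day_two."""
    first = set(day_one)
    if not first:
        return 0.0
    return len(first & set(day_two)) / len(first)


# Hypothetical lists for two consecutive days.
monday = ["a.example", "b.example", "c.example", "d.example"]
tuesday = ["a.example", "c.example", "e.example", "b.example"]
flagged = {"c.example"}  # stand-in for a real blocklist lookup

print(remove_flagged(monday, flagged))
print(overlap(monday, tuesday))
```

An overlap close to 1.0 across consecutive days indicates a stable list; averaging ranks over a longer window is one way to push this value up.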

Consideration: The generated list is a byproduct of the existing lists. It limits the effect of manipulation on the final list: the effort needs to be quadrupled to obtain a ranking in the new list, because all the ranking providers need to be manipulated proportionally for a domain to appear in the combined list. The responsiveness of the lists is not addressed.

Criticism: The list is a byproduct of existing lists. An automated machine learning algorithm using the Amber loom domain analyzer could analyze and scan new websites. Grouping websites by category and keeping them under an observation period for various validations before they enter the candidate list can help keep malicious domains out of the list. Add-on functionality such as Valbot.com could be included; it provides domain name valuations and reports globally on site value, traffic, PageRank, malware, whois data, SEO, and social media presence.