What is WEB SPAM

Many slides are from a lecture by Marc Najork, Microsoft: “Detecting Spam Web Pages”

What do Web Spammers do?
Diagram labels: THE WEB, Inverted Index, Search Engine Servers, Query, Document IDs
- Index the documents
- Get indices for relevant documents
- Retrieve full text of relevant documents
- Display results on a web page
- Rank results
Web spammers target the last step
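
The steps on this slide can be sketched as a toy search pipeline. This is a minimal illustration, not how any real engine works: the document IDs, the whitespace tokenizer, and the scoring rule (count of query-term occurrences) are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Index the documents: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(query, docs, index):
    """Look up candidate document IDs, fetch their text, and rank them."""
    terms = tokenize(query)

    # Get document IDs for documents that contain at least one query term.
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())

    # Retrieve the full text and rank by a naive score (assumed for this sketch).
    def score(doc_id):
        words = tokenize(docs[doc_id])
        return sum(words.count(term) for term in terms)

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


# Hypothetical three-document "web".
docs = {
    "d1": "web spam detection with content analysis",
    "d2": "link analysis for web search",
    "d3": "cooking pasta at home",
}
index = build_inverted_index(docs)
results = search("web spam", docs, index)
print(results)
```

Because the ranking step depends only on signals the page author controls here, it is the natural place for a spammer to intervene.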

Web spam (you know it when you see it)

Defining Web Spam
A spam web page is a page created for the sole purpose of attracting search engine referrals (to this page or some other “target” page)
- Ultimately a judgment call
- Some web pages are borderline useless
- Some pages look fine in isolation, but in context are clearly “spam”

Spamming Techniques
Boosting Rank:
- Term Spamming
- Link Spamming
Hiding Spam:
- Content Hiding
- Cloaking
- Redirecting

Boosting Rank by Term Spamming
- Editing the textual content
- The search engine looks for relevant terms in various fields
- Different fields are weighted differently
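
To see why the field a term appears in matters, here is a small sketch of field-weighted scoring. The field names and the weights are hypothetical values chosen for illustration; the slides do not say which fields or weights any particular engine uses.

```python
# Hypothetical field weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


def field_score(page, query_terms, weights=FIELD_WEIGHTS):
    """Sum query-term occurrences per field, scaled by that field's weight."""
    score = 0.0
    for field, weight in weights.items():
        words = page.get(field, "").lower().split()
        score += weight * sum(words.count(term) for term in query_terms)
    return score


# Two made-up pages: one ordinary, one with query terms packed into the title.
honest = {
    "title": "Rome travel guide",
    "body": "Tips on cheap flights and hotels in Rome",
}
spammed = {
    "title": "cheap flights cheap flights",
    "body": "cheap flights",
}

query = ["cheap", "flights"]
honest_score = field_score(honest, query)
spammed_score = field_score(spammed, query)
print(honest_score, spammed_score)
```

Under these assumed weights, editing only the title is enough to push the second page far ahead of the first for this query.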

Term Spam: Keyword stuffing
- Search engines return pages that contain query terms (certain caveats and provisos apply …)
- One way to get more search engine (SE) referrals: create pages containing popular query terms (“keyword stuffing”)
- Three variants:
  - Hand-crafted pages
  - Completely synthetic pages
  - Assembling pages from “repurposed” content
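
One way to make keyword stuffing concrete is to measure what fraction of a page's words come from a list of popular query terms. The sketch below is an illustrative signal only; the term list and both sample texts are invented, and the slides do not present this as a detection method.

```python
def keyword_density(text, popular_terms):
    """Return the fraction of words in `text` that appear in `popular_terms`."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in popular_terms)
    return hits / len(words)


# Hypothetical list of popular query terms.
popular = {"cheap", "flights", "hotels", "insurance"}

normal = "we booked flights to rome last spring and loved the food"
stuffed = "cheap flights cheap hotels cheap insurance flights hotels insurance"

normal_density = keyword_density(normal, popular)
stuffed_density = keyword_density(stuffed, popular)
print(normal_density, stuffed_density)
```

A page built only from popular terms scores far higher than ordinary prose, though a real page mixes both, so any cutoff would be a judgment call.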

Synthetic content for keyword stuffing
- Monetization
- Random words
- Well-formed sentences stitched together
- Links to keep crawlers going

More examples of synthetic content
- Someone’s wedding site!

Really good synthetic content
- Links to keep crawlers going
- Grammatically well-formed but meaningless sentences
- “Nigritude Ultramarine”: An SEO competition

Boosting Rank by Link Spamming
- Link structure → importance
- Outgoing links
- Incoming links
- Use directories
- Link exchange and spam farms

Link Spam
Inflating the rank of a page by creating nepotistic links to it
- From own sites: Link farms
- From partner sites: Link exchanges
- From unaffiliated sites (e.g. blogs, guest books, web forums)
The more links, the better
- Generate links automatically
- Use scripts to post to blogs
- Synthesize entire web sites
- Synthesize many web sites (DNS spam)
The more important the linking page, the better
- Buy expired highly-ranked domains
- Post links to high-quality blogs
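
The effect of a link farm can be sketched with a simplified PageRank-style computation. PageRank is used here only as a familiar example of link-based importance; the slides do not name a specific ranking algorithm, and the graph, the damping factor of 0.85, and the node names are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of node -> outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical small web: "target" is the page the spammer wants to boost.
base = {"a": ["b"], "b": ["a", "target"], "target": ["a"]}

# Same web plus a 20-page link farm whose only purpose is to link to "target".
farm = dict(base)
for i in range(20):
    farm[f"farm{i}"] = ["target"]

base_rank = pagerank(base)
farm_rank = pagerank(farm)
print(base_rank["target"], farm_rank["target"])
```

Each farm page has almost no rank of its own, which is why the slide's second lever, links from pages that are already important, is worth more per link than sheer volume.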

Inflate rank: Link farms, link exchanges

Inflate rank: Expired domains

Inflate rank: Web forum and blog spam

Hiding Spam
- Invisible content
- Cloaking: serve a different page to a crawler than to a browser
Techniques:
- Recognize that a page request is from a search engine (based on “user-agent” info or on IP address)
- Make some text invisible (e.g. black on black)
- Use CSS to hide text
- Use JavaScript to rewrite the page (dynamically created)
- Use “meta-refresh” to redirect the user to another page
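
User-agent-based cloaking, as described above, suggests a simple check: request the same URL once as a crawler and once as a browser, then compare what comes back. The sketch below assumes a caller-supplied `fetch(url, user_agent)` function; the user-agent strings, the word-overlap measure, the 0.5 threshold, and the example URLs are all hypothetical.

```python
CRAWLER_UA = "ExampleBot/1.0"                  # hypothetical crawler user-agent
BROWSER_UA = "Mozilla/5.0 (example browser)"   # hypothetical browser user-agent


def jaccard(text_a, text_b):
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical sets)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def looks_cloaked(url, fetch, threshold=0.5):
    """Flag a URL whose crawler view and browser view share too few words."""
    as_crawler = fetch(url, CRAWLER_UA)
    as_browser = fetch(url, BROWSER_UA)
    return jaccard(as_crawler, as_browser) < threshold


def fake_fetch(url, user_agent):
    """Stand-in for a real HTTP client so the sketch runs without a network."""
    if url == "http://cloaked.example":
        if "Bot" in user_agent:
            return "cheap flights cheap hotels best deals"
        return "win a prize click here now"
    return "welcome to our honest travel blog"


print(looks_cloaked("http://cloaked.example", fake_fetch))
print(looks_cloaked("http://honest.example", fake_fetch))
```

This only catches cloaking keyed on the user-agent string. A site that recognizes crawlers by IP address would serve both requests the same page, and legitimate pages with rotating ads or timestamps may differ between fetches, so the threshold would need tuning.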

Why should we care about Web spam?
- We depend on search engines and trust them
- Web spam undermines the reputation of a trusted information source