Ao-Jan Su † Y. Charlie Hu ‡ Aleksandar Kuzmanovic † Cheng-Kok Koh ‡ † Northwestern University ‡ Purdue University How to Improve Your Google Ranking: Myths and Reality

Presentation transcript:

How to Improve Your Google Ranking: Myths and Reality
Ao-Jan Su†, Y. Charlie Hu‡, Aleksandar Kuzmanovic†, Cheng-Kok Koh‡
†Northwestern University, ‡Purdue University

Motivation
● Internet search engines (e.g., Google) drive users to highly ranked pages
● Search engines’ ranking results greatly influence how people acquire knowledge from the Internet [Pan ‘07]
● It is desirable to understand how a search engine ranks web pages
● Search engines’ ranking algorithms are proprietary
  ■ Publicly available information is very limited and outdated

Current Approaches
● Guesswork by webmasters
  ■ Trial and error
  ■ Inefficient
● Based on the experience of search engine optimization (SEO) experts
Lack of systematic studies leads to folklore

Various Ranking Feature Opinions
● SEO experts
● Survey of Internet users
● Individual Internet marketing expert

Goals & Challenges
● Goals
  ■ Systematically approximate a search engine’s ranking results
  ■ Identify the importance of ranking factors
● Reverse-engineering a search engine’s ranking algorithm can be very complicated
  ■ Numerous ranking factors
    − Google claims to have over 200 ranking factors
  ■ Sophisticated ranking functions

Our Approach
● Build our own ranking system to approximate search engines’ ranking results
  ■ Learning models: linear programming, SVM
  ■ Recursive partitioning algorithm: captures non-equational behavior of ranking functions
  ■ New ranking system: generates our own ranking results and compares them to Google’s

System Architecture
● Components of our ranking system
  ■ Crawler
  ■ Ranking engine
Can we approximate Google’s ranking results (top 10 pages) by using our own ranking system?

Ranking Features

Learning Models
● Linear programming model
  ■ Minimize the distance between our ranking system and Google’s
  ■ Minimize objective function
● Support vector machine (SVM) learning models
  ■ General technique for learning to rank programs
  ■ Support linear and polynomial kernels
Objective function annotations:
  ■ Weight: highly ranked pages are more important
  ■ Ranking difference between the 2 pages
  ■ Decision function: out of order => penalty
  ■ Sum up the penalties
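The objective function itself did not survive in this transcript, so the following is only a minimal Python sketch of the idea the annotations describe: a weighted sum of penalties over page pairs that our scores place out of order. The function name `pairwise_penalty`, the `1 / rank` weight, and the product of weight and rank gap are all assumptions made for illustration, not the authors' formula.

```python
from itertools import combinations


def pairwise_penalty(scores, weight=lambda rank: 1.0 / rank):
    """Sum weighted penalties for page pairs that our scores put out of order.

    scores[i] is our score for the page that the reference engine ranks at
    position i + 1 (so index 0 is the reference engine's top result).
    The default weight (1 / rank) is a hypothetical choice that makes
    mistakes involving highly ranked pages cost more.
    """
    total = 0.0
    for i, j in combinations(range(len(scores)), 2):
        # The reference engine ranks page i above page j because i < j.
        if scores[i] < scores[j]:       # decision function: out of order
            rank_gap = j - i            # ranking difference between the 2 pages
            total += weight(i + 1) * rank_gap
    return total                        # sum of the penalties


# Scores that fully agree with the reference order incur no penalty.
print(pairwise_penalty([3.0, 2.0, 1.0]))
# Scores that fully reverse the reference order incur the largest penalty.
print(pairwise_penalty([1.0, 2.0, 3.0]))
```

A learning model would search for feature weights that make this kind of penalty small; the sketch only evaluates the penalty for a given set of scores.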

Recursive Partitioning Algorithm
● Multiple layers of indices
● Non-equational ranking algorithm
Flowchart steps:
  ■ While we need to partition the set of |S| pages
  ■ Train or apply ranking models to the set of |S| pages
  ■ Partition the |S| pages into top half and bottom half
  ■ Return top half of the |S| pages and continue the recursion
  ■ The algorithm ends when we have found the top X pages
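The flowchart steps on this slide can be sketched roughly as follows in Python. The function name `recursive_partition`, the `score(page, round_no)` callback (standing in for a ranking model trained for each round), and the stopping rule are assumptions; the slide does not specify them.

```python
def recursive_partition(pages, score, top_x):
    """Keep the better-scoring half of the pages until about top_x remain.

    score(page, round_no) is a hypothetical per-round ranking model:
    each round may use a different set of weights.
    """
    candidates = list(pages)
    round_no = 0
    # Keep halving while the top half would still contain at least top_x pages.
    while len(candidates) // 2 >= top_x:
        ranked = sorted(candidates, key=lambda p: score(p, round_no), reverse=True)
        candidates = ranked[: len(ranked) // 2]   # return the top half
        round_no += 1
    # Final round: rank the surviving pages and report the top X.
    ranked = sorted(candidates, key=lambda p: score(p, round_no), reverse=True)
    return ranked[:top_x]


# Toy usage: 100 pages identified by integers, lower numbers score higher.
print(recursive_partition(range(100), lambda page, round_no: -page, 10))
```

Under this stopping rule, 100 crawled pages shrink to 50, 25, and then 12 candidates before the final top 10 is selected.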

Experimental Evaluation
● Evaluate different ranking models
  ■ Which model has better prediction accuracy?
● Evaluate the effectiveness of the recursive partitioning algorithm
  ■ Can the recursive partitioning algorithm improve prediction accuracy?
● Evaluate the relative weights of ranking features
  ■ Which ranking feature is more important?

Experimental Setup
● Crawl the top 100 pages for each of 60 random keywords
● Randomly select 15 keywords as the training set and use the remaining 45 keywords as the testing set
● Evaluate the accuracy of our ranking system by predicting Google’s top 10 pages for each keyword in the testing set
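A minimal Python sketch of this setup is shown below. The helper names `split_keywords` and `top_k_hits` are hypothetical, and counting a "hit" as any page that appears in both top-10 lists regardless of position is an assumption about how accuracy is measured.

```python
import random


def split_keywords(keywords, n_train, seed=0):
    """Randomly split keywords into a training set and a testing set."""
    shuffled = list(keywords)
    random.Random(seed).shuffle(shuffled)   # fixed seed for repeatability
    return shuffled[:n_train], shuffled[n_train:]


def top_k_hits(predicted, reference, k=10):
    """Count pages in our top k that also appear in the reference top k."""
    return len(set(predicted[:k]) & set(reference[:k]))


# 60 placeholder keywords, 15 for training and 45 for testing.
train, test = split_keywords([f"kw{i}" for i in range(60)], n_train=15)
print(len(train), len(test))

# Toy rankings that share seven pages in their top 10.
ours = list("abcdefghij")
reference = list("abcdefgxyz")
print(top_k_hits(ours, reference))
```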

Comparisons of Ranking Models
● Our customized linear learning model performs better than the SVM-linear model
● The polynomial model performs better than both linear models, at the cost of:
  (1) Significant increase in learning time
  (2) No human-readable equations
● For 78% of the explored keywords, our ranking system successfully predicts 7 or more pages within the top 10 pages

The Power of Recursive Partitioning
● The recursive partitioning algorithm improves the accuracy of the ranking system in every round
● Three rounds of recursive partitioning successfully “smooth out” the non-linearity of Google’s ranking algorithm and achieve high prediction accuracy

Weights in Different Rounds in a Linear Model
● In different rounds, the learning model produces a different set of weights
● Page rank score, keyword in title, and hostname are the top 3 ranking features
● A keyword in the meta-description tag matters, but one in the meta-keyword tag does not

Case Studies
● Can we improve our ranking system’s accuracy by isolating a subset of ranking features?
  ■ Example: remove the age factor by focusing on “young” pages
● Can we use our ranking system to detect biases in search engines’ ranking algorithms?
  ■ Example: blogs
● Can we validate or disprove new ranking features?
  ■ Example: HTML syntax errors

Isolating Subsets of Ranking Features
● We crawl web pages that are at most 24 hours old to remove the ranking features of age and page rank
● Our ranking system’s hit rate improves to 80% for 92% of evaluated keywords
● When the ranking features are more specific, our ranking system performs better

Negative Bias Toward Blogs
● We categorize web pages into different categories (e.g., blogs, news, and music) and add a new ranking feature (hypothesis) to our ranking system
● The accuracy of our ranking system improves, and the weight of the new ranking feature (blog) is negative

HTML Syntax Errors Do Not Matter
● We add a new ranking feature (hypothesis) for the number of HTML syntax errors in each web page
● The performance of the new ranking model is very close to that of the original one, so the new ranking feature makes no noticeable difference

Conclusions
● In this work, we show that it is possible to systematically approximate Google’s ranking results with high accuracy
  ■ By a linear learning model combined with a recursive partitioning scheme
● We reveal the relative importance of ranking features in Google’s ranking function
● We illustrate that our system can validate or disprove ranking features and detect ranking bias

Thank you!

Backup Slides

Linear Programming Model

Query Keywords