Impact of Search Engines on Page Popularity
Junghoo Cho and Sourashis Roy (2004)
Presented by: Manoj Kumar & Harsha Vardhana

Overview
- Introduction
- PageRank & Popularity
- Popularity Evolution
  - Experimental study
  - Theoretical study
- Interesting Facts
- Conclusion

“If your page is not indexed by Google, your page does not exist on the web.”
- How do search engines rank web pages for a given query?
- Judgment of “quality” and “relevance”…
- The “rich-get-richer” phenomenon

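PageRank & Popularity
- Number of links to a page
- Weighted links (PageRank):
$$PR(p_i) = d + (1 - d)\left[\frac{PR(p_1)}{c_1} + \cdots + \frac{PR(p_m)}{c_m}\right]$$
where $p_1, \ldots, p_m$ are the pages that link to $p_i$, $c_j$ is the number of outgoing links on page $p_j$, and $d$ is the damping factor (formula taken from the paper).
- Popularity means…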
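The formula can be evaluated by simple fixed-point iteration. The sketch below is a minimal illustration, not the paper's implementation. It assumes a damping value of d = 0.15, since the slide gives none. It also assumes a hypothetical four-page graph in which every page has at least one outgoing link. Every page starts at PR = 1, as in the experimental setup. When no page lacks outgoing links, the scores of this unnormalized form sum to the number of pages.

```python
def pagerank(links, d=0.15, iterations=100):
    """Iterate PR(p_i) = d + (1 - d) * sum(PR(p_j) / c_j) over the pages p_j linking to p_i.

    links maps each page to the list of pages it links to (its outgoing links).
    d = 0.15 is an assumed damping value, not one given on the slide.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    pr = {p: 1.0 for p in pages}  # every page starts with PageRank 1
    for _ in range(iterations):
        new_pr = {p: d for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # a page without outgoing links passes nothing on
            share = pr[src] / len(targets)  # PR(p_j) / c_j
            for dst in targets:
                new_pr[dst] += (1 - d) * share
        pr = new_pr
    return pr


# Hypothetical four-page graph: A links to B and C, B and D link to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C collects links from three pages and ends up with the highest score. Page D has no incoming links, so it keeps only the baseline value d.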

A sample query: suppose we have the query “The Bugatti Veyron”. A page I built out of interest:

How does the popularity of web pages evolve?

Experimental setup
- Downloaded pages from 154 web sites.
- Downloaded pages until either no more pages were reachable or a maximum of 200,000 pages was reached.
- Downloaded about 5 million pages (nodes), but the web graphs of the snapshots contain around 13 million to 15 million nodes. (In the graph, a node is a web page and an edge is an outgoing link.)
- About 7.8 million pages are common to both snapshots.
- The initial PageRank of every page is assumed to be 1.

Sample snapshot (web pages and their links). The graph drawn may contain more nodes than the snapshot itself.

Experimental study
Two snapshots, $S_1$ and $S_2$, were taken 7 months apart. $S_1$ contains 13 million nodes and $S_2$ contains 15 million nodes. Because we are interested in how the popularity of a page changes, only the 7.8 million nodes common to both snapshots are considered.
Using incoming links (IL):
- Total incoming links to a group: $IL(G_i, S_j) = \sum_{p \in G_i} IL(p, S_j)$
- Increase in popularity of a group: $IL(G_i, S_2) - IL(G_i, S_1)$ (formula taken from the paper)
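The group-level IL metric can be sketched in a few lines. This is a hypothetical illustration. The slides do not say how pages are assigned to groups, so the function assumes the common pages are sorted by their incoming links in $S_1$ and split into equal-size groups, most popular first. Each snapshot is a list of (source, target) link pairs, and the toy snapshots below are made up.

```python
from collections import Counter


def incoming_links(edges):
    """IL(p, S): number of incoming links of each page in a snapshot of (src, dst) edges."""
    return Counter(dst for _, dst in edges)


def group_il_increase(edges_s1, edges_s2, common_pages, num_groups):
    """Return IL(G_i, S2) - IL(G_i, S1) for each group G_i of the common pages.

    Assumption: groups are formed by sorting the common pages by IL in S1
    (most popular first) and splitting them into num_groups equal-size chunks.
    """
    il1, il2 = incoming_links(edges_s1), incoming_links(edges_s2)
    ranked = sorted(common_pages, key=lambda p: il1[p], reverse=True)
    if not ranked:
        return []
    size = -(-len(ranked) // num_groups)  # ceiling division
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    # IL(G_i, S) is the total of IL(p, S) over the pages p in G_i.
    return [sum(il2[p] for p in g) - sum(il1[p] for p in g) for g in groups]


# Toy snapshots: page "a" is already the most linked-to page in S1 and gains two more links.
s1 = [("x", "a"), ("y", "a"), ("z", "b"), ("c", "z")]
s2 = [("x", "a"), ("y", "a"), ("z", "a"), ("c", "a"), ("z", "b"), ("c", "z")]
print(group_il_increase(s1, s2, ["a", "b", "c"], num_groups=3))
```

In this made-up example, only the top group gains links, which is the rich-get-richer pattern the experiment looks for.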

Results (using IL as the metric). All results are taken from the paper.

Results cont. (detailed view)

Results for the relative increase in popularity

PageRank as metric
Two snapshots, $S_1$ and $S_2$, taken 7 months apart, using PageRank (PR):
- PageRank of a group: $PR(G_i, S_j) = \sum_{p \in G_i} PR(p, S_j)$

Results using PR as the metric

Results cont. (detailed view)

Results for the relative increase in PageRank

If search engines did not rank pages by their current popularity, would popular pages still get more popular?

Theoretical study
- Random-surfer model
- Search-dominant model

Random-surfer model
- Popularity: $P(p,t)$
- Visit popularity (number of visits to page $p$ at time $t$): $V(p,t) = r_1 P(p,t)$, where $r_1$ is a normalization constant

Google Popularity Evolution

Popularity evolution in the search-dominant model
Proposition: Under the search-dominant model, the number of visits to page $p$ at time $t$ satisfies
$$V(p,t) = r_2 \, P(p,t)^{9/4}$$
where $r_2$ is a normalization constant.
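Comparing the two visit formulas makes the bias concrete. Dividing them gives $V_{sd}/V_{rs} = (r_2/r_1)\,P(p,t)^{5/4}$, so the less popular a page is, the smaller its relative share of visits under the search-dominant model. The sketch below simply evaluates both formulas. Setting $r_1 = r_2 = 1$ is an assumption for illustration only, because the real normalization constants depend on the total number of visits and are not given on the slides.

```python
def visits_random_surfer(popularity, r1=1.0):
    """Random-surfer model: V(p,t) = r1 * P(p,t)."""
    return r1 * popularity


def visits_search_dominant(popularity, r2=1.0):
    """Search-dominant model: V(p,t) = r2 * P(p,t) ** (9/4)."""
    return r2 * popularity ** (9 / 4)


# r1 = r2 = 1 is an illustrative assumption; the ratio then equals P ** (5/4).
for p in (1e-1, 1e-2, 1e-3, 1e-4):
    ratio = visits_search_dominant(p) / visits_random_surfer(p)
    print(f"P = {p:g}: search-dominant / random-surfer visits = {ratio:.2e}")
```

Each tenfold drop in popularity cuts the ratio by a factor of $10^{5/4} \approx 17.8$.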

A test result

Closer look at the result

This is TRUE! The result predicts that it takes a page 66 times longer to become popular under the search-dominant model than under the random-surfer model!

Taken together, these results suggest that search engines (an indispensable tool) play a significant role in the survival of a web page. Under the search-dominant model, once a page reaches a reasonable ranking, its popularity increases very quickly.

Share Of Searches: July

Can a competitor harm other sites' rankings in Google?

Conclusion
- Popular pages are getting more popular.
- Unpopular pages are getting relatively less popular.
- Many high-quality pages are ignored because no one has discovered them yet.
- There is an urgent need to develop a new ranking mechanism.