VIPAS: Virtual Link Powered Authority Search in the Web Chi-Chun Lin and Ming-Syan Chen Network Database Laboratory National Taiwan University.



M.-S. Chen, NTU

Outline
- Motivation and Goal
- Preliminaries and Related Work
  - Introduction to Link Analysis
- Defects of Traditional Link Analysis and Ideas for Improvement
- System Framework and Algorithms
- Implementation and Experimental Results
- Conclusions

Motivation and Goal
- Goal: find the most relevant pages satisfying the user's information need on the Web
- Traditional means for this task: keyword-based search engines
- Problem: some relevant pages do not contain the keywords in the page text
- An alternative method: analyze the links contained in Web pages instead of ranking by keywords

HITS (1/3)
- Authority pages: pages pointed to by many other pages
- Hub pages: pages pointing to many other pages
- Mutual reinforcement
  - An authority pointed to by many hub pages is an even better authority
  - A hub pointing to many authority pages is an even better hub
  - Based on this argument, the goal of HITS is to find the set of best authority pages

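HITS (2/3)
- Let $x_p$ and $y_p$ denote the authority and hub scores of page p, respectively
- Authority update: $x_p := \sum_{q \to p} y_q$ (sum of $y_q$ over all pages q that link to p)
- Hub update: $y_p := \sum_{p \to q} x_q$ (sum of $x_q$ over all pages q that p links to)
[Figure: page p with neighboring pages q1, q2, q3, drawn once for each update rule]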

HITS (3/3)
- Iterative algorithm
  1. Obtain a set of Web pages using a keyword-based query and expand it to form a base set
  2. Assign each page of the base set an initial authority and hub score of 1
  3. According to its links, update the scores of each page
  4. Normalize the scores so that $\sum_p (x_p)^2 = 1$ and $\sum_p (y_p)^2 = 1$, summing over all p in the base set
  5. Repeat steps 3 and 4 until the scores converge
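The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the base set is assumed to be given already as a dictionary of outgoing links, the hub update is assumed to use the freshly updated authority scores, and the names `hits`, `max_iter`, and `tol` are hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores so that the sum of their squares is 1 (step 4)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, max_iter=100, tol=1e-9):
    """Run HITS on a base set.

    links maps each page to the collection of pages it points to.
    Returns (authority, hub) dictionaries keyed by page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Step 2: every page starts with authority and hub score 1.
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(max_iter):
        # Step 3a: x_p := sum of y_q over all q that link to p.
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # Step 3b: y_p := sum of x_q over all q that p links to
        # (assumption: the new authority scores are used here).
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}

        # Step 4: normalize both score vectors.
        new_auth = _normalize(new_auth)
        new_hub = _normalize(new_hub)

        # Step 5: stop once no score changes by more than tol.
        change = max(
            max(abs(new_auth[p] - auth[p]) for p in pages),
            max(abs(new_hub[p] - hub[p]) for p in pages),
        )
        auth, hub = new_auth, new_hub
        if change < tol:
            break
    return auth, hub
```

For example, with `links = {"h1": ["a", "b"], "h2": ["a"], "h3": ["a"]}`, page `a` receives a higher authority score than `b`, and `h1` receives the highest hub score.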

The Problem with HITS
- Links in Web pages reflect only the page creators' judgment
- Sometimes a link is left out of a page even though its destination is very relevant
  - e.g., a company's homepage will not link to its competitors in the same industry
- We argue: page readers' judgment should be of equal importance

The Notion of Virtual Links
- The basic idea
  - Identify pages that are heavily accessed within a period, and form a "hot set" from these pages
  - Create "virtual links" for pages in the hot set and incorporate them into the computation of authority scores
- Design a Web warehouse for this task and use it to identify authoritative Web pages

System Framework
[Figure: system architecture. Components: Web Pages, Page Archive, Keyword & Ranking Database, Authority Evaluator, Query Interface, Clickstream Database, Clicking Observer, Virtual Link Creator. Data flows: virtual links, page content & links, keywords, scores, query, results]

Creating Virtual Links
- Scenario: a user interested in Java-related Web pages comes to our system
  - She submits a query with the keyword "java"
  - Assume that the query result contains 100 URLs
  - She clicks each of the top 10 URLs except the 6th
  - The hot set consists of the 9 URLs clicked

Creating Virtual Links (cont'd)
- 2 criteria
[Figure: two panels. In one, a single Virtual Hub is connected to URL 1, URL 2, URL 5, URL 6, URL 7, URL 10; in the other, Hub 1, Hub 2, ..., Hub n are connected to the same URLs]

Algorithm VIPAS (Virtual Link Powered Authority Search)
- Initialization Phase
  1. For a query term, perform the regular HITS analysis
  2. Collect a base set of pages with computed authority and hub scores and store them in the database
- Virtual Link Collection Phase
  3. Monitor user behavior to see whether each URL in the result list is clicked
  4. After a period of observation, put URLs that are often accessed into the "hot set"
  5. Create virtual links for pages in the hot set

Algorithm VIPAS (cont'd)
- Refinement Phase
  6. For each page in the hot set, compute its new authority and hub scores
  7. Run several iterations of score updating for pages in the base set
- 2 flavors
  - VIPAS-VH (VIPAS with virtual links from a Virtual Hub)
  - VIPAS-TH (VIPAS with virtual links from Top Hubs)
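The two flavors differ in where the virtual links originate. The sketch below shows only the graph-construction step, under simplifying assumptions: links are unweighted (the slides assign weights to virtual links), the hot set is already known, and "top hubs" is read as the n pages with the highest hub scores. The function names, the `__virtual_hub__` node name, and the parameter `n` are hypothetical.

```python
def add_virtual_hub(links, hot_set, hub_name="__virtual_hub__"):
    """VIPAS-VH style: one artificial hub points to every hot-set page.

    links maps each page to the collection of pages it points to.
    Returns a new link dictionary; the input is left unchanged.
    """
    if hub_name in links:
        raise ValueError("hub_name collides with an existing page")
    augmented = {page: set(targets) for page, targets in links.items()}
    augmented[hub_name] = set(hot_set)
    return augmented


def add_top_hub_links(links, hub_scores, hot_set, n):
    """VIPAS-TH style: the n best existing hubs point to every hot-set page."""
    augmented = {page: set(targets) for page, targets in links.items()}
    top_hubs = sorted(hub_scores, key=hub_scores.get, reverse=True)[:n]
    for hub in top_hubs:
        # Skip self-links in case a top hub is itself in the hot set.
        augmented.setdefault(hub, set()).update(t for t in hot_set if t != hub)
    return augmented
```

Either augmented graph could then be passed back to a HITS-style score update for the refinement phase.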

Finding Hot Sets
1. During an observation period, pay attention to clicks on consecutive URLs in the result list
2. When a user clicks several consecutive URLs and then skips some of the URLs that follow, mark the skipped URLs
3. Exclude pages that are marked with a frequency greater than a given threshold from the hot set
4. Among the remaining pages, those accessed by at least a given percentage of users are put into the hot set
- Note: some relevant URLs will be skipped because the user has already browsed them
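A minimal Python sketch of these four steps follows. Several details are assumptions made for illustration: one session is recorded per user, a "run" of clicks means at least `min_run` consecutive clicked results, skipped URLs are marked only when the user clicks something further down the list, and both thresholds are fractions of users. The names `marked_skips`, `hot_set`, `min_run`, `max_skip_rate`, and `min_click_rate` are hypothetical.

```python
from collections import Counter


def marked_skips(result_list, clicked, min_run=2):
    """Return URLs skipped right after a run of consecutive clicks.

    A skipped URL is marked only if a later URL in the list was clicked,
    so unvisited results at the end of the list are not marked.
    """
    flags = [url in clicked for url in result_list]
    last_click = max((i for i, hit in enumerate(flags) if hit), default=-1)
    marked = []
    run = 0            # length of the current run of consecutive clicks
    after_run = False  # True while inside the skipped stretch after a run
    for i, (url, hit) in enumerate(zip(result_list, flags)):
        if hit:
            run += 1
            after_run = False
        else:
            if run >= min_run:
                after_run = True
            run = 0
            if after_run and i < last_click:
                marked.append(url)
    return marked


def hot_set(sessions, max_skip_rate, min_click_rate):
    """Build the hot set from (result_list, clicked_urls) pairs, one per user."""
    n = len(sessions)
    if n == 0:
        return set()
    skip_count = Counter()
    click_count = Counter()
    for result_list, clicked in sessions:
        skip_count.update(set(marked_skips(result_list, clicked)))
        click_count.update(set(clicked) & set(result_list))
    return {
        url
        for url, clicks in click_count.items()
        if skip_count[url] / n <= max_skip_rate and clicks / n >= min_click_rate
    }
```

With results `u1` to `u5` and clicks on `u2`, `u3`, and `u5`, only `u4` is marked: `u1` is skipped too, but it does not follow a run of clicks.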

Finding Hot Sets (cont'd)
[Figure: two example click sequences over a result list, each entry labeled "clicked" or "skipped". Captions: "URL 4 is marked, but URL 1 is not" and "URL 4 is marked"]

Assigning Weights to Virtual Links
- n pages in the hot set: $t_1, t_2, \ldots, t_n$
- Clickstream 1: $(t_1, t_2, t_3, t_4, x_1, x_2)$
- Clickstream 2: $(t_3, x_1, t_1)$

Assigning Weights to Virtual Links (cont'd)
- Final weight: [equation not preserved in the transcript]
- For period $T_i$ where $i \ge 2$: [equation not preserved in the transcript] (1/3 is the degeneration factor)

Computing the New Scores
- Let $x_p$ and $y_p$ denote the authority and hub scores of page p, respectively
- For each page p, we update p's authority score by [equation not preserved in the transcript]
- Similarly, we update p's hub score by [equation not preserved in the transcript]

User-behavior Observation
- Use an ASP script
- Query result page (query result for keyword "Java"):
  1. The Source of Java(TM) Technology
  2. ...
  3. ...
- Each plain URL in the result page is replaced by wrapper.asp?URL= followed by that URL
- When a result is clicked, the script:
  1. Increments the click count of the target URL
  2. Records the time
  3. Redirects the user to the target URL
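The original system used an ASP script; the Python sketch below only illustrates the same wrap, count, and redirect logic. The function names `wrap_url` and `handle_click`, the in-memory `click_counts` dictionary, and the `click_log` list are hypothetical stand-ins for the script and the clickstream database.

```python
import time
from urllib.parse import parse_qs, quote


def wrap_url(url, wrapper="wrapper.asp"):
    """Rewrite a plain result URL so that clicks pass through the wrapper."""
    return f"{wrapper}?URL={quote(url, safe='')}"


def handle_click(query_string, click_counts, click_log, now=None):
    """Process one wrapper request and return the URL to redirect to.

    query_string is the part of the request after '?'.
    """
    target = parse_qs(query_string)["URL"][0]
    # 1. Increment the click count of the target URL.
    click_counts[target] = click_counts.get(target, 0) + 1
    # 2. Record the time of the click.
    click_log.append((target, time.time() if now is None else now))
    # 3. The caller answers with an HTTP redirect to this URL.
    return target
```

A production wrapper would also check that the target belongs to the result list before redirecting, since an unchecked `URL` parameter acts as an open redirect.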

Implementation and Experiments
- Experimental testbed: NTUEE website
- Data collection: 03/28/'02 ~ 05/31/'02
- Parameters (parameter names not preserved in the transcript): 20%, 40%, A A 10 H H

Evaluation Method
- For a keyword, we manually select a list of authority pages and compare it with the output of each algorithm
- Discrepancy coefficient: [equation not preserved in the transcript]

SN | URL (H denotes the site URL prefix) | Title
5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu

Discrepancy Coefficient – Regular HITS

Rank | SN | URL (H denotes the site URL prefix) | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
7 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
8 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
9 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…
  |  | H/html_2000/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…
41 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]

R_1 = 1 (SN 5633), R_2 = 5 (SN 8682), R_3 = 41 (SN 7228)

Discrepancy Coefficient – VIPAS-VH

Rank | SN | URL (H denotes the site URL prefix) | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
7 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
8 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
9 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
  |  | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…

R_1 = 1 (SN 5633), R_2 = 5 (SN 8682), R_3 = 6 (SN 7228)

Evaluation Method
- Grouping coefficient: [equation not preserved in the transcript]
- Stability: the standard deviation of each algorithm's discrepancy coefficients over all of the keywords

Grouping Coefficient – Regular HITS

R_1 = 1 (SN 5633), R_2 = 5 (SN 8682), R_3 = 41 (SN 7228)

Rank | SN | URL (H denotes the site URL prefix) | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
7 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
8 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
9 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…
  |  | H/html_2000/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…
41 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]

Grouping Coefficient – VIPAS-VH

R_1 = 1 (SN 5633), R_2 = 5 (SN 8682), R_3 = 6 (SN 7228)

Rank | SN | URL (H denotes the site URL prefix) | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
7 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
8 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
9 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
  |  | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff…

Experimental Results

Experimental Results (cont'd)

Conclusions
- Link-analysis algorithms are popular in Web information retrieval
  - But they need further improvement
- In our work, we built a Web warehouse
  - It incorporates user feedback into the identification of authoritative resources (Algorithm VIPAS)
  - Experimental results show that VIPAS is very effective and that the warehouse is able to retrieve much more valuable information for users