CSE 450 – Web Mining Seminar Professor Brian D. Davison Fall 2005 A Presentation on When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics.

Presentation transcript:

CSE 450 – Web Mining Seminar
Professor Brian D. Davison
Fall 2005
A Presentation on
When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics
K. Bharat & G. A. Mihaila
WWW10 Conference, May 2001, Hong Kong
by Osama Ahmed Khan
10/06/2005

Problem
- Query on Popular Topic
- Content Analysis
Solution
- Most Authoritative Pages

Technical Terms
- Expert
- Recommendation
- Non-affiliation

Hilltop Algorithm
1. Expert Lookup
   - Detecting Host Affiliation
   - Expert Selection
   - Expert Indexing
2. Target Ranking
   - Computing Expert Score
   - Computing Target Score

Detecting Host Affiliation
- Conditions
  - Same first 3 octets of IP
  - Same rightmost non-generic token of hostname
- Union-Find Algorithm
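The affiliation test on this slide can be sketched with a small union-find structure. This is a minimal illustration, not the authors' implementation: it assumes either condition alone makes two hosts affiliated, and the `GENERIC_TOKENS` set, the function names, and the input format (hostname mapped to an IPv4 string) are hypothetical.

```python
# Hypothetical list of generic hostname tokens; the slide does not enumerate them.
GENERIC_TOKENS = {"com", "org", "net", "edu", "gov", "co", "ac", "uk"}


def ip_prefix(ip):
    """Return the first 3 octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def rightmost_non_generic_token(hostname):
    """Scan hostname tokens right to left and return the first non-generic one."""
    for token in reversed(hostname.lower().split(".")):
        if token not in GENERIC_TOKENS:
            return token
    return hostname.lower()


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def affiliation_classes(hosts):
    """hosts: dict of hostname -> IPv4 string.
    Returns a dict of hostname -> representative of its affiliation class."""
    uf = UnionFind()
    first_seen = {}  # (kind, value) -> first host that had this IP prefix or token
    for host, ip in hosts.items():
        uf.find(host)
        keys = (("ip", ip_prefix(ip)), ("token", rightmost_non_generic_token(host)))
        for key in keys:
            if key in first_seen:
                uf.union(first_seen[key], host)
            else:
                first_seen[key] = host
    return {host: uf.find(host) for host in hosts}
```

Because union-find merges classes transitively, two hosts that share neither an IP prefix nor a token can still end up affiliated through a third host.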

Expert Selection
- Retrieve all webpages with:
  - Out-degree > Threshold (k) (e.g. k = 5)
- Expert will have:
  - URLs pointing to k distinct non-affiliated hosts
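A rough sketch of the selection test follows. It assumes a precomputed mapping from hostname to affiliation class (here the hypothetical `affiliation_of` dict) and treats hosts missing from that mapping as their own class; the function name and signature are illustrative only.

```python
from urllib.parse import urlsplit


def is_expert(out_links, affiliation_of, k=5):
    """Return True if a page qualifies as an expert.

    out_links: list of URLs the page links to.
    affiliation_of: dict of hostname -> affiliation class id (assumed precomputed).
    k: threshold from the slide (e.g. k = 5).
    """
    # First filter: out-degree must exceed the threshold.
    if len(out_links) <= k:
        return False
    # Second filter: the links must reach at least k mutually non-affiliated hosts.
    classes = set()
    for url in out_links:
        host = urlsplit(url).hostname
        if host is None:
            continue
        classes.add(affiliation_of.get(host, host))
    return len(classes) >= k
```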

Expert Indexing
- Inverted Index
  - Mapping Keywords to Experts
- Key Phrases
- Match Positions
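One way to picture the expert index is as a keyword-to-postings map, where each posting records the expert, the key phrase, and the position of the match inside that phrase. The sketch below assumes key phrases have already been extracted per expert and uses plain whitespace tokenization; both the input format and the function name are assumptions.

```python
from collections import defaultdict


def build_expert_index(experts):
    """experts: dict of expert URL -> list of key-phrase strings.
    Returns keyword -> list of (expert URL, phrase index, position in phrase)."""
    index = defaultdict(list)
    for expert_url, phrases in experts.items():
        for phrase_id, phrase in enumerate(phrases):
            for position, word in enumerate(phrase.lower().split()):
                index[word].append((expert_url, phrase_id, position))
    return index
```

Looking up a query keyword then returns every expert whose key phrases contain it, along with where the match occurred.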

Computing Expert Score
- Condition
  - At least 1 URL with all query keywords
- Expert Score: (S_0, S_1, S_2), where k is the number of query terms
  S_i = SUM {key phrases p with k-i query terms} LevelScore(p) * FullnessFactor(p,q)
  Expert_Score = 2^32 * S_0 + 2^16 * S_1 + S_2
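The expert score formula can be sketched as below. This is an illustration under assumptions: `LevelScore` and `FullnessFactor` are passed in as precomputed numbers because the slide does not define them, phrases containing no query term are skipped, and the weights 2^32 and 2^16 follow the formula in the Bharat and Mihaila paper.

```python
def expert_score(query_terms, key_phrases):
    """Combine (S_0, S_1, S_2) into a single expert score.

    query_terms: list of query keywords.
    key_phrases: list of (phrase_words, level_score, fullness_factor) tuples;
                 the two numeric values are assumed to be precomputed.
    """
    query = set(query_terms)
    k = len(query)
    s = [0.0, 0.0, 0.0]
    for words, level_score, fullness_factor in key_phrases:
        matched = len(query & set(words))
        i = k - matched  # number of query terms missing from this phrase
        # Assumption: phrases with no query term at all contribute nothing.
        if matched > 0 and i <= 2:
            s[i] += level_score * fullness_factor
    # Large weights make S_0 dominate S_1, and S_1 dominate S_2.
    return 2**32 * s[0] + 2**16 * s[1] + s[2]
```

The weighting means a phrase matching every query term outranks any realistic amount of partial-match evidence.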

Computing Target Score
- Condition
  - At least 2 non-affiliated experts
- Target Score:
  Edge_Score(E,T) = Expert_Score(E) * SUM {query keywords w} occ(w,T)
  Target_Score = SUM {Edge_Score(E,T)}
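A small sketch of the target score follows, assuming each edge into a target already carries its expert's score, the expert's affiliation class, and the per-keyword `occ` counts. The tuple layout and the choice to return 0.0 for a target that fails the two-expert condition are assumptions made for illustration.

```python
def target_score(edges):
    """Score one target T from the experts that point to it.

    edges: list of (expert_id, affiliation_class, expert_score, occ_by_keyword),
           where occ_by_keyword maps each query keyword w to occ(w, T).
    """
    # Condition: at least 2 mutually non-affiliated experts must point to T.
    classes = {affiliation_class for _, affiliation_class, _, _ in edges}
    if len(classes) < 2:
        return 0.0  # assumption: unqualified targets simply score zero
    # Edge_Score(E,T) = Expert_Score(E) * sum over query keywords of occ(w,T)
    return sum(score * sum(occ.values()) for _, _, score, occ in edges)
```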

Evaluation
1. Locating Specific Popular Targets

Evaluation (Contd.)
2. Gathering Relevant Pages

Conclusion
- Characteristics
  - Popular Queries
  - Expert Subset
- Hilltop vs.
  - PageRank
  - Topic Distillation

Thank You