CS276 Information Retrieval and Web Search


CS276 Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 15: Web search basics

Brief (non-technical) history
- Early keyword-based engines, ca. 1995-1997: Altavista, Excite, Infoseek, Inktomi, Lycos
- Paid search ranking: Goto (morphed into Overture.com → Yahoo!)
  - Your search ranking depended on how much you paid
  - Auction for keywords: casino was expensive!

Brief (non-technical) history
- 1998+: Link-based ranking pioneered by Google
  - Blew away all early engines save Inktomi
  - Great user experience in search of a business model
  - Meanwhile Goto/Overture's annual revenues were nearing $1 billion
- Result: Google added paid search "ads" to the side, independent of search results
  - Yahoo followed suit, acquiring Overture (for paid placement) and Inktomi (for search)
- 2005+: Google gains search share, dominating in Europe and very strong in North America
- 2009: Yahoo! and Microsoft propose combined paid search offering

[Figure: a search results page showing paid search ads alongside the algorithmic results]

Web search basics (Sec. 19.4.1)
[Figure: components of a web search engine. A web spider crawls The Web; an indexer builds the Indexes, kept alongside separate Ad indexes; the User's search is answered from both.]
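The slide names the components (web spider, indexer, indexes, ad indexes) without showing how a query meets them. A minimal sketch, assuming a toy Boolean-AND inverted index: the pages and ads below are hypothetical stand-ins for crawled documents and advertiser copy, and the ads are deliberately kept in a separate index so that ad results stay independent of the algorithmic results.

```python
from collections import defaultdict

def build_index(docs):
    """Indexer: map each lowercase term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical pages standing in for what the web spider fetched.
pages = {
    "p1": "cheap car rental in Brasil",
    "p2": "Seattle weather forecast",
    "p3": "car rental deals",
}
# Hypothetical ads, indexed separately from the algorithmic results.
ads = {"ad1": "car rental discounts"}

page_index = build_index(pages)
ad_index = build_index(ads)

query = "car rental"
print("algorithmic:", sorted(search(page_index, query)))
print("ads:", sorted(search(ad_index, query)))
```

A real engine would rank the matches rather than return an unordered set; ranking is the subject of the later slides.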

User Needs (Sec. 19.4.1)
Need [Brod02, RL04]
- Informational: want to learn about something (~40% / 65%), e.g. Low hemoglobin
- Navigational: want to go to that page (~25% / 15%), e.g. United Airlines
- Transactional: want to do something (web-mediated) (~35% / 20%)
  - Access a service, e.g. Seattle weather
  - Downloads, e.g. Mars surface images
  - Shop, e.g. Canon S410
- Gray areas
  - Find a good hub, e.g. Car rental Brasil
  - Exploratory search: "see what's there"

How far do people look for results? (Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)

Users' empirical evaluation of results
- Quality of pages varies widely
  - Relevance is not enough
  - Other desirable qualities (non IR!!)
    - Content: trustworthy, diverse, non-duplicated, well maintained
    - Web readability: display correctly & fast
    - No annoyances: pop-ups, etc.
- Precision vs. recall
  - On the web, recall seldom matters
- What matters
  - Precision at 1? Precision above the fold?
  - Comprehensiveness: must be able to deal with obscure queries
    - Recall matters when the number of matches is very small
- User perceptions may be unscientific, but are significant over a large aggregate
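Both "precision at 1" and "precision above the fold" are instances of precision at k: the fraction of the top k results that are relevant. A minimal sketch, assuming the fold is modeled as a hypothetical k = 3 visible results; the ranking and relevance judgments are made up for illustration. This version divides by k even if fewer than k results were returned, which is one common convention.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k

# Hypothetical result list and relevance judgments.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}

print("P@1:", precision_at_k(ranked, relevant, 1))
print("P above a 3-result fold:", precision_at_k(ranked, relevant, 3))
print("P@5:", precision_at_k(ranked, relevant, 5))
```

Recall is not computed here: as the slide notes, on the web it seldom matters, since the full set of relevant pages is usually unknown and far larger than anyone will read.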

Users' empirical evaluation of engines
- Relevance and validity of results
- UI: simple, no clutter, error tolerant
- Trust: results are objective
- Coverage of topics for polysemic queries
- Pre/post process tools provided
  - Mitigate user errors (auto spell check, search assist, …)
  - Explicit: search within results, more like this, refine ...
  - Anticipative: related searches
- Deal with idiosyncrasies
  - Web-specific vocabulary
    - Impact on stemming, spell-check, etc.
  - Web addresses typed in the search box
- "The first, the last, the best and the worst …"
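One way to "mitigate user errors" with auto spell check is to suggest the vocabulary term closest to the typed word by edit distance. A minimal sketch, assuming Levenshtein distance (insertions, deletions, substitutions, each costing 1) and a hypothetical vocabulary and `max_distance` cutoff; production engines also weigh term frequency and query logs, which this ignores.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def suggest(word, vocabulary, max_distance=2):
    """Return the closest vocabulary term within max_distance, else None."""
    # Ties on distance are broken alphabetically for a deterministic answer.
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v))
    return best if edit_distance(word, best) <= max_distance else None

# Hypothetical vocabulary drawn from the example queries in this lecture.
vocab = ["hemoglobin", "weather", "rental", "airlines"]
print(suggest("wether", vocab))
print(suggest("xyz", vocab))
```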

The Web document collection (Sec. 19.2)
- No design/co-ordination
- Distributed content creation, linking, democratization of publishing
- Content includes truth, lies, obsolete information, contradictions …
- Unstructured (text, html, …), semi-structured (XML, annotated photos), structured (databases) …
- Scale much larger than previous text collections … but corporate records are catching up
- Growth: slowed down from initial "volume doubling every few months" but still expanding
- Content can be dynamically generated

Challenges of Web IR
- Distributed data
- Volatile (frequently changing) data
- Large volume
- Unstructured and redundant data
- Data quality
- Heterogeneous data
- Diverse users
  - Background
  - Ability to formulate queries
  - Impatience or carelessness when examining results

Ranking
- Vector space
- PageRank
- Document structure
- Term proximity
- Relevance feedback
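For the vector space entry, a minimal sketch of ranking by cosine similarity between tf-idf vectors. The weighting here (raw term frequency times log10(N/df)) is one common choice among many, assumed for illustration; the three documents and the query are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return tf-idf vectors (raw tf * log10(N/df)) for each doc, plus the idf table."""
    n = len(docs)
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    vectors = {
        d: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for d, toks in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_documents(query, docs):
    """Doc ids sorted by decreasing cosine score (ties broken by id)."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    "d1": "mars surface images",
    "d2": "mars rover news",
    "d3": "seattle weather today",
}
print(rank_documents("mars images", docs))
```

Because "mars" occurs in two of the three documents, it gets a lower idf than "images", so d1 (matching both terms) clearly outranks d2 (matching only "mars").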

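Ranking: Hyperlinks
- A page is considered popular if it has many incoming links
- A page is considered a good source if it has many good outgoing links
- PageRank algorithm: scores a page from its incoming links, weighting each link by the score of the page it comes from divided by that page's number of outgoing links:

$$P(A) = (1 - d) + d \sum_{i=1}^{n} \frac{P(D_i)}{C(D_i)}$$

  where D_1, …, D_n are the pages that link to A, C(D_i) is the number of outgoing links on page D_i, and d is a damping factor between 0 and 1 (commonly 0.85).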
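The PageRank formula is recursive (P(A) depends on the scores of the pages linking to A), so it is typically solved by iterating it until the scores stop changing. A minimal sketch of that iteration, assuming the links are given as a dict from each page to the set of distinct pages it links to; the three-page web is hypothetical, and pages with no outgoing links are simply ignored as sources, a simplification the formula itself does not address.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate P(A) = (1 - d) + d * sum(P(D) / C(D)) over the pages D linking to A.

    links maps each page to the set of distinct pages it links to.
    """
    # Every page that appears as a link source or a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    out_degree = {p: len(links.get(p, ())) for p in pages}  # C(D)
    rank = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each source passes its score, split evenly over its out-links.
            incoming = sum(
                rank[src] / out_degree[src]
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank

# Hypothetical web: A -> B, A -> C, B -> C, C -> A
web = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this non-normalized form, when no page is dangling the scores sum to the number of pages (here 3). C ranks highest because it receives links from both A and B, and B ranks lowest because its only in-link comes from A, which splits its score between two targets.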
