ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Web Search and Mining Course Overview 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 0: Course Overview.
Web Search – Summer Term 2006 I. General Introduction (c) Wolfgang Hürst, Albert-Ludwigs-University.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
Search Engines and Information Retrieval
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
ISP 433/533 Week 2 IR Models.
Information Retrieval in Practice
Information Retrieval - Organization of the course Jian-Yun Nie 聂建云.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Lecture 1: Web Search Overview & Web Crawling
1. Search Engines Architecture Azreen Azman, PhD SMM 5891 All slides ©Addison Wesley, 2008.
Search Engines and Information Retrieval Chapter 1.
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
Information Retrieval and Web Search Lecture 1. Course overview Instructor: Rada Mihalcea Class web page:
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Internet Information Retrieval Sun Wu. Course Goal To learn the basic concepts and techniques of internet search engines –How to use and evaluate search.
Search - on the Web and Locally Related directly to Web Search Engines: Part 1 and Part 2. IEEE Computer. June & August 2006.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Autumn Web Information retrieval (Web IR) Handout #0: Introduction Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Chapter 6: Information Retrieval and Web Search
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Modern Information Retrieval Presented by Miss Prattana Chanpolto Faculty of Information Technology.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea.
Information Retrieval
IR. SI 650/EECS 549 Information Retrieval People search the Web daily Search engines –Google –Bing –Baidu –Yandex Information Retrieval is about search.
CS798: Information Retrieval Charlie Clarke Information retrieval is concerned with representing, searching, and manipulating.
Rensselaer Polytechnic Institute CSCI-4220 – Network Programming David Goldschmidt, Ph.D. from Search Engines: Information Retrieval in Practice, 1st edition.
Information Retrieval in Practice
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Information Organization: Overview
Search Engine Architecture
Lecture 1: Introduction and the Boolean Model Information Retrieval
Information Retrieval (in Practice)
Modern Information Retrieval
Information Retrieval and Web Search
中国计算机学会学科前沿讲习班:信息检索 Course Overview
Information Retrieval and Web Search
Information Retrieval and Web Search
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval
CSCE 561 Information Retrieval System Models
Thanks to Bill Arms, Marti Hearst
Text Categorization Rong Jin.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Data Mining Chapter 6 Search Engines
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval Systems
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Lecture 8 Information Retrieval Introduction
Chapter 5: Information Retrieval and Web Search
Search Engine Architecture
Information Organization: Overview
Information Retrieval and Web Design
Information Retrieval and Web Design
Recuperação de Informação
Information Retrieval and Web Search
Introduction to Search Engines
Presentation transcript:

ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH Lecture 1: Introduction S. M. Vahidipour Vahidipour@kashanu.ac.ir

Outline Introduction to the Course Overview of the Semester

Text Books Search Engines: Information Retrieval in Practice W. Bruce Croft, Donald Metzler, Trevor Strohman Pearson Education, 2010

Text Books Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition) Ricardo Baeza-Yates, Berthier Ribeiro-Neto ACM Press Books, 2010

Text Books Introduction to Information Retrieval C. Manning, P. Raghavan, and H. Schütze Cambridge University Press, 2008

Search and Information Retrieval Search on the Web is a daily activity for many people throughout the world Google: 40,000 searches per second (3.5 billion per day; 1.2 trillion per year) Yahoo: 3,200 searches per second (280 million per day; 8.4 billion per month) Bing: 927 searches per second ( 80 million per day; 2.4 billion per month) 106: Million, 109: billion, 1012: Trillion, 1015: Quadrillion, 1018: Quintillion, …

Search and Information Retrieval Search and communication are most popular uses of the computer. Applications involving search are everywhere. The field of computer science that is most involved with R&D for search is information retrieval (IR).

Information Retrieval “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968) General definition that can be applied to many types of information and search applications Still appropriate after 40 years. Primary focus of IR since the 50s has been on text and documents

Data/Information Storage Search

Data/Information Structured Unstructured

Structured vs. Unstructured Data

Examples: Common properties What is a Document? IM: instant messages What is a Document? Examples: Web pages, email, books, news stories, scholarly papers, text messages, Word™, Powerpoint™, PDF, forum postings, patents, IM (Instant Messages) sessions, etc. Common properties Significant text content Some structure (≈ attributes in DB) Papers: title, author, date Email: subject, sender, destination, date

Exact matching of words is not enough Comparing Text Comparing the query text to the document text and determining what is a good match is the core issue of information retrieval. Exact matching of words is not enough Many different ways to write the same thing in a “natural language” like English Does a news story containing the text “karl benz built the first automobile in 1886” match the query “car inverter”? Defining the meaning of a word, a sentence, a paragraph, or a story is more difficult than defining the meaning of a database field.

IR is more than just text, and more than just web search Dimensions of IR IR is more than just text, and more than just web search although these are central People doing IR work with different media, different types of search applications, and different tasks Three dimensions of IR Content Applications Tasks 20

New applications increasingly involve new media The Content Dimension Textual data, but… New applications increasingly involve new media Video, photos, music, speech Scanned documents (for legal purposes) Like text, content is difficult to describe and compare Text may be used to represent them (e.g., tags) IR approaches to search and evaluation are appropriate

The Application Dimension Desktop search Personal enterprise search See above plus recent web pages P2P search No centralized control File sharing, shared locality Literature search Forum search … Web search Most common Vertical search Restricted domain/topic Books, movies, suppliers Enterprise search Corporate intranet Databases, emails, web pages, documentation, code, wikis, tags, directories, presentations, spreadsheets

The Task Dimension User queries / ad-hoc search Range of query enormous, not pre-specified Filtering Given a profile (interests), notify about interesting news stories Identify relevant user profiles for a new document Classification / categorization Automatically assign text to one or more classes of a given set Identify relevant labels for documents Question answering Similar to search Automatically answer a question posed in natural language Provide concrete answer, not list of documents.

Users and information needs کاربران یک موتور جستجو قضات نهایی کیفیت هستند Main Issues in IR Relevance A relevant document contains the information a user was looking for when he/she submitted the query Evaluation How well does the ranking meet the expectation of the user Users and information needs Users of a search engine are the ultimate judges of quality

Big issues include main IR issues but also some others… جمع آوری داده های جدید IR and Search Engines A search engine is the practical application of information retrieval techniques to large scale text collections Big issues include main IR issues but also some others… Information Retrieval Search Engines Performance: Efficient search and indexing Incorporating new data: Coverage and freshness Scalability: Growing with data and users Adaptability: Tuning for applications Specific problems: e.g., Spam Relevance: Effective ranking Evaluation: Testing and measuring Information needs: User interaction Additional

Outline Introduction to the Course Overview of the Semester

Search Engine Indexing Querying Basic architecture Text acquisition گرفتن متن Search Engine Basic architecture Main issues Indexing Text acquisition Text transformation Index creation Querying User interaction Ranking Evaluation

Boolean retrieval Vector space model Probabilistic models Overview of Traditional Retrieval Models Boolean retrieval Vector space model Probabilistic models

Overview of Evaluation Metrics Effectiveness metrics Efficiency metrics Training, testing, and statistics

Advanced Retrieval Models Language model-based retrieval Learning to rank 30

Language model-based approaches Word Mismatch Problem Language model-based approaches Translation model Topic model Word cluster model Wordnet Dependency model Query expansion approaches

Advanced/Specific IR Tasks Query log and query suggestion Personalized search Information extraction Cross-language IR Question answering Recommendation systems Enterprise search Digital library Structured text retrieval Multimedia retrieval

Query Log and Query Suggestion

Personalized Search

Information Extraction

Cross-language Retrieval

Question Answering

Recommendation Systems

Presans: مطبوعات Enterprise Search

Digital Library 40

Structured Text Retrieval

Multimedia Retrieval

Questions?