1 TP6084 CAPAIAN MAKLUMAT INFORMATION RETRIEVAL (IR) Introduction.

Presentation transcript:

1 TP6084 CAPAIAN MAKLUMAT INFORMATION RETRIEVAL (IR) Introduction

2 What will be covered today… Course overview Introduction to IR

3 What this course is about: Search engines –What are they? –How do you build one? –How do you evaluate one? –What are the models? –How does Google rank results? –etc. Models? What is the research in this area? What about multimedia data? What about the semantic web? etc.

4 Course Overview What this course is about –How people search and find information. –How computers store and retrieve information. –How computer systems are designed to help people find the information they need.

5 Course Overview The course will emphasize –Understanding the theories, tools, algorithms, and evaluation methods for information retrieval systems –Viewing the web search engine as a practical application of an IR system

6 Course Content (subject to change) –Introduction –IR and Search Engine –Architecture of Search Engine –Text processing –Indexing and Ranking –Queries & Interface –Retrieval Models –Evaluation –Classification & Clustering –Social Search

7 References The textbook for this course: Croft, W.B., Metzler, D. & Strohman, T. Search Engines: Information Retrieval in Practice. New York: Addison Wesley. Other recommended books: –Grossman, D.A. & Frieder, O. Information Retrieval: Algorithms & Heuristics, 2nd Edition. Berlin: Springer. –Baeza-Yates, R. & Ribeiro-Neto, B. Modern Information Retrieval. New York: Addison Wesley. –Manning, C., Raghavan, P. & Schütze, H. Introduction to Information Retrieval. New York: Cambridge University Press. For general reading on search engines, you must read: –Battelle, J. The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. New York: Portfolio Hardcover. A list of related journal and proceedings articles will be announced from time to time in class.

8 Assessment Exam – 50% Project/Assignments – 50% Lectures: –Monday (11 am – 12 noon) BK8 –Thursday (10 am – 12 noon) BK8

9 Any problems? Dr. Shereena Arif (PhD), Room H-2-8, IT School, Faculty of Information Science & Technology, UKM Bangi. OR Website/blog: shereenarif.wordpress.com Blog dedicated to this course: tp6084.wordpress.com Any suggested medium for communication?

10 Shall we start?

11 What is IR? Finding relevant information in large collections of data. In such a collection you may want to find: –‘Give me information on the history of Tun Razak’: an article about Tun Razak (text retrieval) –‘What does a brain tumor look like on a CT scan?’: a picture of a brain tumor (image retrieval) –‘It goes like this: I do, I do, I do, I do do do do do...’: a certain song (music retrieval)

12 What is IR? IR is a branch of applied computer science focusing on the representation, storage, organization, access, and distribution of information. [System Centered] IR involves helping users find information that matches their information needs. [User Centered]

13 Text Retrieval Online library catalogs (OPAC) Internet search engines, such as –AltaVista, Google, Ilse Specialized systems (aka vendors): – MEDLINE (medical articles) – Lexis-Nexis (legal, business, academic,... ) – Westlaw (legal articles) – Dialog (business information)

14 Retrieval vs. Browsing Popular Web Directories: – Yahoo!, Open Directory Project (dmoz) The user has to ‘guess’ the ‘right’ directories to find the information –The user has to adapt to the designers' conceptualization of the directory The goal of information retrieval is to provide immediate random access to the data –The user can specify his information need

15 IR vs. Database Querying IR is not the same thing as querying a database. Database querying assumes that the data is in a standardized format. Transforming all information (news articles, web sites) into a database format is difficult, and impractical for large data collections. Text retrieval can work with plain, unformatted data.

16 Data Retrieval vs. Information Retrieval

                       Data retrieval       Information retrieval
Content                Data                 Information
Data object            Table                Document
Matching               Exact match          Partial match, best match
Items wanted           Matching             Relevant
Query language         SQL (artificial)     Natural
Query specification    Complete             Incomplete
Model                  Deterministic        Probabilistic
Structure              Highly structured    Less structured
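
The exact-match versus best-match contrast in the comparison above can be sketched in a few lines of Python. This is only an illustration: the three documents, the query, and the function names `exact_match` and `best_match` are made up here, and counting shared terms is just one simple way to rank partial matches.

```python
# Toy collection; the documents and ids are assumptions for illustration only.
docs = {
    "d1": "history of tun razak",
    "d2": "tun razak biography and history",
    "d3": "brain tumor ct scan images",
}

def exact_match(query, docs):
    """Data-retrieval style: return only items identical to the query."""
    return [doc_id for doc_id, text in docs.items() if text == query]

def best_match(query, docs):
    """IR style: rank items by the number of query terms they share."""
    q_terms = set(query.split())
    scored = [(len(q_terms & set(text.split())), doc_id)
              for doc_id, text in docs.items()]
    # Highest overlap first, ties broken by id; drop items with no overlap.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

print(exact_match("tun razak history", docs))  # nothing is identical to the query
print(best_match("tun razak history", docs))   # partial matches, ranked
```

The exact-match function finds nothing for a query that is worded differently from every stored item, while the best-match function still returns the two related documents.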

17 Relevance as Similarity A fundamental idea within IR is: ‘A document is relevant to a query if they are similar’ Similarity can be defined as: – string matching/comparison – similar vocabulary – same meaning of text
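
Two of the similarity definitions above can be sketched as follows. This is a minimal illustration, not a method taken from the slides: substring containment stands in for string matching, and the Jaccard coefficient (shared terms divided by all distinct terms) stands in for "similar vocabulary". The third definition, same meaning of text, is not captured by either function.

```python
def contains_string(query, document):
    """String matching: is the query text literally inside the document?"""
    return query.lower() in document.lower()

def jaccard(a, b):
    """Vocabulary similarity: |shared terms| / |all distinct terms|."""
    terms_a = set(a.lower().split())
    terms_b = set(b.lower().split())
    if not terms_a and not terms_b:
        return 0.0  # two empty texts: treat as not similar
    return len(terms_a & terms_b) / len(terms_a | terms_b)

print(contains_string("search engine", "A web search engine design"))
print(jaccard("web search engine", "search engine design"))
```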

18 The Ubiquity of IR Search engines Information filtering – routing –Text categorization Detecting information structure –Hyperlink generation –Topic/Information detection/Screening –Portal development and maintenance –Digital libraries Question Answering

19 “Web brings IR to the Center of the Stage” IR has become a center of focus in the Web era. Its theories, techniques, and applications have reached many fields where processing large amounts of information is essential.

20 Challenges of IR [Diagram: User → Info. needs → Queries → Search/select → Stored information] –Translating information needs to queries –Matching queries to stored information –Query result evaluation: does the information found match the user’s information needs?

21 Data and Information Data –Strings of symbols associated with objects, people, and events –Values of an attribute. Data need not have meaning to everyone. Data must be interpreted with associated attributes. Information –The meaning of the data as interpreted by a person or a system –Data that changes the state of a person or system that perceives it –Data that reduces uncertainty. If the data resolve no uncertainty, they carry no information. Examples: ‘It snows in the winter.’ ‘It does not snow this winter.’
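
One common way to make "data that reduces uncertainty" concrete is Shannon's self-information, log2(1/p) bits for a statement with probability p. The slides do not name this measure, so treat it as an illustrative assumption; the probabilities assigned to the two snow statements below are also invented for the example.

```python
import math

def self_information(p):
    """Information, in bits, carried by an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)

# Assumed probabilities for a place where winter snow is the norm.
p_snow_in_winter = 0.95      # 'It snows in the winter.'
p_no_snow_this_winter = 0.05  # 'It does not snow this winter.'

print(self_information(p_snow_in_winter))       # small: expected, tells us little
print(self_information(p_no_snow_this_winter))  # larger: surprising, informative
print(self_information(1.0))                    # a certain statement carries 0 bits
```

Under these assumptions the expected statement carries far less information than the surprising one, which matches the point of the two examples on the slide.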

22 Information and Knowledge Knowledge –Structured information: through structuring, information becomes understandable –Processed information: through processing, information becomes meaningful and useful –Information shared and agreed upon within a community. Data → information → knowledge

23 Text Strings of ASCII or Unicode symbols –structured by the author –indexed by information service providers. Representation of the natural languages people use –To convey meanings –To communicate between readers and authors. Data or information? –If it can be understood, it’s information. By whom? A person or a system?

24 Documents Logical unit of text –articles, books, –links, web pages Other components that come with the text –figures, charts, graphics –multimedia

25 Textual Data Repository of human intellect –Rich and diverse resources for all answers. If it is written, it is there (in text) –Meaningful and understandable (to users). Simple ASCII representation. Free of pre-formatted structures –continuous –separated into documents. Easy to process by computer –Machine intensive (not labor intensive)

26 Problems with Text Massive –Any IR system needs the capability for large-scale data processing. –Indexes and various representations are required. Inconsistent –It’s a human language, with syntactic and semantic variation –The same information is expressed in different ways. –Different information is expressed in similar ways. Incomplete –It relies on common knowledge. –It’s an open system.
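
Since the slide above notes that indexes are needed to cope with massive text, here is a minimal sketch of one common kind, an inverted index that maps each term to the documents containing it. The slides do not prescribe this structure at this point; the sample documents and the names `build_inverted_index` and `lookup` are assumptions for illustration, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, query):
    """Return ids of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Made-up sample collection.
docs = {
    1: "Information retrieval systems",
    2: "Database systems",
    3: "Web information",
}
index = build_inverted_index(docs)
print(lookup(index, "information systems"))
```

With the index built once, a query only touches the entries for its own terms instead of scanning every document.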

27 Retrieval –What do we retrieve? Data, information, or knowledge? –We retrieve documents that contain text which carries information. Information can be anywhere in the text, in the links, in the process of text.

28 Information Retrieval Are they the same? –Text retrieval –Document retrieval –Information retrieval

29 Information Retrieval Conceptually, information retrieval is used to cover all related problems in finding needed information Historically, information retrieval is about document retrieval, emphasizing document as the basic unit Technically, information retrieval refers to (text) string manipulation, indexing, matching, querying, etc.

30 IR Systems IR systems contain three components: –System –People –Documents (information items) [Diagram: User ↔ System (browsing, retrieval) ↔ Documents (database)]

31 Basic Overview of Retrieval Process

32 Detailed Overview of Retrieval Process

33 Historical Summary 1950’s and 1960’s –Basic advances in retrieval and indexing techniques
–1950: Calvin N. Mooers coins the term ‘Information Retrieval’
–1959: Luhn describes statistical retrieval
–1960: Maron and Kuhns define a probabilistic model of IR
–1966: Cranfield project defines evaluation measures
–1968: Gerard Salton's first book about the SMART retrieval system

34 Historical Summary 1990’s and 2000’s –Large-scale, full-text IR and filtering experiments and systems –Dominance of ranking –Many Web-based retrieval engines –Interfaces and browsing –Multimedia and multilingual –Machine learning techniques –Question answering (factoids) The Future –IR in context (the right answer for you now here) –Logic-based IR? –NLP? –Integration with other functionality –Distributed, heterogeneous database access

35 End of Topic 1