Information Retrieval, CSCI 6403, Dr. Carolyn Watters

Outline
  Definitions
  Information Retrieval
  Information Theory
  Feature Sets & Term characteristics

General Terms & Concepts
  Data
  Information
  Retrieval
  Document
  Question Answering
  Filtering
  Clustering
  Browsing

History
  Card catalog
  Hole punch
  Databases and queries
  Multimedia (images, audio, etc.)
  Web

Examples
  New York Times
  Google
  Amazon
  Medline
  Lexis/Nexis

IR and CS?
  What are these systems based on?
  How can we make them better?
  How do we know if they are effective?
  What else could we do using these techniques?

IR and Databases
              IR                    DBMS
  Data        unstructured          structured
  Attributes  vague                 well defined
  Queries     keyword & features    SQL defined
  Results     imprecise             exact

Basic Ideas/Problems Behind IR
  Retrieve text that contains the answer
  Use keywords to represent query
  Assume user can articulate need
  No universal categorization of data
  Relevant items are similar to query
  Relevant items are similar to each other
  More than one right answer
  Results may “satisfice”

Similarity
  Query -> document
  Document -> document
  Similar?
    – String matching
    – Controlled vocabulary match
    – Same meaning
    – Probability about same topic

Using Keywords as Feature Set
  Bag of Words Approach
  Compare words as independent tokens
  Why would we do this?
  For example:
    – DOW weathers storm
    – storm weathers door
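To make the bag-of-words idea concrete, here is a minimal Python sketch. The helper name `bag_of_words`, the lowercasing, and the whitespace tokenizer are assumptions made for illustration, not part of the slides. It compares the two example headlines as unordered token counts, and then shows that a reordered headline (a third, made-up example) produces an identical bag.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded: only which tokens occur, and how often.
    (Hypothetical helper; the tokenizer is deliberately naive.)
    """
    return Counter(text.lower().split())


a = bag_of_words("DOW weathers storm")
b = bag_of_words("storm weathers door")

# Tokens the two headlines have in common, ignoring order.
shared = sorted((a & b).keys())
print(shared)

# A reordered headline (made up for this sketch) gives the same bag.
reordered = bag_of_words("storm weathers DOW")
print(a == reordered)
```

The two slide headlines share two of their three tokens, so a pure token comparison rates them as quite similar even though they are about different things.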

Important Words?
  Enron Ruling Leaves Corporate Advisers Open to Lawsuits
  By KURT EICHENWALD
  A ruling last week by a federal judge in Houston may well have accomplished what a year's worth of reform by lawmakers and regulators has failed to achieve: preventing the circumstances that led to Enron's stunning collapse from happening again. To casual observers, Friday's decision by the judge, Melinda F. Harmon, may seem innocuous and not surprising. In it, she held that banks, law firms and investment houses, many of them criticized on Capitol Hill for helping Enron construct off-the-books partnerships that led to its implosion, could be sued by investors who are seeking to regain billions of dollars they lost in the debacle.

IR and the Bag of Words
  Find all words in document
  Compare query words to these words
  Works pretty well!!!
  Improvements
    – ???
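The two steps on this slide (find the words in each document, then compare the query words to them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `tokenize` and `rank` are hypothetical helpers, the score is simply the number of query words found in the document, and the sample documents are made up from headlines used elsewhere in the slides.

```python
def tokenize(text: str) -> set[str]:
    """Return the set of lowercase, whitespace-separated words in text."""
    return set(text.lower().split())


def rank(query: str, docs: list[str]) -> list[str]:
    """Return docs sharing at least one word with the query, best match first.

    Score = number of distinct query words present in the document
    (an assumed scoring rule, the simplest one possible).
    Ties keep the original document order.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(d)), i) for i, d in enumerate(docs)]
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [docs[i] for score, i in ordered if score > 0]


docs = [
    "Enron ruling leaves corporate advisers open to lawsuits",
    "DOW weathers storm",
    "Storm closes airport",
]

print(rank("storm", docs))
print(rank("enron lawsuits storm", docs))
```

Every matching word counts equally here, which is one obvious place for improvement: the information-theory slides that follow suggest some words should count more than others.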

Information Theory & IR
  Shannon 1948
  Information content (value) of a message depends on both receiver’s knowledge and message content

Try this
  Merry ….
  Happy ….
  Prime Minister ….
  Professor ….. teaches computer science.
  Tomorrow we expect a high temperature of ….
  Warning …….

Information Theory
  Value or content of a message is based on how much the receiver’s uncertainty (entropy) is reduced
  Predictability of the message (impact of content)
    – Very predictable: low uncertainty, low entropy
        Hello, good day, how are you? Fine.
    – Unpredictable: high uncertainty, high entropy
        Move your car. Leave the building.

Information Content
  Function H defines the information content: $H(p) = -\log p$ (log base 2 when H is measured in bits)
  p is the a priori probability that a message could be predicted
  So, if a receiver can predict a message with p = 1, then H(1) = 0
  If the receiver cannot predict the message, then p = 0 and H(0) is undefined
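As a quick numerical check of this definition, the sketch below evaluates H(p) for a few probabilities. The function name `information_content` is hypothetical, and base-2 logarithms are an assumption, chosen so that the result is in bits as on the following slides. Because H(0) is undefined, the function rejects p = 0 rather than returning a value.

```python
import math


def information_content(p: float) -> float:
    """Information content H(p) = -log2(p) in bits, for 0 < p <= 1.

    Computed as log2(1/p), which is the same quantity.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1; H(0) is undefined")
    return math.log2(1.0 / p)


for p in (1.0, 0.5, 0.25, 1 / 26):
    print(p, information_content(p))
```

A fully predictable message (p = 1) carries zero bits, and each halving of the probability adds one bit.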

Calculation of Entropy
  Example: receive one letter of the alphabet
    – $H = -\log_2 (1/26) = \log_2 26 \approx 4.7$ bits if all letters are equally likely
    – 4.14 bits given the known letter distribution
  Given n messages, the average information content (bits) of any one of those messages is
    $H = -\sum_{r=1}^{n} p_r \log_2 p_r$
  Average entropy is maximized when?
    – All messages are equally likely
    – When would this occur?
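The average-entropy formula can be checked numerically. The sketch below is an illustration under assumptions: `entropy` is a hypothetical helper, logs are base 2, and the three-outcome distributions are made-up examples. It reproduces the figure of about 4.7 bits for 26 equally likely letters and shows that a skewed distribution has lower entropy than a uniform one over the same number of outcomes.

```python
import math


def entropy(probs: list[float]) -> float:
    """Average information content H = -sum(p_r * log2(p_r)) in bits.

    Outcomes with p = 0 contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


# 26 equally likely letters: the slide's example.
uniform_letters = [1 / 26] * 26
print(round(entropy(uniform_letters), 2))

# Made-up three-outcome distributions: uniform versus skewed.
uniform3 = [1 / 3, 1 / 3, 1 / 3]
skewed3 = [0.5, 0.25, 0.25]
print(entropy(uniform3), entropy(skewed3))
```

The uniform case gives the maximum because no outcome is easier to guess than any other. Any skew makes some messages more predictable and lowers the average.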

Entropy and Words
  Given D unique words in a vocabulary
    $H = -\sum_{r=1}^{D} p_r \log_2 p_r$
  Turns out that (table of D against H in bits; only the fragment "10, , ," survives)
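The same formula can be applied to words by estimating each word's probability from its relative frequency in a text. The sketch below is only an illustration: `word_entropy` is a hypothetical helper, the whitespace tokenizer is naive, and the sample strings are made up. It does not reproduce the D versus H values from the slide.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Estimate H = -sum(p_r * log2(p_r)) in bits per word.

    p_r is the relative frequency of word r in the text, so this is
    an estimate from a sample, not a property of the language.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    # abs() only normalizes the -0.0 that a one-word vocabulary produces.
    return abs(-sum((c / total) * math.log2(c / total) for c in counts.values()))


print(word_entropy("storm storm storm storm"))
print(word_entropy("dow weathers storm door"))
```

A text that repeats one word has zero entropy, while four distinct, equally frequent words give two bits per word.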

Using Entropy
  Information content is additive for independent messages
    $H(p_1, p_2) = H(p_1) + H(p_2)$
  So what??
  Google queries
    – some terms have more information value
    – some retrieval messages have more information value
  SO??
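The additivity property follows directly from the definition of H, assuming the two messages are independent, so that the probability of predicting both is the product of the individual probabilities. Independence and base-2 logs are assumptions made for this worked example.

```latex
\begin{align*}
H(p_1 p_2) &= -\log_2 (p_1 p_2) \\
           &= -\log_2 p_1 - \log_2 p_2 \\
           &= H(p_1) + H(p_2)
\end{align*}
% Worked example: p_1 = 1/2 and p_2 = 1/4 give p_1 p_2 = 1/8, so
% H(1/8) = 3 bits = 1 bit + 2 bits = H(1/2) + H(1/4).
```

For a multi-word query, this suggests the information value of the whole query is roughly the sum of the values of its terms, with rare terms contributing the most.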

Next?
  Examine the nature of these words
  Why?
  What is the relative value of search terms?
  What is the relative value of terms in document set?

For Tuesday
  Read the handout article
  Prepare a review for it using the form found on the web site
  Articles on reviewing can be found at the end of the Topics page
  Think about the notion of finding information based only on the words used in the text!