Information Retrieval Search Engine Technology (4) Prof. Dragomir R. Radev.

SET/IR – W/S 2009 … 7. Approximate string matching …

Levenshtein edit distance
Examples:
– Theatre -> theater
– Ghaddafi -> Qadafi
– Computer -> counter
Edit distance (inserts, deletes, substitutions)
– Edit transcript
Done through dynamic programming

Recurrence relation
Three dependencies
– D(i,0) = i
– D(0,j) = j
– D(i,j) = min[D(i-1,j)+1, D(i,j-1)+1, D(i-1,j-1)+t(i,j)]
Simple edit distance:
– t(i,j) = 0 if S1(i) = S2(j), and 1 otherwise
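The recurrence can be sketched directly as a table fill. This is a minimal illustration, assuming unit costs for all three operations; the function name `edit_distance` is hypothetical, not from the slides.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance with unit-cost inserts, deletes and substitutions."""
    n, m = len(s1), len(s2)
    # D[i][j] = distance between the first i characters of s1
    # and the first j characters of s2.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # base condition D(i,0) = i
    for j in range(m + 1):
        D[0][j] = j  # base condition D(0,j) = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # t(i,j) is 0 when the characters match, 1 otherwise.
            t = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,      # delete from s1
                D[i][j - 1] + 1,      # insert into s1
                D[i - 1][j - 1] + t,  # match or substitute
            )
    return D[n][m]


print(edit_distance("vintner", "writers"))  # 5
print(edit_distance("theatre", "theater"))  # 2
```

The table has (|S1|+1) x (|S2|+1) cells and each cell is filled in constant time.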

Example (Gusfield 1997)
[Table: dynamic-programming matrix for VINTNER (rows) against WRITERS (columns), with the first column initialized to D(i,0) = i]

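Example (cont’d) (Gusfield 1997)
[Table: the VINTNER/WRITERS matrix partly filled in; row T (i = 4) holds 4, 4, 4, 4 for D(4,0) to D(4,3), and * marks the next cell to compute, D(4,4)]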

Tracebacks (Gusfield 1997)
[Table: the VINTNER/WRITERS matrix, shown again for tracing back an optimal path]
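A traceback can be sketched by filling the table and then walking from D(n,m) back to D(0,0), recording which dependency produced each cell. This is an illustration under assumptions: the letters M (match), R (replace), I (insert) and D (delete) follow a common textbook convention, the function name `edit_transcript` is hypothetical, and ties are broken in favour of the diagonal, so only one of possibly several optimal transcripts is returned.

```python
def edit_transcript(s1: str, s2: str) -> str:
    """Return one optimal edit transcript turning s1 into s2."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + t)

    # Walk back from the bottom-right cell, preferring the diagonal on ties.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            t = 0 if s1[i - 1] == s2[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + t:
                ops.append("M" if t == 0 else "R")
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append("D")  # character of s1 is deleted
            i -= 1
        else:
            ops.append("I")  # character of s2 is inserted
            j -= 1
    return "".join(reversed(ops))


transcript = edit_transcript("vintner", "writers")
print(transcript, sum(op != "M" for op in transcript))
```

The number of non-M operations in the transcript equals the edit distance.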

Weighted edit distance
Used to emphasize the relative cost of different edit operations
Useful in bioinformatics
– Homology information
– BLAST
– Blosum
– http://eta.embl-heidelberg.de:8000/misc/mat/blosum50.html
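Weighting changes only the constants in the recurrence. The sketch below assumes separate insert and delete costs and a caller-supplied substitution cost function; the names `weighted_edit_distance` and `vowel_aware`, and the cost values, are hypothetical and chosen only for illustration (they are not taken from BLOSUM or any other scoring matrix).

```python
from typing import Callable


def weighted_edit_distance(
    s1: str,
    s2: str,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Edit distance where each operation carries its own cost."""
    n, m = len(s1), len(s2)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost  # delete every character of s1
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost  # insert every character of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + del_cost,
                D[i][j - 1] + ins_cost,
                D[i - 1][j - 1] + sub_cost(s1[i - 1], s2[j - 1]),
            )
    return D[n][m]


VOWELS = set("aeiou")


def vowel_aware(a: str, b: str) -> float:
    """Illustrative cost: swapping one vowel for another is cheaper."""
    if a == b:
        return 0.0
    return 0.5 if a in VOWELS and b in VOWELS else 1.0


print(weighted_edit_distance("cat", "cut", sub_cost=vowel_aware))  # 0.5
print(weighted_edit_distance("cat", "cab", sub_cost=vowel_aware))  # 1.0
```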

Links
Web sites: – –
Demo:
– /home/cs6998/tools/editDistance/dp/l.pl theater theatre
– http://nayana.ece.ucsb.edu/imsearch/imsearch.html

Other methods
– Cosine
– Generation probabilities (language modeling)
– (exp)KL-divergence
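Two of these measures can be sketched over bag-of-words term counts. This is an illustration under assumptions: texts are split on whitespace, the KL-divergence uses additive smoothing with a hypothetical parameter `alpha` so that no probability is zero, and the function names are not from the slides. The generation-probability and exponentiated variants are not shown.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(p_counts: Counter, q_counts: Counter, alpha: float = 1.0) -> float:
    """KL(P || Q) over the joint vocabulary, with additive smoothing (assumed)."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for t in vocab:
        p = (p_counts[t] + alpha) / p_total
        q = (q_counts[t] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


d1 = Counter("car safety minivans tests".split())
d2 = Counter("liability tests safety".split())
print(cosine(d1, d2), kl_divergence(d1, d2))
```

Cosine is symmetric, while KL-divergence is not: swapping the arguments generally gives a different value.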

SET/IR – W/S 2009 … 8. Query expansion Relevance feedback …

Query expansion

– Corpus-based: mine query logs
– NLP-based
– Vector-space relevance feedback

Relevance feedback
Problem: the initial query may not be the most appropriate to satisfy a given information need.
Idea: modify the original query so that it gets closer to the right documents in the vector space.

Relevance feedback
– Automatic
– Manual
Method: identifying feedback terms
Q’ = a_1 Q + a_2 R - a_3 N
where R and N are the relevant and non-relevant feedback documents
Often a_1 = 1, a_2 = 1/|R| and a_3 = 1/|N|

Example
Q = “safety minivans”
D_1 = “car safety minivans tests injury statistics” - relevant
D_2 = “liability tests safety” - relevant
D_3 = “car passengers injury reviews” - non-relevant
R = ? N = ? Q’ = ?
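One way to work this example is sketched below. It rests on assumptions the slides do not state: term weights are raw counts, R and N are the sums of the relevant and non-relevant document vectors, and the coefficients are a_1 = 1, a_2 = 1/|R| and a_3 = 1/|N|, with |R| and |N| read as the number of documents in each set. The function name `rocchio` is hypothetical.

```python
from collections import Counter


def rocchio(query: str, relevant: list, nonrelevant: list, a1: float = 1.0) -> dict:
    """Q' = a1*Q + a2*R - a3*N with a2 = 1/|R| and a3 = 1/|N| (assumed)."""
    a2 = 1.0 / len(relevant) if relevant else 0.0
    a3 = 1.0 / len(nonrelevant) if nonrelevant else 0.0
    Q = Counter(query.split())
    R = Counter()
    for doc in relevant:
        R.update(doc.split())  # sum of relevant document vectors
    N = Counter()
    for doc in nonrelevant:
        N.update(doc.split())  # sum of non-relevant document vectors
    terms = set(Q) | set(R) | set(N)
    return {t: a1 * Q[t] + a2 * R[t] - a3 * N[t] for t in terms}


q_new = rocchio(
    "safety minivans",
    relevant=["car safety minivans tests injury statistics", "liability tests safety"],
    nonrelevant=["car passengers injury reviews"],
)
for term, weight in sorted(q_new.items(), key=lambda kv: -kv[1]):
    print(term, weight)
# safety -> 2.0, minivans -> 1.5
```

Under these assumptions, terms that occur only in the non-relevant document (passengers, reviews) receive negative weights; implementations often clip such weights to zero.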

Pseudo relevance feedback
Automatic query expansion
– Thesaurus-based expansion (e.g., using latent semantic indexing – later…)
– Distributional similarity
– Query log mining
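Pseudo relevance feedback can be sketched as follows: treat the top-ranked documents as if they had been judged relevant, then add their most frequent new terms to the query. This is a toy illustration under assumptions: documents are ranked by the number of query terms they contain, and the function name `pseudo_relevance_expand` and the parameters `k` and `n_terms` are hypothetical.

```python
from collections import Counter


def pseudo_relevance_expand(query: str, docs: list, k: int = 2, n_terms: int = 1) -> list:
    """Expand the query with frequent terms from the k top-ranked documents."""
    q_terms = query.split()
    q_set = set(q_terms)
    # Rank by query-term overlap (a stand-in for a real retrieval score).
    ranked = sorted(docs, key=lambda d: len(q_set & set(d.split())), reverse=True)
    counts = Counter()
    for doc in ranked[:k]:
        # The top k documents are assumed relevant without user judgments.
        counts.update(t for t in doc.split() if t not in q_set)
    return q_terms + [t for t, _ in counts.most_common(n_terms)]


docs = [
    "car safety minivans tests injury statistics",
    "liability tests safety",
    "car passengers injury reviews",
]
print(pseudo_relevance_expand("safety minivans", docs))
# ['safety', 'minivans', 'tests']
```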

Examples
Lexical semantics (Hypernymy):
– Book: publication, product, fact, dramatic composition, record
– Computer: machine, expert, calculator, reckoner, figurer
– Fruit: reproductive structure, consequence, product, bear
– Politician: leader, schemer
– Newspaper: press, publisher, product, paper, newsprint
Distributional clustering:
– Book: autobiography, essay, biography, memoirs, novels
– Computer: adobe, computing, computers, developed, hardware
– Fruit: leafy, canned, fruits, flowers, grapes
– Politician: activist, campaigner, politicians, intellectuals, journalist
– Newspaper: daily, globe, newspapers, newsday, paper

Examples (query logs)
– Book: booksellers, bookmark, blue
– Computer: sales, notebook, stores, shop
– Fruit: recipes cake salad basket company
– Games: online play gameboy free video
– Politician: careers federal office history
– Newspaper: online website college information
– Schools: elementary high ranked yearbook
– California: berkeley san francisco southern
– French: embassy dictionary learn

[Otterbacher et al. HLT EMNLP 2005]

Readings
4: MRS15, MRS16
5: MRS17
6: MRS18, MRS19