ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.

Presentation transcript:

ITCS 6010 Natural Language Understanding

Natural Language Processing: What is it?
- Studies the problems inherent in the processing and manipulation of natural language
- Natural language understanding: devoted to making computers "understand" statements written in human languages
- Subset of AI and linguistics
- Has several categories:
  - Open domain question answering
  - Natural interfaces to databases
  - Text-based natural language research
  - Dialogue-based natural language research

Text-based Research
- Performed with respect to text-based applications, e.g. magazines, newspapers, messages
- Information extraction and comprehension
- Document retrieval
- Translation
- Summarization

Dialogue-based Research
- Performed with respect to dialogue-based applications, e.g. question-answering systems, automated help centers
- Difficult because dialogue-based applications manage a naturally flowing dialogue between the user and the interface

Natural Language Interface to Databases (NLIDB)
- Allows users to access information from a database using natural language queries
- Removes the user’s need to know the structure of the database and details about the data
- Examples: RENDEZVOUS, ASK and LANGUAGEACCESS

Architecture of NLIDB
- General architecture
  - Linguistic front-end
  - Database back-end
- Front-end
  - Accepts natural language question as input
  - Translates question to a meaning representation language (MRL)

Architecture of NLIDB (cont’d)
- Back-end
  - Accepts MRL
  - Translates MRL to supported database language
  - Executes query
  - Results typically presented in a subset of natural language
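The front-end/back-end split described above can be sketched in a few lines of Python. This is a toy illustration only: the single question pattern, the dictionary used as an MRL, the `employees` table and every function name are assumptions made for the example, and none of it is taken from RENDEZVOUS, ASK or LANGUAGEACCESS.

```python
import re
import sqlite3

# Hypothetical question pattern: the only form this toy front-end understands.
PATTERN = re.compile(r"what is the (\w+) of (\w+)\??$", re.IGNORECASE)

# Columns the toy back-end is willing to query (assumed schema).
ALLOWED_COLUMNS = {"salary", "department"}


def front_end(question):
    """Linguistic front-end: translate a question into a tiny MRL (a dict)."""
    match = PATTERN.match(question.strip())
    if match is None:
        raise ValueError("question not understood")
    attribute, name = match.groups()
    return {"select": attribute.lower(), "entity": name.lower()}


def back_end(mrl, connection):
    """Database back-end: translate the MRL to SQL, execute it, phrase the result."""
    column = mrl["select"]
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unknown attribute")
    # The column name is whitelisted above; the entity is passed as a parameter.
    sql = f"SELECT {column} FROM employees WHERE name = ?"
    row = connection.execute(sql, (mrl["entity"],)).fetchone()
    if row is None:
        return "No answer found."
    return f"The {column} of {mrl['entity']} is {row[0]}."


def demo():
    """Build an in-memory database and answer one question end to end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, department TEXT)")
    conn.execute("INSERT INTO employees VALUES ('smith', 50000, 'sales')")
    return back_end(front_end("What is the salary of Smith?"), conn)
```

Keeping the MRL as an intermediate value means the front-end never needs to know the database language, which is the point of the two-part architecture.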

Natural Language Question Answering (NLQA)
- Process of retrieving answers for questions
- Questions posed in natural language
- Precise answer presented in natural language

Question Answering Using Statistical Model (QASM)
- Converts natural language questions into search-engine-specific queries
- Premise: there exists a best possible operator to apply to a natural language question

QASM (cont’d)
- How it works:
  - Classifier determines best operator to apply to a NL question
  - Operator produces new query that improves upon original
  - Operator matched to question-answer pair
  - Expectation maximization (EM) algorithm stabilizes missing data, i.e. paraphrased questions
    - Iteratively maximizes likelihood estimation
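The idea of picking one best operator per question can be sketched as follows. This is a simplified illustration under stated assumptions: the three operators are invented for the example, and the `score` callable is a hypothetical stand-in for the trained classifier (the EM training step is not shown).

```python
# Illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"what", "is", "the", "of", "who", "a", "an", "in", "was"}


def keep_all(words):
    """Operator: leave the question unchanged."""
    return list(words)


def delete_stopwords(words):
    """Operator: drop function words, keep content words."""
    return [w for w in words if w not in STOPWORDS]


def quote_content(words):
    """Operator: turn the content words into one quoted phrase."""
    content = delete_stopwords(words)
    return ['"' + " ".join(content) + '"'] if content else []


# Hypothetical operator inventory.
OPERATORS = {
    "keep_all": keep_all,
    "delete_stopwords": delete_stopwords,
    "quote_content": quote_content,
}


def apply_best_operator(question, score):
    """Apply every operator and return (name, query) for the best-scoring rewrite.

    `score` maps a query string to a number; here it stands in for the
    classifier that decides which operator suits the question.
    """
    words = question.lower().rstrip("?").split()
    rewrites = {name: " ".join(op(words)) for name, op in OPERATORS.items()}
    best = max(rewrites, key=lambda name: score(rewrites[name]))
    return best, rewrites[best]
```

For instance, with a score that rewards quoted phrases, `apply_best_operator("What is the capital of France?", lambda q: q.count('"'))` selects the `quote_content` operator.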

Probabilistic Phrase Reranking (PPR)
- PPR: a process that goes through a set of subtasks to retrieve the most relevant answer to the question posed
- Subtasks:
  - Query modulation: question converted to appropriate query
  - Question type recognition: queries organized according to the question type, e.g. location, definition, person, etc.
  - Document retrieval: most relevant units of information (e.g. documents) are returned in this stage, i.e. the units with highest probability of containing the answer

Probabilistic Phrase Reranking (PPR) (cont’d)
- Subtasks (cont’d):
  - Passage/sentence retrieval: sentences, phrases or textual units that contain answers are identified from the information units returned in the previous task
  - Answer extraction: chosen textual units are split into phrases; each is a potential answer
  - Phrase/answer reranking: phrases generated are ranked; top of the list is the phrase most likely to contain the correct answer
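The chain of subtasks can be pictured as a pipeline of small functions, one per stage. The sketch below is a toy under stated assumptions: term overlap replaces the probabilistic scores, capitalized words stand in for candidate phrases, and all names, word lists and heuristics are hypothetical. The question type is computed but not used in the ranking here.

```python
import re

# Illustrative word lists (assumptions for this example).
STOPWORDS = {"where", "who", "what", "is", "the", "in", "of", "a"}
QUESTION_TYPES = {"where": "location", "who": "person", "what": "definition"}


def words_of(text):
    """Lowercased alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def modulate_query(question):
    """Query modulation: reduce the question to a set of content terms."""
    return {w for w in words_of(question) if w not in STOPWORDS}


def recognize_type(question):
    """Question type recognition, based only on the first word."""
    tokens = words_of(question)
    return QUESTION_TYPES.get(tokens[0], "unknown") if tokens else "unknown"


def retrieve_documents(query, documents):
    """Document retrieval: keep documents sharing at least one query term."""
    return [d for d in documents if query & set(words_of(d))]


def retrieve_sentences(query, documents):
    """Passage/sentence retrieval: keep sentences that mention a query term."""
    sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]
    return [s for s in sentences if query & set(words_of(s))]


def extract_phrases(query, sentences):
    """Answer extraction: capitalized words outside the query are candidates."""
    candidates = []
    for sentence in sentences:
        for word in re.findall(r"[A-Z][a-z]+", sentence):
            if word.lower() not in (query | STOPWORDS):
                candidates.append((word, sentence))
    return candidates


def rerank(query, candidates):
    """Phrase reranking: order candidates by query-term overlap of their sentence."""
    def overlap(candidate):
        return len(query & set(words_of(candidate[1])))
    return [phrase for phrase, _ in sorted(candidates, key=overlap, reverse=True)]


def answer(question, documents):
    """Run the stages in order and return the ranked candidate answers."""
    query = modulate_query(question)
    docs = retrieve_documents(query, documents)
    sentences = retrieve_sentences(query, docs)
    return rerank(query, extract_phrases(query, sentences))
```

Each stage narrows the text handed to the next one, so the final ranking only has to compare a small number of candidate phrases.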

Bayesian Approach
- Uses probabilistic IR model and Bayes’ Rule
- Goal of probabilistic model: estimate probability that a document, $d_k$, is relevant ($R$) to a query, $q$, i.e. $P_q(R \mid d_k)$
- Each document represented by set of words
  - Words stemmed: suffixes and prefixes removed
  - Now known as index terms
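As a rough illustration of turning a document into index terms, the sketch below strips a few common English suffixes. The suffix list and the minimum stem length are assumptions chosen for the example; it handles suffixes only and is far cruder than a real stemmer.

```python
import re

# Illustrative suffix list, checked in this order (an assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Represent a document as the set of its stemmed words."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower())}
```

With this sketch, "retrieving" and "retrieved" both reduce to the same index term, so a query using one form can match a document using the other.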

Bayesian Approach (cont’d)
- Each document represented by a vector, $t = (t_1, t_2, \ldots, t_p)$, where $p$ is the number of index terms
- Bayes’ Rule applied to model to express probability that document is relevant to a specific query, $q$:
  $P_q(R \mid t) \propto P_q(t \mid R)\, P_q(R)$
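Written out in full, the proportionality hides a normalizing term that does not depend on relevance. A short worked form, where $\bar{R}$ (non-relevance) is notation introduced here for illustration:

```latex
\[
P_q(R \mid t) = \frac{P_q(t \mid R)\, P_q(R)}{P_q(t)},
\qquad
P_q(\bar{R} \mid t) = \frac{P_q(t \mid \bar{R})\, P_q(\bar{R})}{P_q(t)}
\]
\[
\frac{P_q(R \mid t)}{P_q(\bar{R} \mid t)}
  = \frac{P_q(t \mid R)}{P_q(t \mid \bar{R})} \cdot \frac{P_q(R)}{P_q(\bar{R})}
\]
```

Dividing the two posteriors cancels $P_q(t)$, which is why ranking by odds of relevance needs only the two likelihoods and the prior odds.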

Bayesian Approach (cont’d)
- Assumption: each word is independent given relevance and non-relevance of document
- Results in expression for log odds of relevance
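One common way the independence assumption plays out, sketched under the additional assumption that each $t_i$ is binary (term present or absent): the log odds become the prior log odds plus a sum of one contribution per index term. The names `p`, `q` and `prior` below are illustrative; `p[i]` is the probability that term i occurs in a relevant document and `q[i]` the probability that it occurs in a non-relevant one.

```python
import math


def log_odds_of_relevance(t, p, q, prior=0.5):
    """Log odds of relevance for a binary term vector under term independence.

    t     -- sequence of 0/1 flags, one per index term
    p     -- p[i] = P(term i present | relevant), strictly between 0 and 1
    q     -- q[i] = P(term i present | non-relevant), strictly between 0 and 1
    prior -- P(relevant) before seeing the document (assumed value)
    """
    if not (len(t) == len(p) == len(q)):
        raise ValueError("t, p and q must have the same length")
    total = math.log(prior / (1.0 - prior))  # prior log odds
    for t_i, p_i, q_i in zip(t, p, q):
        if t_i:
            total += math.log(p_i / q_i)  # term present
        else:
            total += math.log((1.0 - p_i) / (1.0 - q_i))  # term absent
    return total
```

Because the terms are treated as independent, the joint likelihood factorizes and its logarithm turns into this simple sum, so documents can be ranked by adding per-term weights.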

Bayesian Approach (cont’d)
- Document is relevant if:
  - User’s needs satisfied
- Frequency of terms in relevant and non-relevant documents retrieved
- Initial status of documents unknown
- Ad hoc estimation of probabilistic model parameters used to determine an initially-ranked list of documents
- Strengths of approach:
  - Provides an initial document ranking not based on ad hoc considerations
  - Provides an automatic mechanism for learning and incorporating relevance information from other queries