NL Question-Answering using Naïve Bayes and LSA By Kaushik Krishnasamy.


Agenda: Problem, Key, Methodology, Naïve Bayes, LSA, Results & Comparison, Conclusion

Problem: QA Question answering over discourse is complex. Grammar-based systems incur overhead in knowledge extraction and decoding. Inherent language and cultural constructs impose restrictions, and the context of word usage matters.

Key: IR The information sought is hidden in relevant documents. The frequency of a word indicates its importance; the neighborhood of a word indicates its context.

Methodology The question posed is treated as a new document. How close is this document to each document in the knowledge base (KB)? –Naïve Bayes: probabilistic approach (C#) –LSA: dimensionality reduction (MATLAB) The closest document contains the answer.

Naïve Bayes v_MAP = argmax_{v_j ∈ V} P(v_j | a_1, a_2, a_3, …, a_n). As all documents are possible target documents, P(v_1) = P(v_2) = … = P(v_j) = constant, so v_NB = argmax_{v_j ∈ V} ∏_i P(a_i | v_j). Words are assumed independent and identically distributed.
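Spelling out the step the slide compresses (a sketch only, using Bayes' rule together with the uniform-prior and word-independence assumptions stated above):

```latex
\begin{aligned}
v_{MAP} &= \operatorname*{argmax}_{v_j \in V} P(v_j \mid a_1,\dots,a_n)
         = \operatorname*{argmax}_{v_j \in V} \frac{P(a_1,\dots,a_n \mid v_j)\,P(v_j)}{P(a_1,\dots,a_n)} \\
        &= \operatorname*{argmax}_{v_j \in V} P(a_1,\dots,a_n \mid v_j)\,P(v_j)
           \quad\text{(denominator does not depend on } v_j\text{)} \\
        &= \operatorname*{argmax}_{v_j \in V} P(a_1,\dots,a_n \mid v_j)
           \quad\text{(uniform prior: all documents equally likely)} \\
v_{NB}  &= \operatorname*{argmax}_{v_j \in V} \prod_i P(a_i \mid v_j)
           \quad\text{(words treated as conditionally independent given } v_j\text{)}
\end{aligned}
```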

Naïve Bayes - Algorithm Pre-process all documents. Store the number of unique words in each document (N_i). Concatenate all documents and keep the words that occur at least twice as the unique words; the count of these unique words is the 'Vocabulary'. For each unique word and each document, estimate P(word|document) as (frequency of the word in doc i + 1) / (N_i + Vocabulary). Store (word, doc, probability/frequency) to a file.

Contd… Obtain an input query from the user. Retrieve the individual words after pre-processing. Penalize words that are not among the unique ones. For each document, compute from the stored file the product of the probabilities of all retrieved words given that document: P(input|v_i) = P(w_1|v_i) * P(w_2|v_i) * P(w_3|v_i) * … * P(w_n|v_i). The document with the maximum P(input|v_i) contains the answer. WORDNET: resolve unknown input words.
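A minimal Python sketch of the two slides above (training and query scoring). The original system was written in C#, so everything here, including the preprocess tokenizer, the toy corpus, and the unknown_penalty used to "penalize" out-of-vocabulary words, is an assumed reconstruction from the slides rather than the author's code.

```python
import re
from collections import Counter

def preprocess(text):
    # Assumed tokenizer: lower-case and keep alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """Return (vocabulary, per-document word probabilities) as described on the slides."""
    tokens = [preprocess(d) for d in docs]
    # Unique words = words occurring at least twice in the concatenation of all documents.
    all_counts = Counter(w for t in tokens for w in t)
    vocab = {w for w, c in all_counts.items() if c >= 2}
    probs = []
    for t in tokens:
        n_i = len(set(t))                      # N_i: number of unique words in doc i
        freq = Counter(w for w in t if w in vocab)
        denom = n_i + len(vocab)
        # Laplace-style estimate: (freq of word in doc i + 1) / (N_i + Vocabulary)
        probs.append({w: (freq[w] + 1) / denom for w in vocab})
    return vocab, probs

def best_document(query, probs, unknown_penalty=1e-6):
    """Index of the document maximizing P(query words | doc).
    unknown_penalty is an assumed small probability standing in for the slide's
    'penalize words that are not among the unique ones'."""
    words = preprocess(query)
    scores = []
    for p in probs:
        score = 1.0
        for w in words:
            score *= p.get(w, unknown_penalty)
        scores.append(score)
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy usage (the real corpus was 57 Basic Electrical Engineering documents):
docs = ["Ohm's law relates voltage, current and resistance.",
        "A DC power supply provides a constant voltage to a circuit.",
        "Energy is neither created nor destroyed, only converted."]
_, probs = train(docs)
print(best_document("how do i use a dc power supply?", probs))   # -> 1 (the DC supply doc)
```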

LSA: Latent Semantic Analysis A method to extract and represent the contextual-usage meaning of words. Sets of words become points in a very high-dimensional "semantic space". Uses singular value decomposition (SVD) to reduce dimensionality, then applies correlation analysis to arrive at the results.

LSA: Algorithm Obtain the (word, doc, frequency) triples. Basic matrix: form the (word × doc) matrix from the frequency entries. Pre-process the input query. Query matrix: form the (word × (doc+1)) matrix with the query as the last column, holding its individual word frequencies. Perform the SVD: A = U S V^T. Keep the two largest singular values and reconstruct the matrix.

Contd… Find the document column that is maximally correlated with the query column. This is the document containing the answer to the query.
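A minimal Python/NumPy sketch of the LSA pipeline above. The author's implementation was in MATLAB; the rank-2 truncation and the correlation between reconstructed columns follow the slides, while the tokenizer, the absence of any term weighting, and the toy corpus are assumptions.

```python
import re
import numpy as np

def preprocess(text):
    # Assumed tokenizer; the author's exact pre-processing is not specified.
    return re.findall(r"[a-z]+", text.lower())

def lsa_best_document(docs, query, rank=2):
    """Build the (word x (doc+1)) frequency matrix with the query as the last
    column, truncate the SVD to `rank` singular values, and return the index
    of the document column most correlated with the query column."""
    doc_tokens = [preprocess(d) for d in docs]
    query_tokens = preprocess(query)
    vocab = sorted(set(w for t in doc_tokens for w in t) | set(query_tokens))
    index = {w: i for i, w in enumerate(vocab)}

    # Frequency matrix: rows = words, columns = documents plus the query.
    A = np.zeros((len(vocab), len(docs) + 1))
    for j, tokens in enumerate(doc_tokens + [query_tokens]):
        for w in tokens:
            A[index[w], j] += 1

    # SVD and rank-k reconstruction (the slides keep the two largest singular values).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

    # Correlate each reconstructed document column with the reconstructed query column.
    query_col = A_k[:, -1]
    corrs = [np.corrcoef(A_k[:, j], query_col)[0, 1] for j in range(len(docs))]
    return int(np.argmax(corrs))

# Toy usage:
docs = ["Ohm's law relates voltage, current and resistance.",
        "A DC power supply provides a constant voltage to a circuit.",
        "Energy is neither created nor destroyed, only converted."]
print(lsa_best_document(docs, "what law says energy is neither created nor destroyed?"))
```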

Testing Documents: Basic Electrical Engineering (EXP and Lesson documents). The documents average approximately 250 words and each deals with a new topic, so the set cannot be partitioned into separate training and testing documents (11 EXP + 46 Lesson = 57 docs). Naïve Bayes: –Automated trivial input testing –Real input testing. LSA: –Trivial input testing –Real input testing (still to be tested on the Lesson docs).

Results Naïve Bayes: –Automated Trivial Input Accuracy was tabulated by start position and number of words in the query (fraction of documents correctly retrieved).

Results Naïve Bayes –Real Input EXP docs (11 docs): Inputs with fewer than 10 words (e.g. "how do i use a dc power supply?"): accuracy 8/10. Inputs with 10 to 15 words (e.g. "what is the law that states that energy is neither created nor destroyed, but just changes from one form to another?"): accuracy 8/10. Lesson docs (46 docs), inputs with 5 to 15 words: accuracy 14/20.

Results LSA (flawless with trivial inputs of more than 20 words) –Without SVD (EXP only): poor accuracy, 4/10 (<10 words); good accuracy, 8/10 (10 to 15 words). –With SVD: very poor accuracy, 1/10 (<10 words); poor accuracy, 2/10 (10 to 15 words).

Comparison Naïve Bayes –Fails for acronyms and irrelevant queries –Indirect references fail (word context is lost) –Keywords determine success –Documents with discrete concept content perform better (EXP). LSA –Fails miserably for short sentences (<15 words) –Very effective for long sentences (>20 words) –Insensitive to indirect references or context.

Conclusion The Naïve Bayes and LSA techniques were studied, and software was written to test both methods. Naïve Bayes is found to be very effective for short question-answer sentences, with an approximate accuracy of 80%. For shorter sentences, LSA without SVD performs better than LSA with SVD.