
Statistical Natural Language Processing

What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages.  It is an interdisciplinary field that draws on other areas of study such as computer science, artificial intelligence, linguistics and logic.

Applications of NLP
 natural language interfaces to databases
 programs for classifying and retrieving documents by content
 explanation generation for expert systems
 machine translation
 advanced word-processing tools

What makes NLP a computational challenge?
 Natural language is inherently ambiguous.
 There are varied applications for language technology.
 Knowledge representation is a difficult task.
 Our language encodes information at several different levels.

What is statistical NLP?
 Statistical NLP applies statistical inference to problems in NLP.
 Statistical inference consists of taking data generated according to some unknown probability distribution and making inferences about that distribution.

Motivations for Statistical NLP
 Cognitive modeling of human language processing has not reached a stage where we have a complete mapping between the language signal and its information content.
 A complete mapping is not always required.
 The statistical approach provides the flexibility required to model a language more accurately.

Idea behind Statistical NLP
 View language processing as information transmission over a noisy channel.
 The approach requires a model that characterizes the transmission by giving, for every message, the probability of the observed output.
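The noisy-channel idea can be sketched in a few lines: decoding picks the message m that maximizes P(m) · P(observed | m). All the messages and probabilities below are hypothetical toy values, not from any real model.

```python
# Noisy-channel decoding sketch (toy, hypothetical probabilities).
# Prior over candidate messages plays the role of a language model.
prior = {"I saw the cat": 0.6, "eye saw the cat": 0.1, "I saw the cap": 0.3}

# Channel model: probability of the observed signal given each message.
channel = {"I saw the cat": 0.2, "eye saw the cat": 0.5, "I saw the cap": 0.1}

def decode(prior, channel):
    """Return the message maximizing the posterior score P(m) * P(o|m)."""
    return max(prior, key=lambda m: prior[m] * channel[m])

print(decode(prior, channel))  # "I saw the cat": 0.6*0.2 = 0.12 beats 0.05 and 0.03
```

Note that the prior can overrule the channel: "eye saw the cat" fits the acoustics better here, but its low prior probability rules it out.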

Statistical Modeling and Classification
 Primitive acoustic features
 Quantization
 Maximum likelihood and related rules
 Class conditional density functions
 Hidden Markov Model methodology

Details… Primitive acoustic features are used to estimate the speech spectrum on the basis of its statistical properties. By means of quantization, a typical speech signal can be represented as a sequence of symbols: the signal is mapped into a multidimensional acoustic feature space and classified using statistical decision rules.
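One simple form of quantization is nearest-centroid vector quantization: each feature vector is replaced by the symbol of the closest codebook entry. The codebook and the signal below are invented purely for illustration.

```python
# Vector quantization sketch: map each feature vector to the symbol of its
# nearest codebook centroid (Euclidean distance). Codebook values are made up.
CODEBOOK = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}

def quantize(vec):
    """Return the codebook symbol whose centroid is closest to vec."""
    def dist2(symbol):
        return sum((v - x) ** 2 for v, x in zip(vec, CODEBOOK[symbol]))
    return min(CODEBOOK, key=dist2)

signal = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8)]
print([quantize(v) for v in signal])  # ['a', 'b', 'c']
```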

Maximum Likelihood There is no direct method for computing the probability of a phonetic unit given its acoustic features, but we can use Bayes' rule to estimate the probability of a phonetic class given its features from the likelihood of the features given the class. This leads to the maximum likelihood classifier, which assigns an unknown vector to the class whose probability density function, conditioned on the class, has the maximum value. Clustering is another variant of the maximum likelihood methodology.
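A minimal sketch of such a classifier, assuming one-dimensional Gaussian class-conditional densities with made-up parameters (in practice the means and variances would be estimated from a labeled training set):

```python
import math

# Maximum-likelihood classifier sketch: assign a feature value to the class
# whose class-conditional density is highest. The class names and the
# (mean, std) parameters below are hypothetical.
CLASSES = {"vowel": (1.0, 0.5), "fricative": (3.0, 0.8)}

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def classify(x):
    """Pick the class maximizing p(x | class)."""
    return max(CLASSES, key=lambda c: gaussian_pdf(x, *CLASSES[c]))

print(classify(1.2))  # 'vowel'
print(classify(2.8))  # 'fricative'
```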

Hidden Markov Models A Hidden Markov Model is a set of states (lexical categories, in our case) with directed edges labeled with transition probabilities: the probability of moving to the state at the end of an edge, given that one is now in the state at its start. Each state is also labeled with a function giving the probabilities of outputting different symbols from that state (while in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output from a state, i.e. a lexical category, is a word belonging to that category.
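The standard way to recover the most probable state (tag) sequence for an observed word sequence is the Viterbi algorithm. Below is a toy sketch for the two categories N and V; all probabilities are invented for illustration.

```python
# Tiny HMM tagging sketch with hypothetical toy probabilities: states are
# lexical categories, each state emits a word, and Viterbi finds the most
# probable state sequence for a sentence.
states = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"John": 0.6, "walks": 0.4}, "V": {"John": 0.1, "walks": 0.9}}

def viterbi(words):
    """Return the most probable tag sequence under the toy HMM."""
    # best[s] = (probability, path) of the best path ending in state s
    best = {s: (start[s] * emit[s].get(words[0], 0.0), [s]) for s in states}
    for w in words[1:]:
        best = {
            s: max(
                ((p * trans[prev][s] * emit[s].get(w, 0.0), path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["John", "walks"]))  # ['N', 'V']
```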

Hidden Markov Models (cont.)

Class Conditional Density Functions All statistical methods of speech recognition depend on class conditional density functions. These, in turn, depend on the existence of a sufficiently large, correctly labeled training set and on well-understood statistical estimation techniques.

How does statistics help?
 Disambiguation may be achieved by using stochastic context-free grammars.
 It helps in providing degrees of grammaticality.
 Naturalness
 Structural preference
 Error tolerance

Example using a stochastic CFG Consider the sentence "John Walks". The grammar is as follows:
S -> NP V
S -> NP NP
NP -> N
NP -> N N
N -> John
N -> Walks
V -> Walks  1.0
Each rule carries a weight, shown on the right (as for V -> Walks above). The weight of an analysis is the product of the weights of the rules used in its derivation, and predicting the sentence that was actually perceived is based on these weights.
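Scoring a derivation is then just multiplying rule weights. In the sketch below, only the weight 1.0 for V -> Walks comes from the slide; the other weights are hypothetical placeholders.

```python
from functools import reduce

# Weighted-CFG sketch: score a derivation as the product of its rule weights.
# Only the 1.0 for V -> Walks is from the slide; the rest are hypothetical.
WEIGHTS = {
    ("S", ("NP", "V")): 0.7,   # hypothetical
    ("NP", ("N",)): 0.8,       # hypothetical
    ("N", ("John",)): 0.5,     # hypothetical
    ("V", ("Walks",)): 1.0,    # from the slide
}

def derivation_weight(rules):
    """Multiply the weights of the rules used in a derivation."""
    return reduce(lambda acc, rule: acc * WEIGHTS[rule], rules, 1.0)

# Derivation of "John Walks" as S -> NP V, NP -> N, N -> John, V -> Walks.
parse = [("S", ("NP", "V")), ("NP", ("N",)), ("N", ("John",)), ("V", ("Walks",))]
print(derivation_weight(parse))  # ≈ 0.28 (0.7 * 0.8 * 0.5 * 1.0)
```

To choose between competing analyses of the same string, one would compute this product for each derivation and keep the highest-scoring one.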

Degrees of grammaticality  Traditional approaches to NLP do not accommodate gradations of grammaticality: a sentence is either correct or it is not.  In practice, acceptability may vary with the structure and context of the sentence.

Structural Preference Consider the sentence "The emergency crews hate most is domestic violence." The correct interpretation is "The emergency [that the crews hate most] is domestic violence." These are structural preferences rather than parsing preferences, and statistical approaches can handle such preferences easily.

Error Tolerance  A remarkable property of human language comprehension is its error tolerance.  Many sentences that the traditional approach classifies as ungrammatical can still be interpreted by statistical NLP techniques.

Conclusions  Free and commercial software is now available that provides many NLP features (e.g. Microsoft Windows XP includes speech recognition software by which users can control menus and execute commands).  A great deal of research is going into developing new applications and investigating new techniques and approaches that will make statistical NLP even more practical in the near future.