CSE6339 3.0 Introduction to Computational Linguistics Tuesdays, Thursdays 14:30-16:00 – South Ross 101 Fall Semester, 2011 Instructor: Nick Cercone - 3050.


Final HPSGs: cleaning up and final aspects, semantics, overview of statistical NLP

HPSGs - An Overlooked Topic: Complements vs. Modifiers
Intuitive idea: complements introduce essential participants in the situation denoted; modifiers refine the description.
This distinction is generally accepted, but individual cases are disputed.
Linguists rely on heuristics to decide how to analyze questionable cases (usually PPs).

HPSGs - Heuristics for Complements vs. Modifiers
Obligatory PPs are usually complements.
Temporal & locative PPs are usually modifiers.
An entailment test: if "X Ved (NP) PP" does not entail "X did something PP", then the PP is a complement.
Examples:
- "Pat relied on Chris" does not entail "Pat did something on Chris"
- "Pat put nuts in a cup" does not entail "Pat did something in a cup"
- "Pat slept until noon" does entail "Pat did something until noon"
- "Pat ate lunch at Bytes" does entail "Pat did something at Bytes"

HPSGs - Agreement
Two kinds so far (namely?)
Both were initially handled via stipulation in the Head-Specifier Rule.
But if we want to use this rule for categories that don’t have the AGR feature (such as PPs and APs, in English), we can’t build it into the rule.

HPSGs - The Specifier-Head Agreement Constraint (SHAC)
Verbs and nouns must be specified as:

HPSGs - The Count/Mass Distinction
Partially semantically motivated:
- mass terms tend to refer to undifferentiated substances (air, butter, courtesy, information)
- count nouns tend to refer to individuatable entities (bird, cookie, insult, fact)
But there are exceptions:
- succotash (mass) denotes a mix of corn & lima beans, so it’s not undifferentiated
- furniture, footwear, cutlery, etc. refer to individuatable artifacts with mass terms
- cabbage can be either count or mass, but many speakers get lettuce only as mass

HPSGs - Semantics
The Linguist’s stance: Building a precise model
Some statements are statements about how the model works:
“[prep] and [AGR 3sing] cannot be combined because AGR is not a feature of the type prep.”
Some statements are statements about how (we think) English or language in general works:
“The determiners a and many only occur with count nouns, the determiner much only occurs with mass nouns, and the determiner the occurs with either.”
Some are statements about how we code a particular linguistic fact within the model:
“All count nouns are [SPR ].”
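The first kind of statement, about how the model itself works, can be illustrated with a minimal sketch. It assumes a toy table recording which features each type may carry; the names `APPROPRIATE` and `can_combine` and the feature inventory are hypothetical, chosen only for illustration and not taken from the course grammar.

```python
# Toy model of feature appropriateness: each type declares which features
# it may carry. The inventory below is illustrative, not the course grammar.
APPROPRIATE = {
    "noun": {"AGR", "CASE"},
    "verb": {"AGR"},
    "prep": set(),  # assumption: the type prep declares no AGR feature
}

def can_combine(type_name, constraints):
    """Return True if every constrained feature is declared for the type."""
    allowed = APPROPRIATE[type_name]
    return all(feature in allowed for feature in constraints)

print(can_combine("noun", {"AGR": "3sing"}))
print(can_combine("prep", {"AGR": "3sing"}))
```

Under these assumptions the second call fails for exactly the reason the slide gives: AGR is not a feature of the type prep.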

HPSGs - Semantics
The Linguist’s stance: A Vista on the Set of Possible English Sentences
- ... as a background against which linguistic elements (words, phrases) have a distribution
- ... as an arena in which linguistic elements “behave” in certain ways

HPSGs - Semantics
So far, our “grammar” has no semantic representations. We have, however, been relying on semantic intuitions in our argumentation, and discussing semantic contrasts where they line up (or don't) with syntactic ones.
Examples?
- structural ambiguity
- S/NP parallelism
- count/mass distinction
- complements vs. modifiers

HPSGs - Semantics
Aspects of meaning we won’t account for:
- Pragmatics
- Fine-grained lexical semantics:

HPSGs - Semantics
Our Slice of a World of Meanings
“... the linguistic meaning of Chris saved Pat is a proposition that will be true just in case there is an actual situation that involves the saving of someone named Pat by someone named Chris.”

HPSGs - Semantics
Our Slice of a World of Meanings
What we are accounting for is the compositionality of sentence meaning:
- How the pieces fit together: semantic arguments and indices
- How the meanings of the parts add up to the meaning of the whole: appending RESTR lists up the tree

HPSGs - Semantics in Constraint-based grammar
Constraints as generalized truth conditions:
- proposition: what must be the case for a proposition to be true
- directive: what must happen for a directive to be fulfilled
- question: the kind of situation the asker is asking about
- reference: the kind of entity the speaker is referring to
Syntax/semantics interface: constraints on how syntactic arguments are related to semantic ones, and on how semantic information is compiled from different parts of the sentence.

HPSGs - Semantics - Feature Geometry
HPSGs - Semantics - How the pieces fit together (figures on three slides)
HPSGs - Semantics (pieces together)
HPSGs - Semantics (more detailed view of same tree)

HPSGs - Semantics
To fill in the semantics for the S node, we need the Semantic Principles.
The Semantic Inheritance Principle: in any headed phrase, the mother's MODE and INDEX are identical to those of the head daughter.
The Semantic Compositionality Principle: in any well-formed phrase structure, the mother's RESTR value is the sum of the RESTR values of the daughters.
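The two principles can be sketched in a few lines of Python. This is a simplified illustration, not the course's formalism: the `Sem` class, the `build_mother` function, and the tuple encoding of predications are all hypothetical, and "sum" of RESTR values is modeled as list concatenation in daughter order.

```python
from dataclasses import dataclass, field

@dataclass
class Sem:
    mode: str    # e.g. "prop" or "ref"
    index: str   # e.g. "s1" for a situation, "i" for an individual
    restr: list = field(default_factory=list)  # list of predications

def build_mother(head, daughters):
    """Build the mother's semantics from its daughters.

    `daughters` lists all daughters in order, including `head`.
    """
    # Semantic Compositionality Principle: RESTR is the concatenation
    # of the daughters' RESTR lists.
    restr = [pred for d in daughters for pred in d.restr]
    # Semantic Inheritance Principle: MODE and INDEX come from the head.
    return Sem(mode=head.mode, index=head.index, restr=restr)

# Simplified predications for "Chris saved Pat" (hypothetical encoding).
subj = Sem("ref", "i", [("name", "Chris", "i")])
vp = Sem("prop", "s1", [("save", "s1", "i", "j"), ("name", "Pat", "j")])
s = build_mother(head=vp, daughters=[subj, vp])
print(s.mode, s.index, s.restr)
```

The S node ends up with the VP's MODE and INDEX and with all three predications on its RESTR list.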

HPSGs - Semantics - semantic inheritance illustrated
HPSGs - Semantics - semantic compositionality illustrated
HPSGs - Semantics - what identifies indices

HPSGs - Semantics - summary
Words:
- contribute predications
- ‘expose’ one index in those predications, for use by words or phrases
- relate syntactic arguments to semantic arguments

HPSGs - Semantics - summary
Grammar rules:
- identify feature structures (including the INDEX value) across daughters: Head Specifier Rule, Head Complement Rule, Head Modifier Rule
- license trees which are subject to the semantic principles:
  - SIP ‘passes up’ MODE and INDEX from head daughter
  - SCP ‘gathers up’ predications (RESTR list) from all daughters

HPSGs - other aspects of semantics
- Tense, Quantification (only touched on here)
- Modification
- Coordination
- Structural Ambiguity

HPSGs - what we are trying to do
Objectives:
- Develop a theory of knowledge of language
- Represent linguistic information explicitly enough to distinguish well-formed from ill-formed expressions
- Be parsimonious, capturing linguistically significant generalizations
Why formalize?
- To formulate testable predictions
- To check for consistency
- To make it possible to get a computer to do it for us

HPSGs - how we construct sentences
The Components of Our Grammar:
- Grammar rules
- Lexical entries
- Principles
- Type hierarchy (very preliminary, so far)
- Initial symbol (S, for now)
We combine constraints from these components.
Question: What says we have to combine them?

HPSGs - an example
A cat slept.
Can we build this with our tools? Given the constraints our grammar puts on well-formed sentences, is this one?

HPSGs - lexical entry for “a”
Is this a fully specified description? What features are unspecified? How many word structures can this entry license?

HPSGs - lexical entry for “cat”
Which feature paths are abbreviated? Is this fully specified? What features are unspecified? How many word structures can this entry license?

HPSGs - Effect of Principles: the SHAC
HPSGs - Description of Word Structures for cat
HPSGs - Description of Word Structures for a
HPSGs - Building a Phrase
HPSGs - Constraints Contributed by Daughter Subtrees
HPSGs - Constraints Contributed by the Grammar Rule
HPSGs - A Constraint Involving the SHAC
HPSGs - Effects of the Valence Principle
HPSGs - Effects of the Head Feature Principle
HPSGs - Effects of the Semantic Inheritance Principle
HPSGs - Effects of the Semantic Compositionality Principle
HPSGs - Is the Mother Node Now Completely Specified?
HPSGs - Lexical Entry for slept
HPSGs - Another Head-Specifier Phrase
HPSGs - Is this description fully specified?
HPSGs - Does the top node satisfy the initial symbol?
HPSGs - RESTR of the S node

HPSGs - Another example
HPSGs - Head Features from Lexical Entries
HPSGs - Head Features from Lexical Entries, plus HFP
HPSGs - Valence Features: Lexicon, Rules, and the Valence Principle
HPSGs - Required Identities: Grammar Rules
HPSGs - Two Semantic Features: the Lexicon & SIP
HPSGs - RESTR Values and the SCP

HPSGs - An Ungrammatical Example
What’s wrong with this sentence?
The Valence Principle, Head Specifier Rule

HPSGs - Overview
- Information movement in trees
- Exercise in critical thinking
- SPR and COMPS
- Technical details (lexical entries, trees)
- Analogies to other systems you might know, e.g., how is the type hierarchy like an ontology?

Statistical NLP - Introduction
NLP as we have examined it thus far can be contrasted with statistical NLP. For example, statistical parsing researchers assume that there is a continuum and that the only distinction to be drawn is between the correct parse and all the rest. The “parse” given by the parse tree on the right would support this continuum view. For statistical NLP researchers, there is no difference between parsing and syntactic disambiguation: it's parsing all the way!

Statistical NLP is normally taught in 2 parts:
Part I lays out the mathematical and linguistic foundation that the other parts build on. These include concepts and techniques normally referred to throughout the course.
Part II covers word-centered work in Statistical NLP. There is a natural progression from simple to complex linguistic phenomena in collocations, n-gram models, word sense disambiguation, and lexical acquisition.
This work is followed by techniques such as Markov Models, tagging, probabilistic context free grammars, and probabilistic parsing, which build on each other. Finally, other applications and techniques are introduced: statistical alignment and machine translation, clustering, information retrieval, and text categorization.

Statistical NLP - What we will discuss
1. Information Retrieval and the Vector Space Model
Typical IR system architecture; steps in document and query processing in IR; vector space model; tf-idf (term frequency, inverse document frequency) weights; term weighting formula; cosine similarity measure; term-by-document matrix; reducing the number of dimensions; Latent Semantic Analysis; IR evaluation
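As an illustration of tf-idf weighting and the cosine similarity measure, here is a minimal sketch. It assumes one common weighting variant (raw term frequency times log(N / document frequency)), which is not necessarily the term weighting formula used in the course; the function names and the three toy documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term frequency * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [["cat", "sat", "mat"], ["cat", "ate", "fish"], ["dog", "ate", "bone"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # share "cat": similarity above zero
print(cosine(vecs[0], vecs[2]))  # no shared terms: similarity is zero
```

Stacking these vectors as columns gives the term-by-document matrix mentioned above.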

Statistical NLP - What we will discuss
2. Text Classification
Text classification and text clustering; types of text classification; evaluation measures in text classification; F-measure; evaluation methods for classification: general issues (overfitting and underfitting); methods: (1) training error, (2) train and test, (3) n-fold cross-validation
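The evaluation measures named here can be sketched for a single class as follows. This is a minimal sketch using the standard definitions of precision, recall, and the F-measure with weight beta; the function name and the toy labels are hypothetical.

```python
def precision_recall_f(gold, predicted, positive, beta=1.0):
    """Precision, recall and F-measure for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

gold = ["spam", "spam", "ham", "ham", "spam"]
pred = ["spam", "ham", "ham", "spam", "spam"]
print(precision_recall_f(gold, pred, positive="spam"))
```

With two true positives, one false positive, and one false negative, precision, recall, and F1 all come out to 2/3 in this example.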

Statistical NLP - What we will discuss
3. Parser Evaluation, Text Clustering and CNG Classification
Parser evaluation: PARSEVAL measures, labeled and unlabeled precision and recall, F-measure
Text clustering: task definition, the simple k-means method, hierarchical clustering, divisive and agglomerative clustering
Evaluation of clustering: inter-cluster similarity, cluster purity, use of entropy or information gain
CNG: Common N-Grams classification method
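Labeled and unlabeled bracket precision and recall can be sketched as below. This is a simplified version of PARSEVAL-style scoring: it assumes each constituent is a (label, start, end) tuple and uses sets, so duplicate brackets are ignored; the function name and the example brackets are hypothetical.

```python
def parseval(gold_brackets, test_brackets, labeled=True):
    """Precision, recall and F1 over constituent brackets.

    Each bracket is a (label, start, end) tuple. In unlabeled mode the
    label is dropped and only the span is compared.
    """
    def key(bracket):
        return bracket if labeled else bracket[1:]

    gold = {key(b) for b in gold_brackets}
    test = {key(b) for b in test_brackets}
    matched = len(gold & test)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# "A cat slept": the test parse mislabels the VP span as NP.
gold = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
test = [("S", 0, 3), ("NP", 0, 2), ("NP", 2, 3)]
print(parseval(gold, test, labeled=True))
print(parseval(gold, test, labeled=False))
```

The mislabeled bracket costs the parse one match under labeled scoring but none under unlabeled scoring, since the spans are identical.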

Statistical NLP - What we will discuss
4. Probabilistic Modeling and Joint Distribution Model
Elements of probability theory; generative models; Bayesian inference; probabilistic modeling: random variables, random configurations, computational tasks in probabilistic modeling, spam detection example; joint distribution model; drawbacks of joint distribution model

Statistical NLP - What we will discuss
5. Fully Independent Model and Naive Bayes Model
Fully independent model: example, computational tasks, sum-product formula
Naive Bayes model: motivation, assumption, computational tasks, example, number of parameters, pros and cons
N-gram model; language modeling in speech recognition

Statistical NLP - What we will discuss
6. N-gram Model
N-gram model: n-gram model assumption, graphical representation, use of log probabilities
Markov chain: stochastic process, Markov process, Markov chain
Perplexity and evaluation of N-gram models; text classification using language models
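The use of log probabilities and perplexity for an n-gram model can be sketched with a bigram model. This is a minimal sketch assuming unsmoothed relative-frequency estimates, so a sentence containing an unseen bigram raises an error instead of receiving a score; the function names, the boundary markers `<s>` and `</s>`, and the two-sentence corpus are hypothetical.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count histories and bigrams, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams

def log_prob(sent, histories, bigrams):
    """Sum of log2 P(w_i | w_{i-1}); raises ValueError on unseen bigrams."""
    tokens = ["<s>"] + sent + ["</s>"]
    return sum(math.log2(bigrams[(a, b)] / histories[a])
               for a, b in zip(tokens, tokens[1:]))

def perplexity(sent, histories, bigrams):
    n = len(sent) + 1  # number of predicted tokens, including </s>
    return 2 ** (-log_prob(sent, histories, bigrams) / n)

corpus = [["a", "cat", "slept"], ["a", "dog", "slept"]]
hist, bi = train_bigrams(corpus)
print(log_prob(["a", "cat", "slept"], hist, bi))
print(perplexity(["a", "cat", "slept"], hist, bi))
```

Summing logs avoids the numerical underflow that multiplying many small probabilities would cause.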

Statistical NLP - What we will discuss
7. Hidden Markov Model
Smoothing: add-one (Laplace) smoothing, Witten-Bell smoothing
Hidden Markov Model: graphical representations, assumption, HMM POS example, Viterbi algorithm (use of dynamic programming in HMMs)
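The dynamic programming idea behind the Viterbi algorithm can be sketched for a small part-of-speech example. This is a minimal sketch of a first-order HMM decoder in log space; the tag set and all probabilities are invented for illustration and are not estimated from any corpus.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a first-order HMM (log space)."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in t
    best = [{t: lg(start_p[t]) + lg(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lg(trans_p[s][t]))
            best[i][t] = (best[i - 1][prev] + lg(trans_p[prev][t])
                          + lg(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

tags = ["D", "N", "V"]
start_p = {"D": 0.8, "N": 0.1, "V": 0.1}
trans_p = {"D": {"D": 0.05, "N": 0.9, "V": 0.05},
           "N": {"D": 0.1, "N": 0.2, "V": 0.7},
           "V": {"D": 0.5, "N": 0.3, "V": 0.2}}
emit_p = {"D": {"a": 1.0},
          "N": {"cat": 0.9, "slept": 0.1},
          "V": {"slept": 0.9, "cat": 0.1}}
print(viterbi(["a", "cat", "slept"], tags, start_p, trans_p, emit_p))
```

Each cell is computed once from the previous column, so the work is proportional to the sentence length times the square of the number of tags, rather than exponential in the sentence length.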

Statistical NLP - What we will discuss
8. Bayesian Networks
Bayesian Networks: definition, example; evaluation tasks in Bayesian Networks: evaluation, sampling, inference; inference in Bayesian Networks by brute force; general inference in Bayesian Networks is NP-hard; efficient inference in Bayesian Networks
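Evaluation of a full configuration and inference by brute force can be sketched on a three-variable network. This is a minimal sketch; the network structure (Rain and Sprinkler both influencing WetGrass), the table names, and all numbers are made up for illustration.

```python
from itertools import product

# Toy network: Rain -> WetGrass <- Sprinkler. All numbers are invented.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET = {  # P(wet=True | rain, sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """Evaluation: probability of one full configuration."""
    p_wet = P_WET[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)

def prob_rain_given_wet():
    """Brute-force inference: sum the joint over all matching configurations."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den

print(prob_rain_given_wet())
```

Enumerating every configuration takes time exponential in the number of variables, which is why brute force only works for small networks.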

Other Concluding Remarks
ATOMYRIADES
Nature, it seems, is the popular name
for milliards and milliards and milliards
of particles playing their infinite game
of billiards and billiards and billiards.