Announcements Main CSE file server went down last night –Hand in your homework using ‘submit_cse467’ as soon as you can – no penalty if handed in today. Friday (10/6) each team must tell me: –who is on the team –ideas for the project, with scale up/down plans Get together as a team to work when I am away: –10/12–10/13: Thursday – Friday next week –10/23–10/27: Monday – Friday

Part of speech (POS) tagging Tagging of words in a corpus with the correct part of speech, drawn from some tagset. Early automatic POS taggers were rule-based. Stochastic POS taggers are reasonably accurate.

Applications of POS tagging Parsing –recovering syntactic structure requires correct POS tags –partial parsing refers to any syntactic analysis which does not result in a full syntactic parse (e.g. finding noun phrases) – “parsing by chunks”

Applications of POS tagging Information extraction –fill slots in predefined templates with information –a full parse is not needed for this task, but partial parsing results (phrases) can be very helpful –information extraction uses grammatical categories (POS tags) as cues for identifying semantic categories

Applications of POS tagging Question answering –system responds to a user question with a noun phrase Who shot JR? (Kristen Shepard) Where is Starbucks? (UB Commons) What is good to eat here? (pizza)

Background on POS tagging How hard is tagging? –most words have just a single tag: easy –some words have more than one possible tag: harder –many common words are ambiguous Brown corpus: –10.4% of word types are ambiguous –40%+ of word tokens are ambiguous
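The Brown-corpus figures above can be checked approximately with a short script. The sketch below assumes NLTK and its copy of the Brown corpus are installed and downloaded (an assumption about tooling, not something from the slides); the exact percentages depend on the tagset version and on case normalization.

```python
# Sketch: estimate type and token ambiguity in the Brown corpus with NLTK.
# Assumes `pip install nltk` and `nltk.download('brown')` have been run.
from collections import defaultdict
import nltk

tags_for_word = defaultdict(set)
tokens = nltk.corpus.brown.tagged_words()
for word, tag in tokens:
    tags_for_word[word.lower()].add(tag)

ambiguous_types = {w for w, tags in tags_for_word.items() if len(tags) > 1}
type_ambiguity = len(ambiguous_types) / len(tags_for_word)
token_ambiguity = sum(1 for w, _ in tokens if w.lower() in ambiguous_types) / len(tokens)

print(f"ambiguous word types:  {type_ambiguity:.1%}")
print(f"ambiguous word tokens: {token_ambiguity:.1%}")
```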

Disambiguation approaches Rule-based –rely on large set of rules to disambiguate in context –rules are mostly hand-written Stochastic –rely on probabilities of words having certain tags in context –probabilities derived from training corpus Combined –transformation-based tagger: uses stochastic approach to determine initial tagging, then uses a rule-based approach to “clean up” the tags

Determining the appropriate tag for an untagged word Two types of information can be used: syntagmatic information –consider the tags of other words in the surrounding context –a tagger using such information correctly tagged approx. 77% of words –problem: content words (which are the ones most likely to be ambiguous) typically have many parts of speech, via productive rules (e.g. N → V)

Determining the appropriate tag for an untagged word use information about word (e.g. usage probability) –baseline for tagger performance is given by a tagger that simply assigns the most common tag to ambiguous words –correctly tags 90% of words modern taggers use a variety of information sources

Note about accuracy measures Modern taggers claim accuracy rates of around 96% to 97%. This sounds impressive, but how good are they really? This is a measure of correctness at the level of individual words, not whole sentences. With 96% accuracy, 1 word out of 25 is tagged incorrectly. This represents roughly one tagging error per sentence.
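As a quick back-of-the-envelope check of that claim (the 20-word average sentence length and the independence of errors are assumptions made only for illustration):

```python
# Per-word accuracy vs. whole-sentence accuracy, assuming independent errors
# and an average sentence length of about 20 words (illustrative figure).
per_word_accuracy = 0.96
sentence_length = 20

expected_errors = sentence_length * (1 - per_word_accuracy)  # ~0.8 errors per sentence
sentence_accuracy = per_word_accuracy ** sentence_length     # ~0.44

print(f"expected errors per sentence: {expected_errors:.2f}")
print(f"chance a sentence is tagged perfectly: {sentence_accuracy:.0%}")
```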

Rule-based POS tagging Two-stage design: –first stage looks up individual words in a dictionary and tags words with sets of possible tags –second stage uses rules to disambiguate, resulting in singleton tag sets
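A toy illustration of the two-stage design; the lexicon entries and the two contextual rules below are invented for the example and do not come from any particular rule-based tagger.

```python
# Sketch: two-stage rule-based tagging.
# Stage 1: dictionary lookup assigns each word its set of possible tags.
# Stage 2: hand-written contextual rules prune the sets toward singletons.
LEXICON = {            # toy dictionary: word -> set of possible tags
    "the":  {"DT"},
    "race": {"NN", "VB"},
    "to":   {"TO"},
}

def tag(words):
    candidates = [set(LEXICON.get(w, {"NN"})) for w in words]    # stage 1
    for i, tags in enumerate(candidates):                         # stage 2
        # toy rule: after TO, prefer a verb reading if one is available
        if i > 0 and candidates[i - 1] == {"TO"} and "VB" in tags:
            candidates[i] = {"VB"}
        # toy rule: after a determiner, prefer a noun reading
        elif i > 0 and candidates[i - 1] == {"DT"} and "NN" in tags:
            candidates[i] = {"NN"}
    return [sorted(t)[0] for t in candidates]   # pick any remaining tag

print(tag(["to", "race"]))    # ['TO', 'VB']
print(tag(["the", "race"]))   # ['DT', 'NN']
```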

Stochastic POS tagging Stochastic taggers choose tags that result in the highest probability: P(word | tag) * P(tag | previous n tags) Stochastic taggers generally maximize this probability over the entire tag sequence for a sentence.

Bigram stochastic tagger This kind of tagger “…chooses tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i: t_i = argmax_j P(t_j | t_{i-1}, w_i) (8.2)” [page 303] Bayes’ law says: P(T|W) = P(T) P(W|T) / P(W), so P(t_j | t_{i-1}, w_i) = P(t_j) P(t_{i-1}, w_i | t_j) / P(t_{i-1}, w_i) Since we take the argmax of this over the t_j, the denominator is constant and the result is the same as maximizing: P(t_j) P(t_{i-1}, w_i | t_j) Rewriting: t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)
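A sketch of the decoding step this defines, assuming the transition probabilities P(t_j | t_{i-1}) and the lexical likelihoods P(w_i | t_j) have already been estimated from a tagged corpus. The greedy left-to-right choice shown here is a simplification; a full HMM tagger would use Viterbi decoding over the whole sentence. The toy probability tables mirror the race example on the following slides.

```python
# Sketch: greedy bigram tagging, choosing at each position
#   t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j)
# `trans[prev_tag][tag]` and `emit[tag][word]` are assumed to be probability
# tables estimated from a tagged training corpus.

def bigram_tag(words, tagset, trans, emit, start_tag="<s>"):
    tags = []
    prev = start_tag
    for w in words:
        best = max(tagset,
                   key=lambda t: trans.get(prev, {}).get(t, 0.0)
                               * emit.get(t, {}).get(w, 0.0))
        tags.append(best)
        prev = best
    return tags

# Toy usage, with numbers in the spirit of the textbook race example:
trans = {"<s>": {"TO": 1.0}, "TO": {"VB": 0.34, "NN": 0.021}}
emit  = {"TO": {"to": 1.0}, "VB": {"race": 0.00003}, "NN": {"race": 0.00041}}
print(bigram_tag(["to", "race"], ["TO", "VB", "NN"], trans, emit))  # ['TO', 'VB']
```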

Example (page 304) What tag do we assign to race? –to/TO race/?? –the/DT race/?? In the first case, if we are choosing between NN and VB as tags for race, the quantities to compare are: –P(VB|TO) P(race|VB) –P(NN|TO) P(race|NN) The tagger will choose the tag for race which maximizes this probability.

Example For the first part – look at the tag sequence probability: –P(NN|TO) = 0.021 –P(VB|TO) = 0.34 For the second part – look at the lexical likelihood: –P(race|NN) = 0.00041 –P(race|VB) = 0.00003 Combining these: –P(VB|TO) P(race|VB) ≈ 0.00001 –P(NN|TO) P(race|NN) ≈ 0.0000086 So race is tagged VB in the to/TO context.

English syntax What are some properties of English syntax we might want our formalism to capture? This depends on our goal: –processing written or spoken language? –modeling human behavior or not? We will work with the context-free grammar formalism.

Things a grammar should capture As we have mentioned repeatedly, human language is an amazingly complex system of communication. Some properties of language which a (computational) grammar should reflect include: –Constituency –Agreement –Subcategorization / selectional restrictions

Constituency Phrases are syntactic equivalence classes: –they can appear in the same contexts –they are not semantic equivalence classes: they can clearly mean different things Ex (noun phrases) –Clifford the big red dog –the man from the city –a lovable little kitten

Constituency tests Can appear before a verb: –a lovable little kitten eats food –the man from the city arrived yesterday Other arbitrary word groupings cannot: –*from the arrived yesterday A string of words which is starred, like the one above, is considered ill-formed. Various gradations can occur, such as ‘?’, ‘?*’, ‘*’, ‘**’. Judgements are subjective.

More tests of constituency They also function as a unit with respect to syntactic processes: –On September seventeenth, I’d like to fly from Atlanta to Denver. –I’d like to fly on September seventeenth from Atlanta to Denver. –I’d like to fly from Atlanta to Denver on September seventeenth. Other groupings of words don’t behave the same: –* On September, I’d like to fly seventeenth from Atlanta to Denver. –* On I’d like to fly September seventeenth from Atlanta to Denver. –* I’d like to fly on September from Atlanta to Denver seventeenth. –* I’d like to fly on from Atlanta to Denver September seventeenth.

Agreement English has subject-verb agreement: –The cats chase that dog all day long. –* The cats chases that dog all day long. –The dog is chased by the cats all day long. –* The dog are chased by the cats all day long. Many languages exhibit much more agreement than English.

Subcategorization Verbs (predicates) require arguments of different types: –(none): The mirage disappears daily. –NP: I prefer ice cream. –NP PP: I leave Boston in the morning. –NP NP: I gave Mary a ticket. –PP: I leave on Thursday.

Alternations want can take either an NP or an infinitival VP: –I want a flight … –I want to fly … find cannot take an infinitival VP: –I found a flight … –* I found to fly …

How can we encode rules of language? There are many grammar formalisms. Most are variations on context-free grammars. Context-free grammars are of interest because they –have well-known properties (e.g. can be parsed in polynomial time) –can capture many aspects of language

Basic context-free grammar formalism A CFG is a 4-tuple (N, Σ, P, S) where –N is a set of non-terminal symbols –Σ is a set of terminal symbols –P is a set of productions, P ⊆ N × (Σ ∪ N)* –S is a start symbol, and Σ ∩ N = ∅ Each production is of the form A → α, where A is a non-terminal and α is drawn from (Σ ∪ N)*
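As a concrete illustration of this formalism, here is a tiny grammar fragment written and parsed with NLTK; the particular rules and lexicon are made up for the example, and NLTK’s `CFG.fromstring` and `ChartParser` are assumed to be available.

```python
# Sketch: a small CFG and a chart parse with NLTK.
# Uppercase symbols are non-terminals, quoted words are terminals,
# and S is the start symbol.
import nltk

grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N | 'I'
    VP  -> V NP | V NP PP
    PP  -> P NP
    Det -> 'a' | 'the'
    N   -> 'flight' | 'morning'
    V   -> 'prefer' | 'book'
    P   -> 'in'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I prefer a flight".split()):
    print(tree)   # (S (NP I) (VP (V prefer) (NP (Det a) (N flight))))
```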

Problems with basic formalism Consider a grammar rule like S → Aux NP VP To handle agreement between subject and verb, we could replace that rule with two new ones: S → 3SgAux 3SgNP VP S → Non3SgAux Non3SgNP VP Need rules like the following too: 3SgAux → does | has | can | … Non3SgAux → do | have | can | …

Extensions to formalism Feature structures and unification –feature structures are of the form [f1=v1, f2=v2, …, fn=vn] –feature structures can be partially specified: (a) [Number=Sg, Person=3, Category=NP] (b) [Number=Sg, Category=NP] (c) [Person=3, Category=NP] –(b) unified with (c) is (a) Feature structures can be used to express feature-value constraints across constituents without rule multiplication.
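A minimal sketch of unification for the flat feature structures shown above, using plain Python dicts; real unification engines (e.g. NLTK’s FeatStruct) also handle nested structures and reentrancy, which this toy version ignores.

```python
# Sketch: unification of flat feature structures represented as dicts.
# Two structures unify if they agree on every feature they share; the result
# carries the union of their features. Returns None on a clash.
def unify(fs1, fs2):
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None          # conflicting values: unification fails
        result[feature] = value
    return result

b = {"Number": "Sg", "Category": "NP"}
c = {"Person": 3, "Category": "NP"}
print(unify(b, c))   # {'Number': 'Sg', 'Category': 'NP', 'Person': 3}  == (a)
print(unify({"Number": "Sg"}, {"Number": "Pl"}))   # None (clash)
```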

Other formalisms More powerful: tree adjoining grammars –trees, not rules, are fundamental –trees are either initial or auxiliary –two operations: substitution and adjunction Less powerful: finite-state grammars –cannot handle general recursion –can be sufficient to handle real-world data –recursion spelled out explicitly to some level (large grammar)