Presentation transcript:

Syntax
The study of how words are ordered and grouped together.
Key concept: constituent = a sequence of words that acts as a unit.
Examples of constituents:
– he
– the man
– the short man
– the short man with the large hat
– went home
– to his house
– out of the car
– with her

Phrase Structure
Example: "She saw a tall man with a telescope"
(S (NP (PN She))
   (VP (VBD saw)
       (NP a tall man)
       (PP (PRP with)
           (NP a telescope))))

Noun Phrases
Contains a noun plus descriptors, including:
– Determiner: the, a, this, that
– Adjective phrases: green, very tall
– Head: the main noun in the phrase
– Post-modifiers: prepositional phrases or relative clauses
Example: "That old green couch of yours that I want to throw out"
  det: That | adj: old green | head: couch | PP: of yours | relative clause: that I want to throw out

Verb Phrases
Contains a verb (the head) with modifiers and other elements that depend on the verb:
  want to throw out
    head: want | PP: to throw out
  previously saw the man in the park with her telescope
    adv: previously | head: saw | direct object: the man | PP: in the park with her telescope
  might have showed his boss the code yesterday
    modal: might | aux: have | head: showed | indirect object: his boss | direct object: the code | adverb: yesterday

Prepositional Phrases
Preposition as head and NP as complement.
Example: "with her grey poodle"
  head: with | complement: her grey poodle

Adjective Phrases
Adjective as head with modifiers.
Example: "extremely sure that he would win"
  adv: extremely | head: sure | relative clause: that he would win

Shallow Parsing
Extract phrases from text as 'chunks': flat, no tree structures.
Usually based on patterns of POS tags.
Full parsing can be conceived of as two steps:
– Chunking / shallow parsing
– Attachment of chunks to each other

Noun Phrases
Base noun phrase: a noun phrase that does not contain other noun phrases as a component; equivalently, no modification to the right of the head.
Examples:
– a large green cow
– The United States Government
– every poor shop-owner's dream?
– other methods and techniques?

Manual Methodology
Build a regular expression over POS tags, e.g.:
  DT? (ADJ | VBG)* (NN)+
Drawbacks:
– Very hard to do accurately
– Lots of manual labor
– Cannot be easily tuned to a specific corpus
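The slide's tag pattern DT? (ADJ | VBG)* (NN)+ can be sketched directly as a regular expression over the POS-tag sequence. This is a minimal illustration, not the lecture's implementation; the function name and encoding are my own assumptions.

```python
import re

# The slide's pattern DT? (ADJ | VBG)* (NN)+ as a regex over a
# space-separated POS-tag string (tag names follow the slides).
NP_PATTERN = re.compile(r"(DT )?((ADJ|VBG) )*(NN )+")

def find_base_nps(tags):
    """Return (start, end) token spans matching the base-NP tag pattern."""
    text = " ".join(tags) + " "   # e.g. "DT ADJ NN VBD NN "
    positions = []                # character offset where each tag starts
    offset = 0
    for t in tags:
        positions.append(offset)
        offset += len(t) + 1
    spans = []
    for m in NP_PATTERN.finditer(text):
        if m.start() in positions:          # token-aligned matches only
            start = positions.index(m.start())
            spans.append((start, start + len(m.group().split())))
    return spans

# "the tall man saw speed": two base NPs
print(find_base_nps(["DT", "ADJ", "NN", "VBD", "NN"]))  # [(0, 3), (4, 5)]
```

Matching against a tag string like this keeps the sketch short, but it also shows why the manual approach is brittle: every new tag or construction means editing the pattern by hand.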

Chunk Tags
Represent NPs by per-token tags (I = inside an NP, O = outside):
  [ the tall man ] ran with [ blinding speed ]
    DT  ADJ  NN1   VBD PRP   VBG      NN0
    I   I    I     O   O     I        I
Need a B (begin) tag for adjacent NPs:
  On [ Tuesday ] [ the company ] went bankrupt
  O    I          B   I           O    O
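Converting bracketed NP spans into these I/O/B tags is mechanical; here is a small sketch (the function name and span representation are my own assumptions).

```python
# Convert bracketed NP spans to the slide's chunk tags:
# I = inside an NP, O = outside, B = begins an NP that
# immediately follows another NP.
def chunk_tags(n_tokens, spans):
    """spans: sorted, non-overlapping (start, end) NP token spans."""
    tags = ["O"] * n_tokens
    prev_end = None
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
        if start == prev_end:   # adjacent to the previous NP
            tags[start] = "B"
        prev_end = end
    return tags

# "On Tuesday the company went bankrupt" with NPs [Tuesday] [the company]
print(chunk_tags(6, [(1, 2), (2, 4)]))  # ['O', 'I', 'B', 'I', 'O', 'O']
```

With this encoding, chunking reduces to a per-token sequence classification problem, which is what the learning methods on the following slides exploit.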

Transformational Learning
Baseline tagger: most frequent chunk tag for each POS (or word).
Rule templates (100 total), conditioning on combinations of:
– current word / POS
– current chunk tag
– word / POS one position to the left / right
– current and left chunk tags
– current and left / right word / POS
– current and right chunk tags
– word / POS on the left and on the right
– the two chunk tags to the left
– the two words / POSs on the left / right
– the two chunk tags to the right
– the three words / POSs on the left / right

Some Rules Learned
1. (T1 = O, P0 = JJ):            I → O
2. (T-2 = I, T-1 = I, P0 = DT):    → B
3. (T-2 = O, T-1 = I, P-1 = DT):   → I
4. (T-1 = I, P0 = WDT):          I → B
5. (T-1 = I, P0 = PRP):          I → B
6. (T-1 = I, W0 = who):          I → B
7. (T-1 = I, P0 = CC, P1 = NN):  O → I
(T = chunk tag, P = POS tag, W = word; subscripts are positions relative to the current token)
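Applying one of these learned rules can be sketched as follows; the rule encoding and function are illustrative assumptions, not the lecture's code.

```python
# Apply one transformation rule of the form shown above.
# A condition is (feature, offset, value) with feature in
# {'T': chunk tag, 'P': POS tag, 'W': word}.
def apply_rule(tags, pos, words, conditions, from_tag, to_tag):
    streams = {"T": tags, "P": pos, "W": words}
    out = list(tags)                      # conditions read the ORIGINAL tags
    for i in range(len(tags)):
        if from_tag is not None and tags[i] != from_tag:
            continue
        if all(0 <= i + off < len(tags) and streams[feat][i + off] == val
               for feat, off, val in conditions):
            out[i] = to_tag
    return out

# Rule 6: (T-1 = I, W0 = who)  I -> B
words = ["the", "man", "who", "laughed"]
pos   = ["DT", "NN", "WP", "VBD"]
tags  = ["I", "I", "I", "O"]
print(apply_rule(tags, pos, words,
                 [("T", -1, "I"), ("W", 0, "who")], "I", "B"))
# ['I', 'I', 'B', 'O']
```

Transformation-based learning then greedily selects, at each iteration, the instantiated rule that most reduces the error of the current tagging.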

Results
Table (numeric scores lost in transcription): precision, recall, and tag accuracy for the baseline and for training sets of increasing size (in thousands of words), including a no-lexicalization ("nolex") variant.
Precision = fraction of NPs predicted that are correct.
Recall = fraction of actual NPs that are found.

Memory-Based Learning
Match test data to previously seen data and classify based on the most similar previously seen instances.
E.g. stored context windows:
  the saw was
  she saw the
  boy saw three
  boy saw the
  boy ate the

k-Nearest Neighbor (kNN)
Find the k most similar training examples and let them 'vote' on the correct class for the test example.
– Weight neighbors by distance from the test example
Main problem: defining 'similar'
– For shallow parsing: overlap of words and POS
– Use feature weighting...
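A minimal kNN classifier with the feature-overlap similarity the slide suggests can be sketched as below; the optional per-feature weights anticipate the information-gain weighting on the next slide. All names and data are illustrative assumptions.

```python
from collections import Counter

# kNN with feature-overlap similarity: count matching feature positions,
# optionally weighting each position (default weight 1.0).
def knn_classify(train, test_features, k=3, weights=None):
    """train: list of (feature_tuple, label) pairs."""
    def similarity(a, b):
        return sum((weights or {}).get(i, 1.0)
                   for i, (x, y) in enumerate(zip(a, b)) if x == y)
    neighbors = sorted(train,
                       key=lambda ex: similarity(ex[0], test_features),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy word-trigram contexts, as on the previous slide
train = [(("she", "saw", "the"), "I"),
         (("boy", "saw", "the"), "I"),
         (("the", "saw", "was"), "O")]
print(knn_classify(train, ("he", "saw", "the"), k=3))  # 'I'
```

Here the two nearest neighbors share two feature positions with the test instance and outvote the single 'O' example.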

Information Gain
Not all features are created equal (e.g. the word 'saw' in the previous example is more informative than the others).
Weight each feature f by its information gain: how much does knowing the value of f distinguish the different classes?
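Information gain is the entropy of the class distribution minus the expected entropy after splitting on the feature's values. A sketch on toy data (the data and names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, feature_index):
    """examples: list of (feature_tuple, label)."""
    labels = [y for _, y in examples]
    by_value = {}                      # partition labels by feature value
    for x, y in examples:
        by_value.setdefault(x[feature_index], []).append(y)
    expected = sum(len(ys) / len(examples) * entropy(ys)
                   for ys in by_value.values())
    return entropy(labels) - expected

data = [(("saw", "DT"), "I"), (("saw", "DT"), "I"),
        (("was", "DT"), "O"), (("ate", "DT"), "O")]
print(information_gain(data, 0))  # word feature separates the classes: 1.0
print(information_gain(data, 1))  # POS feature is constant here: 0.0
```

A constant feature gives zero gain, while a feature that perfectly separates the classes gets the full class entropy; these gains are then used as the weights in the kNN similarity.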

(Figure: feature-value distributions over classes C1–C4, contrasting a feature with high information gain against one with low information gain.)

Base Verb Phrase
A verb phrase not including NPs or PPs:
[NP Pierre Vinken NP], [NP 61 years NP] old, [VP will soon be joining VP] [NP the board NP] as [NP a nonexecutive director NP].

Results
Context: 2 words and POS tags on the left, 1 word and POS tag on the right.
Table (numeric scores lost in transcription): precision, recall, and accuracy for the bNP and bVP tasks, comparing current-word vs. current-POS contexts.

Efficiency of MBL
Finding the neighbors can be costly.
Possibility: build a decision tree, ordered by the information gain of the features, to index the data = approximate kNN.
(Diagram: an index tree branching first on W0 (e.g. "saw"), then on P-2, P-1, W-1 (e.g. "the", "boy").)

MBSL
A memory-based technique relying on the sequential nature of the data:
– Use "tiles" of phrases held in memory to "cover" a new candidate (and its context), and compute a tiling score.
Candidate (with context):
  went to the white house for dinner
  VBD PRP [[ DT ADJ NN1 ]] PRP NN1
Tiles covering it, e.g.:
  PRP [NP DT
  [NP DT ADJ NN1 NN1 NP]
  NN1 NP] PRP
  PRP [NP DT ADJ
  ADJ NN1 NP]

Tile Evidence
Memory:
  [NP DT NN1 NP] VBD [NP DT NN1 NN1 NP]
  [NP NN2 NP] .
  [NP ADJ NN2 NP] AUX VBG PRP [NP DT ADJ NN1 NP] .
Some tiles:
  [NP DT         pos=3  neg=0
  [NP DT NN1     pos=2  neg=0
  DT NN1 NP]     pos=1  neg=1
  NN1 NP]        pos=3  neg=1
  NN1 NP] VBD    pos=1  neg=0
Score tile t by f_t(t) = pos / total; only keep tiles that pass a threshold: f_t(t) > θ.
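The tile score f_t(t) = pos / (pos + neg) with a threshold θ can be sketched directly; the counts are the slide's examples, while the function and representation are my own assumptions.

```python
# Score each tile by pos / (pos + neg) and keep those above threshold θ.
def score_tiles(counts, theta):
    """counts: {tile_string: (pos, neg)} -> {tile_string: score > θ}."""
    kept = {}
    for tile, (pos, neg) in counts.items():
        score = pos / (pos + neg)
        if score > theta:
            kept[tile] = score
    return kept

counts = {
    "[NP DT":      (3, 0),
    "[NP DT NN1":  (2, 0),
    "DT NN1 NP]":  (1, 1),
    "NN1 NP]":     (3, 1),
    "NN1 NP] VBD": (1, 0),
}
print(score_tiles(counts, theta=0.6))
# keeps everything except "DT NN1 NP]" (score 0.5)
```

Only sufficiently reliable tiles survive to take part in covering new candidates.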

Covers
Tile t1 connects to tile t2 in a candidate if:
– t2 starts after t1
– there is no gap between them (overlap is allowed)
– t2 ends after t1
  went to the white house ...
  VBD PRP [[ DT ADJ NN1 ]] PRP
  t1: PRP [NP DT
  t2:     [NP DT ADJ NN1 NP] PRP
A sequence of tiles covers a candidate if:
– each tile connects to the next
– the tiles collectively match the entire candidate, including the brackets and possibly some context
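With tiles represented as half-open (start, end) position spans over the candidate, the connectivity and cover conditions become simple span checks. The representation is my own assumption for illustration.

```python
# Tiles as half-open (start, end) spans over candidate positions.
def connects(t1, t2):
    """t2 starts after t1, no gap between them (overlap allowed),
    and t2 ends after t1."""
    return t2[0] > t1[0] and t2[0] <= t1[1] and t2[1] > t1[1]

def covers(tiles, cand_start, cand_end):
    """Do the tiles, in order, connect and span [cand_start, cand_end)?"""
    if not tiles or tiles[0][0] > cand_start or tiles[-1][1] < cand_end:
        return False
    return all(connects(a, b) for a, b in zip(tiles, tiles[1:]))

print(connects((0, 3), (2, 6)))        # True: overlap, extends rightward
print(connects((0, 3), (4, 6)))        # False: gap between 3 and 4
print(covers([(0, 3), (2, 6)], 0, 6))  # True: the pair spans the candidate
```

These two predicates are exactly what the cover graph on the next slide encodes as nodes and edges.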

Cover Graph
(Diagram: the candidate's tiles as nodes between START and END, with an edge from t1 to t2 whenever t1 connects to t2; each path from START to END is a cover of the candidate.)

Measures of 'Goodness'
– Number of different covers
– Size of the smallest cover (fewest tiles)
– Maximum context in any cover (left + right)
– Maximum overlap of tiles in any cover
– Grand total positive evidence divided by grand total positive + negative evidence
Combine these measures by linear weighting.

Scoring a Candidate
CandidateScore(candidate, θ_T):
  G ← CoverGraph(candidate, θ_T)
  Compute statistics by DFS on G
  Compute the candidate score as a linear function of the statistics
Complexity (O(l) tiles in a candidate of length l):
– Creating the cover graph is O(l^2)
– DFS is O(V + E) = O(l^2)

Full Algorithm
MBSL(sent, θ_C, θ_T):
1. For each subsequence of sent:
   a. Construct a candidate s by adding brackets [[ and ]] before and after the subsequence
   b. f_C(s) ← CandidateScore(s, θ_T)
   c. If f_C(s) > θ_C, add s to candidate-set
2. For each c in candidate-set, in decreasing order of f_C(c):
   a. Remove all candidates overlapping with c from candidate-set
3. Return candidate-set as the target instances
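Step 2 above is a greedy selection of non-overlapping candidates in decreasing score order; it can be sketched as follows, with candidates as (start, end, score) spans (representation and names are my own assumptions).

```python
# Greedy selection: visit candidates by decreasing score, keeping each one
# only if it does not overlap an already-kept candidate (= removing all
# lower-scored overlapping candidates, as in step 2 of MBSL).
def select_candidates(scored):
    """scored: list of (start, end, score) spans passing the θ_C threshold."""
    kept = []
    for start, end, score in sorted(scored, key=lambda c: -c[2]):
        if all(end <= s or start >= e for s, e, _ in kept):   # no overlap
            kept.append((start, end, score))
    return sorted(kept)

cands = [(0, 3, 0.9), (2, 5, 0.8), (5, 7, 0.7)]
print(select_candidates(cands))  # [(0, 3, 0.9), (5, 7, 0.7)]
```

The middle candidate overlaps the highest-scored one and is discarded, while the disjoint third candidate survives.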

Results
Table (numeric scores lost in transcription): precision and recall for the NP, SV, and VO target types, together with the context size and tile threshold θ_T used for each.