Statistical Natural Language Parsing
Parsing: The rise of data and statistics


Christopher Manning

Pre-1990 ("Classical") NLP Parsing

Wrote symbolic grammar (CFG or often richer) and lexicon

    S → NP VP         NN → interest
    NP → (DT) NN      NNS → rates
    NP → NN NNS       NNS → raises
    NP → NNP          VBP → interest
    VP → V NP         VBZ → rates

Used grammar/proof systems to prove parses from words
This scaled very badly and didn't give coverage. For the sentence:
    Fed raises interest rates 0.5% in effort to control inflation
Minimal grammar: 36 parses
Simple 10-rule grammar: 592 parses
Real-size broad-coverage grammar: millions of parses
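To make the ambiguity concrete, here is a minimal sketch that counts the parses a small grammar assigns to "Fed raises interest rates". It uses the slide's rules plus a few additions that are assumptions for illustration only: the rules NP → NNS, NP → NNP NNS and V → VBP | VBZ, and the lexical entries for "Fed" and for "raises" as a verb. It does not reproduce the 36 or 592 figures above, which come from grammars not shown on the slide.

```python
from functools import lru_cache

# Rules from the slide, with the optional DT expanded into two rules.
# Rules marked "assumed" are NOT on the slide; they are added for illustration.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("DT", "NN"), ("NN",), ("NN", "NNS"), ("NNP",),
           ("NNS",), ("NNP", "NNS")],          # last two are assumed
    "VP": [("V", "NP")],
    "V": [("VBP",), ("VBZ",)],                  # assumed: V covers both verb tags
}

# Word -> possible part-of-speech tags ("Fed" and "raises" as VBZ are assumed).
LEXICON = {
    "Fed": {"NNP"},
    "raises": {"NNS", "VBZ"},
    "interest": {"NN", "VBP"},
    "rates": {"NNS", "VBZ"},
}


def count_parses(words, root="S"):
    """Count the distinct parse trees rooted in `root` that cover `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        total = 0
        # Lexical case: a tag covering exactly one word.
        if j - i == 1 and sym in LEXICON.get(words[i], ()):
            total += 1
        for rhs in RULES.get(sym, ()):
            if len(rhs) == 1:
                # Unary rule: same span, different symbol.
                total += count(rhs[0], i, j)
            else:
                # Binary rule: try every split point of the span.
                left, right = rhs
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(root, 0, len(words))


if __name__ == "__main__":
    print(count_parses("Fed raises interest rates".split()))
```

Under these assumptions the sentence gets two parses: one where "raises" is the verb, and one where "Fed raises" is a noun phrase and "interest" is the verb. Each extra rule or tag multiplies the options, which is how broad-coverage grammars reach millions of parses.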

Classical NLP Parsing: The problem and its solution

Categorical constraints can be added to grammars to limit unlikely/weird parses for sentences
    But the attempt makes the grammars not robust
    In traditional systems, commonly 30% of sentences in even an edited text would have no parse
A less constrained grammar can parse more sentences
    But simple sentences end up with ever more parses, with no way to choose between them
We need mechanisms that allow us to find the most likely parse(s) for a sentence
    Statistical parsing lets us work with very loose grammars that admit millions of parses for sentences but still quickly find the best parse(s)
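One common way to "find the most likely parse" is to attach probabilities to the rules (a PCFG) and run a Viterbi-style chart search. The sketch below is an illustration under stated assumptions, not the method of any particular parser: the grammar is a toy one and every probability in it is invented.

```python
from functools import lru_cache

# Toy PCFG. All probabilities are made up for illustration;
# for each left-hand side they sum to 1.
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("NNP",), 0.4), (("NNS",), 0.2),
           (("NN", "NNS"), 0.3), (("NNP", "NNS"), 0.1)],
    "VP": [(("V", "NP"), 1.0)],
    "V": [(("VBZ",), 0.6), (("VBP",), 0.4)],
}

# Hypothetical P(word | tag).
LEXICON = {
    ("NNP", "Fed"): 1.0,
    ("NNS", "raises"): 0.3, ("NNS", "rates"): 0.7,
    ("VBZ", "raises"): 0.5, ("VBZ", "rates"): 0.5,
    ("NN", "interest"): 1.0,
    ("VBP", "interest"): 1.0,
}


def best_parse(words, root="S"):
    """Return (probability, bracketed tree) of the most probable parse."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def best(sym, i, j):
        top = (0.0, None)
        # Lexical case: a tag generating a single word.
        if j - i == 1 and (sym, words[i]) in LEXICON:
            top = (LEXICON[(sym, words[i])], f"({sym} {words[i]})")
        for rhs, p in RULES.get(sym, ()):
            if len(rhs) == 1:
                q, tree = best(rhs[0], i, j)
                if p * q > top[0]:
                    top = (p * q, f"({sym} {tree})")
            else:
                # Keep only the best-scoring split for each binary rule.
                for k in range(i + 1, j):
                    ql, left = best(rhs[0], i, k)
                    qr, right = best(rhs[1], k, j)
                    if p * ql * qr > top[0]:
                        top = (p * ql * qr, f"({sym} {left} {right})")
        return top

    return best(root, 0, len(words))


if __name__ == "__main__":
    prob, tree = best_parse("Fed raises interest rates".split())
    print(prob, tree)
```

Because only the best analysis of each span is kept, the search stays polynomial in sentence length even when the grammar admits a very large number of full parses.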

The rise of annotated data: The Penn Treebank

( (S
    (NP-SBJ (DT The) (NN move))
    (VP (VBD followed)
      (NP
        (NP (DT a) (NN round))
        (PP (IN of)
          (NP
            (NP (JJ similar) (NNS increases))
            (PP (IN by)
              (NP (JJ other) (NNS lenders)))
            (PP (IN against)
              (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
      (, ,)
      (S-ADV
        (NP-SBJ (-NONE- *))
        (VP (VBG reflecting)
          (NP
            (NP (DT a) (VBG continuing) (NN decline))
            (PP-LOC (IN in)
              (NP (DT that) (NN market)))))))
    (. .)))

[Marcus et al. 1993, Computational Linguistics]
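The bracketed notation is easy to read by machine. As a rough sketch, assuming the usual Penn Treebank conventions (an unlabeled outer bracket, `-NONE-` for empty elements), the following reads a tree into nested `(label, children)` pairs and recovers the words. The function names and the shortened `SAMPLE` string are illustrative, not part of any treebank toolkit.

```python
# A shortened version of the tree above, used only as test input.
SAMPLE = ("( (S (NP-SBJ (DT The) (NN move)) "
          "(VP (VBD followed) (NP (DT a) (NN round)) (, ,) "
          "(S-ADV (NP-SBJ (-NONE- *)) "
          "(VP (VBG reflecting) (NP (DT a) (NN decline))))) (. .)) )")


def read_tree(text):
    """Parse one bracketed tree into nested (label, children) pairs."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = ""
        # The outermost bracket has no label: "(" is followed by "(".
        if tokens[pos] != "(":
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                               # consume ")"
        return (label, children)

    return parse()


def leaves(tree):
    """Return the words of a tree, skipping empty elements (-NONE-)."""
    label, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            if label != "-NONE-":
                words.append(child)
        else:
            words.extend(leaves(child))
    return words


if __name__ == "__main__":
    print(" ".join(leaves(read_tree(SAMPLE))))
```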

The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than building a grammar
But a treebank gives us many things
    Reusability of the labor
        Many parsers, POS taggers, etc.
        Valuable resource for linguistics
    Broad coverage
    Frequencies and distributional information
    A way to evaluate systems
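As one illustration of the "frequencies and distributional information" a treebank provides, rule probabilities can be estimated by relative frequency: count how often each rule is used and divide by the count of its left-hand side. The sketch below assumes a tiny hand-built treebank of two invented trees, written as nested tuples rather than read from real data.

```python
from collections import Counter

# Two hypothetical trees: (label, child, child, ...); words are plain strings.
TREEBANK = [
    ("S",
     ("NP", ("NNP", "Fed")),
     ("VP", ("V", "raises"),
            ("NP", ("NN", "interest"), ("NNS", "rates")))),
    ("S",
     ("NP", ("NN", "interest"), ("NNS", "rates")),
     ("VP", ("V", "fell"))),
]


def rules(tree):
    """Yield (lhs, rhs) for every phrase-level rule used in a tree."""
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return  # (tag, word) node: lexical rules are skipped in this sketch
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


def estimate(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in treebank for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate(TREEBANK).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

The same counts could also feed an evaluation: a parser's output brackets can be compared against the treebank's brackets for held-out sentences.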

Statistical parsing applications

Statistical parsers are now robust and widely used in larger NLP applications:
    High precision question answering [Pasca and Harabagiu SIGIR 2001]
    Improving biological named entity finding [Finkel et al. JNLPBA 2004]
    Syntactically based sentence compression [Lin and Wilbur 2007]
    Extracting opinions about products [Bloom et al. NAACL 2007]
    Improved interaction in computer games [Gorniak and Roy 2005]
    Helping linguists find data [Resnik et al. BLS 2005]
    Source sentence analysis for machine translation [Xu et al. 2009]
    Relation extraction systems [Fundel et al. Bioinformatics 2006]
