Shallow Parsing
CS 4705
Julia Hirschberg

Shallow or Partial Parsing
Sometimes we don’t need a complete parse tree
–Information extraction
–Question answering
But we would like more than simple POS sequences

Chunking
Find major but unembedded constituents like NPs, VPs, AdjPs, PPs
–Most common task: NP chunking of base NPs
–[NP I] saw [NP the man] on [NP the hill] with [NP a telescope]
–No attempt to identify full NPs – no recursion, no post-head words
–No overlapping constituents
–E.g., if we add PPs or VPs, they may consist only of their heads, e.g. [PP on]

Approaches: FST Chunking
Use regexps to identify constituents, e.g.
–NP → (DT) NN* NN
–Find longest matching chunk
–Hand-built rules
–No recursion, but can cascade FSTs to approximate a true CF parser, aggregating larger and larger constituents
–E.g. Steve Abney’s Cass chunker
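A minimal pure-Python sketch of longest-match chunking in the spirit of the rule above. The function name and the extra bare-pronoun rule (so [NP I] comes out as a chunk, as in the earlier example) are my own additions; a real cascade like Abney’s Cass compiles such rules into finite-state automata rather than scanning by hand.

```python
# Longest-match NP chunking over POS-tagged input, sketching the
# rule NP -> (DT) JJ* NN+ (plus a bare-pronoun NP for illustration).

def np_chunk(tagged):
    """tagged: list of (word, POS) pairs; returns a list of NP word lists."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "PRP":                        # bare pronoun NP
            chunks.append([tagged[i][0]])
            i += 1
            continue
        j = i
        if j < len(tagged) and tagged[j][1] == "DT":     # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in ("NN", "NNS"):  # head noun(s)
            k += 1
        if k > j:                                        # at least one noun: emit chunk
            chunks.append([w for w, _ in tagged[i:k]])
            i = k
        else:                                            # no NP starts here; move on
            i += 1
    return chunks

sent = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("man", "NN"),
        ("on", "IN"), ("the", "DT"), ("hill", "NN")]
print(np_chunk(sent))  # [['I'], ['the', 'man'], ['the', 'hill']]
```

Greedy left-to-right scanning with maximal extension is what gives the "longest matching chunk" behavior; cascading would feed these chunks back in as single units for a second layer of rules.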

Figure 13.18

Approaches: ML Chunking
Require annotated corpus
Train classifier to classify each element of input in sequence (e.g. IOB tagging)
–B (beginning of a chunk)
–I (internal to a chunk)
–O (outside any chunk)
–No end-of-chunk coding – it’s implicit
–Easier to detect the beginning than the end
Book/B_VP that/B_NP flight/I_NP quickly/O
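Because there is no explicit end-of-chunk tag, a decoder closes a chunk implicitly: at an O tag, at the next B tag, or when the chunk type changes. A small sketch (the function name is mine) decoding the slide’s example:

```python
# Decode IOB-tagged tokens into (type, words) chunks.  Chunk ends are
# implicit: an O tag, a new B tag, or a type change closes the open chunk.

def iob_decode(tokens):
    """tokens: list of (word, tag) pairs with tags like B_NP, I_NP, O."""
    chunks, current = [], None
    for word, tag in tokens:
        if tag.startswith("B_"):                 # explicit chunk start
            current = (tag[2:], [word])
            chunks.append(current)
        elif tag.startswith("I_") and current and current[0] == tag[2:]:
            current[1].append(word)              # continue the open chunk
        else:                                    # O (or a stray I_): close it
            current = None
    return chunks

print(iob_decode([("Book", "B_VP"), ("that", "B_NP"),
                  ("flight", "I_NP"), ("quickly", "O")]))
# [('VP', ['Book']), ('NP', ['that', 'flight'])]
```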

Train classifier using features from the context and the current word, such as word identity, POS, and chunk tags
The Penn Treebank provides a good training corpus, since its hand-labeled full parses can be converted to chunks
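One common way to realize this is a feature dictionary per token, which can be fed to any off-the-shelf classifier. The particular window used here (neighboring POS tags plus the previous chunk tag) is a typical but illustrative choice, not the one prescribed by the slides:

```python
# Per-token feature extraction for an IOB chunk classifier.  At decoding
# time only chunk tags to the LEFT of position i are available, so the
# window looks back for chunk tags but both ways for POS.

def token_features(words, pos_tags, chunk_tags, i):
    return {
        "word": words[i].lower(),
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "<s>",
        "next_pos": pos_tags[i + 1] if i + 1 < len(pos_tags) else "</s>",
        "prev_chunk": chunk_tags[i - 1] if i > 0 else "<s>",
    }

words = ["Book", "that", "flight"]
pos = ["VB", "DT", "NN"]
chunks = ["B_VP", "B_NP", "I_NP"]
print(token_features(words, pos, chunks, 1))
# {'word': 'that', 'pos': 'DT', 'prev_pos': 'VB', 'next_pos': 'NN', 'prev_chunk': 'B_VP'}
```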

Figure 13.19

Chunk Evaluation
Compare system output with a Gold Standard using
–Precision: # correct chunks retrieved / total chunks hypothesized by the system
–Recall: # correct chunks retrieved / total actual chunks in the test corpus
–F-measure: (weighted) harmonic mean of Precision and Recall
–Best: F1 = .96 for NP chunking
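These definitions can be computed directly by comparing chunks as exact (start, end, type) spans, which is the usual convention: a partially overlapping chunk counts as wrong. A minimal sketch (names are mine):

```python
# Chunk-level precision / recall / F1, treating each chunk as an exact
# (start, end, type) span.  Only exact span matches count as correct.

def chunk_prf(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(0, 1, "NP"), (2, 4, "NP"), (5, 7, "NP")]
pred = [(0, 1, "NP"), (2, 4, "NP"), (5, 6, "NP"), (8, 9, "NP")]
print(tuple(round(x, 3) for x in chunk_prf(gold, pred)))  # (0.5, 0.667, 0.571)
```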

Limitations
–POS tagging accuracy
–Ambiguities and errors in labels of the training corpus

Distribution of Chunks in the CoNLL Shared Task

Summary
–Sometimes shallow parsing is enough for the task
–Performance is quite accurate
–Next: Midterm