Chunking: Shallow Parsing
Eric Atwell, Language Research Group, School of Computing, Faculty of Engineering


Shallow Parsing
Break text up into non-overlapping contiguous subsets of tokens.
Also called chunking, partial parsing, or light parsing.
What is it useful for?
Finding key meaning-elements: Named Entity Recognition (people, locations, organizations)
Studying linguistic patterns, e.g. semantic patterns of verbs:
gave NP
gave up NP in NP
gave NP NP
gave NP to NP
Can ignore complex structure when not relevant

A Relationship between Segmenting and Labeling
Tokenization segments the text.
Tagging labels the text.
Shallow parsing does both simultaneously.

Chunking vs. Full Syntactic Parsing
Example: "G.K. Chesterton, author of The Man Who Was Thursday"

Representations for Chunks: IOB tags
Each token is tagged as Inside, Outside, or Begin, according to its position relative to a chunk.
In English, the start of a phrase is often marked by a function word.
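As a rough illustration of the IOB encoding, the sketch below turns a list of chunk spans into one tag per token. The function name `chunks_to_iob`, the `(start, end, label)` span format, and the example sentence are assumptions made for this sketch; the `B-NP` / `I-NP` / `O` tag spelling follows the CoNLL-2000 convention.

```python
def chunks_to_iob(n_tokens, chunks):
    """Encode chunk spans as IOB tags.

    chunks: list of (start, end, label) with end exclusive (hypothetical format).
    """
    tags = ["O"] * n_tokens                  # everything is Outside by default
    for start, end, label in chunks:
        tags[start] = "B-" + label           # first token Begins the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label           # remaining tokens are Inside it
    return tags


tokens = ["He", "saw", "the", "big", "dog"]
np_chunks = [(0, 1, "NP"), (2, 5, "NP")]     # [He] saw [the big dog]
iob_tags = chunks_to_iob(len(tokens), np_chunks)
print(list(zip(tokens, iob_tags)))
```

Because chunks never overlap, a single tag per token is enough to recover every chunk boundary.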

Representations for Chunks: Trees
Chunk structure is a two-level tree that spans the entire text, containing both chunks and non-chunks.
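One possible way to picture the two-level tree is to build it from an IOB-tagged sentence. In this sketch the tree is a plain nested tuple, `("S", [children])`, where each child is either a bare token (a non-chunk) or a `(label, [tokens])` chunk; this representation and the name `iob_to_tree` are assumptions for illustration, not a library API.

```python
def iob_to_tree(tokens, tags):
    """Build a two-level tree ("S", children) from IOB tags."""
    children = []
    current = None                           # the chunk being extended, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            children.append(token)           # non-chunk: sits directly under S
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != label:
            current = (label, [token])       # open a new chunk under S
            children.append(current)
        else:
            current[1].append(token)         # I- tag: extend the open chunk
    return ("S", children)


tree = iob_to_tree(
    ["He", "saw", "the", "big", "dog"],
    ["B-NP", "O", "B-NP", "I-NP", "I-NP"],
)
print(tree)
```

The tree never gets deeper than S, chunk, token, which is what separates it from a full parse tree.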

CoNLL Corpus: training data for Machine Learning of chunking
From the shared-task competition of the Conference on Computational Natural Language Learning (CoNLL), 2000.
Goal: create machine learning methods to improve on the chunking task.

CoNLL Corpus
Data in IOB format from the WSJ (Wall Street Journal): Word POS-tag IOB-tag
Training set: 8936 sentences
Test set: 2012 sentences
POS tags from the Brill tagger, using Penn Treebank tags
Evaluation measure: F-score = 2 * precision * recall / (precision + recall)
Baseline: select the chunk tag that is most frequently associated with the POS tag, F = 77.07
Best score in the contest: F = 94.13
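The baseline and the evaluation measure can be sketched in a few lines. The toy training sentences, the function names, and the way stray `I-` tags are treated when extracting chunks are all assumptions for this sketch; it is not the official evaluation script, and precision and recall are counted over whole chunks.

```python
from collections import Counter, defaultdict


def train_baseline(sentences):
    """Map each POS tag to the IOB tag it most often carries in training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for _word, pos, iob in sentence:
            counts[pos][iob] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def chunk_spans(iob_tags):
    """Return the set of (start, end, label) chunks encoded by an IOB sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(iob_tags) + ["O"]):   # sentinel closes last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O" and not continues:
            start, label = i, tag[2:]
    return spans


def f_score(gold_tags, predicted_tags):
    """F = 2 * precision * recall / (precision + recall), counted over chunks."""
    gold, predicted = chunk_spans(gold_tags), chunk_spans(predicted_tags)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data in (word, POS-tag, IOB-tag) form; invented for illustration.
train = [
    [("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "B-VP")],
    [("a", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("saw", "VBD", "B-VP"),
     ("dogs", "NNS", "B-NP")],
]
model = train_baseline(train)
predicted = [model.get(pos, "O") for pos in ["DT", "NN", "VBD", "NNS"]]
print(predicted, f_score(["B-NP", "I-NP", "B-VP", "B-NP"], predicted))
```

The baseline ignores context entirely, which is why learned chunkers that look at neighbouring words and tags can improve on it.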

Chunking with Regular Expressions
This time we write regexes over TAGS rather than characters, with operators such as ? and +
Compile them with parse.ChunkRule()
rule = parse.ChunkRule( +)
chunkparser = parse.RegexpChunk([rule], chunk_node = NP)
Resulting object is a (sort-of) parse tree
Top-level node called S
Chunks are labelled NP
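The idea of writing regular expressions over tags can be sketched with Python's standard `re` module alone. This is not the `parse.ChunkRule` API from the slide: the tag sequence is encoded as a string such as `<DT><JJ><NN>`, and the pattern `NP_PATTERN`, the function `regexp_chunk`, and the example sentence are assumptions chosen for illustration. With plain `re`, each tag has to be wrapped in a group before a quantifier is applied.

```python
import re

# Hypothetical NP pattern: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = r"(<DT>)?(<JJ>)*(<NN[^>]*>)+"


def regexp_chunk(tagged, tag_pattern, label="NP"):
    """Chunk (word, tag) pairs wherever tag_pattern matches the tag sequence."""
    tag_string = "".join(f"<{tag}>" for _word, tag in tagged)
    children, token_pos, char_pos = [], 0, 0
    for match in re.finditer(tag_pattern, tag_string):
        if match.end() == match.start():
            continue                                   # ignore empty matches
        # Tokens between the previous match and this one stay unchunked.
        skipped = tag_string.count("<", char_pos, match.start())
        children.extend(tagged[token_pos:token_pos + skipped])
        token_pos += skipped
        # Tokens covered by the match become one chunk.
        covered = match.group().count("<")
        children.append((label, tagged[token_pos:token_pos + covered]))
        token_pos += covered
        char_pos = match.end()
    children.extend(tagged[token_pos:])                # trailing unchunked tokens
    return ("S", children)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(regexp_chunk(sentence, NP_PATTERN))
```

As on the slide, the result is a shallow tree with a top-level S node and NP chunks beneath it.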

Chunking with Regular Expressions

Rule application is sensitive to order
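A small experiment makes the order sensitivity concrete. The sketch below is a simplified model of sequential chunk rules, assuming that each rule may only chunk tokens that no earlier rule has already chunked; the two patterns and the function `apply_rules` are hypothetical and use plain `re`, with chunks marked by braces in the tag string.

```python
import re

FULL_NP = r"<DT>(<JJ>)*(<NN>)+"      # determiner, adjectives, nouns
BARE_NOUNS = r"(<NN>)+"              # nouns only


def apply_rules(tags, rules):
    """Apply chunk rules in order; chunks appear as {...} in the tag string."""
    s = "".join(f"<{t}>" for t in tags)
    for pattern in rules:
        # Split into already-chunked {...} parts and still-unchunked parts.
        parts = re.split(r"(\{[^}]*\})", s)
        s = "".join(
            part if part.startswith("{")
            else re.sub(pattern, lambda m: "{" + m.group() + "}", part)
            for part in parts
        )
    return s


tags = ["DT", "JJ", "NN", "VBD"]
print(apply_rules(tags, [FULL_NP, BARE_NOUNS]))   # determiner and adjective kept
print(apply_rules(tags, [BARE_NOUNS, FULL_NP]))   # noun chunked alone first
```

When the bare-noun rule runs first it claims the noun, so the fuller NP rule has nothing left to match and the determiner and adjective stay outside the chunk.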

Chinking
Specify what does not go into a chunk.
It is rather like defining punctuation as whatever is not alphanumeric or whitespace.
Can be more difficult to think about.
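Chinking can be sketched as exclusion: list the tags that may not appear inside a chunk, and let every remaining run of tokens form a chunk. The chink set (past-tense verbs and prepositions, in Penn Treebank tags), the function name `chink`, and the example sentence are assumptions for this sketch.

```python
from itertools import groupby

CHINK_TAGS = {"VBD", "IN"}           # hypothetical chink set


def chink(tagged, chink_tags=CHINK_TAGS, label="NP"):
    """Treat everything as chunk material, then cut out the chink tokens."""
    children = []
    for is_chink, group in groupby(tagged, key=lambda wt: wt[1] in chink_tags):
        group = list(group)
        if is_chink:
            children.extend(group)               # chinks stay as bare tokens
        else:
            children.append((label, group))      # what remains forms a chunk
    return children


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(chink(sentence))
```

Nothing in the code says what a noun phrase looks like; the chunks are simply whatever is left once the chinks are removed.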

Simple chink-chunk approach: function v content word-class
Regular expressions for chunks and chinks CAN get complex
BUT the whole point is to be simpler than full parsing!
SO: use a simple model which works reasonably well (then tidy up afterwards…)
Chunk = nominal content-word (noun)
Chink = others (verb, pronoun, determiner, preposition, conjunction)
(+ adjective, adverb as a borderline category)

Example
Fruit flies like a banana
fruit\N flies\N like\V a\A banana\N
[fruit flies] like a [banana]
[S [NP fruit\N flies\N NP] [VP like\V [NP a\A banana\N NP] VP] S]

An alternative parse
This sentence is grammatically ambiguous: Fruit flies like a banana
fruit\N flies\N like\V a\A banana\N
[fruit flies] like a [banana]
fruit\N flies\V like\I a\A banana\N
[fruit] flies like a [banana]
cf: "bank robbers like a chase" v "bread bakes in an oven"
[S [NP fruit\N NP] [VP flies\V [PP like\I [NP a\A banana\N NP] PP] VP] S]

Ambiguity leads to more rules
fruit\N flies\N like\V a\A banana\N
[fruit flies] like a [banana]
fruit\N flies\V like\I a\A banana\N
[fruit] flies like a [banana]
BUT what about: Time flies like an arrow (time\N, time\V)
time\N flies\N like\V an\A arrow\N
[time flies] like an [arrow]
time\N flies\V like\I an\A arrow\N
[time] flies like an [arrow]
time\V flies\N like\I an\A arrow\N
time [flies] like an [arrow]
The 3rd PoS-tagging gives an ambiguous parse
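The three taggings of "Time flies like an arrow" can be pushed through the simple chink-chunk rule (noun = chunk, everything else = chink) to reproduce the bracketings shown here. The helper names `tag` and `chunk_nouns` are assumptions for this sketch; the single-letter tags are the ones used on the slide.

```python
def tag(words, tags):
    """Pair up a space-separated sentence with its space-separated tags."""
    return list(zip(words.split(), tags.split()))


def chunk_nouns(tagged):
    """Bracket each maximal run of nouns (tag N); all other words are chinks."""
    out, run = [], []
    for word, pos in tagged:
        if pos == "N":
            run.append(word)                         # extend the current chunk
        else:
            if run:
                out.append("[" + " ".join(run) + "]")
                run = []
            out.append(word)                         # chink: left unbracketed
    if run:
        out.append("[" + " ".join(run) + "]")        # close a sentence-final chunk
    return " ".join(out)


sentence = "time flies like an arrow"
for tags in ("N N V A N", "N V I A N", "V N I A N"):
    print(tags, "->", chunk_nouns(tag(sentence, tags)))
```

The chunker itself is deterministic; each different bracketing comes entirely from the PoS-tagging it is given.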

Chunking can predict prosodic breaks
"An Approach for Detecting Prosodic Phrase Boundaries in Spoken English" by Claire Brierley and Eric Atwell

Summary
Shallow parsing is useful for:
Entity recognition (people, locations, organizations)
Studying linguistic patterns:
gave NP
gave up NP in NP
gave NP NP
gave NP to NP
Prosodic phrase breaks (pauses in speech)
Can ignore complex structure when not relevant
Chink-chunk approach: quick-and-dirty chunking, content v function PoS
Chink-chunk parsing is simpler than context-free grammar parsing!