1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 20, 2004.

2 Today
– Handout: basic English grammar
– Determine a time for a one-time lab
– Begin chunking/shallow parsing

3 Shallow (Chunk) Parsing
Slide modified from Steven Bird's
Goal: divide a sentence into a sequence of chunks.
– Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].
– Chunks are non-recursive: a chunk cannot contain other chunks.
– Chunks are non-exhaustive: not all words are included in chunks.
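
The three properties above can be checked mechanically if chunks are stored as half-open (start, end) token spans. This is a minimal sketch under that assumption, not NLTK's representation. The helpers `is_valid_chunking` and `bracket` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def is_valid_chunking(spans: List[Span], n_tokens: int) -> bool:
    """Chunks must be in range and non-overlapping; gaps between them are allowed."""
    prev_end = 0
    for start, end in sorted(spans):
        if not 0 <= start < end <= n_tokens:
            return False
        if start < prev_end:  # overlaps, or nests inside, the previous chunk
            return False
        prev_end = end
    return True


def bracket(tokens: List[str], spans: List[Span]) -> str:
    """Render chunks in square brackets, leaving unchunked words bare."""
    out, i = [], 0
    for start, end in sorted(spans):
        out.extend(tokens[i:start])
        out.append("[" + " ".join(tokens[start:end]) + "]")
        i = end
    out.extend(tokens[i:])
    return " ".join(out)


tokens = ["I", "saw", "a", "tall", "man", "in", "the", "park", "."]
chunks = [(0, 1), (2, 5), (6, 8)]
print(bracket(tokens, chunks))                           # [I] saw [a tall man] in [the park] .
print(is_valid_chunking(chunks, len(tokens)))            # True
print(is_valid_chunking([(2, 5), (3, 5)], len(tokens)))  # False: nested chunk
```

A flat list of spans cannot express nesting, so non-recursiveness comes for free. The overlap check is the only real constraint, and "saw", "in" and "." simply fall in the gaps.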

4 Chunk Parsing Examples
Slide modified from Steven Bird's
– Noun-phrase chunking: [I] saw [a tall man] in [the park].
– Verb-phrase chunking: The man who [was in the park] [saw me].
– Prosodic chunking: [I saw] [a tall man] [in the park].
– Question answering: What [Spanish explorer] discovered [the Mississippi River]?

5 Shallow Parsing: Motivation
Slide modified from Steven Bird's
– Locating information, e.g., text retrieval
  – Index a document collection on its noun phrases
– Ignoring information
  – Generalize in order to study higher-level patterns
  – e.g., phrases involving “gave” in the Penn Treebank: gave NP; gave up NP in NP; gave NP up; gave NP help; gave NP to NP
– Sometimes a full parse has too much structure
  – Too nested
  – Chunks usually are not recursive

6 Representation
Slide modified from Steven Bird's
– BIO (or IOB)
– Trees
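
In the BIO (IOB) representation, each token gets B-X when it begins a chunk of type X, I-X when it is inside one, and O when it is outside all chunks. This minimal sketch converts between (start, end) chunk spans and IOB tags for a single chunk type. The function names are hypothetical, and the decoder's lenient handling of a stray I- tag is a design choice of this sketch.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def spans_to_iob(n_tokens: int, spans: List[Span], label: str = "NP") -> List[str]:
    """B- marks a chunk's first token, I- the rest of it, O tokens outside chunks."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Inverse of spans_to_iob for one chunk type; a stray I- opens a new chunk."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


words = ["I", "saw", "a", "tall", "man", "in", "the", "park", "."]
np_chunks = [(0, 1), (2, 5), (6, 8)]  # [I] saw [a tall man] in [the park].
iob = spans_to_iob(len(words), np_chunks)
for word, tag in zip(words, iob):
    print(word, tag)  # one "word TAG" pair per line, e.g. "I B-NP"
print(iob_to_spans(iob) == np_chunks)  # True: the round trip is lossless
```

B- tags are what let two adjacent chunks stay distinct, since I-only tagging could not separate them. NLTK offers `tree2conlltags` and `conlltags2tree` in `nltk.chunk` for converting between its tree representation and (word, tag, IOB) triples.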

7 Comparison with Full Syntactic Parsing
Slide modified from Steven Bird's
– Parsing is usually an intermediate stage
  – Builds structures that are used by later stages of processing
– Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks
  – Parsing often provides more information than we need
– Shallow parsing is an easier problem
  – Less word-order flexibility within chunks than between chunks
  – More locality:
    – Fewer long-range dependencies
    – Less context-dependence
    – Less ambiguity

8 Chunks and Constituency
Slide modified from Steven Bird's
Constituents: [[a tall man] [in [the park]]].
Chunks: [a tall man] in [the park].
– A constituent is part of some higher unit in the hierarchical syntactic parse
– Chunks are not constituents
  – Constituents are recursive
  – But chunks are typically subsequences of constituents
  – Chunks do not cross major constituent boundaries

9 Chunk Parsing in NLTK
Slide modified from Steven Bird's
– Chunk parsers usually ignore lexical content
  – They only need to look at part-of-speech tags
– Possible steps in chunk parsing:
  – Chunking, unchunking
  – Chinking
  – Merging, splitting
– Evaluation:
  – Compare to a baseline
  – Evaluate in terms of precision, recall, and F-measure
  – Count missed chunks (false negatives) and incorrect chunks (false positives)
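
Chunk-level evaluation can be sketched by comparing guessed chunk spans against gold-standard spans. In this minimal version, a guessed chunk counts as correct only if both of its boundaries match a gold chunk exactly. `evaluate_chunks` is a hypothetical name. NLTK's own `ChunkScore` class (in `nltk.chunk.util`) reports similar quantities, but this code does not use it.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def evaluate_chunks(gold: Iterable[Span], guess: Iterable[Span]) -> Dict[str, object]:
    """Exact-match chunk scoring: precision, recall, F-measure, missed and incorrect."""
    gold_set, guess_set = set(gold), set(guess)
    correct = gold_set & guess_set
    precision = len(correct) / len(guess_set) if guess_set else 0.0
    recall = len(correct) / len(gold_set) if gold_set else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "missed": sorted(gold_set - guess_set),      # false negatives
        "incorrect": sorted(guess_set - gold_set),   # false positives
    }


gold = [(0, 1), (2, 5), (6, 8)]   # [I] saw [a tall man] in [the park].
guess = [(0, 1), (3, 5), (6, 8)]  # this guess leaves out the determiner "a"
scores = evaluate_chunks(gold, guess)
print(scores)  # precision = recall = F = 2/3; missed [(2, 5)], incorrect [(3, 5)]
```

A single wrong boundary counts twice: once as a missed gold chunk and once as an incorrect guess. To compare against a baseline, score the baseline chunker's spans with the same function.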

10 Chunking
Slide modified from Steven Bird's
– Define a regular expression that matches the sequences of tags in a chunk
  – A simple noun phrase chunk regexp: <DT>? <JJ>* <NN.*>
  – (Note that <NN.*> matches any tag starting with NN)
– Chunk all matching subsequences:
  the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
– If matching subsequences overlap, the first one gets priority
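
One way to implement tag-pattern chunking is to write the tag sequence as a single string, such as `<DT><JJ><NN><VBD>`, and run an ordinary regular expression over it. This is a minimal pure-Python sketch of that idea, not NLTK's implementation. The names `tag_pattern_to_regex`, `chunk` and `render` are hypothetical, and only a subset of tag-pattern syntax is handled: angle-bracketed tags, `.`, `|`, `?`, `*`, `+` and parentheses.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # [(word, tag), ...]
Span = Tuple[int, int]          # half-open token range [start, end)


def tag_pattern_to_regex(pattern: str) -> str:
    """Rewrite a tag pattern such as '<DT>?<JJ>*<NN.*>' as a regex over '<TAG>' strings."""
    pattern = re.sub(r"\s+", "", pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # An unescaped '.' may match any tag character but must not cross a tag boundary.
    return re.sub(r"(?<!\\)\.", "[^<>]", pattern)


def chunk(tagged: Tagged, pattern: str) -> List[Span]:
    """Chunk every leftmost, non-overlapping match of the pattern over the tags."""
    tag_string, offsets = "", {}
    for i, (_, tag) in enumerate(tagged):
        offsets[len(tag_string)] = i  # character offset of each tag -> token index
        tag_string += f"<{tag}>"
    offsets[len(tag_string)] = len(tagged)
    return [(offsets[m.start()], offsets[m.end()])
            for m in re.finditer(tag_pattern_to_regex(pattern), tag_string)
            if m.end() > m.start()]


def render(tagged: Tagged, spans: List[Span]) -> str:
    """Bracket chunks in 'word/TAG' notation."""
    starts, ends = {s for s, _ in spans}, {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i in ends:
            out.append("]")
        if i in starts:
            out.append("[")
        out.append(f"{word}/{tag}")
    if len(tagged) in ends:
        out.append("]")
    return " ".join(out)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
np_spans = chunk(sentence, "<DT>? <JJ>* <NN.*>")
print(np_spans)                    # [(0, 3), (5, 7)]
print(render(sentence, np_spans))  # [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
```

`re.finditer` returns leftmost, non-overlapping matches, which gives the "first one gets priority" behavior for free. In current NLTK, the same rule can be written as the grammar line `NP: {<DT>?<JJ>*<NN.*>}` and passed to `nltk.RegexpParser`.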

11 Unchunking
– Remove any chunk with a given pattern
  – e.g., UnChunkRule('<NN|DT>+', 'Unchunk NNDT')
  – Combine with a chunk rule and apply both to the input:
– Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
– Apply chunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
– Apply unchunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
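
This is a minimal pure-Python sketch of unchunking; it is not NLTK's `UnChunkRule`. It makes two assumptions. First, the unchunk pattern is `<NN|DT>+`, read off the rule description "Unchunk NNDT". Second, the chunk rule is the `<DT>?<JJ>*<NN.*>` noun-phrase pattern. All helper names are hypothetical. A chunk is removed only when its whole tag sequence matches the unchunk pattern.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # [(word, tag), ...]
Span = Tuple[int, int]          # half-open token range [start, end)


def tag_pattern_to_regex(pattern: str) -> str:
    """Rewrite a tag pattern such as '<NN|DT>+' as a regex over '<TAG>' strings."""
    pattern = re.sub(r"\s+", "", pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # An unescaped '.' may match any tag character but must not cross a tag boundary.
    return re.sub(r"(?<!\\)\.", "[^<>]", pattern)


def tags_of(tagged: Tagged) -> str:
    """Write a token slice's tags as one '<TAG>' string."""
    return "".join(f"<{tag}>" for _, tag in tagged)


def chunk(tagged: Tagged, pattern: str) -> List[Span]:
    """Chunk every leftmost, non-overlapping match of the pattern over the tags."""
    tag_string, offsets = "", {}
    for i, (_, tag) in enumerate(tagged):
        offsets[len(tag_string)] = i
        tag_string += f"<{tag}>"
    offsets[len(tag_string)] = len(tagged)
    return [(offsets[m.start()], offsets[m.end()])
            for m in re.finditer(tag_pattern_to_regex(pattern), tag_string)
            if m.end() > m.start()]


def unchunk(tagged: Tagged, spans: List[Span], pattern: str) -> List[Span]:
    """Drop every chunk whose entire tag sequence matches the pattern."""
    regex = re.compile(tag_pattern_to_regex(pattern))
    return [(s, e) for s, e in spans if not regex.fullmatch(tags_of(tagged[s:e]))]


def render(tagged: Tagged, spans: List[Span]) -> str:
    """Bracket chunks in 'word/TAG' notation."""
    starts, ends = {s for s, _ in spans}, {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i in ends:
            out.append("]")
        if i in starts:
            out.append("[")
        out.append(f"{word}/{tag}")
    if len(tagged) in ends:
        out.append("]")
    return " ".join(out)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = chunk(sentence, "<DT>?<JJ>*<NN.*>")  # assumed chunk rule
print(render(sentence, chunked))
# [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
kept = unchunk(sentence, chunked, "<NN|DT>+")   # assumed unchunk pattern
print(render(sentence, kept))
# [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
```

[the little cat] survives because its JJ tag keeps the full-match test from succeeding, while [the mat] consists only of DT and NN.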

12 Chinking
Slide modified from Steven Bird's
– A chink is a subsequence of the text that is not a chunk.
– Define a regular expression that matches the sequences of tags in a chink
  – A simple chink regexp for finding NP chunks matches one or more verb or preposition (IN) tags
– First apply a chunk rule to chunk everything:
  Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  ChunkRule('<.*>+', 'Chunk everything')
  [ the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN ]
– Then apply the chink rule above:
  [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
  (chunk / chink / chunk)
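
Chinking can be sketched as "chunk everything, then cut the chink matches out of each chunk and keep the remaining pieces as chunks". This is a minimal pure-Python version, not NLTK's `ChinkRule`. The helper names are hypothetical, and the chink pattern `(<VB.*>|<IN>)+` is an assumption consistent with the slide's description (one or more verb or preposition tags).

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # [(word, tag), ...]
Span = Tuple[int, int]          # half-open token range [start, end)


def tag_pattern_to_regex(pattern: str) -> str:
    """Rewrite a tag pattern such as '(<VB.*>|<IN>)+' as a regex over '<TAG>' strings."""
    pattern = re.sub(r"\s+", "", pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # An unescaped '.' may match any tag character but must not cross a tag boundary.
    return re.sub(r"(?<!\\)\.", "[^<>]", pattern)


def find_matches(tagged: Tagged, pattern: str) -> List[Span]:
    """Leftmost, non-overlapping matches of a tag pattern, as token spans."""
    tag_string, offsets = "", {}
    for i, (_, tag) in enumerate(tagged):
        offsets[len(tag_string)] = i
        tag_string += f"<{tag}>"
    offsets[len(tag_string)] = len(tagged)
    return [(offsets[m.start()], offsets[m.end()])
            for m in re.finditer(tag_pattern_to_regex(pattern), tag_string)
            if m.end() > m.start()]


def chink(tagged: Tagged, spans: List[Span], pattern: str) -> List[Span]:
    """Remove chink matches from inside each chunk; the non-empty pieces stay chunks."""
    result = []
    for start, end in spans:
        cursor = start
        for c_start, c_end in find_matches(tagged[start:end], pattern):
            if start + c_start > cursor:
                result.append((cursor, start + c_start))
            cursor = start + c_end
        if cursor < end:
            result.append((cursor, end))
    return result


def render(tagged: Tagged, spans: List[Span]) -> str:
    """Bracket chunks in 'word/TAG' notation."""
    starts, ends = {s for s, _ in spans}, {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i in ends:
            out.append("]")
        if i in starts:
            out.append("[")
        out.append(f"{word}/{tag}")
    if len(tagged) in ends:
        out.append("]")
    return " ".join(out)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
everything = find_matches(sentence, "<.*>+")  # chunk everything
print(render(sentence, everything))
# [ the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN ]
np_spans = chink(sentence, everything, "(<VB.*>|<IN>)+")  # assumed chink pattern
print(render(sentence, np_spans))
# [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
```

Chinking describes what a chunk is not, which is often shorter than listing every shape a noun phrase can take.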

13 Merging
Slide modified from Steven Bird's
– Combine adjacent chunks into a single chunk
– Define a regular expression that matches the sequences of tags on both sides of the point to be merged
– Example: merge a chunk ending in JJ with a chunk starting with NN
  MergeRule('<JJ>', '<NN>', 'Merge adjs and nouns')
  [ the/DT little/JJ ] [ cat/NN ] sat/VBD on/IN the/DT mat/NN
  [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
– Splitting is the opposite of merging
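
This pure-Python sketch shows merging and its inverse, splitting; it is not NLTK's `MergeRule`/`SplitRule`. The names `merge`, `split`, `tags_of` and `render` are hypothetical, and the left/right patterns `<JJ>` and `<NN>` are assumed from the slide's description. Merging joins two adjacent chunks when the first ends with the left pattern and the second starts with the right pattern. Splitting cuts a chunk wherever the same left/right context occurs inside it.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # [(word, tag), ...]
Span = Tuple[int, int]          # half-open token range [start, end)


def tag_pattern_to_regex(pattern: str) -> str:
    """Rewrite a tag pattern such as '<JJ>' as a regex over '<TAG>' strings."""
    pattern = re.sub(r"\s+", "", pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # An unescaped '.' may match any tag character but must not cross a tag boundary.
    return re.sub(r"(?<!\\)\.", "[^<>]", pattern)


def tags_of(tagged: Tagged) -> str:
    """Write a token slice's tags as one '<TAG>' string."""
    return "".join(f"<{tag}>" for _, tag in tagged)


def merge(tagged: Tagged, spans: List[Span], left: str, right: str) -> List[Span]:
    """Join adjacent chunks when the first ends with `left` and the second starts with `right`."""
    left_re = re.compile("(?:" + tag_pattern_to_regex(left) + ")$")
    right_re = re.compile(tag_pattern_to_regex(right))
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and merged[-1][1] == start:
            prev_start, prev_end = merged[-1]
            if (left_re.search(tags_of(tagged[prev_start:prev_end]))
                    and right_re.match(tags_of(tagged[start:end]))):
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


def split(tagged: Tagged, spans: List[Span], left: str, right: str) -> List[Span]:
    """Cut each chunk at every point preceded by `left` and followed by `right`."""
    left_re = re.compile("(?:" + tag_pattern_to_regex(left) + ")$")
    right_re = re.compile(tag_pattern_to_regex(right))
    result: List[Span] = []
    for start, end in spans:
        cut = start
        for i in range(start + 1, end):
            if (left_re.search(tags_of(tagged[start:i]))
                    and right_re.match(tags_of(tagged[i:end]))):
                result.append((cut, i))
                cut = i
        result.append((cut, end))
    return result


def render(tagged: Tagged, spans: List[Span]) -> str:
    """Bracket chunks in 'word/TAG' notation."""
    starts, ends = {s for s, _ in spans}, {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i in ends:
            out.append("]")
        if i in starts:
            out.append("[")
        out.append(f"{word}/{tag}")
    if len(tagged) in ends:
        out.append("]")
    return " ".join(out)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
before = [(0, 2), (2, 3)]
print(render(sentence, before))  # [ the/DT little/JJ ] [ cat/NN ] sat/VBD on/IN the/DT mat/NN
merged = merge(sentence, before, "<JJ>", "<NN>")
print(render(sentence, merged))  # [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
print(split(sentence, merged, "<JJ>", "<NN>"))  # [(0, 2), (2, 3)]
```

Applying `split` with the same patterns undoes the merge, which is the sense in which splitting is the opposite of merging.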

14 Tokens and Labels in NLTK
– Tokens are at many levels of description:
  – Document
  – Sentence
  – Word
– A token can have multiple representations at the same level:
  – A sentence can be marked up with TREE and WORDS simultaneously
  – A word can have both TEXT and POS (or TAG)

15 Applying Chunking to Treebank Data

18 We usually resolve this kind of problem by checking the API documentation, but it is not very helpful in this case. The tutorial has the answer.

20 Cascaded Chunking
Slide modified from Steven Bird's
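
Cascaded chunking runs several chunkers in sequence, with each stage treating the chunks built by earlier stages as single units. Each stage stays a simple flat chunker, yet the cascade recovers some nesting. This minimal pure-Python sketch uses hypothetical helper names. Its three-stage grammar (NP, then PP, then VP) is an illustrative assumption, not the grammar from the slide.

```python
import re
from typing import List, Tuple, Union

# A node is either a leaf (word, tag) or a chunk (label, [child nodes]).
Node = Tuple[str, Union[str, list]]


def tag_pattern_to_regex(pattern: str) -> str:
    """Rewrite a tag pattern such as '<IN><NP>' as a regex over '<LABEL>' strings."""
    pattern = re.sub(r"\s+", "", pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # An unescaped '.' may match any label character but must not cross a boundary.
    return re.sub(r"(?<!\\)\.", "[^<>]", pattern)


def label(node: Node) -> str:
    """A leaf is matched by its POS tag, a chunk by its chunk label."""
    return node[0] if isinstance(node[1], list) else node[1]


def apply_stage(seq: List[Node], name: str, pattern: str) -> List[Node]:
    """One cascade stage: wrap each leftmost, non-overlapping match in a `name` chunk."""
    label_string, offsets = "", {}
    for i, node in enumerate(seq):
        offsets[len(label_string)] = i
        label_string += f"<{label(node)}>"
    offsets[len(label_string)] = len(seq)
    out: List[Node] = []
    i = 0
    for m in re.finditer(tag_pattern_to_regex(pattern), label_string):
        if m.end() == m.start():
            continue  # ignore empty matches
        start, end = offsets[m.start()], offsets[m.end()]
        out.extend(seq[i:start])
        out.append((name, seq[start:end]))
        i = end
    out.extend(seq[i:])
    return out


def to_str(node: Node) -> str:
    """Bracket chunks as [LABEL ...] and leaves as word/TAG."""
    if isinstance(node[1], list):
        return f"[{node[0]} " + " ".join(to_str(child) for child in node[1]) + "]"
    return f"{node[0]}/{node[1]}"


CASCADE = [
    ("NP", "<DT>?<JJ>*<NN.*>"),  # stage 1: noun-phrase chunks over POS tags
    ("PP", "<IN><NP>"),          # stage 2: a preposition followed by an NP chunk
    ("VP", "<VB.*><NP|PP>*"),    # stage 3: a verb with the NP/PP chunks after it
]

sentence: List[Node] = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
                        ("on", "IN"), ("the", "DT"), ("mat", "NN")]
seq = sentence
for name, pattern in CASCADE:
    seq = apply_stage(seq, name, pattern)
result = " ".join(to_str(node) for node in seq)
print(result)
# [NP the/DT little/JJ cat/NN] [VP sat/VBD [PP on/IN [NP the/DT mat/NN]]]
```

NLTK's `RegexpParser` also accepts a grammar with several chunk rules applied in order, which is the library route to the same idea.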

21 Next Time and Upcoming
– Finish shallow parsing
  – Evaluating shallow parsing results
  – More examples of chunk/chink/unchunk rules
– Revisit topics from the previous week
– Shallow parsing assignment
  – Sent out Tues or Wed
  – Due on Wed Sept 29
– Next week:
  – Read the paper on end-of-sentence disambiguation
  – Presley and Barbara lecturing on categorization