1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 20, 2004.

1 SIMS 290-2: Applied Natural Language Processing, Marti Hearst, Sept 20, 2004

2 Today
Handout: basic English grammar
Determine time for a one-time lab
Begin chunking/shallow parsing

3 Shallow (Chunk) Parsing (slide modified from Steven Bird's)
Goal: divide a sentence into a sequence of chunks.
Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].
Chunks are non-recursive: a chunk cannot contain other chunks.
Chunks are non-exhaustive: not all words are included in chunks.

4 Chunk Parsing Examples (slide modified from Steven Bird's)
Noun-phrase chunking: [I] saw [a tall man] in [the park].
Verb-phrase chunking: The man who [was in the park] [saw me].
Prosodic chunking: [I saw] [a tall man] [in the park].
Question answering: What [Spanish explorer] discovered [the Mississippi River]?

5 Shallow Parsing: Motivation (slide modified from Steven Bird's)
Locating information, e.g., text retrieval
– Index a document collection on its noun phrases
Ignoring information: generalize in order to study higher-level patterns
– e.g., phrases involving “gave” in the Penn Treebank: gave NP; gave up NP in NP; gave NP up; gave NP help; gave NP to NP
Sometimes a full parse has too much structure
– Too nested
– Chunks usually are not recursive

6 Representation (slide modified from Steven Bird's)
BIO (or IOB)
Trees
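
The BIO (IOB) representation can be illustrated with a small sketch in plain Python. It assumes the variant in which every chunk starts with a B- tag (often called IOB2); the function name `spans_to_iob` and the span encoding (token index pairs with an exclusive end) are hypothetical choices for illustration, not part of NLTK.

```python
def spans_to_iob(num_tokens, chunk_spans, label="NP"):
    """Convert non-overlapping (start, end) chunk spans into IOB tags.

    Assumption: `end` is exclusive and every chunk begins with a B- tag.
    """
    tags = ["O"] * num_tokens            # O = outside any chunk
    for start, end in chunk_spans:
        tags[start] = "B-" + label       # B = first token of a chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label       # I = token inside a chunk
    return tags


words = "I saw a tall man in the park".split()
np_spans = [(0, 1), (2, 5), (6, 8)]      # [I] saw [a tall man] in [the park]
iob = spans_to_iob(len(words), np_spans)

for word, tag in zip(words, iob):
    print(word, tag)
```

Because chunks are non-overlapping and non-recursive, one tag per token is enough to encode them, which is what makes this flat representation an alternative to trees.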

7 Comparison with Full Syntactic Parsing (slide modified from Steven Bird's)
Parsing is usually an intermediate stage
Builds structures that are used by later stages of processing
Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks
Parsing often provides more information than we need
Shallow parsing is an easier problem
Less word-order flexibility within chunks than between chunks
More locality:
– Fewer long-range dependencies
– Less context-dependence
– Less ambiguity

8 Chunks and Constituency (slide modified from Steven Bird's)
Constituents: [[a tall man] [in [the park]]].
Chunks: [a tall man] in [the park].
A constituent is part of some higher unit in the hierarchical syntactic parse
Chunks are not constituents
Constituents are recursive
But chunks are typically subsequences of constituents
Chunks do not cross major constituent boundaries

9 Chunk Parsing in NLTK (slide modified from Steven Bird's)
Chunk parsers usually ignore lexical content
Only need to look at part-of-speech tags
Possible steps in chunk parsing
Chunking, unchunking
Chinking
Merging, splitting
Evaluation
Compare to a baseline
Evaluate in terms of
– Precision, Recall, F-Measure
– Missed (False Negative), Incorrect (False Positive)
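
The evaluation measures listed above can be sketched as follows. This is a minimal illustration in plain Python, assuming that a guessed chunk counts as correct only when its span exactly matches a gold chunk; the function name `chunk_scores` and the example spans are hypothetical.

```python
def chunk_scores(gold, guessed):
    """Score guessed chunk spans against gold spans (exact span match)."""
    gold, guessed = set(gold), set(guessed)
    correct = gold & guessed
    precision = len(correct) / len(guessed) if guessed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "missed": sorted(gold - guessed),      # false negatives
        "incorrect": sorted(guessed - gold),   # false positives
    }


# Gold: [I] saw [a tall man] in [the park]
gold = [(0, 1), (2, 5), (6, 8)]
# Guess: [I] saw [a tall] man in [the park]
guessed = [(0, 1), (2, 4), (6, 8)]

scores = chunk_scores(gold, guessed)
print(scores)
```

In this example two of the three guessed chunks are correct, so precision and recall are both 2/3; the span (2, 5) is missed and the span (2, 4) is incorrect.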

10 Chunking (slide modified from Steven Bird's)
Define a regular expression that matches the sequences of tags in a chunk
A simple noun phrase chunk regexp: <DT>?<JJ>*<NN.*>
(Note that <NN.*> matches any tag starting with NN)
Chunk all matching subsequences:
the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
If matching subsequences overlap, the first one gets priority
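
The chunking step can be sketched in plain Python with the standard `re` module. This is not the NLTK implementation: the helpers `tag_pattern_to_regex`, `chunk`, and `show` are hypothetical, and the sketch assumes the noun phrase tag pattern is `<DT>?<JJ>*<NN.*>`, applied to a string of tags such as `<DT><JJ><NN>`.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern like '<DT>?<JJ>*<NN.*>' into a regex string."""
    def convert(match):
        # Inside a tag, '.' must not be allowed to run across tag boundaries.
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + inner + ")>)"
    return re.sub(r"<([^<>]+)>", convert, tag_pattern)


def chunk(tagged, tag_pattern):
    """Return (start, end) token spans whose tag sequence matches the pattern."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    spans = []
    # finditer scans left to right without overlap, so when candidate
    # matches overlap, the first one gets priority.
    for m in regex.finditer(tag_string):
        start = tag_string.count("<", 0, m.start())  # tags before the match
        end = tag_string.count("<", 0, m.end())      # tags up to its end
        if end > start:
            spans.append((start, end))
    return spans


def show(tagged, spans):
    """Render tokens with square brackets around each chunk span."""
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        if i in starts:
            out.append("[")
        out.append(word + "/" + tag)
        if i + 1 in ends:
            out.append("]")
    return " ".join(out)


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
np_spans = chunk(sentence, "<DT>?<JJ>*<NN.*>")
print(show(sentence, np_spans))
```

Working on the tag string rather than the words reflects the point that chunk parsers usually only need the part-of-speech tags.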

11 Unchunking
Remove any chunk with a given pattern, e.g., UnChunkRule('<NN|DT>+', 'Unchunk NNDT')
Combine with chunk rule <NN|DT|JJ>+
Chunk all matching subsequences:
Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
Apply chunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
Apply unchunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
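
A minimal sketch of unchunking in plain Python follows. It is not NLTK code: the helper names are hypothetical, and the two tag patterns (`<NN|DT|JJ>+` for chunking, `<NN|DT>+` for unchunking) are assumptions chosen to reproduce the example above. A chunk is removed only when its entire tag sequence matches the unchunk pattern.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern like '<NN|DT>+' into a regex string."""
    def convert(match):
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + inner + ")>)"
    return re.sub(r"<([^<>]+)>", convert, tag_pattern)


def tag_string(tagged):
    """Encode the tags of a token sequence as '<DT><JJ><NN>...'."""
    return "".join("<%s>" % tag for _, tag in tagged)


def chunk(tagged, tag_pattern):
    """Return (start, end) token spans whose tags match the pattern."""
    tags = tag_string(tagged)
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    spans = []
    for m in regex.finditer(tags):
        start = tags.count("<", 0, m.start())
        end = tags.count("<", 0, m.end())
        if end > start:
            spans.append((start, end))
    return spans


def unchunk(tagged, spans, tag_pattern):
    """Drop every chunk whose whole tag sequence matches the pattern."""
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    return [(start, end) for start, end in spans
            if not regex.fullmatch(tag_string(tagged[start:end]))]


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

chunked = chunk(sentence, "<NN|DT|JJ>+")          # two chunks
remaining = unchunk(sentence, chunked, "<NN|DT>+")  # the/DT mat/NN is removed
print(chunked, remaining)
```

The first chunk survives because it contains a JJ tag, which the unchunk pattern does not allow.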

12 Chinking (slide modified from Steven Bird's)
A chink is a subsequence of the text that is not a chunk.
Define a regular expression that matches the sequences of tags in a chink
A simple chink regexp for finding NP chunks: (<VB.?>|<IN>)+
First apply chunk rule to chunk everything
Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
ChunkRule('<.*>+', 'Chunk everything')
[ the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN ]
Apply chink rule above:
[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
Here sat/VBD on/IN is the chink, and each bracketed span is a chunk.
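
Chinking can be sketched in plain Python as cutting matched tag sequences out of existing chunks. This is an illustration only, not NLTK's implementation: the helper names are hypothetical, the chink pattern is assumed to be `(<VB.?>|<IN>)+`, and the "chunk everything" step is represented directly as one span covering the whole sentence.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern like '(<VB.?>|<IN>)+' into a regex string."""
    def convert(match):
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + inner + ")>)"
    return re.sub(r"<([^<>]+)>", convert, tag_pattern)


def chink(tagged, spans, tag_pattern):
    """Remove every match of the chink pattern from inside each chunk."""
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    result = []
    for start, end in spans:
        tags = "".join("<%s>" % tag for _, tag in tagged[start:end])
        pos = start                                  # start of the next piece
        for m in regex.finditer(tags):
            c_start = start + tags.count("<", 0, m.start())
            c_end = start + tags.count("<", 0, m.end())
            if c_end == c_start:                     # ignore empty matches
                continue
            if c_start > pos:
                result.append((pos, c_start))        # piece before the chink
            pos = c_end
        if pos < end:
            result.append((pos, end))                # piece after the last chink
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

everything = [(0, len(sentence))]   # effect of chunking with '<.*>+'
np_spans = chink(sentence, everything, "(<VB.?>|<IN>)+")
print(np_spans)
```

Describing what is not in a chunk can be simpler than describing every kind of chunk, which is the design choice behind starting from one all-inclusive chunk.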

13 Merging (slide modified from Steven Bird's)
Combine adjacent chunks into a single chunk
Define a regular expression that matches the sequences of tags on both sides of the point to be merged
Example: merge a chunk ending in JJ with a chunk starting with NN
MergeRule('<JJ>', '<NN>', 'Merge adjs and nouns')
[ the/DT little/JJ ] [ cat/NN ] sat/VBD on/IN the/DT mat/NN
[ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
Splitting is the opposite of merging
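
The merge step can be sketched in plain Python as follows. The helper names are hypothetical and this is not NLTK's `MergeRule`; the sketch assumes a left pattern `<JJ>` that must match the end of one chunk and a right pattern `<NN>` that must match the start of the immediately following chunk.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern like '<JJ>' into a regex string."""
    def convert(match):
        inner = match.group(1).replace(".", "[^<>]")
        return "(?:<(?:" + inner + ")>)"
    return re.sub(r"<([^<>]+)>", convert, tag_pattern)


def merge(tagged, spans, left_pattern, right_pattern):
    """Merge adjacent chunks when the left one ends with left_pattern
    and the right one starts with right_pattern."""
    left_re = re.compile("(?:%s)$" % tag_pattern_to_regex(left_pattern))
    right_re = re.compile(tag_pattern_to_regex(right_pattern))

    def tags(start, end):
        return "".join("<%s>" % tag for _, tag in tagged[start:end])

    merged = []
    for start, end in sorted(spans):
        if merged and merged[-1][1] == start:        # chunks touch each other
            prev_start, prev_end = merged[-1]
            if (left_re.search(tags(prev_start, prev_end))
                    and right_re.match(tags(start, end))):
                merged[-1] = (prev_start, end)       # fuse into one chunk
                continue
        merged.append((start, end))
    return merged


sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

before = [(0, 2), (2, 3)]            # [ the/DT little/JJ ] [ cat/NN ]
after = merge(sentence, before, "<JJ>", "<NN>")
print(after)
```

Chunks that are separated by unchunked tokens are left alone, since only directly adjacent chunks are candidates for merging.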

14 Tokens and Labels in NLTK
Tokens are at many levels of description: Document, Sentence, Word
Can have multiple representations at the same level
A sentence can be marked up with TREE and WORDS simultaneously
A word can have both TEXT and POS (or TAG)

15 Applying Chunking to Treebank Data

18 Usually resolve this kind of problem by checking out the API: http://nltk.sourceforge.net/api-1.4/index.html
But it is not all that helpful in this case. The tutorial has the answer.

20 Cascaded Chunking (slide modified from Steven Bird's)

21 Next Time and Upcoming
Finish Shallow Parsing
Evaluating Shallow Parsing Results
More examples of chunk/chink/unchunk rules
Revisit topics from previous week
Shallow Parsing Assignment
Sent out Tues or Wed
Due on Wed Sept 29
Next week:
Read paper on end-of-sentence disambiguation
Presley and Barbara lecturing on categorization

