Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models Elias Ponvert, Jason Baldridge and Katrin Erk The University of Texas.


1 Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models Elias Ponvert, Jason Baldridge and Katrin Erk The University of Texas at Austin

2 Introduction
Grammar induction
–Based on gold-standard POS tags. The fundamental model: Constituent Context Model (CCM)
–Based on raw text. Common Cover Links parser (CCL)
This paper: cascaded chunking.

3 Motivation of this paper
CCL relies heavily on low-level constituents:
–Simply extracting the non-hierarchical multiword constituents from CCL's output and putting a right-branching structure over them works better than CCL's own higher-level predictions.
–Suggestion: improvements to low-level constituent prediction will ultimately lead to further gains in overall constituent parsing.

4 Two Investigations
–Unsupervised partial parsing, i.e. unsupervised chunking
–Full parsing via cascaded chunking (explained later)

5 Data for Unsupervised Chunking
Two kinds of data:
–Constituent chunks: multiword and non-hierarchical (they contain no subconstituents)
–Base NPs: NPs that do not contain nested NPs

6 Method of Unsupervised Chunking
BIO tagging, with a STOP tag for sentence boundaries and phrasal punctuation.
Models:
–HMM
–PRLG (probabilistic right-linear grammar)
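To make the tag scheme on this slide concrete, here is a minimal sketch of how chunk spans could be encoded as B/I/O/STOP tags. The function names, the half-open span format, and the `PUNCT` set are illustrative assumptions, not taken from the slides; in the unsupervised setting these tags are latent and would be inferred by the model rather than read off gold chunks.

```python
# Assumed (hypothetical) set of phrasal punctuation tokens.
PUNCT = {",", ";", ":", ".", "?", "!"}

def bio_tags(tokens, chunks):
    """Tag each token as B (chunk start), I (inside chunk), O (outside),
    or STOP (phrasal punctuation).

    tokens: list of str
    chunks: list of (start, end) half-open index spans, non-overlapping
    """
    tags = ["O"] * len(tokens)
    for start, end in chunks:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    # Punctuation always receives the STOP tag.
    for i, tok in enumerate(tokens):
        if tok in PUNCT:
            tags[i] = "STOP"
    return tags

def with_boundaries(tags):
    """Add STOP at both sentence boundaries."""
    return ["STOP"] + tags + ["STOP"]

tokens = ["the", "cat", "sat", ",", "on", "the", "mat", "."]
chunks = [(0, 2), (5, 7)]
print(list(zip(tokens, bio_tags(tokens, chunks))))
```

Here "the cat" and "the mat" are the two multiword chunks; the comma and the period act as STOP symbols.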

7 Finite States State transitions Uniform initialization
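A minimal sketch of what a uniformly initialized transition table over the four tags could look like. The `ALLOWED` table is an assumption: it combines the usual BIO constraint (I cannot follow O or STOP) with the multiword requirement from slide 5 (B must be followed by I). The slide itself does not spell out which transitions are permitted.

```python
TAGS = ["B", "I", "O", "STOP"]

# Assumed constraints, not stated on the slide:
#  - chunks are multiword, so B can only be followed by I
#  - I may only continue a chunk, so it cannot follow O or STOP
ALLOWED = {
    "B": {"I"},
    "I": {"B", "I", "O", "STOP"},
    "O": {"B", "O", "STOP"},
    "STOP": {"B", "O", "STOP"},
}

def uniform_transitions():
    """Return P(next | prev), uniform over the allowed successors of
    each tag and zero for disallowed transitions."""
    table = {}
    for prev in TAGS:
        p = 1.0 / len(ALLOWED[prev])
        table[prev] = {nxt: (p if nxt in ALLOWED[prev] else 0.0) for nxt in TAGS}
    return table

trans = uniform_transitions()
for prev in TAGS:
    print(prev, trans[prev])
```

Starting from such a table, the transition and emission parameters would then be re-estimated from raw text, for example with EM; zero entries stay zero under re-estimation, so the structural constraints are preserved.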

8 Chunking Results

9 Full parsing via cascaded chunking Pseudoword: the term in the chunk with the highest corpus frequency
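One level of the cascade can be sketched as follows: each predicted chunk is collapsed into a single pseudoword, chosen as the chunk's most frequent term in the corpus, and the shortened sentence is then available for another round of chunking. The function name, the span format, and the tie-breaking rule (leftmost term wins) are assumptions made for illustration.

```python
from collections import Counter

def collapse_chunks(tokens, chunks, freq):
    """Replace every chunk by its pseudoword: the term in the chunk
    with the highest corpus frequency (ties go to the leftmost term).

    tokens: list of str
    chunks: list of (start, end) half-open spans, non-overlapping
    freq:   Counter of corpus word frequencies
    """
    ends = {start: end for start, end in chunks}
    out = []
    i = 0
    while i < len(tokens):
        if i in ends:
            end = ends[i]
            out.append(max(tokens[i:end], key=lambda w: freq[w]))
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
]
freq = Counter(w for sent in corpus for w in sent)

level1 = collapse_chunks(corpus[0], [(0, 2), (4, 6)], freq)
print(level1)
```

Repeating chunk-then-collapse until no new chunks are found, and recording which spans were collapsed at each level, would yield a hierarchical bracketing of the original sentence.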

10 Full Parsing Results
(Results shown for two settings: no length limit, and sentences of <=10 words.)

