Published by Hester Wilkerson. Modified over 9 years ago.
1
Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
Elias Ponvert, Jason Baldridge and Katrin Erk
The University of Texas at Austin
2
Introduction
Grammar induction
– From gold-standard POS tags: the foundational model is the Constituent Context Model (CCM)
– From raw text: the common cover links parser (CCL)
This paper: cascaded chunking from raw text
3
Motivation for this paper
CCL relies heavily on its low-level constituents:
– Simply extracting the non-hierarchical multiword constituents from CCL's output and placing a right-branching structure over them actually works better than CCL's own higher-level predictions.
– Suggestion: improving low-level constituent prediction should ultimately lead to further gains in overall constituent parsing.
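The chunk-plus-right-branching baseline described above can be sketched in a few lines of Python. This is a minimal illustration, not CCL or the authors' code: the function names `group_units` and `right_branching` are hypothetical, chunks are assumed to arrive as half-open `(start, end)` token spans, and trees are plain nested tuples.

```python
def group_units(tokens, chunks):
    """Collapse each (start, end) chunk span into one flat unit (a tuple of words).

    Tokens outside any chunk stay as single-word units.
    Assumes spans are half-open and non-overlapping.
    """
    chunk_end = dict(chunks)  # start index -> end index
    units, i = [], 0
    while i < len(tokens):
        if i in chunk_end:
            end = chunk_end[i]
            units.append(tuple(tokens[i:end]))
            i = end
        else:
            units.append(tokens[i])
            i += 1
    return units


def right_branching(units):
    """Nest units right-branching: [a, b, c] -> (a, (b, c))."""
    if not units:
        raise ValueError("cannot build a tree over zero units")
    if len(units) == 1:
        return units[0]
    return (units[0], right_branching(units[1:]))


tokens = ["the", "cat", "sat", "on", "the", "mat"]
chunks = [(0, 2), (4, 6)]
tree = right_branching(group_units(tokens, chunks))
print(tree)  # (('the', 'cat'), ('sat', ('on', ('the', 'mat'))))
```

The chunks stay flat; only the units above them receive hierarchical (right-branching) structure.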
4
Two Investigations
– Unsupervised partial parsing, i.e. unsupervised chunking
– Full parsing via cascaded chunking (explained later)
5
Data for Unsupervised Chunking
Two kinds of chunk data:
– Constituent chunks: multiword and non-hierarchical (they contain no sub-constituents)
– Base NPs: NPs that contain no nested NPs
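To make the two chunk definitions concrete, here is a sketch that extracts both kinds from a bracketed tree. It is an illustration under stated assumptions, not the paper's extraction script: trees are nested tuples `(label, child, ...)` with words as bare strings and no POS preterminals, and the helper names are hypothetical.

```python
def leaves(tree):
    """Words under a tree; a bare string is a single word."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def subtrees(tree):
    """Yield every labelled node in pre-order; bare words yield nothing."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def constituent_chunks(tree):
    """Multiword constituents whose children are all words (no sub-constituents)."""
    return [leaves(t) for t in subtrees(tree)
            if len(t) > 2 and all(isinstance(c, str) for c in t[1:])]


def base_nps(tree):
    """NPs with no nested NP anywhere below them."""
    return [leaves(t) for t in subtrees(tree)
            if t[0] == "NP"
            and not any(s[0] == "NP" for c in t[1:] for s in subtrees(c))]


tree = ("S", ("NP", "the", ("ADJP", "very", "big"), "dog"), ("VP", "barked"))
print(constituent_chunks(tree))  # [['very', 'big']]
print(base_nps(tree))            # [['the', 'very', 'big', 'dog']]
```

The example shows where the definitions diverge: the NP is a base NP because it contains no nested NP, yet it is not a constituent chunk because it contains the ADJP sub-constituent.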
6
Method for Unsupervised Chunking
Chunking is cast as BIO tagging, with an additional STOP tag for sentence boundaries and phrasal punctuation.
Models:
– HMM
– PRLG (probabilistic right-linear grammar)
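The tagging scheme above can be illustrated by converting chunk spans to a B/I/O/STOP sequence and back. This is a sketch, not the authors' preprocessing: the punctuation set `PHRASAL_PUNCT` is a hypothetical choice (the slide does not list which marks count as phrasal punctuation), and appending one extra STOP to mark the sentence boundary is an assumption about how the boundary is encoded.

```python
# Hypothetical set of "phrasal punctuation" tokens; the slide does not list them.
PHRASAL_PUNCT = {",", ".", ";", ":", "?", "!"}


def encode_bio_stop(tokens, chunks):
    """Tag tokens B/I/O from half-open chunk spans; punctuation gets STOP.

    One extra STOP is appended to mark the sentence boundary (an assumption).
    """
    tags = ["O"] * len(tokens)
    for start, end in chunks:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    for i, tok in enumerate(tokens):
        if tok in PHRASAL_PUNCT:
            tags[i] = "STOP"
    return tags + ["STOP"]


def decode_chunks(tags):
    """Recover (start, end) chunk spans from a B/I/O/STOP tag sequence."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "I" and start is not None:
            continue  # still inside the current chunk
        if start is not None:
            chunks.append((start, i))  # any non-I tag closes an open chunk
            start = None
        if tag == "B":
            start = i
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


tokens = ["the", "cat", "sat", "on", "the", "mat", "."]
chunks = [(0, 2), (4, 6)]
tags = encode_bio_stop(tokens, chunks)
print(tags)                 # ['B', 'I', 'O', 'O', 'B', 'I', 'STOP', 'STOP']
print(decode_chunks(tags))  # [(0, 2), (4, 6)]
```

An HMM or PRLG tagger would predict `tags` from words alone; `decode_chunks` then turns its output back into chunks.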
7
Finite-State Models
– State transitions
– Uniform initialization
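Uniform initialization of the state transitions can be sketched as follows. The allowed-successor table `ALLOWED` is an assumption made for illustration, not read off the slide's diagram: since chunks are multiword, B is assumed to always move to I, and I is assumed to continue only a chunk opened by B or I. Each state's probability mass is spread evenly over its allowed successors, giving a starting point for unsupervised re-estimation such as EM (training itself is not shown).

```python
STATES = ["B", "I", "O", "STOP"]

# Assumed allowed successors (illustrative, not taken from the slide):
# chunks are multiword, so B must be followed by I, and I may only
# continue a chunk that B or I has opened.
ALLOWED = {
    "B": ["I"],
    "I": ["B", "I", "O", "STOP"],
    "O": ["B", "O", "STOP"],
    "STOP": ["B", "O", "STOP"],
}


def uniform_transitions(states, allowed):
    """P(next | prev), spread uniformly over each state's allowed successors."""
    table = {}
    for prev in states:
        successors = allowed[prev]
        p = 1.0 / len(successors)
        table[prev] = {nxt: (p if nxt in successors else 0.0) for nxt in states}
    return table


T = uniform_transitions(STATES, ALLOWED)
for prev in STATES:
    print(prev, T[prev])
```

Disallowed transitions start at zero, and under EM-style re-estimation a zero-probability transition stays at zero, so the constraints persist through training.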
8
Chunking Results
9
Full Parsing via Cascaded Chunking
– Each chunk is replaced by a pseudoword: the term in the chunk with the highest corpus frequency
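The cascade can be sketched as a loop: chunk the current sequence, replace each chunk by its pseudoword, and repeat on the shorter sequence, recording each chunk as a tree node. This is an assumption-laden illustration, not the authors' system: `pair_chunker` is a hypothetical toy stand-in for the learned HMM/PRLG chunker, `freq` is a made-up frequency table, and at higher levels the "terms in the chunk" are taken to be the pseudowords of its members.

```python
from collections import Counter


def pair_chunker(words):
    """Toy stand-in for a learned chunker: groups consecutive pairs."""
    return [(i, i + 2) for i in range(0, len(words) - 1, 2)]


def cascade(tokens, chunker, freq, max_levels=50):
    """Cascaded chunking: chunk, replace each chunk by a pseudoword, repeat.

    Each unit is (pseudoword, subtree). A chunk's pseudoword is the member
    term with the highest count in `freq` (ties go to the leftmost term).
    """
    units = [(w, w) for w in tokens]
    for _ in range(max_levels):
        if len(units) <= 1:
            break
        spans = chunker([pw for pw, _ in units])
        if not spans:
            break  # the chunker found nothing more to group
        chunk_end = dict(spans)  # start index -> end index
        new_units, i = [], 0
        while i < len(units):
            if i in chunk_end:
                members = units[i:chunk_end[i]]
                pseudo = max((pw for pw, _ in members), key=lambda w: freq[w])
                new_units.append((pseudo, tuple(t for _, t in members)))
                i = chunk_end[i]
            else:
                new_units.append(units[i])
                i += 1
        units = new_units
    subtrees = [t for _, t in units]
    # If the cascade stops with several units left, join them under a flat root.
    return subtrees[0] if len(subtrees) == 1 else tuple(subtrees)


freq = Counter({"the": 10, "on": 5, "cat": 2, "sat": 1, "mat": 1})
tree = cascade(["the", "cat", "sat", "on", "the", "mat"], pair_chunker, freq)
print(tree)  # ((('the', 'cat'), ('sat', 'on')), ('the', 'mat'))
```

Because `Counter` returns 0 for unseen terms, unknown words never raise a lookup error; they simply lose to any seen term when a pseudoword is chosen.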
10
Full Parsing Results
– All sentences (no length limit)
– Sentences of at most 10 words