An Attempt at Unsupervised Learning of Hierarchical Dependency Parsing via the Dependency Model with Valence (DMV)
Motivation
Dependency Parsing:
- Search Query Refinement
- Statistical Machine Translation
Unsupervised Learning:
- Availability of Large Quantities of Data
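DMV
- Pick a direction (left or right).
- Generate the first child, or stop; then generate more children, until stop.
- Repeat in the other direction.
- Recurse into each child…
Parameters: Porder (which direction first), Pstop (whether to stop), Pattach (which child to generate)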
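A minimal sketch of the DMV generative story as a Python sampler. The parameter layout is an assumption for illustration, not the original tables: `p_order[h]` is the probability that head tag `h` generates its left children first, `p_stop[(h, dir, adj)]` is the stop probability given head, direction and adjacency (whether a child has already been generated in that direction, in the spirit of the PstopA/PstopN split in the evaluation), and `p_attach[(h, dir)]` maps child tags to probabilities. The root distribution `root_dist`, the toy tags, and the `max_depth` safety cap are hypothetical additions.

```python
import itertools
import random

def sample_dmv(p_order, p_stop, p_attach, root_dist, seed=0, max_depth=10):
    """Sample (tags, heads) from a DMV-style generative story.

    heads[i] is the index of word i's head, or -1 for the root word.
    Note: max_depth is a safety cap (hypothetical) that forces STOP deep
    in the tree, so it slightly changes the sampled distribution.
    """
    rng = random.Random(seed)
    ids = itertools.count()
    tag_of, parent = {}, {}

    def draw(dist):
        choices, weights = zip(*dist.items())
        return rng.choices(choices, weights=weights)[0]

    def expand(tag, par, depth):
        nid = next(ids)
        tag_of[nid], parent[nid] = tag, par
        # Porder: which direction generates its children first.
        first = "left" if rng.random() < p_order[tag] else "right"
        second = "right" if first == "left" else "left"
        kids = {"left": [], "right": []}
        for d in (first, second):
            adj = True  # valence: no child generated yet in direction d
            # Pstop decides whether to stop; otherwise Pattach picks a child.
            while depth < max_depth and rng.random() >= p_stop[(tag, d, adj)]:
                child = draw(p_attach[(tag, d)])
                kids[d].append(expand(child, nid, depth + 1))
                adj = False
        return (nid, kids["left"], kids["right"])

    def surface(node):
        # Children are generated outward, so the first left child sits next to the head.
        nid, left, right = node
        order = []
        for child in reversed(left):
            order.extend(surface(child))
        order.append(nid)
        for child in right:
            order.extend(surface(child))
        return order

    order = surface(expand(draw(root_dist), None, 0))
    pos = {nid: i for i, nid in enumerate(order)}
    tags = [tag_of[n] for n in order]
    heads = [-1 if parent[n] is None else pos[parent[n]] for n in order]
    return tags, heads

# Toy, hypothetical parameters over tags V (verb), N (noun), D (determiner).
p_order = {"V": 0.5, "N": 0.5, "D": 0.5}
p_stop = {("V", d, a): (0.3 if a else 0.9) for d in ("left", "right") for a in (True, False)}
p_stop.update({("N", "left", a): (0.5 if a else 0.95) for a in (True, False)})
p_stop.update({("N", "right", a): 0.9 for a in (True, False)})
p_stop.update({("D", d, a): 1.0 for d in ("left", "right") for a in (True, False)})
p_attach = {("V", "left"): {"N": 1.0}, ("V", "right"): {"N": 1.0},
            ("N", "left"): {"D": 1.0}, ("N", "right"): {"N": 1.0}}

tags, heads = sample_dmv(p_order, p_stop, p_attach, {"V": 1.0}, seed=1)
print(tags, heads)
```

Because the stop decision is conditioned on adjacency, the model can prefer, say, "at least one argument" for verbs while still stopping quickly afterwards, which is the valence distinction the model is named for.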
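EM
Inside-Outside Algorithm, for a sentence $w_0 \ldots w_l$ (spans inclusive):
- Inside: $P_{\text{in}}(i,X,j) = P(X \Rightarrow^* w_i \ldots w_j)$
- Outside: $P_{\text{out}}(i,X,j) = P(S \Rightarrow^* w_0 \ldots w_{i-1}\, X\, w_{j+1} \ldots w_l)$
- Re-Estimation: expected frequency of sub-tree $(i,X,j)$ is $\frac{P_{\text{in}}(i,X,j)\, P_{\text{out}}(i,X,j)}{P_{\text{in}}(0,S,l)}$, where $P_{\text{in}}(0,S,l)$ is the probability of the sentence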
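A minimal sketch of the inside and outside passes and the expected sub-tree counts in Python. For brevity it runs on a generic toy PCFG in Chomsky normal form rather than on the DMV itself; the grammar, the dictionary layout of `lexical` and `binary`, and the name `inside_outside` are assumptions for illustration. Spans are inclusive, `(i, X, j)` covering `words[i..j]`, and each expected count is normalized by the sentence probability.

```python
from collections import defaultdict

def inside_outside(words, lexical, binary, start="S"):
    """Inside/outside tables and expected counts of labelled spans (i, X, j).

    lexical: {(X, word): P(X -> word)}; binary: {(X, Y, Z): P(X -> Y Z)}.
    Assumes the sentence has nonzero probability under the grammar.
    """
    n = len(words)
    inside = defaultdict(float)
    for i, w in enumerate(words):
        for (X, word), p in lexical.items():
            if word == w:
                inside[(i, X, i)] += p
    # Inside pass: shorter spans before longer ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # Y covers i..k, Z covers k+1..j
                for (X, Y, Z), p in binary.items():
                    inside[(i, X, j)] += (p * inside.get((i, Y, k), 0.0)
                                          * inside.get((k + 1, Z, j), 0.0))
    outside = defaultdict(float)
    outside[(0, start, n - 1)] = 1.0
    # Outside pass: longer spans first, pushing mass down to their children.
    for length in range(n, 1, -1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for (X, Y, Z), p in binary.items():
                    out = outside.get((i, X, j), 0.0)
                    outside[(i, Y, k)] += out * p * inside.get((k + 1, Z, j), 0.0)
                    outside[(k + 1, Z, j)] += out * p * inside.get((i, Y, k), 0.0)
    total = inside.get((0, start, n - 1), 0.0)  # P(sentence)
    counts = {key: inside[key] * outside.get(key, 0.0) / total
              for key in inside if inside[key] > 0}
    return inside, outside, counts

# Toy, hypothetical grammar with a single parse for "she eats fish".
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
inside, outside, counts = inside_outside(["she", "eats", "fish"], lexical, binary)
```

Since this toy sentence has exactly one parse, every constituent in it gets an expected count of 1. Rule-level expected counts follow the same pattern (outside of the parent times the rule probability times the insides of both children, divided by the sentence probability), and normalizing them per left-hand side gives the re-estimated rule probabilities.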
Evaluation
Gold standard: head-percolation of Penn Treebank parses.
Metric: % of edges correct, directed (undirected in parentheses), in the best (P)CFG parse.

Zero Knowledge             14.4 (29.9)
Adjacent Word Heuristic    33.6
Klein & Manning            43.2 (63.7)
Oracle                     75.5 (77.5)
  - Pattach                60.0 (63.3)
  - Pstop                  53.9 (57.7)
  - PstopA                 50.0 (54.8)
  - PstopN                 12.5 (30.8)
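A minimal sketch of the two evaluation scores in Python, assuming each sentence is encoded as a list of head indices with -1 marking the root (a hypothetical encoding; how root attachments were scored is not stated, and here every word, including the root word, counts as one edge). A directed edge is correct when a word receives its gold head; an undirected edge is also correct when the predicted word/head pair appears in the gold tree in the opposite direction.

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Return (% directed correct, % undirected correct) over all words.

    gold_heads, pred_heads: lists of sentences, each a list of head indices
    (-1 marks the root's head).
    """
    directed = undirected = total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        # Unordered gold pairs, for direction-insensitive matching.
        gold_pairs = {frozenset((i, h)) for i, h in enumerate(gold) if h >= 0}
        for i, (g, p) in enumerate(zip(gold, pred)):
            total += 1
            directed += (g == p)
            undirected += (g == p) or (p >= 0 and frozenset((i, p)) in gold_pairs)
    return 100.0 * directed / total, 100.0 * undirected / total

# Toy example: gold "D N V" with D <- N <- V (root); the prediction flips N and V.
d, u = attachment_accuracy([[1, 2, -1]], [[1, -1, 1]])
```

In the toy example only the determiner's edge is directed-correct, while the flipped noun/verb edge still counts as undirected-correct, giving one third and two thirds.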
EM
Didn't work out: after at most one helpful iteration, EM always made things worse, even when initialized with very good solutions.
- Starting from Zero Knowledge: 18.4 (38.4) after 1 iteration, then worse.
- Starting from an ad-hoc harmonic Pattach: 21.5 (47.1) after 1 iteration, then worse.
- Similarly, even when starting from the Oracle solution…

Summary
- DMV: a useful, simple, extensible model.
- EM: more thorough debugging needed.
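The EM results are reported as attachment accuracies. One sanity check that could support further debugging: exact EM never decreases the training log-likelihood, so a likelihood drop between iterations points to an implementation bug (for example, unnormalized expected counts), whereas a drop in accuracy alone can also mean that likelihood and attachment accuracy simply disagree. A minimal sketch, assuming per-iteration log-likelihoods have been logged; `first_likelihood_drop` and its tolerance are hypothetical.

```python
def first_likelihood_drop(log_likelihoods, tol=1e-9):
    """Return the first EM iteration whose training log-likelihood is lower
    than the previous iteration's (beyond tol), or None if it never drops.

    Exact EM is monotone in likelihood, so any drop suggests a bug.
    """
    for t in range(1, len(log_likelihoods)):
        if log_likelihoods[t] < log_likelihoods[t - 1] - tol:
            return t
    return None

# Illustrative values only, not results from these experiments.
ok = first_likelihood_drop([-10.0, -8.0, -7.5])
bad = first_likelihood_drop([-10.0, -8.0, -8.5])
```

If the likelihood keeps rising while accuracy falls, the problem lies with the objective or the model rather than with the EM code.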