An SVMs Based Multi-lingual Dependency Parsing
Yuchang CHENG, Masayuki ASAHARA and Yuji MATSUMOTO
Nara Institute of Science and Technology
Approaches to Dependency Parsing
Bottom-up deterministic (local discrimination)
–Iterative, projective [Kudo & Matsumoto 02][Yamada & Matsumoto 03][Cheng, Asahara, Matsumoto 04]
–Shift-reduce, projective [Nivre, Scholz 04]
–Shift-reduce, pseudo-projective [Nivre, Nilsson 05]
N-best + Large margin discrimination (global discrimination)
–Projective [McDonald, Crammer, Pereira 05]
–Non-projective [McDonald, Pereira, Ribarov, Hajic 05]
Comparison between Iterative and Shift-reduce methods
Nivre algorithm (Shift-reduce)
–depth first
–O(n)
Iterative
–breadth first
–O(n^2): worst case, empirically near linear
+ efficient
- limited look-ahead
Training and parsing are done in the same process
⇒ Number of training instances = Number of parsing steps
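The claim that training and parsing share one process can be illustrated with a minimal sketch of a shift-reduce oracle. It assumes the arc-eager transition set (LEFT-ARC, RIGHT-ARC, REDUCE, SHIFT), a projective gold tree, and a hypothetical function name `oracle_transitions`; none of these details are given on the slide. Each parsing step yields exactly one (configuration, action) pair, which is what a classifier would be trained on.

```python
def oracle_transitions(gold_heads):
    """Replay a gold tree as shift-reduce steps (arc-eager variant, assumed).

    gold_heads[i] is the head of word i (words are 1-based, 0 is the root);
    gold_heads[0] is a dummy value (-1). Returns the list of
    ((stack_top, next_word), action) training instances and the heads built.
    """
    n = len(gold_heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    heads = [None] * (n + 1)
    instances = []
    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and gold_heads[s] == b:
            action = "LEFT-ARC"          # stack top depends on next word
            heads[s] = b
            stack.pop()
        elif gold_heads[b] == s:
            action = "RIGHT-ARC"         # next word depends on stack top
            heads[b] = s
            stack.append(buffer.pop(0))
        elif heads[s] is not None and any(
            gold_heads[k] == b or gold_heads[b] == k for k in range(s)
        ):
            action = "REDUCE"            # next word links to something deeper
            stack.pop()
        else:
            action = "SHIFT"
            stack.append(buffer.pop(0))
        instances.append(((s, b), action))  # one instance per parsing step
    return instances, heads


# "I saw a girl with a telescope", with "with" attached to "saw" (assumed analysis)
gold = [-1, 2, 0, 4, 2, 2, 7, 5]
instances, heads = oracle_transitions(gold)
print(len(instances), "parsing steps =", len(instances), "training instances")
```

Every word enters the stack once and leaves it at most once, so the number of steps stays within 2n for an n-word sentence, which is consistent with the O(n) bound on the slide.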
Limited right-side contextual info.
[Figure: consulted context for the sentence "I saw a girl with a telescope." and the partially parsed sequence "saw girl with telescope." with "I", "a", "a" attached. Panels: a configuration in Nivre method; a configuration in Y&M method]
Preliminary comparison
English dependency parsing (Penn Treebank 02-06: training, 23: test)
–right context = 2
–right context = 4
[Table for each setting: columns Iterative, Nivre; rows Dep. Acc, Root Acc]
Chinese case: almost no difference, or a slightly better result with the Nivre method
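The two measures named on this slide can be sketched as follows. The definitions are assumptions, since the slide does not spell them out: Dep. Acc is taken as the fraction of words whose predicted head is correct, and Root Acc as the fraction of sentences whose gold root word is also predicted as root. The function name `dep_and_root_accuracy` is hypothetical, and punctuation handling is ignored.

```python
def dep_and_root_accuracy(gold, pred):
    """gold, pred: lists of sentences; each sentence is a list of head
    indices, one per word, with 0 marking the root (one root per sentence
    is assumed)."""
    tokens = correct = root_hits = 0
    for g, p in zip(gold, pred):
        tokens += len(g)
        correct += sum(gh == ph for gh, ph in zip(g, p))
        root_hits += p[g.index(0)] == 0   # is the gold root also predicted as root?
    return correct / tokens, root_hits / len(gold)


# Toy data, not results from the slides
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [2, 0]]
dep_acc, root_acc = dep_and_root_accuracy(gold, pred)
print(dep_acc, root_acc)
```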
Common Disadvantage
Local discrimination
Single model throughout whole sentence
–local (near leaves) and long-distance (near top) parsing should be different models
Distinct model at the lowest level
–dependency between adjacent words
–implemented as a pre-processing step
Shallow pre-processing + Nivre method
[Figure: consulted context for the sentence "I saw a girl with a telescope." and the sequence "saw girl with telescope." with "I", "a", "a" attached]
Preprocessing of adjacent words
Then, apply Nivre method
Labels are decided by MaxEnt classifiers
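A rough sketch of what a pass over adjacent words might look like is given below. Everything specific here is an assumption made for illustration: the function names `attach_adjacent` and `toy_decide`, the three decision values, and the toy part-of-speech rules, which stand in for whatever trained model the authors used. The rules were picked only so that the example leaves the sequence "saw girl with telescope" seen on the slide.

```python
def attach_adjacent(tags, decide):
    """One left-to-right pass over adjacent word pairs.

    decide(left_tag, right_tag) returns "LEFT_DEP" (left word depends on
    the right one), "RIGHT_DEP" (right word depends on the left one) or
    "NONE". Returns the heads found (0-based, None if undecided) and the
    indices of the words left over for the main parser.
    """
    n = len(tags)
    heads = [None] * n
    for i in range(n - 1):
        decision = decide(tags[i], tags[i + 1])
        if decision == "LEFT_DEP" and heads[i] is None:
            heads[i] = i + 1
        elif decision == "RIGHT_DEP" and heads[i + 1] is None:
            heads[i + 1] = i
    remaining = [i for i in range(n) if heads[i] is None]
    return heads, remaining


def toy_decide(left_tag, right_tag):
    # Hypothetical stand-in for a trained classifier.
    if (left_tag, right_tag) in {("DT", "NN"), ("PRP", "VBD")}:
        return "LEFT_DEP"
    return "NONE"


words = ["I", "saw", "a", "girl", "with", "a", "telescope"]
tags = ["PRP", "VBD", "DT", "NN", "IN", "DT", "NN"]
heads, remaining = attach_adjacent(tags, toy_decide)
print([words[i] for i in remaining])
```

The words left in `remaining` would then be handed to the shift-reduce parser, which sees a shorter sequence and so has more useful context within the same look-ahead window.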
[Table: rows by language; columns LAS, UAS, LAcc. with preprocessing and LAS, UAS, LAcc. without preprocessing]
Languages: Arabic, Chinese, Czech, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish, Turkish
AV:
Bulgarian
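For readers unfamiliar with the three column headings, the sketch below computes them for a single sentence. It uses the usual definitions: UAS counts tokens with the correct head, LAcc. counts tokens with the correct label, and LAS counts tokens where both are correct. The function name `attachment_scores` and the toy labels are hypothetical, and the sketch does not reproduce any official scoring script.

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, label) pairs, one per token.
    Returns (LAS, UAS, LAcc) as fractions of tokens."""
    n = len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / n        # head and label
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
    lacc = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n # label only
    return las, uas, lacc


# Toy data, not results from the slides
gold = [(2, "SBJ"), (0, "ROOT"), (4, "NMOD"), (2, "OBJ")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "NMOD"), (2, "PRD")]
print(attachment_scores(gold, pred))
```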
Speed-up of Kernel SVM
Fast methods for kernel-based text analysis [Kudo & Matsumoto 04]
Training with 3rd-degree polynomial kernel
Mining of feature combinations in positive/negative support vectors
Linearization with obtained feature combinations ( times speed up)
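The idea behind the linearization can be sketched for binary features under stated assumptions. With k = |x ∩ z| shared features, the kernel (1 + k)^3 equals 1 + 7·C(k,1) + 12·C(k,2) + 6·C(k,3), so the kernel sum over support vectors can be rewritten as a weighted sum over feature combinations of size up to 3. The sketch below enumerates every such combination exhaustively; it does not implement the mining step from the slide, which selects combinations rather than listing all of them. The kernel form (1 + x·z)^3 and all function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# (1 + k)^3 = 1 + 7*C(k,1) + 12*C(k,2) + 6*C(k,3), where k = |x & z|
COEF = {0: 1, 1: 7, 2: 12, 3: 6}


def kernel_score(x, support_vectors, bias=0.0):
    """Direct evaluation: one kernel computation per support vector."""
    return bias + sum(a * (1 + len(x & z)) ** 3 for a, z in support_vectors)


def linearize(support_vectors):
    """Fold all support vectors into weights on feature combinations."""
    weights = defaultdict(float)
    for a, z in support_vectors:
        for r in range(4):
            for combo in combinations(sorted(z), r):
                weights[combo] += COEF[r] * a
    return dict(weights)


def linear_score(x, weights, bias=0.0):
    """Evaluation by table lookup over the combinations present in x."""
    total = bias
    for r in range(4):
        for combo in combinations(sorted(x), r):
            total += weights.get(combo, 0.0)
    return total


# Toy support vectors: (alpha_i * y_i, set of active binary features)
svs = [(0.5, {"a", "b", "c"}), (-0.25, {"b", "d"}), (1.0, {"a", "d", "e", "f"})]
x = {"a", "b", "d", "g"}
w = linearize(svs)
print(kernel_score(x, svs), linear_score(x, w))
```

After linearization the cost of classifying an example depends on the number of its active features rather than on the number of support vectors, which is where a speed-up would come from.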