Japanese Dependency Structure Analysis Based on Maximum Entropy Models
Kiyotaka Uchimoto†, Satoshi Sekine‡, Hitoshi Isahara†
† Kansai Advanced Research Center, Communications Research Laboratory
‡ New York University
Outline
• Background
• Probability model for estimating dependency likelihood
• Experiments and discussion
• Conclusion
Background
• Japanese dependency structure analysis
  – Preparing a dependency matrix
  – Finding an optimal set of dependencies for the entire sentence
[Figure: the sentence 太郎は赤いバラを買いました。 ("Taro bought a red rose.") segmented into bunsetsus 太郎は (Taro_wa) / 赤い (aka_i, "red") / バラを (bara_wo, "rose") / 買いました。 (kai_mashita, "bought"), with arrows marking the dependencies between bunsetsus.]
Background (2)
• Approaches to preparing a dependency matrix
  – Rule-based approach
    Several problems with handcrafted rules:
      · Coverage and consistency
      · The rules have to be changed according to the target domain.
  – Corpus-based approach
Background (3)
• Corpus-based approach
  – Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998)
  – Probability estimation based on maximum entropy models (Ratnaparkhi, 1997)
• Maximum entropy model
  – Learns the weights of given features from a training corpus
Probability model
• Assigning one of two tags to each pair of bunsetsus
  – Whether or not there is a dependency between the two bunsetsus
  – The probabilities of dependencies are estimated with the M. E. model.
• Overall dependencies in a sentence
  – The product of the probabilities of all dependencies
    Assumption: dependencies are independent of each other.
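Here is a minimal sketch of the sentence-level scoring rule above. It assumes the pairwise dependency probabilities have already been estimated. The function name `sentence_log_prob` and the probabilities in the usage lines are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def sentence_log_prob(heads: List[int], dep_prob: Dict[Tuple[int, int], float]) -> float:
    """Log-probability of a complete set of dependencies for one sentence.

    heads[i] is the bunsetsu that bunsetsu i depends on (-1 for the rightmost
    bunsetsu, which has no head). dep_prob[(i, j)] is the estimated probability
    that i depends on j. Assuming independent dependencies, the sentence
    probability is their product, computed here as a sum of logs.
    """
    return sum(math.log(dep_prob[(i, j)]) for i, j in enumerate(heads) if j != -1)


# 太郎は(0) 赤い(1) バラを(2) 買いました。(3), with made-up probabilities.
probs = {(0, 3): 0.9, (1, 2): 0.8, (2, 3): 0.95}
p = math.exp(sentence_log_prob([3, 2, 3, -1], probs))  # 0.9 * 0.8 * 0.95
```

Summing logs gives the same ranking as multiplying probabilities. It also avoids numerical underflow on long sentences.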
M. E. model
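The formula on this slide did not survive extraction. For reference, the standard form of a conditional maximum entropy model (Berger, 1996; Ratnaparkhi, 1997) is shown below for the two-tag case. Here x is a bunsetsu pair with its context, y = 1 if there is a dependency and 0 otherwise, the f_i are binary features, and the λ_i are their learned weights. The notation is ours and may differ from the slide's.

```latex
p_{\lambda}(y \mid x) = \frac{\exp\left(\sum_{i} \lambda_i f_i(x, y)\right)}{Z_{\lambda}(x)},
\qquad
Z_{\lambda}(x) = \sum_{y' \in \{0, 1\}} \exp\left(\sum_{i} \lambda_i f_i(x, y')\right)
```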
Feature sets
• Basic features (expanded from Haruno's list (Haruno, 1998))
  – Attributes of a bunsetsu itself: character strings, parts of speech, and inflection types of the bunsetsu
  – Attributes between bunsetsus: existence of punctuation, and the distance between the bunsetsus
• Combined features
Feature sets
[Figure: the example sentence 太郎は 赤い バラを 買いました。 with a dependency between an anterior bunsetsu and a posterior bunsetsu; labels a, b, c, d mark the “Head” and “Type” attributes of the anterior and posterior bunsetsus, and e marks the attributes between them.]
• Basic features: a, b, c, d, e
• Combined features
  – Twin: (b, c), Triplet: (b, c, e), Quadruplet: (a, b, c, d), Quintuplet: (a, b, c, d, e)
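One way to generate the combined features from the basic ones is sketched below. It assumes each slot holds a list of string-valued attributes. `expand_features`, `COMBINATIONS`, and the attribute values are hypothetical; only the slot groupings come from the slide. Assigning a and b to the anterior bunsetsu and c and d to the posterior one is our reading of the figure.

```python
from itertools import product
from typing import Dict, List, Tuple

# Slot groupings from the slide. Assumed mapping (our reading of the figure):
# a/b = "Head"/"Type" of the anterior bunsetsu, c/d = "Head"/"Type" of the
# posterior bunsetsu, e = attributes between the two bunsetsus.
COMBINATIONS: Dict[str, Tuple[str, ...]] = {
    "twin": ("b", "c"),
    "triplet": ("b", "c", "e"),
    "quadruplet": ("a", "b", "c", "d"),
    "quintuplet": ("a", "b", "c", "d", "e"),
}


def expand_features(basic: Dict[str, List[str]]) -> List[str]:
    """Return the basic features plus all combined features for one bunsetsu pair.

    basic maps a slot name to the values observed in that slot; a combined
    feature picks one value from each slot of the combination.
    """
    feats = [f"{slot}={v}" for slot, values in sorted(basic.items()) for v in values]
    for name, slots in COMBINATIONS.items():
        for combo in product(*(basic.get(s, []) for s in slots)):
            feats.append(name + ":" + "|".join(f"{s}={v}" for s, v in zip(slots, combo)))
    return feats


# Hypothetical attribute values for the pair 太郎は -> 買いました。
pair = {"a": ["noun"], "b": ["particle:wa"], "c": ["verb"],
        "d": ["period"], "e": ["distance:2-5", "no_comma"]}
for f in expand_features(pair):
    print(f)
```

Because combinations are Cartesian products, a slot with several values multiplies the number of combined features. This is one reason the larger combinations carry so many distinct features.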
Algorithm
• Detect the dependencies in a sentence by analyzing it backwards (from right to left).
  – Characteristics of Japanese dependencies:
    · Dependencies are directed from left to right.
    · Dependencies do not cross.
    · Every bunsetsu except the rightmost one depends on exactly one bunsetsu.
    · In many cases, the left context is not necessary to determine a dependency.
• Beam search
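The backward analysis with beam search can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. `prob(i, j)` is a hypothetical stand-in for the M. E. estimate of a dependency from bunsetsu i to bunsetsu j. Hypotheses are scored by the product of their dependency probabilities, as in the probability model above. The probabilities in the usage lines are made up.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (log-probability so far, head index for each bunsetsu).
Hyp = Tuple[float, List[int]]


def crosses(i: int, j: int, heads: List[int]) -> bool:
    """True if the arc i -> j would cross an arc k -> heads[k] with i < k < j.

    Because the sentence is processed right to left, every k > i already has
    a head when bunsetsu i is considered.
    """
    return any(heads[k] > j for k in range(i + 1, j))


def parse_backward(n: int, prob: Callable[[int, int], float],
                   beam_width: int = 5) -> List[int]:
    """Beam search over head assignments, from the rightmost bunsetsu leftward.

    Each bunsetsu except the last gets exactly one head to its right, arcs
    never cross, and a hypothesis is scored by the sum of log-probabilities.
    """
    if n == 0:
        return []
    beam: List[Hyp] = [(0.0, [-1] * n)]  # -1 marks "no head"
    for i in range(n - 2, -1, -1):
        expanded: List[Hyp] = []
        for score, heads in beam:
            for j in range(i + 1, n):  # dependencies point left to right
                p = prob(i, j)
                if p <= 0.0 or crosses(i, j, heads):
                    continue
                new_heads = heads.copy()
                new_heads[i] = j
                expanded.append((score + math.log(p), new_heads))
        # Keep only the best beam_width partial analyses.
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_width]
        if not beam:
            return []
    return beam[0][1]


# 太郎は(0) 赤い(1) バラを(2) 買いました。(3), with made-up probabilities.
toy = {(0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.7,
       (1, 2): 0.8, (1, 3): 0.2, (2, 3): 1.0}
print(parse_backward(4, lambda i, j: toy.get((i, j), 0.0)))  # [3, 2, 3, -1]
```

With `beam_width=1` the search becomes a deterministic right-to-left parse. A wider beam keeps alternative analyses that a later (further-left) decision might favor.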
Experiments
• Using the Kyoto University text corpus (Kurohashi and Nagao, 1997)
  – A tagged corpus of the Mainichi newspaper
  – Training: 7,958 sentences (Jan. 1st to 8th)
  – Testing: 1,246 sentences (Jan. 9th)
• The input sentences were given with correct morphological analyses and bunsetsu segmentation.
Results of dependency analysis
• When a sentence is analyzed backwards, the previous context has almost no effect on the accuracy.
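The slides report dependency accuracy (87.2% in the conclusion) without defining it. The sketch below assumes the usual definition: the share of bunsetsus whose head is predicted correctly, excluding the rightmost bunsetsu of each sentence because it has no head. The function name and data layout are hypothetical.

```python
from typing import List, Sequence


def dependency_accuracy(gold: Sequence[List[int]], pred: Sequence[List[int]]) -> float:
    """Fraction of dependencies whose predicted head matches the gold head.

    Each sentence is a list of head indices. The rightmost bunsetsu has no
    head, so it is excluded from the count.
    """
    correct = total = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("gold and predicted sentences differ in length")
        for gold_head, pred_head in zip(g[:-1], p[:-1]):
            correct += gold_head == pred_head
            total += 1
    return correct / total if total else 0.0


gold = [[3, 2, 3, -1], [1, -1]]
pred = [[2, 2, 3, -1], [1, -1]]
print(dependency_accuracy(gold, pred))  # 0.75
```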
Relationship between the number of bunsetsus and accuracy
• The accuracy does not significantly degrade with increasing sentence length.
Features and accuracy
• Experiments without the feature sets (values: change in accuracy when the feature is removed)
  – Useful basic features:
    · Type of the anterior bunsetsu (-17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (-10.99%)
    · Distance between bunsetsus (-2.50%), existence of punctuation in the bunsetsu (-2.52%), and existence of brackets (-1.06%)
    ⇒ Preferential rules with respect to these features
Features and accuracy
• Experiments without the feature sets
  – Combined features are useful (-18.31%).
    ⇒ Basic features are related to each other.
Lexical features and accuracy
• Experiment with the lexical features of the head word
  – Better accuracy than without them (removing them: -0.84%)
  – Many idiomatic expressions had high dependency probabilities:
    · "応じて (oujite, according to) → 決める (kimeru, decide)"
    · "形で (katachi_de, in the form of) → 行われる (okonawareru, be held)"
  – With more training data, we expect to collect more such expressions.
Amount of training data and accuracy
• Accuracy of 81.84% even with only 250 training sentences
• The M. E. framework has suitable characteristics for overcoming the data sparseness problem.
Comparison with related work
Comparison with related work (2)
• Combining a parser based on a handmade CFG with a probabilistic dependency model (Shirai, 1998)
  – Uses several corpora: the EDR corpus, the RWC corpus, and the Kyoto University corpus.
• The accuracy achieved by our model was about 3% higher than that of Shirai's model,
  – even though our model used a much smaller set of training data.
Comparison with related work (3)
• M. E. model (Ehara, 1998)
  – Similar kinds of features to ours, but only combinations of two features
  – TV news articles used for training and testing
    · Average sentence length = 17.8 bunsetsus (cf. 10 in the Kyoto University corpus)
• Difference in the combined features
  – We also use triplet, quadruplet, and quintuplet features (+5.86%).
  – The accuracy of our system was about 10% higher than that of Ehara's system.
Comparison with related work (4)
• Maximum likelihood model (Fujio, 1998)
• Decision tree models and a boosting method (Haruno, 1998)
  – Similar kinds of features to ours
  – The EDR corpus is used for training and testing; it is ten times as large as our corpus.
  – Accuracy was around 85%, slightly lower than ours.
Comparison with related work (5)
• Experiments with Fujio's and Haruno's feature sets
  – The important factor in statistical approaches is feature selection.
Future work
• Feature selection
  – Automatic feature selection (Berger, 1996, 1998; Shirai, 1998)
• Considering new features
  – How to deal with coordinate structures
    · Taking into account a wide range of information
Conclusion
• Japanese dependency structure analysis based on the M. E. model
  – Dependency accuracy of our system: 87.2% on the Kyoto University corpus
  – Experiments without feature sets: some basic and combined features contribute strongly to improving the accuracy.
  – Amount of training data and accuracy: good accuracy even with a small set of training data
    · The M. E. framework has suitable characteristics for overcoming the data sparseness problem.