
1 Japanese Dependency Analysis using Cascaded Chunking Taku Kudo 工藤 拓 Yuji Matsumoto 松本 裕治 Nara Institute of Science and Technology, JAPAN

2 Motivation
Kudo, Matsumoto 2000 (VLC):
- Presented a state-of-the-art Japanese dependency parser using SVMs (89.09% on the standard dataset)
- Demonstrated the high generalization performance and feature-selection ability of SVMs
Problems:
- Not scalable: 2 weeks of training on 7,958 sentences; hard to train with larger data
- Slow in parsing: 2-3 sec./sentence, too slow for actual NL applications

3 Goal
Improve the scalability and the parsing efficiency without losing accuracy!
How?
- Apply the cascaded chunking model to dependency parsing and to the selection of training examples
- Reduce the number of times SVMs are consulted during parsing
- Reduce the number of negative examples learned

4 Outline
- Japanese dependency analysis
- Two models: probabilistic model (previous) and cascaded chunking model (new!)
- Features used for training and classification
- Experiments and results
- Conclusion and future work

5 Japanese Dependency Analysis (1/2)
Analysis of relationships between phrasal units called bunsetsu (segments), comparable to base phrases in English
Two constraints:
- Each segment modifies one of the segments to its right (Japanese is a head-final language)
- Dependencies do not cross each other
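
A minimal sketch (an assumed helper, not from the slides) that checks a candidate analysis against these two constraints:

```python
# heads[i] is the index of the segment that segment i modifies;
# the final segment has no head (None).
def is_valid(heads):
    n = len(heads)
    if heads[n - 1] is not None:
        return False                      # the final segment modifies nothing
    for i in range(n - 1):
        h = heads[i]
        if h is None or not (i < h <= n - 1):
            return False                  # must modify a segment to its right
    for i in range(n - 1):                # dependencies must not cross:
        for j in range(i + 1, n - 1):     # i < j < heads[i] < heads[j] is a cross
            if j < heads[i] < heads[j]:
                return False
    return True

# 私は / 彼女と / 京都に / 行きます: the first three segments all modify the verb.
assert is_valid([3, 3, 3, None])
assert not is_valid([2, 3, 1, None])      # segment 3 would modify leftward
```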

6 Japanese Dependency Analysis (2/2)
Raw text: 私は彼女と京都に行きます (I go to Kyoto with her.)
↓ Morphological analysis and bunsetsu identification
私は / 彼女と / 京都に / 行きます (I-top / with her / to Kyoto-loc / go)
↓ Dependency analysis
私は / 彼女と / 京都に / 行きます, with each of the first three segments depending on 行きます

7 Probabilistic Model
Input: 私は1 / 彼女と2 / 京都に3 / 行きます4 (I-top / with her / to Kyoto-loc / go)
1. Build a dependency matrix with ME, DT, or SVMs: each cell (modifier, modifiee) holds how probable it is that one segment modifies another
2. Search for the optimal dependencies that maximize the sentence probability using CYK or chart parsing
Output: 私は1 / 彼女と2 / 京都に3 / 行きます4 with its dependency structure
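
A toy exhaustive decoder illustrating step 2 (a sketch only; the slides use CYK/chart search, not brute force). It reuses the hypothetical is_valid helper from the earlier sketch; prob[(i, j)] is the estimated probability that segment i modifies segment j:

```python
from itertools import product

def best_parse(prob, n):
    best_p, best_heads = -1.0, None
    for choice in product(*[range(i + 1, n) for i in range(n - 1)]):
        heads = list(choice) + [None]
        if not is_valid(heads):           # enforce head-final, non-crossing
            continue
        p = 1.0
        for i in range(n - 1):
            p *= prob[(i, heads[i])]      # sentence probability = product of arcs
        if p > best_p:
            best_p, best_heads = p, heads
    return best_heads
```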

8 Problems of the Probabilistic Model (1/2)
Selection of training examples: all candidate pairs of two segments are used
- Pairs in a dependency relation → positive
- Pairs not in a dependency relation → negative
This straightforward selection requires a total of n(n-1)/2 training examples per sentence (where n is the number of segments in the sentence)
This makes it difficult to combine the probabilistic model with SVMs, whose training cost is polynomial in the number of examples
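
A one-liner (illustrative, not from the paper) showing the quadratic growth of candidate pairs:

```python
# Every segment i may pair with any segment j to its right.
def candidate_pairs(n):
    return [(i, j) for i in range(n - 1) for j in range(i + 1, n)]

assert len(candidate_pairs(10)) == 10 * 9 // 2   # n(n-1)/2 examples per sentence
```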

9 Problems of the Probabilistic Model (2/2)
O(n³) parsing time is necessary with CYK or chart parsing
Even if beam search is applied, O(n²) parsing time is always necessary, since building the dependency matrix alone requires scoring every segment pair
The classification cost of SVMs is much higher than that of other ML algorithms such as ME and DT

10 Cascaded Chunking Model
Based on cascaded chunking for English parsing [Abney 1991]
Parses a sentence deterministically, deciding only whether the current segment modifies the segment on its immediate right-hand side (as sketched below)
Training examples are extracted using this algorithm itself
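
A minimal sketch of that deterministic loop in Python. The classify callback stands in for the trained SVM of the later slides, and the retirement rule (a D-tagged segment is dropped once nothing to its left can still modify it) is one consistent reading of the traces on slides 11-12, not the paper's literal pseudocode:

```python
def parse(segments, classify):
    active = list(range(len(segments)))    # indices still being chunked
    heads = {}                             # fixed dependencies: modifier -> modifiee
    while len(active) > 1:
        # classify(segments, active, i): does active[i] modify active[i + 1]?
        tags = [classify(segments, active, i) for i in range(len(active) - 1)]
        tags.append("O")                   # the last segment never modifies
        kept = []
        for i, s in enumerate(active):
            if tags[i] == "D" and (i == 0 or tags[i - 1] == "O"):
                heads[s] = active[i + 1]   # fix the dependency, retire s
            else:
                kept.append(s)
        if len(kept) == len(active):       # no progress: stop (sketch-only guard)
            break
        active = kept
    return heads
```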

11 Example: Training Phase
Annotated sentence: 彼は1 彼女の2 温かい3 真心に4 感動した。5
(gloss: he / her / warm / heart / be moved; "He was moved by her warm heart.")
Pairs of tag (D or O) and context (features) are stored as training data for the SVMs; each tag is decided by the annotated corpus (see the extraction sketch below):
彼は1 彼女の2 温かい3 真心に4 感動した。5 → O O D D O
彼は1 彼女の2 真心に4 感動した。5 → O D D O
彼は1 真心に4 感動した。5 → O D O
彼は1 感動した。5 → D O
感動した。5 → finish
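
A sketch of the training-example extraction: run the same loop, but read each tag off the gold annotation and store (tag, features) pairs for SVM training. The features(segments, active, i) extractor is hypothetical here (cf. slide 14):

```python
def extract_training_data(segments, gold_heads, features):
    data, active = [], list(range(len(segments)))
    while len(active) > 1:
        tags = []
        for i in range(len(active) - 1):
            tag = "D" if gold_heads[active[i]] == active[i + 1] else "O"
            data.append((tag, features(segments, active, i)))
            tags.append(tag)
        tags.append("O")
        # same retirement rule as in the parsing sketch above
        active = [s for i, s in enumerate(active)
                  if not (tags[i] == "D" and (i == 0 or tags[i - 1] == "O"))]
    return data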

12 Example: Test Phase
Test sentence: 彼は1 彼女の2 温かい3 真心に4 感動した。5 ("He was moved by her warm heart.")
Each tag is decided by the SVMs built in the training phase; parsing proceeds exactly as in training:
彼は1 彼女の2 温かい3 真心に4 感動した。5 → O O D D O
彼は1 彼女の2 真心に4 感動した。5 → O D D O
彼は1 真心に4 感動した。5 → O D O
彼は1 感動した。5 → D O
感動した。5 → finish

13 Advantages of the Cascaded Chunking Model
Simple and efficient:
- Probabilistic model: O(n²) classifier invocations vs. cascaded chunking: O(n²) in the worst case, but lower in practice (close to O(n)), since most segments modify the segment on their immediate right-hand side
- The number of training examples is much smaller
Independent of the ML algorithm:
- Can be combined with any ML algorithm that works as a binary classifier
- Probabilities of dependency are not necessary

14 Features
Example: 彼の1 友人は2 この本を3 持っている4 女性を5 探している6
(gloss: his / friend-top / this book-acc / have / lady-acc / be looking for; "His friend is looking for a lady who has this book.")
Static features:
- Modifier/modifiee: head and functional word (surface, POS, POS subcategory, inflection type, inflection form), brackets, quotations, punctuation, position
- Between segments: distance, case particles, brackets, quotations, punctuation
Dynamic features [Kudo, Matsumoto 2000]:
- A, B: static features of the functional word
- C: static features of the head word
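
An illustrative (hypothetical) static-feature extractor for one modifier/modifiee pair; the real feature set also covers POS subcategories, inflection type/form, brackets, quotations, position, and the dynamic A/B/C features, and the distance bucketing below is a common convention assumed here, not taken from the slides:

```python
def static_features(bunsetsu, i, j):
    mod, mde = bunsetsu[i], bunsetsu[j]   # dicts describing each segment
    dist = j - i
    return [
        f"mod_head:{mod['head_surface']}", f"mod_head_pos:{mod['head_pos']}",
        f"mod_func:{mod['func_surface']}", f"mod_func_pos:{mod['func_pos']}",
        f"mde_head:{mde['head_surface']}", f"mde_head_pos:{mde['head_pos']}",
        f"mde_func:{mde['func_surface']}", f"mde_func_pos:{mde['func_pos']}",
        f"dist:{'1' if dist == 1 else '2-5' if dist <= 5 else '6+'}",
    ]
```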

15 Experimental Setting
Corpus: Kyoto University Corpus 2.0/3.0
Standard data set: training 7,958 sentences / test 1,246 sentences (same data as [Uchimoto et al. 98; Kudo, Matsumoto 00])
Large data set: 2-fold cross-validation using all 38,383 sentences
Kernel function: 3rd-degree polynomial
Evaluation: dependency accuracy and sentence accuracy
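
The slide names the kernel only briefly; a 3rd-degree polynomial kernel conventionally has the following form (the +1 bias term is the usual convention, assumed here):

```latex
K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^3
```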

16 Results

Data set                       Standard                           Large
Model                      Cascaded Chunking  Probabilistic  Cascaded Chunking  Probabilistic
Dependency Acc. (%)        89.29              89.09          90.04              N/A
Sentence Acc. (%)          47.53              46.17          53.16              N/A
# of training sentences    7,956              7,956          19,191             19,191
# of training examples     110,355            459,105        251,254            1,074,316
Training time (hours)      8                  336            48                 N/A
Parsing time (sec./sent.)  0.5                2.1            0.7                N/A

17 Effect of Dynamic Features (1/2)

18 Effect of Dynamic Features (2/2)
Difference from the model with all dynamic features:

Deleted dynamic feature type   Dependency Acc.   Sentence Acc.
A                              -0.28 %           -0.89 %
B                              -0.10 %           -0.89 %
C                              -0.28 %           -0.56 %
AB                             -0.33 %           -1.21 %
AC                             -0.55 %           -0.97 %
BC                             -0.54 %           -1.61 %
ABC                            -0.58 %           -2.34 %

(Example sentence and A/B/C positions as on slide 14.)

19 Probabilistic vs. Cascaded Chunking (1/2)
Example: 彼は1 この本を2 持っている3 女性を4 探している5
(gloss: he-top / this book-acc / have / lady-acc / be looking for; "He is looking for a lady who has this book.")
The probabilistic model uses all candidate dependency pairs as training data, e.g.:
- Positive: この本を2 → 持っている3
- Negative: この本を2 → 探している5
Probabilistic models thus commit to a number of unnecessary training examples (see the sketch below)
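
A small sketch of the contrast for this example sentence (gold_heads maps modifier to modifiee; an illustration, not code from the paper):

```python
gold_heads = {1: 5, 2: 3, 3: 4, 4: 5}
prob_pairs = [(i, j) for i in range(1, 5) for j in range(i + 1, 6)]
print(len(prob_pairs))   # 10 candidate pairs, including この本を(2) -> 探している(5)
# Cascaded chunking never asks whether この本を(2) modifies 探している(5):
# once the dependency 2 -> 3 is fixed, segment 2 is retired from consideration.
```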

20 Probabilistic vs. Cascaded Chunking (2/2)
Probabilistic:
- Strategy: maximize the sentence probability
- Merit: can see all candidates of dependency
- Demerit: not efficient; commits to unnecessary training examples
Cascaded chunking:
- Strategy: shift-reduce, deterministic
- Merit: simple, efficient, and scalable; as accurate as the probabilistic model
- Demerit: cannot see all (posterior) candidates of dependency

21 Conclusion
- A new Japanese dependency parser using a cascaded chunking model
- It outperforms the previous probabilistic model with respect to accuracy, efficiency, and scalability
- Dynamic features contribute significantly to improving performance

22 Future Work
- Coordinate structure analysis: coordinate structures frequently appear in long Japanese sentences and make analysis hard
- Use of posterior context: sentences such as the following are hard to parse using only the cascaded chunking model:
僕の 母の ダイヤの 指輪 (my mother's diamond ring)

23 Comparison with Related Work

                     Model                      Training corpus (# sentences)   Acc. (%)
Our model            Cascaded chunking + SVMs   Kyoto Univ. (19,191)            90.46
                                                Kyoto Univ. (7,956)             89.29
Kudo et al. 00       Prob. + SVMs               Kyoto Univ. (7,956)             89.09
Uchimoto et al. 00   Prob. + ME                 Kyoto Univ. (7,956)             87.93
Kanayama et al. 00   Prob. + ME + HPSG          EDR (192,778)                   88.55
Haruno et al. 98     Prob. + DT + Boosting      EDR (50,000)                    85.03
Fujio et al. 98      Prob. + ML                 EDR (190,000)                   86.67

24 Support Vector Machines [Vapnik]
Maximize the margin $d = 2/\|\mathbf{w}\|$:
Min.: $\frac{1}{2}\|\mathbf{w}\|^2$   s.t.: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$
Soft margin:
Min.: $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$   s.t.: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0$
Kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^p$ (here $p = 3$)
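
A hedged sketch of training the binary D/O classifier with an off-the-shelf SVM; scikit-learn stands in for the SVM implementation used in the paper, and training_data is the (tag, features) list from the extraction sketch on slide 11:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

y = [tag for tag, feats in training_data]
X = DictVectorizer().fit_transform(
    [{f: 1 for f in feats} for _, feats in training_data])  # binary features

clf = SVC(kernel="poly", degree=3, coef0=1.0)  # ~ (x . x' + 1)^3, up to gamma scaling
clf.fit(X, y)
print(clf.predict(X[:1]))                      # -> ["D"] or ["O"] for a context
```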

