June 5, 2009. Automated Suggestions for Miscollocations. Anne Li-E Liu, David Wible, Nai-lung Tsao.


1 Automated Suggestions for Miscollocations. Anne Li-E Liu, David Wible, Nai-lung Tsao. June 5, 2009.

2 Overview: Introduction, Methodology, Experimental Results, Conclusion.

3 Introduction: Our study focuses on how to find suggestions for miscollocations automatically. In this paper, only verb-noun collocations and miscollocations are considered.

4 Introduction: Howarth’s (1998) investigation of collocations in the writing of L1 and L2 writers. Granger’s (1998) analysis of adverb-adjective collocations. Liu’s (2002) lexical semantic analysis of verb-noun miscollocations in the English Taiwan Learner Corpus.

5 Introduction: Projects that use learner corpora to analyze and categorize learner errors: the NICT JLE (Japanese Learner English) Corpus; the Chinese Learner English Corpus (CLEC); the English Taiwan Learner Corpus (TLC) (Wible et al., 2003).

6 An example: She tries to improve her students’ problems. Verb (V) collocates from Collocation Explorer: 1. solve, 2. pose, 3. tackle, 4. grapple, 5. alleviate, 6. overcome, 7. exacerbate, 8. compound, 9. beset, 10. resolve; reduce.

7 Method: Three features of collocate candidates are used: 1. word association strength; 2. semantic similarity; 3. intercollocability (Cowie and Howarth, 1996).

8 Resource: 84 VN miscollocations in TLC (Liu, 2002). Training data: 42. Testing data: 42. Two knowledge resources: BNC and WordNet. Two human evaluators.

9 Word Association Strength: Mutual Information (Church et al. 1991). Two purposes: 1. All suggested correct collocations have to be identified as collocations. 2. The higher the word association strength, the more likely the candidate is to be a correct substitute for the wrong collocate.
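As a rough illustration of word association strength, the sketch below computes pointwise mutual information for verb-noun pairs from raw counts. The counts, the corpus size `N`, and the function name `pmi` are hypothetical placeholders, not figures from the BNC or from the talk.

```python
import math

def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information (base 2) of a word pair from raw counts."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts, chosen only to show the contrast between a strong
# collocation and a weak one.
pair_counts = {("solve", "problem"): 80, ("improve", "problem"): 1}
verb_counts = {"solve": 200, "improve": 400}
noun_counts = {"problem": 1000}
N = 1_000_000  # hypothetical total number of verb-noun pairs

scores = {
    (verb, noun): pmi(count, verb_counts[verb], noun_counts[noun], N)
    for (verb, noun), count in pair_counts.items()
}

# Rank candidate verbs for "problem" by association strength.
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these toy counts, "solve problem" scores far above "improve problem", which is the behaviour the second purpose on the slide relies on.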

10 Semantic Similarity: A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu 2002). The synsets of WordNet are taken to be nodes in a graph, and graph-theoretic distance is measured. Examples: *say a story, tell a story (synonymous relation); *say a story, think of a story (hypernymy relation).
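The idea of measuring graph-theoretic distance can be sketched with a breadth-first search over a small graph. Everything below is an assumption made for illustration: the miniature graph uses words rather than real WordNet synsets, and the mapping from distance to a score in (0, 1] via `1 / (1 + d)` is one simple choice, not necessarily the measure used in the talk.

```python
from collections import deque

# Hypothetical miniature graph; real WordNet synsets and relations would
# replace these hand-written edges.
GRAPH = {
    "say": {"tell", "express"},
    "tell": {"say", "narrate"},
    "narrate": {"tell"},
    "express": {"say", "convey"},
    "convey": {"express"},
    "bake": set(),
}

def distance(graph, start, goal):
    """Shortest-path length between two nodes, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def similarity(graph, a, b):
    """Map distance to a score in [0, 1]; unreachable pairs score 0."""
    dist = distance(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)
```

In this toy graph, `similarity(GRAPH, "say", "tell")` is higher than `similarity(GRAPH, "say", "narrate")`, since the second pair is one edge further apart.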

11 Semantic Similarity

12 Intercollocability: Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of shared meaning. Examples: convey point, get across the message, express concern, convey feeling, communicate concern, convey message, get across point, communicate feeling.

13 Intercollocability: Collocations in a cluster show a certain degree of intercollocability. Examples: express one’s concern / condolences; convey message, get across point, express concern, communicate feeling. express / communicate + concern / feeling?

14 Intercollocability: She tries to *improve her students’ problems. Starting point: *improve problem. improve: 52 noun collocates. problem: 86 verb collocates. Does any of the 86 verbs co-occur with the 52 nouns? resolve / improve + situation, matter, way. reduce / improve + quality, efficiency, effectiveness. Candidates for problem: resolve, reduce.

15 Intercollocability: The cluster is partially created, and the link between improve, resolve and reduce is established by virtue of their overlapping noun collocates (situation, matter, way; quality, efficiency, effectiveness).

16 Intercollocability: Quantify intercollocability by the number of shared collocates.

17 shared collocates (resolve, improve) = 3 (situation, matter, way); shared collocates (reduce, improve) = 3 (quality, efficiency, effectiveness). The more collocates a verb shares with the wrong verb, the more likely it is to be a good candidate.
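Counting shared collocates amounts to a set intersection. The sketch below uses the noun collocates named on the slides for improve, resolve and reduce; the entry for "pose" and the function name `shared_collocates` are hypothetical additions for contrast.

```python
# Noun collocates per verb. The improve/resolve/reduce entries follow the
# slides; "pose" is a made-up contrast case with no overlap.
NOUN_COLLOCATES = {
    "improve": {"situation", "matter", "way",
                "quality", "efficiency", "effectiveness"},
    "resolve": {"situation", "matter", "way", "problem"},
    "reduce": {"quality", "efficiency", "effectiveness", "problem"},
    "pose": {"problem", "question", "threat"},
}

def shared_collocates(verb_a, verb_b, table=NOUN_COLLOCATES):
    """Number of noun collocates that two verbs have in common."""
    return len(table[verb_a] & table[verb_b])

wrong_verb = "improve"
candidates = ["resolve", "reduce", "pose"]

# Candidates sharing more collocates with the wrong verb rank higher.
ranking = sorted(
    candidates,
    key=lambda verb: shared_collocates(verb, wrong_verb),
    reverse=True,
)
```

Here resolve and reduce each share three collocates with improve, while the hypothetical "pose" shares none and falls to the bottom of the ranking.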

18 Integrate the 3 features: the probabilistic model.

19 Training: Probability distribution of word association strength. MI values are mapped to 5 levels (… 6). Estimated: P(MI level) and P(MI level | S_c).

20 Training: Probability distribution of semantic similarity. Similarity scores are mapped to 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0). Estimated: P(SS level) and P(SS level | S_c).

21 Training: Probability distribution of intercollocability. The normalized number of shared collocates is mapped to 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0). Estimated: P(SC level) and P(SC level | S_c).
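The transcript names the trained quantities, P(level) and P(level | S_c) per feature, but not the formula that combines them. One plausible reading, offered here only as an assumption, is a naive Bayes style score: a prior for S_c multiplied by the ratio P(level | S_c) / P(level) for each feature. All probabilities below are invented for illustration, and `to_level` and `naive_bayes_score` are hypothetical names.

```python
def to_level(score, n_levels=5):
    """Map a score in [0, 1] to an equal-width level index 0..n_levels-1."""
    return min(int(score * n_levels), n_levels - 1)

def naive_bayes_score(levels, prior, likelihood, marginal):
    """prior * product over features of P(level | S_c) / P(level).

    Assumes the features are conditionally independent (an assumption of
    this sketch, not a statement about the authors' model).
    """
    score = prior
    for feature, level in levels.items():
        score *= likelihood[feature][level] / marginal[feature][level]
    return score

# Invented distributions over the 5 levels of two features.
likelihood = {  # P(level | S_c)
    "SS": [0.05, 0.10, 0.15, 0.20, 0.50],
    "SC": [0.20, 0.20, 0.20, 0.20, 0.20],
}
marginal = {  # P(level)
    "SS": [0.40, 0.25, 0.15, 0.10, 0.10],
    "SC": [0.20, 0.20, 0.20, 0.20, 0.20],
}
prior = 0.1  # invented P(S_c)

# A candidate with high semantic similarity and mid-range shared collocates.
candidate_levels = {"SS": to_level(0.9), "SC": to_level(0.5)}
candidate_score = naive_bayes_score(candidate_levels, prior, likelihood, marginal)
```

Dropping a feature from `levels` gives the corresponding reduced model, so the same function could score any subset of the three features.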

22 Experiments: Different combinations of the three features. Models and feature(s) considered: M1: MI (Mutual Information); M2: SS (Semantic Similarity); M3: SC (Shared Collocates); M4: MI + SS; M5: MI + SC; M6: SS + SC; M7: MI + SS + SC.
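The seven models are exactly the non-empty subsets of the three features. A short sketch of that enumeration, with the variable names chosen here for illustration:

```python
from itertools import combinations

FEATURES = ("MI", "SS", "SC")

# Enumerate subsets by size, numbering them M1..M7 in generation order.
models = {}
for size in range(1, len(FEATURES) + 1):
    for combo in combinations(FEATURES, size):
        models[f"M{len(models) + 1}"] = combo
```

Because `combinations` preserves input order, the numbering produced here lines up with the slide: single features first, then pairs, then all three.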

23 Results:
K-Best | M1 (MI) | M2 (SS) | M3 (SC) | M4 (MI+SS) | M5 (MI+SC) | M6 (SS+SC) | M7 (MI+SS+SC)
1      | 16.67   | 40.48   | 22.62   | 48.81      | 29.76      | 55.95      | 53.75
2      | 36.90   | 53.45   | 38.10   | 60.71      | 44.05      | 63.1       | 67.86
3      | 47.62   | 64.29   | 50.00   | 71.43      | 59.52      | 77.38      | 78.57
4      | 52.38   | 67.86   | 63.10   | 77.38      | 72.62      | 80.95      | 82.14
5      | 64.29   | 75.00   | 72.62   | 83.33      | 78.57      | 83.33      | 85.71
6      | 65.48   | 77.38   | 75.00   | 85.71      | 83.33      | 84.52      | 88.10
7      | 67.86   | 77.38   |         | 86.90      |            |            | 89.29
8      | 70.24   | 80.95   | 82.14   | 86.90      | 89.29      | 88.10      | 91.67
9      | 72.62   | 83.33   | 85.71   | 88.10      | 92.86      | 90.48      | 92.86
10     | 76.19   | 86.90   | 88.10   |            | 94.05      | 90.48      | 94.05
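The slide does not define how the K-Best figures are computed. A minimal sketch, assuming each figure is the percentage of test items for which at least one acceptable suggestion appears among the top K: the rankings and the sets of acceptable verbs below are invented, and `k_best_hit_rate` is a hypothetical name.

```python
def k_best_hit_rate(ranked, acceptable, k):
    """Percentage of items with an acceptable suggestion in the top k."""
    hits = sum(
        1
        for item, suggestions in ranked.items()
        if any(s in acceptable[item] for s in suggestions[:k])
    )
    return 100.0 * hits / len(ranked)

# Invented rankings and acceptable answers, for illustration only.
ranked = {
    "get knowledge": ["share", "acquire", "gain"],
    "pay time": ["date", "waste", "spend"],
    "reach purpose": ["achieve", "serve", "fulfill"],
}
acceptable = {
    "get knowledge": {"acquire", "gain", "obtain"},
    "pay time": {"spend", "devote"},
    "reach purpose": {"achieve", "fulfill"},
}

rates = [k_best_hit_rate(ranked, acceptable, k) for k in (1, 2, 3)]
```

Under this definition the rate can only stay level or rise as K grows, which matches the shape of every column in the table.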

24 Results (cont.): The K-Best suggestions for “get knowledge”.
K-Best M2 M6 M7
1 aim obtain acquire
2 generate share
3 draw develop obtain
4 generate develop
5 acquire gain

25 The K-Best suggestions for *reach purpose.
K-Best M2 M6 M7
1 achieve
2 teach account
3 explain trade
4 account treat fulfill
5 trade allocate serve

26 The K-Best suggestions for *pay time.
K-Best M2 M6 M7
1 devote spend
2 invest waste
3 expend devote
4 spare date invest
5 waste date

27 Conclusion: A probabilistic model integrates the three features. Early experimental results show the potential of this research.

28 Future work: Apply these mechanisms to other types of miscollocations. Miscollocation detection will be one of the main focuses of this research. A larger set of miscollocations should be included in order to verify our approach.

29 Thank you! Q & A. Anne Li-E Liu: lel29@cam.ac.uk. David Wible: wible45@yahoo.com. Nai-Lung Tsao: beaktsao@gmail.com.

