Detecting Missing Hyphens in Learner Text. Aoife Cahill, Susanne Wolff, Nitin Madnani (Educational Testing Service); Martin Chodorow (Hunter College). ACL 2013.

1 Detecting Missing Hyphens in Learner Text. Aoife Cahill, Susanne Wolff, Nitin Madnani (Educational Testing Service); Martin Chodorow (Hunter College and the Graduate Center). ACL 2013.

2 Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

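3 Introduction Missing hyphens: (1) Schools may have more after school sports. (2) I went to the dentist after school today. (3) My father like play basketball with me. In (1), "after school" modifies "sports" and should be hyphenated ("after-school sports"); in (2), the same two words are correctly written without a hyphen.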

4 Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

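5 Baselines Each baseline predicts a missing hyphen in a bigram when: (1) the bigram appears hyphenated in the Collins Dictionary; (2) the hyphenated form occurs more than 1,000 times in Wikipedia; (3) the probability of the hyphenated form, as estimated from Wikipedia, is greater than 0.66.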
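
The three baselines can be sketched as simple lookups. This is a minimal illustration, not the authors' code: the toy dictionary and counts below are made up, and estimating the probability as the relative frequency of the hyphenated form is an assumption, since the slide does not say how the probability is computed.

```python
from typing import Dict, Set, Tuple

Bigram = Tuple[str, str]

# Hypothetical toy resources standing in for the Collins Dictionary
# and for Wikipedia counts; the numbers are invented for illustration.
DICTIONARY_HYPHENATED: Set[Bigram] = {("well", "known"), ("after", "school")}
HYPHENATED_COUNTS: Dict[Bigram, int] = {("well", "known"): 5200, ("after", "school"): 800}
OPEN_COUNTS: Dict[Bigram, int] = {("well", "known"): 900, ("after", "school"): 2400}


def baseline_dictionary(bigram: Bigram) -> bool:
    """Baseline 1: the bigram is listed in hyphenated form in the dictionary."""
    return bigram in DICTIONARY_HYPHENATED


def baseline_count(bigram: Bigram, min_count: int = 1000) -> bool:
    """Baseline 2: the hyphenated form occurs more than min_count times."""
    return HYPHENATED_COUNTS.get(bigram, 0) > min_count


def baseline_probability(bigram: Bigram, threshold: float = 0.66) -> bool:
    """Baseline 3: estimated probability of the hyphenated form exceeds threshold.

    Assumption: probability = hyphenated count / (hyphenated + open counts).
    """
    hyphenated = HYPHENATED_COUNTS.get(bigram, 0)
    total = hyphenated + OPEN_COUNTS.get(bigram, 0)
    if total == 0:
        return False
    return hyphenated / total > threshold
```

None of these baselines looks at the surrounding sentence, so each gives the same answer for "after school" in both introduction examples.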

6 Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

7 System Description Learner text: Schools may have more after school sports.

8 System Description Model: logistic regression. Threshold: a missing hyphen error is predicted only when the probability of the prediction is > 0.99.
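
The effect of the 0.99 threshold can be sketched with a hand-written logistic function. The feature names, weights, and bias below are hypothetical; the transcript does not list the features the system uses.

```python
import math
from typing import Dict


def predict_probability(features: Dict[str, float],
                        weights: Dict[str, float],
                        bias: float) -> float:
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_missing_hyphen(probability: float, threshold: float = 0.99) -> bool:
    """Report an error only when the model is very confident."""
    return probability > threshold


# Hypothetical weights, invented for illustration only.
WEIGHTS = {"bigram=after_school": 3.0, "next_pos=NOUN": 3.5, "next_pos=ADV": -4.0}
BIAS = -1.0

# "after school sports": the bigram is followed by a noun.
modifier = {"bigram=after_school": 1.0, "next_pos=NOUN": 1.0}
# "after school today": the bigram is followed by an adverb.
adverbial = {"bigram=after_school": 1.0, "next_pos=ADV": 1.0}

p_modifier = predict_probability(modifier, WEIGHTS, BIAS)
p_adverbial = predict_probability(adverbial, WEIGHTS, BIAS)
```

With these invented weights, only the first context clears the threshold. A prediction with probability 0.98 would also be suppressed, which favors precision over recall.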

9 System Description SJM-trained: - San Jose Mercury News corpus - For training, hyphenated words are automatically split (e.g. well-known becomes well known) - The training data contains 1% of the positive examples and 3% of the negative examples
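
The automatic splitting step can be sketched as follows. This is an illustration under assumptions: treating every gap between adjacent tokens as one example, and splitting only tokens whose parts are all alphabetic, are choices made here, not details given on the slide. The 1% and 3% sampling is not shown.

```python
from typing import List, Tuple


def split_hyphens(tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Split hyphenated words and label each gap between adjacent tokens.

    Returns the split token list and a list of labels, where labels[i]
    describes the gap between split token i and i + 1:
    1 if a hyphen was removed there (positive example), 0 otherwise.
    """
    split_tokens: List[str] = []
    hyphen_gaps = set()
    for token in tokens:
        parts = [part for part in token.split("-") if part]
        # Assumption: only split purely alphabetic compounds such as
        # "well-known"; leave tokens like "1990-95" or "-" untouched.
        if len(parts) > 1 and all(part.isalpha() for part in parts):
            start = len(split_tokens)
            split_tokens.extend(parts)
            hyphen_gaps.update(range(start, start + len(parts) - 1))
        else:
            split_tokens.append(token)
    labels = [1 if i in hyphen_gaps else 0 for i in range(len(split_tokens) - 1)]
    return split_tokens, labels


tokens, labels = split_hyphens("She is a well-known author".split())
# tokens -> ['She', 'is', 'a', 'well', 'known', 'author']
# labels -> [0, 0, 0, 1, 0]
```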

10 System Description Negative examples selected: Only contexts that occur more than 20 times are selected during training.

11 System Description Wiki-revision-trained: - Wikipedia articles

12 System Description

14 System Description Combined: - Combines both data sources (SJM and Wikipedia revisions)

15 Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

16 Evaluation Artificial Data: - 24,243 sentences taken from the Brown corpus - Containing 2,072 hyphenated words

17 Evaluation

19 Evaluation Evaluation 1 Learner Text: - CLC-FCE corpus - 1,244 exam scripts - 173 instances of missing hyphen errors in total

20 Evaluation

22 Of the 131 true positives on the learner data, 87 are cases of a single type, the word “make-up”.

23 Evaluation Evaluation 2 Learner Text: - A data set of 1,000 student GRE and TOEFL essays - Drawn from 295 prompts - Ranging in length from 1 to 50 sentences - Average of 378 words per essay

24 Evaluation Learner Text (Cont.): - Manual inspection of a random sample of 100 instances where each system detected a missing hyphen - Judged by two native English speakers - Chicago Manual of Style used as a guide - High agreement between the judges
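
The sampling and agreement check can be sketched as below. The slide reports high agreement without naming a measure, so the raw proportion of matching judgments used here is an assumption, and the function names and seed are hypothetical.

```python
import random
from typing import List, Sequence


def sample_instances(instances: Sequence[str], k: int = 100, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample of up to k flagged instances."""
    rng = random.Random(seed)
    return rng.sample(list(instances), min(k, len(instances)))


def observed_agreement(judge_a: Sequence[bool], judge_b: Sequence[bool]) -> float:
    """Proportion of instances on which the two judges gave the same verdict."""
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must rate the same instances")
    if not judge_a:
        raise ValueError("no judgments to compare")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return matches / len(judge_a)


# Hypothetical usage with placeholder instance IDs and judgments.
flagged = [f"instance-{i}" for i in range(250)]
sample = sample_instances(flagged)
agreement = observed_agreement([True, True, False, True], [True, False, False, True])
```

Raw agreement does not correct for chance; a chance-corrected statistic would be stricter.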

25 Evaluation

26 Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

27 Conclusions (1) Automatic detection of missing hyphen errors in learner text (2) The classifiers generally performed better than the baseline systems (3) Taking context into account is important when detecting these errors

