1
Some preliminary results
2
Marking interface
3
Highlight unusual phrases / grammatical errors
Current corpora: Google ngram, Wikipedia 2007
Problem: too many phrases are not in the corpus
Reasons a phrase may be absent from the corpus: (1) uncommon words (2) small corpus (3) wrong usage
Highlighted: trigrams not found in Google ngram book (from an HSMC student’s report)
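The highlighting rule on this slide (flag a trigram when it is absent from the corpus) can be sketched as below. This is a minimal illustration, not the system's actual code: `toy_corpus` is a made-up stand-in for the Google ngram data, and whitespace tokenization is an assumption.

```python
def trigrams(tokens):
    """Return all consecutive word triples from a token list."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def unseen_trigrams(sentence, corpus_trigrams):
    """Return the trigrams of `sentence` that are not in `corpus_trigrams`."""
    tokens = sentence.lower().split()  # assumption: simple whitespace tokenization
    return [t for t in trigrams(tokens) if t not in corpus_trigrams]


# Hypothetical toy corpus standing in for the real trigram data.
toy_corpus = {
    ("so", "he", "does"),
    ("he", "does", "not"),
    ("does", "not", "explain"),
}

flagged = unseen_trigrams("So he do not explain", toy_corpus)
clean = unseen_trigrams("So he does not explain", toy_corpus)
```

With a small corpus, correct but rare trigrams are flagged as well, which is the false-positive problem listed under reasons (1) and (2).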
4
It seems that it is weak in identifying simple subject verb agreement
To rule out false-positive highlights, we attempted normalized linear approximation last week:
$$p'''(w_1, w_2, w_3) = \lambda_1\, p'(w_1, w_2, w_3) + \lambda_2\, p''(w_1, w_2, w_3)$$
$$\mathrm{score} = \frac{p'''(w_1, w_2, w_3)}{p(w_1)\, p(w_2)\, p(w_3)}$$
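A minimal sketch of the interpolated score, assuming the two trigram estimates p' and p'' and the three unigram probabilities are already available as numbers. The function name and the example values are hypothetical; only the formula comes from the slide.

```python
def interpolated_score(p1, p2, unigram_probs, weights=(0.5, 0.5)):
    """Mix two trigram estimates linearly, then normalize by the unigram product.

    p1, p2        -- the two trigram estimates p' and p''
    unigram_probs -- (p(w1), p(w2), p(w3))
    weights       -- (lambda1, lambda2)
    """
    lam1, lam2 = weights
    p_mix = lam1 * p1 + lam2 * p2      # p'''(w1, w2, w3)
    denom = 1.0
    for p in unigram_probs:            # p(w1) * p(w2) * p(w3)
        denom *= p
    return p_mix / denom


# Made-up inputs, only to show the arithmetic.
example = interpolated_score(0.2, 0.4, (0.5, 0.5, 0.5))
```

A trigram would then be compared against a threshold on this score. Highlighting when the score falls below the threshold is an assumption here, since the slide does not state the direction.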
5
TEST 1 O: True Positive, X: False Positive
(c) Normalized linear approximation (threshold = 0.3e-24, weights = 0.5, 0.5)
System highlights: 1: X, 2: O, 3: O, 4: O, 5: O, 6: O, 7: X, 8: O, 9: O, 10: X
Missed by the system: 1: FN, 2: FN, 3: FN, 5: FN, 6: FN, 7: FN, 8: FN
Precision: 7/10 = 0.7
Recall: 7/(7+8) = 0.47
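The precision and recall figures follow from the counts of true positives (O), false positives (X) and false negatives (FN). A short sketch of the arithmetic, using the TEST 1 counts of 7 true positives, 3 false positives and the 8 false negatives used in the recall figure (the function name is illustrative):

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# TEST 1: 7 correct highlights, 3 wrong highlights, 8 missed errors.
p, r = precision_recall(tp=7, fp=3, fn=8)
print(round(p, 2), round(r, 2))  # prints: 0.7 0.47
```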
6
TEST 3, very poor (left column: system, right column: by teacher)
(c) Normalized linear approximation (threshold = 0.3e-24, weights = 0.5, 0.5)
System highlights: 1: X, 2: X, 3: O
Marked by teacher but missed by the system: 19
Precision: 1/3 = 0.33
Recall: 1/20 = 0.05
7
New scores (1) Normalized score without interpolation
$$\text{normalized raw frequency} = \frac{\mathrm{freq}(\text{so he do})}{\mathrm{freq}(\text{so})\, \mathrm{freq}(\text{he})\, \mathrm{freq}(\text{do})}$$
(2) Score by ratio of inflected form
E.g. “So he do not explain what is YouTube”
Possible inflected forms: “so he does”
Calculate the ratio of “so he does” / “so he do”
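A minimal sketch of score (1), assuming raw corpus counts are available. The trigram count 3487 for “so he do” is taken from the slides; the unigram counts below are hypothetical placeholders, because the real values are not given.

```python
import math


def normalized_raw_frequency(trigram_count, unigram_counts):
    """freq(w1 w2 w3) / (freq(w1) * freq(w2) * freq(w3))."""
    return trigram_count / math.prod(unigram_counts)


# 3487 is the slide's count for "so he do"; the three unigram counts
# are made-up round numbers used only to show the order of magnitude.
hypothetical_unigrams = (10**9, 10**9, 10**9)
nrf = normalized_raw_frequency(3487, hypothetical_unigrams)
```

Because the denominator is a product of three large counts, the score is very small, which is why thresholds on the order of 1e-24 appear in the tests.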
8
Score by ratio of inflected form
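Step 1: use a parser to detect words with the POS tags 'VBZ', 'VBP', 'VB', 'VBD'
Step 2: screen using normalized raw frequency (THRESHOLD IS LOWER THAN PURE NORMALIZED RAW FREQUENCY)
$$\text{normalized raw frequency} = \frac{\mathrm{freq}(\text{so he do})}{\mathrm{freq}(\text{so})\, \mathrm{freq}(\text{he})\, \mathrm{freq}(\text{do})}$$
Step 3: compute the ratio
$$\text{Ratio} = \frac{\mathrm{freq}(\text{so he does})}{\mathrm{freq}(\text{so he do})} = \frac{75976}{3487} \approx 21.79$$
Step 4: highlight if the ratio is higher than a threshold
Counts: {"so he do": 3487, "so he does": 75976, "so": , "he": , "do": }
"normalized raw frequency": e-24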
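The four steps can be sketched as one function. Several things here are assumptions rather than the presented system: POS tags are passed in instead of calling a parser, the screen in Step 2 is taken to pass trigrams whose normalized raw frequency is below `rawf_threshold`, the default thresholds are the (1e-23, 5.5) pair reported for the purple highlights, and the unigram counts are hypothetical because the slide omits them.

```python
VERB_TAGS = {"VBZ", "VBP", "VB", "VBD"}


def flag_by_inflection_ratio(trigram, tags, inflected, counts,
                             rawf_threshold=1e-23, ratio_threshold=5.5):
    """Return True if `trigram` should be highlighted by the inflection-ratio score."""
    # Step 1: only consider trigrams that contain a verb tag.
    if not any(tag in VERB_TAGS for tag in tags):
        return False
    tri_count = counts.get(trigram, 0)
    if tri_count == 0:
        # Assumption: an unseen trigram is flagged if its inflected variant exists.
        return counts.get(inflected, 0) > 0
    # Step 2: screen by normalized raw frequency (assumed: low values pass).
    w1, w2, w3 = trigram
    rawf = tri_count / (counts[w1] * counts[w2] * counts[w3])
    if rawf >= rawf_threshold:
        return False
    # Step 3: ratio of the inflected form to the observed form.
    ratio = counts.get(inflected, 0) / tri_count
    # Step 4: highlight when the ratio exceeds the threshold.
    return ratio > ratio_threshold


counts = {
    ("so", "he", "do"): 3487,      # from the slide
    ("so", "he", "does"): 75976,   # from the slide
    "so": 10**9, "he": 10**9, "do": 10**9,  # hypothetical unigram counts
}
highlight = flag_by_inflection_ratio(
    ("so", "he", "do"), ("RB", "PRP", "VBP"), ("so", "he", "does"), counts)
```

Generating the candidate inflected forms (here just “so he does”) is left outside the sketch; the slides do not say how they are produced.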
9
Pink: highlighted due to normalized raw frequency (threshold = 0.5e-24)
Purple: highlighted due to ratio of inflected form (rawf_threshold, ratio_threshold = 1e-23, 5.5)