Published by Florence Booker. Modified over 8 years ago.
1
TSD, Brno, 13.9.2006
Institute of Formal and Applied Linguistics, {benesova,bojar}@ufal.mff.cuni.cz

Czech Verbs of Communication and the Extraction of their Frames
Václava Benešová and Ondřej Bojar
2
Introduction
1. VALLEX, Valency Lexicon of Czech Verbs
2. Automatic Identification of Verbs of Communication
3. Frame Suggestion
4. Conclusion
3
1. Valency Lexicon of Czech Verbs, VALLEX 1.x, and its Verb Classes
● Verb Classes in VALLEX
● Verbs of Communication
4
VALLEX
● Theoretical background: Functional Generative Description (FGD)
● Valency: “ability of lexical units to bind other lexical units”
● Versions: 1.0, internal 1.5, 2.0 (autumn 2006) (almost 4300 entries)
● Corpus coverage (Czech National Corpus): about 10% of verb occurrences are not covered; they belong to verbs with low corpus frequency (ca. 28000 lemmas)
5
Verb Entry in VALLEX
● Verb entry: a set of valency frames
● Valency frame: a sequence of slots (functor, morphemic realization, type of complement)
● Attributes of valency frames: gloss, example, …, class
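The entry structure described on this slide can be pictured as nested records. The following is a minimal sketch, not VALLEX's actual storage format: the class names `Slot`, `ValencyFrame` and `VerbEntry`, the field names, and the sample values (including the `"obl"` complement type and the gloss) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    functor: str          # e.g. "ACT", "ADDR", "PAT"
    forms: List[str]      # morphemic realizations; labels here are illustrative
    complement_type: str  # type of complement; "obl" below is a placeholder value


@dataclass
class ValencyFrame:
    slots: List[Slot]
    gloss: str = ""
    example: str = ""
    verb_class: Optional[str] = None  # many frames carry no class yet


@dataclass
class VerbEntry:
    lemma: str
    frames: List[ValencyFrame] = field(default_factory=list)


# Hypothetical entry with a single frame, for illustration only.
entry = VerbEntry(
    lemma="říci",
    frames=[
        ValencyFrame(
            slots=[
                Slot("ACT", ["nom"], "obl"),
                Slot("ADDR", ["dat"], "obl"),
                Slot("PAT", ["že"], "obl"),
            ],
            gloss="say",
            verb_class="communication",
        )
    ],
)

print([slot.functor for slot in entry.frames[0].slots])
```

Keeping `verb_class` optional reflects that only part of the frames in the lexicon have a class assigned.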
6
Verb Classes in VALLEX
Classification:
● in progress
● built from below
● emphasis on syntactic criteria
Classes: communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …

                                      VALLEX 1.0       VALLEX 1.5
Total Verb Entries                    1,437            2,476
Total Verb Lemmas                     1,081            1,844
Total Valency Frames                  4,239            7,080
Valency Frames with Class             1,591 [37.5%]    3,156 [44.6%]
Total Classes                         16               23
Frame Types in Class on Average       6.1              6.1
7
Communication Verbs in VALLEX
‘a speaker conveys information to a recipient’

ACT      ADDR             PAT/EFF
{nom}    {gen/dat/acc}    {dc,...}

● simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement
● question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs
● commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs

                                VALLEX 1.0    VALLEX 1.5
verbs of announcement: že       191           276
interrogative verbs: zda        87            135
imperative verbs: aby           74            105
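The three-way split above is keyed by the conjunction that introduces the dependent clause, so it can be written down as a simple lookup. This is only a sketch of that mapping; the dictionary name, the helper `subclasses` and the English subclass labels are assumptions, not identifiers from VALLEX.

```python
# Conjunction -> subclass of communication verbs, as listed on the slide.
CONJUNCTION_TO_SUBCLASS = {
    "že": "announcement",
    "zda": "interrogative",
    "jestli": "interrogative",
    "aby": "imperative",
    "ať": "imperative",
}


def subclasses(conjunctions):
    """Return the sorted subclasses suggested by a verb's observed conjunctions."""
    found = {
        CONJUNCTION_TO_SUBCLASS[c]
        for c in conjunctions
        if c in CONJUNCTION_TO_SUBCLASS
    }
    return sorted(found)


print(subclasses(["že", "zda"]))
```

A verb may fall into more than one subclass, which is why the helper returns a list rather than a single label.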
8
2. Automatic Identification of Verbs of Communication
● Evaluation: VALLEX vs. FrameNet
9
Automatic Identification of Verbs of Communication
Search the corpus for the pattern V+N234+subord{aby,zda,že} (a verb, a noun in case 2, 3 or 4, and a subordinate clause introduced by aby, zda or že) and mark a verb as a communication verb if enough occurrences are found.
Weak points:
1. eliminates nominal structures: ‘He said the truth about the killer.’ ‘He gave her many presents.’ (verb of exchange)
2. ignores examples where a complement was not expressed on the surface layer: ‘He said that …’
3. homonymy of conjunctions: že (that) and aby (in order to): ‘He has done it in order to make money…’
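The search described on this slide can be sketched as a counting pass over tagged sentences. Everything about the data format here is an assumption: tokens are hypothetical `(lemma, pos, case)` triples, punctuation is skipped, the three elements are required to be adjacent, and the threshold value is arbitrary. The slides do not specify any of these details.

```python
from collections import Counter

CONJUNCTIONS = {"aby", "zda", "že"}
NOUN_CASES = {2, 3, 4}  # genitive, dative, accusative


def count_pattern(sentences):
    """Count V + N(case 2/3/4) + subordinating conjunction per verb lemma."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: punctuation (e.g. the comma before "že") is ignored.
        tokens = [t for t in sentence if t[1] != "PUNCT"]
        for verb, noun, conj in zip(tokens, tokens[1:], tokens[2:]):
            if (
                verb[1] == "V"
                and noun[1] == "N"
                and noun[2] in NOUN_CASES
                and conj[1] == "CONJ"
                and conj[0] in CONJUNCTIONS
            ):
                counts[verb[0]] += 1
    return counts


def communication_verbs(sentences, threshold=2):
    """Verbs with at least `threshold` pattern occurrences (threshold is arbitrary)."""
    counts = count_pattern(sentences)
    return {lemma for lemma, n in counts.items() if n >= threshold}


# Toy corpus with hypothetical tagging.
corpus = [
    [("říci", "V", None), ("matka", "N", 3), (",", "PUNCT", None),
     ("že", "CONJ", None), ("přijít", "V", None)],
    [("říci", "V", None), ("bratr", "N", 3), (",", "PUNCT", None),
     ("aby", "CONJ", None), ("odejít", "V", None)],
    [("dát", "V", None), ("sestra", "N", 3), ("dárek", "N", 4)],
]

print(communication_verbs(corpus))
```

The third toy sentence mirrors the first weak point: a verb of exchange with two nominal complements never matches, but neither does a communication verb used with a purely nominal structure.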
10
Evaluation against VALLEX and FrameNet
● gold standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2
● ROC curves:
  TP … true positives (communication verbs according to a gold standard that score above the given threshold)
  FP … false positives (non-communication verbs that score above the given threshold)
  TPR = TP / P … true positive rate (P is the total number of communication verbs)
  TNR = TN / N … true negative rate (N is the total number of verbs with no sense of communication); the false positive rate is FP / N = 1 - TNR
● 40–50% of communication verbs identified correctly (for both VALLEX and FrameNet), 20% falsely marked
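One point on such an ROC curve can be computed as follows. This is a sketch under assumptions: `scores` is a hypothetical mapping from verb lemma to the number of pattern occurrences, `gold` is the set of communication verbs from a gold standard, and "above the threshold" is read as greater than or equal. The numbers are toy values, not results from the talk.

```python
def roc_point(scores, gold, threshold):
    """Return (TPR, FPR) for one threshold.

    scores: dict lemma -> pattern count (hypothetical data)
    gold:   set of lemmas that are communication verbs in the gold standard
    """
    positives = [v for v in scores if v in gold]
    negatives = [v for v in scores if v not in gold]
    tp = sum(1 for v in positives if scores[v] >= threshold)
    fp = sum(1 for v in negatives if scores[v] >= threshold)
    tpr = tp / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return tpr, fpr


# Toy values for illustration only.
scores = {"říci": 12, "ptát se": 7, "nakázat": 1, "dát": 3, "jít": 0}
gold = {"říci", "ptát se", "nakázat"}

for threshold in (1, 3, 10):
    tpr, fpr = roc_point(scores, gold, threshold)
    print(threshold, round(tpr, 2), round(fpr, 2), round(1 - fpr, 2))  # last column: TNR
```

Sweeping the threshold over all observed counts and plotting TPR against FPR gives the full curve.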
11
3. Frame Suggestion
● Frame Edit Distance and Verb Entry Similarity
● Experimental Results
12
Frame Edit Distance and Verb Entry Similarity
● FED (frame edit distance): the number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame to a correct frame
● ES (entry similarity or expected saving):

$$ES = 1 - \frac{\min FED(G, H)}{FED(G, \emptyset) + FED(H, \emptyset)}$$

  G … golden verb entries of this base lemma
  H … hypothesized entries
  ∅ … blank verb entry
● ES = 0% (suggesting nothing), ES = 100% (golden frames)
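A small worked version of FED and ES may help. The sketch below rests on simplifying assumptions that are not stated in the slides: a frame is a hypothetical `{functor: form}` dictionary, each missing, extra or differing slot costs one operation, and the minimum in the formula is taken over all pairings of golden and hypothesized frames, with unmatched frames compared against an empty frame.

```python
from itertools import permutations


def frame_distance(gold, hyp):
    """Edit operations between two frames given as {functor: form} dicts.

    A functor present in only one frame counts as one insert/delete;
    a functor present in both with different forms counts as one replace.
    """
    functors = set(gold) | set(hyp)
    return sum(1 for f in functors if gold.get(f) != hyp.get(f))


def entry_distance(gold_entry, hyp_entry):
    """Minimum total frame distance over all pairings of frames (brute force)."""
    n = max(len(gold_entry), len(hyp_entry))
    gold = list(gold_entry) + [{}] * (n - len(gold_entry))
    hyp = list(hyp_entry) + [{}] * (n - len(hyp_entry))
    return min(
        sum(frame_distance(g, h) for g, h in zip(gold, perm))
        for perm in permutations(hyp)
    )


def entry_similarity(gold_entry, hyp_entry):
    """ES = 1 - min FED(G, H) / (FED(G, empty) + FED(H, empty))."""
    denom = entry_distance(gold_entry, []) + entry_distance(hyp_entry, [])
    if denom == 0:
        return 1.0  # both entries are blank
    return 1 - entry_distance(gold_entry, hyp_entry) / denom


# Hypothetical golden entry and suggestions, for illustration only.
golden = [{"ACT": "1", "ADDR": "3", "PAT": "že"}]
suggested = [{"ACT": "1", "PAT": "4"}]

print(entry_similarity(golden, []))         # suggesting nothing
print(entry_similarity(golden, golden))     # suggesting the golden frames
print(entry_similarity(golden, suggested))  # a partially correct suggestion
```

The brute-force pairing is only practical for entries with a handful of frames; it is used here to keep the definition of the minimum explicit.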
13
Experimental Results with ES

Suggested frames                                                     ES [%]
Specific frame for verbs of communication, default for others        38.00
Baseline 1: ACT(1)                                                   26.69
Baseline 2: ACT(1) PAT(4)                                            37.55
Baseline 3: ACT(1) ADDR(3,4) PAT(4)                                  35.70
Baseline 4: Two typical frames: ACT(1) PAT(4)                        39.11
14
Conclusion
● Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40–50% true positives against VALLEX and FrameNet, 20% false positives)
● FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb → need to focus on other classes, too