TSD, Brno, 13.9.2006, Institute of Formal and Applied Linguistics


Czech Verbs of Communication and the Extraction of their Frames
Václava Benešová and Ondřej Bojar

Introduction
1. VALLEX, Valency Lexicon of Czech Verbs
2. Automatic Identification of Verbs of Communication
3. Frame Suggestion
4. Conclusion

1. Valency Lexicon of Czech Verbs, VALLEX 1.x, and its Verb Classes
● Verb Classes in VALLEX
● Verbs of Communication

VALLEX
● Theoretical background: Functional Generative Description (FGD)
● Valency: “ability of lexical units to bind other lexical units”
● Versions: 1.0, internal 1.5, 2.0 (autumn 2006) (almost 4300 entries)
● Corpus coverage (Czech National Corpus):
    ● about 10% of verb occurrences, with low corpus frequency, not covered (cca lemmas)

Verb Entry in VALLEX
● Verb entry: set of valency frame(s)
● Valency frame: sequence of slots (functor, morphemic realization, type of complement)
● Attributes of valency frames: gloss, example, … class
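The entry structure described on this slide can be sketched as a small set of Python dataclasses. This is only an illustration: the class and field names (`Slot`, `ValencyFrame`, `VerbEntry`, `forms`, `complement_type`) and the value "obl" are assumptions made for the example, and the sample entry is not copied from VALLEX.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    functor: str            # e.g. "ACT", "ADDR", "PAT"
    forms: List[str]        # morphemic realizations, e.g. ["nom"]
    complement_type: str    # type of complement; "obl" is a hypothetical label


@dataclass
class ValencyFrame:
    slots: List[Slot]                  # a frame is a sequence of slots
    gloss: str = ""
    example: str = ""
    verb_class: Optional[str] = None   # not every frame has a class


@dataclass
class VerbEntry:
    lemma: str
    frames: List[ValencyFrame] = field(default_factory=list)


# Illustrative entry only, built from the generic communication scheme
# ACT {nom}, ADDR {gen/dat/acc}, PAT {dc}; not a real VALLEX record.
entry = VerbEntry(
    lemma="říci",
    frames=[
        ValencyFrame(
            slots=[
                Slot("ACT", ["nom"], "obl"),
                Slot("ADDR", ["gen", "dat", "acc"], "obl"),
                Slot("PAT", ["dc"], "obl"),
            ],
            gloss="say",
            verb_class="communication",
        )
    ],
)
```

Keeping `verb_class` optional reflects the fact that only part of the frames in the lexicon carry a class label.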

Verb Classes in VALLEX
● Classification:
    ● in progress
    ● built from below
    ● emphasis on syntactic criteria
● communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …

                                     VALLEX 1.0   VALLEX 1.5
Total Verb Entries
Total Verb Lemmas
Total Valency Frames
Valency Frames with Class            [37.5%]      [44.6%]
Total Classes
Frame Types in Class on Average

Communication verbs in VALLEX
‘a speaker conveys information to a recipient’

ACT      ADDR             PAT/EFF
{nom}    {gen/dat/acc}    {dc,...}

● simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement
● question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs
● commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs

                               VALLEX 1.0   VALLEX 1.5
verbs of announcement: že
interrogative verbs: zda
imperative verbs: aby          74           105
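The correspondence between conjunctions and subclasses on this slide can be written down as a simple lookup. The names `SUBCLASS_BY_CONJUNCTION` and `subclasses_for` are hypothetical, and the sketch ignores the homonymy of conjunctions, so it should be read as an illustration of the mapping rather than as a classifier.

```python
# Conjunction -> subclass of communication verbs, as listed on the slide.
SUBCLASS_BY_CONJUNCTION = {
    "že": "verb of announcement",
    "zda": "interrogative verb",
    "jestli": "interrogative verb",
    "aby": "imperative verb",
    "ať": "imperative verb",
}


def subclasses_for(conjunctions):
    """Return the subclasses suggested by the conjunctions seen with a verb.

    Conjunctions that are not in the table are ignored.
    """
    return {
        SUBCLASS_BY_CONJUNCTION[c]
        for c in conjunctions
        if c in SUBCLASS_BY_CONJUNCTION
    }
```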

2. Automatic Identification of Verbs of Communication
● Evaluation: VALLEX vs. FrameNet

Automatic Identification of Verbs of Communication
● Search the corpus for the pattern V+N234+subord{aby,zda,že}: a verb, a noun in case 2, 3 or 4 (genitive, dative or accusative, matching the ADDR forms), and a subordinating conjunction aby, zda or že.
● A verb is marked as a communication verb if enough occurrences are found.
● Weak points:
    1. eliminates nominal structures: ‘He said the truth about the killer.’ ‘He gave her many presents.’ (verb of exchange)
    2. ignores examples where a complement was not expressed on the surface layer: ‘He said that …’
    3. homonymy of conjunctions: že (that) and aby (in order to): ‘He has done it in order to make money…’
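A minimal sketch of such a pattern search over a morphologically tagged corpus might look as follows. Several things here are assumptions made for the example and are not stated on the slides: the `Token` representation, the tag letters "V", "N" and "J", strict adjacency of the three elements, punctuation being already stripped, and the `min_hits` threshold.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Token(NamedTuple):
    lemma: str
    pos: str              # assumed tags: "V" verb, "N" noun, "J" conjunction
    case: Optional[int]   # Czech case number for nouns, otherwise None


CONJUNCTIONS = {"aby", "zda", "že"}
ADDRESSEE_CASES = {2, 3, 4}   # genitive, dative, accusative


def count_pattern_hits(sentences):
    """Count V + N(case 2/3/4) + subordinating conjunction per verb lemma."""
    hits = Counter()
    for sent in sentences:
        # Slide a window of three adjacent tokens over the sentence.
        for verb, noun, conj in zip(sent, sent[1:], sent[2:]):
            if (verb.pos == "V"
                    and noun.pos == "N" and noun.case in ADDRESSEE_CASES
                    and conj.pos == "J" and conj.lemma in CONJUNCTIONS):
                hits[verb.lemma] += 1
    return hits


def communication_verbs(sentences, min_hits=2):
    """Mark a verb as a communication verb if it has enough pattern hits."""
    hits = count_pattern_hits(sentences)
    return {lemma for lemma, n in hits.items() if n >= min_hits}


# Toy corpus, invented for the example (punctuation tokens omitted).
toy_corpus = [
    [Token("říci", "V", None), Token("Petr", "N", 3),
     Token("že", "J", None), Token("přijít", "V", None)],
    [Token("říci", "V", None), Token("matka", "N", 3),
     Token("že", "J", None), Token("odejít", "V", None)],
    [Token("dát", "V", None), Token("matka", "N", 3),
     Token("dárek", "N", 4)],
    [Token("ptát_se", "V", None), Token("Petr", "N", 2),
     Token("zda", "J", None), Token("přijít", "V", None)],
]

found = communication_verbs(toy_corpus, min_hits=2)
```

With this toy corpus, `dát` never matches because its second complement is a noun rather than a clause, which is the first weak point listed above seen from the other side: nominal structures are filtered out together with verbs of exchange.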

Evaluation against VALLEX and FrameNet
● gold standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2
● ROC curves
    TP … true positives (communication verbs according to a gold standard and above the threshold)
    FP … false positives (non-communication verbs above the given threshold)
    TPR = TP / P … true positive rate (P is the total number of communication verbs)
    TNR = TN / N … true negative rate (N is the total number of verbs with no sense of communication), so the false positive rate is FP / N = 1 − TNR
● 40-50% of communication verbs identified correctly (for both VALLEX and FrameNet), 20% falsely marked
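The rates defined on this slide can be computed for every threshold with a few lines of Python. The function name `roc_points`, the `scores` dictionary (verb lemma to pattern hit count) and the convention that "above the threshold" means `score >= threshold` are assumptions of this sketch; the toy data are invented.

```python
def roc_points(scores, gold):
    """Return (threshold, TPR, TNR) triples, one per distinct score.

    scores: dict mapping verb lemma -> score (e.g. number of pattern hits)
    gold:   set of lemmas that are communication verbs in the gold standard
    """
    positives = [v for v in scores if v in gold]
    negatives = [v for v in scores if v not in gold]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative verb")

    points = []
    for threshold in sorted(set(scores.values())):
        # TP: gold communication verbs at or above the threshold.
        tp = sum(1 for v in positives if scores[v] >= threshold)
        # TN: other verbs that stay below the threshold.
        tn = sum(1 for v in negatives if scores[v] < threshold)
        points.append((threshold, tp / len(positives), tn / len(negatives)))
    return points


# Invented toy scores and gold standard, for illustration only.
toy_scores = {"říci": 5, "ptát_se": 3, "nakázat": 1, "dát": 3, "jít": 0}
toy_gold = {"říci", "ptát_se", "nakázat"}
points = roc_points(toy_scores, toy_gold)
```

Each triple gives one point of the curve; the false positive rate at a threshold is `1 - TNR`.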

3. Frame Suggestion
● Frame Edit Distance and Verb Entry Similarity
● Experimental Results

Frame Edit Distance and Verb Entry Similarity
● FED (frame edit distance): the number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame to a correct frame
● ES (entry similarity or expected saving):

$$\mathrm{ES} = 1 - \frac{\min \mathrm{FED}(G, H)}{\mathrm{FED}(G, \varnothing) + \mathrm{FED}(H, \varnothing)}$$

    G … gold verb entries of this base lemma
    H … hypothesized entries
    Ø … blank verb entry
● ES 0% (suggesting nothing), ES 100% (gold frames)
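One way to make these two measures concrete is the following sketch. It rests on assumptions that the slides do not confirm: each slot is treated as an atomic unit with unit edit costs, a frame is a tuple of `(functor, forms)` pairs, and the minimum over entries is found by brute force over pairings of frames, which is only practical for entries with a handful of frames. The function names are hypothetical.

```python
from itertools import permutations


def frame_edit_distance(hyp, gold):
    """Levenshtein distance between two frames, counted in whole slots."""
    prev = list(range(len(gold) + 1))
    for i, h_slot in enumerate(hyp, start=1):
        cur = [i]
        for j, g_slot in enumerate(gold, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete a hypothesized slot
                cur[j - 1] + 1,                    # insert a missing gold slot
                prev[j - 1] + (h_slot != g_slot),  # replace (free if equal)
            ))
        prev = cur
    return prev[-1]


def entry_edit_distance(gold_entry, hyp_entry):
    """Minimal total FED over all pairings of frames in the two entries."""
    n = max(len(gold_entry), len(hyp_entry))
    # Pad the shorter entry with empty frames so every frame has a partner.
    gold = list(gold_entry) + [()] * (n - len(gold_entry))
    hyp = list(hyp_entry) + [()] * (n - len(hyp_entry))
    return min(
        sum(frame_edit_distance(h, g) for h, g in zip(perm, gold))
        for perm in permutations(hyp)
    )


def entry_similarity(gold_entry, hyp_entry):
    """ES = 1 - min FED(G, H) / (FED(G, blank) + FED(H, blank))."""
    denom = (entry_edit_distance(gold_entry, [])
             + entry_edit_distance(hyp_entry, []))
    if denom == 0:
        return 1.0  # both entries blank; treating this as identical is an assumption
    return 1 - entry_edit_distance(gold_entry, hyp_entry) / denom


# Illustrative frames only, not taken from the lexicon.
gold = [(("ACT", "1"), ("ADDR", "3"), ("PAT", "že"))]
hyp = [(("ACT", "1"), ("PAT", "4"))]
es = entry_similarity(gold, hyp)   # 1 - 2 / (3 + 2) = 0.6
```

The boundary cases agree with the slide: an empty suggestion gives ES = 0, and suggesting exactly the gold frames gives ES = 1.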

Experimental Results with ES

Suggested frames                                                  ES [%]
Specific frame for verbs of communication, default for others
Baseline 1: ACT(1)                                                26.69
Baseline 2: ACT(1) PAT(4)                                         37.55
Baseline 3: ACT(1) ADDR(3,4) PAT(4)
Baseline 4: Two typical frames: ACT(1) PAT(4)                     39.11

Conclusion
● Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40-50% true positives against VALLEX and FrameNet, 20% false positives)
● FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb -> need to focus on other classes, too