Development of a lexical resource annotated with semantic roles for Portuguese
Leonardo Zilio
Supervisors: Prof. Dr. Maria José Bocorny Finatto and Prof. Dr. Aline Villavicencio
Goals
- To make available a lexical resource for Portuguese in which semantic roles are annotated manually
- To compare the use of verbs in specialized and non-specialized language
Semantic Roles
- John went to the park. (John: Agent; the park: Destination)
- John opened the door with his key. (John: Agent; the door: Patient; his key: Instrument)
- The door opens with a key. (The door: Patient; a key: Instrument)
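These examples show that semantic roles are not tied to syntactic positions: the subject is the Agent in the second sentence but the Patient in the third. The following is a minimal Python sketch of how such annotations could be represented, assuming a hypothetical encoding of each instance as a list of (argument, syntactic function, role) entries; the class `Argument`, the function `subject_role`, and the function labels "SUBJ", "ACC" and "PP" are illustrative assumptions, not the format used in the resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    text: str
    function: str  # hypothetical syntactic-function label, e.g. "SUBJ", "ACC", "PP"
    role: str      # semantic role label


# Hypothetical encodings of the three example sentences.
annotations = {
    "John went to the park.": [
        Argument("John", "SUBJ", "Agent"),
        Argument("to the park", "PP", "Destination"),
    ],
    "John opened the door with his key.": [
        Argument("John", "SUBJ", "Agent"),
        Argument("the door", "ACC", "Patient"),
        Argument("with his key", "PP", "Instrument"),
    ],
    "The door opens with a key.": [
        Argument("The door", "SUBJ", "Patient"),
        Argument("with a key", "PP", "Instrument"),
    ],
}


def subject_role(sentence: str) -> str:
    """Return the semantic role assigned to the subject of an annotated sentence."""
    for arg in annotations[sentence]:
        if arg.function == "SUBJ":
            return arg.role
    raise KeyError(f"no subject annotated in {sentence!r}")


for sentence in annotations:
    print(f"{sentence:40} subject = {subject_role(sentence)}")
```

The same syntactic slot (subject) maps to different roles, which is why the roles have to be annotated rather than read off the parse.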
Related Work
- PropBank [1, 2]: numbered roles, Arg0 to Arg5, plus adjuncts
- VerbNet [3, 4]: 36 descriptive roles (Agent, Patient, Theme, etc.)
Workflow
1. Corpora: Cardiology and Newspaper
2. Parsing with PALAVRAS [5] (dependencies)
3. SCF extraction tool [6, 7]
4. Database
5. Selection of verbs to be annotated
6. Manual annotation of the arguments
7. Transformation of the results from the database into XML
8. Release of the results
SCF (subcategorization frame) Extractor [6, 7]
1 - Input: corpora analysed with the PALAVRAS parser [5], in the form of dependency trees
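Dependency tree: João viu o cachorro. (John saw the dog.)
PALAVRAS output (the slide labels the lemma, extras, syntax and dependency grammar parts):
    João [João] <hum> PROP M S @SUBJ> #1->2
    viu [ver] <vH> <fmc> <mv> V PS 3S IND VFIN @FS-STA #2->0
    o [o] <artd> DET M S @>N #3->4
    cachorro [cachorro] <Azo> N M S @<ACC #4->2
    $. #5->0
    </s>
Each line gives the word form, its lemma in brackets, extra tags in angle brackets, part of speech and morphology, the syntactic function (@...), and the dependency as #token->head, where head 0 is the root.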
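To make the token format concrete, here is a minimal Python sketch that splits one output line into its parts. The layout is inferred from the single example above and is an assumption, not the official PALAVRAS specification; the names `Token`, `LINE_RE` and `parse_line` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed layout, inferred from the example above:
#   word [lemma] <extra tags> POS morphology @FUNCTION #index->head
LINE_RE = re.compile(
    r"^(?P<word>\S+)"
    r"(?:\s+\[(?P<lemma>[^\]]*)\])?"   # lemma is absent on punctuation lines
    r"(?P<rest>.*?)"
    r"\s+#(?P<index>\d+)->(?P<head>\d+)\s*$"
)


@dataclass
class Token:
    index: int
    word: str
    lemma: str
    extras: list = field(default_factory=list)  # tags in angle brackets, e.g. <mv>
    pos: str = ""
    morph: list = field(default_factory=list)   # e.g. ["M", "S"]
    function: str = ""                          # syntactic function, e.g. "@SUBJ>"
    head: int = 0                               # 0 = root


def parse_line(line: str):
    """Parse one token line; return None for non-token lines such as </s>."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    extras, plain, function = [], [], ""
    for item in m.group("rest").split():
        if item.startswith("<") and item.endswith(">"):
            extras.append(item)
        elif item.startswith("@"):
            function = item
        else:
            plain.append(item)
    word = m.group("word")
    return Token(
        index=int(m.group("index")),
        word=word,
        lemma=m.group("lemma") or word,
        extras=extras,
        pos=plain[0] if plain else "",
        morph=plain[1:],
        function=function,
        head=int(m.group("head")),
    )


print(parse_line("viu [ver] <vH> <fmc> <mv> V PS 3S IND VFIN @FS-STA #2->0"))
```

The `<mv>` tag kept in `extras` is what later allows main verbs to be told apart from other verb forms.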
Dependency tree (head → dependents): Root (0) → ver (2); ver (2) → João (1), cachorro (4); cachorro (4) → o (3)
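The tree can be rebuilt directly from the #token->head pairs. Below is a minimal Python sketch; the (index, lemma, head) triples are copied from the example sentence (punctuation omitted), while the functions `build_children` and `render` are hypothetical helpers written for illustration.

```python
from collections import defaultdict

# (index, lemma, head) triples for "João viu o cachorro.", from the #index->head
# annotations of the parser output; punctuation is left out for clarity.
tokens = [(1, "João", 2), (2, "ver", 0), (3, "o", 4), (4, "cachorro", 2)]


def build_children(tokens):
    """Map each head index to the sorted list of its dependents' indices."""
    children = defaultdict(list)
    for index, _lemma, head in tokens:
        children[head].append(index)
    for deps in children.values():
        deps.sort()
    return children


def render(tokens):
    """Return the tree as indented lines, starting from the artificial root 0."""
    lemmas = {index: lemma for index, lemma, _head in tokens}
    children = build_children(tokens)
    lines = []

    def walk(node, depth):
        label = "Root (0)" if node == 0 else f"{lemmas[node]} ({node})"
        lines.append("  " * depth + label)
        for child in children.get(node, []):
            walk(child, depth + 1)

    walk(0, 0)
    return lines


print("\n".join(render(tokens)))
# Root (0)
#   ver (2)
#     João (1)
#     cachorro (4)
#       o (3)
```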
SCF Extractor [6, 7]
2 - Processing of all sentences in the corpora
3 - Extraction of all dependencies of main verbs
4 - Analysis of the relevant dependencies (exclusion of adverbs)
    4.1 - Classification of syntactic elements
5 - Output: database of SCFs (SQL file)
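Steps 3 and 4 can be illustrated with a simplified Python sketch: for each main verb (tagged `<mv>`), collect the syntactic functions of its direct dependents, drop adverbs, and join the remaining functions into a frame. The tuple layout, the frame notation (e.g. "SUBJ_ACC") and the function names are assumptions for illustration; the actual tool [6, 7] performs a richer classification of syntactic elements. The adverb "ontem" (yesterday), with the tags `ADV` and `@<ADVL`, is a hypothetical token added to the example sentence to show the exclusion step.

```python
from collections import Counter

# Tokens as (index, lemma, pos, extras, function, head). The first four come from
# the PALAVRAS example; "ontem" is a hypothetical adverb added for illustration.
sentence = [
    (1, "João", "PROP", ["<hum>"], "@SUBJ>", 2),
    (2, "ver", "V", ["<vH>", "<fmc>", "<mv>"], "@FS-STA", 0),
    (3, "o", "DET", ["<artd>"], "@>N", 4),
    (4, "cachorro", "N", ["<Azo>"], "@<ACC", 2),
    (5, "ontem", "ADV", [], "@<ADVL", 2),
]


def clean_function(tag):
    """Strip '@' and direction arrows: '@SUBJ>' -> 'SUBJ', '@<ACC' -> 'ACC'."""
    return tag.lstrip("@").strip("<>")


def extract_frames(sentence):
    """Yield (verb lemma, frame) for every main verb (tagged <mv>) in a sentence.

    The frame is the sequence of syntactic functions of the verb's direct
    dependents, in surface order, with adverbs (POS 'ADV') excluded.
    """
    for index, lemma, pos, extras, _function, _head in sentence:
        if pos != "V" or "<mv>" not in extras:
            continue
        deps = [
            clean_function(dep_function)
            for _i, _l, dep_pos, _e, dep_function, dep_head in sentence
            if dep_head == index and dep_pos != "ADV"
        ]
        yield lemma, "_".join(deps)


# Counting frames over all sentences of a corpus gives per-verb SCF frequencies.
frames = Counter(extract_frames(sentence))
print(frames)  # Counter({('ver', 'SUBJ_ACC'): 1})
```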
Interface
PHP interface: list of verbs, with their frequency and a "Show frames" option
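A verb list of this kind can be produced with simple aggregate queries over the SCF database. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table name `instances`, its columns, the helper functions and the toy rows are all hypothetical, since the schema of the SQL file produced by the extractor is not shown here.

```python
import sqlite3

# Hypothetical schema: one row per extracted verb instance (toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (verb TEXT, frame TEXT, sentence TEXT)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?)",
    [
        ("ver", "SUBJ_ACC", "João viu o cachorro."),
        ("ver", "SUBJ_ACC", "Maria viu o filme."),
        ("ver", "SUBJ", "O paciente não vê."),
        ("abrir", "SUBJ_ACC", "João abriu a porta."),
    ],
)


def verb_frequencies(conn):
    """List verbs by number of instances, most frequent first (the verb list)."""
    return conn.execute(
        "SELECT verb, COUNT(*) AS freq FROM instances "
        "GROUP BY verb ORDER BY freq DESC, verb"
    ).fetchall()


def frames_for(conn, verb):
    """List the frames of one verb with their counts (the 'Show frames' view)."""
    return conn.execute(
        "SELECT frame, COUNT(*) AS freq FROM instances WHERE verb = ? "
        "GROUP BY frame ORDER BY freq DESC, frame",
        (verb,),
    ).fetchall()


print(verb_frequencies(conn))   # [('ver', 3), ('abrir', 1)]
print(frames_for(conn, "ver"))  # [('SUBJ_ACC', 2), ('SUBJ', 1)]
```

Using parameter placeholders (`?`) rather than string formatting keeps the verb lookup safe when the verb comes from user input in the interface.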
PHP interface: list of examples, showing the sentence, its syntactic classification and its arguments
Semantic Role Labeling: a drop-down box lists all available semantic roles
The Resource
                Newspaper   Cardiology
    Verbs             191           77
    Instances       5,301        1,931
    Arguments      11,089        4,192
Availability: up-to-date XML files can be downloaded from the CAMELEON Project website [8] under Resources > Semantic Role Labeling.
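The step that turns database rows into XML files could look like the following minimal sketch, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`corpus`, `instance`, `sentence`, `argument`, `role`) and the function `to_xml` are hypothetical and do not reproduce the schema of the released files; the sample row is a Portuguese version of the "John opened the door with his key" example.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated rows: (verb, sentence, [(argument text, role), ...]).
rows = [
    ("abrir", "João abriu a porta com a chave.",
     [("João", "Agent"), ("a porta", "Patient"), ("com a chave", "Instrument")]),
]


def to_xml(corpus_name, rows):
    """Serialize annotated instances of one corpus as an XML string."""
    root = ET.Element("corpus", name=corpus_name)
    for verb, sentence, arguments in rows:
        inst = ET.SubElement(root, "instance", verb=verb)
        ET.SubElement(inst, "sentence").text = sentence
        for text, role in arguments:
            ET.SubElement(inst, "argument", role=role).text = text
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml("Cardiology", rows)
print(xml_text)
```

Building the tree with `ElementTree` rather than concatenating strings ensures that characters such as `&` or `<` in sentences are escaped correctly.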
Current Stage
- Analysis of semantic roles for cross-genre comparison
- Comparison with other resources, such as VerbNet.Br [4] and PropBank.Br [2]
Thank you! ziliotradutor@gmail.com
Bibliography
[1] Palmer, Martha, Daniel Gildea and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1).
[2] Duran, Magali Sanches and Sandra Maria Aluísio. 2012. Propbank-Br: a Brazilian treebank annotated with semantic role labels. In: Proceedings of LREC 2012, May 21-27, Istanbul, Turkey.
[3] Kipper-Schuler, Karin. 2005. VerbNet: a broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania.
[4] Scarton, Carolina. 2013. VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. NILC/USP.
[5] Bick, Eckhard. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework, Volume 202. Aarhus: Aarhus University Press.
[6] Zanette, Adriano. 2010. Aquisição de Subcategorization Frames para Verbos da Língua Portuguesa. Undergraduate final project (Projeto de Diplomação), UFRGS. Advisor: Aline Villavicencio.
[7] Zilio, Leonardo, Adriano Zanette and Carolina Scarton. 2014. Automatic extraction of subcategorization frames from Portuguese corpora. In: Aluísio, S. M. and Tagnin, S. E. O. (eds.) New Language Technologies and Linguistic Research: a Two-Way Road. Cambridge Scholars Publishing, pp. 78-96.
[8] http://cameleon.imag.fr/xwiki/bin/view/Main/